ENCYCLOPEDIA OF IMAGING SCIENCE AND TECHNOLOGY
Editor
Joseph P. Hornak, Rochester Institute of Technology

Editorial Board
Christian DeMoustier, Scripps Institution of Oceanography
William R. Hendee, Medical College of Wisconsin
Jay M. Pasachoff, Williams College
William Philpot, Cornell University
Joel Pokorny, University of Chicago
Edwin Przybylowicz, Eastman Kodak Company
John Russ, North Carolina State University
Kenneth W. Tobin, Oak Ridge National Laboratory
Mehdi Vaez-Iravani, KLA-Tencor Corporation
Editorial Staff
Vice President, STM Books: Janet Bailey
Vice President and Publisher: Paula Kepos
Executive Editor: Jacqueline I. Kroschwitz
Director, Book Production and Manufacturing: Camille P. Carter
Managing Editor: Shirley Thomas
Editorial Assistant: Susanne Steitz
ENCYCLOPEDIA OF IMAGING SCIENCE AND TECHNOLOGY
Joseph P. Hornak, Rochester Institute of Technology, Rochester, New York
The Encyclopedia of Imaging Science and Technology is available Online in full color at www.interscience.wiley.com/eist
A Wiley-Interscience Publication
John Wiley & Sons, Inc.
This book is printed on acid-free paper. Copyright 2002 by John Wiley & Sons, Inc., New York. All rights reserved. Published simultaneously in Canada. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4744. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax (212) 850-6008, E-Mail:
[email protected]. For ordering and customer service, call 1-800-CALL-WILEY.
Library of Congress Cataloging in Publication Data:
Encyclopedia of imaging science and technology / [edited by Joseph P. Hornak].
p. cm.
''A Wiley-Interscience publication.''
Includes index.
ISBN 0-471-33276-3 (cloth : alk. paper)
1. Image processing–Encyclopedias. 2. Imaging systems–Encyclopedias. I. Hornak, Joseph P.
TA1632.E53 2001
621.36'7'03–dc21        2001046915
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
PREFACE Welcome to the Encyclopedia of Imaging Science and Technology. The Encyclopedia of Imaging Science and Technology is intended to be a definitive source of information for the new field of imaging science, and is available both in print and Online at www.interscience.wiley.com/eist. To define imaging science, we first need to define an image. An image is a visual representation of a measurable property of a person, object, or phenomenon, and imaging is the creation of this visual representation. Therefore, imaging science is the science of imaging. Color plays an important role in many imaging techniques. Humans are primarily visual creatures. We receive most of our information about the world through our eyes. This information can be language (words) or images (pictures), but we generally prefer pictures. This is the reason that between 1200 and 1700 A.D., artists were commissioned to paint scenes from the Bible. More recently, this is the reason that television and movies are more popular than books, and many current day newspapers provide many pictures. We rely more and more on images for transmitting information. Take a newspaper as the first supporting evidence of this statement. The pictures in the newspaper are possible only because of image printing technology. This technology converts gray-scale pictures into a series of different size dots that allow our eyes to perceive the original picture. Chances are that some of the pictures in your newspaper are in color. This more recent printing technology converts the color image to a series of red, green, blue, and black dots for printing. That is the technology of printing an image, but let’s examine a few of the pictures in a typical newspaper in more detail to see how ubiquitous imaging science is.
A word about Color in the Encyclopedia: In the Online Encyclopedia, all images provided by authors in color appear in color. Color images in the print Encyclopedia are more limited and appear in color inserts in the middle of each volume.

The pictures of a politician or sports hero on the front page were probably taken with a camera using silver halide film, one of the more common imaging systems. Chances are that a digital camera was used if it is a very recent story on the cover page. The imaging technology used in these cameras decreases the time between image capture and printing of the newspaper, while adding editorial flexibility. Moving further into the paper, you might see a feature story with images from the sunken RMS Titanic, located approximately 2 miles beneath the surface of the North Atlantic Ocean. The color pictures were taken with a color video camera system, one of the products of the late twentieth century. This specific camera system can withstand the 296 atmospheres of pressure and provide the illumination to see in total darkness and murky water. Another feature in the paper describes the planets that astronomers have found orbiting around stars from 10.5 to 202 light years away. This feat would never be possible if it were not for the powerful telescopes available to astronomers.

Turning to the weather page, we see a 350-mile field-of-view satellite image of a typhoon in the South Pacific Ocean. This image is from one of several orbiting satellites devoted to providing information for weather forecasting. The time progression of the images from this satellite enables meteorologists to predict the path of the typhoon and save lives. Other images on this page of the newspaper show the regions of precipitation on a map (Fig. 1; see color insert). Images of this type are composites of radar reflective imagery from dozens of radar sightings. Although radar was developed in the mid-twentieth century, the ability to image precipitation, its cumulative amount and velocity, was not developed until the latter quarter of the century. Using this imagery, we can tell if our airline flights will be delayed or our weekend barbecue will be rained out. Another article shows a U.S. National Aeronautics and Space Administration (NASA) satellite image of a smoke cloud from wildfires in the Payette National Forest in Idaho, U.S. The smoke cloud is clearly visible, spreading 475 miles across the entire length of the state of Montana. Another U.S. National Oceanic and Atmospheric Administration (NOAA) infrared satellite image shows the hot spots in another fire across the border in Montana. This imagery will allow firefighters to allocate resources more efficiently to battle the wildfires. We take much of the satellite imagery for granted today, but it was not until the development of rockets and imaging cameras that this imaging technology was possible and practical. For example, before electronic cameras were available, some satellites used photographic film that was sent back to earth in a recovery capsule. This method is not quite the instantaneous form used today.

Turning to the entertainment section, we see a description of the way computer graphics were used to generate the special effects imagery in the latest moving picture. The motion picture industry has truly been revolutionized by the special effects made available by digital image processing, which was originally developed to remove noise from images, enhance image quality, and find features in images. Those people who do not receive their information in printed form need only look at the popularity of television, film and video cameras, and photocopiers to be convinced that imaging science and technology touches our daily lives. Nearly every home and motel room has a television. Considering that the networked desktop computer is becoming a form of television, the prevalence of television is even greater. Most people in the developed world own or have used a photographic film, digital, or VCR camera. Photocopiers are found in nearly all businesses, libraries, schools, post offices, and grocery stores. In addition to these obvious imaging systems, there are lesser known ones that have had just as important an impact on our lives. These include X-ray, ultrasound, and magnetic resonance
imaging (MRI) medical imaging systems; radar for air traffic control; and reflective seismology for oil exploration.

Imaging science is a relatively new discipline; the term was first defined in the 1980s. It is informative to examine when, and in some cases how recently, many of the imaging systems on which we have become so dependent came into existence. It is also interesting to see how the development of one technology initiated new imaging applications. The time line in Table 1 summarizes some of the imaging-related developments. In presenting this time line, it is helpful to note that the first person who discovers or invents something often does not receive credit. Instead, the discovery or invention is attributed to the person who popularizes or mass-markets it. A good example is the development of the optical lens (1). The burning glass, a piece of glass used to concentrate the sun's energy and start a fire, was first referred to in a 424 B.C. play by Aristophanes. The Roman philosopher Seneca (4 B.C.–65 A.D.) is alleged to have read books by peering at them through a glass globe of water to produce magnification. In 50 A.D., Cleomedes studied refraction. Pliny (23–79 A.D.) indicated that Romans had burning glasses. Claudius Ptolemy (85–165 A.D.), the Egyptian-born Greek astronomer and mathematician, mentions the general principle of magnification. Clearly, the concept of the lens was developed before the Dark Ages. But because this early technology was lost in the Dark Ages, reading stones, or magnifying glasses, are said to have been developed in Europe about 1000 A.D., and the English philosopher Roger Bacon is credited with the idea of using lenses to correct vision in 1268. Therefore, this historical account of imaging science is by no means kind to the actual first inventors and discoverers of an imaging technology. It is, however, a fair record of the availability of imaging technology in society. It is clear from the historical account that technologies are evolving and developing to increase the amount of visual information available to humans. This explosive growth and the need to use new imaging technology were so great that the field of imaging science evolved. The field was originally called photographic science, but it is now given the more general name imaging science to include all forms of visual image generation.

Because many are unfamiliar with imaging science as a discipline, this introduction defines the field and its scope. As stated earlier, an image is a visual representation of some measurable property of a person, object, or phenomenon. The visual representation or map can be one-, two-, three-, or more dimensional. For this encyclopedia, the representation is intended to be exact, not abstract. The measurable property can be physical, chemical, or electrical. A few examples are the reflection, emission, transmission, or absorption of electromagnetic radiation, particles, or sound. The definition of an image implies that an image can also be of a phenomenon, such as temperature, an electric field, or gravity. The device that creates the image is called the imaging system. All imaging systems create visual representations of objects, persons, and phenomena. These representations are intended to be interpreted by the human mind, typically using the eyes, but in some cases by an expert system.
Many imaging systems create visual maps of what the eye and mind can see. Others serve as transducers, converting what the eye cannot see (infrared radiation, radio waves, sound waves) into a visual representation that the eye and mind can see. Imaging systems evolve and keep pace with our scientific and technological knowledge. In prehistoric time, imaging was performed by individuals skilled in creating visual likenesses of objects with their hands. The imaging tools were a carving instrument and a piece of wood or a piece of charcoal and a cave wall. As time evolved, imaging required more scientific knowledge because the tools became more technologically advanced. From 5000 B.C. to about 1000 A.D., persons interested in creating state-of-the-art images needed knowledge of pigments, oils, and color mixing to create paintings or of the cleavage planes in stone to create sculptures. By the time photography was developed, persons wishing to use this imaging modality had to know optics and the chemistry of silver halides. The development of television further increased the level of scientific knowledge required by those practicing this form of imaging. Few would have thought that the amount of scientific knowledge necessary to understand television would be eclipsed by the amount necessary to understand current imaging systems. Many modern digital imaging systems require knowledge of the interaction of energy with matter (spectroscopy), detectors (electronics), digital image processing (mathematics), hard and soft copy display devices (chemistry and physics), and human visual perception (psychology). Clearly, imaging requires more knowledge of science and technology than it did thousands of years ago. The trend predicts that this amount of knowledge will continue to increase. Early applications of imaging were primarily historic accounts and artistic creations because the imaged objects were visual perceptions and memories. Some exceptions were the development of microscopy for use in biology and telescopy for use in terrestrial and astronomical imaging. The discovery of X rays in 1895 further changed this by providing medical and scientific needs for imaging. The development of satellite technology and positioning systems that have angstrom resolution, led, respectively, to additional applications of remote and microscopic imaging in the sciences. Currently scientists, engineers, and physicians in every discipline are using imaging science to visualize properties of the systems they are studying. Moreover, images are ubiquitous. Because they can be more readily generated and manipulated, they are being used in all aspects of business to convey information more effectively than text. The adage, ‘‘A picture is worth a thousand words’’ can be modified in today’s context to say, ‘‘A picture is worth 20 megabits,’’ because an image can in fact summarize and convey such complexities. Imaging science is the pursuit of the scientific understanding of imaging or an imaging technique. The field of imaging science continues to evolve, incorporating new scientific disciplines and finding new applications. Today, imaging is used by astronomers to map distant galaxies, by oceanographers to map the sea floor, by chemists to map the distribution of atoms on a surface, by physicians to map the functionality of the brain, and
by electrical engineers to map the electromagnetic fields around power transmission lines. We see the results of the latest imaging technologies in our everyday lives (e.g., the nightly television news contains instantaneous radar images of precipitation or three-dimensional views of cloud cover), and we can also personally use many of these technologies [e.g., digital image processing (DIP) algorithms are available on personal computers and are being used to remove artistic flaws from digital and digitized family and vacation photographs].

When attempting to summarize a new field, especially an evolving one like imaging science, a taxonomy should be chosen that allows people to see the relationships to other established disciplines, as well as to accommodate the growth and evolution of the new discipline. The information presented in this encyclopedia is organized in a unique way that achieves this goal. To understand this organization, the concept of an imaging system must first be developed. An imaging system is a device such as a camera, a magnetic resonance imager, an atomic force microscope, an ultraviolet telescope, or an optical scanner. Most imaging systems have ten loosely defined components:

1. An imaged object, person, or phenomenon.
2. A probing radiation, particle, or energy, such as visible light, electrons, heat, or ultrasound.
3. A measurable property of the imaged object, person, or phenomenon, such as reflection, absorption, emission, or scattering.
4. An image formation component, typically based on focusing optics, projections, scans, holography, or some combination of them.
5. A detected radiation, particle, or energy, which may be different from the probing radiation, particle, or energy.
6. A detector, consisting of photographic film, a charge-coupled device (CCD), photomultiplier tubes, or pressure transducers.
7. A processor, which can be chemical, as for photographs, or a computer algorithm, as in digital image processing (DIP) techniques.
8. An image storage device, such as photographic film, a computer disk drive, or computer memory.
9. A display, which can be a cathode ray tube (CRT) screen, photographic paper, or some other hard copy (HC) output.
10. An end user.

Examples of the components of two specific imaging systems will be helpful here. In a photographic camera imaging system (see articles on Still Photography and Instant Photography), the imaged object might be a landscape. The probing radiation may be visible light from the sun or from a strobe light (flash). The measurable property is the absorption and reflection of visible light by the objects in the scene. The image formation component is a lens. The detected radiation is visible light. The detector is many tiny silver halide crystals in the film (see article on Silver Halide Detector Technology). The processor is a
chemical reaction that converts the exposed silver halide crystals to silver metal and removes the unexposed silver halide crystals. The storage device is the developed negative. The display is photographic paper (see article on Photographic Color Display Technology), and the end user is a person. In a magnetic resonance imaging (MRI) system (see article on Magnetic Resonance Imaging), the imaged object might be the human brain. The probing radiation is electromagnetic radiation in the radio-frequency range (typically 63 MHz). The measurable property is a resonance associated with the absorption of this energy by magnetic energy levels of the hydrogen nucleus. The image formation component is frequency and phase encoding of the returning resonance signal using magnetic field gradients. The detected radiation is at the same radio frequency. The detector is a doubly balanced mixer and digitizer. The processor is a computer. The storage device is a computer disk. The display device is a cathode ray tube (CRT; see article on Cathode Ray Tubes and Cathode Ray Tube Display Technology) or liquid crystal display (LCD; see article on Liquid Crystal Display Technology) computer screen, or a film transparency. The end user is a radiologist. The imaged object and end user play an important role in defining the imaging system. For example, even though ultrasound imaging (see article on Ultrasound Imaging) is used in medicine, nondestructive testing, and microscopy, the three imaging systems used in these disciplines have differences. The systems vary because the imaged objects (human body, manufactured metal object, and surface) are different, and the end users of the information (physician, engineer, and scientist) have unique information requirements. The way in which the human mind interprets the information in an image (see articles on Human Visual System) plays a big role in the design of an imaging system. Many imaging systems use some form of probing radiation, particle, or energy, such as visible light, electrons, or ultrasound. The radiation, particle, or energy is used to probe a measurable property of the imaged object, person, or phenomenon. Nearly every wavelength of electromagnetic radiation, type of particle, and energy has been used to produce images. The nature of the interaction of the radiation, particle, or energy with the imaged object, person, or phenomenon is as important as the probing radiation, particle, or energy. This measurable property could be the reflection, absorption, fluorescence, or scattering properties of the imaged object, person, or phenomenon. Systems that do not use probing radiation, particles, or energy typically image the emission of radiation, particles, or energy. This aspect of imaging science is so important that it is covered in detail in a separate article of the encyclopedia entitled Electromagnetic Radiation and Interactions with Matter. The image formation component is responsible for spatially encoding the data from the imaged object so that it forms a one, two, three, or more dimensional representation of the imaged object. Optics are the most frequently used image formation method (see article on Optical Image Formation) when dealing with ultraviolet
(UV), visible (vis), or infrared (IR) radiation. Other image formation methods include scanning used in radar, MRI, and flat-bed scanners; point source projection used in X-ray plane-film imaging; and hybrid methods used, for example, in electrophotographic copiers (see article on Image Formation). Many imaging systems have image storage devices. The storage can be digital or hard copy, depending on the nature of the imaging system. There is much concern about the permanence of image storage techniques in use today. Many images are stored on CDs, whose lifetime is ∼10 years. Magnetic storage media face another problem: obsolescence. Many images are archived on magnetic media such as nine-track tape, and 8-, 5.25-, or 3.5-inch floppy disks. The drives for these devices are becoming more and more difficult to find. This concern about image permanence and the ability to view old images is not new. Figure 3 (see color insert) is an image of a mosaic created before 79 A.D. depicting the battle between Alexander the Great and the Persian Emperor Darius III in 333 B.C. This mosaic, from the House of the Faun in Pompeii, Italy, survived the eruption of Mt. Vesuvius that buried it in 79 A.D. It is claimed that this mosaic is a copy of an original painting created sometime after 333 B.C., but destroyed in a fire. (This image is noteworthy for another reason. It is a mosaic copy of a painting and therefore an early example of a pixelated copy of an image. The creator understood how large the pixels (tiles) could be and still convey visual information of the scene.) Will our current image storage formats survive fires and burial by volcanic eruptions? The display device can produce hard copy, soft copy, or three-dimensional images. Hard copy devices include dye sublimation printers (see Dye Transfer Printing Technology), laser printers (see article on Electrophotography), ink jet printers (see Ink Jet Display Technology), printing presses (see articles on Gravure Multi-Copy Printing and Lithographic Multicopy Printing), and photographic printers (see article on Photographic Color Display Technology). Soft copy devices include cathode ray tubes (see articles on Cathode Ray Tubes and Cathode Ray Tube Display Technology), liquid crystal displays (see article on Liquid Crystal Display Technology), and field emission displays (see article on Field Emission Display Panels). Three-dimensional displays can be holographic (see article on Holography) or stereo (see article on Stereo and 3D Display Technologies). Table 2 will help you distinguish between an imaging system and imaging system components. It identifies the major components of various imaging systems used in the traditional scientific disciplines, as well as in personal and commercial applications. The information in this encyclopedia is organized around Table 2. There is a group of articles designed to give the reader an overview of the way imaging science is used in a select set of fields, including art conservation (see Imaging Science in Art Conservation), astronomy
(see Imaging Science in Astronomy), biochemistry (see Imaging Science in Biochemistry), overhead surveillance (see Imaging Science in Overhead Surveillance), forensics and criminology (see Imaging Science in Forensics and Criminology), geology (see Imaging Applied to the Geologic Sciences), medicine (see Imaging Science in Medicine), and meteorology (see Imaging Science in Meteorology). The articles describe the objects imaged and the requirements of the end user, as well as the way these two influence the imaging systems used in the discipline. A second class of articles describes specific imaging system components. These include articles on spectroscopy, image formation, detectors, image processing, display, and the human visual system (end user). Each of these categories contains several articles on specific aspects or technologies of the imaging system components. For example, the display technology category consists of articles on cathode-ray tubes, field emission panels, liquid crystals, and photographic color displays; and on ink jet, dye transfer, lithographic multicopy, gravure, laser, and xerographic printers. The final class of articles describes specific imaging systems and imaging techniques such as magnetic resonance imaging (see Magnetic Resonance Imaging), television (see Television Broadcast Transmission Standards), optical microscopy (see Optical Microscopy), lightning strike mapping (see Lightning Locators), and ground penetrating radar (see Ground Penetrating Radar), to name a few. The editorial board of the Encyclopedia of Imaging Science and Technology hopes that you find the information in this encyclopedia useful. Furthermore, we hope that it provides cross-fertilization of ideas among the sciences for new imaging systems and imaging applications.

REFERENCES

1. E. Hecht, Optics, Addison-Wesley, Reading, 1987.
2. R. Hoy, E. Buschbeck, and B. Ehmer, Science (1999).
3. The Trilobite Eye. http://www.aloha.net/~smgon/eyes.htm
4. The Cave of Chauvet-Pont-d'Arc. Ministère de la culture et de la communication of France. http://www.culture.gouv.fr/culture/arcnat/chauvet/en/
5. The Cave of Lascaux. Ministère de la culture et de la communication of France. http://www.culture.fr/culture/arcnat/lascaux/en/
6. Brad Fortner, The History of Television Technology. http://www.rcc.ryerson.ca/schools/rta/brd038/clasmat/class1/tvhist.htm
7. Electron Microscopy. University of Nebraska-Lincoln. http://www.unl.edu/CMRAcfem/em.htm
8. B. C. Breton, The Early History and Development of The Scanning Electron Microscope. http://www2.eng.cam.ac.uk/~bcb/history.html
JOSEPH P. HORNAK, PH.D. Editor-In-Chief
Table 1. The History of Imaging

∼5 × 10^8 B.C.: Imaging systems appear in animals during the Paleozoic Era, for example, the compound eye of the trilobite (2,3).
30,000 B.C.: Chauvet-Pont-d'Arc charcoal drawings on cave walls in southern France (4). Aurignacians of southern Germany created sophisticated portable art in the form of ivory statuettes that have naturalistic features (4).
16,000 B.C.: The first imaging tools: paint medium — animal fat; support — rock and mud of secluded caves; painting tools — fingers, scribing sticks, blending and painting brushes, and a hollow reed to blow paint on the wall (airbrush).
15,000 B.C.: Lascaux, France, cave drawings (5). Paintings were done with iron oxides (yellow, brown, and red) and bone black.
12,000 B.C.: Altamira, Spain, cave paintings made from charcoal, pigments from plants, soil, and animal fat and blood.
5000 B.C.: Water-based paint was used in the upper Nile, Egypt.
4000 B.C.: Turpentine and alcohol were also available as paint thinners for oil-based paints in the region around the Mediterranean Sea.
3500 B.C.: The first paperlike material was developed in Egypt by gluing together fibers from the stem of the papyrus plant.
3000 B.C.: In Memphis, Egypt, lifelike gold sculptures were being poured.
2700 B.C.: Sumerians created mural patterns by driving colored clay cones into walls.
2500 B.C.: Egyptians carved life-size statues in stone with perfect realism.
1500 B.C.: Lead-based paints developed in Morocco, Africa. Vermilion (mercuric sulfide), a bright red pigment, developed in China.
1400 B.C.: Egg and casein (a white dairy protein) mediums used as paint in the Baltic Sea region. Ammonia available in Egypt for wax soap paints. In Babylonia, art decorated structures with tile (early mosaics).
300 B.C.: In Greece, representational mosaics were constructed from colored pebbles. Aristotle described the principle of the camera obscura (camera is Latin for room and obscura is Latin for dark).
200 B.C.: Roman sculpture was considered to have achieved high art standards.
9 A.D.: Chinese innovators create movable type, the precursor of the printing press.
105: Cai Lun, the manager of the Chinese Imperial Arsenal, reported the production of paper from bark, hemp, rags, and old fishing nets.
1000: Alhazen studied spherical and parabolic mirrors and the human eye. Reading stones, or magnifying glasses, were developed in Europe.
1268: The English philosopher and Franciscan, Roger Bacon, initiated the idea of using lenses to correct vision.
1450: Johannes Gutenberg originates the movable type printing press in Europe.
1590: Hans and Zacharias Janssen, Dutch lens grinders, produced the first compound (two-lens) microscope.
1608: Hans Lipershey, a Dutch spectacle maker, invents the telescope (or at least helped make it more widely known; his patent application was turned down because his invention was too widely known).
1600s: English glass makers discover lead crystal glass, a glass of very high clarity that could be cut and polished.
1677: Anton van Leeuwenhoek, a Dutch biologist, invents the single-lens microscope.
1798: Alois Senefelder in Germany invents lithographic printing.
1820s: Jean Baptiste Joseph Fourier develops his Fourier integral based on sines and cosines.
1827: Joseph Niépce produced the heliograph, or first photograph. The process used bitumen of Judea, a varnish that hardens on exposure to light. Unexposed areas were washed away, leaving an image of the light reflected from the scene.
1839: Louis Daguerre, building on Niépce's work, develops the first successful photographic process. The image plate, or picture, was called a daguerreotype.
1842: Alexander Bain proposes facsimile telegraph transmission that scans metal letters and reproduces an image by contact with chemical paper. Christian Andreas Doppler discovers wavelength shifts from moving objects. Known today as the Doppler effect, this shift is used in numerous imaging systems to determine velocity.
1855: Collotype process developed for printing high-quality reproductions of photographs.
1859: Gaspard Félix Tournachon collects aerial photographs of Paris, France, from a hot-air balloon.
1863: William Bullock develops the web press for printing on rolls of paper.
1880: The piezoelectric effect in certain crystals was discovered by Pierre and Jacques Curie in France. The piezoelectric effect is used in ultrasound imagers.
1884: German scientist Paul Gottlieb Nipkow patented the Nipkow disk, a mechanical television scanning system. The American George Eastman introduces his flexible film; in 1888, he introduced his box camera. Eastman's vision was to make photography available to the masses. The Kodak slogan was, ''You press the button, we do the rest.''
1887: H. R. Hertz discovers the photoelectric effect, the basis of the phototube and photomultiplier tube.
1891: The Edison company successfully demonstrated the kinetoscope, a motion picture projector.
1895: Louis Lumière of France invented a portable motion-picture camera, film processing unit, and projector called the cinematographe, thus popularizing the motion picture camera. Wilhelm Röntgen discovers X rays.
1897: German physicist Karl Braun developed the first deflectable cathode-ray tube (CRT). The CRT is the forerunner of the television picture tube and computer monitor.
1907: Louis and Auguste Lumière produce the Autochrome plate, the first practical color photography process.
1913: Frits Zernike, a Dutch physicist, develops the phase-contrast microscope.
1915: Perhaps motivated by the sinking of the RMS Titanic in 1912, Constantin Chilowsky, a Russian living in Switzerland, and Paul Langevin, a French physicist, developed an ultrasonic echo-sounding device called the hydrophone, known later as sonar.
1917: J. Radon, an Austrian mathematician, derived the mathematical principles of reconstruction imaging that would later be used in medical computed tomography (CT) imaging.
1925: John Logie Baird invented mechanical television.
1926: Kenjiro Takayanagi of the Hamamatsu Technical High School in Tokyo, Japan, demonstrates the first working electronic television system using a cathode-ray tube (6).
1927: Philo T. Farnsworth, an American, demonstrates broadcast television (6).
1928: John Logie Baird demonstrates color television using a modified Nipkow disk. Sergei Y. Sokolov, a Soviet scientist at the Electrotechnical Institute of Leningrad, suggests the concept of ultrasonic metal flaw detection. This technology became ultrasound-based nondestructive testing.
1929: Vladimir Zworykin of Westinghouse invented the all-electric camera tube called the iconoscope (6).
1931: Max Knoll and Ernst Ruska of Germany develop the transmission electron microscope (TEM) (7). Harold E. Edgerton develops ultrahigh speed and stop-action photography using the strobe light. Karl Jansky of Bell Telephone Laboratories discovers a source of extraterrestrial radio waves and thus started radio astronomy.
1934: Corning Inc., Corning, New York, pours the first 5-m diameter, 66-cm thick borosilicate glass disk for a mirror to be used in the Hale Telescope on Palomar Mountain, near Pasadena, California.
1935: Robert A. Watson-Watt, a British electronics expert, develops radio detection and ranging (radar). The first unit had a range of ∼8 miles.
1938: Manfred von Ardenne constructed a scanning transmission electron microscope (STEM) (8). Electrophotography (xerography) invented by Chester F. Carlson.
1940s: Moving target indication (MTI) radar, or pulsed-Doppler radar, developed.
1941: Konrad Zuse, a German aircraft designer, develops the first electronic, fully programmable computer. He used old movie film to store programs and data.
1947: Edwin Herbert Land presents one-step photography at an Optical Society of America meeting. One year later the Polaroid Land camera was introduced.
1949: Rokuro Uchida, Juntendo University, Japan, builds the first A-mode ultrasonic scanner.
1951: Charles Oatley, an engineer at the University of Cambridge, produces the scanning electron microscope.
1953: Ian C. Browne and Peter Barratt, Cambridge University, England, apply pulsed-Doppler radar principles to meteorological measurements.
1956: Charles Ginsburg led a research team at Ampex Corporation in developing the first practical videotape recorder (VTR). The system recorded high-frequency video signals using a rapidly rotating recording head.
1957: Gordon Gould invents light amplification by stimulated emission of radiation (laser). The Soviet Union launches Sputnik 1, the first artificial satellite to orbit the earth successfully. This success paved the way for space-based remote sensing platforms.
1963: Polaroid introduces Polacolor film that made instant color photos possible. Allan M. Cormack develops the backprojection technique, a refinement of Radon's reconstruction principles.
1965: James W. Cooley and John W. Tukey publish their mathematical algorithm known as the fast Fourier transform (FFT).
1969: The charge-coupled device (CCD) was developed at Bell Laboratories. The first live images of man on the earth's moon and images of the earth from the moon.
1971: Sony Corporation sells the first video cassette recorder (VCR). James Fergason demonstrates the liquid crystal display (LCD) at the Cleveland Electronics Show. Intel introduces the single-chip microprocessor (U.S. Pat. 3,821,715) designed by engineers Federico Faggin, Marcian E. Hoff, and Stan Mazor. Dennis Gabor receives the Nobel prize in physics for the development of a lensless method of photography called holography.
1972: Godfrey Hounsfield constructs the first practical computerized tomographic scanner. Pulsed-Doppler ultrasound was developed for blood flow measurement.
1973: Paul G. Lauterbur develops backprojection-based magnetic resonance imaging (MRI). Robert M. Metcalfe, from the Xerox Palo Alto Research Center (PARC), invents hardware for a multipoint data communication system with collision detection, which made the Internet possible.
1974: The first personal computers were marketed (Scelbi, Mark-8, and Altair).
1975: Richard Ernst develops Fourier-based magnetic resonance imaging (MRI) (see Fig. 2). Laser printer invented. First CCD television camera.
1976: Ink-jet printer developed.
1981: Gerd Karl Binnig and Heinrich Rohrer invent the scanning tunneling microscope (STM), which provides the first images of individual atoms on the surfaces of materials. The IBM personal computer (PC) was introduced.
1989: First Ph.D. program in Imaging Science offered by Rochester Institute of Technology.
1990: Hubble Space Telescope launched.
1997: Mars Pathfinder lands on Mars and transmits images back to earth.
Table 2. Partial List of Imaging Technologies

For each imaging technique, Table 2 lists the imaged object; the probing radiation or energy; the spectroscopy (the measurable property, such as reflection, absorption, emission, fluorescence, diffraction, or a resonance such as NMR or ESR); the image formation method (optics, projection, scanning, k-space or tomographic scans, triangulation, holography); the detection technology (CCDs and solid-state detector arrays, silver halide film, scintillators, photomultiplier tubes, piezoelectric transducers, mixers, SQUIDs); the processing and enhancement method (chemistry, digital image processing, Fourier transformation, reconstruction); the display (soft copy, hard copy, film, video); and the end user. The techniques are grouped by application field:

Archeology/Art Conservation: fluorescence, narrow-bandwidth optical imaging, subsurface radar, and X-ray imaging of art and artifacts.
Astronomy: radio, UV, visible, and X-ray imaging of stars, planets, and pulsars.
Aviation: ground-based radar, airborne radar, and storm scopes.
Chemistry: autoradiography and molecular modeling.
Defense/Spy/Surveillance: night vision (IR) and satellite imaging.
Electrical Engineering: E-field and B-field imaging.
Forensics/Criminology: fluorescence, visible, and X-ray imaging.
Geology and Earth Resource Management: airborne imaging, gravitation imaging, NMR (Overhauser) imaging of oil and water, satellite imaging, subsurface radar, synthetic aperture radar, and magnetic anomaly imaging.
Machine Vision/Process Inspection: optical cameras, ultrasound, and X-ray inspection of manufactured materials.
Medicine: CT, Doppler ultrasound, EEG, ECG, and EMG topography, endoscopy, fMRI, MRI, MRA, X-ray angiography, MEG topography, nuclear medicine imaging, particle radiography, PET, thermography, ultrasound, and X-ray imaging.
Meteorology: Doppler radar, lightning locators, and satellite imaging.
Microscopy: acoustic force, Auger, atomic force, capacitive probe, compound, confocal, electron, fluorescence, magnetic force, PIXE, polarizing, scanning confocal, scanning tunneling, and SIMS microscopies.
Miscellaneous: ballistic photon imaging, ESR imaging, and holography.
Oceanography: sonar and magnetometer imaging.
Personal/Consumer: broadcast television, digital cameras, motion pictures, photography, video cameras, and xerography.
Surveillance (law enforcement): visible and IR imaging.
ACKNOWLEDGMENTS The Encyclopedia of Imaging Science and Technology is the work of numerous individuals who believed in the idea of an encyclopedia covering the new field of imaging science. I am grateful to all these individuals. I am especially grateful to these people: The authors of the articles in this encyclopedia who set their professional and personal activities aside to write about their specialities. The nine members of the Editorial Board who signed up to guide the development of this encyclopedia when it was just a vague idea. Their input and help molded this encyclopedia into its present form.
Susanne Steitz and the many present and past employees at John Wiley & Sons, Inc. for their help in making this encyclopedia possible. Dr. Bruno Alfano and the Consiglio Nazionale delle Ricerche, Centro per la Medicina Nucleare, Naples, Italy, for giving me the opportunity to devote part of my sabbatical year to completing this encyclopedia. My wife, Elizabeth, and daughter, Emma, for their love and support during the writing and editing of this work. JOSEPH P. HORNAK, PH.D. Editor-In-Chief
ACOUSTIC SOURCES OR RECEIVER ARRAYS

WILLIAM THOMPSON, JR.
Penn State University
University Park, PA

INTRODUCTION

This article discusses the directional response characteristics (i.e., the beam patterns or directivity patterns) of arrays of time harmonic acoustic sources. A wonderful consequence of the acoustic reciprocity theorem (1), however, is that the directional response pattern of a transducer used as a source is the same as its directional response pattern used as a receiver. The arrays in question begin with the simplest, two identical point sources, and build up to line arrays that contain many point sources, continuous two dimensional arrays of sources, and discrete arrays of finite sized sources. Topics such as beam width of the main lobe, amplitudes of the side lobes, amplitude weighting to reduce the side lobes, and beam tilting and its consequences are discussed. Again, although the emphasis will be on the response of arrays of sources, the result (the directivity pattern) is the same for the same array used as a receiver. Harmonic time dependence of the form exp(jωt) at circular frequency ω is assumed and suppressed throughout most of the following discussion.

TWO IN-PHASE POINT SOURCES

Figure 1. Two, in-phase, equistrength point sources.

Consider two, in-phase, equistrength point sources separated by a distance d, and a single point sensor located at observation point P, as shown in Fig. 1. This source configuration is sometimes called a bipole. The sources are located in an infinitely extending isovelocity fluid medium characterized by ambient density ρ0 and sound speed c. The origin of the coordinate system is chosen at the midpoint of the line segment that joins the two sources. The angle θ is measured from the normal to that line segment; alternatively, one could use the complementary angle θc measured from the line segment. The distance from the center point to point P is r. Similarly, the distances from sources 1 and 2 to that same point P are r1 and r2, respectively. By linear superposition, the total acoustic pressure p at point P is simply the sum of the pressures created by each source, or

p(P) = (jρ0ckQ/4πr1) e^(−jkr1) + (jρ0ckQ/4πr2) e^(−jkr2),   (1)

where k is the acoustic wave number (= ω/c or 2π/λ, where λ is the wavelength) and Q is the strength of either source (the volume velocity it generates, which is the integral of the normal component of velocity over the surface of the source; SI units of m³/s). Now, what is really desired is a response function that describes how the pressure varies with the angle θ as a sensor is moved around on a circle of radius r. This requires expressing r1 and r2 as functions of r and θ. Unfortunately, using the law of cosines, these functional relationships are nonlinear:

r1² = r² + (d/2)² − rd cos(π/2 − θ),   (2)
r2² = r² + (d/2)² − rd cos(π/2 + θ).   (3)

However, if one relaxes the demands on this response function to say that interest is only in the directivity pattern observed at distances r that are much greater than the size d of the array (the so-called acoustic far field), useful approximations of Eqs. (2) and (3) can be made:

r1 ≈ r[1 − (d/r) sin θ]^(1/2) ≈ r[1 − (1/2)(d/r) sin θ + ···] ≈ r − (d/2) sin θ,   (4)
r2 ≈ r[1 + (d/r) sin θ]^(1/2) ≈ r[1 + (1/2)(d/r) sin θ + ···] ≈ r + (d/2) sin θ.   (5)

Here, the binomial theorem was used to expand the square root of one plus or minus a small quantity. The term (d/2)² in both Eqs. (2) and (3) was ignored because it has second-order smallness compared to the (rd sin θ) term. This same result can be obtained by arguing that when r is sufficiently large, all three lines r1, r2, and r are essentially parallel and can be related by simple geometrical considerations that immediately yield the results in Eqs. (4) and (5). A quantitative relationship for how large r must be compared to d so that the observation point P is in the far field will be discussed later. Equations (4) and (5) are to be used in Eq. (1), but the quantities r1 and r2 occur in two places, in an amplitude factor such as 1/r1 and in a phase factor such as exp(−jkr1). Another approximation that is made, which is consistent with the fact that r ≫ d, is

1/r1 ≈ 1/r2 ≈ 1/r   (6)

in both amplitude factors. However, in the phase factors, because of the independent parameter k that can be large, the quantity [(kd/2) sin θ] could be a significant number of radians. Therefore, it is necessary to retain the correction terms that relate r1 and r2 to r in the phase factors. Hence, Eq. (1) becomes

p(P) = (jρ0ckQ/4πr) e^(−jkr) [e^(j(kd/2) sin θ) + e^(−j(kd/2) sin θ)]
     = (jρ0ckQ/2πr) e^(−jkr) cos[(kd/2) sin θ].   (7)

Because the source array is axisymmetrical about the line segment that joins the two sources, so is the response function. If we had instead chosen to use the angle θc of Fig. 1, the term sin θ in Eq. (7) would simply become cos θc. The quantity in the brackets which describes the variation of the radiated pressure with angle θ, for a fixed value of r, is called the directivity function. It has been normalized so that its absolute maximum value is unity at θ = 0° or whenever [(kd/2) sin θ] = nπ for integer n. Conversely, the radiated pressure equals zero at any angle such that [(kd/2) sin θ] = (2n − 1)π/2. If kd/2 is small, which is to say the size of the source configuration is small compared to the wavelength, then the directivity function is approximately unity for all θ, or one says that the source is omnidirectional. On the other hand, if kd/2 is large, the directivity pattern, by which is meant a plot of 20 log10 |directivity function| versus angle θ, is characterized by a number of lobes all of which have the same maximum level. These lobes, however, alternate in phase relative to one another.
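To make the behavior of Eq. (7) concrete, the short sketch below evaluates the normalized two-source directivity cos[(kd/2) sin θ] and its level in decibels. This is an illustrative Python example, not part of the original article; it assumes NumPy is available, and the function name is arbitrary. For a closely spaced pair (d/λ = 0.1) the pattern is essentially omnidirectional, whereas a widely spaced pair (d/λ = 2) shows several lobes separated by deep nulls.

import numpy as np

def two_source_pattern_db(d_over_lambda, theta_deg):
    # Normalized in-phase two-source directivity of Eq. (7): cos[(kd/2) sin(theta)],
    # with (kd/2) sin(theta) = pi * (d/lambda) * sin(theta).
    u = np.pi * d_over_lambda * np.sin(np.radians(theta_deg))
    directivity = np.cos(u)
    # Pattern level 20 log10 |D|; the floor avoids log(0) exactly at a null.
    return 20.0 * np.log10(np.maximum(np.abs(directivity), 1e-12))

theta = np.array([0.0, 15.0, 30.0, 45.0, 60.0, 90.0])
print(two_source_pattern_db(0.1, theta))   # close spacing: nearly 0 dB at all angles
print(two_source_pattern_db(2.0, theta))   # wide spacing: multiple lobes and deep nulls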
TWO 180° OUT-OF-PHASE SOURCES

Referring to Fig. 1, imagine that the lower source is 180° out of phase from the upper one. Hence, the pressure field at point P would be the difference of the two radiated pressures rather than the sum, as in Eq. (1). Making the same far-field approximations as before, the total pressure at point P becomes

p(P) = (−ρ0ckQ/2πr) e^(−jkr) sin[(kd/2) sin θ].   (8)

There are two evident differences between this result and that in Eq. (7) to discuss. The first is that now the directivity function is a sine function rather than a cosine, although the argument remains the same. This means that the positions of maxima and nulls of the pattern of the in-phase pair of sources become, instead, the respective positions of the nulls and the maxima of the 180° out-of-phase pair of sources. There is always a null in the plane defined by θ = 0°, the plane which is the perpendicular bisector of line segment d, because any observation point in that plane is equidistant from the two identical but 180° out-of-phase sources whose outputs therefore totally cancel one another. The other difference in Eq. (8) is an additional phase factor of j which says the pressure field for this source configuration is in phase quadrature with that of the two in-phase sources. This factor is not important when considering just this source configuration by itself, but it must be accounted for if the output of this array were, for example, to be combined with that of another array; in that case, their outputs must be combined as complex phasors, not algebraically. As a special case of this source arrangement, imagine that the separation distance d is a small fraction of a wavelength. Hence, for a small value of kd/2, one can approximate the sine function by its argument, thereby obtaining

p(P) = (−ρ0ckQ/2πr) [(kd/2) sin θ] e^(−jkr),   for kd/2 ≪ 1.   (9)

This configuration is known as the acoustic dipole. Its directivity function is simply the term sin θ (or cos θc), independent of frequency. The directivity pattern shows one large lobe centered about θ = 90°, a perfect null everywhere in the plane θ = 0°, as discussed above, and another lobe, identical to the one centered about θ = 90° but differing in phase from it by 180°, and centered about θ = −90°. Because of the shape of this pattern plotted in polar format, it is often referred to as a ''figure eight'' pattern. Because of the multiplicative factor kd/2, which is small, the acoustic output of this source configuration is small compared to that of the in-phase case previously discussed. In fact, if the distance d were to shrink to zero so that the two 180° out-of-phase sources were truly superimposed on each other, the total output of the dipole would be zero.
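As an illustration of how quickly the dipole limit is reached, the sketch below compares the exact out-of-phase directivity of Eq. (8), sin[(kd/2) sin θ], with the small-separation approximation of Eq. (9), (kd/2) sin θ. This is a hypothetical Python/NumPy example, not drawn from the article, and the function names are arbitrary.

import numpy as np

def out_of_phase_exact(kd_half, theta_deg):
    # Angular factor of Eq. (8): sin[(kd/2) sin(theta)]
    return np.sin(kd_half * np.sin(np.radians(theta_deg)))

def dipole_approximation(kd_half, theta_deg):
    # Small kd/2 limit of Eq. (9): (kd/2) sin(theta)
    return kd_half * np.sin(np.radians(theta_deg))

theta = np.linspace(-90.0, 90.0, 181)
for kd_half in (1.0, 0.3, 0.1):
    worst = np.max(np.abs(out_of_phase_exact(kd_half, theta) - dipole_approximation(kd_half, theta)))
    print(f"kd/2 = {kd_half}: worst-case difference = {worst:.4f}")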
FOUR IN-PHASE COLLINEAR SOURCES

Now, consider the source configuration of Fig. 2. This is a symmetrical line array of four, in-phase, point sources, equally spaced by distance d, where the relative strengths are symmetrical with respect to the midpoint of the line array. These relative strength factors, which incorporate many of the physical constants as well as the source strengths Q in either of the terms of Eq. (1), are denoted A1 and A2 here. Hence, by superposing the basic result in Eq. (7), the total pressure field at far-field point P is

p(P) = (2/r) e^(−jkr) [A1 cos u + A2 cos 3u],   (10)

where u = (kd/2) sin θ. The quantity in the brackets is the (unnormalized) directivity function for this line array. The response is maximum in the direction θ = 0°, whereupon all of the cosine factors are unity. Therefore, the normalized directivity factor is obtained by dividing the quantity in the brackets by (A1 + A2).
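A quick numerical check, shown below as a hypothetical Python sketch (not part of the article), confirms that the bracketed term of Eq. (10) is simply the superposition of the four individual far-field phase factors for sources at ±d/2 and ±3d/2 along the array axis.

import numpy as np

def array_factor_closed_form(A1, A2, u):
    # Bracketed directivity term of Eq. (10)
    return A1 * np.cos(u) + A2 * np.cos(3.0 * u)

def array_factor_direct_sum(A1, A2, u):
    # Source n sits at z_n = n*d/2 for n = -3, -1, +1, +3, so its far-field phase is
    # exp(j*k*z_n*sin(theta)) = exp(j*n*u).  The closed form of Eq. (10) carries an
    # overall factor of 2, hence the division below.
    n = np.array([-3, -1, 1, 3])
    strengths = np.array([A2, A1, A1, A2])
    return np.real(np.sum(strengths * np.exp(1j * n * u))) / 2.0

u = 0.7  # an arbitrary value of (kd/2) sin(theta)
print(array_factor_closed_form(1.0, 0.6, u))   # 0.6 is an arbitrary A2/A1 ratio
print(array_factor_direct_sum(1.0, 0.6, u))    # agrees with the closed form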
ACOUSTIC SOURCES OR RECEIVER ARRAYS
and assuming that all sources are in phase with one another and that the relative strengths are symmetrical about the midpoint of the array, one obtains the far-field response function,
P
A2 A1
3
q
p(P) =
A1 d A2
2 −jkr 1 e A0 + A1 cos 2u + A2 cos 4u r 2 + · · · + AN cos 2Nu .
(12)
Figure 2. Symmetrical line array of four point sources.
SYMMETRICAL EVEN ARRAY OF 2N IN-PHASE SOURCES Extending the last result to a line array of 2N sources that are all in phase with one another, where equal spacing d is between any two adjacent sources and where the relative strengths are symmetrical about the midpoint (the centermost pair each have strength A1 , next pair outward from the center each have strength A2 , etc.), one obtains 2 −jkr e [A1 cos u + A2 cos 3u + · · · + AN cos(2N − 1)u]. r (11) The solid curve in Fig. 3 presents the beam pattern for six equistrength sources (i.e., all Ai = 1.0) and for d/λ = 2/3. The plot shows only one-quarter of the possible range of angle θ ; it is symmetrical about both the θ = 0° and the θ = 90° directions. The quantity on the ordinate of the plot, sound pressure level (SPL), equals 20 log |p(θ )/p(0)|. p(P) =
SYMMETRICAL ODD ARRAY OF (2N + 1) IN-PHASE SOURCES

The analysis of an odd array is equally straightforward. Placing the origin of coordinates at the centermost source and assuming that all sources are in phase with one another and that the relative strengths are symmetrical about the midpoint of the array, one obtains the far-field response function

p(P) = (2/r) e^(−jkr) [(1/2)A0 + A1 cos 2u + A2 cos 4u + · · · + AN cos 2Nu].   (12)
0 −5 −10
SPL (dB)
−15 −20 −25 −30 −35 −40 −45 −50
0
10
20
30 40 50 60 Angle q (degrees)
70
80
90
Figure 3. Directivity pattern of symmetrical line array of six equispaced point sources, d/λ = 2/3. Solid curve: A1 = A2 = A3 = 1.0; Dashed curve: A1 = 1.0, A2 = 0.75, A3 = 0.35.
This directivity function differs in two regards from Eq. (11) for an even array. First, because the spacing between any two corresponding sources is an even multiple of distance d, all of the cosine arguments are even multiples of the parameter u and second, because the source at the midpoint of the array does not pair up with any other, its relative strength counts only half as much as the other pairs. Nevertheless, the beam pattern of an odd array is very similar to that of an even array of the same total length.
AMPLITUDE WEIGHTING

By adjusting the relative weights of the sources (a process called shading the array), attributes of the beam pattern such as the width of the main lobe or the levels of the side lobes compared to that of the main lobe can be altered. The dashed curve in Fig. 3 illustrates the effect of decreasing the strengths of the six-source array in some monotonic fashion from the center of the array toward either end (A1 = 1.0, A2 = 0.75, A3 = 0.35). Invariably, such an amplitude weighting scheme, where the sources at the center of the array are emphasized relative to those at the ends, reduces the side lobes and broadens the width of the main lobe compared to uniform strengths along the array. Many procedures have been devised for choosing or computing what the relative weights should be to realize a desirable directivity pattern. Perhaps two of the most famous and most often used are those due to Dolph (2) and to Taylor (3). Dolph, appreciating features of the mathematical function known as the Chebyshev polynomial which would make it a desirable directivity function, developed a procedure by which the weighting coefficients of the directivity function in either Eq. (11) or (12) could be computed to make that directivity function match the Chebyshev polynomial whose order is one less than the number of sources in the array. This technique has become known as Dolph–Chebyshev shading. Its result is that all of the side lobes of the beam pattern have exactly the same level, which can be chosen at will. In the Taylor procedure, only the first few side lobes on either side of the main lobe are so controlled, and the remainder of them simply decrease in amplitude with increasing angle via natural laws of diffraction theory, as illustrated by the unshaded case (solid curve) in Fig. 3.
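The effect of shading can be illustrated numerically. The Python sketch below (an illustration, not part of the original article) evaluates Eq. (11) for the six-source array of Fig. 3 with d/λ = 2/3, once with uniform weights and once with the tapered weights quoted above, and reports the peak side-lobe level of each pattern; the side-lobe search logic is an assumption of the sketch, not of the article.

# Sketch: SPL beam pattern of a symmetrical even array from Eq. (11),
# uniform versus tapered (shaded) weights, d/lambda = 2/3 as in Fig. 3.
import numpy as np

def pattern_db(theta_deg, weights, d_over_lambda):
    """SPL = 20 log10 |R(theta)/R(0)| with R from Eq. (11) for an even array."""
    u = np.pi * d_over_lambda * np.sin(np.radians(theta_deg))
    orders = 2 * np.arange(1, len(weights) + 1) - 1            # 1, 3, 5, ...
    R = sum(A * np.cos(n * u) for A, n in zip(weights, orders))
    return 20 * np.log10(np.maximum(np.abs(R) / sum(weights), 1e-9))

def peak_sidelobe(spl_db):
    """Highest level beyond the first dip (first null) of the pattern."""
    dips = np.where((spl_db[1:-1] < spl_db[:-2]) & (spl_db[1:-1] <= spl_db[2:]))[0] + 1
    return spl_db[dips[0]:].max()

theta = np.linspace(0.0, 90.0, 9001)
uniform = pattern_db(theta, [1.0, 1.0, 1.0], 2 / 3)
shaded  = pattern_db(theta, [1.0, 0.75, 0.35], 2 / 3)
print("peak side lobe, uniform weights: %.1f dB" % peak_sidelobe(uniform))
print("peak side lobe, shaded weights : %.1f dB" % peak_sidelobe(shaded))

Running the sketch reproduces the qualitative behavior of Fig. 3: the shaded weights trade a broader main lobe for substantially lower side lobes.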
PRODUCT THEOREM
The discussion so far has concerned arrays of point sources. Naturally, a realistic source has a finite size. The effect of the additional directivity associated with the finite size of the sources can be accounted for, in certain situations, by using a result known as the product theorem (4) (sometimes called the first product theorem to distinguish it from another one), which says that the directivity function of an array of N sources of identical size and shape, which are oriented the same, equals the product of the directivity function of one of them times the directivity function of an array of N point sources that have the same center-to-center spacings and the same relative amplitudes and phases as the original sources.

NONSYMMETRICAL AMPLITUDES

The directivity function of the in-phase array that has a symmetrical amplitude distribution, Eq. (11) or (12), predicts a pattern that is symmetrical about the normal to the line that joins the sources, the θ = 0° direction. It is interesting to note that even if the amplitude distribution is nonsymmetrical, as long as the sources are in phase, the pattern is still symmetrical about the θ = 0° direction. This can be seen by considering the simplest case of two point sources again, as in Fig. 1, but assuming that the upper one has strength A1 and the lower one has complex strength A2 exp(jα), where α is some arbitrary phase. In the manner previously discussed, one can show that the magnitude of the unnormalized directivity function R(θ) is

R(θ) = [(A1 + A2)² cos²(u − α/2) + (A1 − A2)² sin²(u − α/2)]^(1/2) = [A1² + A2² + 2A1A2 cos(2u − α)]^(1/2).   (13)

Now, if A1 ≠ A2 but α = 0, this is an even or symmetrical function of argument u and hence of angle θ. However, for α ≠ 0 or nπ, the latter form of Eq. (13) clearly indicates that R(θ) is not symmetrical about u = 0, that is, about θ = 0°. In fact, introducing specific phase shifts into the sources along a line is a method of achieving a tilted or steered beam pattern.

TILTING OR STEERING

Figure 4 shows a line array of four, equistrength, equispaced point sources. The following analysis could be performed for any number of sources and for any strength distribution, but this simple example is very illustrative. The result will then be generalized. We wish to have the direction of maximum response at angle θ0, as measured from the normal direction rather than at θ = 0°, as would be the case if they were all in phase. This can be accomplished if we ‘‘fool’’ the sources into ‘‘believing’’ that they lie at positions indicated by + along the line inclined by that same angle θ0 measured from the x axis, that is, we want to pseudorotate the original line array by the angle θ0. Then the normal direction to that pseudoarray will make an angle θ0 with the normal direction to the actual array. This pseudorotation is accomplished by introducing time delays (or equivalent amounts of phase shift) into the relative outputs of the sources. Specifically, the output of #3 must be time delayed relative to that of #4 corresponding to the shorter propagation distance (d sin θ0) to #3; therefore, that time is τ3 = d sin θ0/c. Because of the even spacing of the sources in this case, delay times τ2 and τ1 for sources #2 and #1 are two and three times time τ3, respectively. The total pressure field at the usual far-field observation point at an arbitrary angle θ becomes (momentarily bringing back the previously suppressed harmonic time factor for illustrative purposes)

p(P) = (A e^(−jkr1)/r1) e^(jω(t−τ1)) + (A e^(−jkr2)/r2) e^(jω(t−τ2)) + (A e^(−jkr3)/r3) e^(jω(t−τ3)) + (A e^(−jkr4)/r4) e^(jωt)
     = (A e^(−jkr)/r) e^(jωt) [e^(j(3kd/2) sin θ) e^(−j3kd sin θ0) + e^(j(kd/2) sin θ) e^(−j2kd sin θ0) + e^(−j(kd/2) sin θ) e^(−jkd sin θ0) + e^(−j(3kd/2) sin θ)].   (14)

Figure 4. Geometry associated with tilting the direction of maximum response of an equispaced line array of four point sources to some angle θ0 measured from the normal to the line array.
If one factors the quantity exp(−j3kd sin θ0/2) out of every term and, by analogy to the previous definition of quantity u, defines u0 = (kd/2) sin θ0, Eq. (14) becomes

p(P) = (2A e^(−jkr)/r) e^(jωt) e^(−j3u0) [cos 3(u − u0) + cos(u − u0)].   (15)

The factor in brackets is the unnormalized directivity function of the tilted array. Because it predicts a maximum value at u = u0, which is to say θ = θ0, the peak response has been steered or tilted in that direction. As opposed to introducing sequential time delays into the various outputs, as has been described, it seems that one could choose sequential amounts of phase shift so that at circular frequency ω, the phase shift for source n is ωτn. Then, Eq. (14) would be unchanged. This is true. However, one must note that if one uses true time delay elements to accomplish the beam tilting, the angle of tilt θ0 will be the same at all frequencies of operation. If, however, one chooses phase shift networks to accomplish the desired tilt
at a certain frequency, the direction of tilt will change as frequency changes, which may not be desirable. Generalizing this result for the array of four sources, from Eq. (11) for an even symmetric array of 2N sources, one obtains the unnormalized directivity function of the tilted array as

R(θ) = A1 cos(u − u0) + A2 cos 3(u − u0) + · · · + AN cos(2N − 1)(u − u0),   (16)

and from Eq. (12) for an odd symmetrical array of (2N + 1) sources,

R(θ) = (1/2)A0 + A1 cos 2(u − u0) + · · · + AN cos 2N(u − u0).   (17)

AMPLITUDE BEAM TILTING

An interesting alternative interpretation of Eqs. (16) and (17) exists (5). In particular, if one were to expand each of the trigonometric functions of a compound argument in Eq. (16) by using standard addition theorems and then regroup the resulting terms, one would obtain

R(θ) = [A′1 cos u + A′2 cos 3u + · · · + A′N cos(2N − 1)u] + [A″1 sin u + A″2 sin 3u + · · · + A″N sin(2N − 1)u],   (18)

where A′1 = A1 cos u0, A′2 = A2 cos 3u0, etc., and A″1 = A1 sin u0, A″2 = A2 sin 3u0, etc. Because u0 is a constant, these primed and double-primed coefficients are simply two new sets of weighting coefficients that are functions of the original unprimed coefficients and of the angle θ0 to which one wishes to tilt. The two quantities in brackets in Eq. (18) can be readily interpreted. The first is the directivity function of a symmetrical line array of the same size as the actual array, except that it must have weighting coefficients as prescribed by the primed values. The second term in brackets, with reference to Eq. (8), represents the directivity function of a line array of the same size as the actual array, except that it must have an antisymmetrical set of weighting coefficients, as prescribed by the double-primed values; that means that a source on one side of the middle of the array must be 180° out of phase from its corresponding source on the other side of the middle. Equation (18) suggests that if one simultaneously excites the array by using the two sets of excitation signals so as to create both the phase symmetrical output (first term in brackets) and the phase antisymmetrical output (second term in brackets) and allows the two outputs to combine, a tilted beam will result, and there is no reference to employing time delays or phase shifts to accomplish the tilt. Although this is true, it must be noted that the two outputs must truly be added algebraically, as indicated in Eq. (18). Further reference to Eq. (8), as contrasted to Eq. (7), recalls the fact that the output of an antisymmetrical array is inherently in phase quadrature with that of a symmetrical array. This phase quadrature effect must be removed to combine the two
outputs algebraically, as required. Hence, for the concept discussed previously to work, it is necessary to shift one of the two sets of excitation signals by 90°, but that is all that is required: one 90° phase-shift network for the whole array, regardless of how many sources there are. The concept works just as well for the odd array described by Eq. (17), except that the centermost source does not participate in the antisymmetrical output. By simply reversing the polarity of one of the two outputs, that is, subtracting the two terms of Eq. (18) rather than adding them, one instead realizes a beam tilted at angle −θ0. This technique, akin to the use of phase-shift networks to produce tilted beams, has the feature that because the primed and double-primed weighting coefficients are frequency-dependent, the angle of tilt will change with frequency for a fixed set of weighting coefficients. Because different angles of tilt can be realized at a given frequency by changing only the weighting coefficients, this technique is called amplitude beam tilting.

GRATING LOBES

Although Eq. (11) (for an even array) and Eq. (12) (for an odd array) both predict that symmetrical, evenly spaced, in-phase arrays have a maximum response at u = 0 (θ = 0°), it is also evident that the absolute value of either of those directivity functions has exactly the same value as it has at u = 0 if the argument u is any integral multiple of π, that is, if

sin θ = nλ/d   (19)
for integer n. This repeat of the main lobe is called a grating lobe, borrowing a term from optics where the effect is common. Equation (19) predicts that this effect cannot occur unless the source spacing d is equal to or greater than the wavelength. Hence, one can usually rule out the existence of problems associated with the occurrence of any grating lobes in the beam pattern by designing the source spacing at less than λ at the highest operating frequency. Because the directivity function of an even array involves cosines of odd multiples of the parameter u, it is noted that the successive grating lobes of an even array alternate in phase whereas those of an odd array are always in phase with one another because that directivity function involves cosines of even multiples of u.
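The steering result and the grating-lobe condition can both be checked numerically. The following Python sketch (illustrative only; the spacing d/λ = 0.5 and tilt angle of 30° are assumptions of the sketch) evaluates Eq. (16) for six equistrength sources and applies the test implied by Eq. (19).

# Sketch: steered beam pattern from Eq. (16) for an even array of 2N = 6
# equistrength sources, plus the grating-lobe condition of Eq. (19).
import numpy as np

def steered_pattern(theta_deg, weights, d_over_lambda, theta0_deg):
    """|R(theta)| from Eq. (16), normalized to its own maximum."""
    u  = np.pi * d_over_lambda * np.sin(np.radians(theta_deg))
    u0 = np.pi * d_over_lambda * np.sin(np.radians(theta0_deg))
    orders = 2 * np.arange(1, len(weights) + 1) - 1       # 1, 3, 5, ...
    R = np.abs(sum(A * np.cos(n * (u - u0)) for A, n in zip(weights, orders)))
    return R / R.max()

d_over_lambda = 0.5          # assumed spacing; less than one wavelength
theta = np.linspace(-90.0, 90.0, 18001)
R = steered_pattern(theta, [1.0, 1.0, 1.0], d_over_lambda, theta0_deg=30.0)
print("peak response at %.2f deg (tilt requested: 30 deg)" % theta[np.argmax(R)])

# Eq. (19): a grating lobe of the untilted pattern requires sin(theta) = n*lambda/d,
# which has a real solution only when d >= lambda:
print("grating lobes possible for the untilted array?", d_over_lambda >= 1.0)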
U-SPACE REPRESENTATION

If the directivity functions given in either Eq. (11) or (12) were simply plotted (on a linear scale) as a function of the argument u, that finite Fourier series might yield a curve such as shown in Fig. 5. The peak that is centered at u = 0 is the main lobe. This is surrounded by a set of secondary maxima and minima called side lobes (because a beam pattern is a plot of the absolute value of the directivity function, the relative minima become, in fact, relative maxima) and then, when u = ±π, there is a repeat of the main lobe, which is a grating lobe; if u is increased further, one sees another pair of grating lobes at u = ±2π, etc. This plot happens to be for an even number of sources, in fact, six sources. It can be shown that the number of relative maxima and minima between adjacent grating lobes or between the main lobe and either of its adjacent grating lobes equals the number of sources. Now, in practice, there is a maximum value of u, and it occurs at θ = 90°. Therefore, this maximum value is denoted u90, and its position on the abscissa of the plot in Fig. 5 is so indicated; it is assumed that d/λ is <1 and consequently u90 < π. Hence, in this case, one does not ‘‘see’’ the grating lobe. The actual aperture window is too narrow to admit the grating lobe. Also sketched on this plot is a curve that might be the directivity function associated with the finite size of any one of the sources (the dashed curve labeled ‘‘Individual transducer response function’’). According to the product theorem, the directivity pattern of this array is the product of these two curves. Note that the slight decrease in the latter curve, as u (or θ) increases, would have the effect of decreasing the heights of the side lobes as θ increases, which might be considered a desirable effect.

Figure 5. u-space representation of the directivity function of a symmetrical line array of six equispaced point sources.

Turning to tilted or steered beams, Eqs. (16) or (17) indicate that either directivity function is simply translated or shifted by amount u0 in this u space. Figure 6 shows the same curve from Fig. 5 shifted to the left (i.e., u0 is assumed to be negative). The actual aperture still extends from −u90 to +u90, as indicated, and the directivity function of the individual nonzero sized sources is exactly as before. But now we see that, as the main lobe is tilted to the left (negative θ in this case), the first grating lobe
on the right, which was not previously within the aperture window, has in fact been shifted almost entirely into the window, which is undesirable. This effect can be controlled only by reducing the d/λ value; the greater the angle θ0 to which one wishes to tilt, the smaller d/λ must be. Furthermore, upon multiplying the solid and dashed curves together, as mandated by the product theorem, one will be reducing the main lobe height relative to the side lobes, another undesirable effect. Finally, although the main lobe has a constant width in u space as the beam is tilted, that constant u value converts to an increasing difference in degrees as the tilt angle increases, because of the nonlinear relationship between u and θ, that is, the beam width of the main lobe increases as the main lobe is tilted. In summary, three potentially undesirable effects are associated with tilting beams: (1) the grating lobe situation is potentially more troublesome, (2) the side lobes are higher relative to the main lobe than those of the same untilted beam, and (3) the beam width increases with the tilt angle.

Figure 6. u-space representation of the directivity function of a tilted or steered symmetrical line array of six equispaced point sources.

CONTINUOUS LINE SOURCE

The source distributions considered so far have been discrete arrays of point sources or perhaps of finite size, but the size is less than the spacing so that there are gaps between the sources. If one imagines that those gaps shrink to zero, so that we have a continuum, we can still obtain the radiated pressure field by mathematically discretizing the continuum into many small elements and
adding up, that is, integrating, their individual outputs. For the continuous line source of length L shown in Fig. 7, the total pressure at far-field point P is

p(P) = (jρo ck/4πr) e^(−jkr) ∫_{−L/2}^{L/2} Qs(ζ) e^(jkζ sin θ) dζ,   (20)

where Qs(ζ) is the strength per unit length, which may be a function of position along the line, and ζ is a position variable along the line from the center point.

Figure 7. Geometry of a continuous line source of length L.

Two of the simplest source distribution functions considered are a uniform strength line, Qs(ζ) = a constant, and a linear taper function which varies from a maximum at the center to zero at either end, Qs(ζ) = (1 − 2|ζ|/L). For the former, the normalized directivity function is the sinc function

R(θ) = sin[(kL/2) sin θ]/[(kL/2) sin θ] = sinc[(kL/2) sin θ],   (21)

and for the latter,

R(θ) = sin²[(kL/4) sin θ]/[(kL/4) sin θ]² = sinc²[(kL/4) sin θ].   (22)

These two functions are plotted in Fig. 8 for L/λ = 5.0. One notes that the linear taper strength function results in lower side lobes and a broader main beam, changes that were noted earlier for the shaded discrete line array. It can be shown that the first side lobe for the uniform strength case is about 13.3 dB below the peak, whereas for the linear taper case, the first side lobe is exactly a factor of 2 smaller on the decibel scale (≈−26.6 dB) because it is essentially the square of the previous result. Note that for a continuous source distribution, there is never a grating lobe.

Figure 8. Normalized response function of a continuous line source. Solid curve: uniform strength; dashed curve: linear strength variation from center to either end.

Equation (20) indicates that the pattern function p(P) is the Fourier transform of the strength distribution Qs(ζ). This suggests that the strength distribution function should then be the inverse Fourier transform of the pattern function. This idea can be exploited to obtain the strength distribution needed to generate some desirable patterns. A difficulty, however, is that the necessary strength distribution that is computed must exist and be nonzero across a line of infinite length, and as soon as one truncates it, as one must, the resulting pattern is not exactly the desired pattern.
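The two continuous-source patterns are easy to evaluate numerically. The Python sketch below (an illustration, not from the article) computes Eqs. (21) and (22) for L/λ = 5.0 and estimates the first side-lobe level of each; the simple peak-finding logic is an assumption of the sketch.

# Sketch: uniform-strength (Eq. (21)) versus linearly tapered (Eq. (22))
# continuous line source of length L = 5*lambda, and their first side lobes.
import numpy as np

L_over_lambda = 5.0
theta = np.radians(np.linspace(0.0, 90.0, 90001))
x = np.pi * L_over_lambda * np.sin(theta)            # (kL/2) sin(theta)

R_uniform = np.abs(np.sinc(x / np.pi))               # np.sinc(t) = sin(pi t)/(pi t)
R_taper   = np.sinc(x / (2 * np.pi)) ** 2            # sinc^2 of half the argument

def first_sidelobe_db(R):
    i = np.argmax(R < 1e-3)                          # a point near the first null
    j = i + np.argmax(R[i:])                         # highest point beyond it
    return 20 * np.log10(R[j])

print("uniform strength first side lobe: %.1f dB" % first_sidelobe_db(R_uniform))
print("linear taper first side lobe    : %.1f dB" % first_sidelobe_db(R_taper))

The printed levels come out near −13.3 dB and −26.6 dB, consistent with the statement above that the tapered result is essentially the square of the uniform one.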
CONTINUOUS PLANAR SOURCES

The idea of discretizing an array into differentially sized pieces and then adding up the outputs of all such pieces to obtain the response function of a large array can easily be extended to a continuous planar, that is, two-dimensional, array. Figure 9 depicts an arbitrary region of the x, y plane that has a nonzero time harmonic normal velocity u. For computational convenience, we must assume that the entire region of the x, y plane outside of the active area has no velocity, that is, we assume that the piston source lies on an otherwise infinite plane rigid baffle so that radiation from the back side of the piston does not contribute to the acoustic field in the region z > 0. Although an infinite baffle is a mathematical fiction, it is a reasonably accurate model when the planar piston source is mounted in an enclosure whose dimensions are large compared to the wavelength. Because reflections from the baffle combine constructively with the direct radiation from the source, the pressure in the region z > 0 is effectively doubled. Alternatively, one can argue that the presence of the baffle can be simulated by introducing an in-phase image source immediately adjacent to the actual piston source, thereby effectively doubling its strength (6).

Figure 9. Geometry of a two-dimensional radiator lying in the x, y plane; the region of the x, y plane outside the indicated active region has zero velocity.

Focusing on the one indicated differential patch of area dS in Fig. 9, the differential amount of pressure it creates
at the observation point P, based on spherical coordinates (r, θ, φ), is

dp(P) = (jρo ck u dS/2πr) e^(−jkr).   (23)
Then, the total field at point P is obtained by integrating across the active area. This computation can be done, in theory, for an area of any shape or even for disjoint areas, but in practice there are only a few simple geometries for which it is possible to obtain analytic expressions for the radiated pressure and then, usually, only for observation points in the far field. Note that the task of integrating can be further complicated if the velocity distribution u is not constant but varies across the active area.

BAFFLED CIRCULAR PISTON

Perhaps the simplest shaped piston source for which it is possible to obtain an exact answer for the radiated pressure is the circular piston of radius a and uniform velocity u0. This is also a practical case because it serves as a model for most loudspeakers as well as for many sonar and biomedical transducers. The integrations involved are not trivial but are well documented in a number of standard textbooks (7–9). The result is

p(r, θ) = (jρo cka²u0/2r) e^(−jkr) [2J1(ka sin θ)/(ka sin θ)],   (24)

where J1 is the Bessel function of the first kind of order one. The quantity in brackets is the normalized directivity function of the uniform strength, baffled, circular piston. Because of the axisymmetry of the source, the result is independent of the circumferential angle φ. Figure 10 is a plot of this function versus the dimensionless argument ka sin θ. This plot represents one-half of a symmetrical beam pattern, that is, θ varies in only one direction from the z axis of Fig. 9. Depending on the value of ka, only a portion of this curve is applicable; the largest value of the argument is ka, which occurs at θ = 90°. If ka is small, one observes very little decrease in the response at θ = 90°, that is, any source that is small compared to the wavelength radiates omnidirectionally. However, if ka is, say, 20, there will be about five side lobes on each side of the main lobe. Note that the level of the first side lobe on either side is about 17.6 dB lower than that of the main lobe and that subsequent side lobes are even lower. By noting the value of the argument corresponding to the 1/√2 value of the main lobe, one can determine that the total −3-dB beam width (BW) of the main lobe is predicted by the equation

BW = 2 sin⁻¹(0.26λ/a).   (25)

For the same geometry, it is possible to obtain an exact expression for the radiated pressure at any point along the z axis, that is, any point along the normal through the center of the circular piston:

p(θ = 0) = 2jρo cu0 exp[−j(k/2)(√(r² + a²) + r)] sin[(k/2)(√(r² + a²) − r)],   (26)

where r is now distance along the z axis. This expression is valid even for r = 0. If one plots the square of the absolute value of this expression versus r for an assumed value of ka, the curve will be characterized by a sequence of peaks of equal height, determined by the maxima of the sine squared function. There will be one last peak at approximately r = a²/λ. Then, for r greater than this value, because the argument of the sine squared function actually becomes smaller as r increases, one can make a small argument approximation to the sine squared function and to the square root function and show that the square of the radiated on-axis pressure varies as 1/r², or that the pressure varies as 1/r. Hence, we can say that this is the distance to the so-called far field. That distance is variously taken as 2a²/λ or 4a²/λ or sometimes πa²/λ. It is not that the distance is arbitrary, but rather that the exact result asymptotically approaches the 1/r dependence for large r, and therefore, the greater the distance, the more accurate the approximation will be. The value πa²/λ, which can be read as the active area divided by the wavelength, is a convenient metric for other shapes of radiators.
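The far-field distance discussion can be illustrated directly with Eq. (26). The following Python sketch (an illustration only; the radius a = 5λ is an assumption of the sketch) locates the last on-axis maximum and checks that the response falls off approximately as 1/r beyond it.

# Sketch: on-axis pressure of a baffled circular piston, Eq. (26), for an
# assumed radius a = 5*lambda (lambda = 1). Amplitude in units of rho_o*c*u0.
import numpy as np

a = 5.0                                   # piston radius in wavelengths (assumed)
k = 2 * np.pi                             # wavenumber for lambda = 1
r = np.linspace(0.5, 200.0, 400001)

p = 2.0 * np.abs(np.sin(0.5 * k * (np.sqrt(r**2 + a**2) - r)))

# the last on-axis maximum sits near r = a**2/lambda
peaks = np.where((p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:]))[0] + 1
print("last axial maximum at r = %.2f wavelengths (a^2/lambda = %.1f)"
      % (r[peaks[-1]], a**2))

# beyond a few times a**2/lambda the pressure falls off essentially as 1/r,
# so the product p*r is nearly constant there
far = r > 4 * a**2
vals = p[far] * r[far]
print("spread of p*r beyond 4a^2/lambda: %.1f%% of its mean"
      % (100 * np.ptp(vals) / vals.mean()))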
Figure 10. Normalized response function of a baffled circular piston radiator of radius a and uniform normal velocity; argument v = ka sin θ.

BAFFLED RECTANGULAR PISTON

For a rectangular piston of dimensions ℓ (parallel to the x axis of Fig. 9) by w (parallel to the y axis) and uniform velocity, one can readily integrate Eq. (23) across the active area to obtain the far-field radiated pressure. For this case, the normalized directivity function is

R(θ) = [sin ℓ′/ℓ′][sin w′/w′] = Sinc(ℓ′) Sinc(w′),   (27)

where ℓ′ = (kℓ/2) sin θ cos φ and w′ = (kw/2) sin θ sin φ. Because the source distribution is not axisymmetric, the
directivity function depends on both the coordinate angles φ and θ that locate the far-field observation point P. This expression is a special case of a result known as the second product theorem (10), which states that the pattern function of a planar radiator whose velocity distribution can be written as the product of a function of one coordinate variable and another function of the orthogonal coordinate variable, for example, u(x, y) = f(x)g(y), equals the product of the pattern functions of two orthogonal line arrays, one that lies along the x axis and has velocity distribution f(x), and the other that lies along the y axis and has velocity distribution g(y). For a piston of uniform velocity, f(x) = g(y) = 1.0, and therefore the two line array pattern functions, as noted earlier, are sinc functions.

NONPLANAR RADIATORS ON IMPEDANCE BOUNDARIES

The problem of computing the radiation from source distributions that lie on nonplanar surfaces is considerably more complicated and, in fact, analytically tractable only for a few simple surfaces such as spheres and cylinders. These results are well documented in some standard textbooks (11,12). For sources on an arbitrary surface, one must resort to numerical procedures, particularly an implementation of the Helmholtz integral equation such as the boundary value program CHIEF described in (13,14). Furthermore, if the supporting surface is an elastic body or a surface characterized by some finite locally reacting impedance, a finite-element model of the boundary reaction coupled with a boundary value program such as CHIEF must be used to compute the radiated sound field.

ABBREVIATIONS AND ACRONYMS

SI  System International
m   meters
s   seconds
dB  decibel
BW  beamwidth

BIBLIOGRAPHY

1. L. E. Kinsler, A. R. Frey, A. B. Coppens, and J. V. Sanders, Fundamentals of Acoustics, 4th ed., J. Wiley, NY, 2000, pp. 193–195.
2. C. L. Dolph, Proc. IRE 34, 335–348 (1946).
3. T. T. Taylor, IRE Trans. Antennas Propagation AP-7, 16–28 (1955).
4. L. E. Kinsler et al., Op. Cit., p. 199.
5. W. J. Hughes and W. Thompson Jr., J. Acoust. Soc. Am. 5, 1040–1045 (1976).
6. L. E. Kinsler et al., Op. Cit., pp. 163–166.
7. L. E. Kinsler et al., Op. Cit., pp. 181–184.
8. D. T. Blackstock, Fundamentals of Physical Acoustics, Wiley, NY, 2000, pp. 445–451.
9. A. D. Pierce, Acoustics: An Introduction to its Physical Principles and Applications, McGraw Hill, NY, 1981, pp. 225–227.
10. V. M. Albers, Underwater Acoustics Handbook II, The Pennsylvania State University Press, University Park, PA, 1965, p. 188.
11. P. M. Morse and K. U. Ingard, Theoretical Acoustics, McGraw Hill, NY, 1968, pp. 332–366.
12. M. C. Junger and D. Feit, Sound, Structures, and Their Interaction, 2nd ed., The MIT Press, Cambridge, MA, 1986, pp. 151–193.
13. H. Schenck, J. Acoust. Soc. Am. 44, 41–58 (1968).
14. G. W. Benthien, D. Barach, and D. Gillette, CHIEF Users Manual, Naval Ocean Systems Center Tech. Doc. 970, Sept. 1988.

ANALOG AND DIGITAL SQUID SENSORS

MASOUD RADPARVAR
HYPRES, Inc.
Elmsford, NY

INTRODUCTION

Superconducting quantum interference devices (SQUIDs) are extremely sensitive detectors of magnetic flux that can be used as low-noise amplifiers for various applications such as high resolution magnetometers, susceptometers, neuromagnetometers, motion detectors, ultrasensitive voltmeters, picoammeters, readout of cryogenic detectors, and biomagnetic measurements. A SQUID is a superconducting ring interrupted by one or two Josephson tunnel junctions made of a superconductor such as niobium (Nb) (Fig. 1). Nb is the material of choice for superconducting circuits because it has the highest critical temperature among the elemental superconductors (9.3 K). This alleviates certain concerns for uniformity and repeatability that must be taken into account when using compound superconductors. Josephson tunnel junctions using niobium as electrodes and thermally oxidized aluminum as tunnel barriers have been investigated rather extensively (1) and have been brought to a state of maturity suitable for medium-scale integrated circuits by several laboratories. Figure 2a shows the characteristics of a Josephson tunnel junction. One of the unique features of a Josephson junction is the presence of a tunnel current through the device at zero bias voltage. The magnitude of this current, which is normally referred to as the Josephson current, depends
on a magnetic field. When two Josephson junctions are paralleled by an inductor (Fig. 1), their combined Josephson currents as a function of a field applied to the SQUID loop by either the coupled transformer (X) or feedback (F/B) coil exhibit a characteristic similar to that shown in Fig. 2b, which is normally called an interference pattern. The periodicity of the pattern exhibited in this characteristic is multiples of the flux quantum Φ0 = h/2e = 2.07 × 10−7 gauss-cm² (h = Planck's constant, e = electron charge). This sensitivity of the Josephson current to an external magnetic field is exploited in building ultrasensitive SQUID-based magnetometers. Such a magnetometer can be easily transformed into a picoammeter or a low-noise amplifier for many applications by coupling its transformer directly with the output of a detector. A typical SQUID circuit process uses Josephson junctions 3–10 µm² in area whose critical current density is 100–1000 A/cm². These circuits are fabricated by a multilayer process using all-niobium electrodes and wiring, an aluminum oxide tunnel barrier, Mo or Au resistors, Au metallization, and SiO2 insulating layers. Table 1 lists the layers and their typical thicknesses. The layer line widths are nominally 2.5 µm for a 5-µm pitch. The overall dimensions of a SQUID circuit including its integrated coils are about 1000 × 3000 µm².

Figure 1. Circuit diagram of a dc SQUID. The feedback (F/B) and X coils are coupled to the SQUID loop through multiturn coils whose number of turns is determined by the required sensitivity.

Figure 2. (a) High-quality Josephson junctions are routinely fabricated from conventional materials such as niobium. (b) Voltage across a shunted SQUID as a function of an external magnetic field.

ANALOG SQUIDS
The first SQUID-based device was used as a magnetometer in 1970 by Cohen et al. (2) to detect the human magnetocardiogram. Since then, this type of magnetic flux detector has been used as the main tool in biomagnetism. The SQUID is inherently very sensitive. Indeed, the principal technical challenge by researchers has been mainly discrimination against the high level of ambient noise. A SQUID amplifier chip has a SQUID gate coupled to a transformer (Fig. 3a). The transformer is, in turn, coupled to a matched input coil (pickup loop). The SQUID and its associated superconducting components are maintained at 4.2 K by immersion in a bath of liquid helium in a cryogenic dewar. Superconducting transformers are essential elements of all superconducting magnetometer/amplifier systems. The SQUID and its transformer are usually surrounded by a superconducting shield to isolate them from ambient fields. The transformer acts as a flux concentrator. In such an input circuit (the transformer + input coil), the flux trapped is fixed, and subsequent changes in the applied field are canceled exactly by changes in the screening current around the input circuit. Because no energy dissipates, the
superconducting input circuit is a noiseless flux-to-current transducer. The loop operates at arbitrarily low frequencies, thus making it extremely useful for very low frequency applications. The field imposed on the SQUID loop by the input circuit is transformed into a voltage by the SQUID and is sensed by an electronic circuit outside the dewar. Early systems used RF SQUIDs because they were easy to fabricate. The prefix RF refers to the type of bias current applied to the SQUID. An RF SQUID is a single junction in a superconducting loop. Magnetic flux is generally inductively coupled into the SQUID via an input coil. The SQUID is then coupled to a high-quality resonant circuit to read out the current changes in the SQUID loop. This tuned circuitry is driven by a constant-current RF oscillator. RF SQUID-based circuits were slowly phased out because of their lower energy sensitivity (5 × 10−29 J/Hz at a pump frequency of 20 MHz) compared with dc SQUIDs.

The dc SQUID differs from the RF SQUID in the number of junctions and the bias condition. The dc SQUID magnetometer, which was first proposed by Clarke (3), consists of the dc SQUID exhibited in Fig. 1, where the Josephson junctions are resistively shunted to remove hysteresis in the current–voltage characteristics and the transformer is a multiturn coil tightly coupled to the SQUID loop. When the dc SQUID is biased at a constant current, its response to an external magnetic field is a periodic dependence of output voltage on input magnetic flux. Figure 2b shows such a characteristic for a Nb/AlOx/Nb SQUID. The SQUID's sensitivity to an external field is improved significantly by coupling the device to the multiturn superconducting flux transformer. Figure 3b shows a photograph of an actual SQUID chip and its integrated transformer. It is estimated that the minimum energy sensitivity for dc SQUIDs is on the order of h (Planck's constant, 6.6 × 10−34 J/Hz). dc SQUIDs that operate at 5h have been fabricated and evaluated in the laboratory (4). The performance of these Josephson tunnel junction SQUIDs is actually limited by the Johnson noise generated in the resistive shunts used to eliminate hysteresis in the current–voltage characteristics.

Table 1. A Typical Standard Niobium Process for Fabricating SQUID Magnetometers

Layer  Material         Thickness (nm)  Purpose
1      Niobium          100             Ground plane
2      Silicon dioxide  150             Insulator
3      Niobium          135             Base electrode
       Aluminum oxide   ∼15             Tunnel barrier
4      Niobium          100             Counterelectrode
5      Molybdenum       100             1 Ω/square resistor
6      Silicon dioxide  200             Insulator
7      Niobium          300             First wiring
8      Silicon dioxide  500             Insulator
9      Niobium          500             Second wiring
10     Gold             600–1,000       Pad metallization

Figure 3. (a) Layout schematic of a dc SQUID. The transformer is coupled to the SQUID via the hole in the plane of the SQUID. The shunting resistors across Josephson junctions are not shown in this diagram. (b) Photograph of a dc SQUID chip.
Figure 4. A dc SQUID chip and peripheral electronics. A feedback circuit is needed to linearize SQUID characteristics. The dotted line shows the components normally at cryogenic temperature. The output resistor can be on- or off-chip.

To use a SQUID as an amplifier, its periodic transfer characteristic should be linearized. This linearization also substantially increases the dynamic range of the SQUID circuit. Figure 4 shows a dc SQUID system and its peripheral electronics. The function of the feedback coil is to produce a field that is equal but opposite to the applied field. The circuit uses a phase-lock loop (lock-in amplifier) and a reference signal to facilitate a narrowband, lock-in type of measurement. The output is proportional to the feedback current, hence, to the amount of flux required to cancel the applied field, and is independent of the voltage–flux (transfer) characteristic shown in Fig. 2b. Thus, the SQUID circuit that has a feedback coil, lock-in
amplifier, oscillator, dc amplifier, and on-chip transformer coil serves as a null detector. The feedback electronics also determine the dynamic range (ratio of the maximum detectable signal and minimum sensitivity) and the slew rate (the maximum allowable rate of change of flux coupled to the SQUID loop with time) of the system. The maximum measurable signal is determined by the maximum feedback current available from the peripheral electronics. The slew rate is determined by the speed by which the feedback current can null out the applied field, which is dictated by the speed of the peripheral electronics, as well as the physical distance between the electronics and the SQUID chip. To simplify the room temperature electronics and eliminate the bulky transformer, which limits the system bandwidth, Drung et al. (5) developed and demonstrated a dc SQUID magnetometer that has additional positive feedback (APF). In this case, an analog SQUID is operated without impedance matching and without flux modulation techniques. For a minimum preamplifier noise contribution, the voltage–flux characteristic of a dc SQUID (Fig. 2b) should be as steep as possible at the SQUID's operating point. To steepen the voltage–flux characteristic at the operating point, Drung's novel circuit directly couples the SQUID voltage via a series resistor to the SQUID loop. This current provides additional positive feedback and steepens the voltage–flux characteristic at the operating point at the expense of flattening the curve that has an opposite slope. For a properly designed SQUID that has adequate gain, such a technique allows reading out the voltage across the dc SQUID without the need for the step-up transformer. Alternatively, the peripheral circuitry can be either integrated on-chip with the SQUID gate, as in a digital SQUID, or can be minimized by exploiting the dc SQUID array technique. The digital SQUID is suitable for multichannel systems, where it virtually eliminates the need for all of the sophisticated room temperature electronics and reduces the number of leads to a multichannel chip to less than 20 wires. This method reduces system cost and heat leak to the chip significantly at the expense of a very complex superconducting chip. On the other hand, the dc SQUID array chip relies on relatively simple circuitry and is ideal for systems where the number of channels is limited and the heat load to the chip is manageable. In this case, an analog system becomes more attractive than a system based on digital SQUID technology.

DC SQUID ARRAY AMPLIFIERS

A dc SQUID array amplifier chip consists of an input SQUID that is magnetically coupled to an array of 100–200 SQUIDs (Fig. 5) (6). An input signal couples flux into the input SQUID, which is voltage-biased to a very small resistor, typically 0.05 Ω, so that the SQUID current is modulated by variations in the applied flux. The flux modulation coil of the output SQUID is connected in series with the input SQUID, so that variation in the input SQUID current changes the flux applied to the output
array. The series array is biased at a constant current, so the output voltage is modulated by this applied flux. The external field sensed by the high-sensitivity analog SQUID is converted to a current that is applied, through the series array of inductors, to the series dc SQUIDs (Fig. 5). These dc SQUIDs have voltage–flux characteristics similar to those of Fig. 2b, but about 100 times larger in amplitude due to the summation of the voltages in the output series SQUID array. Because the output current from the front end analog SQUID is relatively large, we do not need high-sensitivity dc SQUIDs for the dc SQUID array. This significantly simplifies the SQUID layout density and fabrication. The series array of dc SQUIDs can generate a dc voltage on the order of nanovolts, which can be used directly by room temperature electronics and feedback circuitry to read out and provide feedback to the high-sensitivity input analog SQUID circuit. Because the output voltage of the SQUID array is very large, no special dc amplifiers or step-up transformers are required.

Figure 5. Circuit schematic of the two-stage dc array amplifier. The analog SQUID is coupled to an array of dc SQUIDs through coupling series inductors. The room temperature amplifier can be directly connected to the array. The signal coil is connected to the input transformer, and the feedback coil can be connected to a simple integrator attached to the output of the room temperature amplifier.

The input SQUID stage normally consists of a low-inductance, double-loop SQUID that has a matched input flux transformer (Fig. 6). The two SQUID loops are connected in parallel, and the junctions are located between them. Therefore, the SQUID inductance is half of the individual loop inductance. Two multiturn modulation coils are wound (in opposite directions) on the SQUID loops, one on each side, and connected in series. This double coil is connected to a load or an input coil. An additional coil, as feedback, can be integrated to permit operation in a flux-locked loop.

Figure 6. A dc SQUID that has two SQUID loops minimizes the effect of an external field.

DIGITAL SQUID SENSORS

The high resolution data obtained from SQUID magnetometers in biomedical applications such as magnetoencephalograms show remarkable confirmation of the power of magnetic imaging. To localize the source of the signal, which could be a single-shot event, it is important
to have a large number of simultaneous recording channels. However, as the number of channels increases, the wiring requirements (four wires per analog SQUID), the physical size of available dc SQUIDs and SQUID electronics, and the cost become prohibitive. For example, a 128-pixel system, using conventional bias and readout circuitry, requires at least 512 wires attached to the superconducting chip and needs a set of 128 channels of sophisticated electronics. A bundle of wires containing 512 connections between superconducting chips held at 4.2 K and room temperature represents a 4-W heat leak, or consumption of about 3 liters of liquid helium per hour. Consequently, one must turn to multiplexed digital SQUIDs for such imaging applications.

The concept of digital SQUID magnetometry is rather new and, thus, deserves more attention. A digital SQUID integrates feedback electronics into the SQUID circuit and eliminates the need for modulation and a phase-sensitive detector. A multiplexed digital SQUID requires less than 20 connections to a chip and minimal support electronics by introducing a very sophisticated superconducting chip. Consequently, heat load from the leads to the chip is substantially reduced to a few hundred milliwatts or less due to the reduction in the number of connections. A complete digital SQUID amplifier consists of a high-sensitivity analog SQUID coupled to a comparator and integrated with a feedback coil. Figure 7 shows the block diagram of a digital SQUID chip.

Figure 7. Block diagram of a digital SQUID chip.

In this case, the pickup coil is in series with the feedback circuit and is integrated with the analog SQUID front end. The write gates induce currents in the feedback loop for positive and negative
applied magnetic fields to maintain the circulating current in the feedback coil near zero. Due to flux quantization, the induced currents create magnetic fields in the feedback loop in multiples of flux quanta. A quantum of flux is normally called a fluxon or antifluxon, depending on the direction of its magnetic field. The distinct advantage of the digital SQUID amplifier shown in Fig. 7 is its ultrawide dynamic range that is obtained by using the series connection of the single-flux-quantum feedback device with the SQUID pickup loop. The circuit consists of a comparator SQUID coupled to the output of an analog SQUID preamplifier. The output of the comparator controls the inputs of the two separate SQUID write gates (one of them through a buffer circuit). As can be seen from Fig. 7, the outputs of the write gates are connected through a superconducting inductive path consisting of the pickup coil and the flux transformer of the analog SQUID preamplifier. As a consequence of this series connection, the feedback loop always operates so that net circulating current vanishes, resulting in a practically unlimited dynamic range. In addition, due to its high dynamic range, such a digital SQUID can be used in a relatively high magnetic field without requiring extensive magnetic shielding. This novel architecture of combining two write gates in series with the applied signal gives this digital magnetometer circuit its high dynamic range. The operation of the single-chip magnetometer is as follows. In the absence of an external field, the comparator switches to a voltage state that causes its corresponding write gate to induce a flux quantum into the feedback loop and prohibits the buffer gate from switching to a voltage state. This flux creates a circulating current in the loop that is amplified by the analog SQUID and is applied back to the comparator SQUID. In the following clock cycle, this magnetic field keeps the comparator from switching and causes the buffer circuit to switch; hence, its corresponding write gate induces an antifluxon in the loop that annihilates the original fluxon. As long as there is no applied magnetic field, this process of fluxon/antifluxon creation/annihilation continues and represents the steady-state operation of the digital SQUID circuit, as shown in Fig. 8.
Figure 8. Steady-state outputs of a digital SQUID circuit in the absence of an external field. The traces from top to bottom are clock 1, clock 2, input, and comparator and buffer outputs. As expected, the outputs alternately induce fluxons and antifluxons into the loop.
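The fluxon/antifluxon feedback described above behaves much like a clocked, single-bit tracking loop. The Python sketch below is an idealized behavioral model of that loop, not a circuit-level description; the clock rate, signal amplitude, and one-flux-quantum feedback step are illustrative assumptions.

# Sketch: idealized behavioral model of the digital SQUID feedback loop.
# Each clock cycle, the comparator fires one of two write gates, injecting a
# fluxon (+1) or an antifluxon (-1) so that the loop tracks the applied flux.
import numpy as np

f_clock = 20e6                        # assumed clock frequency, Hz
n_cycles = 100000
t = np.arange(n_cycles) / f_clock

lsb = 1.0                             # feedback step, in units of one flux quantum
applied = 300.0 * np.sin(2 * np.pi * 1e3 * t)     # applied flux, same units

feedback = 0.0
counts = np.empty(n_cycles)           # running (fluxons - antifluxons) count
for i in range(n_cycles):
    if applied[i] - feedback > 0.0:
        feedback += lsb               # write gate 1: inject a fluxon
    else:
        feedback -= lsb               # write gate 2 (via the buffer): antifluxon
    counts[i] = feedback

print("worst-case tracking error: %.2f flux quanta" % np.max(np.abs(counts - applied)))

For these parameters the printed tracking error stays on the order of one least significant bit, because the applied flux changes by much less than one step per clock cycle, which is the slew-rate condition discussed in the text.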
In the presence of an applied magnetic field, the comparator causes its corresponding write gate to generate pulses into the feedback loop to cancel the applied magnetic field. The injection of fluxons continues in each clock period, as long as the gate current of the comparator is less than its threshold current. Using proper polarity, the single-flux-quantum (SFQ) induced current in the superconducting feedback loop can eventually cancel the applied current and restore the comparator SQUID close to its original state. When the current in the feedback loop is close to zero, both write gates alternately emit fluxons and antifluxons into the loop in each clock period and keep the feedback current close to zero. The difference between the number of pulses is a measure of the applied signal. Figure 9 shows a photograph of a successful high-sensitivity digital SQUID chip. In an optimized digital SQUID circuit, the least significant bit (LSB) of the output must be equal to the flux noise of the front-end analog SQUID (SB^(1/2)). If the LSB is much less than SB^(1/2), then the circuit does not work properly, that is, the comparator output randomly fluctuates between 0 and 1. If the LSB is much larger than SB^(1/2), then the sensitivity of the complete digital SQUID is compromised. In the latter case, however, such a low-sensitivity digital SQUID can be produced and will operate properly, albeit with nonoptimal sensitivity. The digital SQUID chip slew rate (a measure of how fast the feedback current can null out the applied signal) is proportional to the clock frequency. The system must track the signal as well as the noise (at a slew rate of a few µT/s) to be able to operate in an unshielded environment. This requirement sets the minimum clock frequency required to accommodate the slew rate associated with the external noise at typically a few tens of MHz. The sensitivity of the single-chip digital SQUID magnetometer is limited by the sensitivity of its analog SQUID front end. Consequently, if an analog SQUID that
Figure 9. Photograph of a high-sensitivity digital SQUID chip magnetometer. The loop inductance of the SQUID comparator is made of eight washers in parallel to facilitate coupling it to the large coupling transformer. The analog SQUID is coupled to the feedback coils through the two washers under these coils.
Figure 10. Block diagram of the digital SQUID peripheral electronics.
has ultimate (quantum) sensitivity is used as a front-end sensor for a digital SQUID, the complete sensor chip also possesses the same quantum efficiency. The easiest method of reconstructing the signal applied to the input of a digital SQUID is to integrate the output of the digital SQUID in real time. This method, however, does not take advantage of the digital nature of the output signal. Figure 10 shows a block diagram of the complete digital peripheral electronics for digital SQUID magnetometers. The output of the digital SQUID, which is typically 1 mV, is amplified 100 times by a preamplifier (not shown) that is attached to the digital SQUID output. Then, this amplified signal is fed to a gate that either generates a pulse at the positive output when there is an input pulse, or generates a pulse at the negative terminal
when there is no input pulse. The counter simply keeps track of the difference between the number of positive and negative pulses. The counter output is loaded into the accumulator after each read cycle. Between read cycles, the counter's value is added to the accumulator after each clock cycle. This decimation technique is actually the digital equivalent of a simple analog low-pass filter. Any data acquisition program, such as LabView, can postprocess the digital signal.

SUMMARY

Today, analog dc SQUIDs are the most widely produced superconducting magnetometers for a variety of applications such as biomagnetism, rock magnetometry, and nondestructive evaluation. The SQUID array amplifier has led to significant improvement in the performance and cost of readout circuits for cryogenic detectors used in high-energy particle physics. The main features of the array amplifier are the wide bandwidth and low cost. The most intriguing application of the digital SQUID-based system is in the biomedical field, where multiple SQUID
channels are required for magnetic imaging. However, significant engineering work remains to be done before digital SQUIDs can be marketed.
BIBLIOGRAPHY

1. M. Gurvitch, M. A. Washington, H. A. Huggins, and J. M. Rowell, IEEE Trans. Magn. MAG-19, 791 (1983).
2. D. Cohen, E. Edelsack, and J. E. Zimmerman, Appl. Phys. Lett. 16, 278 (1970).
3. J. Clarke, in B. B. Schwartz and S. Foner, eds., Superconducting Applications: SQUID and Machines, Plenum Press, NY, 1977, p. 67.
4. M. B. Ketchen and J. M. Jaycox, Appl. Phys. Lett. 40, 736 (1982).
5. D. Drung et al., Appl. Phys. Lett. 57, 406 (1990).
6. R. P. Welty and J. M. Martinis, IEEE Trans. Magn. 27, 2924 (1991).
7. N. Fujimaki et al., IEEE Trans. Electron Dev. 35, 2414 (1988).
8. M. Radparvar, IEEE Trans. Appl. Superconductivity 4, 87 (1994).
C

CAPACITIVE PROBE MICROSCOPY

JOSEPH J. KOPANSKI
National Institute of Standards and Technology
Gaithersburg, MD

INTRODUCTION

A scanning capacitance microscope (SCM)1 combines a differential capacitance measurement with an atomic force microscope (AFM). The AFM controls the position and contact force of a scanning probe tip, and a sensor simultaneously measures the capacitance between that tip and the sample under test. The capacitance sensor can detect very small (≈10−21 F) changes in capacitance. The sensor is electrically connected to an AFM cantilevered tip that has been coated with metal to make it electrically conductive. In the most common mode of operation, an ac voltage (around 1 V peak-to-peak at 10 kHz) is used to induce and modulate a depletion region in a semiconductor sample. The varying depletion region results in a varying, or differential, SCM tip-to-sample capacitance. The capacitance sensor detects the varying capacitance and produces a proportional output voltage. The magnitude and phase of the sensor output voltage are measured by a lock-in amplifier that is referenced to the modulation frequency. The signal measured by the SCM is the output voltage of the lock-in amplifier, which is proportional to the induced tip-to-sample differential capacitance. As the SCM/AFM tip is scanned across the sample surface, simultaneous images of topography and differential capacitance are obtained.

Currently, the most common application of SCM is imaging the carrier concentration variations (also known as dopant profiles) qualitatively within transistors that are part of silicon integrated circuits (ICs). The ICs are cross-sectioned to expose the structure of individual transistors and other electronic devices. The differential capacitance measured by the SCM is related to the carrier concentration in the silicon. Regions of high carrier concentration, such as the source and drain of a transistor, produce a low differential capacitance signal level, and regions of low carrier concentration, such as the transistor channel, produce a high signal level. Thus, SCM images of transistors contain information about the details of construction of the transistor that is not usually visible in an optical or topographic image. Silicon integrated circuit manufacturers use SCM images of cross-sectioned transistors for process failure analysis. Semiconductors other than silicon, such as InP/InGaAsP buried heterostructure lasers, have also been imaged with SCM. Another application that has encouraged the development of commercial SCMs is the need for quantitative two-dimensional (2-D) carrier profiling of silicon transistors. The International Technology Roadmap for Semiconductors (1) describes the necessity and requirements for 2-D carrier profiling as a metrology to aid in developing future generations of ICs. If accurate 2-D carrier profiles could be measured, they could be used to enhance the predictive capabilities of technology computer-aided design (TCAD). However, for this application, the performance goals for 2-D carrier profiling tools are very challenging. Carrier profiles of the source and drain regions of the transistors need to be known at 5 nm spatial resolution and ±5% accuracy in 2001; the requirements rise to 0.6 nm spatial resolution and ±2% accuracy by 2014. The status of quantitative carrier profiling by SCM will be reviewed later.

This article will review the history, operating principles, and applications of SCM. Special emphasis will be placed on measuring quantitative 2-D carrier profiles in silicon using SCM. This article will also review implementation and applications of the scanning capacitance spectroscopy (SCS) technique, the intermittent contact mode of SCM (or IC-SCM), and two other techniques that can also be classified as scanning capacitance probes: the scanning Kelvin probe microscope (SKPM), when operated at twice its drive frequency, and the scanning microwave microscope (SMWM).
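The measurement chain described above (ac bias, capacitance sensor, lock-in detection referenced to the modulation frequency) can be illustrated with a toy numerical model. The Python sketch below is only a schematic illustration: the linear sensor model, the noise level, and all numerical values are assumptions of the sketch, not properties of any actual SCM instrument.

# Sketch: toy model of the SCM signal chain. An ac bias at f_mod modulates the
# tip-sample capacitance; the sensor is modeled as a linear volts-per-farad
# converter plus noise; a software lock-in referenced to f_mod recovers a value
# proportional to dC/dV. All numbers are illustrative assumptions.
import numpy as np

f_mod = 10e3                          # modulation frequency, Hz
f_samp = 1e6                          # sampling rate of the simulated sensor output
t = np.arange(int(0.1 * f_samp)) / f_samp          # 0.1 s record

dC_dV = 2e-19                         # assumed differential capacitance, F/V
V_ac = 0.5                            # assumed ac bias amplitude, V
sensor_gain = 1e15                    # assumed sensor conversion, V/F

drive = V_ac * np.sin(2 * np.pi * f_mod * t)
v_sensor = sensor_gain * dC_dV * drive + 1e-5 * np.random.randn(t.size)

# lock-in demodulation: multiply by quadrature references and average
ref_i = np.sin(2 * np.pi * f_mod * t)
ref_q = np.cos(2 * np.pi * f_mod * t)
X = 2 * np.mean(v_sensor * ref_i)
Y = 2 * np.mean(v_sensor * ref_q)
amplitude = np.hypot(X, Y)            # proportional to dC/dV

print("recovered dC/dV: %.2e F/V (true value in this model: %.2e F/V)"
      % (amplitude / (sensor_gain * V_ac), dC_dV))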
DEVELOPMENT OF SCM The first capacitance probe microscope predates the AFM. Matey and Blanc described an early version of SCM in 1985 (2,3). This instrument measured capacitance using a sensor removed from an RCA2 SelectaVision VideoDisc System (4,5) and a 2.5 × 5 µm probe that was guided in a groove. The VideoDisc was a competitor to the Video Cassette Recorder (VCR) for home video playback. The VideoDisc retrieved an analog video signal for playback that had been mechanically imprinted onto a capacitance electronic disk (CED). The VideoDisc player featured a capacitance detector that operated at 915 MHz and detected small changes in capacitance by the shift induced in the resonant peak of the response of an inductance–capacitance–resistance (LCR) resonator. The
2 Certain commercial equipment, instruments, or materials are identified in this paper to specify the experimental procedure adequately. Such identification does not imply recommendation or endorsement by NIST, nor does it imply that the materials or equipment used are necessarily the best available for the purpose.
1 The acronym SCM refers to the instrument, scanning capacitance microscope, and the technique, scanning capacitance microscopy.
operating principle of the capacitance detector will be discussed in detail later. The VideoDisc has not endured in the market place, but the remarkable sensitivity of its capacitance detector inspired the SCM. Sensors similar in design are used on the commercial SCMs of today. The next development was a generation of SCMs that were similar in some ways to the scanning tunneling microscope (STM) (6–11). These instruments featured a sharpened metal tip and a piezoelectric scanning system like that used for the STM. Instead of controlling tip height by maintaining constant electron tunneling current, as in the STM, these microscopes maintained a constant capacitance to control tip-to-sample separation. By using the capacitance as the feedback signal, these SCMs measured the topography of insulating surfaces. (STMs can measure only surfaces that are relatively good conductors). Slinkman, Wickramasinghe, and Williams of IBM patented an SCM based on AFM and a RCA VideoDisc sensor in 1991 (12). This instrument was similar in configuration to current SCMs. Rather than using the capacitance signal to control the tip–sample separation, this instrument used the AFM to control tip position, and the capacitance was measured independently of the AFM. Combinations of commercial AFMs and RCA capacitance sensors to form SCMs were demonstrated soon after (13–15). The first turnkey commercial SCM was introduced by Digital Instruments Inc. in 1995. Development of this instrument was partially funded by the industry consortium SEMATECH. Thermomicroscopes Inc. also markets an SCM, and Iders Inc., Manitoba, Canada, markets a capacitance sensor suitable for SCM. Today, numerous groups pursue research into further development of SCM. Much, but not all, of this work is focused on developing SCM as a tool to measure the twodimensional carrier profiles of the source/drain regions of silicon transistors. The following is a thorough, but not exhaustive, survey of recent publications from groups that have active SCM programs. In North America, a variety of university, industrial, and government labs have published SCM related work. Contributions from each of these groups are referenced where relevant throughout this article. Since 1992, the National Institute of Standards and Technology (NIST) has had a SCM program with the goal of making SCM a quantitative 2-D carrier profiling technique (13,15–26). The NIST program has developed models of the SCM measurement based on three-dimensional finite-element solutions of Poisson’s equation (23,24) and Windows-based software for rapid carrier profile extraction from SCM images (21). One of the inventors of the SCM technique, Williams, is now at the University of Utah and continues to lead a large and active SCM group. The University of Utah program also seeks to use SCM to measure quantitative 2-D carrier profiles of silicon transistors (27–37). The industrial consortium, International SEMATECH, has also played a critical role in developing a commercial SCM tool. International SEMATECH established a working group on 2-D dopant profiling that conducts review
meetings on the SCM technique and is conducting round-robin measurements of samples by SCM (38). Universities that publish SCM studies include Stanford University (39,40), the University of California at San Diego (41), and the University of Manitoba (42). Industrial applications of SCM have been reported by scanning probe microscope instrument makers, Digital Instruments Inc. (15,43) and Thermomicroscopes (11,44), and semiconductor manufacturers such as Intel (15,43), AMD (17,18,45), Texas Instruments (46–50), and Lucent Technologies (51,52). In Japan and Korea, several groups are investigating SCM applications. Researchers at Korea's Seoul National University have studied charge trapping and detrapping mechanisms by using SCM in SiO2-on-Si structures (44,53–55). Fuji Photo Film Co. has studied SCM charge injection and detection of bits for ferroelectric/semiconductor memory (56–58). Researchers at Tohoku University, Nissin Electric Co., and Nikon Co. have investigated charge injection into SiO2-on-Si structures (59–62). Multiple groups are active in Europe. Researchers at the Laboratory of Semiconductor Materials, Royal Institute of Technology, Sweden, have specialized in applying SCM to image buried heterostructure (BH) laser structures and other layered structures made from InP/GaInAsP/InP (63–65). Researchers at the University of Hamburg, Germany, have reported manipulating charge in NOS structures as potential mass storage devices using a capacitance sensor of their own design (66,67). In Switzerland, a group at the Swiss Federal Institute of Technology has developed a direct inversion method for extracting carrier profiles from SCM data and simulated the method's sensitivity to tip size (68,69). IMEC, in Belgium, has long been a center of expertise in measuring carrier profiles in silicon. IMEC has published various studies of SCM and scanning spreading resistance microscopy (70–74). OPERATING PRINCIPLES Contrast Mechanism Figure 1 shows a block diagram of a typical scanning capacitance microscope operated in the constant change-in-voltage (ΔV) mode. The SCM can be considered as four components: (1) AFM, (2) conducting tip, (3) capacitance sensor, and (4) signal detection electronics. The AFM controls and measures the x, y, and z positions of the SCM tip. (See the article on AFMs in this encyclopedia for more details.) The conducting tip is similar to ordinary AFM tips, except that it needs to have high electrical conductivity and must have electrical contact with the capacitance sensor and the sample. The radius of the tip determines the ultimate spatial resolution of the SCM image. Commercially available metal-coated silicon tips or tips made completely from very highly doped (and hence low resistivity) silicon have been used. Capacitance Sensor. The capacitance sensor is the key element that enables the scanning capacitance microscope.
Figure 1. Block diagram of the scanning capacitance microscope configured for constant ΔV mode (13). (Copyright American Institute of Physics 1996, used with permission.)
The archetypical SCM uses a capacitance sensor similar to that used in the RCA VideoDisc player (4,5). Commercial SCMs use sensors that are similar in concept, though unique and proprietary in design. The sensor uses an excited LCR resonant circuit to measure capacitance. The capacitance to be measured is incorporated in series as part of the LCR circuit. The amplitude of the oscillations in this resonant circuit provides a measure of the capacitance. Figure 2 shows a simplified schematic of the capacitance sensor circuitry. The circuit consists of three inductively coupled circuits: an ultrahigh frequency (UHF) oscillator, the central LCR resonant circuit, and a simple peak detection circuit. The UHF (ωhf ≈ 915 MHz for the VideoDisc) oscillator is used to excite the LCR resonator at a constant drive frequency. The oscillator circuit contains a mechanically tunable potentiometer and capacitor to allow some tuning of the UHF signal frequency and amplitude. The magnitude of the UHF signal voltage VHF which is applied between the tip and sample is an important variable for SCM image interpretation. The capacitance CTS between the SCM tip and the sample is incorporated into the central LCR circuit. As shown in Fig. 3, an LCR circuit displays a characteristic bell-shaped response versus frequency; the peak response is at the resonant frequency. When CTS changes, the resonant frequency changes, and the amplitude of the oscillations in the resonant circuit changes. The total
capacitance in the sensor LCR circuit includes the tip–sample capacitance, the stray capacitance between the sample and sensor, and a variable tuning capacitance. In an RCA style sensor, a varactor diode is used to provide a voltage-variable capacitance. The varactor capacitance can be used to adjust the total capacitance in the LCR circuit, which in turn changes the circuit’s resonant frequency. In this way, the sensor can be ‘‘tuned’’ to produce a similar response for a range of tip–sample capacitances. A simple video peak detection circuit is used to detect the amplitude of the oscillations in the resonant circuit, thus giving a measure of CTS . The detector circuit produces a dc output voltage that changes in proportion to CTS . Figure 3 shows the effect of introducing a resonant frequency deviation, due to a change in CTS , on the detected sensor output voltage. The rightmost bell-shaped envelope shows the LCR circuit response to changes in drive frequency. The peak response is at the resonant frequency. The leftmost envelope shows the response when the resonant frequency has been altered by 115 kHz due to a change in CTS . At the drive frequency, this causes the amplitude of the oscillations in the resonant circuit to drop by 10 mV. In this example, this change will show up in the detector output voltage by a drop in 10 mV from an unperturbed output voltage of around 2 V. Summary of MOS Device Physics. To generate image contrast, the SCM exploits the voltage-dependent
Figure 2. Capacitance detection circuit used in the RCA VideoDisc sensor and later modified for use in scanning capacitance microscopy (4). (Copyright American Institute of Physics 1989, used with permission.)
Figure 3. Capacitance sensor LCR circuit response illustrating how the shift in resonant frequency is converted into an output voltage [adapted from (5)].
capacitance between the conducting SCM tip and a semiconductor sample under test. When imaging silicon by SCM, this capacitor is usually considered a metal–oxide–semiconductor (MOS) capacitor, though a metal–semiconductor Schottky contact may also be formed (42). The quality of the SCM measurement on silicon depends on forming a thin oxide and establishing an unpinned semiconductor surface. Because a semiconductor is necessary for generating a SCM signal in contact mode, the following discussion assumes that a semiconductor is present as the sample to be imaged. A MOS capacitor displays a characteristic voltagedependent capacitance due to the ‘‘field effect.’’ The field effect is also the basis of the technologically important metal–oxide–semiconductor field effect transistor (MOSFET). The physics of the field effect and the MOS capacitor is discussed in great detail in many textbooks (75,76). Following is a brief discussion of the MOS capacitance versus voltage (C–V) response in the high-frequency regime that will define the terms and behaviors necessary to describe how the SCM functions. The MOS capacitance changes with voltage because an electric field can attract or repel charge carriers to or from the surface of a semiconductor. A voltage applied across a MOS capacitor induces an electric field at the surface of the semiconductor. Characteristically, the MOS C–V response is divided into three regions. When the voltage on the metal has a sign opposite that of the majority carriers, the electrical field repels minority carriers and attracts additional majority carriers. The silicon surface has ‘‘accumulated’’ additional majority charge carriers. In the accumulation region, the capacitance is just the parallel plate oxide capacitance between the metal and semiconductor. When the voltage on the metal has the same sign as that of the majority carriers, the electrical field attracts minority carriers and repels majority carriers. The silicon surface is then ‘‘depleted’’ of charge carriers. In the depletion region, the capacitance is the oxide capacitance plus a depletion capacitance due to
the field-induced space charge region (or depletion region). When the voltage on the metal is large and has the same sign as the majority carriers, the electrical field can attract sufficient minority carriers to ‘‘invert’’ the conduction type at the silicon surface. In the inversion region, the depletion region has reached its maximum extent, and the capacitance is at a minimum. The sign of the voltage necessary to deplete a semiconductor depends on its conduction type. Silicon of n-type, where electrons are the majority carriers, displays accumulation at positive tip voltages, and inversion at negative tip voltages. Silicon of p-type, where holes are the majority carriers, displays accumulation at negative tip voltages and inversion at positive tip voltages. The dividing point between accumulation and depletion occurs at the flatband voltage Vfb . Flatband describes a condition of semiconductor band structure where the conduction band, valence band, and Fermi level at the surface are all equal to their bulk values. The ideal value of Vfb depends on the net carrier concentration in the semiconductor and the dielectric constant of the semiconductor. In real systems, oxide surface charges, the dielectric constant of the insulator, and the work function of the gate (or SCM tip) metal cause Vfb to vary from its ideal value. Analysis of the MOS C–V curve reveals a lot of information about the semiconductor–insulator system. The value of the MOS capacitance depends on the voltage and also on the oxide thickness tox , the oxide dielectric constant εox , the semiconductor dielectric constant εs , and the carrier concentration N in the semiconductor. The maximum capacitance in accumulation is equal to the oxide capacitance Cox . The ratio of the inversion and oxide capacitances and the slope of the capacitance versus voltage in depletion depend on the carrier concentration of the semiconductor. SCM image contrast is generated from the voltage-dependent capacitance of a MOS capacitor formed by the SCM tip and an oxidized semiconductor. Field of View and Spatial Resolution. Because it is based on an AFM, the SCM has the same field of view as the AFM. For most commercial systems, the field of view can be continuously adjusted down from 100 × 100 µm to less than 1 × 1 µm. The spatial resolution of the SCM is also related to that of the AFM. The AFM system determines the number of points per line and the number of lines per image. However, the tip radius determines the ultimate spatial resolution of SCM. SCM gathers information from a volume beneath the probe tip that is determined by the magnitude of the voltages applied to the SCM. Large applied voltages can cause that volume to expand to many times the tip radius, especially for material that has a low concentration of carriers. It is generally believed that, by controlling the voltages applied to the SCM and by using models in data interpretation, the SCM can measure carrier profiles whose spatial resolution is at least equal to the tip radius (20,30). Current SCM tip technology produces tips whose radii are of the order of 10 nm.
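The doping dependence described above can be made concrete with the standard textbook expressions for an ideal MOS capacitor. The sketch below uses a simplified depletion approximation, not the model of any particular SCM vendor, to compute the high-frequency C–V curve of n-type silicon at two illustrative doping levels; it shows that the lightly doped case swings between Cox and the inversion minimum over a much narrower voltage range, that is, it has the larger dC/dV that SCM detects. The 2 nm oxide thickness is a representative value; the doping levels are chosen only for illustration.

```python
import numpy as np

# Ideal high-frequency MOS C-V in the depletion approximation (assumed
# simplification; constants are standard values for Si and SiO2).
q, eps0 = 1.602e-19, 8.854e-12
kT_over_q = 0.0259                                # thermal voltage at 300 K, V
eps_ox, eps_si, n_i = 3.9, 11.7, 1.0e16           # n_i of Si ~ 1e16 m^-3
t_ox = 2e-9                                       # oxide thickness, m

def hf_cv(N, v_gate):
    """Capacitance per unit area (F/m^2) vs. gate voltage for n-type Si.

    N is the donor concentration in m^-3; v_gate is measured relative to
    the flatband voltage (positive = accumulation for n-type).
    """
    c_ox = eps0 * eps_ox / t_ox
    phi_b = kT_over_q * np.log(N / n_i)           # bulk potential
    w_max = np.sqrt(4 * eps_si * eps0 * phi_b / (q * N))
    c_min = 1.0 / (1.0 / c_ox + w_max / (eps_si * eps0))
    c = np.where(
        v_gate >= 0, c_ox,                        # accumulation: oxide only
        c_ox / np.sqrt(1 + 2 * c_ox**2 * np.abs(v_gate) / (q * eps_si * eps0 * N)),
    )
    return np.maximum(c, c_min)                   # clamp at the inversion value

v = np.linspace(-2, 2, 9)
for N in (1e22, 1e25):                            # 1e16 and 1e19 cm^-3
    c_norm = hf_cv(N, v) / hf_cv(N, np.array([2.0]))   # normalize to Cox
    print(f"N = {N:.0e} m^-3:", np.round(c_norm, 2))
```

The printed curves make the contrast mechanism explicit: at 1e16 cm^-3 the normalized capacitance collapses toward its minimum within a fraction of a volt of flatband, while at 1e19 cm^-3 it changes only slowly over the same bias range.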
Modes of Operation The SCM detection electronics are used to measure a difference in capacitance, ΔC, between the tip and sample. The SCM is commonly operated to image dopant gradients in two different modes, the constant ΔV mode and the constant ΔC mode. Constant ΔV (or Open-Loop) Mode. In the constant delta-voltage (ΔV) mode of SCM (8–10), an ac sinusoidal voltage whose magnitude is Vac at a frequency of ωac is applied between the sample and the SCM tip, which is a ''virtual'' ground. In this case, the tip is inductively coupled to ground. The inductance is such that it passes the ωac signal to ground but does not ground the ωhf signal, which is used to detect the capacitance. Vac is typically around 1 V at a frequency of 10–100 kHz. Figure 4 illustrates the voltage dependence of a MOS capacitor and the mechanism of SCM contrast generation. The Vac signal induces a change in the tip-to-sample capacitance ΔC(ωac) that depends on the slope of the C–V curve between the tip and the semiconductor. When the tip is over a semiconductor that has a low doping concentration, the C–V curve changes rapidly as voltage changes, and Vac produces a relatively large ΔC. When the tip is over a semiconductor that has a high dopant concentration, the C–V curve changes slowly as voltage changes, and Vac produces a relatively small ΔC. When a set value of Vac is applied between the tip and the sample, the capacitance sensor produces a voltage output signal that varies at ωac in proportion to ΔC. In the constant ΔV mode of SCM, the SCM signal is this sensor output voltage. As shown in Fig. 1, the magnitude of the sensor output is detected by using a lock-in amplifier that is referenced to ωac. Because n-type and p-type semiconductors produce C–V characteristics that are mirror images of each other, they produce distinctly different responses to the SCM (17,18). In general, n-type silicon produces a response that is in phase with the drive signal, and p-type silicon produces a response that is 180° out
Figure 4. Schematic illustration of the way SCM image contrast is generated for two different dopant densities of silicon.
of phase with the drive signal. When using a lock-in amplifier in the x/y mode, n-type silicon produces a positive signal, and p-type silicon produces a negative signal. The capacitance sensor output provides a relative measure of differential capacitance: larger values of ΔC produce larger sensor output voltages. Because the tip shape and the tuning of the capacitance sensor are variable, the absolute value of capacitance (in farads) measured by the SCM is difficult to determine. When measuring an unknown, the absolute capacitance is determined by comparison to measurements of a reference whose capacitance can be calculated from its known properties. Several operational factors strongly influence the SCM signal:
1. ac bias voltage Vac. The ac voltage determines how much of the C–V curve is used to generate the differential capacitance. Larger values of Vac generate larger values of ΔC. The larger change in capacitance is due to depletion of carriers from a larger volume of the sample. This results in lower spatial resolution when the SCM is used to measure quantitative carrier profiles. For quantitative measurements, a low value of Vac is used, typically 1 V or less.
2. dc bias voltage Vdc. The dc voltage Vdc applied between the sample and the tip determines where ΔC is measured on the C–V curve. The effect of Vdc is relative to the flatband voltage of the MOS capacitor formed by the tip and the sample. Vfb will vary from sample to sample. The peak SCM response occurs when Vdc ≈ Vfb (24). In a sample that contains a dopant gradient, Vfb varies with dopant concentration. In practice, Vdc is usually set equal to the voltage that produces the maximum SCM response in the region that has the lowest doping concentration.
3. Sensor high-frequency voltage Vhf. The magnitude of the high-frequency voltage Vhf in the LCR loop of the capacitance sensor is proportional to the magnitude of the output signal. Increasing Vhf increases the capacitance sensor output and the signal-to-noise of the capacitance measurement. However, similar to Vac, the measured SCM signal is averaged across its values at the voltages spanned by Vhf. Large values of Vhf generate large signals at the cost of spatial resolution.
4. Sensor-to-sample coupling. The capacitance sensor forms a grounded LCR loop that includes the tip–sample capacitance. The capacitance between the sample and the grounded outer shield of the capacitance sensor completes the loop. Subtle variations in the sample geometry can influence this coupling capacitance. A signature of large coupling is the generation of a nonzero SCM signal when the probe is over a region such as a good insulator where no SCM signal should be expected.
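To illustrate how the constant ΔV mode turns the slope of the C–V curve into an image signal, the sketch below applies Vdc + Vac sin(ωt) to a toy C–V characteristic, demodulates the resulting capacitance waveform at ωac as a lock-in amplifier would, and compares the recovered amplitude and phase for lightly and heavily doped n-type material and for p-type material. The tanh-shaped C–V curve and the steepness values are inventions for this sketch, not the model used in the cited work.

```python
import numpy as np

def toy_cv(v, conduction="n", steepness=1.0):
    """Illustrative normalized MOS C-V curve (0..1); not a physical model.

    For n-type the capacitance rises with voltage, for p-type it falls.
    'steepness' stands in for doping: lightly doped material has a steeper
    C-V transition and therefore a larger dC/dV.
    """
    sign = 1.0 if conduction == "n" else -1.0
    return 0.5 * (1.0 + np.tanh(sign * steepness * v))

def lock_in(signal, ref_phase):
    """Dual-phase (x/y) demodulation at the reference frequency.
    Returns (amplitude, phase in degrees) of the component at that frequency."""
    x = 2 * np.mean(signal * np.sin(ref_phase))
    y = 2 * np.mean(signal * np.cos(ref_phase))
    return np.hypot(x, y), np.degrees(np.arctan2(y, x))

f_ac, v_ac, v_dc = 10e3, 1.0, 0.0           # 10 kHz, 1 V, biased at flatband
t = np.linspace(0, 10 / f_ac, 20000)        # ten modulation cycles
ref = 2 * np.pi * f_ac * t

for label, kind, steep in [("n-type, lightly doped", "n", 2.0),
                           ("n-type, heavily doped", "n", 0.3),
                           ("p-type, lightly doped", "p", 2.0)]:
    c_t = toy_cv(v_dc + v_ac * np.sin(ref), kind, steep)
    amp, phase = lock_in(c_t - c_t.mean(), ref)
    print(f"{label:24s}  dC amplitude = {amp:.3f}   phase = {phase:6.1f} deg")
```

The output reproduces the behavior described in the text: the heavily doped trace gives a much smaller demodulated amplitude than the lightly doped one, and the p-type trace comes back roughly 180° out of phase, which is the sign flip used to distinguish conduction types in an x/y lock-in image.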
Constant ΔC (or Closed-Loop) Mode. For carrier profiling applications, the SCM is also commonly operated in a constant ΔC mode to control the volume depleted of carriers better. A feedback loop (hence, closed-loop mode) is added to the SCM to adjust the magnitude of Vac automatically to keep ΔC constant in response to changes in dopant concentration as the SCM tip is scanned over a dopant gradient (18,27,29). The lower the doping beneath the tip, the smaller the Vac needed to induce the same ΔC. Figure 5 shows a block diagram of a constant ΔC SCM feedback loop. Because ΔC remains constant, the volume depleted of carriers and therefore the spatial resolution of the image are less dependent on the dopant concentration in the semiconductor. In the constant ΔC mode of SCM, the SCM signal is the value of Vac that the feedback loop establishes to maintain a constant capacitance. To achieve optimum control of the volume of the depletion layer in the constant ΔC mode, the dc bias must be equal to the flatband voltage of the tip–sample MOS capacitor (17). The feedback loop of the constant ΔC mode produces a signal that changes monotonically as carrier concentration changes only for dopant gradients in like-type semiconductors. When a p–n junction is present, the changes in sign of the SCM signal when transiting between n-type and p-type material cause the feedback loop to saturate near the junction. Capacitance-Voltage Curve Measurement. The measurement of capacitance versus dc voltage (i.e., the C–V curve) is the usual method of characterizing any MOS capacitor, and this is just as true for the MOS capacitor formed by the SCM tip and an oxidized semiconductor. Measurement of the SCM tip–sample C–V characteristics reveals the suitability of the sample preparation for quantitative carrier profiling. SCM tip–sample C–V curves can be measured by a boxcar averager (27) or a digital oscilloscope operated in the x, y mode. An ac voltage at around 1 kHz that spans the range of the desired voltage of the C–V curve is applied between the tip and
the sample. The ac voltage is displayed on the x channel, and the capacitance sensor output is displayed on the y channel. The averaging feature of the digital oscilloscope is used to improve the signal-to-noise. To be suitable for quantitative carrier profiling, the C–V curve must display clear accumulation and depletion regions, little hysteresis between the forward and reverse sweep, and no increase in capacitance in the depletion region. The C versus V curve can also be measured directly by the SCM (25). In this case, an SCM image is acquired while slowly changing the value of Vdc , so that the desired range for the C–V curve is swept out once in the time it takes to acquire the image. Sections of such an image taken parallel to the slow scan direction reveal the C–V curve. A high-quality surface should yield a single peak (a positive peak for n-type and negative peak for p-type) at the flatband voltage. Scanning Capacitance Spectroscopy. The SCM technique has been extended into spectroscopy by measuring capacitance at multiple values of Vdc at each point of an image (46,47). Multiple points of the tip–sample C–V curve are measured by changing the applied dc bias voltage between the tip and the sample on successive scan lines. The measured C–V curves display different characteristics of behavior for n-type, p-type, and intermediately doped regions. Material of n-type has a C–V of positive slope, p-type has a C–V of negative slope, and intermediate regions display a U-shaped C–V curve. The p-n junction location can be estimated as the point where the C–V curve has a symmetrical U-shape (46). Intermittent-Contact SCM. The SCM can also be operated in intermittent-contact (or tapping) mode (20,21). In this case, a differential capacitance is generated by the change in the tip–sample distance due to the vibrating tip. Figure 6 shows a block diagram of the IC-SCM. The capacitance sensor output is measured by a lock-in amplifier using the tip vibration frequency as the reference.
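The classification rule used in scanning capacitance spectroscopy (positive C–V slope for n-type, negative slope for p-type, and a symmetrical U shape near the junction) lends itself to a simple per-pixel test. The helper below is a hypothetical illustration of that logic applied to an already-measured set of C–V samples; the function name and the symmetry threshold are choices made for this sketch, not part of the published technique.

```python
import numpy as np

def classify_cv(v_dc, c, symmetry_tol=0.2):
    """Crude per-pixel classification of an SCS C-V trace (illustrative only).

    v_dc : dc bias values applied on successive scan lines
    c    : capacitance-sensor output measured at each bias
    Returns 'n-type', 'p-type', or 'near junction' following the
    slope/U-shape rule described in the text; thresholds are arbitrary.
    """
    slope = np.polyfit(v_dc, c, 1)[0]                 # overall linear trend
    curvature = np.polyfit(v_dc, c, 2)[0]             # U-shape indicator
    if curvature > 0 and abs(slope) < symmetry_tol * np.ptp(c) / np.ptp(v_dc):
        return "near junction"                        # symmetric U shape
    return "n-type" if slope > 0 else "p-type"

v = np.linspace(-2, 2, 21)
examples = {
    "n-type region": 0.5 + 0.2 * v,                   # rising C-V
    "p-type region": 0.5 - 0.2 * v,                   # falling C-V
    "junction region": 0.3 + 0.1 * v**2,              # symmetric U
}
for name, trace in examples.items():
    print(f"{name:16s} -> {classify_cv(v, trace)}")
```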
Figure 5. Feedback loop used to implement the constant ΔC mode of the scanning capacitance microscope (16). (Copyright Elsevier Science 1997, used with permission.)
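A closed-loop controller of the kind shown in Fig. 5 can be sketched in a few lines: an integral feedback term adjusts Vac until the demodulated ΔC signal matches a setpoint, and the settled Vac becomes the image value. The plant model below, in which ΔC is simply proportional to Vac times a doping-dependent dC/dV, is an assumption made only for illustration; the gain, setpoint, and dC/dV values are arbitrary.

```python
# Minimal sketch of the constant-dC feedback idea (assumed toy plant:
# the measured dC is proportional to Vac times a doping-dependent dC/dV).
def scan_constant_dc(dcdv_along_line, setpoint=1.0, gain=0.5, steps=50):
    """Return the Vac values the loop settles to at each scan position.

    dcdv_along_line : relative dC/dV at each pixel (large = lightly doped)
    The recorded Vac is the image signal in the constant-dC mode, so
    heavily doped regions (small dC/dV) end up with large Vac.
    """
    v_ac, image = 1.0, []
    for dcdv in dcdv_along_line:
        for _ in range(steps):                     # let the loop settle
            measured = dcdv * v_ac                 # toy sensor + lock-in output
            v_ac += gain * (setpoint - measured)   # integral correction
            v_ac = min(max(v_ac, 0.0), 8.0)        # clamp to a realistic range
        image.append(round(v_ac, 3))
    return image

# A scan line crossing from lightly doped (dC/dV = 2) to heavily doped (0.25).
line = [2.0, 2.0, 1.0, 0.5, 0.25, 0.25]
print(scan_constant_dc(line))    # Vac rises as the doping increases
```

Note that this toy loop, like the real one, only behaves monotonically for gradients within one conduction type; a sign change of the plant response at a p–n junction would drive the controller into saturation, as described above.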
Figure 6. Block diagram of the intermittent-contact mode scanning capacitance microscope (20). (Copyright American Institute of Physics 1998, used with permission.)
No modulation of a depletion region in a semiconductor is required in the IC-SCM mode to generate a differential capacitance signal. Because all materials display a tip–sample spacing-dependent capacitance, the IC-SCM extends the technique to measurements on nonsemiconductors. IC-SCM can detect metal lines beneath an insulating layer, even if the insulating layer has been planarized to remove any indication of the metal lines in the surface topography. This may have applications in lithographic overlay metrology. IC-SCM can also detect variations in the dielectric constant of thin films; the difference between SiO2 (ε = 3.9) and Si3 N4 (ε = 7.5) is clearly resolvable. This capacity may have applications for evaluating deposited dielectric films used in the semiconductor industry, such as candidate alternative gate dielectrics and for capacitors in memory circuits. The IC-SCM’s ability to measure dielectric constant at high spatial resolution may also have applications for library evaluation in combinatorial methods. APPLICATIONS OF SCM This section will discuss several applications of SCM, except for quantitative carrier profiling by SCM, which will be discussed separately in the next section. The most common application of SCM today is for qualitative imaging and inspection of fabricated doped structures in silicon. SCM images of a matched pair of 0.35-µm channel NMOS and PMOS silicon transistors are shown in Fig. 7 and 8 (17,18). Other device structures such as the tungsten plug silicide contacts to source/drain regions and the trench sidewall implants for DRAM
Figure 7. SCM image of cross-sectioned NMOS silicon transistor (17). The lines show the approximate location of the silicon surface, polysilicon gate, and p–n junction. (Copyright Electrochemical Society 1997, used with permission.) See color insert.
Figure 8. SCM image of cross-sectioned PMOS silicon transistor (19). The lines show the approximate location of the silicon surface, polysilicon gate, and p–n junction. (Copyright American Institute of Physics 1998, used with permission.) See color insert.
trench capacitor processes have also been imaged by SCM. SCM images are used to detect defects that negatively impact device performance, to evaluate the uniformity of implants, and to determine information as basic as whether the implant dopant step used the right type of dopant, or even if it was conducted at all. This type of information is not as easily available from other techniques, making SCM an increasingly used method for failure analysis. SCM is also used to determine the location of the electrical p–n junction in images of cross-sectioned silicon devices that allows measuring the basic transistor properties of p–n junction depth and MOSFET channel length. The apparent p–n junction location in an SCM image is where the signal changes sign from positive (for n-type) to negative (for p-type). The apparent junction location is a function of both Vdc and Vac (22,51,52). By varying Vdc , it is possible to move the apparent junction location from one side of the built-in depletion region of the junction to the other. The apparent junction location coincides approximately with the actual electrical junction when Vdc is midway between the flatband voltage of the n-type silicon and the flatband voltage of the p-type silicon (22,51,52). Junction location can also be determined from scanning capacitance spectroscopy images. A variety of other semiconductors have also been imaged by SCM for the same general goals as for silicon. Images of InP based laser structures (63–65), SiC, and AlGaN/GaN (41) have recently been published. SCM imaging of InP laser structures has been used to evaluate regrown material quality, interface locations, and the uniformity of doped structures formed by LPE and MOVPE. An SCM image of an InP/InGaAsP Buried Heterostructure Laser is shown in Fig. 9. The active region
Figure 9. SCM image 4 × 8 µm of a cross-sectioned InP/InGaAsP buried heterostructure laser. (Image/photo taken with NanoScope SPM, Digital Instruments, Veeco Metrology Group, Santa Barbara, CA, courtesy Andy Erickson. Copyright Digital Instruments Inc. 1995–98, used with permission.) See color insert.
of the laser is in the center of the image and, if lasing, the emitted light would be directed out of the page. SCMs have been used to study the charge trapping and detrapping dynamics of silicon dioxide or silicon nitride insulating layers on silicon (39,44,53,55–58,60,61,66). A voltage pulse through an SCM tip can be used to inject a charge into a nitride–oxide–silicon (NOS) structure, which becomes trapped. The trapped charge induces a depletion region in the silicon whose capacitance can be detected by SCM. A reverse bias pulse can remove the stored charge. Much of this work has been conducted to determine the potential for using charge trapping and detection with capacitive probes as a nonvolatile semiconductor memory. Because the small tip size enables high packing densities of information, it has been hypothesized that such a technique may be able to store large amounts of data in a small area. TWO-DIMENSIONAL CARRIER PROFILING OF SILICON TRANSISTORS Two-dimensional carrier profiling is the measurement of the local carrier concentration (in cm−3 ) as a function of the x and y positions across a semiconductor surface. When referring to 2-D carrier profiling of MOSFETs, this 2-D surface is assumed to be a cross section of the device parallel to the direction of the source-to-drain current flow. Such a cross section will reveal the structure of the source and drain regions and allow visualizing the sourceto-drain spacing and junction depths. A one-dimensional profile traditionally is the distribution of carriers as a function of depth from the surface of the silicon wafer. The regions of most interest are those where high concentrations of dopants have been introduced into the semiconductor by ion implantation or diffusion to form regions of high conductivity. In particular, the dopant concentration as a function of position (or the dopant profile) of the source and drain regions of a MOSFET in the vicinity of the channel largely determines the electrical characteristics of the device and is of much technological interest. For the sake of clarity, keep in mind that dopants are the chemical impurities introduced to change the electrical conductivity of a semiconductor, and ‘‘carriers’’
are the electrically active charge carriers. If incorporated into an electrically active site of a crystal lattice, each dopant can provide a charge carrier. Because the internal electrical field from steep gradients of dopants can cause redistribution of the charge carriers, the dopant profile and the carrier profile are not necessarily the same. SCM is sensitive to the electrically active charge carrier profile. To measure 2-D carrier profiles in silicon by SCM requires three separate procedures: sample preparation, SCM image acquisition, and carrier profile extraction. For silicon, sample preparation must expose the region of interest, produce a smooth surface whose residual scratches are less than a nanometer deep, and form a thin oxide or passivation layer. To extract accurate quantitative carrier profiles, SCM images must be acquired under highly controlled and specified conditions. The SCM model parameters must be measured, and data from known reference samples must be included. Once a controlled SCM image has been acquired, the 2-D carrier profile can be extracted with reference to a physical model of the SCM. Forward modeling is the calculation of a SCM signal for a known carrier profile whose measurement and sample model parameters are known. Several forward models of varying complexity and accuracy have been developed. Reverse modeling is the rather more complex problem of determining an unknown carrier profile from the measured SCM signal and the known model parameters. SCM Measurement Procedures for Carrier Profiling Sample Preparation. SCM cross-sectional sample preparation is derived from techniques developed for Scanning Electron Microscopy (SEM) and Transmission Electron Microscopy (TEM). A detailed common procedure is described in (77). Sample preparation usually involves four steps: (1) cross-sectional exposure, (2) back side metallization, (3) mechanical polishing, and (4) insulating layer formation. Before cross-sectioning, the region of interest is usually glued face-to-face to another piece of silicon. This cover prevents rounding and chipping of the surface of interest during mechanical polishing. Thin (<1 mm) cross sections are cut from the sample–cover sandwich by a mechanical dicing saw. Thin samples are essential because the SCM measurement depends on low resistivity between the front surface (where the probe contacts the sample) and the back surface (where the voltages that stimulate the SCM are applied). To improve the electrical contact between the back of the sample and the microscope sample holder, the back side is usually coated by a Cr layer (for adhesion), followed by an Au layer (for low resistance and corrosion resistance). The sample is attached to a metal sample holder using a conducting silver-filled epoxy. Front-to-back sample resistance of no more than a few ohms indicates good back side contact. The surface of the exposed cross section is mechanically polished using a rotating lap and progressively finer grit abrasives. A typical sequence of polishing materials uses diamond lapping films that have grit sizes of 10 µm, 3 µm, 1 µm, 0.5 µm, and 0.1 µm, followed by colloidal diamond slurries on napless polishing pads whose grit sizes are 1 µm, 0.5 µm, 0.25 µm, 0.1 µm, and finally, 0.05 µm. Each polishing step should remove a surface layer equal to at
Table 1. SCM Model Parameters
Parameter                      Symbol   Typical Range           Nominal
SCM Operating Point
  ac voltage                   Vac      0 to 8 V                <1 V
  ac frequency                 ωac      1–100 kHz               10 kHz
  dc voltage                   Vdc      −5 to +5 V              0 V
  High-frequency voltage       Vhf      0–10 V                  ≈1 V (instrument dependent)
Tip Parameters
  Tip radius                   rtip     5–50 nm                 10 nm
  Cone angle                   CA       10°                     10°
  Tip flat radius              rflat    0–50 nm                 0 nm
  Tip tilt                              0° or 20°               0°
Sample Parameters
  Oxide thickness              tox      0.5–10 nm               2 nm
  Oxide dielectric constant    εox      1 to 5                  3.9
  Flatband voltage             Vfb      −5 to +5 V              0 V
Normalization Points (constant ΔV mode)
  Sensor output at Nhigh       Chigh    Instrument dependent    Sample dependent
  Sensor output at Nlow        Clow     Instrument dependent    Sample dependent
least three times the grit size of the previous polishing step. An important part of the polishing sequence is frequent inspection of the polished surface by an optical microscope. Inspection verifies that a uniform polish has been achieved and no scratches are deeper than the mean grit size of the current polishing medium. The final step in silicon sample preparation is the formation of a thin, passivating, insulating layer on the surface to be imaged. This layer is necessary to reduce the number of surface states and to serve as the oxide of the MOS capacitor that the SCM tip forms with the sample. Several processes have been used to form this layer, including a short polish using colloidal silicon (77), annealing at 300 ° C for 60 minutes under UV light (34) or without (78), a UV-ozone photoreactor (49,78), and UV light and ozone at elevated temperatures (79). Regardless of the process used to form the surface-insulating layer, the quality of the insulating layer for SCM is determined by measuring the C–V by SCM. Determination of Model Parameters. To determine accurate and quantitative carrier profiles from an SCM image requires detailed knowledge of all of the other parameters that influence the SCM response. The accuracy with which the model parameters are known limits the accuracy with which the carrier profile can be determined. The SCM model parameters are summarized in Table 1. The SCM operating point includes the user-applied voltages, which are known precisely. The applied ac voltage Vac , ac frequency ωac , dc voltage Vdc , and highfrequency voltage Vhf have all been defined previously. These voltages greatly affect the magnitude of the SCM signal. Due to the nature of the capacitance sensor, the magnitude of the high-frequency voltage is usually not
known precisely and can only be measured indirectly. The ac measurement frequency, tip scan rate, and the lockin integration time also influence the quality of a SCM image. In general, the SCM image must be acquired at a slow enough tip scan rate, so that the equilibrium SCM response is measured at each point of the image. A good indicator of equilibrium is whether the SCM line trace from the forward and reverse images overlay each other. The tip model parameters attempt to describe the complex shape of the SCM tip by using a few simple parameters. The actual tip shape can be measured by various tip profiling approaches (80). The tip radius rtip describes a hypothetical spherical tip, which is easer to deal with in numerical modeling. The tip cone angle CA describes a more realistic conical tip. Together, these two parameters describe a tip whose finite tip radius is smoothly jointed to a conical shaft. Although the spherical portion of the tip contributes a majority of the capacitance measured by SCM, it has been shown that the contribution from the conical sidewalls is also significant (24). The tip model has been refined by introducing a flat tip radius rflat to model a worn spherical tip (34). The tilt of the tip relative to the sample plane has also been included in some models (39). The sample parameters consist of the oxide thickness tox , the oxide dielectric constant εox , and the flatband voltage Vfb of the oxide on the sample. Oxide thickness can be measured by ellipsometry, and εox and Vfb can be estimated from C–V measurements of large area MOS capacitors where oxides are formed via the same process as the sample. Vfb can also be estimated from the peak SCM response as a function of Vdc (24). It has been shown that the value of Vdc , which produces the peak SCM response, is near, though not exactly at, Vfb . This is due to the
asymmetry of the C–V curves and the use of nonzero values of Vac , rather than a point derivative. The final information necessary to extract carrier profiles from SCM images is one or more normalization points. The confidence of the model fit to the experimental data can be improved by using additional normalization points. These may be obtained from the known dopant concentration in the substrate, the peak doping of the ion implanted profile as determined by 1-D SIMS, or from reference samples. Reference samples, when used as the cover material to prepare cross-sectional samples, will be exposed to exactly the same sample surface preparation as the unknown sample under test. SCM Image Acquisition. Tuning the SCM for optimal response in carrier profiling involves balancing two conflicting needs. Large values of Vac and Vhf produce a large signal relative to the noise in the measurement. However, large values of these voltages also cause the depletion region to spread to a larger volume that degrades the spatial resolution of the measurement. Consequently, good measurement practice requires adjusting Vac and Vhf to the minimum values needed to produce adequate signal-to-noise (17,35,36). For samples that do not contain p–n junctions, the constant C mode of SCM, which automatically adjusts Vac , can be used. Samples that contain p–n junctions are usually imaged in constant V mode, which uses a fixed value of Vac . Selecting the optimal value of the dc bias is also important in obtaining good carrier profiles. The SCM signal varies with dc bias, and a peak occurs near the flatband voltage of the sample surface. For samples without p–n junctions, Vdc is usually selected so that the peak SCM response is obtained when the tip is contacting the most lightly doped region of the sample. This ensures that the maximum SCM signal is measured during imaging and provides a reference to the flatband voltage of the sample. For samples that contain p–n junctions, Vdc is selected so that it is halfway between the flatband voltages of the n-type and p-type material. Several performance criteria indicate good surface preparation and proper coupling between the SCM and the sample (17): 1. C–V curve shape, flatband voltages, and hysteresis are used to verify an adequate surface oxide layer. The C–V curve should display clear accumulation and depletion regions and not be U-shaped (increasing capacitance in depletion) when contacting uniformly doped material. Flatband voltages should be near zero, and the flatband voltages of n-type and p-type silicon should differ by about the band gap of silicon, 1.1 V. Hysteresis between C–V curves measured by tip voltages swept from accumulation to depletion and C–V curves measured by the reverse sweep convention indicate trapping of charge. 2. Zero SCM signal level should be obtained when the tip is contacting a region that is known to be highly insulating, such as in the glue joining the sample and cover material. A finite signal level here indicates poor coupling between the SCM and the sample.
3. The sign of the SCM signal should be positive for n-type material and negative for p-type material. An SCM signal of the wrong sign for the sample conductivity type indicates large coupling, a poor surface, or an improper dc bias voltage. 4. Forward and reverse sweep SCM signals should track each other precisely. Differences in the forward and reverse sweep SCM signals across a dopant gradient indicate that the capacitance sensor is not measuring the equilibrium capacitance response as the tip is swept. The tip scan rate, the frequency of the ac voltage, and the lock-in averaging time should be adjusted until the forward and reverse sweep SCM signals track each other. SCM Forward Models All models of SCM involve calculating the tip-to-sample capacitance for a known carrier profile, SCM/sample geometry, and a set of applied operating voltages. This falls into a class of basic electromagnetic problems of finding the electrostatic potential ψ that corresponds to a given charge distribution for a set of boundary conditions. The physics of this problem is described by a general form of Gauss' law, which is one of Maxwell's four fundamental equations of electromagnetism. Gauss' law relates the flux of the electric field E through a closed surface to the total charge enclosed by the surface. In terms of the electrostatic potential, this is expressed mathematically by Poisson's equation, which states that the divergence of the gradient of ψ is equal to the charge density:
∇ · (εr ∇ψ) = −(q/εo) (Nd − Na + p − n).   (1)
Here, q is the elementary charge, εo is the vacuum permittivity, εr is the relative dielectric constant, Nd is the ionized donor concentration, Na is the ionized acceptor concentration, p is the hole concentration, and n is the electron concentration. Two basic approaches have been used to model SCM. ‘‘Quasi 3-D’’ approaches simulate the interaction of the 3D SCM tip with a 3-D carrier profile by summing a series of capacitances between small elements of the tip and the underlying dopant profile. These capacitance elements are based on the 1-D analytic solution of Poisson’s equation for a parallel plate MOS capacitor geometry. A more rigorous approach is to solve Poisson’s equation numerically for the geometry and boundary conditions imposed by the SCM tip and sample. The SCM signal using solutions of Poisson’s equation has been simulated via code developed in-house at NIST and various commercial device simulators. Quasi 3-D Models. For the geometry of a large area, parallel plate MOS capacitor, Poisson’s equation has an analytical solution that describes the capacitance per unit area as a function of the device geometry and semiconductor properties (81). This 1-D theory is described in great detail elsewhere (75,76). Because the MOS capacitor is technologically important to the silicon integrated circuit, many characterization techniques and simulation tools have been developed. These have been
primarily based on the 1-D theory until recently, when shrinking device dimensions forced consideration of 2D and 3-D effects. The 1-D theory is not directly applicable to SCM because both the probe tip and the dopant distribution of interest vary rapidly in three dimensions. The SCM tip–sample capacitor is not a parallel plate capacitor, but fundamentally has a 3-D geometry. Quasi 3-D models refer to a class of SCM simulations that attempt to mimic 3-D behavior from multiple 1-D simulations. An approach developed at the University of Utah modeled the SCM tip as a sphere and calculated the electrical field between the tip and sample by using the method of images (27–29). In this case, the sample, centered on the tip point of contact, was divided into a series of annular regions, and the oxide capacitance of each region was calculated from the electrical field and charge density determined by the method of images. Once the oxide capacitance was known, the silicon capacitance was calculated using a 1-D band bending model in the silicon. The total tip–sample capacitance is determined from the series combination of the oxide and silicon capacitance of each annular region, integrated across all of the annular regions. This approach preserves the fundamentally spherical geometry of the SCM tip and resulting electrical field distribution. An alternative approach (16,82) divides the tip into a series of small annular rings, as shown in Fig. 10. Each ring is modeled as a parallel plate capacitor that has a small area and equivalent oxide thickness due to the series combination of air gap and sample oxide. The total tip–sample capacitance is determined by summing all of the parallel capacitances of the rings. This approach allows modelling the tip by both a spherical tip at the apex and a conical shaft (or any arbitrary tip geometry). This model
has been extended to include the idea of a flattened tip end to simulate the effect of a worn tip (34). Quasi 3-D approaches can be calculated very rapidly, and reproduce the features and model parameter dependencies of the SCM signal. These models allow studying the effect on an SCM signal of varying carrier concentration, oxide thickness, dielectric constant, and dc bias voltage relative to the sample flatband voltage. They do not produce a particularly exact agreement with the experimental SCM signal. Substantial differences have been observed between a more comprehensive finite-element solution of Poisson's equation and the various quasi-3D models (24). Better agreement can be produced by fitting a calculated SCM signal to a measured SCM signal by varying one or more model parameters. The high-frequency voltage is often used as the fitting parameter because small changes in its value can result in large changes in the shape of the tip–sample C–V curves.
Figure 10. Division of model SCM tip geometry into the small ring-shaped elements used in the quasi-3-D simulation of a SCM signal.
Numerical Solutions of Poisson's Equation. The quasi-3D models have proven to be a quick and convenient way to approximate the SCM signal. However, the extreme performance goals for 2-D carrier profiling indicate that a more rigorous approach is required. Although an analytic solution is not possible for the SCM geometry, Poisson's equation can be solved numerically using an iterative finite-element method (FEM). The approach developed at NIST (18,23,24,26) models the probe as if it is conically shaped and has a rounded tip that rests on the surface of a uniformly thick oxide layer that covers a uniformly doped semiconductor silicon substrate. The cylindrical axis of the probe is normal to the oxide surface. A representative cross section of the cylindrical geometry that shows the charge distribution in the silicon is presented in Fig. 11. The probe-tip radius of curvature is 10 nm, the probe cone apex half-angle is 10°, the oxide thickness is 10 nm, the radial distance along the oxide–semiconductor boundary is 0.1 µm, and the outer boundary of the ambient region that surrounds the probe is a circular arc. The net charge density in the semiconductor is found by solving Poisson's equation. The quantities n and p are related to ψ by using Fermi statistics, the band bending approximation, and no allowance for interfacial states. Poisson's equation is solved self-consistently with Laplace's equation in the oxide and in the air regions by using the linear FEM. At the insulator–semiconductor boundary, the normal potential is continuous, and the discontinuity of the normal component of the electrical displacement vector depends on the trapped interfacial charge. The interfacial charge density was set to zero for this work. Dirichlet boundary conditions are used along the probe boundary and the grounded semiconductor boundary. Neumann boundary conditions are used along the remaining outer boundaries, where the normal derivative of the potential is set to zero.
Figure 11. 2-D geometry for the FEM Poisson solution showing the simulation geometry and the calculated charge density. See color insert.
The net charge Q in the semiconductor may be found by a volume integral:
Q = ∫ d³x (Nd − Na + p − n),   (2)
or by a surface integral over the probe;
Q = −∫ da (n · D),   (3)
where D is the electrical displacement and n is the outward unit normal to the probe surface (n is directed toward the interior of the probe). The high-frequency (HF) capacitance is determined by taking the difference between the results of two steady-state solutions, whose biases differ by ΔV, and calculating ΔQ/ΔV. The HF capacitance is calculated for a range of biases and spline fitted. The derivative of the HF capacitance is found by differentiating the spline curve. For 2-D simulations, the probe corresponds to a knife shape that has a rounded cutting edge. The finite-element software package PLTMG is used to solve Poisson's and Laplace's equations (24,26). For uniformly doped silicon, the 2-D solutions can be easily projected to 3-D because of the rotational symmetry of the probe. True 3-D solutions are obtained by solving an elliptical system of coupled nonlinear partial differential equations on a rectangularly shaped 3-D domain by using the method of collocation of Gauss points (23). Several commercial device simulation packages have also been applied to the problem of simulating the SCM signal. These include PADRE, used by the group at Lucent Technologies (51,52), DAVINCI used by Stanford University (39), and DESSIS used by the Swiss Federal Institute of Technology (68,69) and IMEC (70–72).
Extraction of Carrier Profiles from SCM Images (Reverse Modeling) Reverse modeling [also sometimes called inverse modeling (34)] refers to the problem of determining the carrier profile from the measured SCM signal and the known measurement parameters. This is more difficult than forward modeling because the profile cannot be calculated directly, but rather depends on iterated solutions of the forward problem. The process for extracting a carrier profile from SCM data via reverse modeling was first described in (12). The process begins with a candidate carrier profile and a set of model parameters. These inputs are used to calculate the theoretical SCM signal by using a forward model. The difference between the measured and the calculated SCM signal is then used to adjust the candidate carrier profile. The process is repeated with the new candidate carrier profile until the measured and calculated SCM signal converges to the desired degree. A functioning reverse SCM model has been reported (34). A major problem of reverse modeling is that it can be extremely computer time-consuming, especially using a sophisticated 3-D forward model. SCM images from commercial microscopes can be as large as 512 × 512 points, so repeatedly calculating the SCM signal at each point can rapidly become intractable. An additional complicating factor is that the solution to the reverse problem is not unique; many different carrier profiles can produce the same SCM signal level. Achieving convergence to a realistic carrier profile requires placing additional constraints on the candidate carrier profiles. Because no unique solution exists, the reverse modeling process is very sensitive to measurement noise. Real data must be carefully smoothed to avoid runaway solutions.
Figure 12. Contour plots of 2-D carrier profiles extracted from SCM image and corresponding simulation of carrier profile using TSUPREM4 (32). (Copyright American Institute of Physics 1998, used with permission.)
Calibration Curve Methods. Application of SCM in a semiconductor fabrication or research environment requires a fast and accurate method for extracting the 2-D carrier profile. Calibration curve methods have been developed to reduce the time needed to extract carrier profiles from SCM images. Software packages that extract 2-D carrier profiles from SCM images using variants of the calibration curve method have been developed at NIST (FASTC2D) and the University of Utah (DPACK). It has been shown that calibration curve methods are adequate for correlating carrier profiles measured by SCM with TCAD models. Figure 12 shows a contour plot of 2-D carrier profiles extracted from an SCM image
and corresponding simulation of a carrier profile using TSUPREM4 (32). The simplest implementation of the calibration curve method includes these features (16,18):
1. Calibration curve. The calibration curve is determined by calculating the SCM signal for distinct values of dopant concentration using the model parameters for which the SCM data was acquired. The SCM signal value at each x and y position of the image is related to a value of dopant concentration from interpolation versus the calibration curve points.
2. Point-by-point assumption. Each measured capacitance point is treated independently; that is, the effect of the local dopant gradient and curvature is ignored. Because the adjacent points are not considered, one calibration curve is valid for every point of the entire SCM image. Because only the local value of N is considered, this approach has been deemed a zero-moment model (40). A one-moment model would consider both N and the dopant gradient dN/dx. A two-moment model would consider N, dN/dx, and the dopant gradient curvature d2N/dx2.
3. Linearized model. The physical model of the SCM is embedded in a database of general C–V curves that are calculated off-line. The database is multidimensional and contains C–V curves calculated for different dopant densities, tip shapes, oxide thickness, etc. Specific C–V curves are obtained by interpolation. The use of a database allows calculating the calibration curves very rapidly, regardless of the time required to calculate a C–V curve.
4. Reference for normalization. To relate the measured SCM signal to the calculated SCM signal requires that the dopant density and SCM signal be known at one point of the image. This normalization point can be obtained from a 1-D SIMS profile, from SCM measurements on known reference samples, or from a region of the sample under test that has a known dopant density.
Zero-moment calibration curve methods permit extracting carrier profiles from SCM images in near real time, but they suffer from limitations. To be strictly valid, the carrier profiles must change slowly with respect to the tip radius. Most technologically interesting carrier profiles in silicon have a much steeper rate of change. It has been shown that for the current generation of ultrashallow junctions used in silicon ICs, the SCM signal depends critically on both the dopant gradient and the dopant gradient curvature (26). At the peak of a typical ion implanted dopant profile, the dopant gradient can be low, and the dopant curvature can be high. Yu et al. developed a one-moment method for extracting carrier profiles from SCM images (40). McMurray developed an nth-order method for forward modeling SCM response to arbitrary dopant profiles (34).
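A zero-moment extraction of the kind described above reduces, per pixel, to interpolating the measured signal against a precomputed calibration curve after normalizing the image at a point of known dopant density. The sketch below assumes a monotonic calibration curve supplied by some forward model; the curve values, image data, and function name are fabricated for illustration and do not come from FASTC2D or DPACK.

```python
import numpy as np

def extract_zero_moment(scm_image, cal_n, cal_signal, ref_pixel, ref_density):
    """Zero-moment calibration-curve extraction (illustrative only).

    cal_n, cal_signal : forward-model calibration curve, i.e. the SCM signal
                        expected for each dopant density (must be monotonic).
    ref_pixel         : (row, col) of a point whose density is known.
    ref_density       : dopant density at that point, in cm^-3.
    Each pixel is treated independently: gradient and curvature effects
    (the one- and two-moment corrections) are ignored.
    """
    # Scale the image so the reference pixel lands on the calibration curve.
    expected = np.interp(ref_density, cal_n, cal_signal)
    scale = expected / scm_image[ref_pixel]
    signal = scm_image * scale
    # Invert the monotonic (here decreasing) curve by interpolation.
    order = np.argsort(cal_signal)
    densities = np.interp(signal.ravel(), cal_signal[order], cal_n[order])
    return densities.reshape(signal.shape)

# Fabricated example: the signal falls with increasing dopant density.
cal_n = np.logspace(15, 20, 6)                  # 1e15 ... 1e20 cm^-3
cal_signal = np.linspace(1.0, 0.1, 6)           # arbitrary model output
image = np.array([[0.9, 0.5], [0.3, 0.15]])     # raw lock-in voltages
profile = extract_zero_moment(image, cal_n, cal_signal, (0, 0), 1e16)
print(np.round(np.log10(profile), 2))           # log10 of carrier density
```

For simplicity this sketch interpolates linearly in density; a practical tool would interpolate in log(N), use a multidimensional C–V database, and fold in the dopant-gradient and curvature corrections discussed above.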
OTHER CAPACITIVE PROBE MICROSCOPES

Two other techniques can also be called capacitive probe microscopes: the scanning Kelvin probe microscope (SKPM) and the scanning microwave microscope (SMWM). The scanning Kelvin probe microscope uses the Kelvin method of measuring the contact potential difference (83–88). A noncontact-mode AFM is used to provide a tip that vibrates at a frequency ω. The difference in work function between the sample and the tip generates an electrical field that alternates at ω. The voltage necessary to minimize the signal at ω is an estimate of the work-function difference. The second harmonic (2ω) of the Kelvin excitation frequency is proportional to the dC/dz signal.

Scanning microwave microscopes can be implemented in several ways. Scanning evanescent microwave microscopes (SEMMs), also known as scanning near-field microwave microscopes, use near-field, or nonpropagating, microwaves to measure the complex electrical impedance of materials (89–94). The SEMM tip is connected to a high quality-factor (Q) microwave resonator that is shielded so that only evanescent microwaves are generated at the tip. The interaction between the evanescent microwaves and the sample surface gives rise to resonant-frequency and quality-factor changes in the resonator. These signals are interpreted as the real and imaginary components of impedance (i.e., the resistive and capacitive components). SEMMs are used for evaluating the uniformity of the electrical properties of high-Tc superconductors and for evaluating combinatorial materials libraries. The microwave mixing approach to scanning capacitance microscopy combines an AFM for tip control with the alternating current scanning tunneling microscopy (ACSTM) mixing technique for capacitance detection (95–97). Two microwave input signals at frequencies f1 and f2, together with a variable dc bias voltage, are applied to the tip–sample MOS structure. The sum-frequency signal (f1 + f2) is proportional to dC/dV, and the third-harmonic signal (3f) is proportional to d²C/dV².

CURRENT RESEARCH TOPICS

Since the introduction of the scanning capacitance microscope, its instrumentation, metrological techniques, models, and data interpretation have continued to evolve. Much of the work continuing today is aimed at making SCM a 2-D carrier-profiling tool that can meet the ITRS goals. Experimentally, the signal-to-noise ratio of the SCM measurement limits the quality of 2-D carrier profiles. It has been suggested that noise due to nonuniform surface properties is currently a limiting factor in the quality of SCM images (35). A wide range of sample and surface preparation techniques has been investigated, but no generally accepted "best" procedure has yet emerged. For a silicon surface, the insulating layer should have low surface charge, high insulating properties and dielectric strength, and sufficient thickness to prevent electron tunneling.
A low-temperature process is required that avoids degrading epoxies and disturbing the carrier distribution. The quality of SCM data would also be improved by the development of sharper tips that have a longer lifetime when scanning in contact. On the theoretical side, some features in SCM images of semiconductors that contain p–n junctions have not been adequately explained (17,36). The accuracy of the measured carrier profile depends on the accuracy of the device simulator used to calculate the theoretical response. Much effort is being devoted to applying and developing 3-D device simulators that can rapidly calculate the SCM signal for an arbitrary 3-D dopant profile. Because the oxide layers on silicon produced by surface preparation techniques are of the order of 2 nm thick, quantum mechanical effects may also have to be included in models of the SCM. Any model of the SCM is only as good as its ability to reproduce the experimentally measured SCM signal; validation of SCM models against experimental data is important. Meeting the ITRS roadmap goals for 2-D carrier profiling with SCM will require the development of a validated 3-D simulator. Improved techniques to extract carrier profiles from SCM data using 3-D model data must also be developed. These techniques must account for the dopant gradient and dopant curvature effects, account for the presence of a p–n junction, and provide for rapid extraction of the carrier profile. If these barriers can be overcome, the spatial resolution and carrier-concentration accuracy of 2-D carrier profiles measured by SCM will continue to improve.

Acknowledgments

Official contribution of the National Institute of Standards and Technology; not subject to copyright. The author thanks all of those who contributed material and reviewed this article, including Jay F. Marchiando, NIST; Clayton C. Williams, University of Utah; Andy Erickson, Digital Instruments, Veeco Metrology Group; Hal Edwards, Texas Instruments Inc.; Brian G. Rennex, NIST; Richard Allen, NIST; and Eric Secula, NIST.
ABBREVIATIONS AND ACRONYMS

2-D      Two-Dimensional
3-D      Three-Dimensional
ACSTM    Alternating Current Scanning Tunneling Microscope
AFM      Atomic Force Microscope
CED      Capacitive Electronic Disc
C-V      Capacitance versus Voltage
DRAM     Dynamic Random Access Memory
FEM      Finite Element Method
HF       High Frequency
IC-SCM   Intermittent Contact Scanning Capacitance Microscope
ITRS     International Technology Roadmap for Semiconductors
LCR      Inductor Capacitor Resistor
LPE      Liquid Phase Epitaxy
MOS      Metal Oxide Semiconductor
MOSFET   Metal Oxide Semiconductor Field Effect Transistor
NOS      Nitride Oxide Semiconductor
NMOS     N-Channel Metal Oxide Semiconductor
PMOS     P-Channel Metal Oxide Semiconductor
SEM      Scanning Electron Microscope
SEMM     Scanning Evanescent Microwave Microscope
SCM      Scanning Capacitance Microscope
SCS      Scanning Capacitance Spectroscopy
SKPM     Scanning Kelvin Probe Microscope
SIMS     Secondary Ion Mass Spectroscopy
SMWM     Scanning Microwave Microscope
STM      Scanning Tunneling Microscope
TCAD     Technology Computer Aided Design
TEM      Transmission Electron Microscope
VCR      Video Cassette Recorder
UHF      Ultra High Frequency
UV       Ultraviolet
BIBLIOGRAPHY 1. The International Technology Roadmap for Semiconductors, International SEMATECH, 3101 Industrial Terrace Suite 106, Austin, TX 78758. The 1999 edition is available at http://public.itrs.net/NTRS/publntrs.nsf. 2. J. R. Matey and J. Blanc, J. Appl. Phys. 57, 1437–1444 (1985). 3. J. R. Matey, SPIE Scanning Microsc. Technol. Appl. 897, 110–117 (1988). 4. J. K. Clemens, RCA Rev. 39, 33–59 (1978). 5. R. C. Palmer, E. J. Denlinger, and H. Kawamoto, RCA Rev. 43, 194–211 (1982). 6. C. D. Bugg and P. J. King, J. Phys. E.: Sci. Instrum. 21, 147–151 (1988). 7. H. P. Kleinknecht, J. R. Sandercock, and H. Meier, Scanning Microsc. 2, 1839–1844 (1988). 8. C. C. Williams, W. P. Hough, and S. A. Rishton, Appl. Phys. Lett. 55, 203–205 (1989). 9. C. C. Williams, J. Slinkman, W. P. Hough, and H. K. Wickramasinghe, J. Vac. Sci. Technol.A 8, 895–898 (1990). 10. C. C. Williams, J. Slinkman, W. P. Hough, and H. K. Wickramasinghe, Appl. Phys. Lett. 55, 1662–1664 (1989). 11. M. D. Kirk, S. I. Park, and I. R. Smith, 3-Dimensional dopant profiling using scanning capacitance microscopy, Final Technical Report on DARPA Contract #DAAH01-91-C-R118, September 9, 1991. 12. Scanning Capacitance-Voltage Microscopy, US Pat. No. 5,065,103, Nov. 12, 1991, J. A. Slinkman, H. K. Wickramasinghe, and C. C. Williams. 13. J. J. Kopanski, J. F. Marchiando, and J. R. Lowney, in W. M. Bullis, D. G. Seiler, and A. C. Diebold, eds., Semiconductor Characterization: Present Status and Future Needs, AIP, NY, 1996, pp. 308–312. 14. J. J. Kopanski, J. F. Marchiando, and J. R. Lowney, J. Vac. Sci. Technol. B 12, 242–247 (1996). 15. G. Neubauer et al., J. Vac. Sci. Technol. B 14, 426–432 (1996). 16. J. J. Kopanski, J. F. Marchiando, and J. R. Lowney, Mater. Sci. Eng. B 44, 46–51 (1997). 17. J. J. Kopanski, J. F. Marchiando, and R. Alvis, in P. RaiChoudhury, J. L. Benton, D. K. Schroder, and T. J. Shaffner, eds., Diagnostic Techniques for Semiconductor Materials and Devices, vol. 97-12, The Electrochemical Society, Inc, Pennington, NJ, 1997, pp. 183–193.
18. J. F. Marchiando, J. R. Lowney, and J. J. Kopanski, Scanning Microsc. 11(1), 205–224 (1997).
19. J. J. Kopanski et al., J. Vac. Sci. Technol. B 16, 339–343 (1998).
20. J. J. Kopanski and S. Mayo, Appl. Phys. Lett. 72, 2469–2471 (1998).
21. S. Mayo, J. J. Kopanski, and W. F. Guthrie, in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, NY, 1998, pp. 567–572.
22. J. J. Kopanski, J. F. Marchiando, and B. G. Rennex, J. Vac. Sci. Technol. B 18, 409–413 (2000).
23. J. F. Marchiando, Intl. J. Num. Meth. Eng. 39, 1029–1040 (1996).
24. J. F. Marchiando, J. J. Kopanski, and J. R. Lowney, J. Vac. Sci. Technol. B 16, 463–470 (1998).
25. J. J. Kopanski, J. F. Marchiando, J. Albers, and B. G. Rennex, in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, New York, 1998, pp. 725–729.
26. J. F. Marchiando, J. J. Kopanski, and J. Albers, J. Vac. Sci. Technol. B 18, 414–417 (2000).
27. Y. Huang and C. C. Williams, J. Vac. Sci. Technol. B 12, 369–372 (1994).
28. Y. Huang, C. C. Williams, and J. Slinkman, Appl. Phys. Lett. 66, 344–346 (1995).
29. Y. Huang, C. C. Williams, and H. Smith, J. Vac. Sci. Technol. B 14, 433–436 (1996).
30. J. S. McMurray, J. Kim, and C. C. Williams, J. Vac. Sci. Technol. B 15, 1011–1014 (1997).
31. J. S. McMurray, J. Kim, C. C. Williams, and J. Slinkman, J. Vac. Sci. Technol. B 16, 344–348 (1998).
32. J. Kim, J. S. McMurray, C. C. Williams, and J. Slinkman, J. Appl. Phys. 84, 1305–1309 (1998).
33. V. V. Zavyalov, J. S. McMurray, and C. C. Williams, J. Appl. Phys. 85, 7774–7783 (1999).
34. J. S. McMurray and C. C. Williams, in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, NY, 1998, pp. 731–735.
35. V. V. Zavyalov, J. S. McMurray, and C. C. Williams, in D. G. Seiler et al., eds., AIP Conf. Proc. 449, AIP, NY, 1998, pp. 753–756.
36. V. V. Zavyalov, J. S. McMurray, and C. C. Williams, Rev. Sci. Instrum. 70, 158–164 (1999).
37. V. V. Zavyalov et al., J. Vac. Sci. Technol. B 18, 549–554 (2000).
38. A. C. Diebold, M. Kump, J. J. Kopanski, and D. G. Seiler, J. Vac. Sci. Technol. B 14, 196–201 (1996).
39. R. C. Barrett and C. F. Quate, J. Appl. Phys. 70, 2725–2733 (1991).
40. G. M. Yu, P. B. Griffin, and J. D. Plummer, Techn. Dig. Int. Electron Device Meeting, 1998, pp. 717–720.
41. K. V. Smith, E. T. Yu, J. M. Redwing, and K. S. Boutros, Appl. Phys. Lett. 75, 2250–2252 (1999).
42. Y. Li, J. N. Nxumalo, and D. J. Thomson, J. Vac. Sci. Technol. B 16, 457–462 (1998).
43. A. Erickson et al., J. Electron. Mater. 25, 301–304 (1996).
44. J. W. Hong et al., Appl. Phys. Lett. 75, 1760–1762 (1999).
45. R. Alvis, C. C. Williams, J. McMurray, and J. Kim, Future Fab Int. 3, 345–349 (1996).
46. H. Edwards et al., Appl. Phys. Lett. 72, 698–700 (1998).
47. H. Edwards et al., J. Appl. Phys. 87, 1485–1495 (2000).
48. R. Mahaffy, C. K. Shih, and H. Edwards, J. Vac. Sci. Technol. B 18, 566–571 (2000).
49. V. A. Ukraintsev et al., in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, NY, 1998, pp. 741–745.
50. V. A. Ukraintsev et al., in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, NY, 1998, pp. 736–740.
51. M. L. O'Malley et al., Appl. Phys. Lett. 74, 272–274 (1999).
52. M. L. O'Malley et al., Appl. Phys. Lett. 74, 3672–3674 (1999).
53. C. J. Kang et al., Appl. Phys. Lett. 74, 1815–1817 (1999).
54. C. J. Kang et al., Appl. Phys. Lett. 71, 1546–1548 (1997).
55. C. J. Kang, C. K. Kim, Y. Kuk, and C. C. Williams, Appl. Phys. A 66, S415–S419 (1998).
56. R. Yamamoto, K. Sanada, and S. Umemura, Jpn. J. Appl. Phys. 33, 5829–5837 (1994).
57. R. Yamamoto, K. Sanada, and S. Umemura, Jpn. J. Appl. Phys. 35, 5284–5287 (1996).
58. K. Sanada, R. Yamamoto, and S. Umemura, Jpn. J. Appl. Phys. 33, 6383–6388 (1994).
59. H. Tomiye et al., Jpn. J. Appl. Phys. 34, 3376–3379 (1995).
60. H. Tomiye, T. Yao, H. Kawami, and T. Hayashi, Appl. Phys. Lett. 69, 4050–4052 (1996).
61. H. Tomiye and T. Yao, Appl. Phys. A 66, S431–S434 (1998).
62. K. Goto and K. Hane, J. Appl. Phys. 84, 4043–4048 (1998).
63. S. Anand et al., Appl. Surf. Sci. 144–145, 525–529 (1999).
64. O. Bowallius et al., Appl. Surf. Sci. 144–145, 137–140 (1999).
65. S. Anand, IEEE Circuits and Devices 16, 12–18 (2000).
66. M. Dreyer and R. Wiesendanger, Appl. Phys. A 61, 357–362 (1995).
67. A. Born and R. Wiesendanger, Appl. Phys. A 66, S421–S426 (1998).
68. L. Ciampolini, M. Ciappa, P. Malberti, and W. Fichtner, Proc. 1st Eur. Workshop on Ultimate Integration of Silicon, January 20–21, 2000, Grenoble, France.
69. L. Ciampolini, M. Ciappa, P. Malberti, and W. Fichtner, Proc. 3rd Int. Conf. Modeling and Simulation of Microsystems, March 27–29, 2000, San Diego, CA.
70. R. Stephenson et al., J. Vac. Sci. Technol. B 18, 555–559 (2000).
71. R. Stephenson et al., Appl. Phys. Lett. 73, 2597–2599 (1998).
72. R. Stephenson et al., J. Vac. Sci. Technol. B 18, 405–408 (2000).
73. W. Vandervorst et al., in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology, AIP Conf. Proc. 449, AIP, NY, 1998, pp. 617–640.
74. P. De Wolf et al., J. Vac. Sci. Technol. B 16, 355–361 (1998).
75. E. H. Nicollian and J. R. Brews, MOS (Metal Oxide Semiconductor) Physics and Technology, Wiley, NY, 1982.
76. S. M. Sze, Physics of Semiconductor Devices, Wiley, NY, 1981.
77. Scanning Capacitance Microscopy, Support Note No. 224, Rev. B, Digital Instruments Inc., 520 E. Montecito St., Santa Barbara, CA, 1996.
78. D. E. McBride and J. J. Kopanski, in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology 2000, AIP Conf. Proc. 550, AIP, New York, 2001, pp. 657–661.
79. G. D. Wilk and B. Brar, IEEE Electron Dev. Lett. 20, 132–134 (1999).
80. P. Markiewicz et al., Probe Microscopy 1, 355–364 (1999).
81. A. S. Grove, B. E. Deal, E. H. Snow, and C. T. Sah, Solid-State Electron. 8, 145–163 (1965).
82. B. G. Rennex, J. J. Kopanski, and J. F. Marchiando, in D. G. Seiler et al., eds., Characterization and Metrology for ULSI Technology 2000, AIP Conf. Proc. 550, AIP, New York, 2001, pp. 635–640.
83. M. Nonnenmacher, M. P. O'Boyle, and H. K. Wickramasinghe, Appl. Phys. Lett. 58, 2921–2923 (1991).
84. M. Nonnenmacher, M. O'Boyle, and H. K. Wickramasinghe, Ultramicroscopy 42–44, 268–273 (1992).
85. A. K. Henning et al., J. Appl. Phys. 77, 1888–1896 (1995).
86. M. Yasutake, Jpn. J. Appl. Phys. 34, 3403–3405 (1995).
87. T. Hochwitz et al., J. Vac. Sci. Technol. B 14, 440–446 (1996).
88. H. O. Jacobs, H. F. Knapp, and A. Stemmer, Rev. Sci. Instrum. 70, 1168–1173 (1999).
89. C. P. Vlahacos et al., Appl. Phys. Lett. 69, 3272 (1996).
90. D. E. Steinhauer et al., Appl. Phys. Lett. 71, 1736 (1997).
91. C. Gao, Appl. Phys. Lett. 71, 1872 (1997).
92. C. Gao and X.-D. Xiang, Rev. Sci. Instrum. 69, 3846 (1998).
93. F. Duewer, C. Gao, I. Takeuchi, and X.-D. Xiang, Appl. Phys. Lett. 74, 2696 (1999).
94. M. Tabib-Azar, P. S. Pathak, G. Ponchak, and S. LeClair, Rev. Sci. Instrum. 70, 2783 (1999).
95. J. Schmidt, D. H. Rapoport, G. Behme, and H.-J. Fröhlich, J. Appl. Phys. 86, 7094–7099 (1999).
96. L. A. Bumm and P. S. Weiss, Rev. Sci. Instrum. 66, 4140–4145 (1995).
97. S. J. Stranick and P. S. Weiss, Rev. Sci. Instrum. 65, 918–921 (1994).
CATHODE RAY TUBE DISPLAY TECHNOLOGY

ROBERT L. BARBIN
ANTHONY S. POULOS
Thomson Consumer Electronics
Lancaster, PA
The invention of the cathode ray tube (CRT) is generally ascribed to Karl Ferdinand Braun in 1897. His tube was the first to combine the basic functions of today's CRTs: electron source, focusing, deflection, acceleration, phosphor screen, and a sealed mechanical structure (1). The invention was followed by a period during which the CRT progressed from being a laboratory curiosity, through the application stages of oscillograph, radar, and black-and-white television, to the major modern applications that have helped shape the world and have become a part of our everyday lives: color television and information displays.

CRT BASICS

Figure 1 shows the basic components of a CRT used for color displays: electron gun, deflection yoke, phosphor screen, shadow mask, and glass bulb. References 1–7 provide good overall information about CRTs.

Figure 1. Basic parts of the color CRT: panel, funnel, shadow mask, phosphor screen, electron gun, and deflection yoke.

The electron gun is the source of electrons. This component, located at the rear of the CRT, contains (1) three cathodes that generate electrons by thermionic
emission and (2) a series of electrodes at various electrical potentials to accelerate, modulate, and focus the electron beams. The gun's configuration determines the relationship between the incoming signal and the beam current. It also plays a significant role in determining the size and shape of the picture elements and, hence, sharpness. Three electron beams leave the gun from slightly different positions. These separate beams carry the red, green, and blue video information.

The deflection yoke creates the magnetic fields that deflect the electron beams. This component contains two formed coils: one deflects the beams horizontally and one deflects them vertically. It is located between the electron gun and the screen. The yoke provides the precise electron beam scanning that is a key aspect of television picture reproduction.

The phosphor screen, comprised of phosphor elements that provide red, green, and blue light, converts the energy in the electron beams to visible light through cathodoluminescence. In addition, the screen typically contains black elements that improve contrast by reducing the reflection of ambient light. The screen is located at the front of the CRT on the inside of the glass panel.

The glass bulb encloses the internal components and supports the external components. The enclosure is airtight, allowing a vacuum to be created on the inside. The bulb consists primarily of the panel at the front of the CRT and the funnel at the back. The neck, a cylindrical glass tube, is located at the rear of the funnel and provides support for the electron gun and the deflection yoke.

The shadow mask governs the operation of nearly all of today's color CRTs. H. B. Law of RCA developed the shadow-mask principle of operation in the 1950s. The shadow mask is typically a curved metal sheet 0.10 to 0.25 mm thick, approximately the size of the screen, that contains small apertures or slots. The three electron beams approach the screen at slightly different angles. The apertures in the mask, combined with the proper geometry, allow the appropriate electrons to pass, while the solid space between openings blocks or shadows other electron beams from striking undesired colors.

The raster is the pattern of scanned lines created when the deflection yoke deflects the electron beams horizontally and vertically (see Fig. 2). The electron beam currents are modulated during the scan to provide the proper luminance and chrominance for each portion of the picture. The screen is typically at a potential of about 30 kV. Thus, it acts as the anode and attracts the electrons emitted from the cathode, which is near ground potential.
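As a rough illustration of the energies involved (a back-of-the-envelope estimate, not a figure taken from the source), an electron accelerated through the full anode potential arrives at the screen with kinetic energy eV_a; ignoring relativistic corrections,

$$\tfrac{1}{2} m_e v^2 = eV_a \;\Rightarrow\; v = \sqrt{\frac{2eV_a}{m_e}} \approx \sqrt{\frac{2(1.6\times10^{-19}\,\mathrm{C})(3\times10^{4}\,\mathrm{V})}{9.1\times10^{-31}\,\mathrm{kg}}} \approx 1\times10^{8}\ \mathrm{m/s},$$

roughly one-third the speed of light, which is why a small relativistic correction is sometimes applied in precise electron-optics calculations.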
The funnel is coated with conducting layers on the inside and outside. The capacitance between these two layers filters the high voltage generated by the raster scan circuit and thus provides an acceptable anode voltage source and the energy storage necessary to produce high peak currents. The capability to supply this instantaneous high power gives the CRT its outstanding highlight brightness.

Figure 2. The raster is created by vertical and horizontal deflection of the electron beam. The vertical scan occurs at a much slower rate than the horizontal scan and thereby causes a slight tilt in the lines. The nearly horizontal retrace lines occur at a very high rate.

PERFORMANCE BASICS

Many performance measures can be used to characterize the quality of CRT displays. In this section we describe some of the more important parameters and quantify the performance levels typically obtained in color television applications (6–10).

Resolution/Sharpness

Sharpness, or focus quality, is probably the first characteristic that comes to mind in describing the performance of a CRT display system. The resolution of the display is one of the most important, and probably the most difficult, of the performance parameters to quantify. In describing the resolution of a display, one has to determine what pieces of information can be seen and also how well they can be seen. This involves the physical parameters of the display device itself, as well as psychophysical properties of the human vision system (11–29).

Resolution Versus Addressability. Display monitors and television sets are often said to be able to display a certain number of pixels, or lines of resolution. Oftentimes the number of addressable elements (pixels) of a system is confused with its actual resolution. The number of scan lines and number of pixels are, in general, addressability issues related to the input signal and the electrical characteristics of the display monitor or TV set and are not related to the CRT itself. A 600 × 800 pixel specification for a monitor describes only the electrical characteristics of the monitor but does not tell how well the pixels are resolved by the CRT display. The actual resolution of a color CRT display is determined by the input signal and monitor performance and also by the characteristics of the CRT itself. The important CRT characteristics are (1) the size and shape of the electron beam as it strikes the screen, (2) the size of the CRT color selection elements, and (3) the convergence of the three electron beams.

Figure 3. Representation of the projection of the electron beam through the shadow mask onto the screen illustrates the difficulty of measuring the actual electron beam distribution. It consists of (a) a representation of the electron beam density as it strikes the shadow mask, (b) the shadow mask, and (c) the projection of the electron beam through the shadow-mask apertures.

Figure 4. Line profile distribution of the electron beam as it strikes the shadow mask. This represents the electron density perpendicular to a single crosshatch line.

Electron Beam Spot Size and Shape. The electron spot characteristics are determined by the design of the electron gun and deflection yoke, as well as many of the basic CRT parameters: tube size, deflection angle, and anode voltage. The size and shape of the electron spot are complicated functions of these parameters and require special care to measure and analyze. At a given screen location and peak beam current, the electron beam distribution can be characterized as the distribution of a static spot fixed in time. Because much of the spot is hidden by the shadow mask and screen structure, it is difficult to obtain a good measure of the actual spot distribution (see Fig. 3). Specialized measurement equipment has been developed to "see" the distribution of the electrons within the electron beam behind the mask. The resolution is generally evaluated separately in the horizontal and vertical
directions by doing the appropriate integration of the two-dimensional electron spot. The resulting "line-spread profiles" are representative of the electron distribution across horizontal and vertical crosshatch lines. These line-spread profiles are roughly Gaussian in distribution at low currents but can deviate significantly from Gaussian at high peak currents. A common descriptor of the spot size is the width of these profiles in millimeters at a given percentage (usually 5%) of the spot height, as shown in Fig. 4. Spot sizes vary greatly with tube size, beam current, and screen location. Figure 5 shows the 5% spot size of an A90 (36 in. visual screen diagonal) color television CRT as a function of the beam current at both the center and the corners of the visible screen.

Figure 5. Plots of 5% spot size as a function of beam current for a typical 36V CRT. Both horizontal and vertical direction and center and corner data are shown.

The 5% spot size is a measure of the CRT's resolution capability, but it does not consider the input signal to
be reproduced. A better descriptor of how well the image is reproduced is the modulation transfer function (MTF). The MTF describes the response of the display system as a function of the frequency of the input signal. In broad terms, it is the ratio of the output (light intensity or electron beam density) to a sinusoidal input video signal as a function of the signal frequency, normalized so that the ratio goes to one as the frequency approaches zero. The MTF can be measured directly, but more often it is calculated by taking the Fourier transform of the line-spread profile distribution of the electron spot. Specifically, the MTF is

MTF = M(ν) = ∫_{−∞}^{∞} l(x) e^{−2πiνx} dx    (1)

where l(x) is the line-spread profile and ν is the frequency of the input signal. The MTF can be numerically calculated from the measured line-spread profile by using fast Fourier transforms common in computer mathematical programs. When the electron beam distribution is Gaussian, the MTF can be calculated in closed form:

M(ν) = e^{−π²d²ν²/12}    (2)
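The numerical route from a measured line-spread profile to the MTF can be sketched as follows; the sampling interval and the Gaussian test profile below are illustrative values, not measured data.

```python
import numpy as np

def mtf_from_lsp(lsp, dx):
    """Return spatial frequencies (cycles/mm) and the MTF, computed as the
    magnitude of the Fourier transform of the line-spread profile,
    normalized to 1 at zero frequency (Eq. 1)."""
    spectrum = np.abs(np.fft.rfft(lsp))
    freqs = np.fft.rfftfreq(len(lsp), d=dx)
    return freqs, spectrum / spectrum[0]

# Illustrative test: a Gaussian spot whose 5% full width is d = 3.0 mm.
d = 3.0                                            # 5% spot size, mm
sigma = d / (2.0 * np.sqrt(2.0 * np.log(20.0)))    # convert 5% width to standard deviation
dx = 0.01                                          # sample spacing, mm
x = np.arange(-10.0, 10.0, dx)
lsp = np.exp(-x**2 / (2.0 * sigma**2))

freqs, mtf = mtf_from_lsp(lsp, dx)

# Compare with the closed form of Eq. (2): M(nu) = exp(-pi^2 d^2 nu^2 / 12).
nu = 0.2                                           # cycles per mm
print(np.interp(nu, freqs, mtf))                   # numerical MTF
print(np.exp(-(np.pi**2) * d**2 * nu**2 / 12.0))   # closed-form approximation
```

The two printed values agree closely; the constant 12 in Eq. (2) comes from approximating 4 ln 20 ≈ 12 when the 5% width is used to characterize the Gaussian.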
where d equals the 5% spot size in millimeters and ν represents the spatial frequency of the display in cycles per millimeter. Figure 6 shows the horizontal direction MTF for an A90 CRT for various Gaussian spot sizes. Some common signal frequencies (NTSC, VGA, SVGA, and XGA) are indicated. Typical values of the MTF at the pixel frequency for television receivers that show television pictures, or for desktop display monitors that show appropriate VGA or SVGA signals, are in the 40% to 70% range at the screen center and in the 20% to 50% range in the corners. However, MTFs as low as 10% still give resolvable images.

Figure 6. Horizontal direction modulation transfer function as a function of input signal frequency (pixels per picture width) for various sized Gaussian spots. Some common television and computer resolutions are also indicated.

Convergence. The electron beams from the three guns should land at the same position when scanned to any area
of the screen, so that a white spot comprised of red, green, and blue primary colors can be formed. If the beams are not coincident, the composite spot will be larger than the individual beams, thereby degrading the resolution and possibly causing color "fringing" at the edges of white lines. The distance by which the three beams are separated from each other is called misconvergence. Misconvergence is generally observed by looking at a crosshatch pattern that consists of narrow horizontal and vertical white lines, and it is determined by the distance from the center of one color to another at a given location on these lines. Because each of the three electron beams typically covers multiple screen elements, care must be taken to determine the centroids of the light from each of the three colors. The misconvergence is then defined as the distance between the centroids of each of these colors. The misconvergence of self-converging systems is generally greatest near the edge of the screen. Typical maximum misconvergence values are 0.3 to 0.5 mm for desktop data display monitors, 1.5 to 2.0 mm for 27V CRTs, and 2.0 to 2.4 mm for 36V direct-view CRT systems.

Screen Structure Size. The resolving capability of a color CRT is related to the spacing of adjacent trios (screen pitch); an excessively large screen pitch inhibits the ability to resolve fine details. However, it is secondary to the electron beam spot size and convergence. The pitch needs to be small enough that the structure is not visible to the viewer at typical viewing distances and that it does not significantly interfere with the viewer's perception of the size and brightness of a single-pixel electron spot. In general, this means that the spacing between screen elements of the same color should be no larger than about one-half the size of the electron spot.

Brightness/Contrast

The brightness and contrast are other very visible and important characteristics of the quality of the CRT display. Brightness is a perceptual sensation; the measurable parameter from the CRT is light output or, more properly, "luminance." Luminance is the normalized luminous flux emitted from the CRT surface and is measured in candelas per square meter (preferred) or in footlamberts (1 fL = 3.43 cd/m²). The light output from the CRT is linear with the electron gun beam current except at very high currents, where phosphor saturation occurs. Typical luminance values for a 27 in. television set are an average of 100 cd/m² and a small-area peak value of 600 cd/m². It makes little sense to talk about the light output of a CRT without also considering the contrast or, more specifically, the contrast ratio. To make a more readily visible image under high ambient lighting, the CRT panel glass contains neutral-density darkening agents that enhance the contrast at the expense of basic light output. The contrast ratio is defined by TEPAC Publication No. 105-10 (29) as the ratio of the luminance from the excited area of the CRT phosphor screen to that of an unexcited area of the screen. It is calculated by measuring the luminance at the center of a fully excited screen and then changing the pattern to include a 1 in. wide vertical unexcited area (black bar), remeasuring the luminance,
and taking the ratio of the two readings:

CR = L1/L2    (3)
The contrast ratio is affected by the luminance to which the system is driven, internal light reflections falling on unexcited areas, the unexcited area being excited by internally scattered electrons, and the ambient light reflected from the CRT screen. The contrast ratio is normally measured under ambient lighting conditions that approximate a normal user environment. An ambient value typically used is 215 lux (20 foot-candles).
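The way ambient light compresses the contrast ratio can be illustrated with a small calculation; the reflectance and luminance values below are assumptions chosen only for illustration and are not taken from the source.

```python
import math

# Assumed values for illustration only.
L_excited   = 100.0   # cd/m^2, fully excited screen area
L_unexcited = 1.0     # cd/m^2, unexcited (black bar) area from scattered light and electrons
reflectance = 0.08    # effective diffuse reflectance of the darkened faceplate
ambient_lux = 215.0   # ambient illuminance (about 20 foot-candles)

# Reflected luminance of an approximately Lambertian surface: L = rho * E / pi.
L_reflected = reflectance * ambient_lux / math.pi

cr_dark    = L_excited / L_unexcited
cr_ambient = (L_excited + L_reflected) / (L_unexcited + L_reflected)
print(round(L_reflected, 1), round(cr_dark, 1), round(cr_ambient, 1))
```

With these assumed numbers, about 5.5 cd/m² of reflected ambient light reduces a dark-room contrast ratio of 100:1 to roughly 16:1, which is why the contrast ratio is normally specified under a stated ambient condition.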
White Uniformity/Color Fidelity

The color display should have the proper colors across the entire screen and not exhibit distorted colors caused by electrons striking the wrong phosphors. This is called color purity and is usually evaluated by using signals that energize only one of the three guns at a time and observing whether pure red, green, or blue fields are produced across the entire CRT. There should be no visible discoloration at normal brightness when the CRT faces in any direction. Because of the effects of the earth's magnetic field on the electron beam, every time the CRT direction is changed, it must be properly demagnetized. In addition, when all three guns are energized in the proper ratios to make the desired white field, the white field should be smooth and uniform without any rapid changes in color or intensity. A gradual change in luminance from the center to the corner of the screen, such that the corner luminance is about 50% of that of the center, is normal and typical for TV applications.

Raster Geometry

Deviations in the display of what would normally be a rectangle with perfectly straight lines are called raster geometry distortions, as shown in Fig. 7. Typical geometric errors are (1) pincushion, (2) raster rotation, (3) trapezoid, and (4) parallelogram. All of these can be controlled in the signal and scan circuitry of the monitor or TV set, but in typical modern TV sets, only the side pincushion correction and the raster rotation (for large TV sets) are controlled electronically. The other items (top and bottom pincushion, trapezoid, and parallelogram) are controlled in the design, manufacture, and setup of the deflection yoke and CRT. Because the CRT panel surface is not flat, the apparent shape of the raster lines depends on the location of the viewer. Measurement is typically made from a point along the axis of the tube at five times the picture height from the center face, as described in IEC Publication 107-1960. Another method is to measure just the X and Y coordinates of the location independent of the screen curvature, which is equivalent to viewing the screen from infinity. In discussing raster geometric errors, one must be careful to understand which method is being used.
Figure 7. Raster geometry deviations from rectangular illustrating the shape of some common distortions. (a) Pincushion, (b) raster rotation, (c) trapezoid, (d) parallelogram.
DESIGN CONSIDERATIONS

The design of a CRT typically starts with selecting several basic parameters: screen size, deflection angle, and panel contour. These parameters, usually established by Engineering and Sales, allow delivering a workable and manufacturable product to the market. CRTs for commercial television applications have screen diagonals that range up to about 40 in., deflection angles from 90° to 110°, and panel contours that range from slightly curved to flat. Understanding the basic performance issues is needed to understand the design space available in selecting trade-offs.

Tube Size and Deflection Angle

The size of entertainment television sets in the United States is designated by the viewable screen diagonal in inches, such as 19 in., 27 in., and 36 in. The corresponding CRTs are identified as 19V, 27V, and 36V; V indicates that the preceding number is the viewable screen diagonal in inches. The aspect ratio, the screen width divided by the screen height, must also be specified for a complete
description. The common format in use is a 4 : 3 aspect ratio. CRTs for high-definition television use a wider aspect ratio, 16 : 9 (see Fig. 8). The screen size, measured from the outside of the CRT, is the chord distance between the apparent location of diagonally opposite screen corners as seen from infinity (i.e., measured parallel to the center axis of the tube). The actual screen on the inside of the panel is slightly smaller because of the optical refraction caused by the sloped surfaces of the panel (see Fig. 9). In general, larger screen sizes imply more difficult design and manufacture, and they result in larger, more expensive, and heavier tubes and TV sets. The deflection angle is the angle formed between straight lines that originate at the deflection center of the yoke and end at diagonally opposite corners of the screen on the inside of the panel (see Fig. 10). The trade-offs involved in selecting the deflection angle are the overall length of the tube, deflection power, geometric distortion, convergence, technical difficulty, and cost.
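For a quick sense of scale (simple geometry rather than figures quoted from the source), a 4:3 screen of diagonal D has width 4D/5 and height 3D/5, so a 36V tube has a viewable area of roughly

$$W = \tfrac{4}{5}(36\ \mathrm{in.}) \approx 28.8\ \mathrm{in.}, \qquad H = \tfrac{3}{5}(36\ \mathrm{in.}) \approx 21.6\ \mathrm{in.},$$

while a 16:9 screen of the same diagonal is wider (about 31.4 in.) and shorter (about 17.7 in.).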
Figure 8. Illustration of the 4 × 3 and 16 × 9 aspect ratios. The heights of both rectangles have been set equal.
Figure 9. The observed screen edge is larger than the actual screen on the panel inside. The measurement is made parallel to the tube centerline and refracts through the glass, as shown.
Figure 10. The deflection angle is the angle between the beam paths that connect the deflection center and diagonally opposite corners of the screen.
Panel Design Considerations

The panel contour is also a trade-off. A flatter panel is better for styling and aesthetic purposes. However, weight, technical difficulty, and cost are increased. In the industry, panels are often associated with descriptions such as 1R, 1.5R, and 2R. These identifications characterize the flatness of the panel. The reference is an early 25V tube that had a spherical contour and approximately a 1-m radius; it became known as the 1R contour. A 36V 1R contour would have a radius scaled by the ratio of the screen sizes, 1 m × 36/25, or 1.44 m. A 2R contour for a 25V screen would be flatter and have a radius of 2 m. A 36V 2R contour would have a radius of 2.88 m. Some panels have an aspherical curvature. The radius of curvature in these panels is a function of position. For example, if only the center point and the corner point are considered, a particular panel may be 2R. If only the center and the top are considered, the same panel might be 1.5R. As the contour becomes flatter, the thickness of the faceplate typically increases to reduce the glass stress introduced by atmospheric pressure. The path length differences between beams traveling to the center of the screen and to the edge become larger and require more complex gun and yoke designs. In addition, as will be seen later, the shadow mask becomes flatter, which increases local thermal distortions.

Basic Shadow-Mask Geometry

A detailed understanding of several key performance aspects begins with understanding the shadow-mask principle. The three beams leave the electron gun slightly separated, by approximately 5.8 mm. Most tubes today use in-line geometry wherein the three beams leave the gun in a horizontal plane. The two outer beams are angled slightly inward to achieve convergence at the screen. Figure 11 shows that proper positioning of the shadow mask allows only the correct beam to reach the red, green, or blue phosphor. If the screen consists of regularly spaced vertical stripes of red, green, and blue phosphor (a trio) for each mask aperture, and if the trios are contiguous across the screen, then

Q = LA/3S    (4)

where A is the spacing between mask apertures, Q is the distance between the mask and the screen, S is the spacing between beams in the deflection plane, and L is the distance between the deflection plane and the screen. The screen trio pitch D is the projection of the mask aperture pitch A onto the screen (A × L/P). Proper meshing of the individual phosphor stripes occurs when the spacing d between adjacent colors is one-third of D. This fundamental equation plays a key role in determining the geometric aspects of the shadow mask. Note that during the manufacturing process, the phosphor elements are optically printed using a photosensitive resist. Optical lenses are used during the printing process, together with the shadow-mask assembly and panel combination that will be made into the final tube, to place the screen elements properly.
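A worked example of Eq. (4) with representative numbers (the values of L and A below are illustrative assumptions, not the dimensions of any particular tube):

```python
# Shadow-mask geometry from Eq. (4): Q = L*A/(3*S).
L = 350.0    # deflection plane to screen, mm (assumed)
A = 0.80     # spacing between mask apertures, mm (assumed)
S = 5.8      # spacing between beams in the deflection plane, mm (from the text)

Q = L * A / (3.0 * S)    # mask-to-screen spacing
P = L - Q                # deflection plane to mask (follows from the geometry)
D = A * L / P            # screen trio pitch, the projection of the mask pitch
d = D / 3.0              # spacing between adjacent phosphor colors

print(f"Q = {Q:.1f} mm, D = {D:.3f} mm, d = {d:.3f} mm")
```

With these assumed values the mask sits roughly 16 mm behind the screen, and the trio pitch on the screen is only slightly larger than the mask aperture pitch.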
Figure 11. Geometry that shows similar triangles relating S and P to Q and d, which is necessary to derive Q = LA/3S.
Figure 12. The screen is comprised of red, green, and blue phosphor stripes and a black material between the colors. The beamlets formed by individual mask apertures are wider than the phosphor stripes. Their edges are hidden behind the black material.
Figure 13. The distance between the edge of the electron beamlet and the edge of the adjacent phosphor stripe is the clipping tolerance and measures how much the beamlet can move before it clips the wrong color and causes color impurity in the picture. The distance between the edge of the beamlet and the edge of its own phosphor stripe is the leaving tolerance and measures how much the beamlet can move before the phosphor stripe is not fully illuminated.
Screen Register Tolerances

Figure 12 shows a close-up of the screen structure and the electron beam pattern passed by the shadow mask. The electron pattern, called a beamlet, is smaller than the electron beam emitted from the gun. It is formed by a single mask aperture and is wider than the visible phosphor stripe. The red, green, and blue phosphor stripes are separated by black stripes. This geometry allows the beamlet to misregister with the visible phosphor line and still provide nearly full illumination. The amount of misregister possible without striking the wrong color phosphor is called the clipping tolerance, and the amount of misregister that can occur before the white uniformity is affected is called the leaving tolerance. The actual tolerance calculations take into account the Gaussian distribution of electrons across the beamlet and, if needed, the impact of adjacent beams; a simplified version is shown in Fig. 13. Older CRTs used a different geometry: the phosphor stripes were contiguous (no black stripes), and the tolerance was obtained by having the beamlet smaller
than the phosphor stripe. This construction was eventually replaced by the black-matrix construction of today's screens, which allows higher glass transmission and hence higher picture brightness while maintaining good contrast because of the nonreflective black stripes.

Mask to Screen Alignment Issues

After the tube is completed, anything that causes motion of the shadow mask or anything that perturbs the electron path will disturb the correct alignment of the electron beamlet with the phosphor element. This important fact is the reason that several performance items are key issues for CRT performance: thermal effects (warpage, doming, and blistering), mechanical shock, and magnetic shielding. During operation, the shadow mask intercepts about 80% of the electrons emitted by the gun. Typical TV sets can produce an average steady-state beam current
of about 2 mA at 30 kV. Then, the total power is 60 W. The mask intercepts about 48 W, which in turn raises the temperature of the mask and, by radiation, the temperature of the metal structure supporting the mask. After an hour or two, thermal equilibrium is reached. However, both the mask and the supporting frame have expanded. The expansion across the surface of the mask is a function of the thermal expansion coefficient of the mask and the temperature change. For example, a mask of aluminum-killed (AK) steel has an expansion coefficient of about 12.0 × 10−6 mm/mm° C; a 700-mm length will expand approximately 0.4 mm when heated 50 ° C. This expansion moves the mask apertures from their desired locations. Compensation for this expansion is obtained by moving the entire mask/frame assembly forward through the use of bimetal mounting elements. Invar material has a lower expansion coefficient, about 1.5 × 10−6 mm/mm° C in the range of interest, and practically eliminates expansion of the mask. It is used in many higher performance CRTs because of its better thermal characteristics. Doming is a second thermal performance issue. It occurs when the mask (a thin sheet) is hot but the supporting structure (much more massive) is still relatively cool. Thus, the mask expands but is constrained at the edge. As a consequence, the domed portion of the mask moves forward as seen in Fig. 14. Doming is controlled through a combination of techniques that cool the mask (coatings that increase the electron backscatter or coatings that increase the radiation to nearby structures) and proper positioning of the beams and phosphor elements to allow misregister without loss of performance, or by using Invar material. Blistering (or local doming) is a third thermal performance issue. It occurs when a small portion of the mask becomes very hot compared to the surrounding areas. Such conditions occur in pictures that have small areas of very high brightness such as specular reflections. Under these conditions, temperature differences of 50 ° C between different areas on the mask can occur. Again, the heated area expands and domes forward. However, because only a small area is affected, a bubble or blister forms in the heated area of the mask. This movement of the mask causes large misregister that can affect performance — notably color field purity and the color uniformity of a white picture area. The solution for blistering is the same as the solution for doming.
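The 0.4-mm expansion quoted above follows directly from the linear thermal-expansion relation, and the Invar figure is obtained the same way:

$$\Delta L = \alpha L \,\Delta T = (12.0\times10^{-6}\ {}^{\circ}\mathrm{C}^{-1})(700\ \mathrm{mm})(50\ {}^{\circ}\mathrm{C}) \approx 0.42\ \mathrm{mm},$$

while the Invar coefficient of about 1.5 × 10⁻⁶ /°C gives only about 0.05 mm for the same 700-mm length and 50 °C temperature rise.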
Figure 14. Doming of the mask occurs as it is heated and expands while the mask edge is constrained by the cooler support frame.

Magnetics

Electron beams are very sensitive to magnetic fields. The force on an electron that traverses a magnetic field is given by the product of the charge, the velocity, and the magnetic flux density. This force causes a perturbation of the electron's trajectory. This phenomenon occurs in CRTs primarily because of the earth's magnetic field. A TV set may be operated facing in any direction (north, south, east, west, or in between), where it interacts with the horizontal component of the earth's field. Horizontal components of 200 to 400 mG are typical. In addition, the vertical component of the earth's field varies strongly with latitude. It can range from +500 mG in Canada to −500 mG in Australia. Beamlet motions of several hundred micrometers and unacceptable performance would occur without magnetic shielding. Shielding is accomplished by a magnetic shield inside the CRT. The internal magnetic shield, together with the shadow mask and its supporting structure and, to some degree, the deflection yoke, provides the complete shielding system. Shielding the entire tube would be desirable but impractical. The screen must not be obstructed for the viewer, and the electrons must pass through the shield to reach the screen. Another important component of the shielding system is an external degaussing coil placed on the funnel body. During the degaussing process, alternating current is applied to the coil, then reduced to zero in less than a second. The magnetic domains in the shielding system are reoriented to align more precisely with the magnetic field lines, thereby bucking or canceling the external magnetic field. The shielding system successfully reduces the magnetic field inside the CRT but does not eliminate it. The typical misregister shifts that occur in different magnetic fields are of interest and are shown in Fig. 15.

Figure 15. The beamlet motions across the whole screen when the tube is facing north, south, east, or west and experiences a horizontal field, and when the tube is in the Northern or Southern Hemisphere and experiences a vertical field.

Moiré

A moiré (beat or interference) pattern can be produced when the horizontal scan lines that comprise the scanned raster interact with the periodic placement of the mask apertures. The frequency of the moiré pattern is a function
of the vertical pitch of the mask apertures and the vertical pitch of the scan lines. The vertical pitch of the apertures is chosen to minimize moiré at the number of scan lines used in the intended application. The intensity of the moiré pattern is a function of the height of the horizontal bars (tie bars) that vertically separate the mask apertures and of the height of the scanned line, which is in turn a function of the vertical spot size. Conventional shadow-mask tubes have adjacent columns of apertures shifted by half the vertical pitch to minimize moiré. Tubes that have tension masks do not have moiré due to scan lines because of their particular mask structure.
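The beat pitch implied by this interaction can be estimated from the two vertical pitches; the numbers below are assumptions chosen only to illustrate the arithmetic.

```python
# Spatial beat between the scan-line pitch and the vertical mask pitch.
# The beat (moire) spatial frequency is the difference of the two
# spatial frequencies; its reciprocal is the visible moire pitch.
scan_pitch = 0.90   # mm between displayed scan lines (assumed)
mask_pitch = 0.75   # mm vertical pitch of the mask apertures (assumed)

beat_frequency = abs(1.0 / scan_pitch - 1.0 / mask_pitch)   # cycles per mm
moire_pitch = 1.0 / beat_frequency                          # mm

print(f"moire pitch = {moire_pitch:.1f} mm")
```

With these assumed pitches the beat repeats every few millimeters, coarse enough to be visible, which is why the mask pitch is chosen with the intended scan-line count in mind.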
Safety

CRTs meet strict federal and regulatory agency standards for safety. Two critical aspects of safety are protection against X rays and protection against implosions. X rays are generated as the electron beam strikes the internal components of the tube. The resulting X rays are absorbed and attenuated primarily by the glass envelope (panel and funnel), which contains X-ray-absorbing materials. The glass thickness and material composition provide the necessary attenuation characteristics. As the tube nears the end of the construction process, it is evacuated to provide a proper environment for the cathodes. Thus, the glass envelope is stressed by atmospheric pressure. If the envelope failed mechanically, the tube would implode. Events such as a hammer strike or a gunshot can trigger a failure. The common means for providing protection is a metal band on the outside of the side walls of the panel. The band is in tension and supplies a compressive force to the panel side walls. In addition to the proper distribution of internal stress by design of the panel and funnel, this force ensures that the glass is safely contained in the rare event of an implosion.

ELECTRON GUN

Essentially all of today's direct-view color CRTs use tricolor electron guns that generate three electron beams, one to illuminate each of the primary phosphor colors. These generally use common grids for the three guns; each of the grids has three apertures, one for each of the three electron beams. The cathodes are separated so that each of the three beams can be electrically modulated (31–38).

Heater and Cathode

The sources of the electrons used to create the image on the screen are thermionic cathodes near the socket end of the electron gun. These cathodes are raised to emitting temperature by standard resistance-type wire-wound heaters. The vast majority of the CRTs in use today use oxide cathodes; a few high-end products use impregnated (or dispenser) cathodes (Fig. 16). The long-term electron generating capability of oxide cathodes is limited to about 2 A/cm²; impregnated cathodes can operate up to 10 A/cm². Impregnated cathodes are used in some very large tubes, which require high beam current, and in some high-end display tubes that have very small
apertures, where the current density from the cathodes is also high.

Figure 16. Drawings of typical heater/cathode assemblies showing (a) oxide cathodes used in most television tubes and (b) impregnated or dispenser cathodes used where high current density is required.

Basic Gun Construction

The simplest of the electron guns in use today incorporate cathodes and G1, G2, G3, and G4 grids, as shown in Fig. 17(a). This is a standard bipotential focus-type gun. The cathodes, G1, G2, and G3 bottom comprise the beam-forming region (BFR), in which the electron cloud from the cathode is accelerated and focused into a crossover at about the level of the G2 grid. This crossover becomes the image of the beam, which is eventually focused on the CRT screen. This electron beam is focused on the screen by the main focusing lens, generated by the potential difference between the G3 and the G4, where the G3 is at an adjustable focus potential and the G4 is at the CRT anode potential. This focus lens is a simple bipotential lens that uses two voltages, focus and anode. Another basic type of focus lens is the unipotential, or einzel, lens shown in Fig. 17(b). In this lens the voltages are equal, usually anode, on either side of the focus electrode. Unipotential focus lenses generally have better high-current performance than bipotential lenses but poorer low-current focus performance and increased high-voltage stability problems, because the anode voltage is further down the neck of the gun, closer to the BFR. As a general rule, the electron guns used in today's CRTs use a combination of bipotential and unipotential lenses to achieve the best spot performance. One of the most common is the uni-bipotential lens shown in Fig. 17(c). In this type of gun a unipotential lens is formed by the G3, G4, and G5, where the G3 and G5 are at focus potential and the G4 is connected to the G2. A bipotential
Figure 17. Representations of various electron gun configurations showing the location and interconnections of the various grids. Gun types illustrated are (a) unipotential, (b) bipotential, (c) uni-bipotential, and (d) uni-bipotential with dynamic correction.

lens then follows this between the G5 and the G6, and the G6 is connected to the anode.

Optimized Main Lens Designs

The gun design characteristics that primarily affect the spot size performance of the electron beam on the screen are the magnification (related mostly to the gun length and focus voltage), the mutual repulsion of the electrons in the beam (space charge), and the aberrations of the focusing lens. In general, for minimum spot size on the screen, one would want as large a beam as possible in the main lens. However, the effects of spherical aberrations of the main lens become large as the beam approaches the extremities of the focus lens, limiting the spot performance. Consequently, it is advantageous to have the physical size of the focus lens as large as possible. In the BFR portion of the gun (G1, G2, G3 bottom), individual apertures are used for each of the three beams. However, for the main focusing lens, modern electron guns use lenses that have a single oval opening encompassing all three beams, as shown in Fig. 18. These lenses are limited in size by the CRT neck and the glass beads that hold the electron gun parts together. The three electron beams go through different portions of this lens, which have slightly different focusing effects, so some means is needed to trim the different focusing effects among the three beams. This is generally done with plates and/or apertures near the individual beams that are a few millimeters remote from the main focus lens gap.

Figure 18. Photograph of grids that create the low-voltage side of the main focus lens illustrates the expanded lens type of grid on the left and shows a conventional three-lens type of grid on the right.

Trinitron Gun

The electron gun used in the Trinitron tube (Fig. 19) is a variation on the construction described before. There are still three cathodes and three separate electron beams, but they cross each other in the main focus lens region. Beyond the main focus lens, the diverging outer beams then need to be deflected to converge again on the center beam at the center of the tube. This is done electrostatically by parallel plates in the upper end of the gun that operate at a voltage about 500 V lower than the anode voltage. This configuration allows a large-diameter, single-aperture focusing lens where the three beams cross, but it adds complications and cost to reconverge the outer beams with the second anode voltage.

Figure 19. Representation of a Trinitron type of electron gun that illustrates the crossover of the three beams at the main focus lens and shows the location of convergence plates.

Dynamic Astigmatism

The self-converging in-line system inherently causes a strong overfocus of the beam in the vertical direction, but maintains focus in the horizontal direction at the corners of the screen. In small tubes and/or low-end systems, this effect is tolerated and results in some amount of overfocus
or flare of the beam in the corners in the vertical direction. In high-performance systems, the use of a special gun (Fig. 17d) and dynamic focus modulation generated by scanning waveforms can compensate for this effect. The principle behind this operation is adding special grids in which the individual electron beams can be underfocused in the vertical direction and overfocused in the horizontal direction. This can be done with opposing orthogonal rectangular apertures or with interdigitized plates, as shown in Fig. 20. In either case, applying a voltage differential between the two grids causes underfocus in the vertical direction and overfocus in the horizontal direction. If a dynamic focus voltage is applied to these grids and at the same time also applied to the main focus lens, so that the beam is underfocused in both the horizontal and vertical direction, the result is a strong underfocusing action in the vertical direction and essentially no effect on the horizontal focus of the beam. This corrects the overfocusing action of the self-convergent system in the vertical direction without affecting the good focus already obtained in the horizontal direction. Using the interdigit approach, waveforms required are about 800 V at the horizontal rate and 300 V at the vertical rate for approximately 1100 V in the corners of the CRT screen. Electron Gun Drive Characteristics The electron gun is a nonlinear device, and the resultant picture performance is very dependent on the gun operating characteristics of the TV set. The TV set or monitor designer must decide at what cutoff voltage to operate the gun. This is the voltage between cathode and G1 , at which the electron beam is just cut off. It is obtained by applying this direct-current (dc) voltage between the cathode and G1 (generally 150 to 190 V with cathode positive) and then increasing the G2 voltage until the electrons are on the verge of being accelerated to the screen. From this point, the video signal is applied to the cathode (driving it closer to G1 ). Higher operating cutoff generally gives better spot performance, but it requires a higher drive voltage to obtain the same beam current and resulting light output. The resulting beam current is
\[ I = k (V_d)^{\gamma} \qquad (5) \]
where Vd is the video drive voltage, k is a constant that depends on the gun design and the operating cutoff, and γ is approximately 2.7. Gamma (γ) is relatively constant for all CRTs and is already anticipated in the transmitted TV signals; a display that has this gamma is expected in order to give the proper reproduction of the image seen by the TV camera. See the tube data bulletins of any CRT manufacturer for more information on electron gun operating characteristics.

DEFLECTION YOKES

The deflection yoke is a wire-wound electromagnetic device located around the outside of the CRT neck whose primary function is to deflect the electron beam to any location on the CRT screen. The basic yoke consists of two sets of orthogonal coils to provide horizontal and vertical deflection, along with a ferrite core to provide a low-reluctance return path for the magnetic flux. For NTSC television operation, the vertical direction is scanned at a relatively slow rate of about 60 Hz, and the horizontal direction is scanned at 15,750 Hz. Scanning of the electron beams is synchronized with the video signals applied to the electron gun to "paint" the proper image on the screen. These frequencies are for normal NTSC broadcast television, which has 525 scan lines, of which 483 contain active video information. Computer display monitors and digital HDTV systems have more scan lines and operate at higher frequencies to provide reduced flicker and higher resolution.

In addition to deflecting the electron beams, the distribution and shape of the magnetic field generated by the deflection yoke strongly influence many of the performance parameters, particularly geometry and convergence. Because of the self-converging inline system used in nearly all modern television receivers, the deflection yoke magnetic fields need to have very strong but precisely controlled nonuniformity. This is done by careful control of the distribution of the wires within each of the coils.

Saddle Versus Toroidal Deflection Coils

There are two basic types of coil construction used in television deflection yokes: saddle and toroidal. In saddle coils, the wires are wound into a shaped cavity and then bonded together to maintain that shape. In this case, the shape of the cavity determines the distribution of the wires and the magnetic field. Two similar coils are located opposite each other on the tube neck, and a ferrite core is placed around the outside to provide a magnetic return path (Fig. 21). In toroidal coils, the wires are wound toroidally around the core itself. The two toroidal coils are wound on opposite sides of the cylindrical core, and they are connected so that the fields "buck" each other and the deflection is accomplished by the "leakage flux" across the middle of the cylinder, as shown in Fig. 22.
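As a brief numerical illustration of the drive characteristic in Eq. (5) discussed above, the following Python sketch uses an arbitrary gun constant k and example drive voltages (neither is taken from any tube data bulletin) to show how steeply the beam current, and hence the light output, rises with drive voltage when γ ≈ 2.7.

```python
# Illustrative sketch of the CRT drive characteristic of Eq. (5), I = k * Vd**gamma.
# The constant k and the drive voltages are arbitrary example values, not data
# from any tube bulletin.

GAMMA = 2.7      # typical CRT gamma quoted in the text
K = 1.0e-6       # assumed gun constant (depends on gun design and operating cutoff)

def beam_current(v_drive, k=K, gamma=GAMMA):
    """Beam current (arbitrary units) for a given video drive voltage in volts."""
    return k * v_drive ** gamma

if __name__ == "__main__":
    for vd in (20.0, 40.0, 80.0):
        print(f"Vd = {vd:5.1f} V  ->  I = {beam_current(vd):.3e}")
    # Doubling the drive voltage multiplies the beam current (and light output)
    # by 2**2.7, roughly 6.5x; this is why camera signals are pre-corrected with
    # the inverse power 1/gamma before transmission.
```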
Figure 21. Schematic representation of a saddle-type coil and ferrite core showing the wire location and the main magnetic flux lines.
Figure 23. Photographs of (a) saddle/saddle yoke and typical saddle coil and (b) a saddle/toroidal yoke and typical toroidal coil and core assembly.
Figure 22. Schematic representation of a toroidal type of coil and ferrite core showing the wire location and the magnetic flux lines.
There is more versatility in designing different magnetic field configurations using saddle coils than using toroidal coils. In television applications, the horizontal deflection coils are of the saddle type, but the vertical deflection coils may be either toroidal or saddle. Most television applications use the S–T (saddle horizontal, toroidal vertical) type because they are less expensive to manufacture than the S–S (saddle horizontal, saddle vertical) type of deflection yoke. However, many data-display, desktop monitor applications and HDTV and high-end television applications use the S–S yoke because the additional design flexibility can give better electron-optics performance. The S–S yoke also has less leakage flux, which can be important in monitor applications. Figure 23 shows photographs of S–S and S–T yokes.

Deflection Power

The deflection yoke is an inductive device, and the power to drive it consists of (1) the resistive losses in the coils and deflection circuitry and (2) the inductive reactance to the rapid changes in deflection current necessary to scan the beam to various parts of the screen. The vertical deflection is at a relatively slow 60 Hz, and the resistive losses predominate. Consequently, the vertical sensitivity is normally expressed in peak watts required to deflect the beam to the top or bottom of the CRT screen. However, for the horizontal deflection, the inductive reactance predominates, and the horizontal deflection sensitivity is better characterized by the stored energy, which needs to be switched to retrace the beams from the right side of the screen to the left.

\[ \text{Vertical power} = I_p^2 R \qquad (6) \]

\[ \text{Horizontal stored energy} = \tfrac{1}{2} L I_p^2 \qquad (7) \]
where I_p is the peak vertical or horizontal scan current. The stored energy is a basic parameter of the yoke and the CRT mechanical configuration and is not strongly affected by the impedance (number of turns of wire) of the yoke itself. It is, however, highly dependent on the anode voltage (directly proportional) and on the deflection angle of the system. Typical stored-energy values for 90° systems are about 2 mJ; for 110° systems, the stored energy increases by about two and one-half times, to 5 mJ.

Raster Geometry — Pincushion

The deflection angle of the CRT, the flatness of the CRT screen, and the basic design of the deflection yoke largely determine the outline shape of the deflected raster (pincushion distortion). Wider deflection angles and flatter screens cause greater pincushion-type distortion. With self-converging inline systems, some of this distortion can be corrected in the design of the deflection yoke, and
some must be corrected by appropriately modulating the scanning signals. In particular, the design of the deflection yoke can correct the pincushion distortion at the top and bottom of the screen. For 90° deflection systems and some 110° systems, the pincushion at the sides of the picture is also correctable in the yoke design. However, with flatter panels and 110° deflection, circuit correction is required to remove the pincushion distortion at the sides of the raster.
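To put numbers on the deflection-power relations of Eqs. (6) and (7), the sketch below uses assumed example values for the coil resistance, inductance, and peak scan currents (they are not data for any particular yoke) and reproduces the order of magnitude of the stored-energy figures quoted above.

```python
# Sketch of the deflection-power relations of Eqs. (6) and (7).  The coil
# resistance, inductance, and peak currents are assumed example values, not
# data for any particular yoke.

def vertical_power(i_peak, r_coil):
    """Peak vertical deflection power, Eq. (6): P = Ip**2 * R, in watts."""
    return i_peak ** 2 * r_coil

def horizontal_stored_energy(i_peak, l_coil):
    """Horizontal stored energy, Eq. (7): E = 0.5 * L * Ip**2, in joules."""
    return 0.5 * l_coil * i_peak ** 2

if __name__ == "__main__":
    print(f"Vertical power: {vertical_power(1.0, 12.0):.1f} W  (assumed Ip = 1 A, R = 12 ohm)")

    e_90 = horizontal_stored_energy(3.0, 0.45e-3)   # assumed Ip = 3 A, L = 0.45 mH
    print(f"Horizontal stored energy: {e_90 * 1e3:.2f} mJ  (near the 2 mJ cited for 90-degree systems)")

    # The stored energy is directly proportional to the anode voltage, so raising
    # the anode voltage from 25 kV to 30 kV raises it by the ratio 30/25.
    print(f"Scaled to a 30 kV anode: {e_90 * 30.0 / 25.0 * 1e3:.2f} mJ")
```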
SUMMARY

For nearly 40 years, direct-view color CRTs have been the primary display device for television and for many computer and industrial displays. As we move into the age of digital television signals, the requirements for display performance will become even greater. The higher resolution and lower noise of digital television signals will create a demand for displays that have better resolution and higher brightness and that give more realistic, natural images. CRTs are continuing to improve to meet these challenges through larger sizes, new aspect ratios, better resolution, and higher brightness.

At the same time, alternative display technologies are appearing that may threaten CRT dominance in certain areas. In small sizes, liquid-crystal displays can give very good computer-generated images, and they do indeed dominate the laptop display market. However, they are much more expensive than CRTs, particularly as size increases; nearly all nonportable desktop computer displays larger than 13 in. are CRTs. For the very large sizes, greater than 40 in., displays are mostly projection, both CRT-based and light-valve-based (e.g., liquid-crystal) projectors. Large plasma displays (greater than 40 in.) are also becoming available, although still at a very high price. CRT projectors are quite bright but are very directional because of the special screen structure used to obtain the light output gain, and they also lack the sharpness of a direct-view CRT display. Liquid-crystal projectors and plasma displays are much dimmer and much more expensive than CRT displays. In the predominant television market of 15-in. through 40-in. screen sizes, none of the alternative technologies has the performance or the cost to match direct-view CRTs. Consequently, CRTs will remain the dominant display for television applications for many years to come.

BIBLIOGRAPHY

1. P. A. Keller, The Cathode-Ray Tube: Technology, History and Applications, Palisades Press, NY, 1991.
2. A. M. Morell et al., Color Television Picture Tubes, Academic Press, NY, 1974.
3. R. L. Barbin and R. H. Hughes, IEEE Trans. Broadcast Television Receivers BTR-18, 193–200 (1972).
4. A. M. Morell, IEEE Spring Conf., Chicago, 1973.
5. A. M. Morrell, IEEE Trans. Consumer Electron. CE-28, 290–296 (1982).
6. E. Yamazaki, CRT Displays, Society for Information Display, Seminar Lecture Notes, Seminar F-5, 1993.
7. T. Iki, CRTs, Society for Information Display, Seminar Lecture Notes, Seminar M-2, 1997.
8. P. A. Keller, Electronic Display Measurements, Wiley, NY, 1997.
9. C. Infante, CRT Display Measurements and Quality, Society for Information Display, Seminar Lecture Notes, Seminar M-3, 1995.
10. R. L. Barbin, T. F. Simpson, and B. G. Marks, RCA Eng. 27(4) (1982).
11. P. A. Keller, Displays 7(1), 17–29 (1986).
12. P. Keller, TEPAC Eng. Bull. 25 (1985).
13. Electronic Industries Association, TEPAC Eng. Bull. 25 (1985).
14. Electronic Industries Association, Line Profile Measurements in Shadow Mask and Other Structured Screen Cathode Ray Tubes, TEPAC Publication No. 105-9, 1987.
15. P. Burr and B. D. Chase, SID Proc. 1st Eur. Display Res. Conf., September 1981, pp. 170–172.
16. R. L. Donofrio, IEEE Trans. Broadcast Television Receivers BTR-18, 1–6 (1972).
17. P. D. Griffis and J. Shefer, IEEE Trans. Consumer Electron. CE-23, 14–21 (1977).
18. C. R. Carlson and R. W. Cohen, Visibility of Displayed Information: Image Descriptors for Displays, Office of Naval Research, Arlington, VA, Report ONR-CR 213-120-4F, 1978.
19. C. R. Carlson and R. W. Cohen, Proc. S.I.D. 21(3), 229–246 (1980).
20. P. Barten, 1983 SID Int. Symp., 1983, pp. 64–65.
21. P. Barten, Proc. 3rd Int. Display Res. Conf., 1983, pp. 280–283.
22. P. Barten, Proc. S.I.D. 28, 253–262 (1987).
23. P. Barten, S.I.D. Semin. Lect. Notes 1, 6.1–6.19 (1988).
24. P. G. J. Barten, Proc. S.I.D. 30(1), 9–14 (1989).
25. Thompson Tubes & Displays, Tubes for Multimedia Applications, 1997.
26. A. A. Seyno Sluyterman, S.I.D. Symp. Dig. 24, 340–343 (1993).
27. D. P. Bortfeld and J. P. Beltz, S.I.D. Symp. Dig. 18, 225–227 (1987).
28. C. Infante, Displays 12, 80 (1991).
29. Electronic Industries Association, MTF Test Method for Monochrome CRT Display Systems, TEP105-17, 1990.
30. Electronic Industries Association, Contrast Measurement of CRTs, TEPAC 105-10, 1987.
31. H. Y. Chen, Inf. Display 13(6), 32–36 (1997).
32. I. M. Wilson, IEEE Trans. Consumer Electron. CE-21, 32–38 (1975).
33. R. H. Hughes and H. Y. Chen, IEEE Trans. Consumer Electron. CE-25, 185–191 (1979).
34. R. L. Barbin et al., S.I.D. 90 Dig., 1990.
35. K. Hosokoshi, S. Ashizaki, and H. Suzuki, Proc. 13th Int. Display Res. Conf. Japan Display ’83, 1983, pp. 272–275.
36. S. Ashizaki et al., Proc. 16th Int. Display Res. Conf. Japan Display ’86, 1986, pp. 44–47.
37. S. Shirai et al., S.I.D. Int. Symp. Dig. Tech. Papers, 1987, pp. 162–165.
38. H. Y. Chen and R. M. Gorski, Color CRT System and Process with Dynamic Quadrupole Lens Structure, US Pat. 5,036,258, 1991.
CATHODE RAY TUBES
R. CASANOVA ALIG
Sarnoff Corporation
Princeton, NJ

THE CRT
The cathode-ray tube (CRT) is best known as the display for television and computer monitors. It is a vacuum tube in which electrons move in a beam. Figure 1 shows a cutaway diagram of the interior of a CRT for color television display. Inside the tube, the large, nearly flat area, called the screen, is covered with phosphor materials that convert the electron-beam energy into light. The beam is rapidly swept over this area, and its intensity is varied, so that a display appears to the viewer, who visually integrates the light. When the display is rapidly changed, the viewer sees motion, as in television.

The CRT is described in this article. In particular, the parts of the CRT, such as the cathode, electron gun, phosphors, deflection system, and shadow mask, are explained. The electrons come from the negative electrode, called the cathode, and form a beam, or ray, inside a vacuum tube, leading to the name cathode-ray tube. Much interesting physics is associated with forming and directing the beam and with converting its energy to light, as well as with the manufacture of CRTs. CRTs have been used in other display applications, such as oscilloscopes, radar, and photorecording tubes. Some history of the tube, which extends over the last century, is presented.
Figure 1 shows a cutaway diagram of a CRT. The CRT is cut vertically in this diagram to show half the interior. The thick outline is the glass edge. The interior is a vacuum. The external surface of the tube at the right is the portion generally seen by the viewer. The light seen by the viewer originates in the phosphors on the inner surface, called the screen. These phosphors are shown by the colored lines. The phosphors are white powders, and the color is added to show the color of the light emitted when they are struck by the electron beam. The electron beam is shown by the white band from the neck at the rear of the CRT. For color, three electron beams, shown by red, green, and blue discs, are used. These discs show the color of the light emitted by the phosphor struck by the electron beam.

The electron beam comes from the electron gun inside the neck. The electron gun, a very critical part of the CRT, is small compared with the CRT; only its largest parts can be seen in Fig. 1. Pins at the rear of the CRT extend through the glass and provide most of the electrical potentials needed by the gun. Outside the neck and in front of the electron gun, the deflection yoke is used to deflect the electron beams to points on the screen. It is inactive when the beams go to the screen center, as shown in Fig. 1. The blue trapezoidal cross sections represent the ferrite core, and the red surfaces represent the wires of the coil. Red is a typical color of the insulator on the wires. Electric currents in these coils create magnetic fields in the interior of the CRT. The moving electrons in the beam feel the Lorentz force when passing through these fields, and they are deflected by it. The connector at the bottom of the yoke brings in these currents.

The circular inset in Fig. 1 shows details of the shadow mask. This mask is the large metal plate immediately behind the phosphor screen. It is supported by the L-shaped frame, and the frame is secured to pins embedded in the glass. The tapered metal structure attached behind the frame is the magnetic shield, needed to exclude the earth's magnetic field. The metal band outside the tube is the implosion band, needed to prevent the earth's atmosphere from violently collapsing the tube. A support for mounting the CRT in a cabinet can be seen near this band at the corner of the CRT. The round structures on the CRT funnel exterior are parts of the anode button. It transmits the highest electrical potential needed for the gun. The inside surface of the CRT is electrically conducting, and the top of the electron gun touches this surface. If this potential were brought in through the socket, arcs to the other pins of the socket would destroy the tube. A wire that has a pan at its other end extends from the end of the electron gun, seen at the bottom of Fig. 1. This pan contains the getter material.

Gun/Cathode
Figure 1. A cutaway diagram of a cathode-ray tube for television or computer monitors.
The electron beams in Fig. 1 originate from the electron gun. A photograph of an electron gun is shown in Fig. 2. The gun is an assembly of metal parts secured by multiform glass rods, often called beads. One rod is visible in the foreground of Fig. 2. In manufacture, the top of the gun is inserted into the rear of the neck of the CRT up to the level of the stem; the stem is the glass disk in the
Figure 2. Photo of electron gun.
lower part of the figure. The electron gun fits snugly in the neck, held in place by the small leaf springs at the very top of the picture. They also connect the top metal part to the anode potential via an internal coating of the tube. This part is often called the anode grid. Below it lies a large metal part called the focus grid. The diameter of the gun generally ranges from 10 to 50 mm, depending on the application. Voltages of 25,000 V and 5,000 V on the anode and focus grids, respectively, create an electron lens in the gap between these grids. This lens focuses the electron beam. The edges of the screen grid and control grid are visible behind the focus grid. The cathode lies inside the small cylinder seen underneath the wire leading to the screen grid. Metal pins pass through the stem to make electrical connections. Wires from the internal ends of these pins lead to the grids, and the external ends of these pins are attached to a socket. The glass stem tube at the bottom of the picture is used for handling the gun before and during insertion into the CRT. After the stem is sealed to the CRT and the air is evacuated through this tube, the tube is removed.

A cross-sectional diagram of the electron gun is shown in Fig. 3. Apertures in the control and screen grids allow electrons to pass from the cathode to the electron lens formed by the focus and anode grids. These apertures are very small, generally well below 1 mm in diameter. The lens focuses the electrons. Paths of electrons are shown in Fig. 3 by the lines that have arrows.
Inside the gun, the electron beam originates from a cathode. A picture of a cathode and its parts is shown in Fig. 4. The cathode is a cap on the cathode cylinder. This cylinder is small, frequently less than 2 mm in diameter. Inside this cylinder is a very small heater filament coil whose diameter is about 1 mm. Electric currents in this coil generate heat to raise the cathode temperature to nearly 1000 °C. Red light from this filament can often be seen at the back of an operating tube. At this temperature, the material of the cathode, typically barium, emits many electrons into the vacuum. Positive voltages on the gun electrodes pull the emitted electrons away from the cathode and form them into the beam. Voltage on the control grid in Fig. 3 varies the beam intensity, so that the display will have light and dark regions. Replacement electrons come from the outside through a pin in the stem.

Higher voltages on electrodes farther from the cathode accelerate the electron beam. These final electrodes of the gun apply forces to the beam that resemble the action of an optical lens on light, and these forces make the beam very small when it ends on the screen. Because the electron paths in these forces resemble light rays in lenses, the study of this electron motion is often called electron optics. The electrons leave the gun at speeds of 5% or more of the speed of light. This speed is high enough for the theory of relativity to change the electron path perceptibly from that expected from Newtonian physics. The CRT is unusual, perhaps unique, among household objects in that relativistic effects are significant.

Phosphor

The electron beam in Fig. 1 ends on the screen, where phosphors convert the electron-beam energy into light. Phosphors are semiconductors that frequently contain impurities, which emit light when electrons and holes recombine. Because the beam energy is nearly 25,000 eV and the phosphor band gap is about 5 eV, each electron creates thousands of electrons and holes, causing the
Figure 3. A cross-sectional diagram of the electron gun shown in Fig. 2.
Figure 4. Details of the cathode (1).
emission of thousands of photons of light. Zinc sulfide is a typical phosphor. When it contains cadmium and silver, white light is emitted. With silver alone, the light is blue, whereas with copper it is green.

Deflection

To generate a display of the light from the electron beam, the beam must be swept over the area of the display in synchronization with its intensity variation. Magnetic coils outside the CRT are often used to deflect the beam. These deflection coils are shown in Fig. 1; a more detailed view is shown in Fig. 5. Currents flow in these coils to generate magnetic fields inside the CRT, and changing currents sweep the beam over the display area. These coils resemble Helmholtz coils pressed to the outer contour of the CRT. Helmholtz coils have uniform magnetic flux density in the central region between the coils. Electric currents flow in the same direction in the portions of the coils in the foreground to create a magnetic field nearly perpendicular to the direction of view. Electrons moving downward are deflected in the view direction. The Lorentz force deflects the beam, just as a magnet held near a television screen will distort its appearance. (Color CRTs may be permanently distorted if the magnet used for this demonstration is too strong.) A ferrite core in Fig. 5, half of which has been removed, concentrates the magnetic field in the neck of the CRT. In operation, two pairs of coils are needed for the two field directions. Many parts of the electron gun shown in Fig. 2 can be seen through the CRT neck in Fig. 5.
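To attach rough numbers to the gun, phosphor, and deflection discussion above, the following sketch uses standard physical constants together with the 25 kV and 5 eV figures quoted in the text to estimate the beam electron speed classically and relativistically, and to bound the number of electron-hole pairs a single beam electron can create in the phosphor.

```python
import math

# Rough magnitudes for a 25 kV beam and a 5 eV band-gap phosphor, as quoted in
# the text above; the constants are standard physical constants.

E_CHARGE = 1.602176634e-19        # electron charge, C
M_E = 9.1093837015e-31            # electron rest mass, kg
C = 2.99792458e8                  # speed of light, m/s

kinetic_energy = 25e3 * E_CHARGE  # 25 keV in joules

# Newtonian estimate of the beam speed
v_classical = math.sqrt(2.0 * kinetic_energy / M_E)

# Relativistic estimate
gamma = 1.0 + kinetic_energy / (M_E * C ** 2)
v_relativistic = C * math.sqrt(1.0 - 1.0 / gamma ** 2)

print(f"Classical speed:    {v_classical / C:.3f} c")
print(f"Relativistic speed: {v_relativistic / C:.3f} c")
print(f"Relativistic correction: {(v_classical - v_relativistic) / v_relativistic:.1%}")

# Upper bound on electron-hole pairs per 25 keV beam electron in a 5 eV gap
# phosphor; the actual yield is smaller because much of the beam energy goes
# into other loss processes.
print(f"Pair-creation upper bound: {25e3 / 5.0:.0f} pairs per electron")
```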
Color

Most CRTs in use today display color images. Color is displayed by using three phosphors, one for each of the primary colors to which the eye is sensitive. These additive primary colors are red, green, and blue for emitted light, such as emerges from a CRT. An electron beam is needed to energize each of these phosphors, so the three beams shown in Fig. 1 are required for color. Each electron beam energizes a single color phosphor. To prevent the electron beams from energizing the other phosphors, the shadow mask shown in Fig. 1 is used.

The shadow mask contains hundreds of thousands of holes, some of which are shown in Fig. 6. These holes shadow electrons, so that any point on the screen receives electrons from only one of these beams. The electron beams move in the directions of the arrows and have cross sections much larger than an opening in the shadow mask. A portion of each beam passes through the opening and impinges on a single phosphor area. A portion from the blue beam is shown by the dashed lines in this figure. Other portions of the beams are shadowed from the other phosphor areas by the solid mask. In Fig. 6, the angles between the beams are exaggerated, and the distance between the mask and screen is sharply contracted. Typically, the angles are less than 1° and the mask-screen separation is 50 to 100 times the mask aperture width. The phosphor area is similar to the mask opening area.

An example of the arrangement of the phosphors that emit red, green, and blue colors is shown in the inset of Fig. 1. This phosphor arrangement can also be seen on any home television set. A magnifying glass or optical loupe may help in distinguishing it. When seen from afar, the details of the phosphor arrangement blend, giving the appearance of a color image. Contrary to the appearance of the inset in Fig. 1, each beam typically intercepts several mask openings. The beams also come together, or converge, at the screen, not at the shadow mask.

HISTORY AND APPLICATIONS

The basic elements of a CRT, the glass envelope, cathode, deflection, and phosphors, have been described. CRT applications include TV and computer monitors, as well as oscilloscopes and radar tubes. In these applications, the basic elements are not always implemented in the same way as they are in CRTs for TV and monitors. As these applications are described, these differences
Figure 5. Deflection coils on the neck of a CRT. One pair of coils is shown (2).
Figure 6. Diagram showing the masking of phosphors by the shadow mask.
will be noted. Other vacuum devices, which could be called CRTs but generally are not, include X-ray tubes, electron microscopes, microwave tubes, television camera tubes, and devices for electron-beam lithography, annealing, welding, deposition, and so on. They all lack phosphors.

Oscilloscope

The CRT was invented in 1897 by Ferdinand Braun. The oscilloscope was the first application for the CRT. It was developed in the early part of the twentieth century to display the variation of voltage with time. This voltage is taken from a location in an ac circuit. Voltage can vary rapidly with time, so the electron beam must be rapidly deflected to display the variation. Thus, the magnetic deflection coils are replaced by two sequential pairs of plates inside the CRT. Electric fields between a pair of plates deflect the beam. The beam generates a spot on the screen that is swept by the deflection fields to generate a graph of voltage versus time. The time for a horizontal scan is chosen as one or more periods of the ac voltage. The voltage is applied to the vertical scan, so the display resembles the graph of a sine function if the voltage has constant amplitude and contains only one frequency. Of course, other interesting voltages have many harmonics of different amplitudes that lead to informative displays of the voltage at the circuit location, and the oscilloscope became a widely used laboratory tool. Modern oscilloscopes replace the CRT with digital electronics and a flat display. Modern very high-speed optics, which uses pulses of subpicosecond duration, employs a CRT similar to the oscilloscope, called a streak tube, to display the pulse. To detect light, a photocathode replaces the thermionic cathode.

Radar Tube

The radar tube was the first application for the CRT outside the laboratory. The screen of a radar CRT is usually round, and a rotating radial line is displayed. Occasional bright spots appear along this line that are associated with targets detected by the radar. The radial line is coordinated with the orientation of the radar antenna, and the radial distance of the bright spot corresponds to the distance to the target. There are other, less well-known radar display formats besides this one, which is called the plan position indicator. The radial line is generated by deflecting the electron beam with an external magnetic deflection yoke, such as shown in Figs. 1 and 5. In early versions, only one pair of coils was used, and the yoke was physically rotated around the tube neck in coordination with the rotation of the radar antenna. Later, two pairs of fixed coils, driven by the required electronic signals, rotated the line. The signal reflected from the target increases the electron-beam current to generate a bright spot. Its arrival time is coordinated with the deflection along the radial line to indicate its distance. The phosphors on the screen are specialized to be persistent, that is, light continues to be emitted for several seconds following the passage of the electron beam. In most displays, persistence results in undesirable blurring of the display, but for radar, this persistence enhances the visibility of the target. The cathode, electron gun, and glass envelope of the radar tube are similar to those of other
CRTs. As happened with the oscilloscope, digital electronics in modern radar enables replacement of the radar CRT with a flat display or with a computer-monitor CRT.

TV and Computer Display and Projection TV

Television became popular before and following World War II, and the CRT displayed the visual information. To display a picture, one pair of deflection coils rapidly sweeps the beam across the width of the screen, while another pair slowly sweeps the beam vertically. The pattern of displayed lines is called the raster. Because the beam makes a point of light on the screen, the raster is the composite of these points of light. Thus a CRT display is analogous to the French Impressionist art form called pointillism, where the picture is a composite of dots. The resolution of a CRT display is related to the number of distinct points of light in the composite. Television rasters use about 500 lines. High-definition television (HDTV), the new television standard, will use more than 1000 lines. To display moving pictures, the raster is changed 30 times a second. In home television, alternate lines are often displayed, so 250-line rasters are displayed 60 times per second. This arrangement, called interlace, reduces the perception of flicker.

During the 1960s, color television supplanted monochrome television. CRTs for shadow-mask color television were described earlier. This is not the only way to generate a color image on a CRT. Another way uses a CRT that emits white light. A changing set of color filters lies between the CRT and the viewer. When the green portion of the image is displayed on the CRT, a filter that passes green light is used; when the blue portion is displayed, a filter that passes blue light is used; and when the red portion is displayed, a filter that passes red light is used. To avoid flicker, this must happen very quickly, about 60 times per second. The filters can be changed by arranging them as sectors of a wheel that rotates in front of the CRT. This color wheel method was seriously considered during the development of color television in the 1950s. However, adopting it would have made television inaccessible to owners of the monochrome televisions then in use. The shadow-mask method allows displaying color television signals on both color and monochrome CRTs. Today, the color wheel is replaced by liquid-crystal filters in some specialized applications.

In recent years, CRTs have been used extensively for computer displays. These CRTs are similar to that shown in Fig. 1 and differ from those for television only in the details needed to make the display sharper for text. Brightness is reduced by these changes. Quite recently, projection television has become popular. Projection systems generally use three CRTs. One monochrome CRT is used for each of the three phosphor colors, and the light from them is projected and combined on the screen.

CONSTRUCTION

Construction of a CRT begins with the flask-shaped glass container. Its top, sides, and base are commonly called
the neck, funnel, and panel, respectively. The phosphor, consisting of small grains mixed with water to form a slurry, is added to the flask. After the phosphor settles on the base, which will become the screen, the water is removed. Of course, the construction of the color phosphor screen shown in Fig. 1 is much more complicated than simply settling the phosphor slurry; photolithographic techniques are used. Even today, each shadow mask must be used to deposit the phosphor on the panel with which it will be paired, to achieve adequate registration of the openings and phosphors shown in Fig. 6.

An electron gun, shown in Fig. 2, is inserted and flame-sealed to the neck. The tube is sealed after the air in the tube is removed through the cylinder in the stem. A vacuum of 10⁻⁸ torr is needed, and baking at 400 °C, followed by gettering, is necessary to attain it. Efficient getters are essential to electrically stable CRTs. Electrical connections are made through the pins in the glass stem. A vacuum inside the CRT is needed for the electron beam to exist. Atmospheric pressure on the evacuated tube can cause violent collapse of a tube that is mishandled. The glass thickness is increased in larger CRTs to prevent collapse of the tube. Increased glass thickness adds weight and limits the maximum size of CRTs.

The electron gun is made by assembling stamped, stainless steel metal parts onto a fixture. They are secured with glass rods heated to softening, which are pressed onto the fixtured assembly. The cathode and its filament are assembled into these metal parts. Pins in the stem are welded to wires that are also welded to the metal parts. The electron current needed for the electron beam enters through one of these pins, filament current enters and exits through two others, and voltages are placed on the electron-gun electrodes through yet other pins. The anode voltage is so high that arcs to the other pins would occur, so it is brought in through the anode button, shown in Fig. 1; the electron current from the beam exits through this connection, too.

The deflection coils are assembled from wires that are pressed into shape using an arbor that has the CRT contour. These coils are surrounded by a ferrite core, separated by insulators, and supported by a plastic housing. This deflection system, frequently called the yoke, slides over the neck at the rear of the tube to reach its position against the funnel. Connections between the coils and external transformers provide the currents needed for the magnetic fields that deflect the electron beam. Deflection is needed to move the electron beam across the screen of the CRT. As the deflection angle is enlarged, the beam paths to the center and to the edge become increasingly different. This limits the deflection between opposite corners to angles of 110° or less. This limit on the deflection angle adds depth to the CRT and prevents manufacture of a flat CRT.

INDUSTRY — TODAY AND FUTURE

Today most CRTs are made for color and projection television and for computer displays. More than 200 million tubes are sold annually in a market valued at $20 billion. They range in screen diagonal size from 1 inch to 35 inches. Most are made and sold in the Far East.
Many new flat-panel display technologies are emerging. All of them are far more expensive than the CRT, and all require far more external electrical connections than the CRT. Unless the application demands their attributes, the CRT will generally be used. Thus, these new technologies generally are associated with new display products, such as the liquid-crystal display (LCD) with the laptop computer.

Modern CRTs for TV and computer monitors have flatter and larger screens. Flatter screens greatly reduce glare from ambient light, and they make the CRT appear more like the new flat-panel displays. Larger screens are needed to display the improved resolution associated with digital information transfer; still larger screens will be needed for HDTV. Growing computer technology drives new applications for displays, many of which will be fulfilled by the CRT. Arcade games, surveillance, banking and vending machines, and so on, all require displays. Most of these needs will be met with the CRT. Medical technology is replacing X-ray film with CRTs. Photography is also becoming digital, and many images are being displayed on CRTs.

ABBREVIATIONS AND ACRONYMS

CRT    Cathode-Ray Tube
BIBLIOGRAPHY

1. P. A. Keller, The Cathode-Ray Tube, Palisades Press, NY, 1991.
2. R. Vonk, Philips Tech. Rev. 32, 62 (1971).
ADDITIONAL READING

K. B. Benson, ed., Television Engineering Handbook, McGraw-Hill, NY, 1986, Chaps. 10 and 12.
A. M. Morrell et al., Adv. Image Pickup and Display, Color Television Picture Tubes, Suppl. 1, Academic Press, NY, 1974.
E. Yamazaki, Adv. Imaging Electron. Phys. 105, 142–266 (1999).
K. W. M. P. Zeppenfeld, Acta Electronica 24, 309–316 (1981).
CHARACTERIZATION OF IMAGE SYSTEMS

FRIEDRICH O. HUCK
CARL L. FALES
NASA Langley Research Center
Hampton, VA
INTRODUCTION

This characterization addresses image systems that combine image gathering and display with digital processing. These systems enable the user to acquire images (take pictures or make videos) and also to transmit them and to alter their appearance. The alterations may vary widely, according to need or taste. Normally, however, the goal is to produce images that are sharp and clear
without perceptually annoying or misleading distortions and to do so for a wide variety of scenes at the lowest cost of data transmission and storage. This goal, to produce the best possible images consistently and economically, leads the proper characterization of image systems into the realm of communication theory.

Communication theory is concerned explicitly with transmitting and processing signals. Modern formulations of this theory are based largely on the classical works that Shannon (1) and Wiener (2) published in the late 1940s. Shannon introduced the concept of the rate of transmission of information (or information rate) in a noisy channel, and Wiener introduced the concept of the restoration of signals perturbed by noise using the minimum mean-square error (or, equivalently, the maximum realizable fidelity). These concepts deal inherently with statistical properties of stochastic (random) processes and the probability of occurrences rather than with periodic functions or transients and deterministic outcomes (3).

In 1955, Fellgett and Linfoot (4,5) were the first to extend these concepts of communication theory to the characterization of image systems. Their extension addressed photographic image systems in which an objective lens forms a continuous image of the captured radiance field directly on film. The formation of this image is constrained by the blurring due to the objective lens and by the granularity of the film, similar to the way the transmission of a signal in a communication channel is constrained by bandwidth and noise. Digital image systems, however, are more complex. Normally, these systems encompass the following three stages, depicted in Fig. 1:

1. Image gathering, to capture the radiance field that is either reflected or emitted by the scene and transform this field into a digital signal. The transformation may account for the
   a. two-dimensional spatial variation of the radiance field for panchromatic (black-and-white) imaging;
   b. three-dimensional spatial and spectral variations of the radiance field for color (or multispectral) imaging;
   c. four-dimensional spatial, spectral, and temporal variations of the radiance field for color video.

2. Signal coding, to encode the acquired signal for the efficient transmission and/or storage of data and decode the received signal for the restoration of images. The encoding may be either lossless (without loss of information) or lossy (with some loss of information) if higher compression is required.

3. Image restoration, to produce a digital representation of the scene and transform this representation into a continuous image for the observer. The digital processing first minimizes the mean-square error due to the perturbations in image gathering, lossy coding, and image display, and subsequently, if desired, it enhances the visual appearance of the displayed image, according to need or taste.

Human vision may be included as a fourth stage, to characterize image systems in terms of the information rate that the eye of the observer conveys from the displayed image to the higher levels of the brain.

These four stages are similar to each other in one respect: each contains one or more transfer functions, either continuous or discrete, followed by one or more sources of noise, also either continuous or discrete. The continuous transfer functions of both the image gathering device and the human eye are followed by sampling that transforms the captured radiance field into a discrete signal that has analog magnitudes. In most image gathering devices, the photodetection mechanism conveys this signal serially into an analog-to-digital (A/D) converter to produce a digital signal. In the human eye, however, the retina conveys the signal in parallel paths from each photoreceptor to the cortex as temporal frequency-modulated pulse trains. Analog signal processing in a parallel structure has many advantages that are increasingly emulated by neural networks (6).

This model of digital image systems differs from the classical model of communication channels in two fundamental ways:

1. The continuous-to-discrete transformation in image gathering. Whereas the performance of classical
Figure 1. Model of digital image systems.
channels is constrained critically only by bandwidth and noise, the performance of image gathering devices is constrained also by the compromise between blurring due to limitations in the response of optical apertures and aliasing due to insufficient sampling by the photodetection mechanism. This additional constraint requires a rigorous treatment of insufficiently sampled signals throughout the characterization of image systems.

2. The sequence of image gathering followed by signal coding. Whereas the characterization of classical channels accounts only for the perturbations that occur at the encoder and decoder or during transmission, the characterization of image systems must account also for the perturbations in the image gathering process before coding. These perturbations require a clear distinction between the information rate of the encoded signal and the associated theoretical minimum data rate or, more generally, between information and entropy.

The extension of modern communication theory to the characterization of digital image systems (7–10) establishes a cohesive mathematical foundation for optimizing the end-to-end performance of these systems from the scene (the original source) to the observer (the final destination) in terms of the following figures of merit:

1. The information rate He of the encoded signal, the associated theoretical minimum data rate Ee, and the information efficiency He/Ee. For lossless coding, the information rate He becomes the information rate H of the acquired signal.

2. The information rate Hd of the displayed image, the associated maximum realizable fidelity Fd, and the visual information rate Ho that the human eye conveys from the displayed image to the higher levels of the brain.

To be useful, the characterization must account for the physical behavior of the real world and lead to a close correlation between predicted and actual performance. These two basic requirements are not easily satisfied, for there exists a wide disparity between the complexity of the image gathering process and the simplicity of the model of this process for which the mathematics of communication theory yields expressions for information rate and fidelity. This disparity must be bridged carefully by judicious assumptions and approximations in developing a suitable mathematical model of digital image systems and of the associated figures of merit for optimizing their end-to-end performance.

The mathematical model of image systems and the associated figures of merit developed in the second to fourth sections are sufficiently general to account for a wide range of radiance field properties, photodetection mechanisms, and signal processing structures. However, the results that are presented emphasize (1) the salient properties of the irradiance and reflectance of natural scenes that dominate the captured radiance field in the visible and near-infrared region under normal daylight
conditions and (2) the photodetector arrays that are used most frequently as the photodetection mechanism of image gathering devices. This emphasis fosters the comparison of image gathering and early human vision (11), which is of interest because both are constrained by the same critical limiting factors:

1. The sampling lattice formed by the angular separation of the photodetector-array apertures (or the photoreceptors in the eye). This lattice, which determines the angular resolution (or visual acuity), depends on the focal length of the objective lens (or pupil) and the separation of the apertures (or photoreceptors). The Fourier transform of the sampling lattice leads to the sampling passband that, analogous to the bandwidth of a communication channel, sets an upper bound on the highest spatial frequency components that the image gathering device (or eye) can convey.

2. The spatial response (or its Fourier transform, the spatial frequency response) and the signal-to-noise ratio. Both depend on the aperture area and focal length of the objective lens (or pupil) and on the aperture area and responsivity of the photodetectors (or photoreceptors). The signal-to-noise ratio and the relationship between the spatial response and sampling lattice (or, equivalently, between the spatial frequency response and sampling passband) determine the sharpness and clarity with which images can be restored (or the scene can be observed) within the fundamental constraint set by the sampling lattice or passband.

3. The data rate of the transmission link (or of the nerve fibers from the retina to the brain).

When these factors are accounted for properly, which is the subject of the third section, then it emerges that the design of the image gathering device that is optimized for the highest realizable information rate corresponds closely, under appropriate conditions, to the design of the human eye that evolution has optimized for viewing the real world. This convergence of communication theory and evolution toward the same design corroborates the validity of extending communication theory to the characterization of image systems, and it also demonstrates the robustness of the approximate model of the image gathering process on which this characterization is based.

Computer simulations are a valuable tool for correlating quantitative assessments based on the statistical properties of ensembles of natural scenes with the perceptual and measurable performance based on single realizations (or members) of these ensembles. Moreover, these simulations can illustrate the performance of image systems for periodic features and transients for which communication theory cannot account directly, and they can relate perceptual degradations in the displayed image to the perturbations that occur in image gathering and signal coding.

The characterizations in the third and fourth sections are constrained, as is commonly done, to a uniform irradiance of the scene and to linear transformations
in the image system. The fifth section extends these characterizations to irradiances that vary across the scene (e.g., due to shadows) and to a nonlinear transformation that seeks to suppress this variation and preserve only the spatial detail and reflectance (or color) of the surface itself. This transformation, which is based on a model of color constancy in human vision, illustrates several aspects of the characterization of image systems that are of general interest. One, this transformation addresses the goal of many currently evolving coding strategies that seek to gain high data compression by preserving only the edges and boundaries in the scene together with their contrast. Two, it stresses that a close relationship must be maintained between characterizing image systems and the appropriate model of the real world (which, in this case, requires distinguishing between the properties of the irradiance and the reflectance of natural scenes). Three, it stresses that the image-restoration algorithm must account for the perturbations in signal coding as well as in image gathering (which, in this case, includes the errors that arise in separating reflectance from irradiance). And four, it continues to foster the comparison of image systems with human vision.
RADIANCE FIELD
For the visible and near-infrared region of the optical spectrum, the radiance field L(x, y; λ) reflected by a scene and then captured by an image gathering device under normal daylight conditions may be modeled approximately by the expression (12,13)

\[ L(x, y; \lambda) = L^{s}(x, y; \lambda) + L^{a}(x, y; \lambda) + L^{p}(\lambda) \qquad (1) \]

as a function of the continuous spatial coordinates (x, y) and spectral wavelength λ. As depicted in Fig. 2, Ls(x, y; λ) is the reflected component due to the direct irradiance of the scene, La(x, y; λ) is the reflected component due to the downward scattered irradiance by the atmosphere, and Lp(λ) is the component, referred to as path radiance, that is scattered by the atmosphere directly into the image gathering device. The major omission in this model of the captured radiance field is its dependence on the irradiance and viewing geometry. This geometry has a dominant effect on the quality of the observed image, but it does not affect the design of the image system, except sometimes in determining its sensitivity requirement.

The solar radiation I(λ) at the top of the atmosphere is altered significantly by the atmospheric transmission τa(λ) along the incident path, so that the direct irradiance becomes Is(λ) = τa(λ)I(λ). Commonly, it is assumed that the surface is Lambertian, for which the reflected radiance field is everywhere independent of the direction from which the scene is viewed. For this assumption, the component Ls(x, y; λ) due to the direct irradiance Is(λ) is

\[ L^{s}(x, y; \lambda) = \frac{1}{\pi} I^{s}(\lambda)\, \tau_r^{a}(\lambda)\, \rho(x, y; \lambda) \cos\theta, \]

where ρ(x, y; λ) is the reflectance of the surface, τra(λ) is the atmospheric transmittance along the reflected path to the image gathering device, and θ is the angle between the direct irradiance and the surface normal. Similarly, the component La(x, y; λ) due to the downward scattered irradiance Ia(λ) by the atmosphere, or sky radiance, is
Figure 2. Radiance field components.
\[ L^{a}(x, y; \lambda) = \frac{1}{\pi} I^{a}(\lambda)\, \tau_r^{a}(\lambda)\, \rho(x, y; \lambda). \]
Finally, the path radiance Lp(λ) due to molecular, aerosol, and particulate scattering in the atmosphere depends only on wavelength. If the topography of the scene creates shadows by obstructing some of the irradiance from reaching the surface, so that the total irradiance across the scene becomes nonuniform, then its two components should be represented as functions of the spatial coordinates (x, y), as Is(x, y; λ) and Ia(x, y; λ). Ordinarily, a shadow grows deep only when the topography blocks the direct irradiance. The depth of this shadow depends, then, on the fraction of the diffuse sky irradiance that still reaches the surface. Furthermore, if the surface of the scene has changes in slope, then the constant angle θ should be replaced by θ(x, y). Finally, if the angular reflectance property of the scene must be accounted for, then the term ρ(x, y; λ)/π should be replaced by the bidirectional reflectance distribution function (BRDF) of the surface.

To characterize image systems in terms of communication theory, it is necessary to simplify the radiance-field model further and to imbue its salient features with statistical properties. The following model accounts for the nonuniform irradiance due to shadows, but it does not
account for the topography or angular reflectance properties of the surface. Three simplifications are made, mostly to separate spatial and spectral dependencies. One, the reflectance is represented by ρ(x, y; λ) = ρ(x, y)ρi(λ), where ρ(x, y) ≡ ρ(x, y; λ) accounts for the spatial variation of the reflectance at some wavelength λ and ρi(λ) accounts for the spectral variation of the reflectance of some patch i of the surface. Two, the irradiance is represented without distinction between the direct and scattered components by

\[ I^{sa}(x, y; \lambda) = I^{s}(x, y; \lambda)\cos\theta + I^{a}(x, y; \lambda) = I^{sa}(\lambda)\, \iota(x, y), \]

where Isa(λ) is the magnitude of the total irradiance above the surface, and ι(x, y) characterizes the spatial variation of the irradiance that reaches the surface. Finally, the atmospheric transmittance along the reflected path and the path radiance are neglected. However, their effects must be included for imaging over long distances or from the air and space (12,13). For these simplifications, Eq. (1) becomes

\[ L_i(x, y; \lambda) = K_i(\lambda)\, \iota(x, y)\, \rho(x, y), \qquad (2) \]

where

\[ K_i(\lambda) = \frac{1}{\pi} I^{sa}(\lambda)\, \rho_i(\lambda). \qquad (3) \]

If the irradiance is uniform, then ι(x, y) = 1, and if only panchromatic imaging is considered, then the subscript i is dropped.

To imbue this model of the radiance field with statistical properties that approximate those of natural scenes, the reflectance component ρ(x, y) is represented by ρ(x, y) = ρ̃(x, y) + ρ̄, where ρ̃(x, y) is the spatial variation of the reflectance around its mean ρ̄, and the irradiance component ι(x, y) is represented by ι(x, y) = ι̃(x, y) + ῑ, where ι̃(x, y) is the spatial variation of the irradiance around its mean ῑ. Hence, the spatial variation of the captured radiance field is represented by

\[ \ell(x, y) = \rho(x, y)\, \iota(x, y), \]

where |ρ̃(x, y)| ≤ min[ρ̄, 1 − ρ̄] to avoid reflectance values that are either negative or larger than unity, and |ι̃(x, y)| ≤ min[ῑ, 1 − ῑ] to avoid irradiance values that are either negative or larger than I(λ). The mathematics of communication theory constrains these reflectance and irradiance variations to be independent, stationary random processes in the area |A| centered at x = y = 0. For a sufficiently large area |A|, the power spectral densities (PSDs) of ρ̃(x, y) and ι̃(x, y), respectively, can be approximated by

\[ \hat{\Phi}_\rho(\upsilon, \omega) = \frac{1}{|A|}\, \overline{|\hat{\rho}(\upsilon, \omega)|^2}, \qquad \hat{\Phi}_\iota(\upsilon, \omega) = \frac{1}{|A|}\, \overline{|\hat{\iota}(\upsilon, \omega)|^2}, \]

where ρ̂(υ, ω) and ι̂(υ, ω) are the Fourier transforms, or spectra, of the reflectance and irradiance variations, the overbar denotes the expected value, or mean, of |(·)|² over an ensemble of scenes, and (υ, ω) are the continuous spatial frequency coordinates of the transform domain. For example, if the unit of the spatial coordinates (x, y) is meters, then the unit of the spatial frequency coordinates (υ, ω) is cycles/meter. The associated variances are

\[ \sigma_\rho^2 = \iint \hat{\Phi}_\rho(\upsilon, \omega)\, d\upsilon\, d\omega, \qquad \sigma_\iota^2 = \iint \hat{\Phi}_\iota(\upsilon, \omega)\, d\upsilon\, d\omega. \]

Then, the captured radiance field L(x, y) is also a stationary process whose mean is L̄ = Ki ῑ ρ̄, whose PSD is

\[ \hat{\Phi}_L(\upsilon, \omega) = K_i^2 \left[ \bar{\iota}^{\,2}\, \hat{\Phi}_\rho(\upsilon, \omega) + \bar{\rho}^{\,2}\, \hat{\Phi}_\iota(\upsilon, \omega) + \hat{\Phi}_\rho(\upsilon, \omega) * \hat{\Phi}_\iota(\upsilon, \omega) \right], \]

and whose variance is

\[ \sigma_L^2 = K_i^2 \left( \bar{\iota}^{\,2} \sigma_\rho^2 + \bar{\rho}^{\,2} \sigma_\iota^2 + \sigma_\iota^2 \sigma_\rho^2 \right). \]

The corresponding PSD Φ̂ρ(υ, ω) of the reflectance of natural scenes may be modeled as

\[ \hat{\Phi}_\rho(\upsilon, \omega) = \begin{cases} \dfrac{2\pi \mu_\rho^2 \sigma_\rho^2 (m - 2)}{[1 + (2\pi\mu_\rho\zeta)^2]^{m/2}}, & m > 2 \\[2ex] \dfrac{K_\rho}{[1 + (2\pi\mu_\rho\zeta)^2]^{m/2}}, & m \le 2, \end{cases} \qquad (4) \]

where ζ² = υ² + ω² and m controls the fall-off rate with increasing spatial frequency ζ (Fig. 3a and 3b). For m ≤ 2, for which the variance of the reflectance is unbounded, it is necessary to limit the integration for σρ² to the cutoff frequency ζc of the objective lens of the image gathering device, so that

\[ K_\rho = \begin{cases} \dfrac{4\pi \mu_\rho^2 \sigma_\rho^2 (1 - m/2)}{[1 + (2\pi\mu_\rho\zeta_c)^2]^{(1 - m/2)} - 1}, & m < 2 \\[2ex] \dfrac{4\pi \mu_\rho^2 \sigma_\rho^2}{\ln[1 + (2\pi\mu_\rho\zeta_c)^2]}, & m = 2. \end{cases} \]
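As a check on the reflectance PSD model of Eq. (4), the following sketch (an illustration, not code from the article; the values µρ = 3 and σρ = 0.2 are arbitrary examples) evaluates the m > 2 branch for m = 3 and verifies numerically that the PSD integrates over the spatial frequency plane to the reflectance variance σρ².

```python
import numpy as np
from scipy.integrate import quad

# Reflectance PSD of Eq. (4), m > 2 branch, as a function of the radial spatial
# frequency zeta.  The parameter values below (mu_rho, sigma_rho, m) are
# arbitrary examples, not values prescribed by the article.

def psd_reflectance(zeta, mu_rho, sigma_rho, m=3.0):
    return (2.0 * np.pi * mu_rho**2 * sigma_rho**2 * (m - 2.0)
            / (1.0 + (2.0 * np.pi * mu_rho * zeta) ** 2) ** (m / 2.0))

mu_rho, sigma_rho = 3.0, 0.2

# Integrate over the whole frequency plane in polar coordinates:
# variance = integral of PSD(zeta) * 2*pi*zeta d(zeta).
variance, _ = quad(lambda z: psd_reflectance(z, mu_rho, sigma_rho) * 2.0 * np.pi * z,
                   0.0, np.inf)

print(f"Integrated PSD: {variance:.4f}   (sigma_rho**2 = {sigma_rho**2:.4f})")
```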
Some studies (14,15) indicate that the fall-off rate is near m = 3, whereas others (16,17) indicate that this rate is near m = 2. Results in the fourth section demonstrate that informationally optimized designs of image gathering devices are highly robust to changes of m within this range. Figure 4 illustrates a specific model for the reflectance and irradiance and the resultant radiance field. The reflectance Mondrian in Fig. 4(a) consists of random polygons whose boundaries are distributed according
Figure 3. Normalized PSDs of reflectance and irradiance: (a) reflectance, m = 2; (b) reflectance, m = 3; (c) irradiance, σb = 10. Each panel plots the normalized PSD against spatial frequency (υ, ω) for several values of the mean spatial detail (µρ = 1/9, 1/3, 1, 3, and 9 in (a) and (b); µι = 1, 3, 9, 27, and 81 in (c)).

Figure 4. Model of reflectance, irradiance, and radiance field: (a) reflectance ρ(x, y), µρ = 5; (b) irradiance ι(x, y), µι = 10, σb = 10; (c) radiance field ℓ(x, y).
are distributed according to the same probability functions as those of the surface Mondrian. However, the sharp edges are blurred to mimic the penumbra of shadˆ ι (υ, ω) is closely approxiows. The corresponding PSD mated by ˆ ι (υ, ω) =
2π µ2ι σι 2 exp[−2(σb ζ )2 ], [1 + (2π µι ζ )2 ]3/2
(5)
where µι is the mean spatial detail, σι 2 is the variance of the unblurred Mondrian, and σb is the blur index that controls the sharpness of the penumbra. The correspond ing variance of ι(x, y) is σι2 = σι 2 [1 − νeν ( 12 , ν)], where 1 1 ν = 2 (σb /π µι )2 and ( 2 , ν) is the incomplete gamma function. The constraints associated with Eq. (3) are satisfied by letting the mean normalized irradiance ι = 0.5 and the standard deviation σι ≤ 0.2. Moreover, shadows tend to be larger than the detail that they hide, so that µι > µρ .
54
CHARACTERIZATION OF IMAGE SYSTEMS
IMAGE GATHERING Optical Geometry Figure 5 depicts the basic optical geometry of image gathering devices, and Table 1 lists the corresponding design parameters (19–21). The objective lens forms an image of the captured radiance field across the photodetector array, and the array transforms this image into a discrete signal. The human eye has the same configuration; the pupil acts as the objective lens, and the photoreceptors in the retina act as the photodetector array. Optical image formation is depicted in Fig. 6 for the thin-lens approximation that treats the thickness of the lens as negligible compared to the focal length f and to the object and image distances o and i , respectively. The graphical ray trace is done according to the following rule: 1. Rays parallel to the optical axis are refracted to exit through the focal point.
Picture element
Objective lens
Photodetector array Xp Yp
f
∆
i
o
p
Figure 5. Optical geometry of image-gathering device.
2. Rays through the focal point are refracted to exit parallel to the optical axis. 3. Rays through the center of the lens are passed without change in direction. The resultant geometry relates the object and image distances by the thin-lens equation 1 1 1 = − . o f i Hence, for the simplified model of the radiance field given in the previous section, the reflected radiance field Lo (xo , yo ; λ) just above the surface of the scene and the geometric image of the captured radiance field L(x, y; λ) are related by L(x, y; λ) = Lo (i x/o , i y/o ; λ). The photodetector array is normally located at or near the image plane. If all of the objects are far away, so that o ≈ ∞ (as for a telescope), then the photodetector array should be located at the focal point of the objective lens so that i = f . However, if the objects appear at various distances from the lens, then only some of them can be in focus, and the others will be out of focus to various degrees. When this occurs, then the photodetector array should be located at a distance p from the objective lens (e.g., by a focus control mechanism) that minimizes the difference = |i − p | for most (or for the most important) objects. To characterize the optical geometry of the image gathering device, it is often convenient to specify the dimensions of the photodetector array (see Fig. 7) in terms of angular coordinates centered at the objective lens. Thus, the sampling lattice (X,Y), which determines the (angular) resolution of the image gathering device, is related to the separation (Xp ,Yp ) of the photodetector-array apertures by the distance p as given for one of the dimensions by X = 2 tan−1
Table 1. Design Parameters Parameter
Symbol
Unit
Lens aperture diameter Lens focal length Lens f number Photodetector aperture spacings Photodetector aperture width Photodetector distance from lens Spectral transmittance Photodetector responsivity
D f F = f /D X p , Yp γp p τ (λ) r(λ)
m m m m m A/W
Object
Xp Xp 57.3Xp ≈ rad = deg. 2p p p
The solid angle of the instantaneous field of view (IFOV) is related to the area Ap of the photodetectorarray apertures in the same way by ≈ Ap /2p ster, √ where Ap = γp2 for the square array and Ap = 3γp2 for the hexagonal array. Normally, the difference |p − f | is small so that the distance p can be replaced by the focal length f to determine (X,Y) and in terms of the lens specifications. This simplification facilitates characterization of the optical geometry in terms of a single optical-design index, either γp /2λF or γp /2λF, where F = f /D is the f number of the objective lens whose diameter is D. Responsivity and Spatial Response (SR)
Optical axis
Image
f o
f i
Figure 6. Image formation based on graphical raytrace rules for thin-lens approximation.
The responsivity and SR of image gathering devices evolve together from the physics that governs the transformation of the captured (continuous and incoherent) radiance field L(x, y; λ) into the acquired (discrete) signal sj (x, y). The subscript j denotes the transmittance τj (λ) of the jth spectral filter in the optical path from the lens to the photodetector array for color and multispectral imaging, and the bold Cartesian pair (x, y) denotes the set of
CHARACTERIZATION OF IMAGE SYSTEMS
55
step addresses the transformation of Ij (x, y; λ) into the signal sj (x, y) that the photodetector array produces. The integration of these two steps into a single expression leads to the mathematical model of the image gathering process given in ‘‘Mathematical Model.’’ The spectral irradiance Ij (x, y; λ) across the photodetector array is related to the geometric image L(x, y; λ) of the captured radiance field by the expression
(a)
gp
Xp
gp
Xp
Ij (x, y; λ) = λ2
w
τj (λ)L(x, y; λ)|T (x − x , y − y ; λ)|2 dx dy ,
(6a) where T (x, y; λ) is the coherent spatial response of the lens given by
1/2
u 1/2
T (x, y; λ) =
−i
Square (Xp = 1)
2π P (x , y ; λ) exp i W(x , y ; λ) λ
2π (x x + y y) dx dy , λi
P (x , y ; λ) is the lens pupil function, W(x , y ; λ) is an effective path-length error, and (x , y ) are the physical spatial coordinates in the plane of the lens aperture. A variable lens transmittance shading t(x , y ; λ) can be accounted for by letting P (x , y ; λ) = t(x , y ; λ) within the field stop and P (x , y ; λ) = 0 outside. The path-length error is given by
(b)
g p′
Xp
1 λ 2 2
g ′p 3
W(x , y ; λ) =
π 2 (x + y2 ), 22i
w
where = | − p |. Finally,
2/3
1/√3
u
|T (x, y; λ)|2 dx dy = =
Regular hexagon (Xp = 1) Figure 7. Photodetector-array apertures and their sampling passbands.
spatial locations of the photodetector-array apertures. For panchromatic imaging, the subscript j is dropped, and for color imaging, the subscript becomes j = b (blue), g (green), and r (red). Responsivity is also referred to as sensitivity and SR as point-spread function (PSF), especially for photographic image systems that transform the continuous radiance field directly into a continuous image rather than a discrete signal. The expression for the transformation of L(x, y; λ) into sj (x, y) is developed in two steps. The first step addresses the transformation of L(x, y; λ) into the spectral irradiance Ij (x, y; λ) that the objective lens and a spectral filter form across the photodetector array, and the second
1 2 λ 2i
|P (x , y )|2 dx dy
k (λ)A = kˆ (λ) /λ2 , λj2 2i
where A = π D2 /4 is the area of the lens aperture, = A /2i is the solid (energy collecting) angle subtended by the lens, and k (λ) is the spectral transmittance of the lens given by k (λ) =
1 A
|P (x , y ; λ)|2 dx dy .
Hence, the transformation of the captured radiance field into the irradiance across the photodetector array given by Eq. (6a) can be modeled succinctly by the expression Ij (x, y, λ) = k (λ)τj (λ)L(x, y; λ) ∗ τ (x, y; λ),
(6b)
where the symbol ∗ denotes the continuous spatial convolution in Eq. (6a) and τ (x, y; λ) is the spatial and
56
CHARACTERIZATION OF IMAGE SYSTEMS
is the gain of the radiance-to-signal conversion, Lj (x, y) ≡ L(x, y; λj ),
spectral response of the objective lens, defined by τ (x, y; λ) =
|T (x, y; λ)|2
.
τj (x, y) ≡ τ (x, y; λj ) = τ (x, y; λj ) ∗ τpj (x, y)
|T (x, y; λ)|2 dx dy
A single member of the photodetector array positioned at an arbitrary location (x, y) transforms that portion of the irradiance Ij (x, y; λ) that its aperture captures into the electronic signal (current or voltage) sj (x, y) given by sj (x, y) =
Ij (x, y; λ)rpj (x, y; x, y; λ) dx dy dλ,
(7a)
(9)
is the SR of the image gathering device for the jth spectral band, and j = Apj /2p . These assumptions do not hold for panchromatic imaging without filters to provide narrow spectral transmittances. Hence, if the image gathering process occurs over the wide spectral responsivity r(λ), then it is necessary to make the following two approximations: (1) let λj become the spectral-radiance-and-responsivity weighted wavelength λ given by
where rpj (x, y; x, y; λ) is the linear response function that accounts for the spatial and spectral properties of this photodetector. If all of the photodetectors are identical, then their response functions are spatially translated replicas of a universal photodetector located at the origin of the (x, y) coordinate system, rpj (x, y; λ), so that rpj (x, y; x, y; λ) = rpj (x − x, y − y; λ). Moreover, if the spatial and spectral properties of this photodetector are separable, then rpj (x, y; λ) ≡ Ppj (x, y)rj (λ) = Apj τpj (−x, −y)rj (λ), where Ppj (x, y) and Apj are the aperture transmittance function and area, respectively, and τpj (x, y) and rj (λ) are the SR and spectral responsivity, respectively. Normally, the aperture is entirely transparent so that Ppj (x, y) is equal to unity inside and zero outside the area Apj . Under these conditions, Eq. (7a) becomes the electronic signal of the photodetector array given by sj (x, y) = Apj
Ij (x, y; λ)τpj (x − x, y − y)rj (λ) dx dy dλ
= Apj
Ij (x, y; λ) ∗ τpj (x, y)rj (λ) dλ.
(7b)
For color and multispectral imaging with filters that provide narrow spectral transmittances τj (λ), it may be assumed that the irradiance Ij (x, y; λ) given by Eq. (6b) is uniform over the effective spectral width λj of the integration in Eq. 7(b) and, furthermore, that λj /λj 1. Substituting Eq. (6b) in Eq. 7(b) under these conditions leads to the simplified expression sj (x, y) = Kj Lj (x, y) ∗ τj (x, y),
(7c)
λ=
where K(λ) is the expected mean value of Ki (λ) given by Eq. (3), and (b) combine K(λ) with the gain Kj of the radiance-to-signal conversion given by Eq. (8) as the gain K of the reflectance-to-signal conversion given by K=
1 A π
= A j
(8)
(10)
(11)
This expression describes in compact form both the continuous convolution of the reflectance ρ(x, y) of the scene, where the SR τ (x, y) of the image-gathering device is given by τ (x, y) = τ (x, y) ∗ τp (x, y) and the continuous-to-discrete transformation by the discrete lattice (x, y) of the photodetector array. However, the resultant perturbations due to blurring and sampling and their interrelationship emerge more clearly in the spatial frequency domain, as the following section delineates. Spatial Frequency Response (SFR) and Sampling Passband Continuous functions fˆ (υ, ω) in the spatial frequency domain whose coordinates are (υ, ω) are related to continuous functions f (x, y) in the spatial domain whose coordinates are (x, y) by the Fourier transform pair (22–25)
k (λ)τj (λ)r(λ) dλ k (λ)τj (λ)r(λ) dλ
I(λ)ρ(λ)k (λ)r(λ) dλ,
s(x, y) = Kρ(x, y) ∗ τ (x, y).
f (x, y) =
where ρ(λ) is the expected mean spectral reflectance of the scene. For these approximations, Eq. (7b) simplifies directly to the expression for the discrete electronic signal s(x, y) given by
Kj ≡ Apj
, K(λ)k (λ)r(λ) dλ
where
λK(λ)k (λ)r(λ) dλ
fˆ (υ, ω) =
fˆ (υ, ω) exp[i2π(υx + ωy)] dυ dω
f (x, y) exp[−i2π(υx + ωy)] dυ dω.
CHARACTERIZATION OF IMAGE SYSTEMS
In particular, this transform pair relates the reflectance ρ(x, y) to its spatial frequency distribution, or spectrum, ρ(υ, ˆ ω), and the SR τ (x, y) of the image gathering device to its SFR τˆ (υ, ω) = τˆ (υ, ω)τˆp (υ, ω), where convolution in the spatial domain becomes multiplication in the spatial frequency domain and τˆ (υ, ω) and τˆp (υ, ω) are the SFRs of the objective lens and photodetector apertures, respectively. If the coordinates (x, y) in the spatial domain are centered at the objective lens and specified in units of radians, then the corresponding coordinates (υ, ω) in the spatial frequency domain have the units of cycles/rad. Alternately, the coordinates (x, y) may be located at the scene or at the photodetector array and specified in units of meters, for which the corresponding coordinates (υ, ω) have the units of cycles/m. Similar relationships between the discrete signal s(x, y) given by Eq. (11a) and its corresponding periodic spectrum s˜ (υ, ω) require careful treatment of the continuous-todiscrete transformation by the photodetector array to account for insufficient sampling. To this end, first consider the relationship s(x, y) = sˆ (υ, ω) exp[i2π(υx + ωy)] dx dy between the discrete signal s(x, y) and the spectrum sˆ (υ, ω) ≡ Kρˆ (υ, ω)τˆ (υ, ω), as obtained from Eq. (11) by the previous transform and convolution properties. For the continuous-to-discrete transformation a set of discrete spatial frequency coordinates (v, w) exists, such that xv + yw = integer, which constitute a regular lattice in the spatial frequency domain. For example, for a rectangular lattice with sampling intervals (X,Y), (x, y) = (mX, nY), and (v, w) = (k/X, /Y), where (m, n; k, ) are integers. In the spatial domain, the array is covered by primitive cells (or pixels) of area |A| (= XY for the rectangular lattice). Similarly, the spatial frequency domain is covered by primitive cells ˆ = |A|−1 , each of them can be constructed as of area |B| the minimum regions whose boundaries are formed by the perpendicular bisectors of the lines that connect (v, w) to their neighboring lattice points. The primitive cell that is associated with the origin (v, w) = (0, 0) is the sampling ˆ which for the rectangular lattice, becomes passband B,
ˆ = (υ, ω); |υ| ≤ 1 , |w| ≤ 1 . B 2X 2Y These observations indicate that the exponential in the previous integral has the periodic property exp{i2π [(υ + v)x + (ω + w)y]} = exp[i2π(υx + ωy)] and, hence, that the integration over the entire frequency ˆ so that plane can be folded into the sampling passband B, s˜ (υ, ω) exp[i2π(υx + ωy)] dυ dω,
s(x, y) = ˆ B
(12a)
where s˜ (υ, ω) =
57
sˆ (υ − v, ω − w).
v,w
The spectrum s˜ (υ, ω) identified by the tilde ‘‘∼’’ represents a sum of periodic duplications of the spectrum sˆ (υ, ω) identified by the caret ‘‘∧.’’ This duplication can be expressed succinctly by s˜ (υ, ω) = sˆ (υ, ω) ∗ |ˆ|| = K ρ(υ, ˆ ω)τˆ (υ, ω) ∗ |ˆ||, where |ˆ|| ≡ |ˆ||(υ, ω) = = |A|
δ(υ − v, ω − w)
v,w
exp[−i2π(υx + ωy)]
x,y
is the Fourier transform of the sampling function ||| ≡ |||(x, y) = |A|
δ(x − x, y − y)
x,y
and δ(υ, ω) and δ(x, y) are Dirac delta functions. Now, the mathematical treatment of the continuous-to-discrete transformation can be finalized by noting that the latter representation of |ˆ|| is the Fourier series of its former representation; Eq. (12a) can be inverted and with the series s˜ (υ, ω) = |A|
s(x, y) exp[−i2π(υx + wy)]
(12b)
x,y
forms a discrete Fourier transform (DFT) pair, and Eq. (12b) presents the periodic spectrum s˜ (υ, ω) of the discrete signal s(x, y) given by Eq. (12a). The SFR, or normalized optical transfer function (OTF), of a diffraction-limited lens that has a circular aperture is given by (26–28)
τˆ (υ, ω) =
ζ ˆ∗ ζ ˆ P υ + , ω P υ − , ω 2 2
exp(iuζ υ ) dυ dω
,
Pˆ (υ , ω )Pˆ ∗ (υ , ω ) dυ dω
where Pˆ (υ, ω) =
tˆ(υ, ω), 0,
υ 2 , ω2 ≤ 1 elsewhere,
ζ = (υ 2 + ω2 )1/2 = 2λFζ, 2 π D π ≈ u≈ 2λ i 2λF 2 is the defocus parameter and tˆ(υ, ω) is the transmittance. The corresponding modulation transfer function (MTF) is |τˆ (υ, ω)|. The dimensionless spatial-frequency variables
58
CHARACTERIZATION OF IMAGE SYSTEMS
u, o, and ζ are normalized to the coherent cutoff frequency 1/2λF, where λ is the spectral-radiance-and-responsivity weighted wavelength. For a diffraction-limited lens that has a clear aperture and u = 0, the SFR simplifies to
1.0
(a)
u 0 2 4 6 8 10
0.8 t (u,w)
0.6 0.4
∧
ζ ζ 2 2 −1 ρ − 1− , ζ ≤ λF τˆ (υ, ω) = π cos λF λF λF 0, otherwise.
0.2 0 −0.2
Hopkins (26,27) was the first to formulate the SFR of a defocused diffraction-limited lens that has a clear aperture, i.e., tˆ(υ, ω) = 1. Mino and Okano (28) subsequently extended this formulation to include two circularly symmetrical apodizations, or variable lens transmittance shadings, that reduce defocus blur. SFRs for different shadings can also be obtained directly by numerical integrations. Figure 8 illustrates the SFR τˆ (υ, ω) for a clear and shaded aperture, for which
0
0.4
0.8 1.2 u,w
1.6
2.0
Clear aperture, k = 1 1.0
(b)
u 0 2 4 6 8 10
0.8
0.4
∧
t (u,w)
0.6
2 tˆ(υ, ω) = tˆ(ζ ) = 1 − ζ .
0.2 0
The corresponding ratio k of light transmitted through the shaded aperture to that through a clear aperture, i.e., the effective transmittance in Eqs. (8) and (10), is 1
k = 2
ζ |tˆ(ζ )| dζ .
Hence, the reduction in defocus blur by shading causes a loss in transmittance. The angular sensitivity of the eye’s photoreceptors (i.e., the Stiles–Crawford effect (29)) produces a result similar to that from the variable lens transmittance shading. Metcalf (30) showed that the angular sensitivity of the photoreceptors can be considered equivalent to a variable pupil aperture transmittance, and Carroll (31) showed that this variable transmittance produces a SFR similar to that in Fig. 8(b). The SR and SFR and the sampling lattice of the square photodetector array depicted in Fig. 7(a) are given by the Fourier transform pairs 1 , γ2 0,
0.4
0.8 1.2 u,w
1.6
2.0
Figure 8. SFRs of diffraction-limited lens that has clear and shaded apertures for a coherent cutoff frequency 1/2λF = 1 and several amounts of defocus u.
2
0
0
Shaded aperture, k = 0.33
τp (x, y) =
−0.2
γ γ , |y| < 2 2 elsewhere |x| <
τˆp (υ, ω) = sincγ υsincγ ω,
ˆ = 1/X 2 . The same characteristics of the where area |B| regular hexagonal array depicted in Fig. 7(b) are given by
√ 3|x| |y| |y| < γ γ 2 √ , , + < = 2 2 2 2 2 3γ 0, elsewhere γυ π υ 1 sinc √ cos γ √ − ω τˆp (υ, ω) = 3 2 3 3 υ υ π 1 × sinc γ √ + ω + cos γ √ + ω 2 2 3 3 υ πγ υ 1 ×sinc γ √ − ω + cos √ 2 3 3 υ υ 1 1 ×sinc γ √ − ω sinc γ √ + ω 2 2 3 3 τp (x, y)
and
and ||| = X 2
√ √ 3 2 3 1 ||| = δ x− X X (m + n), y − X (m − n) 2 2 2 m,n
(m + n) (m − n) |ˆ|| = . δ υ− √ ,ω − X 3X m,n
δ(x − Xm, y − Xn).
m,n
|||ˆ =
m n δ υ − ,ω − . X X m,n
The corresponding sampling passband is
The corresponding sampling passband is
ˆ = (υ, ω); |υ| < 1 , |ω| < 1 B 2X 2X
√ 3|ω| 1 |υ| 1 ˆ B = (υ, ω); |υ| < √ , + < √ 2 3X 2 3X
CHARACTERIZATION OF IMAGE SYSTEMS
t^p(u,w)
tp(x,y)
59
^ B
1.0 0.8 t^p(u,w)
0.6
−1 −0.5 x
0.5
1
1
0.5
−0
.5
−1
−1 − 0.5 u
0.4 0.2 0
0.5
1
y
1
0.5
−0.2
−1
−0
.5
−0.4
w
^t (u,w) p
tp(x,y)
0
0.4
0.8 1.2 u,w
1.6
2.0
1.6
2.0
^ B
1.0 0.8 t^p(u,w)
0.6
−1 − 0.5 x
0.5
1
1
0.5
−0
.5
−1
−1 − 0.5 u
0.4 0.2 0
0.5
1
y
1
0.5
−0
.5
w
−1
−0.2 −0.4
0
0.4
0.8 1.2 u,w
Figure 9. SRs and SFRs of square and hexagonal photodetector arrays with contiguous apertures.
√ ˆ | = 2/ 3X 2 . The areas |B| ˆ and |Bˆ | and where area |B hence the sampling densities of the square and hexagonal lattices are equal to each other, the dimensions when 1/2 √ 3/2 X = 0.93X . X and X are such that X = The previous dimensions γ , γ (= γp /p , γp /p ) and X, X (= Xp /p , Xp /p ) are given in the angular coordinates centered at the objective lens. However, these dimensions may be replaced by γp , γp and Xp , Xp for spatial coordinates centered at the photodetector array. Figure 9 illustrates the SR and SFR of the square and hexagonal photodetector arrays for γ = X = 1 and γ = X = 1.075. The associated (normalized) sampling passbands are ˆ = [(υ, ω); |υ| < 0.5, |ω| < 0.5] B ˆ = [(υ, ω); |υ| < 0.54, |w| < 0.62 − 1.73|υ|]. B It is important to observe that the SFRs of the photodetector apertures extend substantially beyond these sampling passbands and, therefore, that the objective lens (and/or an optical filter between the lens and the aperture array) must be used to adjust the relationship between the SFR τˆ (υ, ω) of the image gathering device and the ˆ Normally, the dimensions of the sampling passband B. photodetector apertures are smaller than the sampling intervals (typically, γ = 0.9X), which reduces the acquired energy of the captured radiance field and extends the SFR further beyond the sampling passband, but both only by a small margin. Figure 10 illustrates three sets of SFRs τˆ (υ, ω) of image gathering devices. The two curves in each set that are
specified by the optical-design index γ /2λF represent physical realizations of image gathering devices that have a square photodetector array lattice, one curve is for the clear lens, and the other is for the shaded lens. A comparison of these SFRs shows that those for the shaded lens are higher inside and lower outside the sampling passband, but only by a small margin. The third curve in each set that is specified by the optical-response index ζc represents a convenient approximation of these SFRs by the circularly symmetrical Gaussian shape ζ 2 τˆ (υ, ω) = exp − . ζc As these curves show, the performance of image gathering devices is constrained inevitably by the compromise between the blurring that is caused by the decrease of the SFR inside the sampling passband and the aliasing that is caused by extending the SFR beyond this passband. In 1934, Mertz and Gray (32) were the first to observe this basic constraint. Although their analysis addressed analog line-scan telephotography and television, in which sampling occurs only normal to the line-scan direction, they concluded correctly that ‘‘the complete process of transmission may be divided into two parts: (a) the reproduction of the original picture with a blurring similar to that caused in general by an optical system of only finite perfection, and (b) the superposition on it of an extraneous pattern [aliasing] not present in the original, but which is a function of both the original and the scanning system.’’
60
CHARACTERIZATION OF IMAGE SYSTEMS
1.0
(a)
Gaussian, zc = 0.5 Clear, gp /2lF = 0.8
∧
t (u,w)
0.8
Shaded, gp /2lF = 0.8
0.6 0.4 0.2
0
0.2
0.4 0.6 u, w
0.8
1.0
K sr /sp = 16 ∧
1.0
(b)
B Gaussian, zc = 0.4
∧
t (u,w)
0.8
Clear, gp /2lF = 0.4 Shaded, gp /2lF = 0.5
0.6
ˆ pq = (υ, ω); |υ| < p , |ω| < q . B 2X 2Y
0.4 0.2
0
0.2
0.4 0.6 u, w
0.8
1.0
Ksr /sp = 64 1.0
(c)
Gaussian, zc = 0.3
0.8
∧
t (u,w)
restoration to suppress the effects of the perturbations in image gathering and display. This distinction is important because image reconstruction without digital processing is constrained to produce a continuous representation of the discrete output of the image gathering device, whereas digital restoration seeks to produce a representation of the input to this device (i.e., the scene) within the limits of resolution set by the sampling lattice. As a result, image reconstruction and restoration impose different requirements on the design of the image gathering device. When these different requirements are met, then the restoration can produce significantly sharper and clearer images (10). Normally, as discussed earlier, the sampling passband ˆ is formed by the intervals (X,Y) of the photodetectorB array lattice. However, in some applications, it is possible to increase the effective sampling passband by shifting the field of view (FOV) of the image gathering device across the scene between successive images by a fraction of the intervals (X,Y). In particular, if the FOV is shifted p times in the x direction at the interval X/p and q times in the y direction at the interval Y/q, then the effective sampling passband increases to
Clear, gp /2lF = 0.34 Shaded, gp /2lF = 0.42
0.6 0.4 0.2
0
0.2
0.4 0.6 u, w
0.8
1.0
K sr /sp = 256 Figure 10. Informationally optimized SFRs for three SNRs. The SFRs are given for the clear and shaded objective lens and the square photodetector array that has contiguous apertures. Also shown are Gaussian approximations of these SFRs.
Schade (33) and Schreiber (34), among others, continued the analysis of line-scan image systems in which image gathering and display are performed without the aid of digital restoration, and Fales et al. (35) and Huck et al. (10,36) extended this analysis to photodetector-array as well as line-scan image systems that use digital
To improve the resolution of the composite image that is reproduced (reconstructed or restored) from the pq images in proportion to this increase in sampling passband, it is also necessary to extend the SFR τˆ (υ, ω) correspondingly. However, for a square photodetector array that has contiguous apertures, characterized in Fig. 9, the SFR τˆp (υ, ω) is limited to |υ|<1/X, ˜ |ω|<1/X. ˜ Hence, the blurring by these apertures constrains the improvement in resolution to a factor of 2. Further improvements in resolution can be attained only by reducing the area of the photodetector apertures to extend their SFRs, but this also reduces the SNR. Another more novel method for producing images that have spatial detail that is finer than the sampling lattice (X,Y) grew out of a purely mathematical examination of insufficient sampling and the nature of aliased signal components (7,10). This method consists of an image gathering process that acquires ϒ images; each has a different ˆ SFR τˆ (υ, ω) that extends beyond the sampling passband B and a Wiener-matrix filter that unscrambles and reassembles the within-passband and aliased signal components. Within the constraint imposed by the realizability of the SFR τˆp (υ, ω), the resolution of the restored image can be √ increased by a factor approaching 1/ ϒ times the sampling intervals, if the SNR is high. Moreover, it is necessary for the FOV to remain stationary relative to the scene or to shift by a known interval. This examination revealed that the aliased signal components actually carry information, but that this information becomes useful only if these components can be properly unscrambled and reassembled. Otherwise, the aliased signal components interfere with both image coding and restoration, consistent with their treatment as noise in the mathematical model of the image gathering process that follows.
CHARACTERIZATION OF IMAGE SYSTEMS
Mathematical Model Acquired Signal The previous assumptions and approximations about the transformation of the captured radiance field L(x, y; λ) into the acquired signal s(x, y) lead to the mathematical model of the (panchromatic) image gathering process given by the expression s(x, y) = Kρ(x, y) ∗ τ (x, y) + np (x, y),
(13a)
which accounts, in the simplest form, for the reflectanceto-signal conversion gain K, the reflectance ρ(x, y) of the scene, the SR τ (x, y) of the image gathering device, the continuous-to-discrete transformation by the photodetector-array lattice, and the discrete noise np (x, y) of the photodetectors. It is assumed that the A/D conversion is performed with a sufficiently large number of quantization levels, so that the perturbation due to this conversion is negligible and the information conveyed by the output digital signal (that has quantized amplitudes) remains equal to that of the input discrete signal (that has analog amplitudes). This assumption permits treating the perturbation due to coarse quantization as confined to lossy signal coding. The DFT of the previous expression for the acquired signal s(x, y) is the periodic spectrum s˜ (υ, ω) given by s˜ (υ, ω) = K ρ(υ, ˆ ω)τˆ (υ, ω) ∗ |ˆ|| + n˜ p (υ, ω) ˆ = K ρ(υ, ˆ ω)τˆ (υ, ω) + n(υ, ω),
Gaussian, stationary, and statistically independent. It follows from these assumptions that the aliased signal components caused by insufficient sampling can also be treated as an independent and additive noise (7–10). To complete the description of the signal s(x, y), it is necessary to define the salient statistical properties of the signal variations s(x, y) = s(x, y) − s, where s = Kρ is the mean value of the acquired signal. These properties are given by the PSDs ˜ s (υ, ω) ≡
1 |˜s(υ, ω)|2 |A|
˜ p (υ, ω) ≡ 1 |n˜ p (υ, ω)|2 |A| of the signal and the photodetector noise, respectively, and by the associated variances 2 ˜ s (υ, ω) dυ dω σs = ˆ B
σp2 =
˜ p (υ, ω) dυ dω.
ˆ B
˜ s (υ, ω) of the acquired signal is Hence, the PSD ˆ n (υ, ω), ˆ ρ (υ, ω)|τˆ (υ, ω)|2 + ˜ s (υ, ω) = K 2
(13b) (13c)
61
(14)
where ˆ n (υ, ω) = ˆ a (υ, ω) + ˜ p (υ, ω)
where ˆ n(υ, ω) = nˆ a (υ, ω) + n˜ p (υ, ω) is the spectrum of the total noise in the acquired signal. The term nˆ a (υ, ω) represents the aliased signal components caused by insufficient sampling, given by ˆ ω)τˆ (υ, ω) ∗ |ˆ||s nˆ a (υ, ω) = K ρ(υ, =K ρ(υ ˆ − v, ω − w)τˆ (υ − v, ω − w), v,w=(0,0)
where |ˆ||s ≡ |ˆ||s (υ, ω) =
δ(υ − v, ω − w)
v,w=(0,0)
represents the sampling sidebands and the term n˜ p (υ, ω) represents the photodetector noise given by n˜ p (υ, ω) = |A|
np (x, y) exp[−i2π(υx + ωy)].
x,y
It is common in the prevalent digital image processing literature to interpret the convolution in Eq. (13a) erroneously as a discrete rather than continuous process and, thereby, to miss the effects of insufficient sampling that Eq. (13c) reveals clearly. Several assumptions have to be made to characterize the image gathering process modeled by Eqs. (13) in terms of communication theory, namely, that this process is linear and isoplanatic (spatially invariant) and that the radiance field and photodetector noise amplitudes are
and ˆ a (υ, ω) = K 2 ˆ ρ (υ, ω)|τˆ (υ, ω)|2 ∗ |ˆ|| . s Encoded Signal Signal encoding may be divided into two stages, as depicted in Fig. 1. The first stage accounts for manipulations of the acquired signal that seek to extract those features of the captured radiance field that are considered perceptually most significant; this usually entails some loss of information. This stage is represented by the digital operator τe (x, y) and the quantization noise nq (x, y; κ). The operator is constrained to be linear in this and the next section; however, it is allowed to become nonlinear in the section ‘‘Retinex Coding.’’ The second stage, which is represented by the encoder, accounts for manipulations that seek to reduce redundancies in the acquired signal without loss of information. For these conditions, the encoded signal se (x, y; κ) is given by the Fourier transform pair ∗ τe (x, y) + nq (x, y; κ) se (x, y; κ) = s(x, y)
(15a)
s˜ e (υ, ω; κ) = s˜ (υ, ω)τ˜e (υ, ω) + n˜ q (υ, ω; κ),
(15b)
where the symbol defined by
∗
denotes the discrete convolution
∗ τ (x, y) = |A| s(x, y)
x ,y
s(x , y )τe (x − x , y − y )
62
CHARACTERIZATION OF IMAGE SYSTEMS
and nq (x, y; κ) is the quantization noise. If the data transmission and/or storage and the signal decoding are error-free, as is assumed here, then the decoded signal remains identical to the encoded signal. The quantization noise nq (x, y; κ) is modeled, like the photodetector noise, as Gaussian, stationary, statistically independent, and additive where the PSD is ˜ q (υ, ω; κ) ≡
1 |n˜ q (υ, ω; κ)|2 |A|
σq2 =
˜ q (υ, ω; κ) dυ dω.
For η-bit quantization, where η = log κ and κ is the number of uniformly spaced quantization levels, the PSD becomes (9,10) ˆ −1 σ 2 = |B| ˆ −1 (σs /κ)2 , ˜ q (υ, ω; κ) = |A|σq2 = |B| e q
(16)
where σs2e is the variance of se (x, y) approximated by σs2e ≈ K 2
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 dυ dω.
ˆ B
This approximation is based on the premise that the noise in s(x, y) does not contribute significantly to the variance σs2e . To characterize the joint performance of successive stages of the image system from the image gathering device onward, it is convenient to introduce the concepts of ˆ and accumulated noise n(·); ˆ throughput SFR (·) both with subscripts that reflect the last stage that is accounted for. Then, upon substituting Eq. (13c) for the acquired signal s˜ (υ, ω) in Eq. (15b), the encoded signal is given succinctly in the same form as Eq. 13(c) by ˆ ω)ˆ e (υ, ω) + nˆ e (υ, ω; κ), s˜ e (υ, ω; κ) = K ρ(υ,
(15c)
(18a)
is the accumulated noise. This noise reduces to ˆ ω)τˆe (υ, ω) nˆ e (υ, ω) = n(υ,
ˆ n (υ, ω)|τˆe (υ, ω)|2 . ˆ ne (υ, ω) =
(19a)
ˆ (19b) = E[˜se (υ, ω; κ)] − E[nˆ e (υ, ω; κ) : (υ, ω) ∈ B]. A full representation of He in terms of the probability densities ps [˜se (υ, ω; κ)] and pn [n˜ e (υ, ω; κ)] of the encoded signal and the total noise, respectively, is given by the expression He =
1 ˆ 2|B|
ps [˜se (υ, ω; κ)] ˆ B
(19c)
where log ≡ log2 ,
when the quantization noise is negligible. The corresponding PSD is
ˆ ne (υ, ω) + ˜ q (υ, ω; κ), ˆ ne (υ, ω; κ) =
He = E[se (x, y; κ)] − E[neo (x, y; κ)]
× dnˆ e (υ, ω; κ) dυ dω,
nˆ e (υ, ω; κ) = nˆ e (υ, ω) + n˜ q (υ, ω; κ)
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 + ˜ se (υ, ω; κ) = K 2 ˜ ne (υ, ω; κ),
where the first term, E[·], represents the entropy of the encoded signal, and the second term, E[·|·], represents the conditional entropy of this signal, given the reflectance. The conditional entropy is the uncertainty in se (x, y; κ) when ρ(x, y) is known. Thus, the information rate He given by this defining equation is the entropy of the encoded signal less the part of this entropy that is due to the accumulated noise. For the assumptions that have been made about image gathering and coding, the conditional entropy becomes the entropy E[neo (x, y; κ)] of those variations of the accumulated noise ne (x, y; κ), denoted by neo (x, y; κ), for which the spectrum nˆ e (υ, ω; κ) ˆ Hence, is contained within the sampling passband B. Eqs. (18) simplify to
× log ps [˜se (υ, ω; κ)]d˜se (υ, ω; κ) − pn [nˆ e (υ, ω; κ)] log pn [nˆ e (υ, ω; κ)]
ˆ e (υ, ω) = τˆ (υ, ω)τ˜e (υ, ω)
is the throughput SFR and
and
The information rate He (in bits/sample) of the encoded signal is the mutual information between those variations of the continuous reflectance ρ(x, y), denoted by ρo (x, y), for which the spectrum ρ(υ, ˆ ω) is contained within the ˆ and the digital signal se (x, y; κ), sampling passband B defined by (1)
ˆ (18b) = E[˜se (υ, ω; κ)] − E[˜se (υ, ω; κ)|ρ(υ, ˆ ω) : (υ, ω) ∈ B],
ˆ B
where
Information Rate He
He = E[se (x, y; κ)] − E[se (x, y; κ)|ρo (x, y)]
and the variance is
where
Figures of Merit
(17)
ps [˜se (υ, ω; κ)] =
exp[−|˜se (υ, ω; κ)|2 , ˜ se (υ, ω; κ)|A| ˜ se (υ, ω; κ)|A|] π
pn [nˆ e (υ, ω; κ)] =
exp[−|nˆ e (υ, ω; κ)|2 1 , ˆ ˆ ne (υ, ω; κ)|A|] π ne (υ, ω; κ)|A|
1
and d˜se (υ, ω; κ) and dnˆ e (υ, ω; κ) denote the real differential area elements comprising the real and imaginary parts of s˜ e (υ, ω; κ) and nˆ e (υ, ω; κ), respectively. Substituting the probability densities in Eq. (19c) yields the
CHARACTERIZATION OF IMAGE SYSTEMS
expressions (9,10) He =
1 ˆ 2|B|
1 = ˆ 2|B|
log ˆ B
ˆ B
˜ se (υ, ω; κ) dυ dω, ˆ ne (υ, ω; κ)
(19d)
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 K 2 log 1 + ˆ ne (υ, ω; κ)
1 He = 2
ˆ B
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 log 1 + ˆ n (υ, ω; κ)
dυ dω,
(19f )
e
ˆ ρ (υ, ω), ˆ ρ (υ, ω) = σρ−2 ˆ n (υ, ω; κ) = ˆ n (υ, ω) + ˜ q (υ, ω; κ), e e Kσρ −2 2 ˆ ˆ ˆ |τ˜e (υ, ω)|2 , ne (υ, ω) = ρ (υ, ω)|τˆ (υ, ω)| ∗ |||s + σp Kσρ −2 −2 ˆ q (υ, ω; κ) = κ , σse ˆ = 1. The corresponding spatial frequency coorand |B| dinates and the mean spatial detail parameter in the ˆ ρ (υ, ω) are dimensionless and scaled as (υ, ω) → PSD ˆ 1/2 ), and µ = µρ |B| ˆ 1/2 . The sampling interˆ 1/2 , w/|B| (υ/|B| vals are X = 1 and X = 1.075 for the square and hexagonal photodetector arrays, respectively. This expression for He can be computed as a function of (1) the rms-signal-to-rmsnoise ratio (SNR) Kσρ /σp without accounting explicitly for either the gain K given by Eq. (10) or the variances σρ2 and σp2 of the reflectance and photodetector noise, respectively, and (2) the normalized mean spatial detail µ = µρ /X and µ = µρ /(0.93X ) for the square and hexagonal lattices, respectively, without specifying the dimensions of either the mean spatial detail µρ or the sampling intervals X or X . Finally, if the SFR τ˜e (υ, ω) extends to the sampling passband and the quantization noise is negligible, then the shape of τ˜e (υ, ω) does not affect the information rate, and Eq. (19f) simplifies further to 1 2
log 1 +
ˆ B
ˆ ρ (υ, ω)|τˆ (υ, ω)|2 ˆ n (υ, ω; κ)
Kσρ 2 1 . C = log 1 + 2 σp
dυ dω. (19e)
where
H=
The theoretical upper bound of H is Shannon’s channel capacity C for a bandwidth-limited system that has white photodetector noise and an average power limitation (here the variance σρ2 ), as given by (1)
If the photodetector noise is white so that the PSD ˆ and the PSD ˜ p (υ, ω) is equal to its variance σp2 /|B| 2 ˆ ˜ q (υ, ω; κ) is equal to its variance σq /|B|, then Eq. (19e) simplifies to
dυ dω,
(20)
where Kσρ −2 2 ˆ ˆ ˆ n (υ, ω) = ρ (υ, ω)|τˆ (υ, ω)| ∗ |||s + σp This expression represents the information rate of the acquired signal as a function of the critical limiting factors that constrain the image gathering process.
63
(21)
He can reach C only when 2 ˆ −1 ˆ ˆ ρ (υ, ω) = σp |B| , (υ, ω) ∈ B 0, elsewhere, ˆ τˆ (υ, ω) = 1, (υ, ω) ∈ B 0, elsewhere. However, neither of these two conditions can occur in ˆ ρ (υ, ω) of natural scenes practice because both the PSD and the SFR τˆ (υ, ω) of image gathering devices decrease gradually with increasing spatial frequency, as depicted in Figs. 3 and 10, respectively. Hence, H always remains smaller than C. Theoretical Minimum Data Rate Ee The theoretical minimum data rate Ee (also in bits/sample) that is required to convey the information rate He may be represented by the mutual information between the acquired signal s(x, y) and the encoded signal se (x, y; κ), defined by (9,10) Ee = E[se (x, y; κ)] − E[se (x, y; κ)|s(x, y)] = E[˜se (υ, ω; κ)] − E[˜se (υ, ω; κ)|˜s(υ, ω)],
(22a) (22b)
where the second term, E[·|·] is the uncertainty of se (x, y; κ) when s(x, y) is known. Thus, the entropy Ee given by this defining equation is the entropy of the encoded signal less the part of this entropy that is due to the quantization noise. For the assumptions that have been made about the coding process, the conditional entropy becomes the entropy E[nq (x, y; κ)] of the quantization noise, and Eqs. (22) simplify to Ee = E[se (x, y; κ)] − E[nq (x, y; κ)]
(23a)
= E[˜se (υ, ω; κ)] − E[n˜ q (υ, ω; κ)].
(23b)
A full representation of Ee in terms of the probability densities ps [˜se (υ, ω; κ)] and pn [n˜ q (υ, ω; κ)] of the encoded signal and the quantization noise, respectively, follows the same steps that led from Eq. (19b) to Eq. (19d) for He , except that now n˜ q (υ, ω; κ) replaces nˆ e (υ, ω; κ). Hence, entropy Ee becomes (9,10) Ee =
1 ˆ 2|B|
log ˆ B
˜ se (υ, ω; κ) dυ dω ˜ q (υ, ω; κ)
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 K 2 ˆ ne (υ, ω) 1 + = log 1 + ˜ ˆ 2|B| q (υ, ω; κ) ˆ B
(23c) dυ dω. (23d)
64
CHARACTERIZATION OF IMAGE SYSTEMS
ˆ ˜ p (υ, ω) = σ 2 /|B| If Eq. (23d) simplifies to
and
Ee =
1 2
ˆ B
ˆ ˜ q (υ, ω; κ) = σq2 /|B|,
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 ˆ n (υ, ω) + e log 1 + (Kσρ /σse )−2 κ −2
then
dυ dω,
(23e)
ˆ = 1, (υ, ω) → (υ/|B| ˆ 1/2 , ω/ where, as for Eq. (19f), |B| 1/2 1/2 ˆ ˆ |B| ), and µ = µρ |B| . These expressions for Ee represent the entropy of completely decorrelated data. They set the theoretical lower bound on the data rate that is associated with the information rate He . Information Efficiency He /Ee The information efficiency of completely decorrelated data is defined by the ratio He /Ee . This ratio reaches its upper ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 ˆ ne (υ, ω) K 2 bound He /Ee = 1 when ˆ ne (υ, ω) ˜ q (υ, ω; κ), because then both Eq. (19e) and for He and Eq. (23d) for Ee reduce to the same expression: 1 Ee = He = ˆ 2|B|
ˆ B
ˆ ρ (υ, ω)|ˆ e (υ, ω)|2 log 1 + ˜ q (υ, ω; κ) K −2
image-gathering devices as a function of the SNR Kσρ /σp and of the optical-design index γp /2λF relative to the sampling passband, independent of the specific physical dimensions of the image gathering device. However, note that the size of the image gathering device for a given angular resolution tends to be proportional to the size of the photodetector array, so that small arrays normally lead to more compact devices. Figure 11 presents the information rate H of the acquired signal as a function of the optical-design index γp /2λF for several SNRs Kσρ /σp . The objective lens is either clear or shaded, and the photodetectorarray lattice is either square or hexagonal. The PSD ˆ ρ (υ, ω) has a fall-off rate m = 3 and a mean spa tial detail equal to the sampling interval (i.e., µ = µρ ˆ 1/2 = 1). |B| The curves reveal that H is constrained to remain below 3 bits/sample, regardless of the SNR, however high it may be, unless the optical-design index for a diffraction-limited objective lens falls within the range 0.3 < γp /2λF < 0.6. For the SNR Kσρ /σp = 256, the information rate reaches its highest value H ≈ 5 bits/sample when γp /2λF ≈ 0.35 for the clear lens or when γp /2λF ≈ 0.4 for the shaded
dυ dω. Ksr /sp
(a)
This upper bound can be approached with a small loss in He only when the aliasing and photodetector noises are small. Hence, the electro-optical design that optimizes He also optimizes He /Ee . However, a compromise always remains between He and He /Ee in selecting the number of quantization levels: He favors fine quantization, whereas He /Ee favors coarse quantization. For the two conditions for which He reaches C,
256 128 64 32 16
6 5 4 3 2 1
1 Kσρ 2 . Ee = C = log 1 + 2 σq
0 0.2 0.4 0.6 0.8 1.0 1.2
0 0.2 0.4 0.6 0.8 1.0 1.2
gp/2lF
g′p/2lF
Moreover, for σq2 = (σse /κ)2 and σs2e = K 2 σρ2 as an upper bound of signal variance, √ 3κ 1 3κ 2 . Ee = C = log 1 + 2 ≈ log 2 c c A comparison of this result with the maximum possible √ entropy Em = η = log κ shows that Ee = Em when c = 3. This constant is included implicitly in Eq. (19f) for He , Eq. (20) for H, and Eq. (23e) for Ee . Performance Electro-Optical Design Image gathering devices are normally specified by their angular resolution and SNR. The angular resolution, which is determined by the intervals (X, Y) of the sampling lattice, is normalized here, for convenience, to unity (i.e., to X = 1 and X = 1.075 for the square and hexagonal lattices, respectively), consistent with the formulation of He and Ee given by Eqs. (19f) and (23e). This normalization permits characterizing the performance of
Clear lens Ksr /sp
(b)
256 128 64 32 16
6 5 4 3 2 1
0 0.2 0.4 0.6 0.8 1.0 1.2
0 0.2 0.4 0.6 0.8 1.0 1.2
gp /2lF
g p′ /2lF Shaded lens
Figure 11. Information rate H versus the optical-design index γp /2λF for several SNRs Kσp /σp . The objective lens is either clear or shaded, and the photodetector array is either square or hexagonal and has contiguous apertures (i.e., γp = Xp ). The PSD ˆ ρ (υ, ω) is for m = 3 and µ = 1. The dot marks the design of the human eye.
CHARACTERIZATION OF IMAGE SYSTEMS
in the retina to the visual cortex within ∼1/20 s to avoid prolonging reaction time. This number of levels corresponds to a data rate of log2 40 = 5.3 bits/sample, which, in turn, is consistent with the only slightly lower information rate that this data rate appears to convey. The following results show that a design that is informationally optimized for a high SNR actually provides also high informational efficiency; and the results in ‘‘Retinex Coding’’ show, more specifically, that retinex coding, which is based on a model of color constancy in human vision, compresses the data to similar rates, even for radiance fields that have a wide dynamic range due to spatial variations in the irradiance. Information Rate and Channel Capacity Figures 12 and 13, together, characterize the information rate H as a function of the optical-response index ζc and ˆ ρ (υ, ω) has a fallthe mean spatial detail µ. The PSD off rate m = 3, as for the curves in Fig. 11, and the SFR τˆ (υ, ω) for ζc = 0.3 relative to the normalized sampling passband closely approximates that of the human eye relative to its sampling passband 1/2X = 50 cycles/deg.
4
0.3 0.4 0.6 0.8
B
0.8 0.6 0.4 0.2
0
0.5
1.0
1.5
u, w Figure 12. SFRs τˆ (υ, ω) of the image gathering device relative ˆ for unit sampling intervals. to the sampling passband B
Ksr /sp 256 128 64 32 16
5
zc
^
1.0
t^ (u, w)
lens. Actually, H is lower than this value for the clear lens and square sampling lattice, and it is higher than this value for the shaded lens and hexagonal sampling lattice, but only by a small margin for both. By comparison to these refinements, H depends far more on the SNR and on the optical-design index that controls the compromise between blurring and aliasing. Hence, the design is said to be informationally optimized when the optical-design index γp /2λF maximizes the information rate H for the available SNR Kσρ /σp . In practice, a conflict may arise between the objective lens aperture that is required to obtain a sufficiently high SNR and the aperture that produces the preferred SFR. Transmittance shading of the objective lens to shape the SFR is often undesirable because of the associated loss in SNR. There are, however, two other options. The conventional option is to use optical filters between the lens and the photodetector array. These filters are commonly referred to as prefilters in consumer color cameras (37). The other option is to degrade the SFR by permitting aberrations in the objective lens or by defocusing the lens. The idea of permitting lens aberrations to improve information rate may sometimes even reduce the cost of the optical system (by decreasing the corrective lens elements), especially if this system requires a large aperture for which diffraction-limited performance is difficult to attain. The human eye is represented most appropriately by the shaded lens and hexagonal sampling lattice. The pupil diameter is D = 2.5 mm in normal daylight and the effective focal length is f = 17 mm. Hence, the f number is F = 6.8. The photoreceptor spectral responsivity is centered around λ = 0.56 µm, and the center-to-center spacing of the contiguous foveal cones is Xp = 3 µm. Hence, the angular resolution is X = Xp /f = 0.18 mrad = 0.01° and the optical-design index for γp ≈ Xp is γp /2λF = 0.4. Barlow (11) estimates that human vision can distinguish ∼200 radiance levels, which corresponds closely to the SNR Kσρ /σp = 256 (10) for which γp /2λF = 0.4 optimizes H, as marked by a dot in Fig. 11(b). Moreover, Barlow estimates that retinal processing reduces this number of levels by a factor of ∼5 to the upper limit of ∼40 levels that each nerve fiber can transmit from a photoreceptor
65
Design 1 2 3
3 2 1
m=1
0
0.2
0.4 zc
0.6
0.8
10−2
10−1
100 m
101
102
Figure 13. Information rate H versus the optical-response index ζc and the mean ˆ ρ (υ, ω) is for spatial detail µ. The PSD m = 3 and the designs are specified in Table 2.
66
CHARACTERIZATION OF IMAGE SYSTEMS Table 2. Informationally Optimized Designs
(a)
5
Design
Kσρ /σp
ζc
H, bitsa
4
1 2 3
256 64 16
0.3 0.4 0.5
4.4 3.3 2.2
3
a
2
m = 3, µ = 1.
The information rate H versus ζc is given for several SNRs Kσρ /σp and µ = 1, and H versus µ is given for the three informationally optimized designs specified in Table 2. The curves show that H reaches its highest possible value only when both of the following conditions are met:
1 0
1. The result that the SFR τˆ (υ, ω) that optimizes H depends on the available SNR is consistent with the observation that, in one extreme, when the SNR is low, substantial blurring should be avoided because the photodetector noise would constrain the enhancement of fine spatial detail, even if aliasing were negligible and, in the other extreme, when the SNR is high, substantial aliasing should be avoided, so that this enhancement is relieved from any constraints, except those that the sampling passband inevitably imposes. 2. The result that H reaches its highest value when the sampling intervals are near the mean spatial detail of the scene is consistent with the observation that, ordinarily, it would not be possible to restore spatial details that are finer than the sampling intervals, whereas it would be possible to restore much coarser details from fewer samples. Robustness of Informationally Optimized Designs The selection of ζc that optimizes H for the available SNR, as specified in Table 2 from the curves in Fig. 13, is based
K sr / sp = 256 5
(b)
4
1. The relationship between the SFR τˆ (υ, ω) and the ˆ is chosen to optimize H for the sampling passband B available SNR Kσρ /σp . 2. The relationship between the sampling passband ˆ and the PSD ˆ ρ (υ, ω) is chosen to optimize H. B ˆ ρ (υ, ω) typical of natural scenes, the For the PSDs best match occurs when the sampling intervals are near the mean spatial detail of the scene (i.e., when X ≈ Y ≈ µρ or, equivalently, µ ≈ 1). These two conditions are consistent with those for which the information rate H given by Eq. (20) reaches Shannon’s channel capacity C. However, blurring and aliasing constrain H to values that are smaller than C by a factor of nearly 2 (i.e., H
m 4 3 2 1.4
3 2 1 0
(c)
Ksr /sp = 64
5 4 3 2 1
0
0.2
0.4 zc
0.6
0.8
Ksr /sp = 16 Figure 14. Information rate H versus the optical-response index ˆ ρ (υ, ω) when the mean ζc for four fall-off rates m of the PSD spatial detail µ = 1.
ˆ ρ (υ, ω) of the scene has on the assumption that the PSD a fall-off rate m = 3. Figure 14 shows that this selection is largely independent of this assumption over a wide range of fall-off rates (1.4 ≤ m < 4). This independence of the preferred SFR from the statistical properties of the scene is referred to as the robustness of the electrooptical design. This robustness continues for m < 1.4, for ˆ ρ (υ, ω) approaches a constant spectrum, but it which ˆ ρ (υ, ω) falls off more rapidly. ceases for m ≥ 4, for which
CHARACTERIZATION OF IMAGE SYSTEMS
However, neither of these two extremes has been recorded for natural scenes. Information Rate and Data Rate Figure 15 presents an information–entropy He (Ee ) plot that characterizes the information rate He of the encoded signal versus the associated theoretical minimum data rate, or entropy Ee for η-bit quantization. It is assumed that the encoder SFR τ˜e (υ, ω) is unity. Results are constrained to η ≥ 5 bits because of the treatment of the perturbation due to coarse quantization as an independent random noise. Within this constraint, the He (Ee ) plot illustrates the trade-off between He and Ee in selecting the number of quantization levels for encoding the acquired signal. It shows, in particular, that the design that realizes the highest information rate He also enables the encoder to realize the highest information efficiency He /Ee . This result appeals intuitively because perturbations due to aliasing and photodetector noise can be expected to interfere with signal decorrelation as well as with image restoration. Figure 16 depicts a common redundancy reduction scheme that consists of differential pulse code modulation (DPCM) and entropy coding. The DPCM encoder first predicts the acquired sample s0 based on some previous samples s1 , s2 , . . . , sn , and then it subtracts this prediction from the actual value. The decoder reverses this process by adding the prediction to the received signal. The entropy (Huffman) coding, that follows the DPCM deals with the
e=
5
ee
Design 1
4
efficient assignment of binary code words. The efficiency is gained by letting the length of the binary code word for a quantization level be inversely related to its frequency of occurrence. The nth-order conditional entropy En of the encoded data is En = −
κ−1 κ−1
...
s0 =0 s1 =0
κ−1
p(s0 , s1 . . . , sn ) log p(s0 |s1 , s2 , . . . , sn ).
sn =0
The lower bound of En (for a Gaussian signal) is Ee given by Eqs. (23), and the upper bound of En is Eo = −
κ−1
p(s0 ) log p(s0 ),
s0 =0
where p(s0 ) is the probability that a sample has the quantized amplitude so ∈ (0, 1, 2, . . . , κ − 1). The values p(so ) are usually obtained from the probability distribution (histogram) of the digital signal. If the probability distribution is uniform so that all quantization levels are equally likely, then p(so ) = 1/κ, and Eo = η is the maximum possible entropy. The conditional entropy E1 uses the past nearest sample, s1 , and E3 uses the past three nearest samples, s1 , s2 , and s3 . The prediction minimizes the linear mean-square error estimation E{[se (x, y; κ) − se (x, y; κ)]2 }. Figure 17 shows that the measurements E3 correlate closely with their theoretical lower bound Ee for random polygons such as those shown in Fig. 4(a). Entropy E3 is the average of 30 trials that have different realizations of these polygons for each measurement. Note that the entropy En of non-Gaussian signals can be lower than Ee , which is defined only for a Gaussian signal.
2
3 e
67
IMAGE RESTORATION
3
2 h=5
6
7
9 10 bits
8
Mathematical Model
1
Restored Image 0
1
2
3
4
5
6
7
8
9
10
ee Figure 15. The information–entropy He (εe , plot of the information rate He versus the associated theoretical minimum data rate ˆ ρ (υ, ω) is for m = 3 and µ = 1, εe for η-bit quantization. The PSD and the designs are specified in Table 2.
Image restoration, as depicted in Fig. 1, transforms the decoded signal se (x, y; κ) into the digital image r(x, y), defined by the DFT pair ∗ ev (x, y; κ) r(x, y; κ) = |||se (x, y; κ)
(24a)
˜ ev (υ, ω; κ). r˜ (υ, ω; κ) = s˜ e (υ, ω; κ)
(24b)
Encoder s(x,y;k)
+ −
Σ
Decoder se(x,y;k)
Entropy encoder
Quantizer
s(x,y;k) Predictor with delays
Entropy encoder
s (x,y;k)
+ Σ +
s(x,y;k) Σ
+ +
s2
s3
s1
s0
Predictor with delays
Figure 16. Differential pulse code modulation (DPCM) with entropy coding.
68
CHARACTERIZATION OF IMAGE SYSTEMS
8
E3,ee
Design h
6
1
8
4
error due to the perturbations in image gathering and signal coding and τv (x, y) and τ˜v (υ, ω) are the SR and SFR of an enhancement filter for controlling the visual quality of the Wiener restoration. Both filters are linear and spatially invariant, as delineated later. The corresponding interpolation passband is Z Z ˆ B = (υ, ω); |υ| < , |ω| < . 2X 2Y
8
E3,ee 6
2
7
The subsequent image-display process transforms the digital image r(x, y; κ) into the continuous displayed image r(x, y; κ), defined by the Fourier transform pair
4 8 3
E3,ee 6
6
r(x, y; κ) = |A|
K −1 r(x, y; κ)τd (x − x, y − y) + nd (x, y)
x,y
(25a) 4
rˆ (υ, ω; κ) = K
0 0.1
1 m
10
Figure 17. Data rate versus the mean spatial detail µ for the designs specified in Table 2 and η-bit quantization. The data rate is given by the measured entropy E3 and its theoretical lower bound εe .
se(−2)
se(−1)
se(0)
se(1)
se(2)
−1
r˜ (υ, ω; κ)τˆd (υ, ω) + nˆ d (υ, ω),
where τd (x, y) and τˆd (υ, ω) are the SR and SFR of the image-display device and nd (x, y) and nˆ d (υ, ω) are the noise and its spectrum of the image-display medium. It is assumed that the gain of the signal-to-image conversion is the inverse of the gain K of the reflectance-to-signal conversion, so that the mean reflectance of the image is equal to that of the scene. The image-display noise is modeled, like the photodetector and quantization noises, as Gaussian, stationary, statistically independent, and additive where the PSD is ˆ d (υ, ω; κ) ≡
r (0)
r (1)
(25b)
1 |nˆ d (υ, ω)|2 |A|
and the variance is
Figure 18. Interpolation between decoded signal se (x) and restored image r(x) for Z = 2.
σd2 =
˜ d (υ, ω) dυ dω.
ˆ B
The operation indicated by |||se (x, y; κ) (called zero padding) prepares the decoded signal for image restoration by interpolation, as depicted in Fig. 18, and the operator ev (x, y; κ) produces the digital image r(x, y; κ) on the interpolation lattice ||| that is Z times denser than the sampling lattice |||. The zero-padding operation is defined by
If this noise is characterized in terms of the number of distinguishable gray levels κd , like quantization, then ˆ −1 σ 2 ˆ −1 ˆ ˆ d (υ, ω) = |B| d d (υ, ω) ≈ |B|
where (x, y) = (x/Z, y/Z) and |A| = |A|/Z2 . The DFT and convolution are defined by Eqs. (12) and (15), respectively, by the substitutions (x, y) → (x, y) and |A| → |A|. The image-restoration operator is given by ∗ τv (x, y) ev (x, y; κ) = e (x, y; κ) ˜ e (υ, ω; κ)τ˜v (υ, ω), ˜ ev (υ, ω; κ) = ˜ e (υ, ω; κ) are the SR and SFR of the where e (x, y; κ) and Wiener filter that minimizes the mean-square restoration
σr κd
2 ,
where σr2 is the variance of r(x, y; κ) given by
|||se (x, y; κ) = |A||A|−1 se (x, y; κ) for (x, y) = (x, y) = 0, otherwise,
σr2
≈
σρ2
ˆ ρ (υ, ω)|ˆ r (υ, ω; κ)|2 dυ dω.
If the interpolation in the image restoration is sufficiently dense (normally, for Z ≥ 4), then the blurring and raster effects of the image-display process can be suppressed entirely, as depicted in Fig. 19. Consequently, the model of the transformation of the discrete image r(x, y; κ) into the continuous image r(x, y; κ), given by Eqs. (25), is freed from its dependence on both the SFR τˆd (υ, ω) and the interpolation lattice |||. This interpolation produces images with the highest possible fidelity and visual quality, and it also simplifies the
Figure 19. Effect of digital interpolation in image restoration and display for Z = 4.

Upon substituting Eq. (15b) for the decoded signal s̃e(υ, ω; κ) in Eq. (24b) for the digital image r̃(υ, ω; κ), the displayed image is given in the same form as Eqs. (13c) and (15c) by

r̂(υ, ω; κ) = ρ̂(υ, ω) Γ̂r(υ, ω; κ) + n̂r(υ, ω; κ) + n̂d(υ, ω),   (25c)

where

Γ̂r(υ, ω; κ) = τ̂(υ, ω) τ̃e(υ, ω) Ψ̂ev(υ, ω; κ)

is the throughput SFR from image gathering to display and

n̂r(υ, ω; κ) = K⁻¹ n̂e(υ, ω; κ) Ψ̂ev(υ, ω; κ)

is the accumulated noise due to the perturbations in image gathering and signal coding. The accumulated noise may be expressed explicitly in terms of its three components as

n̂r(υ, ω; κ) = n̂ar(υ, ω; κ) + n̂pr(υ, ω; κ) + n̂qr(υ, ω; κ),   (26)

where

n̂ar(υ, ω; κ) = [n̂a(υ, ω) τ̃e(υ, ω)] Ψ̂ev(υ, ω; κ),
n̂pr(υ, ω; κ) = [ñp(υ, ω) τ̃e(υ, ω)] Ψ̂ev(υ, ω; κ),
n̂qr(υ, ω; κ) = ñq(υ, ω; κ) Ψ̂ev(υ, ω; κ)

are the displayed aliased signal components, photodetector noise, and quantization noise, respectively. The corresponding PSD is

Φ̂r(υ, ω; κ) = Φ̂ρ(υ, ω) |Γ̂r(υ, ω; κ)|² + Φ̂nr(υ, ω; κ) + Φ̂d(υ, ω),

where

Φ̂nr(υ, ω; κ) = K⁻² Φ̂ne(υ, ω; κ) |Ψ̂ev(υ, ω; κ)|²

is the PSD of the accumulated noise.

The mean-square restoration error (MSRE) e² between the reflectance of the scene and the displayed image may be defined in two ways. Consistent with the foregoing formulations, e² may be defined as a function of both the variation and the mean of the reflectance ρ(x, y) = Δρ(x, y) + ρ̄ and the corresponding displayed image r(x, y; κ) = Δr(x, y; κ) + r̄(x, y; κ). Alternatively, the mean values may be disregarded by removing the statistical mean s̄ = Kρ̄ from the acquired signal s(x, y; κ), so that e² is defined only as a function of Δρ(x, y) and Δr(x, y; κ). The difference between these two ways is small because the Wiener filter based on the first, more inclusive, definition separates the MSRE e² linearly into two errors; one is between the variations Δρ(x, y) and Δr(x, y; κ), and the other is between the means ρ̄ and r̄(x, y; κ). Moreover, this definition allows the minimum MSRE between ρ̄ and r̄(x, y; κ) to vanish without affecting that between Δρ(x, y) and Δr(x, y; κ). The corresponding Wiener filters are identical except for several isolated spatial frequencies, as noted later. With this provision, the MSRE e² between the variation of the reflectance and the displayed image is defined by

e² = (1/|A|) ∫∫_A |ρ(x, y) − r(x, y; κ)|² dx dy = (1/|A|) ∫∫ |ρ̂(υ, ω) − r̂(υ, ω; κ)|² dυ dω.
To develop an expression for the Wiener filter Ψ̂e(υ, ω; κ) that minimizes the MSRE e², Eq. (25c) can be expressed in the form given by

r̂(υ, ω; κ) = [ρ̂(υ, ω) Γ̂e(υ, ω) + K⁻¹ n̂e(υ, ω; κ)] Λ̂e(υ, ω; κ) + n̂d(υ, ω),

where Λ̂e(υ, ω; κ) ≡ Ψ̃e(υ, ω; κ) τ̂d(υ, ω). For the assumptions that have been made about the image gathering process and signal quantization, the squared term of r̂(υ, ω; κ) becomes

(1/|A|) ⟨|r̂(υ, ω; κ)|²⟩ = K⁻² Φ̃se(υ, ω; κ) |Λ̂e(υ, ω; κ)|² + Φ̂d(υ, ω),

where Φ̃se(υ, ω; κ) is given by Eq. (17), and the cross term of ρ̂(υ, ω) and r̂(υ, ω; κ) becomes

(1/|A|) ⟨ρ̂∗(υ, ω) r̂(υ, ω; κ)⟩ = (1/|A|) ⟨|ρ̂(υ, ω)|²⟩ Γ̂e(υ, ω) Λ̂e(υ, ω; κ) = Φ̂ρ(υ, ω) Γ̂e(υ, ω) Λ̂e(υ, ω; κ).

Hence, by completing the squares, e² becomes

e² = ∫∫ { Φ̂ρ(υ, ω) [1 − Φ̂ρ(υ, ω)|Γ̂e(υ, ω)|² / (K⁻² Φ̃se(υ, ω; κ))]
 + K⁻² Φ̃se(υ, ω; κ) |Λ̂e(υ, ω; κ) − Φ̂ρ(υ, ω) Γ̂e∗(υ, ω) / (K⁻² Φ̃se(υ, ω; κ))|²
 + Φ̂d(υ, ω) } dυ dω,

where Γ̂e(υ, ω) is the throughput SFR in Eq. (15c). The absolute minimum MSRE occurs when

Ψ̂e(υ, ω; κ) = Φ̂ρ(υ, ω) Γ̂e∗(υ, ω) / [K⁻² Φ̃se(υ, ω; κ)],   (27a)

for which the second positive term under the integral vanishes for all spatial frequencies. The remaining two positive terms become the associated minimum MSRE ε² given by

ε² = ∫∫ ε̂²(υ, ω; κ) dυ dω,

where

ε̂²(υ, ω; κ) = Φ̂ρ(υ, ω)[1 − Γ̂r(υ, ω; κ)] + Φ̂d(υ, ω)

is the minimum mean-square error PSD of the displayed image and

Γ̂r(υ, ω; κ) = Γ̂e(υ, ω) Ψ̂e(υ, ω; κ) = τ̂(υ, ω) τ̃e(υ, ω) Ψ̂e(υ, ω; κ)

is the throughput SFR of the image system without the enhancement filter τ̃v(υ, ω). Finally, substituting Eq. (16) for the PSD Φ̃se(υ, ω; κ) in Eq. (27a) yields the Wiener filter

Ψ̂e(υ, ω; κ) = Φ̂ρ(υ, ω) Γ̂e∗(υ, ω) / [Φ̂ρ(υ, ω)|Γ̂e(υ, ω)|² + K⁻² Φ̂ne(υ, ω; κ)],   (27b)

where Φ̂ne(υ, ω; κ) is the PSD of the accumulated noise in the encoded signal given by Eq. (16). For τ̃e(υ, ω) = 1, Φ̃p(υ, ω) = σp²/|B̂|, and Φ̃q(υ, ω; κ) = σq²/|B̂|, Eq. (27b) simplifies to

Ψ̂(υ, ω; κ) = Φ̂ρ(υ, ω) τ̂∗(υ, ω) / [Φ̂ρ(υ, ω)|τ̂(υ, ω)|² ∗ |ˆ|| + (Kσρ/σp)⁻² + (Kσρ/σs)⁻² κ⁻²],   (27c)

where (υ, ω) → (υ/|B̂|^1/2, ω/|B̂|^1/2). The corresponding sampling intervals are X = 1 and X = 1.075 for the square and hexagonal photodetector-array lattices, respectively. This filter can be implemented as a function of (1) the SNR Kσρ/σp, without accounting explicitly for either the gain K or the variances σρ² and σp², and (2) the normalized spatial detail µ = µρ/X or µ = µρ/(0.93X) relative to the square and hexagonal lattices, respectively, without specifying either the mean spatial detail µρ or the sampling intervals X or X′.

Figure 20 illustrates the Wiener filter Ψ̂(υ, ω; κ) and the corresponding throughput SFR Γ̂r(υ, ω; κ) = τ̂(υ, ω) Ψ̂(υ, ω; κ) for the three designs of the image gathering device specified in Table 2. It is assumed that the PSD Φ̂ρ(υ, ω) has a fall-off rate m = 3 and a mean spatial detail µ = 1. (This assumption is adhered to for the subsequent image restorations, regardless of the target.) Note that Γ̂r(υ, ω; κ) approaches the ideal SFR of the classical communication channel, which is unity inside the sampling passband and zero outside, as the accumulated noise becomes negligible.

Figure 20. SFRs of image gathering and restoration: (a) image gathering, (b) Wiener filter, and (c) throughput. The PSD Φ̂ρ(υ, ω) has m = 3 and µ = 1, and the designs are specified in Table 2.
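A simplified filter of the form of Eq. (27c) is easy to evaluate on a sampled frequency grid. The Python sketch below is an assumption-laden illustration rather than the article's implementation: the aliased sidebands of the signal term are omitted, the quantization term is approximated by κ⁻², and the array names (phi_rho, tau_hat) are invented for this example.

```python
import numpy as np

def wiener_sfr(phi_rho, tau_hat, snr, kappa=np.inf):
    """Simplified Wiener-filter SFR in the spirit of Eq. (27c): scene PSD times the
    conjugate image-gathering SFR, divided by the signal-plus-noise PSD.
    phi_rho and tau_hat are 2-D arrays on one normalized-frequency grid;
    snr is K*sigma_rho/sigma_p, kappa the number of quantization levels."""
    signal = phi_rho * np.abs(tau_hat) ** 2      # aliased sidebands omitted here
    noise = snr ** -2 + (1.0 / kappa) ** 2       # photodetector + (approx.) quantization
    return phi_rho * np.conj(tau_hat) / (signal + noise)
```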
Without or with only coarse interpolation (1 ≤ Z < 4), the minimization of the MSRE e² yields a less optimal filter that, as given in Refs. 8 and 10, is a function of both the SFR τ̂d(υ, ω) and the interpolation lattice |ˆ||′. Moreover, if the mean s̄ is not removed from the acquired signal s(x, y), then the MSRE between ρ̄ and r̄(x, y; κ) can vanish only when Ψ̂e(0, 0; κ) = 1 and Ψ̂e(υ, ω; κ) = 0 for (υ, ω) ∉ B̂. The first condition is approached in the limit as the PSD Φ̂ne(υ, ω; κ) vanishes, and the second condition is generally valid because Ψ̂e(υ, ω; κ) decreases rapidly outside the passband B̂.

The Wiener restoration produces images that normally have a high resolution and sharpness. However, these images also tend to contain visually annoying defects due to the accumulated noise n̂r(υ, ω; κ) and the ringing near sharp edges (Gibbs phenomenon). Hence, it is often desirable to combine this restoration with an enhancement filter that gives the user some control over the trade-off among fidelity, resolution, sharpness, and clarity. A suitable enhancement filter is given by (8–10)

τ̂v(υ, ω) = exp[−(ζ/ζb)²] + ξ (2π)^1/2 (ζ/ζg)² exp[−(ζ/ζg)²],   (28)

where ζb controls the roll-off of the SFR of the Wiener filter and ξ controls the amount of edge enhancement. The second term of τ̂v(υ, ω) contains the SFR of the Laplacian-of-Gaussian (∇²G) operator, given by

τ̂g(υ, ω) = (2πζ)² exp[−(ζ/ζg)²],   (29)

where ζg controls the effective width of this bandpass filter. Marr and Hildreth (38,39) introduced this operator as a model of edge enhancement in human vision.
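A filter of the form of Eq. (28) can be generated directly on a frequency grid. The Python sketch below is illustrative only; the default parameter values and the function and argument names are assumptions, not values from the article.

```python
import numpy as np

def enhancement_sfr(u, w, zeta_b=0.4, zeta_g=0.3, xi=1.0):
    """Enhancement-filter SFR patterned after Eq. (28): a Gaussian roll-off term
    plus a scaled Laplacian-of-Gaussian-like bandpass term for edge enhancement.
    u, w are normalized spatial-frequency grids (e.g., from np.meshgrid)."""
    zeta = np.hypot(u, w)                        # radial spatial frequency
    rolloff = np.exp(-(zeta / zeta_b) ** 2)      # controls Wiener-filter roll-off
    bandpass = (zeta / zeta_g) ** 2 * np.exp(-(zeta / zeta_g) ** 2)
    return rolloff + xi * np.sqrt(2.0 * np.pi) * bandpass
```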
Visual Signal

Two coordinate transformations must be accounted for to model the visual signal s0(x, y; κ) that the eye of the observer conveys from the displayed image to the higher levels of the brain, as depicted in Fig. 1. The first transformation is scaling from the normalized spatial frequencies (υ, ω) (in cycles/sample) to the display spatial frequencies (v, w) (in cycles/mm), given by

γd υ, γd ω → v, w,

where γd is the number of samples/mm that the displayed image contains. The Wiener filter extends essentially only to the sampling passband B̂ of the image gathering device. For example, for a square lattice, the cutoff frequency of the displayed image is limited to ν̂d = γd ν̂ = 0.5 γd cycles/mm. Hence, scaling from (υ, ω) to (v, w) may also be written as

2ν̂d υ, 2ν̂d ω → v, w.

The second transformation is scaling from the observer spatial frequencies (υ0, ω0) (again in cycles/sample) to the display spatial frequencies (v, w), given by

γ0 υ0, γ0 ω0 → v, w,

where γ0 is the number of samples/mm that the foveal photoreceptors acquire from the displayed image. Hence, scaling from the observer spatial frequencies υ0, ω0 directly to the normalized spatial frequencies υ, ω is

γ0 υ0, γ0 ω0 → γd υ, γd ω,
υ0, ω0 → (γd/γ0) υ, (γd/γ0) ω.

Because the observer cutoff frequency ν̂0 = 0.5 cycle/sample = 0.5 γ0 cycles/mm and the displayed image cutoff frequency ν̂d = 0.5 cycle/sample = 0.5 γd cycles/mm, the previous scaling can be rewritten as

υ0, ω0 → (ν̂d/ν̂0) υ, (ν̂d/ν̂0) ω.
From these coordinate transformations, the visual signal s0(x, y; κ) can be expressed, like Eqs. (13) for the acquired signal s(x, y), by the Fourier transform pair

s0(x, y; κ) = K0 r(x, y; κ) ∗ τ0(x, y) + nq0(x, y),   (30a)

s̃0(υ, ω; κ) = K0 [r̂(υ, ω; κ) τ̃0(υ, ω)] ∗ |ˆ||0 + ñq0(υ, ω),   (30b)

where K0 is the gain of the (assumed) linear image-to-signal conversion, τ0(x, y) and τ̂0(υ, ω) are the SR and SFR, respectively, of the human eye, |||0 and |ˆ||0 denote the photoreceptor sampling lattice and its transform, respectively, and nq0(x, y) and ñq0(υ, ω) are the effective noise and its transform, respectively.

If the observer is free to view the displayed image closely, then the sampling passband B̂0 of the eye normally encompasses the interpolation passband B̂d of the displayed image, so that the perturbations in viewing the image (unlike viewing the scene directly) are limited to blurring and noise. Hence, upon substituting Eq. (25c) for the displayed image in Eq. (30b), the visual signal is given in the same form as Eqs. (13c), (15c), and (25c) by

ŝ0(υ, ω; κ) = K0 ρ̂(υ, ω) Γ̂0(υ, ω; κ) + n̂0(υ, ω; κ) + ñq0(υ, ω),   (31)

where

Γ̂0(υ, ω; κ) = τ̂(υ, ω) τ̃e(υ, ω) Ψ̂ev(υ, ω; κ) τ̂0(υ, ω)

is the throughput SFR from image gathering to the retina of the human eye and

n̂0(υ, ω; κ) = K0 [n̂r(υ, ω; κ) + n̂d(υ, ω)] τ̂0(υ, ω)

is the accumulated noise due to the perturbations in the image system. The corresponding PSD of this signal is

Φ̂s0(υ, ω; κ) = K0² Φ̂ρ(υ, ω)|Γ̂0(υ, ω; κ)|² + Φ̂n0(υ, ω; κ) + Φ̃q0(υ, ω),   (32)

where

Φ̂n0(υ, ω; κ) = K0² [Φ̂nr(υ, ω; κ) + Φ̂d(υ, ω)] |τ̂0(υ, ω)|²

is the PSD of the accumulated noise, and Φ̃q0(υ, ω) is the PSD of ñq0(υ, ω). In normal daylight, the information rate is limited by the number of discrete levels κ0 that each nerve fiber can transmit from the retina to the visual cortex, rather than by the photon-limited noise from the photoreceptors. Hence, the PSD Φ̃q0(υ, ω) may be given, like that for quantization, by the expression

Φ̃q0(υ, ω) = |B̂|⁻¹ σq0² ≈ |B̂|⁻¹ (σs0/κ0)²,

where σs0² is the variance of s0(x, y; κ) given by

σs0² ≈ K0² σρ² ∫∫ Φ̂ρ(υ, ω)|Γ̂0(υ, ω; κ)|² dυ dω.

Figures of Merit

Information Rate Hd

Similarly to the information rate He of the encoded signal se(x, y; κ) given by Eq. (18), the information rate Hd (in bits/sample) of the restored image r(x, y; κ) is defined by

Hd = E[r(x, y; κ)] − E[r(x, y; κ)|ρ0(x, y)]   (33a)
   = E[r̂(υ, ω; κ)] − E[r̂(υ, ω; κ)|ρ̂(υ, ω) : (υ, ω) ∈ B̂].   (33b)

For the assumptions that have been made about image gathering, coding, and display, the conditional entropy becomes the entropy E[nr0(x, y; κ) + nd(x, y)] of those variations of the accumulated noise nr(x, y; κ), denoted by nr0(x, y; κ), for which the spectrum n̂r(υ, ω; κ) is contained within the sampling passband, plus the noise nd(x, y) of the image-display medium. Hence, Eqs. (33) simplify to

Hd = E[r(x, y; κ)] − E[nr0(x, y; κ) + nd(x, y)]   (34a)
   = E[r̂(υ, ω; κ)] − E[n̂r(υ, ω; κ) : (υ, ω) ∈ B̂ + n̂d(υ, ω)].   (34b)

A full representation of Hd follows the same general steps that led from Eq. (19b) to Eq. (19d) for He, so that Hd becomes

Hd = (1/2|B̂|) ∫∫_B̂ log[1 + Φ̂ρ(υ, ω)|Γ̂r(υ, ω; κ)|² / (Φ̂nr(υ, ω; κ) + Φ̂d(υ, ω))] dυ dω.

For τ̃e(υ, ω) = 1, Φ̃p(υ, ω) = σp²/|B̂|, Φ̃q(υ, ω; κ) = σq²/|B̂|, τ̂v(υ, ω) = 1, and Φ̂d(υ, ω) = σd²/|B̂|, Eq. (34b) simplifies to

Hd = (1/2) ∫∫_B̂ log[1 + Φ̂ρ(υ, ω)|Γ̂r(υ, ω; κ)|² / (Φ̂nr(υ, ω; κ) + (Kσρ/σr)⁻² κd⁻²)] dυ dω,   (34c)

where

Φ̂nr(υ, ω; κ) = [Φ̂ρ(υ, ω)|τ̂(υ, ω)|² ∗ |ˆ||s + (Kσρ/σp)⁻² + (Kσρ/σse)⁻² κ⁻²] |Ψ̂(υ, ω; κ)|²,

|B̂| = 1, (υ, ω) → (υ/|B̂|^1/2, ω/|B̂|^1/2), and, as for Eq. (19f), µ = µρ|B̂|^1/2.

The information rate Hd given by Eqs. (34) is independent of the SFR τ̂d(υ, ω) of the image-display device only if the interpolation is sufficiently dense (Z ≥ 4), so that τ̂d(υ, ω) remains near unity within the sampling passband B̂ (to avoid blurring) and yet diminishes to zero beyond the interpolation passband B̂′ (to avoid raster effects). Under this condition, the information rate of the displayed image differs from that of the digital image only by the perturbation due to the added image-display noise, as accounted for by the PSD Φ̂d(υ, ω). Moreover, if this noise becomes negligible and τ̂v(υ, ω), like Ψ̂e(υ, ω; κ), extends to the sampling passband B̂, then the expressions for Hd reduce to those given by Eqs. (19) for He. Otherwise, if the interpolation is constrained (1 ≤ Z < 4), then it becomes necessary to account for the blurring and raster effects in the image-display process, as delineated in Ref. 10. The total information content of an image restored from N samples is NHd bits. If the image is displayed with a density of γd samples/mm, so that the area of the image |Ad| = Nγd⁻², then the information rate Hd (in bits/sample) can be interpreted as the information density |Ad|⁻¹ NHd = γd² Hd (in bits/mm²) of the displayed image.
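Information rates of this Shannon type can be estimated numerically on a sampled frequency grid. The Python sketch below is a generic illustration under stated assumptions (the arrays, the boolean passband mask, and the normalization |B̂| = 1 are choices made for this example, not the article's code).

```python
import numpy as np

def information_rate(phi_rho, gamma_r, phi_noise, passband):
    """Estimate an information rate such as Hd in Eqs. (34): one-half the mean,
    over the sampling passband, of log2(1 + signal PSD / noise PSD).
    All inputs are 2-D arrays on one normalized-frequency grid; `passband` is a
    boolean mask for B (with |B| normalized to 1, the integral becomes a mean)."""
    snr_spectrum = phi_rho * np.abs(gamma_r) ** 2 / phi_noise
    integrand = 0.5 * np.log2(1.0 + snr_spectrum)
    return float(integrand[passband].mean())
```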
Maximum Realizable Fidelity Fd

The fidelity F is a measure of the similarity between the variation of the reflectance of the scene and the displayed image, ρ(x, y) and r(x, y; κ), respectively, defined by

F = 1 − [∫∫_A |ρ(x, y) − r(x, y; κ)|² dx dy] / [∫∫_A |ρ(x, y)|² dx dy] = 1 − σρ⁻² ∫∫ ê²(υ, ω; κ) dυ dω.   (35)

For images restored with the Wiener filter Ψ̂e(υ, ω; κ), the mean-square error PSD ê²(υ, ω; κ) becomes the minimum mean-square error PSD ε̂²(υ, ω; κ). Hence, the maximum realizable fidelity Fd is obtained by substituting ε̂(υ, ω; κ) for ê(υ, ω; κ) in Eq. (35), which yields

Fd = σρ⁻² ∫∫ [Φ̂ρ(υ, ω) Γ̂r(υ, ω; κ) − Φ̂d(υ, ω)] dυ dω.   (36a)

For Φ̃p(υ, ω) = σp²/|B̂|, τ̃e(υ, ω) = 1, Φ̃q(υ, ω; κ) = σq²/|B̂|, and Φ̂d(υ, ω) = σd²/|B̂|, Eq. (36a) simplifies to

Fd = ∫∫ Φ̂ρ(υ, ω) Γ̂r(υ, ω; κ) dυ dω − (Kσρ/σr)⁻² κd⁻²,   (36b)

where (υ, ω) → (υ/|B̂|^1/2, ω/|B̂|^1/2) and µ = µρ|B̂|^1/2. The integrand Ĥe(υ, ω; κ) of the information rate He given by Eq. (19d) and the throughput response Γ̂r(υ, ω; κ) are related to each other by the expressions

Ĥe(υ, ω; κ) = −log[1 − Γ̂r(υ, ω; κ)],
Γ̂r(υ, ω; κ) = 1 − 2^(−Ĥe(υ,ω;κ)).

Hence, Eq. (36a) can be expressed in terms of Ĥe(υ, ω; κ) as

Fd = σρ⁻² ∫∫ {Φ̂ρ(υ, ω)[1 − 2^(−Ĥe(υ,ω;κ))] − Φ̂d(υ, ω)} dυ dω.   (36c)

In some applications, the goal of image processing is to enhance or extract a particular feature of the scene, not to produce images that are perceptually similar to the scene. For example, in robot vision, the first step often is to enhance edges with a filter like the ∇²G operator τ̂g(υ, ω) given by Eq. (29). Hence, to characterize image gathering and processing for this goal, it is appropriate to replace ρ(x, y) in Eq. (35) by ρ(x, y) ∗ τg(x, y) as the desired representation of the scene, for which the maximum realizable fidelity becomes

Fg = σρg⁻² σρ² ∫∫ Φ̂ρ(υ, ω)|τ̂g(υ, ω)|² Γ̂r(υ, ω; κ) dυ dω,

where

σρg² = σρ² ∫∫ Φ̂ρ(υ, ω)|τ̂g(υ, ω)|² dυ dω.

Visual Information Rate H0

For the assumption that the noise in early human vision can be treated as independent and additive, the visual information rate H0 is defined, analogously to Eqs. (19b) and (34b), by

H0 = E[ŝ0(υ, ω; κ)] − E[n̂0(υ, ω; κ) + ñq0(υ, ω) : (υ, ω) ∈ B̂]   (37a)
   = (1/2) ∫∫_B̂ log[1 + K0² Φ̂ρ(υ, ω)|Γ̂0(υ, ω; κ)|² / (Φ̂n0(υ, ω; κ) + Φ̃q0(υ, ω))] dυ dω.   (37b)

For Φ̃p(υ, ω) = σp²/|B̂|, τ̃e(υ, ω) = 1, Φ̃q(υ, ω; κ) = σq²/|B̂|, and τ̃v(υ, ω) = 1, Eq. (37b) simplifies to

H0 = (1/2) ∫∫_B̂ log[1 + Φ̂ρ(υ, ω)|Γ̂0(υ, ω; κ)|² / (Φ̂n0(υ, ω; κ) + (K0σρ/σs0)⁻² κ0⁻²)] dυ dω,   (37c)

where

Φ̂n0(υ, ω; κ) = [Φ̂nr(υ, ω; κ) + (Kσρ/σr)⁻² κd⁻²] |τ̂0(υ, ω)|²,

and |B̂| = 1, (υ, ω) → (υ/|B̂|^1/2, ω/|B̂|^1/2), and µ = µρ|B̂|^1/2.

The coordinates of foveal human vision are given by the angular sampling intervals X0 that range from ∼0.01 to 0.015°, for which the cutoff frequency ranges from ν̂0 = 50 to 32 cycles/deg (11). The SFR τ̂0(υ, ω) of the eye may be represented by the Gaussian shape whose optical-response index ζc = 0.3 for the normalized cutoff frequency ν̂ = 0.5 or, for example, whose index ζc = 19 for the cutoff frequency ν̂0 = 32 cycles/deg. Finally, the effective noise may be estimated from the constraint imposed by the κ0 ≈ 40 levels that each nerve fiber can transmit from the retina to the visual cortex.
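The fidelity measures above are also simple quadratures on the frequency grid. The Python sketch below is an illustration of the form of Eq. (36a) only; the argument names, default values, and grid conventions are assumptions introduced for this example.

```python
import numpy as np

def max_realizable_fidelity(phi_rho, gamma_r, phi_d=0.0, sigma_rho2=1.0, cell_area=1.0):
    """Maximum realizable fidelity in the spirit of Eq. (36a): integrate the scene
    PSD weighted by the throughput SFR, minus the display-noise PSD, normalized by
    the reflectance variance. cell_area is the d(upsilon) d(omega) grid element."""
    integrand = phi_rho * gamma_r - phi_d
    return float(np.sum(integrand) * cell_area / sigma_rho2)
```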
Performance

Information Rate, Fidelity, and Robustness

The PSD Φ̂ρ(υ, ω) of a scene is seldom known when images are restored. Moreover, even if the fall-off rate m of this PSD were known, the mean spatial detail µ(= µρ|B̂|^1/2) would usually remain uncertain because it depends on the surface properties of the scene and the angular resolution of the image gathering device and also on the viewing distance. In practice, therefore, image restorations normally have to rely on estimates of the statistical properties of scenes rather than on their actual properties. The tolerance of the quality of the restored image to errors in these estimates is referred to as the robustness of the image restoration.

Figure 21 presents the information rate H and the corresponding maximum realizable fidelity F for matched and mismatched Wiener restorations. These restorations represent the digital image r(x, y; κ) or, equivalently, the continuous image r(x, y; κ) displayed on a noise-free medium. The matched restorations use the correct values of m and µ, whereas the mismatched restorations use incorrect estimates of m and/or µ. These curves reveal that the informationally optimized designs produce the highest fidelity and robustness and that both improve with increasing H. Ordinarily, to assure both high fidelity and robustness, it is necessary for the SNR Kσρ/σp to be high, above ∼64, so that an information rate H ≥ 3 bits/sample can be attained. Moreover, these curves reveal that, ordinarily, one cannot go far wrong by assuming that the mean spatial detail is equal to the sampling intervals (i.e., µ = 1). This observation appeals intuitively because spatial detail that is much smaller cannot be resolved and detail that is larger is not degraded substantially by blurring and aliasing.

Figure 21. Information rate H and maximum realizable fidelity F versus the optical-response index ζc for two SNRs, Kσρ/σp = 256 (a) and 16 (b). F is given for the matched (m = 3, µ = 1) and five mismatched Wiener restorations.

Information Rate and Visual Quality

Figure 22 presents the information rate Hd of the displayed image for the three informationally optimized designs specified in Table 2. These curves account for the loss in the information rate that occurs in the transformation from the digital image r(x, y; κ) to the continuous image r(x, y; κ) due to limitations in the number of distinguishable gray levels of the image-display medium. For the κd ≈ 44 levels of the following halftone prints, this loss is small for designs 2 and 3 but substantial for design 1. Hence, these prints cannot convey fully the improvement in image quality that most other image-display media can produce with increasing information rate above H ≈ 3 bits/sample.

Figure 23 presents two targets that are used to assess the visual quality of restored images. The resolution wedges, which expand linearly in width proportionally to the distance from the center, represent a periodic structure that is highly sensitive to the perceptual defects due to blurring and aliasing, and the random polygons, which were characterized previously in the section ‘‘Radiance Field,’’ represent a particular realization of scenes, where the PSD Φ̂ρ(υ, ω) is given by Eq. (4) and m = 3. The width of the resolution wedges is equal to unit sampling intervals for µ = 1, which represents the finest spatial detail that
can ordinarily be restored. Both targets are displayed with 256 × 256 discrete elements in a 4 × 4 cm area. Even though the resultant granularity cannot be resolved without magnification, it causes a perceptual distortion of spatial detail that is smaller than the sampling intervals, as is most evident in the resolution wedges for µ < 1. This distortion is an artifact of the discrete halftone prints rather than the result of perturbation in image systems.

Figure 22. Information rate Hd of the displayed image versus the number of distinguishable gray levels κd. The PSD Φ̂ρ(υ, ω) has m = 3 and µ = 1, and the three designs are specified in Table 2.

Figure 23. Targets: (a) resolution wedges; (b) random polygons, m = 5.

Figures 24 and 25 present Wiener restorations and their three components, as delineated by Eq. (25c). The images are formed from a set of 64 × 64 samples with Z = 4 interpolation, so that their display has the same density of discrete elements as the targets. The contrast of the aliased and photodetector noise components has been increased by a factor of 4 to improve their visibility. The extraneous structure that aliasing produces in the images of the resolution wedges is commonly referred to as a Moiré pattern. The corresponding distortion in the images of the random polygons, however, emerges more subtly as jagged (or staircase) edges. If these images were displayed in a smaller format, then the jagged edges could not be resolved by the observer and would appear to be the result of blurring instead of aliasing. Therefore, aliasing has often been overlooked as a significant source of image degradation.

Finally, Fig. 26 presents images for which the Wiener restorations given in Figs. 24 and 25 are enhanced for visual quality by suppressing some of the defects due to aliasing and ringing to improve clarity at the cost of a small loss in sharpness. The images are produced again from 64 × 64 pixels, but they are displayed in three different formats. The smallest format, whose density is 64 pixels/cm, corresponds to a display of 256 × 256 pixels in a 4 × 4 cm area, which is typical of the images found in the prevalent digital image-processing literature. As can be observed, this format leads to three indistinguishable images that hide the distortions due to perturbations in image gathering and also cause a significant loss in resolution. In general, if the images are displayed in a format that is large enough to reveal the finest spatial detail near the limit of resolution of the image system, then distortions due to the perturbations in this system also become visible. When no effort is spared to reduce these distortions without a perceptual loss of resolution and sharpness, then it becomes most strongly evident that the best visual quality with which images can be restored improves with increasing information rate, even after the fidelity has essentially reached its upper bound. This improvement continues until it is gradually ended by the unavoidable compromise among sharpness, aliasing, and ringing, as well as by the granularity of the image display.

RETINEX CODING

Mathematical Model

The goal of retinex coding is to suppress the spatial variation of the irradiance (e.g., shadows) across natural scenes and preserve only the spatial detail and reflectance (or lightness) of the surface itself. This coding scheme uses an algorithm for enhancing color images that Jobson et al. (40,41) developed from one (42) of several models (43) of color constancy in human vision. However, to characterize retinex coding in terms of communication theory, it is necessary first to develop an approximate small-signal linear representation of the nonlinear retinex-encoded signal that is similar in its general form to Eq. (15c) for the linearly encoded signal.
The inclusion of retinex processing in an informationally optimized image system involves two steps (44):

1. Signal coding with the nonlinear retinex transformation to
   a. enhance edges sharply and clearly, like the Mach bands (45) in human vision, free from the compromise between sharpness and ringing due to band-limited linear processing, and
   b. preserve the contrast across these edges relative to the mean reflectance of their neighborhoods.
2. Image restoration with a linear Wiener filter to minimize the mean-square error due to the perturbations by
   a. blurring, aliasing, and photodetector noise in image gathering, and
   b. errors in suppressing the irradiance and preserving the reflectance.

Figure 24. Images restored with the Wiener filter [r(x, y; κ)] and their three components [ρ(x, y) ∗ Γr(x, y; κ), 4 nar(x, y; κ), and 4 npr(x, y; κ)] for designs 1 (top) and 3 (bottom).

Figure 25. Images restored with the Wiener filter [r(x, y; κ)] and their three components [ρ(x, y) ∗ Γr(x, y; κ), 4 nar(x, y; κ), and 4 npr(x, y; κ)] for designs 1 (top) and 3 (bottom).
Figure 26. Images restored with the Wiener filter and enhanced for visual quality, for designs 1 (a) and 3 (b). The large, medium, and small formats contain 16, 32, and 64 pixels/cm, respectively.

Retinex coding transforms the acquired signal s(x, y) into the encoded signal sret(x, y; κ), defined by

sret(x, y; κ) = sret(x, y) + nq(x, y; κ),   (38)

where

sret(x, y) = log[s(x, y)] − log[s(x, y) ∗ τre(x, y)].   (39a)

The so-called surround function τre(x, y) has the Gaussian shape given by the Fourier transform pair

τre(x, y) = π ζe² exp[−(π ζe r)²],
τ̃re(υ, ω) = τ̂re(υ, ω) ∗ |ˆ|| ≈ τ̂re(υ, ω) = exp[−(ζ/ζe)²] if τ̂re(υ, ω) is band-limited,

where r² = x² + y² and ζe is the surround index that controls the width of the SR τre(x, y) (Fig. 27b). The expression for sret(x, y) may be rewritten as follows:

sret(x, y) = log[ s(x, y) / (s(x, y) ∗ τre(x, y)) ] = log[1 + sre(x, y)],   (39b)

where

sre(x, y) = [s(x, y) ∗ Δτre(x, y)] / [s(x, y) ∗ τre(x, y)],   (40)

where Δτre(x, y) = δ(x, y) − τre(x, y), and δ(x, y) is the Kronecker delta (Fig. 27c). This formulation suggests that sre(x, y) is a reasonable small-signal representation of the retinex transformation sret(x, y) when the surround, or support, of the SR τre(x, y) and/or the variations of the acquired signal relative to its mean are sufficiently small, so that

|s(x, y) ∗ Δτre(x, y)| / |s(x, y) ∗ τre(x, y)| ≪ 1.

This still nonlinear representation of the retinex transformation consists of a high-pass filter Δτre(x, y) and a locally adaptive automatic-gain control [s(x, y) ∗ τre(x, y)]⁻¹.

For the model of the radiance field L(x, y; λ) given by Eq. (2), Eq. (13a) of the acquired signal s(x, y) can be represented as the sum of two components,

s(x, y) = s̄(x, y) + Δs(x, y),

where

Δs(x, y) = [K Δρ(x, y) ι(x, y)] ∗ τ(x, y) + np(x, y)  and  s̄(x, y) = K ρ̄ ι(x, y) ∗ τ(x, y).
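The nonlinear transformation of Eqs. (38)–(39a) is simple to prototype. The Python sketch below is not the article's implementation: it assumes SciPy is available, parameterizes the Gaussian surround by a pixel-domain standard deviation rather than the surround index ζe (both control the surround width), and adds a small constant to guard the logarithms.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def retinex_transform(s, sigma_surround=15.0):
    """Single-scale retinex transformation in the spirit of Eqs. (38)-(39a):
    log of the acquired signal minus the log of its Gaussian surround average."""
    s = np.asarray(s, dtype=float)
    surround = gaussian_filter(s, sigma=sigma_surround, mode="reflect")
    eps = 1e-12                      # guard against log(0)
    return np.log(s + eps) - np.log(surround + eps)
```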
Figure 27. SRs and SFRs of image gathering and retinex transformation: (a) image gathering response, ζc = 0.3; (b) surround-function response, ζe = 0.08; (c) high-pass response produced by the surround function; (d) DOG response produced by the image gathering and surround-function responses.
It is assumed that the variation of the irradiance ι(x, y) is coarse relative to the SR τ(x, y) of the image gathering device so that these two equations can be simplified to

Δs(x, y) = [K Δρ′(x, y) ∗ τ(x, y)] s̄(x, y) + np(x, y)  and  s̄(x, y) = K ρ̄ ι(x, y),

where Δρ′(x, y) = Δρ(x, y)/ρ̄. Moreover, the mathematical development is limited to the first order in the variations Δρ(x, y), Δι(x, y), and np(x, y), along with the approximation

[ι(x, y)]⁻¹ = [ῑ + Δι(x, y)]⁻¹ ≈ (1/ῑ)[1 − Δι(x, y)/ῑ].

Hence, the small-signal representation sre(x, y) of the retinex transformation reduces to the linear expression

sre(x, y) = K[Δρ′(x, y) ∗ τ(x, y) + Δι′(x, y) + Δnp′(x, y)] ∗ Δτre(x, y),   (41a)

where Δι′(x, y) = Δι(x, y)/ῑ and Δnp′(x, y) = np(x, y)/Kρ̄ῑ. Finally, to account explicitly for the image gathering process, the DFT of Eq. (41a) is given in the same form as Eq. (15c) for the linearly encoded signal s̃e(υ, ω; κ) by

s̃re(υ, ω; κ) = K ρ̂′(υ, ω) Γ̂re(υ, ω) + n̂re(υ, ω; κ),   (41b)

where

Γ̂re(υ, ω) = τ̂(υ, ω) Δτ̃re(υ, ω) = τ̂(υ, ω) − τ̂(υ, ω) τ̃re(υ, ω)

is the throughput SFR. This SFR can be approximated by the difference-of-Gaussian (DOG) function given by the Fourier transform pair

Γ̂re(υ, ω) = exp[−(ζ/ζc)²] − exp[−(ζ/ζce)²],
Γre(x, y) = π ζc² exp[−(π ζc r)²] − π ζce² exp[−(π ζce r)²],

where ζce⁻² = ζc⁻² + ζe⁻² (Fig. 27d). The associated accumulated noise n̂re(υ, ω; κ) is

n̂re(υ, ω; κ) = [K Δι̃′(υ, ω) + Δn̂(υ, ω)] Δτ̃re(υ, ω) + ñq(υ, ω; κ),

where

Δn̂(υ, ω) = n̂a(υ, ω) + ñp′(υ, ω)

represents the noise due to the image gathering process, as given by the aliased signal components n̂a(υ, ω) = K ρ̂′(υ, ω) τ̂(υ, ω) ∗ |ˆ||s and the photodetector noise ñp′(υ, ω) = ñp(υ, ω)/ρ̄ῑ. The SFR Γ̂re(υ, ω) of the DOG function given before and the SFR τ̂g(υ, ω) of the ∇²G function given by Eq. (29) are similar to each other and normally can be interchanged (38,39).
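The DOG approximation of the retinex throughput SFR is easy to evaluate. The Python sketch below simply codes the two Gaussians and the combined index ζce defined above; the function and argument names are assumptions introduced for this example.

```python
import numpy as np

def dog_throughput_sfr(u, w, zeta_c=0.3, zeta_e=0.08):
    """DOG approximation of the retinex throughput SFR:
    exp[-(zeta/zeta_c)^2] - exp[-(zeta/zeta_ce)^2],
    with 1/zeta_ce^2 = 1/zeta_c^2 + 1/zeta_e^2."""
    zeta = np.hypot(u, w)
    zeta_ce = (zeta_c ** -2 + zeta_e ** -2) ** -0.5
    return np.exp(-(zeta / zeta_c) ** 2) - np.exp(-(zeta / zeta_ce) ** 2)
```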
Alternatively, Eq. (41b) can be expressed by regrouping the terms as

s̃re(υ, ω; κ) = K ρ̂′(υ, ω) τ̂(υ, ω) + Δn̂(υ, ω) Δτ̃re(υ, ω) + ẽ(υ, ω) + ñq(υ, ω; κ).   (41c)

This expression isolates the error of the retinex transformation as ẽ(υ, ω) = ẽι(υ, ω) + êρ(υ, ω), where

ẽι(υ, ω) = K Δι̃′(υ, ω) Δτ̃re(υ, ω)

and

êρ(υ, ω) = −K ρ̂′(υ, ω) τ̂(υ, ω) τ̃re(υ, ω) ≈ −K ρ̂′(υ, ω) τ̃re(υ, ω).

The first kind of error, ẽι(υ, ω), is the failure to suppress the irradiance fully, and the second kind of error, êρ(υ, ω), is the failure to preserve the reflectance fully.

To complete the description of the retinex-coded signal s̃re(υ, ω; κ), as expressed by Eq. (41b), the PSD Φ̃sre(υ, ω; κ) is given in the same form as Eq. (16) for the PSD Φ̃se(υ, ω; κ) by

Φ̃sre(υ, ω; κ) = K² ρ̄⁻² Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² + Φ̂nre(υ, ω; κ),   (42)

where the PSD of the accumulated noise is

Φ̂nre(υ, ω; κ) = Φ̃nre(υ, ω) + Φ̃q(υ, ω; κ)

and

Φ̃nre(υ, ω) = Φ̃eι(υ, ω) + Φ̂are(υ, ω) + Φ̃pre(υ, ω).

The first noise term is the PSD of the retinex-transformation error

Φ̃eι(υ, ω) = K² ῑ⁻² Φ̃ι(υ, ω)|Δτ̃re(υ, ω)|²,

the second noise term is the PSD of the high-pass filtered aliased signal components

Φ̂are(υ, ω) = K² ρ̄⁻² [Φ̂ρ(υ, ω)|τ̂(υ, ω)|² ∗ |ˆ||s] |Δτ̃re(υ, ω)|²,

and the third noise term is the PSD of the high-pass filtered photodetector noise

Φ̃pre(υ, ω) = (ρ̄ῑ)⁻² Φ̃p(υ, ω)|Δτ̃re(υ, ω)|².

Finally, the PSD Φ̃q(υ, ω; κ) of the quantization noise in the encoded signal is given by Eq. (17), where σse is replaced by σsre.

Equations (41) and (42) represent the linear approximation of the retinex-encoded signal in the same form as Eqs. (15) and (16) represent the linearly encoded signal. Hence, the Wiener filter Ψ̂e(υ, ω; κ) given by Eq. (27a) can be extended directly to account for retinex coding, as given by

Ψ̂re(υ, ω; κ) = K² ρ̄⁻² Φ̂ρ(υ, ω) Γ̂re∗(υ, ω) / Φ̃sre(υ, ω; κ).   (43a)

For the normalized PSDs Φ̂ρ(υ, ω) → σρ⁻² Φ̂ρ(υ, ω) and Φ̃ι(υ, ω) → σι⁻² Φ̃ι(υ, ω), and for Φ̃p(υ, ω) = σp²/|B̂| and Φ̃q(υ, ω; κ) = σq²/|B̂|, Eq. (43a) simplifies to

Ψ̂re(υ, ω; κ) = Φ̂ρ(υ, ω) Γ̂re∗(υ, ω) / Φ̃sre(υ, ω; κ),   (43b)

where

Φ̃sre(υ, ω; κ) = Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² + Φ̂nre(υ, ω; κ, ι),
Φ̂nre(υ, ω; κ, ι) = Φ̃nre(υ, ω) + Φ̃q(υ, ω; κ),
Φ̃nre(υ, ω) = [Φ̂ρ(υ, ω)|τ̂(υ, ω)|² ∗ |ˆ||s + (Kῑσρ/σp)⁻² + (σιρ̄/ῑσρ)² Φ̃ι(υ, ω)] |Δτ̃re(υ, ω)|²,

and

Φ̃q(υ, ω; κ) = (Kσρ/ρ̄σsre)⁻² κ⁻².

Figures of Merit

The figures of merit in the previous sections can be extended directly to account for retinex coding, within the constraints of the linear approximation, as given here for the information rate Hre of the retinex-encoded signal, the associated theoretical minimum data rate Ere, and the maximum realizable fidelity Fre of the digital image or, equivalently, of the continuous image displayed in a noise-free medium.

Information Rate Hre

The information rate Hre, similar to Eq. (19e) for He, is

Hre = (1/2) ∫∫_B̂ log[1 + K² ρ̄⁻² Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² / Φ̂nre(υ, ω; κ)] dυ dω.   (44a)

For the assumptions that Φ̃p(υ, ω) = σp²/|B̂| and Φ̃q(υ, ω; κ) = σq²/|B̂|, Eq. (44a) simplifies to

Hre = (1/2) ∫∫_B̂ log[1 + Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² / Φ̂nre(υ, ω; κ)] dυ dω.   (44b)

The theoretical upper bound of Hre remains Shannon's channel capacity C given by Eq. (21).

Theoretical Minimum Data Rate Ere

The theoretical minimum data rate Ere, similar to Eq. (23d) for Ee, is

Ere = (1/2) ∫∫_B̂ log[1 + K² ρ̄⁻² Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² / (Φ̃nre(υ, ω) + Φ̃q(υ, ω; κ))] dυ dω.   (45a)

For Φ̃p(υ, ω) = σp²/|B̂| and Φ̃q(υ, ω; κ) = σq²/|B̂|, Eq. (45a) simplifies to

Ere = (1/2) ∫∫_B̂ log[1 + Φ̂ρ(υ, ω)|Γ̂re(υ, ω)|² / (Φ̃nre(υ, ω) + (Kσρ/ρ̄σsre)⁻² κ⁻²)] dυ dω.   (45b)

Maximum Realizable Fidelity Fdre

The maximum realizable fidelity Fdre, similar to Eqs. (36) for Fd, is

Fdre = σρ⁻² ∫∫ [Φ̂ρ(υ, ω) Γ̂re(υ, ω; κ) − Φ̂d(υ, ω)] dυ dω   (46a)
     = σρ⁻² ∫∫ {Φ̂ρ(υ, ω)[1 − 2^(−Ĥre(υ,ω;κ))] − Φ̂d(υ, ω)} dυ dω,   (46b)

where Ĥre(υ, ω; κ) is the integrand of the information rate Hre given by Eqs. (44).
Performance

Figure 28 illustrates the retinex-transformed signal sret(x, y) for a square-wave target that has either a uniform or nonuniform irradiance. It is assumed that the acquired signal is either free from perturbations in the image gathering process or subject only to blurring. As this illustration reveals, the retinex transformation seeks to suppress the gradual irradiance variation and to produce Mach bands around edges that preserve information about their contrast relative to the mean reflectance. The blurring in the image gathering process degrades the sharpness and contrast of the Mach bands. The loss of reflectance level away from the edges can be reduced by increasing the width of the surround function but only at the cost of a loss in suppressing the irradiance variation. There is inevitably a compromise between suppressing the irradiance and preserving the reflectance.

Figure 29 presents the information rate Hre and maximum realizable fidelity Fre as functions of the optical-response index ζc for SNR Kσρ/σp = 256 and retinex-surround index ζe = 0.08. The nonuniform irradiance has a mean spatial detail µι of either 10 or 2 times the mean spatial detail µ(= µρ|B̂|^1/2 = 1) of the scene and a blur index of either σb = 10, 7, or 5. Most importantly, the curves show that the informationally optimized design (ζc = 0.3, SNR = 256) for linear processing (see Fig. 13) remains the preferred design
for retinex coding. These curves also show that the effect of changes in the mean spatial detail of shadows on both Hre and Fre is small relative to the effect of changes in the sharpness of their penumbra. As can be expected intuitively, the edges of the scene and shadows are increasingly difficult to distinguish as the sampling intervals approach the width of the penumbra.

Figure 28. Features of the retinex-transformed signal for a uniform (top) and nonuniform (bottom) irradiance. The panels show the radiance field L(x, y) and the irradiance ι(x, y).

Figure 30 presents images for two nonuniform irradiances that differ from each other only by the sharpness of their penumbra near the limit of resolution with which the retinex coding can distinguish between the sharp edges of the scene and the penumbra. There are two kinds of distortion. One kind of distortion appears in and near shadows as colored noise and residual shading, and the other kind appears everywhere (but most immediately noticeable outside the shadows) as a loss of reflectance level away from the edges. The first kind of distortion increases as the penumbras become sharper, whereas the second kind increases as the areas of constant reflectance become larger. These distortions diminish when the images are displayed in a smaller format. For the smallest format that is shown (i.e., 64 pixels/cm), it appears that the shadows have been removed without significant residual distortion.

Figure 31 presents images for two indexes ζe of the retinex-surround function. In the top row are Wiener restorations, and in the bottom row are slightly blurred
Figure 28. (Continued) Retinex-transformed signal sret(x, y) without and with the blurring in image gathering.
versions of these restorations to reduce visually annoying defects at the cost of a modest loss of resolution and sharpness. These images illustrate how the width of the retinex-surround function controls the trade-off between suppressing irradiance and preserving reflectance. Finally, Fig. 32 compares the measurements E3 for the DPCM and entropy coding with their theoretical lower bound Ere . As for the linear coding characterized in Fig. 17, the entropy E3 is the average of 30 trials on different realizations of the irradiance and reflectance polygons for each measurement. The highest data rate of 5.3 bits that is required to convey the retinex-encoded signal without a significant loss in the information rate corresponds closely to the upper level of ∼40 levels that each nerve fiber in
human vision can transmit from a photoreceptor in the retina to the visual cortex.
Figure 29. Information rate Hre and maximum realizable fidelity Fre for nonuniform irradiances as functions of the optical-response index ζc, for µι = 10µ (a) and µι = 2µ (b). The irradiances are characterized by the mean spatial detail µι and blur index σb. The PSD Φ̂ρ(υ, ω) has m = 3 and µ = 1, the acquired signal has an SNR = 256, and the retinex coding has a surround index ζe = 0.08 and η = 8-bit quantization.

Figure 30. Radiance fields captured by the image gathering device (top) and images restored from the retinex-coded signal, for σb = 10 and σb = 5. The two nonuniform irradiances differ by the sharpness of their shadows, as specified by the blur index σb. The mean spatial detail of the scene is µ = 5, and the mean spatial detail of the irradiance is µι = 2µ. The image gathering device is represented by design 1, and the retinex coding has the surround function with index ζe = 0.08 and η = 8-bit quantization. The large, medium, and small formats contain 16, 32, and 64 pixels/cm, respectively.

Figure 31. Images restored from the retinex-encoded signal for two surround indexes, ζe = 0.16 (a) and ζe = 0.04 (b). The Wiener restorations in the top row are slightly blurred in the bottom row. The mean spatial detail of the scene is µ = 5, and the mean spatial detail and sharpness of the irradiance are µι = 2µ and σb = 10. The image gathering device is represented by design 1, and the retinex coding has η = 8-bit quantization.

Figure 32. Data rate versus mean spatial detail µ for the informationally optimized design 1 of the image gathering device and retinex coding with surround index ζe = 0.08 and η = 8-bit quantization, for uniform and nonuniform irradiance. The data rate is given by the measured entropy E3 and its lower bound εre.

CONCLUDING REMARKS

Communication theory was used to characterize image systems that combine image gathering and display with digital processing for signal coding and image restoration. This characterization establishes a mathematical foundation for the end-to-end optimization of design and performance trade-offs that otherwise would become entangled in trial-and-error experimentation. Normally, the goal of these trade-offs is to develop an image system that produces the best possible images at the lowest data rate for the angular resolution that a particular application requires, or for the highest resolution that constraints of size, weight, and cost permit. This goal cannot be realized by experiments that deal with image gathering, signal coding, and image restoration as separate and independent tasks.

The figure of merit that ties image gathering, signal coding, and image restoration to a common denominator is the information rate in the image system. The information rate H of the acquired signal is used to guide the design of the image gathering device, in particular, the compromise between blurring and aliasing as a function of the available SNR. The information rate He of the encoded signal is paired with the theoretical minimum data rate Ee to assess the efficiency with which information about natural scenes can be transmitted and stored. The information rate Hd of the displayed image is paired with the maximum realizable fidelity Fd to assess the quality with which images of these scenes can be restored. Finally, the critical constraints of human vision are accounted for by the information rate H0 that the eye can convey from the displayed image to the higher levels of the brain.

This approach to characterizing image systems has evolved only recently, most comprehensively in Refs. 7 to 10, and has remained limited to panchromatic imaging. The formulations differ from those in the prevalent literature on digital image processing that, so far, has treated this subject only as a two-dimensional extension of one-dimensional signal processing without accounting adequately for the critical limiting factors that constrain image systems. However, communication theory can be expected to guide the design of image systems toward the best possible performance only if these factors are properly accounted for.

Acknowledgments

The support of Rachel Alter-Gartenberg and Zia-ur Rahman in producing the computational results and reviewing the manuscript is gratefully acknowledged.

ABBREVIATIONS AND ACRONYMS

DFT  discrete Fourier transform
PSD  power spectral density
SFR  spatial frequency response
SR   spatial response
BIBLIOGRAPHY

1. C. E. Shannon, Bell Syst. Tech. J. 27, 379–423 and 28, 623–656 (1948); C. E. Shannon and W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1964.
2. N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series, Wiley, NY, 1949.
3. Y. W. Lee, Statistical Theory of Communication, Wiley, NY, 1964.
4. P. B. Fellgett and E. H. Linfoot, Philos. Trans. R. Soc. London 247, 369–407 (1955).
5. E. H. Linfoot, J. Opt. Soc. Am. 45, 808–819 (1955).
6. C. Mead, Analog VLSI and Neural Systems, Addison-Wesley, Reading, 1989.
7. C. L. Fales and F. O. Huck, Inf. Sci. 57–58, 245–285 (1991).
8. C. L. Fales, F. O. Huck, R. Alter-Gartenberg, and Z. Rahman, Philos. Trans. R. Soc. London A354, 2,249–2,287 (1996).
9. F. O. Huck, C. L. Fales, and Z. Rahman, Philos. Trans. R. Soc. London A354, 2,193–2,248 (1996).
10. F. O. Huck, C. L. Fales, and Z. Rahman, Visual Communication: An Information Theory Approach, Kluwer Academic, Boston, 1997.
11. H. B. Barlow, Proc. R. Soc. London B212, 1–34 (1981).
12. J. R. Schott, Remote Sensing, The Image Chain Approach, Oxford University Press, NY, 1997.
13. R. A. Schowengerdt, Remote Sensing, Models and Methods for Image Processing, 2nd ed., Academic Press, San Diego, 1997.
14. Y. Itakura, S. Tsutsumi, and T. Takagi, Infrared Phys. 14, 17–29 (1974).
15. M. Kass and J. Hughes, in Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Institute of Electrical and Electronics Engineers, NY, 1983, pp. 369–372.
16. D. J. Field, J. Opt. Soc. Am. A4, 2,379–2,392 (1987).
17. Y. Tadmor and D. J. Tolhurst, Vision Res. 34, 541–554 (1994).
18. J. W. Modestino and R. W. Fries, IEEE Trans. Inf. Theory IT-26, 44–50 (1980).
19. R. W. Boyd, Radiometry and the Detection of Optical Radiation, Wiley & Sons, NY, 1983.
20. L. J. Pinson, Electro-Optics, Wiley & Sons, New York, 1985.
21. L. B. Wollf, S. A. Shafer, and G. E. Healey, eds., Radiometry, Jones and Bartlett, Boston, 1992.
22. J. D. Gaskill, Linear Systems, Fourier Transforms, and Optics, Wiley & Sons, NY, 1978.
23. R. N. Bracewell, The Fourier Transform and Its Application, 2nd ed., McGraw-Hill, NY, 1986.
24. R. N. Bracewell, Two-Dimensional Imaging, Prentice-Hall, Englewood Cliffs, 1995.
25. J. W. Goodman, Introduction to Fourier Optics, 2nd ed., McGraw-Hill, NY, 1996.
26. H. H. Hopkins, Proc. R. Soc. London A231, 91–103 (1955).
27. M. Born and E. Wolf, Principles of Optics, Pergamon, NY, 1965.
28. M. Mino and Y. Okano, Appl. Opt. 10, 2,219–2,225 (1971).
29. J. M. Enoch and F. L. Tobey, eds., Vertebrate Photoreceptor Optics, Springer-Verlag, NY, 1981.
30. H. J. Metcalf, J. Opt. Soc. Am. 55, 72–79 (1965).
31. J. P. Carroll, J. Opt. Soc. Am. 70, 1,155–1,156 (1980).
32. P. Mertz and F. Gray, Bell Syst. Tech. J. 13, 494–515 (1934).
33. O. H. Schade Sr., J. Soc. Motion Picture Television Eng. 56, 137–174 (1951); 58, 181–222 (1952); 61, 97–164 (1953); 64, 593–617 (1955).
34. W. F. Schreiber, Fundamentals of Electronic Imaging Systems, 3rd ed., Springer-Verlag, Berlin, 1993.
35. C. L. Fales, F. O. Huck, and R. W. Samms, Appl. Opt. 23, 872–888 (1984).
36. F. O. Huck, C. L. Fales, D. J. Jobson, and Z. Rahman, Opt. Eng. 34, 795–813 (1995).
37. J. E. Greivenkamp, Appl. Opt. 29, 676–684 (1990).
38. D. Marr and E. Hildreth, Proc. R. Soc. London B207, 187–217 (1980).
39. D. Marr, Vision, Freeman, San Francisco, 1992.
40. D. J. Jobson, Z. Rahman, and G. A. Woodell, IEEE Trans. Image Process. 6, 451–462 (1997).
41. D. J. Jobson, Z. Rahman, and G. A. Woodell, IEEE Trans. Image Process. 6, 965–976 (1997).
42. E. H. Land, Proc. Natl. Acad. Sci. USA 80, 5,163–5,169 (1983).
43. A. C. Hurlbert, J. Opt. Soc. Am. A3, 1,684–1,693 (1986).
44. F. O. Huck, C. L. Fales, R. E. Davis, and R. Alter-Gartenberg, Appl. Opt. 39, 1,711–1,730 (2000).
45. T. N. Cornsweet, Visual Perception, Academic Press, NY, 1970.

CHARGED PARTICLE OPTICS

JON ORLOFF
University of Maryland
College Park, MD

INTRODUCTION

This article describes the lenses used for high-resolution charged-particle optics and how their performance — in particular, their resolution — is analyzed. There are many applications for beams of charged particles (ions and electrons) that have very different resolution or
beam size requirements. For example, broad (centimeter sized) beams of ions of arsenic and boron whose energies are in the range of a few hundred keV and currents in the range of milliamperes are used in semiconductor device fabrication. At the other extreme, subnanometer sized beams of electrons are used for microscopy in scanning transmission electron microscopes (STEMs), which have currents in the pico–nanoampere range and energy ∼1–100 keV. Charged-particle beams are focused and deflected by electrostatic and magnetic fields, but the way in which a beam is focused depends on the application. The optical systems described here are for high resolution, a resolving power near or at the atomic level in a microscope, or a probe focused to the 0.3–10 nm range. There is a considerable body of books and technical articles on the subject of charged-particle lenses, and this article is only an introduction. For an in-depth treatment of the subject the reader is referred to the extensive three volume treatise by Hawkes and Kasper (1); a more accessible introduction is the older two volume set by Grivet (2). A useful compendium of information can be found in the handbook edited by Orloff (3), and a complete and rigorous treatment of light optics can be found in the treatise by Born and Wolf (4). Today, high resolution means beams of electrons focused to a size less than 10 nm (100 Å) or beams of ions focused to a size less than 100 nm (1000 Å) using a scanning probe instrument such as a scanning electron microscope. Another example is the ability to distinguish objects in a specimen separated by less than 1 nm (10 Å) using a transmission electron microscope (TEM). The meaning of the word ‘‘size’’ in the context of a focused beam has to be defined carefully because of the wide variation in the shape of the beam profile (5). A reasonable definition for the size of the beam is its diameter where the intensity has fallen to one-half of its maximum at the beam center. For example, imagine a beam of charged particles falling on a phosphor screen that indicates the radial extent of the beam by the intensity pattern of the light emitted from the phosphor. The intensity of light emitted from the phosphor is directly proportional to the current density distribution of the beam and will decrease with radial distance from the point where the center of the beam strikes (current density is measured in amperes per square centimeter and is usually denoted by the symbol J, not to be confused with the symbol for joules). A highly focused beam of electrons or ions frequently has a current density distribution which is nearly Gaussian: J(r) = J0 exp(−r²/2σ²). J0 is the current density on the axis of the beam (r = 0); J(r) = ½J0 when r = 1.177σ, and twice this value of r is the full width at half maximum (FWHM) of the beam. So, the size of the beam can be taken as the FWHM. If the beam shape is not Gaussian, which is the case if the focusing system suffers from aberrations, then this definition of size may not apply.
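The FWHM of a Gaussian profile follows directly from the definition above. The short Python sketch below simply evaluates that relation (the function name and example value are illustrative, not from the article).

```python
import numpy as np

def gaussian_beam_fwhm(sigma):
    """FWHM of a Gaussian current-density profile J(r) = J0*exp(-r^2/(2 sigma^2)):
    J falls to J0/2 at r = sigma*sqrt(2 ln 2) (about 1.177 sigma), so FWHM = 2*that."""
    return 2.0 * np.sqrt(2.0 * np.log(2.0)) * sigma

# Example: a beam with sigma = 2 nm has an FWHM of about 4.71 nm.
print(gaussian_beam_fwhm(2.0))
```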
High-resolution electron beams are used primarily in electron microscopes and in electron beam lithography machines for making the lithographic masks employed for integrated circuit manufacturing. High-resolution beams of ions are primarily used for microscopy (imaging), for micromachining¹ and for ion-beam-induced chemistry. These applications are frequently related to integrated circuit manufacturing and in both cases the imaging resolution of the beams is of great importance. The way a beam of charged particles is focused depends on how the beam is generated, so we begin with a discussion of electron sources. This will be followed by a description of the way lenses that consist of electrical or magnetic fields can be made that will focus charged particles. We also discuss why an electromagnetic field is a lens in the same sense that a curved piece of glass is a lens for light and conclude with a discussion of the capabilities and limitations of some lenses.

SOURCES OF CHARGED PARTICLES FOR HIGH-RESOLUTION BEAMS

Thomas Edison discovered long ago that a wire heated in vacuum gives off electric current; later it was discovered that this current consists of electrons. When the first electron microscopes were developed in the early 1930s (6), they employed simple tungsten (W) hairpin filaments as their electron sources. When a W wire is heated to ∼2400 K, it provides a current density J of about 0.1 A cm−2, and this current can be focused by the electrodes of the electron source into a concentrated spot called a crossover, whose current density is several A cm−2. The crossover is the optical electron source that is imaged onto the specimen by the microscope's lenses. An example is an SEM that consists of an electron source and several lenses to focus the beam of electrons to a very small diameter — typically less than 10 nm. The beam is scanned across a specimen by magnetic or electrostatic deflectors, and an image of the specimen is formed by collecting secondary electrons generated by the primary focused beam. The secondary electrons in turn generate the signal used to modulate the intensity of a CRT display scanned synchronously by the focused electron beam (the
Figure 1. Schematic diagram of an SEM (beam deflectors and stigmators not shown).

1 An ion beam removes material from a surface by sputtering, analogously to a milling machine that removes material by using a cutting tool, except on a very small scale.
process is similar to that of a television system). In optical terms, the SEM consists of an object — the source of radiation (electrons) — focused onto an image plane (the specimen) by a series of lenses, as shown schematically in Fig. 1. It follows that if a certain size focused beam spot is needed at the image plane, the overall amount of demagnification required of the optical system depends on the size of the electron source. The source electrodes consist of the filament itself, a focusing electrode that has a small aperture (called a wehnelt) that surrounds the filament, and an anode on which a high positive voltage (300–30,000 volts) relative to the filament is placed. The current density J from the filament is given by J = const T² exp(−eφ/kT) (A cm−2), where T is the filament temperature, φ the filament work function, k is Boltzmann's constant, and e the electronic charge. The wehnelt is at a small negative potential (∼100 volts) relative to the filament. The anode voltage accelerates the electrons into the optical column and determines the beam energy. The typical crossover size is several micrometers. A schematic diagram of a source is shown in Fig. 2.

Probably the most important metric for the usefulness of an electron gun is its brightness, a quantity closely related to the emittance. Emittance, which is the product of the trajectory and its slope, is essentially a measure of how rapidly a beam spreads as it leaves its source and how much the particle trajectories deviate from the condition r′ ∼ r, where r′ is the slope of a trajectory and r its distance from the optical axis (7). The smaller the emittance, the more concentrated the beam along the axis. Brightness is the emitted current divided by the emittance; the smaller the emittance, the higher the brightness of the beam. Properly, brightness should be normalized to the beam energy, in which case it can be shown that the brightness measured over an infinitesimally small angular range is a conserved quantity (see later). However, electron sources are often described in terms of the unnormalized brightness.
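The thermionic emission law quoted above is easy to evaluate numerically. The Python sketch below is an illustration only: the article leaves the prefactor as "const", so the nominal Richardson constant (about 120 A cm−2 K−2 for an ideal metal) and the tungsten work function used here are assumed values for the example.

```python
import numpy as np

K_BOLTZMANN_EV = 8.617e-5   # Boltzmann constant, eV/K

def thermionic_current_density(T_kelvin, work_function_eV, A=120.0):
    """Thermionic current density J = A * T^2 * exp(-phi / (k*T)), in A/cm^2,
    with A an assumed nominal Richardson constant (A cm^-2 K^-2)."""
    return A * T_kelvin ** 2 * np.exp(-work_function_eV / (K_BOLTZMANN_EV * T_kelvin))

# Example: tungsten (work function ~4.5 eV) at ~2400 K gives a few tenths of an
# A/cm^2, the same order as the ~0.1 A/cm^2 quoted in the text.
print(thermionic_current_density(2400.0, 4.5))
```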
Figure 2. Schematic diagram of a thermionic electron gun.
When the thermionic electron source is turned on, the filament, wehnelt, and anode act as a lens to form an image of the filament. As for any image created by a single lens, the image is inverted. At the point on the axis where rays (electron trajectories) from the filament pass through the optical axis, there is a maximum intensity, called a crossover. It is this crossover that is the object used for the optical system and whose demagnified image appears on the specimen. The unnormalized brightness is denoted by β (the units of β are A cm−2 sr−1)². If we think of the brightness in terms of the current I of electrons divided by the emittance, we can write it as

β = I / (A Ω),   (1)

where A is the area of the crossover and Ω is the solid angle that contains the electron trajectories (strictly speaking, the limit of I, A, Ω → 0 should be taken). The solid angle is Ω = 2π(1 − cos θ), where θ is the half-angle subtended by the electron beam. As the angle shrinks toward zero, Ω → π θ². We can also think of brightness as the intensity of electrons per unit solid angle, I/Ω, divided by the area A. In a thermionic emitter, the diameter of the crossover is several micrometers, the intensity is about 1 mA sr−1, and the unnormalized brightness β ∼ 10⁴–10⁵ A cm−2 sr−1.

High brightness in an electron gun is important because one would generally like to maximize the current in the beam for any given focused beam size. For simplicity, assume that the beam energy is the same everywhere from the crossover to the specimen. Because brightness is conserved, the brightness on the specimen will be the same as the brightness at the crossover. This can be seen as follows. Suppose, for simplicity, that the current at the crossover and the current at the specimen are equal: Icrossover = Ispecimen. Let the angles subtended by the beam at the crossover and at the specimen be θcrossover and θspecimen, respectively, and let the linear magnification of the lens be M. Then the angular magnification m of the lens will be

m = (1/M) (Vcrossover/Vspecimen)^1/2.   (2)

This relationship is just the Abbe sine condition of light optics: no yo sin θo = ni yi sin θi → no yo θo = ni yi θi as θ → 0, where n is the index of refraction, y the linear image size, and θ the beam angle. Subscripts o and i refer to the object and image side of the lens, respectively. The linear magnification is M = yi/yo, and the angular magnification is m = θi/θo. Because the index of refraction is proportional to the square root of the potential in charged particle optics (see the third section),

no/ni = (Vo/Vi)^1/2 = mM.   (3)
2 sr stands for steradian, a measure of a solid angle. A sphere subtends a solid angle of 4π sr (the area of the sphere divided by its radius squared)
CHARGED PARTICLE OPTICS
Now, if we write A = (Md)2 , = π(mθ )2 , and replace the subscripts o and i by crossover and specimen, respectively, Ispecimen Aspecimen specimen Vspecimen =
Icrossover = const. Acrossover crossover Vcrossover
(2)
The relationship m = (1/M) Vo /Vi has a serious implication. If it is necessary to demagnify by a large amount (M = 10−4 , for example), then the angular magnification will be large. Because lens aberrations that cause the current density distribution to spread out depend on the beam convergence angle, it is not possible to make the angle arbitrarily large, and so the beam angle at the crossover must be limited by using apertures. If an aperture is used to limit the extent of the beam, the beam current is also limited. As the solid angle subtended by the aperture becomes smaller it allows less of the beam through. If the source has a uniform distribution of intensity (I ) (A sr−1 ), then the current passing through the aperture is given simply by I = I (A). To avoid loss of beam current when the solid angle at the image plane (specimen) is limited, the source intensity must be increased. Unfortunately, this is not usually physically possible, and so another approach is taken: reducing the crossover size of the source so that less demagnification is required. It turns out that the only way in which this can be done is to use a completely different type of emitter that has an extremely small crossover: the field emitter or one of its variants [see the article by Swanson in (3)]. Electrons are emitted from a field emitter (a very sharp metallic needle) by lowering the coulombic potential barrier that keeps the electrons inside the metal, so that there is an appreciable probability that they can tunnel out through the barrier. Field emission of electrons is an extremely useful phenomenon; it was predicted theoretically by Fowler and Nordheim (8) in 1928 and ¨ first seen experimentally by Muller (9) a few years later.
Potential energy Fermi level Energy
W Emitted electrons
Ef
Edge of metal
x
−eFx
Figure 3. Schematic one-dimensional diagram showing the tunneling of electrons from the Fermi level through a potential barrier deformed by the application of an electrical field F. The work function of the metal is = W − Ef . and the potential due to the applied field is −eFx.
89
A highly simplified schematic diagram of the process is shown in Fig. 3. Electrons are contained in the metal by a potential barrier of height = W − Ef ; is the metal work function ( ≈ 4.5 eV for W, for example). Application of a high negative voltage to the metal results in an electrical field F that deforms the potential at the metal surface to the point where there is an appreciable probability that an electron can tunnel through the barrier and escape. The tunneling process, which can be analyzed by using the onedimensional WKB approximation, is highly field sensitive. The higher the energy of the electrons (measured relative to the metal’s Fermi level), the higher the probability that they escape from the metal. However, the (Fermi–Dirac) density distribution of the electrons falls off rapidly for energies above Ef , and the product of the density distribution and the probability for tunneling results in a very narrow energy distribution for the emitted electrons: ≈0.25 volts FWHM at room temperature. The current density of the emitted electrons is given approximately by 3/2 6 2 7φ , (4) J = 6.2 × 10 F exp −6.8 × 10 F so, as stated before, the field emission process is very sensitive to the field strength F. It is also sensitive to the work function . Appreciable current is drawn from a W field emitter when the field F ∼ 107 V cm−1 , a value that can be attained by placing a potential of ∼3,000 volts on a field emitter (FE) whose end radius is ≈0.1 micrometer. This room temperature process is called cold field emission, and the emitter is a cold field emitter (CFE). Electrons that tunnel out of the metal are accelerated away from its surface nearly radially, as shown in Fig. 4. There are some practical problems associated with the CFE. First, if any gas molecules are in the neighborhood of the emitter tip and are ionized by the outgoing electrons, they will be accelerated by the field back toward the tip and can cause sputter damage to it. The result is a local change of radius of the tip that can lead to increased emission. At the minimum, the current will vary with time because of this reverse-ion bombardment. At worst, the current can increase so much that the tip heats and loses tensile strength. An electrical field applied to a surface puts an outward stress on the surface proportional to the square of the field, so that the emitter tends to elongate and narrow, thus increasing the field and the current. The result is a runaway process that ends in the emitter’s destruction. Another problem is that gas molecules that adsorb on the field emitter change the local work function, leading to a fluctuating electron current as the molecules diffuse across the emitter tip. These problems can be reduced by operating the emitter in high vacuum: a pressure <1 × 10−9 torr (<10−7 Pa) is desirable for reasonably stable operation. An actual emitter is several millimeters long and has a finely sharpened tip. It is necessary to use single crystal wire for the field emitter, so that emission is derived from only one crystal face to achieve uniformity from one emitter to the next. W is usually chosen because of its refractory properties. Typically a CFE W emitter is oriented in the 310 direction. The optical crossover is actually behind
90
CHARGED PARTICLE OPTICS
End of field emitter
Electrons Field emitter Emitted electrons
Axis
Virtual source Extraction electrode (anode)
Figure 4. Schematic diagram of the tip of a field emitter showing trajectories of electrons leaving the emitter apex and a virtual crossover behind the apex (left). Schematic of a FE electron gun (right).
the surface of the emitter, and so it is a virtual crossover or source (see Fig. 4). The diameter of the virtual source ˚ and this extremely small crossover is about 5 nm (50 A), size means that only a modest amount of demagnification is needed to produce a very small probe size. The excellent optical properties of the CFE led to efforts to reduce its vulnerability to the environment so as to be able to use it more easily. The natural approach was to operate a W emitter at high temperature to reduce the residence time of adsorbed gases and to allow sputter damage to anneal (10,11). It was found that only the 100 crystalline direction of W is stable at the temperature needed to be useful (1800 K), because only for this orientation is there is an overall balance between surface tension in the metal and electrical field stress; other orientations of W deform and destroy themselves when heated. Two forms of emitters were found: the pure W100 thermal field emitter (TFE) and the zirconia coated W/ZrO 100 Schottky TFE. The former is a true thermal field emitter, whereas the latter is not actually a field emitter because the electrons are emitted over a fieldlowered barrier rather than tunneling under it; √ the current − e3 F)/kT]. density follows the law J = const T 2 exp[−eφ √ The amount of barrier reduction is given by e3 F. The Schottky emitter is by far the more widely used because the emitting area of the W100 emitter is so small (∼10 nm diameter) that coulombic effects between the electrons lead to a large (3–5 volt) energy spread in the beam (10). The Schottky emitter has the interesting property that the ZrO coating lowers the work function of the (100) planes of W from ≈4.5 eV to ≈2.8 eV. As a result, the (100) planes emit a large electron current at 1800 K. The (100) plane at the emitter apex has its emission barrier lowered even more by the high electrical field (F ∼ 106 V/cm), and as a result the electron current is intense along the emitter axis (11,12). Because the emitting plane of the Schottky cathode is about 0.5 micrometers in diameter, the current density is orders of magnitude lower than in the W100 cathode, and the energy spread is about 1–2 eV. The diameter of the virtual crossover of these thermal field cathodes is about 15 nm ˚ so in either case still only a modest degree (150 A), of demagnification is required to produce an electron beam focused to 1 nm, compared with that needed for a thermionic emitter. As a result, essentially all SEMs and STEMs use Schottky TFE type cathodes, and the optical
Accelerating electrode
Optical axis
systems are quite different than those for thermionic cathodes; de-magnification factors are ∼100 as opposed to the ∼104 needed for a thermionic cathode system. High-resolution focused ion beam (FIB) systems also use field emission type sources called liquid metal ion sources (LMIS). These are essentially point sources of ions, and so the optical systems follow a design similar to that used for high-resolution electron beam systems, except that electrostatic rather than magnetic lenses are employed. The LMIS consists of a blunt field emitter that have an end radius of ≈5 micrometers supported by a wire loop used for heating. The emitter is covered by a liquid metal (most FIBs use Ga metal, which is liquid at room temperature). If a higher temperature metal (or metal alloy) is used, the emitter is heated to the metal melting point. For the LMIS to operate, the metal must have a reasonably low vapor pressure; the maximum pressure that can be used is about 10−5 torr (10−3 Pa) to avoid too rapid a loss of material through evaporation. When an electrical field is applied to the metal by placing a high positive voltage (∼5–10 kV) on the emitter relative to a nearby (∼5 mm) extraction electrode, the field stress draws the liquid toward the emitter apex. The stress is counterbalanced by the surface tension of the liquid, and ultimately the liquid assumes the shape of a cone that has a 49° interior half-angle, called a Taylor cone (for an excellent review of LMISs, see the article by Mair in (3)). This is the only stable shape. The field on the cone increases inversely with the radius, as does the surface tension, and the field becomes large enough near the cone apex to support field evaporation and ionization of the metal atoms, thus generating an ion beam. Atoms that are removed by ionization are replaced by the liquid flow from the interior of the cone. The system requires a high enough surface tension to prevent the metal from exploding before the field necessary to cause ionization is reached, that is, the electrical field needed for field evaporation must not exceed the Rayleigh limit, which expresses the maximum field F that can be applied to a liquid surface before the field stress exceeds the surface tension force: εF 2 /2 = 2γ /r for a spherical liquid drop: ε is the permittivity of the medium in which the drop is (vacuum for a LMIS), γ is the surface tension, and r the sphere radius. Although limited to electrostatic lenses because of the small charge-to-mass ratio of the ions, FIB systems
CHARGED PARTICLE OPTICS
nevertheless have attained an imaging resolution of 5 nm (see Fig. 10).
91
Electrodes
CHARGED PARTICLE LENSES: ELECTROMAGNETIC FIELDS AS FOCUSING ELEMENTS Lens action is due to a change in the refractive index in some region of space; an example is a piece of curved glass in air. Because of the formal identity between classical particle mechanics and light optics first explored by Hamilton [see (13)], it is possible to show that electrically charged particles that traverse electromagnetic fields behave the way light photons do when passing through transparent media of varying index of refraction. That is, if a beam of electrically charged particles encounters an electrostatic or magnetic field, the action of those fields on the beam will be the same as the action of a glass lens or prism on a beam of light. The lens for the charged particle is a field, and the field is generated by an appropriate set of electrodes or magnetic poles; the resulting lens can be described by using optical terminology of focal length, principal planes, aberrations etc. The physical electrodes or pole pieces are called the lens (see Fig. 4 and 5), but the fields do the focusing. In classical mechanics, one can derive the equations of! motion using the principle of least action: 2m(E − V) dt = 0, where T is the kinetic energy, V δ the potential energy, E the total energy, and m the mass of the particle,!respectively. In light optics, the analogous expression is δ n ds = 0, where n is the index of refraction and s the path. In charged-particle optics, 2m(E − V) plays the role of the index of refraction. In practice, the fields in the lens are calculated numerically, and trajectories of particles are found by solving the equations of motion in the fields. The formal correspondence between mechanics and optics, the trajectories, justifies the use of lens terminologies such as a focal length, aberrations, object and image positions, magnification etc. TYPES OF LENSES Two kinds of lenses are in common use: electrostatic and magnetic, or sometimes a combination of both. The lenses for microscopy are cylindrically symmetrical to the axis of symmetry that coincides with the optical axis. Examples of electrostatic and magnetic lenses are shown in Figs. 5 and 6. In an SEM, for example, the object of the optical system is the electron source, and the image of the source, created by magnetic lenses (generally), appears as a highly demagnified spot on the specimen, where it is scanned by electrostatic or magnetic deflectors. In general, electrostatic lenses must have the object and image outside of the lens where the potential has vanished, so that the field is not distorted. The field of a magnetic lens is generally not affected by the presence of an object inside it, and so magnetic lenses can operate with much shorter focal lengths than electrostatic lenses (millimeters vs. centimeters). Because the amount of magnification attainable and also the magnitude of the lens aberrations depend strongly on the focal length, magnetic lenses are almost exclusively used in high-resolution microscopes.
10 kV
−1.8 kV
10 kV
Figure 5. Schematic diagram of a three-electrode electrostatic lens showing the equipotentials generated by the voltages on the electrodes and the path of a beam of charged particles from an object point on the left to an image point on the right (from the article by Munro in (3)).
However, if an optical system is to be used that has ion beams instead of electron beams, then electrostatic lenses must be used. The reason is that the force exerted on a particle of charge q by an electric field E is proportional to qE. If the lens has length L, then the particle traverses it in a √ time T ≈ L/v, where v is the velocity of the particle, v = 2η · η = q/m is the charge-to-mass ratio of the particle, and is the accelerating voltage of the source. The velocity will not be constant in the lens, hence T is not precisely L/v. The particle will be deflected by the field by an amount x ∼ ηET 2 (the acceleration of the particle is ηE). Thus the deflection x ∼ ηEt2 = ηEL2 /2η = EL2 /2, which is independent of the charge-to-mass ratio. In a magnetic field, the particle feels a force ∼qvB, where B is the magnetic field√strength. In this case the deflection √ amount x ∼ ηBL2 / 2. Because the ratio of the amount √ of focusing depends on η, when comparing focusing strength for electrons of mass m and ions of mass M, a magnetic lens will be ∼ m/M weaker than an electrostatic lens, about 300 times weaker for Ga ions, for example. PROPERTIES OF LENSES An ideal lens will produce an image of an object that is geometrically similar to the object across an arbitrary area. Lenses for focusing light come close to the ideal across a limited spatial area, as in a highquality camera. This is possible because the aberrations of individual lens elements can be compensated for by using multiple elements of positive and negative lenses made from materials of different index of refraction. In this way, spherical and chromatic aberration can be essentially eliminated, for example. There are five
92
CHARGED PARTICLE OPTICS
R2 R1
M F2
F1
FG
Figure 7. Schematic representation of the effect of spherical aberration (see text for details).
70 mm Magnetic circuit region (asymmetric lens)
Axial flux density (teslas)
5e−2
0 −125
25 Z coordinate (mm)
Figure 6. Schematic diagram of a magnetic lens (left) showing the magnetic flux distribution through the pole pieces and in the air gap. The flux density on the axis of the lens is shown on the right (from the article by Munro in (3)).
primary lens aberrations plus chromatic aberration that affect a cylindrically symmetrical, space-charge-free optical system: spherical aberration, coma, astigmatism, curvature of field, and distortion. The primary aberrations affect light rays of the same wavelength or charged particles of the same energy. Chromatic aberration affects a beam that has a distribution of particle energies. In general, the aberrations depend on the beam-angle subtended by the beam-limiting aperture. We will now give a brief description of aberrations that affect high-resolution charged-particle instruments. This is a complex topic, and we refer the reader to reference (4) for a thorough explanation of the meaning and origin of aberrations in light optics and to (1,2) for electron optics; see also the article by Hawkes in (3).
Spherical aberration means that the focusing power of a lens increases as the distance from the lens axis increases. It is evident in a glass lens that has spherical surfaces, for example, and is always present in a cylindrically symmetrical magnetic or electrostatic lens [Scherzer (14) showed that the spherical aberration of a cylindrically symmetrical (round), open (no screens or electron transparent foils covering the electrode apertures or pole pieces) space-charge-free lens is always positive. Nonsymmetrical lenses, foils, and space charge are technically very difficult approaches for aberration correction and are rarely if ever used in microscopy]. Spherical aberration depends on the cube of the convergence angle of a beam, and so is one of the two most important aberrations (the other is chromatic aberration). In a probe forming system, spherical aberration smears out a focused spot into a disk; in a transmission microscope, any point in the object is transformed into a disk in the image plane. The outer diameter of the disk is given by d = 12 Cs θ 3 , where θ is the convergence half-angle and Cs is the spherical aberration coefficient. This is called the disk of confusion whose effect can be qualitatively seen in Fig. 7. Note that the further a ray is from the optical axis, the more strongly it is focused. The disk of confusion is quite large at the Gaussian focal plane FG — the point where the image would be produced and would be in sharpest focus if there were no aberrations. The disk of confusion has a minimum size (‘‘M’’)located upstream from FG. Diagrams such as Fig. 7 must be considered with caution. Although the diameter of the disk accurately indicates the extent over which the current density distribution has been spread due to the spherical aberration, it says nothing about the actual distribution; in particular, the distribution is far from uniform. Figure 8 shows the current distribution for a focusing system limited by diffraction and without aberrations and
CHARGED PARTICLE OPTICS
(a)
0
r (b)
0
r Figure 8. (a) The current density distribution for a diffraction-limited optical system and (b) for a spherical aberration limited system (16).
the same system that has a large amount of spherical aberration. The current density distribution is drastically affected, and the implications for this will be considered later in the discussion of calculating resolution. Chromatic aberration or, more accurately, longitudinal chromatic aberration is the aberration that has the greatest effect on the size of a probe besides spherical aberration. This is well known in light optics and manifests itself as colors fringes around objects if the optical system isn’t corrected. It results from longer wavelength radiation being refracted more than shorter wavelength radiation — this is the reason a prism produces a spectrum. In charged-particle optics, it is found that higher energy
93
particles are deflected less than lower energy particles in a lens. Higher energy particles spend less time in the lens because they are moving faster, and so the force exerted by the lens has less time to act; if we think of the lens giving a radial impulse F (toward the lens! axis) to a particle, the momentum change will be p = t F dt where t, the time spent in the lens, is shorter for higher energy particles. The effect on the current density distribution is to cause it to spread out to a disk whose diameter is dc . It can easily be shown that dc depends on E/E and on the convergence half-angle of the beam, where E is the spread of energy of the beam and E the mean energy (E = qV, where V is the acceleration voltage of the beam): dc = Cc E/Eθ . Cc is the chromatic aberration coefficient, and θ is the beam angle. Because the beam energy distribution will not have a sharp cutoff, E usually represents its FWHM. Chromatic effects are most serious for FIB systems because the energy spread in an ion beam from a LMIS is quite large — 5 volts or more — but it can also be serious in the highest resolution STEMs and TEMs. There is another form of chromatic aberration called chromatic aberration of magnification that affects the TEM; it manifests itself as a radial distortion of the image through an energy-dependent magnification. The most important aberrations by far for a probeforming system are spherical and chromatic because they are both independent of the extent of the source of ions or electrons. The other aberrations, distortion, curvature of field, astigmatism, and coma depend to a greater or lesser extent on how large the source is or, more properly, how far from the optical axis the radiation originates. These other aberrations will be important in a TEM where an extended object is imaged but have negligible effect on an instrument that uses a field emission type source that is effectively a point source. We briefly indicate the effect of these four aberrations. Distortion deforms an object by displacing each image point from the position it would have in an ideal system by an amount proportional to the cube of the distance of the corresponding object point from the optical axis. The distortion may be either positive (‘‘barrel’’ distortion) or negative (‘‘pincushion’’ distortion). Distortion is also independent of the beam aperture angle. Curvature of field makes the image of a flat object in focus only on a curved surface near the image plane. This distortion is proportional to the square of the distance of a point from the optical axis and to the beam aperture angle. Astigmatism results from a lens that has different focal lengths in two orthogonal planes. The result is a blurred image; when the image is examined as the optical system is focused, the signature of astigmatism is that point objects seem to elongate first in one direction and then in the orthogonal direction, as the system passes through focus. Astigmatism depends on the square of the distance of a point from the optical axis and on the beam aperture angle. Coma deforms a point object into a comet-shaped figure (coma is Latin for comet). The effect is proportional to the distance of the object point from the optical axis and to the square of the beam aperture angle. The results are summarized in Table 1.
94
CHARGED PARTICLE OPTICS Table 1. Dependence of Various Aberrations on Optical System Parameters. Aberration
Position dependence
Dependence on beam aperture angle
Distortion Curvature of field Astigmatism Coma Spherical Chromatic
Cubic Quadratic Quadratic Linear None None
None Linear Linear Quadratic Cubic Linear
Because magnetic forces make the image plane rotate in the lens (magnetic forces are perpendicular to the beam velocity) several of these aberrations, distortion, coma, and astigmatism, have counterparts called anisotropic distortion, anisotropic coma, and anisotropic astigmatism, respectively. The anisotropic effect adds a rotational component to the aberrations. For example, the comet-shaped figure produced by coma is curved. Chromatic aberration of magnification also suffers from an anisotropic component if magnetic lenses are used [see (1)]. It is important to be able to determine the effect of aberrations on the current density distribution to calculate the resolution of an instrument (see later). To do this, the coefficients for the aberrations must be found. The details of the way this is done are beyond the scope of this article, but we can indicate a procedure here. The properties of an optical system can be found by expressing the optical path of a ray through it as a power series in the radial coordinates x and y, assuming that the optical axis is the z axis. If this is done, the symmetry of the system allows eliminating a number of possible combinations of x and y. The path that the rays follow can then be found by taking the gradient of the optical path. To the lowest order of the power series, this gradient gives the Gaussian image plane. As higher order terms are included, the paths are shifted, and it is possible to identify shifts proportional to different powers of distance and angle that correspond to the five aberrations. For an electron microscope, the shifts can, with some effort, be expressed as integrals across the electric potentials and magnetic fields of the lens, their derivatives, and the undistorted (first-order) trajectories. For details, the reader is referred to (1) and to the article by Munro in (3). In light optics, the effects of aberrations can be reduced by combining lens elements that have different shapes and index of refraction and also by limiting the acceptance area of the lens to rays that travel near the optical axis by using apertures. Unlike light lenses, charged-particle lenses cannot be combined to eliminate aberrations, and so the only practical way to reduce the effects of aberrations is to use apertures to limit the angle subtended by the beam. The beam current is equal to the beam intensity multiplied by the beam solid angle. So when apertures are used to limit the area, the current is reduced. Thus, one always faces the trade-off of beam size against beam current.
The resolution of an electron microscope cannot be improved indefinitely by using smaller and smaller apertures because the wave nature of an electron results in diffraction by the aperture, just as for light. Ultimately, it is necessary to balance the effect of aberrations against diffraction of the electrons at the beam limiting aperture (5). Diffraction is a phenomenon caused by the wave nature of matter. It was noted for light in the early nineteenth century but was not seen for massive particles until the 1920s when electrons were diffracted from a crystal. When passing through a circular aperture of sufficiently small size, any particle — photon, electron, neutron — will exhibit a diffraction pattern of its wave function. This pattern — called an Airy function — is described by the equation
J1 (kra/Z) 2 A(r) = , (5) (kra/Z) where k is the wave number of the radiation, k = 2π/λ, r is the distance from the optical axis, a the radius of the aperture (which is centered on the axis), and Z is the distance from the aperture to the plane where the pattern is observed. J1 is a Bessel function of the first kind of order one. Because a/Z defines the aperture half-angle θ , we can also express A as A(r) =
J1 (krθ ) (krθ )
2 .
(6)
This function plays an important role in light optics because it can be used to define the resolution of a diffraction-limited system (one whose aberrations have been corrected). A(r) is shown in Fig. 9. The Rayleigh definition (4) of resolution is that when the maximum of the diffraction pattern of a point object (such as a star seen in a telescope) falls over the first minimum of the diffraction pattern of an adjacent point object, the two objects are just resolved, that is, the resolution is the distance between the two maxima of the Airy functions. This distance (on the observation plane) is given by the condition krθ = 2π θ/λ = 1.22π and leads to the formula: Resolution = 0.61λ/θ . As an example, if electrons in a microscope have an energy of 100 keV, then their wavelength is approximately 0.035 A˚ (3.5× 10−3 nm), and it has a beam limiting aperture of θ = 5 mrad, the minimum resolvable distance would be 4.3 A˚ (0.43 nm). There is effectively no such limitation for ions because the wavelength of even the lightest ion, a proton, is some forty times less than that of an electron of the same energy. Because electrostatic lenses are required to focus ions, their much greater (5X or more) spherical aberration means that the ultimate beam size of an ion microscope is still not as good as that of an electron microscope. Nonetheless, focused ion beam instruments employing field emission liquid metal ion sources that have liquid Ga emitters can routinely produce a focused Ga+ ion beam less than 10 nm in diameter. Figure 10 shows an image of graphite that contains visible information at the 5 nm level.
CHARGED PARTICLE OPTICS
electron microscopes limited by spherical aberration and diffraction, the problem was to minimize the expression
0.25
d=
0.2
A
0.15
0.1
0.05
0 −8
95
−6
−4
−2
0 x
2
4
6
8
" #2 Figure 9. The Airy function A(x) = J1 (x)/x describes the intensity distribution of diffraction-limited radiation.
1 Cs θ 3 2
The problem of estimating the resolution of a chargedparticle instrument has been approached in a number of ways. Experimentally, resolution has been defined as the minimum discernible distance between two objects; however, although practically useful, this is somewhat vague because it leaves the definition of discernible up in the air. A simple although hardly rigorous (or terribly accurate) approach that was used for many years was to calculate resolution by adding the effects of various aberrations and diffraction (in the case of electrons) in quadrature and use the result to find an optimum aperture angle. There is no justification for this approach because the shapes of the current distributions produced by each aberration are not Gaussian (see Fig. 8). For
+
0.61λ θ
2 ,
(7)
which yields θ = 0.916(λ/Cs )1/4 (d is the beam diameter). The problem in this approach is that it combines the notion of spherical aberration from geometrical optics, assuming a uniform distribution of current density over a circle of diameter 12 Cs θ 3 , with the notion of diffraction from wave optics. A rigorous way to define resolution is to calculate the modulation transfer function (MTF) of the optical system. The MTF is the image contrast measured as a function of spatial frequency: to determine resolution, which can be thought of as the finest discernible detail in an image, the MTF gives you the highest spatial frequency in the image for any given contrast level. In particular, because the lowest amount of contrast that the human eye can reliably see is about 10%, a good (and rigorous) definition of resolution R is that R equals the inverse of the spatial frequency corresponding to the 10% contrast level. This is actually fairly intuitive: contrast as a function of position in an image is defined as follows: C(x) =
Figure 10. An image of fractured graphite made by a 30-keV Ga+ ion beam including visible detail at the 5-nm level (15).
2
I(x + x) − I(x) , I(x + x) + I(x)
(8)
where I(x) is the intensity at point x and (Ix + x) is the intensity at an adjacent point x + x. C(x) is the contrast between the two points. Intensity in an image can be measured by using a microdensitometer to measure light transmission point by point, through a photographic negative of the image, for example. If C(x) is plotted as a function of spatial frequency, the result is the MTF. Spatial frequency ν is the distance between two points measured in nm−1 : ν = 1/x. In charged particle optics, this is called the wave optical approach, as opposed to the geometrical optics approach in which the particles are considered to have zero wavelength (except when calculating the contribution of diffraction in an electron microscope). As an example, suppose that an image of a structure that has a continuously varying spatial period is produced bu using a scanning probe of size L (FWHM), as shown in Fig. 11. If the spacing of the structure is much larger than the beam size L, we would expect the contrast to be high: the structure would be ‘‘resolved.’’ However, if the spatial period of the structure is much less than L so that the beam overlaps several periods at any point, we would expect low contrast between adjacent parts of the structure: the structure would not be resolved. A plot of the image intensity produced by the probe system as a function of position x on the specimen would look like Fig. 12. The contrast, as defined before, as a function of spatial frequency is shown in Fig. 13. The resolution at a given contrast level is taken as 1/ν, where ν is the spatial frequency (nm−1 ) corresponding to that contrast level. If the lowest value of the contrast that
96
CHARGED PARTICLE OPTICS
1.0
Modulation contrast C m
L
Cth −1 L min
0 Spatial frequency n = Position on the specimen Figure 11. Structure in a specimen where a continuously varying spatial frequency shows the nominal beam size L of a focused probe (16).
L
I max
I min
Beam position on the specimen Figure 12. Intensity of the image as a function of position on the specimen in Fig. 11 (16).
can be detected is Cth , then the resolution is L = 1/νth , where νth ↔ Cth . This definition of resolution is ideal because it is unambiguous, at least once it is decided what the minimum discernible contrast actually is. Another method that gets around this problem will be described below. Note that this definition is based on experimental data that can be obtained from an image. It is an interesting fact that the MTF can be derived theoretically in two ways: (1) directly from the optical properties of the microscope; and (2) by taking the Fourier transform of the intensity distribution of light for a light microscope or its equivalent for an electron or ion microscope, the current density distribution (4). This allows an easy comparison of theory with experiment. Result (2) in the previous paragraph is very important: wave optics must be used to obtain a rigorous prediction of the performance of an instrument, and this can be
L−1
Figure 13. The modulation contrast or MTF of the intensity function shown in Fig. 12 (16).
done fairly quickly on a desktop computer. However, the calculations would become extremely long for ions because the wavelengths are so short — thousands of times shorter than for electrons. However, it is possible to calculate the current density distribution in an ion beam fairly straightforwardly and then to take the Fourier transform. In this way, the rigorous concepts from wave optics can be applied to ion beam instruments. Because high resolution ion microscopes are so important technically, we will show how their resolution can be determined following the treatment developed by Sato (5). Consider an optical system whose optical axis coincides with the z-axis (see Fig. 14). Let the source coordinates be (xo , yo , zo ), the coordinates of the Gaussian image plane be (xi , yi , zi ), and the coordinates of a sampling plane or the target plane, at which the current distribution is to be calculated, be (xt , yt , zt ). The origin of the coordinate system is taken as at zi = 0. The source is characterized by a brightness B(xo , yo , xo , yo ) (’denotes d/dz). In terms of B, the differential
dI = Bs(xo, yo, xo′, yo′) dxo′ dyo′ dxo dyo yo
yi
yt
(xt ,yt )
xo
xi
xt
( xo ,yo )
zo
Optical axis
zi
zt
z
(Target plane) Source area Γo (Object plane)
(Gaussian image plane)
Figure 14. Definition of the coordinates used to calculate the current density (5).
CHARGED PARTICLE OPTICS
current emitted into a solid angle dxo dyo is given by
97
g (z ) 1
dI = B(xo , yo , xo , yo ) dxo dyo dxo dyo .
(9)
Then, the total beam current is found by integrating over dxo , dyo , dxo , and dyo , where the limits of integration are the solid angle defined by the entrance pupil of the optical system and the area for which B = 0:
B(xo , yo , xo , yo ) dxo dyo dxo dyo .
Ip =
h (z )
π 4 0
(10)
(xo ,yo )∈(xo ,yo )∈
M
To find the current at a point xt , yt at the axial position zt , it is necessary to find all of the trajectories that pass through that point and originate at an emitting site on the source. To do this, we need a relationship between the target coordinates and the source coordinates:
xo = xo (xt , yt , xo , yo ),
(11a)
yo = yo (xt , yt , xo , yo ).
(11b)
Then, one can transform coordinates dxo dyo dxo dyo in the integral in Eq. (10) to dxt dyt dxo dyo by a Jacobian transformation, dx o dx K= t dyo dxt
dxo dyt , dyo dyt
(12)
to obtain J(xt , yt ) =
zo
(13)
where xo and yo are in and xo and yo are in the solid angle , and where the (differential) current has been divided by dxt dyt to obtain the current density. An analytical expression can be found for Eqs. (11a) and (11b) by using geometrical optics with the results of the paraxial ray equation and third-order aberration theory, as follows. Let g(z) and h(z) represent two solutions of the paraxial ray equation where initial conditions are g(zo ) = 1, g (zo ) = 0, h(zo ) = 0, and h (zo ) = 1 at the source plane, as shown in Fig. 15. If we establish initial conditions X(zo ) ≡ Xo , Y(zo ) ≡ Yo , X (zo ) ≡ Xo , and Y (zo ) ≡ Yo , a paraxial ray can be described by
g(z) and h(z) is given by β(z) = [g(z)h (z) − g (z)h(z)][(zo )/(z)]1/2 ,
Xo =
ht Xt ht ht Xo − βt + gt ht βt + gt ht
(16a)
Yo =
ht Yt ht ht Yo − . βt + gt ht βt + gt ht
(16b)
and
As usual, if the Gaussian image plane and the target plane are at the same potential (this means that the target or specimen is at ground potential), then βi = βt . Then the slope of a trajectory outside the lens field will remain constant, so gt = gi , ht = hi = βi /gi . When z ≡ zt − zi = zt (zi = 0), we find that gt = gi + zgi and ht = zhi (hi = 0). Now, the linear magnification is defined by M = gi /go = gi , and the angular magnification by m = hi /ho = hi , so Eqs. (16a) and (16b) may be rewritten as Xo =
and Xo =
X (z) = Xo g(z) + Xo h (z), Y (z) = Yo g(z) + Yo h (z).
(14)
It is simplest to find the Jacobian K in terms of the coordinates on the Gaussian image plane. The Jacobian of
(15)
where (z) is the potential at z. At zi , h(zi ) = 0 by the definition of focusing action, so β(zi ) = g(zi )h (zi ). Using this relationship and Eq. (14), we can write Xo and Yo in terms of gt , ht , etc., where gt ≡ g(zt ), etc. as
X(z) = Xo g(z) + Xo h(z), Y(z) = Yo g(z) + Yo h(z),
zt
Figure 15. The definition of the two solutions g(z) and h(z) of the paraxial ray equation (5).
|K|B(xo (xt , yt , xo , yo ),
× yo (xt , yt , xo , yo ), xo , yo ) dxo dyo ,
zi
Xt − zmXo zg M 1+ i M Xt − zmXo . zg M 1+ i M
(16c)
(16d)
For a ray starting out at unit distance such as g(go = 1), the slope of the ray on the Gaussian image plane is of the order of the focal length f of the lens, gi ≈ 1/f ∼ 10 mm−1 .
98
CHARGED PARTICLE OPTICS
Typically the distance z of the target plane from the Gaussian focal plane will be of the order of micrometers to tenths of a millimeter, and M ∼ 1. In this case, zgi /M 1, and Eqs. (16c) and (16d) simplify to Xo =
Xt − zmXo M
Yo =
Yt − M
.
(17b)
Now, the aberrations of the system must be included to obtain the actual trajectories. These effectively displace the paraxial trajectories by a small amount. Suppose that the actual trajectories are given by lower case letters xt , yt , etc., as opposed to the paraxial trajectories Xo , Yo , etc. Because the magnitude of the trajectories can be determined as functions of Xo , Yo , Xo and Yo and of the aberrational coefficients of the optical system, we may write xo = Xo + xo (Xo , Yo , Xo , Yo ) yo = Yo + yo (Xo , Yo , Xo , Yo ),
∞ N(V)J(xt , yt , zto − Cci −∞
V ) dV Vi
2
2
yo = −Cso (Xo + Yo )Yo ,
(21b)
xo = θo ρ cos ,
(22a)
yo
(22b)
= θo ρ sin ,
and the equations for xo and yo become xo =
xt − M
yo =
yt − M
zmθo ρ + Cso θo 3 ρ 3 cos M
(23a)
zmθo ρ + Cso θo 3 ρ 3 sin , M
(23b)
and
(19)
In Cartesian coordinates, the displacement due to spherical aberration is given by xo = −Cso (Xo2 + Yo2 )Xo ,
(21a)
Then, the Jacobian K is easily found as K = 1/M 2 . If we assume that the optical system is cylindrically symmetrical and if the beam divergence half-angle into the entrance pupil is θo , we can use cylindrical dimensionless coordinates ρ and , where ρ = r/a and a = entrance pupil radius, to write Eqs. (21a) and (21b) as
(18)
having assumed that the angular aberrations are negligible, so that xo = Xo , yo = Yo . Because we have taken (zi ) = (zt ), the region of space around the target plane is field free so that xt = Xt and yt = Yt , that is, there is no lens action and hence no aberrations introduced into the trajectories in this region. There are only two aberrations that need to be considered when the beam is near the axis and the source dimensions are very small, as for field emission sources. These are spherical aberration, described by Csi (at the target) and chromatic aberration described by Cci (at the target). The relationship between the spherical aberration measured on the image plane and on the object plane 3/2 is Cso = Csi (Vo /Vi )/M 4 , and the relationship between chromatic aberration measured on the image and object planes is Cco = Cci ((Vo /Vi )3/2 /M 2 ). Chromatic aberration can be taken into account by allowing zt to vary according to zt = zto − Cci V/Vi , where the energy variation V is given by the energy distribution N() of the source and Vi is the beam voltage at the target. In this case, the result for J(xt ,yt ) must be integrated over N():
J(xt , yt ) =
xt − zmxo − Cso (xo2 + yo2 )xo , M yt − zmYo yo = − Cso (xo2 + yo2 )yo . M
xo =
(17a)
and zmYo
where Cso is the spherical aberration coefficient at the object plane. Then,
(20a) (20b)
so that K = (θo /M)2 ρ. It is necessary to restrict the area of integration to the region of the source that is actually emitting. For this purpose, it is convenient to suppose that B = B(ro ) source for ro ≤ Ro and B = 0 for ro > R"o . For a Gaussian # current distribution J = Jo exp − 21 (r/rs )2 , Ro might be taken as 5 rs , for example. It is also convenient to replace Cso θo3 by Csi θi3 /M 4 . Then, let ro = (x2o + y2o )1/2 , and normalize all distances to Ro . Then, we can replace the equations for xo and yo by
ro Ro
= R(ρ)2 +
rt MRo
where R(ρ) =
2
− 2R(ρ)
rt MRo
Csi θi3 3 zθi ρ+ ρ . MRo MRo
cos ,
(24)
(25)
The coordinates are shown in Fig. 16. Now, the current density for a monoenergetic beam can be written as an integral of B[ro (rt , zt , ρ, )] over ρ and , where ρ is restricted to 0 ≤ ρ ≤ 1. The limitation on B means that ro /Ro ≤ 1 which will put a limit on the maximum and minimum values of and on R(ρ). From Eq. (24), we see that if = 0 and ro /Ro = 1, R(ρ) = rt /MRo ± 1, from which ρ is determined implicitly. Then, the maximum value of can be found from 2 rt 2 −1 R (ρ) + MRo
. (26) cos = 2R(ρ)rt MRo
CHARGED PARTICLE OPTICS
99
the source of a scanning probe:
yo Ro
2 H = 5.30
S ln 1 + |MTF (ν)|2 ν dν, N
(28)
0
R(r)
1 φm
rt
Locus of ray
MR o
Source circle
Figure 16. Definition of the locus of the emitted rays of particles on the object plane and the normalized source coordinates for Eqs. (24) and (25) (5).
Then, the equation for the monoenergetic current density in the target plane can be written in terms of ρ and as J(rt , zt ) = 2
θo M
m B[ro (rt , zt , ρ, )]ρdρd ρ
(27)
0
where ρ means that the integral over ρ is restricted to the values of xo and yo to the actual source emitting area. At this point all that is necessary to calculate the MTF for the optical system is to calculate the Fourier transform of Eq. (27). Another definition of resolution was formulated by Sato (5,17–19) based on work of Fellget and Linfoot (20) that in turn is based on Shannon’s theory of information (21), in which resolution is related to the ability of the optical system to pass information from the object plane to the image plane. In particular, Sato’s method explicitly takes into account the effect of noise in the image, and so gets around the problem of how to settle on the contrast that should be used when determining the highest spatial frequency of the MTF. The problem is that image noise, which is manifested as the signal-to-noise (S/N) ratio in the image, depends on the amount of charge used to create the image which, in turn, is the product of the beam current and the time. It is not possible to reduce noise or increase the S/N ratio by using an arbitrarily large amount of charge. Using an electron beam, there may be specimen deterioration and beam drift, so practical exposure times for image gathering are ∼10–100 seconds. Using an ion beam, the specimen is eroded by sputtering. See Orloff (15) for a discussion of the limits on ion beam resolution due to specimen deterioration. As a starting point, Sato used Fellget’s expression (20) for the amount of information that can be transmitted or passed by an optical system modified to take into account
where the spatial frequency has been normalized to 0.61λ/θ for an electron beam system or to the source size D (FWHM) for an ion beam system (H is dimensionless). It would be possible to take into account the effect of a finite source by including the Fourier transform of the source current distribution, but for simplicity we will ignore it here [see the article by Sato in (3)]. Fellgett’s original expression was derived by considering information about an object that is transmitted to the image plane of a microscope by analyzing the number of pixels in the object that are imaged onto the image plane. Each of a large number of pixels has a certain number of gray levels that may or may not be distinguishable due to noise. The image of neighboring pixels may overlap if the current or intensity distributions overlap, and this will lessen the information passed by the optical system. The result of this is expressed in Eq. (28). Sato (19, see also the article by Sato in (3)] proposed defining a density of information per cm2 by H π D2
ρH =
(29)
where D is a characteristic length for the optical system; for electron beams, D = 0.61λ/θ , and for ion beams, D is the source size (FWHM of the current density distribution) referred to the image plane, that is, the demagnified source size. D represents the resolution of the ideal system, a diffraction-limited electron beam or a source-size-limited ion beam. H can always be calculated if either the optical properties of the system or its current density distribution are known. For an ideal system without aberrations, H will be maximum: H = Ho . Sato then defined the resolution of a real, aberrated system as the length R that makes the density of information passed by an ideal system where H = Ho equal to the density of information passed by the real system: Ho H = (30a) π R2 π D2 or
$ R=D
Ho H
(30b)
R is always larger than D because H is always less than Ho . The reason for this is that the MTF for a real system always lies below the MTF for an ideal system [see (4)]. This approach gives results similar to the resolution defined using the MTF and is in excellent agreement with experiment [see (19)]. Sato’s definition integrates across all spatial frequencies in accounting for the informationpassing capacity of the system. The agreement with experiment implies that the higher spatial frequencies are most important. This definition is particularly well suited to ion beam systems where the noise is a significant issue because specimen erosion limits the signal collection time [see (15)].
100
COLOR IMAGE PROCESSING
The reader is referred to Born and Wolf (4), Fellget and Linfoot (20), Black and Linfoot (22), Linfoot (23), Hopkins (24), Hawkes and Kasper (1), and Sato (3) for complete discussions of resolution and its relation to information transfer in light optics. In closing, we note that the resolution of the best electron microscopes approaches 0.1 nm. The resolution R of the best (far field) light microscopes is about one-half wavelength of the light used, R ∼ λ/2 (the wavelength of green light is about 500 nm). The deBroglie wavelength of a massive particle is λ = h ¯ /p, where h ¯ is Planck’s constant (1.05× 10−34 J-s) and p is the particle momentum (kg m sec−1 ). An electron whose energy is 50 keV has a ˚ and so deBroglie wavelength of about 0.005 nm (0.05 A), if the resolution of the best electron microscope is about ˚ light microscopes are twenty times better 0.1 nm (1 A), than electron microscopes in terms of resolution versus wavelength. Acknowledgments The author thanks Ms. Stephanie Knapp of Intel Corporation for taking the time to read this article and for making numerous suggestions to render it more readable.
COLOR IMAGE PROCESSING KENNETH R. CASTLEMAN Advanced Digital Imaging Research, LLC League City, TX
INTRODUCTION TO COLOR IMAGING The objects in a visual scene differ in brightness and also in the spectral characteristics of the light that emanates from them. This is true whether they generate light themselves, as in astronomy and fluorescence microscopy, or if they merely reflect a portion of the incident illumination. The human eye responds to their overall brightness and also to the spectral content of the emanation. We refer to this perceptual phenomenon as ‘‘color vision.’’ This article addresses color image processing, which is the digital image processing counterpart of color vision.
MONOCHROME (BLACK-AND-WHITE ) IMAGING BIBLIOGRAPHY 1. P. W. Hawkes and E. Kasper, Principles of Electron Optics, (in three volumes), Academic Press, 1989–1996. 2. P. Grivet, Electron Optics, 2nd ed., (in two volumes), Pergamon Press, 1972. 3. J. Orloff, ed., Handbook of Charged Particle Optics, CRC Press, 1996. 4. M. Born and E. Wolfe, Principles of Optics, 6th ed., Pergamon Press, 1980. 5. M. Sato and J. Orloff, J. Vac. Sci. Tech. B9, 2,602 (1991). 6. M. Knoll and E. Ruska, Z. Physik 78, 318 (1932). 7. M. Reiser, Theory and Design of Charged Particle Beams, Wiley-Interscience, 1994. 8. R. H. Fowler and L. W. Nordheim, Proc. R. Soc. A119, 173 (1928). ¨ 9. R. H. Good and E. W. Muller, Handbuch der Physik 21, 176 (1956). 10. L. W. Swanson, J. Vac. Sci. Tech. 12, 1,228 (1975). 11. L. W. Swanson and D. W. Tuggle, Appl. Surf. Sci. 8, 185 (1981). 12. L. W. Swanson and N. A. Martin, J. Appl. Phys. 46, 2,029 (1975). 13. H. Goldstein, Classical Mechanics, Addison-Wesley, 1959. 14. O. Scherzer, Z. Physik 101, 593 (1936). 15. J. Orloff, L. W. Swanson, and M. Utlaut, J. Vac. Sci. Tech. B14, 3,759 (1996). 16. M. Sato, private communication. 17. M. Sato and J. Orloff, Optik 89, 44 (1991). 18. M. Sato and J. Orloff, Optik 89, 82 (1991). 19. M. Sato and J. Orloff, Ultramicroscopy 41, 181 (1992). 20. P. B. Fellget and E. H. Linfoot, Phil. Trans. 247, 46 (1955). 21. C.E Shannon, Bell. Syst. Tech. J. 27, 379 (1948). 22. G. Black and E. H. Linfoot, Proc. R. Soc. A239, 522 (1957). 23. E. H. Linfoot, J. Opt. Soc. Am. 45, 808 (1955). 24. H. H. Hopkins, Proc. R. Soc. A231, 91 (1955).
When digitizing an image, one measures and quantizes the brightness at each pixel. The value obtained (gray level) at each pixel depends upon (1) how brightly the object is lighted, (2) how reflective it is, and (3) the sensitivity of the sensor that makes this measurement. The sensor has a spectral sensitivity characteristic that can be plotted as a spectral sensitivity curve S(λ). In addition, the illumination has a particular spectral power density (SPD) function I(λ), and the reflecting surface also has its own spectral characteristics. Thus, the gray level at position x, y in the digitized image is given by f (x, y) = k
I(λ)R(λ, x, y)S(λ)dλ,
(1)
where λ is wavelength, I(λ) is the SPD of the illumination source, R(x, y, λ) is the reflectance spectrum of the object at the location that maps to (x, y) in the image, S(λ) is the sensitivity spectrum of the sensor, and k is a calibration constant. The integration is taken over the spectral range in which both I(λ) and S(λ) are nonzero. Clearly, one could obtain very different images of the same scene simply by substituting a different light source or sensor. Photographers are well aware of this when selecting the film, lighting, and optical filters for a given assignment. Sunlight and incandescent light, for example, can produce strikingly different blackand-white images, depending on the nature of the photographic subject and the particular film and optical filters employed. The salient characteristic of monochrome digital imaging is that only a single gray-level value is recorded at each pixel location. The spectral characteristics of the
COLOR IMAGE PROCESSING
illumination and the sensor dramatically affect the way a given scene will be rendered in the digital image, but no record of these characteristics is kept in the digitized image. Note that it is impossible to reconstruct the spectral characteristics of the object [i.e., the function R(x, y, λ)] given only the digitized image f (x, y), even if the illumination and sensor spectra are known. Many different R(x, y, λ) functions would yield the same gray level f (x, y). Thus, monochrome imaging inherently loses the spectral information in a scene. MULTISPECTRAL IMAGING
1
101
Green Blue
Rods
Red
0.5
0 350
400
450 500 550 Wavelength (nm)
600
650
One can extend the process of Eq. (1) by measuring more than a single gray level at each pixel location. Then, Eq. (1) becomes (2) fi (x, y) = k I(λ)R(λ, x, y)Si (λ)dλ,
Figure 1. Absorbance spectra of human photoreceptor cells. The dotted line shows the absorbance of the photopigment in the rod cells of the human retina. The solid curves, left to right, show the absorbance of the blue, green, and red photopigments in the cone cells. The plots are scaled to unit value at the peaks.
where 1 ≤ i ≤ N is an index, N is the number of spectral channels used, and Si (λ) is the spectral sensitivity of the ith channel. Then, we obtain an N-element vector of gray levels at each pixel. This is equivalent to recording a series of monochrome images, each taken with a different sensor spectral characteristic. Satellite imaging systems, for example, commonly use sensors that have 24 or more spectrally adjacent bands that collectively cover the range from the infrared into the ultraviolet. The different Si (λ) are commonly chosen to be bandpass filters spaced side by side throughout the range of interest of the electromagnetic spectrum. The resulting digital image can be viewed as three-dimensional, two spatial dimensions and one spectral dimension. Such a digitizing scheme preserves, to some extent, the spectral characteristics of the scene. Indeed, R(x, y, λ) could be reconstructed, to an approximation, from fi (x, y) if I(λ) and the set of Si (λ) are known and reasonably well designed.
COLOR IMAGING

The human visual system employs the technique of Eq. (2), where N = 3, to implement perception of the spectral characteristics of a scene. Most species have only monochrome visual systems. Presumably, the early primates accrued survival value from three-channel vision, and now humans perceive the world through tricolor multispectral eyes.

Human Vision

The light sensing cells on the retina of the human eye come in two types, rods and cones. The rods are more sensitive and give us night vision, but they respond only to brightness and do not see color. The less sensitive cone cells come in three types that differ primarily in their spectral sensitivity. The absorbance spectra of their photopigments (1) are shown in Fig. 1. The cells that have peak absorption around 420 nm sense what we know as blue light. Those that peak at about 530 nm respond to green light, and those that peak at about 560 nm see red. This is the red/green/blue ("RGB") color system. Red, green, and blue are called the "primary colors" of human vision. The three-band design of the human visual system forms the basis of what we call "color imaging." It is a special case of multispectral imaging, namely, the one that nature has bestowed upon Homo sapiens.
Tricolor Systems The choice of a tricolor (N = 3) system, and particularly the spectral sensitivities suggested by Fig. 1 (i.e., the RGB system), are not automatically optimal for the majority of image analysis applications. Indeed, multispectral imaging where N > 3 is required in many applications. Tricolor RGB imaging equipment is manufactured in huge quantities at relatively low cost to serve the imaging needs of humans. Thus, the availability of equipment alone dictates that one at least consider tricolor RGB for any image processing application that cannot be performed with monochrome equipment. Three spectral channels are relatively convenient to accommodate in an imaging problem, although spectral information is thereby quantized rather crudely. In many image analysis applications, it may be unimportant to match the spectral characteristics of the human eye. For example, if one wishes to classify different types of terrain and vegetation in color aerial photographs, the response of the human eye is irrelevant. On the other hand, if one wishes to display on a CRT monitor an image that looks like the original scene, then the perceptual characteristics of the eye are of central importance. Similarly, if one wishes to display
an image so that it appears exactly as it would look when printed by a certain printing process, then both human vision and the printing process must be modeled accurately. In the remainder of this section, we restrict the discussion exclusively to the three-channel RGB case. COLOR IMAGE PROCESSING AND ANALYSIS There are two distinct purposes for which one might use digital color imaging. One is to produce processed color images for human consumption; the other is to analyze color images toward some quantitative goal. These are direct generalizations of monochrome digital image processing and analysis, respectively. The challenge in color image processing is to record an image of a scene using a tricolor system and then use a tricolor display system (e.g., CRT monitor) to present the image so that it has the same colors as the original scene. Normally, the displayed image will not have the same spectral content as the original scene because the phosphors used for display will not match the spectral characteristics of the objects in the original scene. But if the job is well done, it will be perceived that the colors in the displayed image match those in the scene. Here, an accurate model of the human visual system is required. Otherwise, even before any processing, the image will not look natural when displayed. In color image analysis, on the other hand, one seeks to analyze the content of a color image. An example might be to estimate the amount of forest, lake, and cultivated farmland in an aerial color photograph. The result of such an analysis may not ever be displayed as a natural-looking color image. Here, the details of human visual perception are much less important. International Standards A large body of color vision research has been done in connection with the international standardization of television transmission technology. Various standards have been established by the Commission Internationale de l’Eclairage (CIE), an international standards committee for light and color, and by the National Television Standards Committee (NTSC), the Society of Motion Picture and Television Engineers (SMPTE), and the International Television Union (ITU). Many of these standards are based on data from experiments conducted on human observers. Definitions The field of color science has developed a set of strict definitions of the terms used to describe color, particularly in documents published by the organizations mentioned. The same terms, however, are used more freely in digital image processing. Because the focus here is on digital processing of color images, we do not hold rigidly to the terminology of color science. In a color matching experiment, a human observer adjusts the intensity of three primary colors (red, green,
and blue) until it is perceived that their mixture matches a given color. The intensities of the three primaries, relative to those required to match a reference white, form the set of three tristimulus values that specify the color. A color matching experiment requires prior selection of three primary colors and a reference white that corresponds to the chosen illumination source. Additive color matching uses superimposed spots of colored light, but subtractive color matching works with stacks of colored filters. In this section, we are concerned only with additive color matching. It is often conceived that the tristimulus values derived from a color matching experiment specify a location in a three-dimensional color space. Brightness refers to how strongly a lighted or lightemitting object stimulates the visual system. Scaled between ‘‘bright’’ and ‘‘dim,’’ it is the perception of how much light is being radiated. Related terms are intensity, luminance, luma, lightness, and value. Intensity is the total emitted radiant power per unit solid angle. It can be obtained by integrating the SPD of the source across the wavelength spectrum. Luminance, as defined by the CIE, is the emitted radiant power, weighted by the spectral sensitivity function of human vision. This latter curve is called the ‘‘luminous efficiency of the standard observer.’’ Figure 2 shows the standard scotopic (low light level) luminosity function (2). Based on measurements by Wald (3,4) and by Crawford (5), it was adopted by the CIE in 1951 (6). It peaks at 555 nm and agrees roughly with the absorbance curve of the rod photopigment in Fig. 1. One can integrate the product of the ‘‘luminosity function’’ (Fig. 2) and an SPD to obtain a numerical luminance value, usually denoted as Y. This quantifies the perceptual strength of a color. One can also calculate the luminance of a color as a weighted sum of its RGB tristimulus values, as we shall see later. The human eye has a nonlinear response to light intensity. One must reduce the intensity of a light source to about 18% for it to appear half as bright. Lightness models this behavior and measures brightness relative to a reference white. Usually denoted L∗ , it is proportional to
Figure 2. The luminosity function of human vision. This curve shows the sensitivity of the human eye to light, as a function of wavelength, normalized to unity at the peak.
the cube root of the luminance ratio of the test source over the reference white. Luma, usually denoted Y , is computed from RGB tristimulus values in a manner similar to luminance, except that nonlinear (gamma-corrected) values of R, G, and B are used. It is similar to lightness, except that the nonlinear transformation is done before averaging the RGB components. Value is a scale of ten perceptually equal steps of brightness developed by Munsell (see later). The hue of a color refers to the spectral color (e.g., a color of the rainbow) to which it is perceptually closest. The scale of hues is circular and runs first through the rainbow from red through orange, yellow, green, and cyan to blue, then through the nonspectral (magenta or purple) colors and back to red. The hue of a light source usually corresponds to the wavelength at which its SPD peaks. Saturation is a measure of how strongly colored (colorful) a light source is. Vividly colored lights, like the spectral colors, are highly saturated. Pastel colors have only moderate saturation. Neutral colors (grays) have zero saturation. The related term chroma is sometimes used. The concept of saturation can be illustrated as follows. A bucket of bright red paint would correspond to a hue of 0° and saturation 1. Mixing in white paint makes the redness less vivid and reduces its saturation, but does not change its hue. Pink corresponds to a saturation of 0.5 or so. As more white is added to the mixture, the red becomes paler, the saturation decreases and eventually approaches zero saturation (white). On the other hand, if one mixed black paint with the bright red, its intensity would decrease (toward black), and its hue (red) and saturation (1.0) would remain constant. Chrominance refers to the colorfulness of a source, independent of intensity. Hue and saturation together make up chrominance.
COLORIMETRY

The science of quantifying perceived color is called colorimetry. Here, one seeks to specify a set of tristimulus values that correspond uniquely to a particular perceived color. Before this can be done, however, one must establish the spectral characteristics of the three primary colors and of the illumination, which is called the reference white. Then, color matching is used to determine the amount of each primary color that, when mixed together, will reproduce a color sensation indistinguishable from that of the test color. Now, we address ways one can quantitatively specify a color, such as that of a pixel in a color digital image.

Illuminants

Blackbody Radiators

A dense incandescent material radiates electromagnetic energy. As its temperature changes, so do the intensity and spectral characteristics of the emitted energy. At high enough temperatures, the radiation contains visible light. The tungsten filament of a light bulb, for example, is a blackbody radiator. The spectral power distribution of a blackbody radiator is given by Planck's law,

E(λ) = c1 λ^−5 [e^(c2/λT) − 1]^−1,    (3)

where T is temperature in kelvins and E is energy in ergs per second per nanometer of bandwidth per square centimeter of radiating surface. The constants are c1 = 3.7415 × 10^23 and c2 = 1.4388 × 10^7 (7). Materials at about 5000 K emit light whose SPD is reasonably flat across the visible spectrum. Cooler objects emit reddish light, and hotter objects appear bluish. Figure 3 shows the spectral energy distribution of black bodies at three different temperatures. Although warmer bodies emit more power per unit surface area, the plots in Fig. 3 are normalized so that they are equal at 550 nm.

Color Temperature

Any reasonably white light source can be specified by its "color temperature," that is, the blackbody temperature to which it is perceptually closest. Ordinary tungsten-filament lamps have a color temperature of about 2400 K. Those designed for photography operate at color temperatures up to about 3200 K. Sunlight on a clear day has a color temperature of about 5000 K. Blue sky has a color temperature of about 20,000 K. Color temperatures between 5000 and 5500 K appear white to most people. Lower color temperatures appear yellowish, and higher temperatures tend to look bluish, although the eye tends to adapt so as to perceive the ambient illumination as white.

CIE Standard Illuminants

In 1931, the CIE established definitions of, and published SPDs for, standard illuminants that correspond to various artificial and sunlight sources. CIE Illuminant A is
Figure 3. Spectral energy distribution of incandescent black bodies at three temperatures. Radiant energy per unit of radiating surface area, per unit of wavelength, normalized to unity at 550 nm.
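A minimal Python sketch of Eq. (3), using the constants quoted above; the temperatures match the curves of Fig. 3, and the normalization to unity at 550 nm is the same as in the figure.

import numpy as np

# Constants as given in Eq. (3): wavelength in nm, E in erg s^-1 nm^-1 cm^-2.
C1 = 3.7415e23
C2 = 1.4388e7

def blackbody_spd(wavelength_nm, temperature_k):
    """Spectral power distribution of a blackbody radiator, Eq. (3)."""
    lam = np.asarray(wavelength_nm, dtype=float)
    return C1 * lam ** -5 / (np.exp(C2 / (lam * temperature_k)) - 1.0)

wavelength = np.arange(350, 751, 5)
for temp in (3200, 5500, 6500):
    spd = blackbody_spd(wavelength, temp)
    spd /= blackbody_spd(550.0, temp)   # normalize to unity at 550 nm, as in Fig. 3
    print(temp, spd[:3])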
Figure 4. SPDs for four CIE standard illuminants.
intended to represent tungsten light. It is a blackbody radiator whose SPD is calculated from Eq. 3 using a color temperature of 2856 K. CIE Illuminant B is intended to represent direct sunlight. It has a color temperature of approximately 4874 K. CIE Illuminant C is intended to represent average daylight. It has a color temperature of approximately 6774 K. CIE Illuminant D65 is now regarded as the main standard for daylight illumination. It has a color temperature of approximately 6504 K. SPDs for these four illuminants, plotted in Fig. 4, are scaled so that they are equal at 550 nm. Primaries
Figure 5. Color matching experiment configuration. The intensities of the overlapping primaries are adjusted to match first the reference white, then the test color.
A color matching experiment requires selecting red, green, and blue primaries, which are then mixed in proper proportions to match a specific color. Thus, the tristimulus values obtained depend on the specific choice of primaries.

Monochromatic Primaries

In 1931, the CIE adopted a set of three monochromatic light sources as primary illuminants for use in color matching experiments (8). Their wavelengths were red: 700 nm, green: 546.1 nm, and blue: 435.8 nm.

Phosphor Primaries

Color images are often displayed on CRT tubes that generate a mixture of light from red, green, and blue phosphors, so it is natural to use the SPDs of those phosphors as primaries in color matching experiments. The NTSC primaries of 1953 were standardized in ITU Recommendation BT.601 (9). The luminance resulting from these primaries is given by

Y601 = 0.299RN + 0.587GN + 0.114BN.    (4)

Primaries consistent with contemporary CRT phosphors were standardized in ITU-R Recommendation BT.709 (10). The luminance resulting from these primaries is given by

Y709 = 0.2125RN + 0.7154GN + 0.0721BN.    (5)

Color Matching Experiments

In a color matching experiment, a spot of light of an arbitrary test color is projected on a uniformly diffusing (white) surface, that is, one that reflects all wavelengths equally in all directions. A spot of reference white light is also projected on the same surface (Fig. 5). Finally, three spots of primary-colored light are superimposed on the same surface. If A1(C), A2(C), and A3(C) are the primary intensities required to make the color of the overlapped spot match that of the test color and A1(W), A2(W), and A3(W) are the primary intensities required to match the reference white, then the tristimulus values that specify the test color are

T1(C) = A1(C)/A1(W),   T2(C) = A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W).    (6)
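A small illustration of Eq. (6) in Python, assuming hypothetical matching intensities; a primary that must instead be mixed with the test color would enter with a negative sign, as in Eqs. (7) and (8) below.

import numpy as np

def tristimulus(primary_for_color, primary_for_white):
    """Eq. (6): normalize matched primary intensities by those for the reference white."""
    return np.asarray(primary_for_color, float) / np.asarray(primary_for_white, float)

# Hypothetical matching results (arbitrary units).
A_color = [12.0, 30.0, 5.0]    # A1(C), A2(C), A3(C)
A_white = [40.0, 40.0, 40.0]   # A1(W), A2(W), A3(W)
print(tristimulus(A_color, A_white))   # T1(C), T2(C), T3(C)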
It may not be possible to match the test color with any mixture of the three primary colors. In that case, one matches a mixture of two of the primaries to a mixture of the test color and the third primary. Again, all three primary intensities are adjusted to produce a perceptual match. In this case, the tristimulus values are

T1(C) = −A1(C)/A1(W),   T2(C) = A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W),    (7)
where the first primary is that mixed with the test color. If this cannot produce a match, even after all three primaries have been used with the test color, one attempts to match one primary with a mixture of the test color and the other
two primaries. In this case,
T1(C) = −A1(C)/A1(W),   T2(C) = −A2(C)/A2(W),   and   T3(C) = A3(C)/A3(W),    (8)
where the first and second primaries are mixed with the test color. Notice that the latter two cases produce negative tristimulus values for colors that cannot be matched by a mixture of the three primaries.

The Laws of Color Matching

Grassman (11) has advanced a set of axioms that describe color matching phenomena. These lead to eight principles of color matching:

1. Any color can be matched by a properly balanced mixture of no more than three colored lights (primaries), provided that none of the primaries can be matched by a mixture of the other two.
2. Color matches are relatively independent of intensity level.
3. The human eye is unable to decompose a mixture into its primary components.
4. The luminance of a mixture is equal to the sum of the luminances of the components.
5. If color A matches color B and color C matches color D, then when color A is mixed with color C, it will match the mixture of color B with color D.
6. If the mixture of color A with color B matches the mixture of color C with color D and color A matches color C, then color B matches color D.
7. If color A matches color B and color B matches color C, then color A matches color C.
8. Either color A can be matched by a mixture of colors B, C, and D; a mixture of colors A and B can be matched by a mixture of colors C and D; or a mixture of colors A, B, and C can be matched by color D.

COLOR COORDINATE SYSTEMS

Several systems have been developed for specifying color. They derive mainly from the large body of work that has been done on the international standardization of television transmission methodology by standards organizations. A few of the more important color systems are listed here.

RGB Color Spaces

The most straightforward way to specify color is to use red, green, and blue tristimulus values, scaled between, for example, zero and one. Each pixel can be represented
Figure 6. Rectangular RGB color space.
by a point in the first quadrant of three-space, as shown by the color cube in Fig. 6. The origin of the RGB color space represents no intensity of any of the primary colors and thus corresponds to the color black. Full intensity of all three primaries together appears as white at the opposite corner of the cube. Equal amounts of the three color components at lesser intensity produces a shade of gray. The locus of all such points falls on the diagonal of the color cube, called the gray line. Three of the corners of the color cube correspond to the primary colors, red, green, and blue. The remaining three corners correspond to the secondary colors, yellow, cyan (blue-green), and magenta (purple). Each of the secondaries is a combination of two primary colors. The gray-level histogram of an RGB image is a scatter of points in three-dimensional RGB-space. Figure 7 shows a color image and its R, G, and B components. The Rc Gc Bc Coordinate System In 1931, the CIE developed a color coordinate system (8) based on color matching experiment data obtained by Guild (12) and by Wright (13). The CIE 1931 system forms the basis for much of modern colorimetry. It uses the three 1931 CIE monochromatic primaries mentioned before (14,15). The reference white is an illuminator that has constant energy throughout the visible spectrum. Figure 8 shows the tristimulus values required to match the spectral (monochromatic) colors (2,16). These curves were determined experimentally using several normal human subjects, and they represent the perception of the CIE Standard Observer. Note that, for this set of primaries, certain of the tristimulus values are negative at some wavelengths. The RN GN BN and RS GS BS Coordinate Systems The NTSC developed the RN GN BN colorimetry system using as primaries three phosphors that were typical of those used in commercial color television in 1953 (17). This was codified in ITU Recommendation BT.601 (9). The reference white is CIE source C, which approximates the light from an overcast sky (18). The NTSC tristimulus values for a particular color can be obtained from the CIE
Figure 7. Linear color space: Upper left: color image; upper right: red; Lower left: green; Lower right: blue component. See color insert.
values by the linear transformation

RN = 0.842RC + 0.156GC + 0.091BC,
GN = −0.129RC + 1.320GC − 0.203BC,    (9)
BN = 0.008RC − 0.069GC + 0.897BC.
Later the SMPTE developed the RS GS BS system that better matches CRT phosphors in use today (19). It is codified in ITU Recommendation BT.709 (10). These two RGB systems are linearly related by

RS = 1.609RN − 0.447GN − 0.104BN,
GS = −0.058RN + 0.977GN + 0.051BN,    (10)
BS = −0.025RN − 0.037GN + 1.162BN.

Color Difference Systems

For color television transmission to remain compatible with existing black-and-white receivers, it was necessary to establish color standards that (1) continue to transmit the luminance signal as before and (2) encode the color information so that it would not affect existing monochrome receivers. The RGB signals were combined to form the luminance signal [e.g., as in Eq. (4)], and two new signals were designed to carry the color information. These were color difference signals (B–Y and R–Y) from which R, G, and B could be recovered by suitable processing.

The YUV and YIQ Coordinate Systems

The PAL and SECAM color television broadcast systems used in many countries (20) transmit three color component signals called Y, U, and V. Y is the luminance signal, again formed by summing the RN, GN, and BN values. The chrominance (color difference) signals U and V that contain the color information are ignored by black-and-white television receivers. The YUV coordinates of a color can be obtained from its NTSC components by the linear relationship
Y = 0.299RN + 0.587GN + 0.114BN,
U = −0.148RN − 0.289GN + 0.437BN,    (11)
V = 0.615RN − 0.515GN − 0.100BN.
The NTSC color broadcasting system used in the United States employs three color component signals called Y, I, and Q. Y is again the luminance signal, but I and Q are chrominance signals obtained by rotating the U and V signals by 33°. They are given by

I = −U sin(33°) + V cos(33°)   and   Q = U cos(33°) + V sin(33°).    (12)
The I and Q signals can be subjected to more bandwidth reduction than the U and V signals without visibly degrading the transmitted image (18). The YIQ signals can be obtained directly from the NTSC components by the linear relationship

Y = 0.299RN + 0.587GN + 0.114BN,
I = 0.596RN − 0.274GN − 0.322BN,    (13)
Q = 0.211RN − 0.523GN + 0.312BN.
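The linear relationships of Eqs. (11) and (13) amount to a 3 × 3 matrix product per pixel. A brief Python sketch, using an arbitrary example pixel:

import numpy as np

# Matrices from Eqs. (11) and (13); RN, GN, BN are NTSC tristimulus values.
RGB_TO_YUV = np.array([[ 0.299,  0.587,  0.114],
                       [-0.148, -0.289,  0.437],
                       [ 0.615, -0.515, -0.100]])

RGB_TO_YIQ = np.array([[ 0.299,  0.587,  0.114],
                       [ 0.596, -0.274, -0.322],
                       [ 0.211, -0.523,  0.312]])

rgb = np.array([0.8, 0.4, 0.2])   # one pixel, hypothetical values in [0, 1]
print("YUV:", RGB_TO_YUV @ rgb)
print("YIQ:", RGB_TO_YIQ @ rgb)
# For an H x W x 3 image, the same transform can be written image @ RGB_TO_YIQ.T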
Figure 8. CIE 1931 tristimulus values. These curves show how much of each primary is required to match monochromatic colors of different wavelengths.

Figure 9 shows an image and its Y, I, and Q components.

The XYZ Coordinate System

The negative-going tristimulus values of the 1931 CIE system (Fig. 8) create problems in television transmission, where it is more convenient to encode positive signals. The CIE developed a color system in which the tristimulus values (X, Y, Z) are always positive. It uses three artificial primaries and an equal-energy reference white. It is designed so that the Y component is the luminance. The XYZ tristimulus values can be obtained from the NTSC tristimulus values by the linear relationship

X = 0.607RN + 0.174GN + 0.200BN,
Y = 0.299RN + 0.587GN + 0.114BN,    (14)
Z = 0.000RN + 0.066GN + 1.116BN.

Note that the middle row of the matrix contains the luminance weighting coefficients from Eq. (4). Figure 10 shows an image and its X, Y, and Z components. The XYZ values can be normalized to sum to one to form the chromaticity coordinates

x = X/(X + Y + Z),   y = Y/(X + Y + Z),   and   z = Z/(X + Y + Z).    (15)
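A short Python sketch combining Eqs. (14) and (15) to go from NTSC RGB values to x, y chromaticity coordinates; the input pixel is arbitrary.

import numpy as np

RGB_TO_XYZ = np.array([[0.607, 0.174, 0.200],   # Eq. (14)
                       [0.299, 0.587, 0.114],
                       [0.000, 0.066, 1.116]])

def chromaticity(rgb):
    """Return (x, y) chromaticity coordinates per Eqs. (14) and (15)."""
    xyz = RGB_TO_XYZ @ np.asarray(rgb, float)
    total = xyz.sum()
    return xyz[0] / total, xyz[1] / total

print(chromaticity([1.0, 1.0, 1.0]))   # near the white point of the NTSC system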
Then, the chrominance of the color, independent of intensity, can be shown in an x, y chromaticity diagram (Fig. 11). The chromaticity coordinates of the D65 illuminant, for example, are [x, y] = [0.3127, 0.3290]. Table 1 shows the chromaticity coordinates of the BT.709 primaries. The XYZ and x, y systems are now the most commonly used color specification systems. Cube Root Systems It is desirable that perceptually equal changes in color (hue, saturation, and intensity) should produce
Figure 9. YIQ color difference space: Upper left: color image; Upper right: Y; Lower left: I; Lower right: Q component. See color insert.
Figure 10. XYZ color space, Upper left: color image; Upper right: X; Lower left: Y; Lower right: Z component. See color insert.
Table 1. Chromaticity Coordinates of the BT.709 Primaries

           x        y
Red      0.640    0.330
Green    0.300    0.600
Blue     0.150    0.060
Figure 11. The x, y chromaticity diagram. The horseshoe shows where the spectral colors (wavelengths in nm) fall in normalized x, y space. The horizontal curve shows where different color temperatures (K) fall.
equal changes in color coordinates. Several color spaces that have been developed for this use the cube root function, which approximates the sensitivity of the human eye.
The UVW and U∗V∗W∗ Coordinate Systems

The C.I.E. UVW system, introduced in 1960, was designed so that just noticeable changes in hue and saturation produce equal changes in color coordinates (14,21). The U∗V∗W∗ system, adopted in 1964, was an attempt to make just noticeable changes in both luminance and chrominance produce equal changes in color coordinates (14,22). They were both rendered obsolete by the 1976 introduction of the more accurate L∗a∗b∗ and L∗u∗v∗ systems.

The L∗a∗b∗ Coordinate System

The L∗a∗b∗ system (14,23) was designed to agree more closely with the Munsell color system (24,25). In this system, L∗ specifies the lightness, and a∗ and b∗ specify chrominance. The L∗a∗b∗ color coordinates are related to the XYZ system by

L∗ = 116(Y/Yn)^(1/3) − 16,
a∗ = 500[(X/Xn)^(1/3) − (Y/Yn)^(1/3)],   and    (16)
b∗ = 200[(Y/Yn)^(1/3) − (Z/Zn)^(1/3)],

where Xn, Yn, and Zn are the coordinates of the reference white and Y/Yn > 0.008856.

The L∗u∗v∗ Coordinate System

The L∗u∗v∗ system (14,26), another cube root system, was introduced by the CIE in 1976. The color coordinates are related to the XYZ system by

L∗ = 116(Y/Yn)^(1/3) − 16,
u∗ = 13L∗(u − u0),   and    (17)
v∗ = 13L∗(v − v0),

where

u = 4X/(X + 15Y + 3Z),   v = 9Y/(X + 15Y + 3Z),    (18)
u0 = 4Xn/(Xn + 15Yn + 3Zn),   v0 = 9Yn/(Xn + 15Yn + 3Zn),    (19)

and Xn, Yn, Zn correspond to the reference white. Figure 12 shows L∗u∗v∗ component images when the D65 illuminant is used for the reference white.
Figure 12. The L∗ u∗ v∗ cube root space: Upper left: color image; Upper right: L∗ , Lower left: u∗ , Lower right: v∗ component. See color insert.
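A minimal Python sketch of Eq. (16); the D65 white point values used here are a commonly quoted set and are an assumption, not taken from this article.

import numpy as np

def xyz_to_lab(xyz, white):
    """CIE L*a*b* per Eq. (16); valid where each ratio exceeds 0.008856."""
    x, y, z = (np.asarray(xyz, float) / np.asarray(white, float)) ** (1.0 / 3.0)
    L = 116.0 * y - 16.0
    a = 500.0 * (x - y)
    b = 200.0 * (y - z)
    return L, a, b

d65_white = (95.047, 100.0, 108.883)   # assumed D65 reference white
print(xyz_to_lab((41.24, 21.26, 1.93), d65_white))   # a saturated red, for illustration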
Cylindrical Color Spaces

Separating Luminance and Chrominance

The three types of color spaces mentioned before (linear RGB, color difference, and cube root spaces) do not truly separate luminance (brightness) from chrominance (hue and saturation). In color image processing, it is often necessary, for example, to segment the image into regions that correspond to different objects, such as forest and cropland in an aerial photograph. These regions differ in luminance, hue, and saturation due to the different reflective properties of the vegetation. They will also differ in luminance due to variations in illumination, such as shadows from clouds. Thus luminance, like R, G, and B, is affected by nonuniform illumination, but hue and saturation, to a first approximation, are not. Thus, it is often useful to use a color space in color image processing wherein hue and saturation are truly independent of luminance.
The Color Circle
Because the hue scale is periodic, artists have used the concept of a ‘‘color circle’’ for centuries. The fully saturated colors of the rainbow (the spectral colors) are spaced around the perimeter of the circle in the sequence red, yellow, green, cyan, blue, purple, and back to red (Fig. 13). Note that the rainbow extends only from red through
The Ostwald System
About 100 years ago, Wilhelm Ostwald advanced a three-dimensional generalization of the color circle (27). He placed a perpendicular gray axis through the center of the hue circle to form a double cone solid (Fig. 14). White fell at the upper apex, and black at the lower apex. The hues, numbered 1 through 24, are arranged around the periphery. The fully saturated colors fall on the perimeter of the circle where the two cones intersect. Adding white to a paint color causes it to move toward the upper apex, and adding black moves it toward the lower apex.

The Munsell System
Figure 13. The color circle. Hue is an angle, and saturation is a vector length.
blue. Between blue and red fall the nonspectral (purple) colors that the eye perceives when red- and blue-sensitive cones, but not the green-sensitive ones, are stimulated simultaneously. This is not realizable with monochromatic light, but it does occur in nature. The colors on the color circle become less saturated (paler, more pastel) toward the center, but still retain their hue. The color circle, which has long proved useful in illustrating similarities and differences among colors, forms the basis for cylindrical color spaces.
22
Hue 23 24 1
The color space developed by H. A. Munsell (24,25) at about the same time as Ostwald’s is a cylindrical coordinate system, wherein ten hues are spaced around the color circle and nine ‘‘values’’ (intensities) cover the range from black to white along the vertical axis. Munsell established scales of ‘‘chroma’’ (saturation) and, for each hue, illustrated different chroma/value combinations with painted chips (Fig. 15). The hue, value, and chroma scales were designed to have perceptually equal steps. The Munsell system, like Ostwald’s, is a threedimensional generalization of the color circle, and it parallels human vision rather well. The Munsell system has retained its importance, in part, because the company founded by Munsell has supplied color matching books since 1904. Due to the availability of better pigments, the number of hues in the Munsell system has increased to forty. Since 1934, the CIE has used Munsell colors in color matching experiments to establish a correspondence with CIE color coordinates. Cylindrical Coordinates
7
Intensity. For quantitative purposes, the three color coordinates, hue, saturation, and intensity, define a cylindrical color space, shown in Fig. 16. Intensity specifies
Figure 14. The Ostwald color system. The spectral colors lie on the perimeter of the solid.
Figure 15. The Munsell color system. For each hue, painted chips show the various combinations of value and chroma. (photo provided courtesy of Munsell Color, division of GretagMacbeth). See color insert.
The effects of hue, saturation, and intensity are illustrated by the color test patterns in Fig. 17, where hue varies with angle from zero to 360° . In the left image, both saturation and intensity are constant at unity. In the middle image, saturation goes to zero at the center as the colors fade to gray. In the right-hand image, intensity goes to zero at the center, as the colors fade to black.
Figure 16. Cylindrical color space. Hue is an angle, saturation is a vector length, and intensity is the vertical axis.
the overall brightness of a color, without regard to its spectral wavelength, and it is the vertical axis of cylindrical color space. Different intensities (the gray shades) fall along the vertical axis from black at the bottom to white at the top. In this discussion, we depart somewhat from the strict definition of intensity put forth by the CIE and used routinely in color science. Instead we use a more flexible definition, as is common practice in digital image processing.
Hue. The two chrominance parameters, hue and saturation, are illustrated by the color circle in Fig. 16. Hue is an angle, and saturation is a vector length in the hue-saturation plane. Arbitrarily, a hue of 0° is red, 120° is green, and 240° is blue. Hue traverses the colors of the visible spectrum as it goes from zero to 240 degrees. The nonspectral colors fall between 240° and 360° .
Saturation. The saturation parameter is the radius of the point from the origin of the color circle. The "vivid" (saturated) colors fall around the periphery of the cylinder, and their saturation values are unity. At the center lie neutral (gray) shades, those with zero saturation. In between lie pastel shades. The fully bright, fully saturated colors fall on the perimeter of the circular top surface.

Color Coordinate Conversion

For image processing, some techniques are naturally more successful when carried out in one color coordinate system or another. Thus, it is useful to be able to convert between rectangular (RGB) and cylindrical color coordinates. We present two alternative cylindrical systems that are useful for color image processing, even though the hue, saturation, and intensity values obtained are not the same as those used in color science.

The IHS Coordinate System

Several formulations for HSI spaces have been proposed and used. One of the most straightforward is called the IHS (intensity, hue, and saturation) system (7). The conversion from NTSC RGB to IHS color coordinates is given by

I = (RN + GN + BN)/3,
V1 = (−RN − GN + 2BN)/√6,    (20)
V2 = (RN − 2GN)/√6,

and

H = tan⁻¹(V2/V1),   S = √(V1² + V2²).    (21)

In this formulation, blue is the reference for hue. The inverse formulation is

V1 = S cos(H),   V2 = S sin(H),    (22)

and

RN = (4/3)I − (2√6/9)V1 + (√6/3)V2,
GN = (2/3)I − (√6/9)V1 − (√6/3)V2,    (23)
BN = I + (√6/3)V1.
Figure 17. Color charts: Hue varies with angle according to Fig. 13. Left: constant saturation and intensity. Middle: constant intensity, saturation goes to zero at the center. Right: constant saturation, intensity goes to zero at the center. See color insert.
Figure 18. Cylindrical IHS color space: Upper left: color image; Upper right: I; Lower left: H; Lower right: S component. In this image, blue is the reference for zero hue. See color insert.
Table 2. RGB, IHS, and HSI Values for Primary, Secondary, and Neutral Colors

                                      IHS System              HSI System
Color Name     R     G     B       I      H      S         H     S     I
Black          0     0     0       0      270°   0         45    0     0
Gray           0.5   0.5   0.5     0.5    270°   0.204     45    0     0.872
White          1.0   1.0   1.0     1.0    270°   0.408     45    0     1.743
Red            1.0   0     0       0.333  135°   0.577     0     1     0.588
Yellow         0.5   0.5   0       0.333  207°   0.455     60    1     0.588
Green          0     1.0   0       0.333  243°   0.912     120   1     0.588
Cyan           0     0.5   0.5     0.333  297°   0.456     180   1     0.588
Blue           0     0     1.0     0.333  0°     0.816     240   1     0.588
Magenta        0.5   0     0.5     0.333  45°    0.290     300   1     0.588
The transformations between this system and RGB space are relatively simple to compute, and intensity and hue are well behaved (Fig. 18). Saturation, however, is intensity-dependent, and the primary colors do not possess unity saturation, as one would hope. Table 2 illustrates this. Thus, this IHS space is less desirable for color image processing than a truly cylindrical color space might be.
HSI Format A more useful cylindrical color specification system is one we call the ‘‘HSI format.’’ Its saturation is intensityindependent, and it also offers other advantages for image processing. The conversion to and from RGB space, however, is a bit more complex than for the IHS system.
RGB to HSI Conversion

The conversion from RGB to HSI format can be approached as follows (28–30). Recall that the gray line is the diagonal of the color cube in RGB space, and it is the vertical axis in cylindrical HSI space. Thus we can begin by establishing an (x, y, z) coordinate system in which the RGB cube is rotated so that its diagonal lies along the z axis and its R axis lies in the x–z plane (Fig. 19). This is given by

x = (1/√6)[2R − G − B],
y = (1/√2)[G − B],   and    (24)
z = (1/√3)[R + G + B].

Figure 19. Rotating the RGB cube. This is a step in the development of the conversion from RGB to HSI coordinates.
Next we convert to cylindrical coordinates by defining polar coordinates in the x, y plane,

ρ = √(x² + y²),   φ = ∠(x, y),    (25)

where ∠(x, y) is the angle that a line from the origin to the point (x, y) makes with the x axis. This is basically the arc tangent function, but attention is paid to the quadrant in which the point falls. Now, we have cylindrical coordinates, where (φ, ρ, z) correspond to (H, S, I), but there are two problems with saturation. It is not independent of intensity, as we would like, and the fully saturated colors (those that have no more than two of the primary colors present) fall on a hexagon in the x, y plane (Fig. 20a) rather than on a circle. The remedy is to normalize ρ by dividing by its maximum for that value of φ. This leads to the saturation formula

S = ρ/ρmax = 1 − [3/(R + G + B)] min(R, G, B) = 1 − (√3/I) min(R, G, B).    (26)

This places all the fully saturated colors on a unit-radius circle in the x, y plane (Fig. 20b). There are at least three ways to compute hue. One can use Eq. (24) and take φ in Eq. (25) as the hue. An alternative expression for y, which yields results similar to those of Eq. (24) but provides hue values that are more stable in neutral gray (i.e., small S) areas, is y = (1/√6)[G − 2B]. As a third alternative, one can compute the angle

θ = cos⁻¹ { (1/2)[(R − G) + (R − B)] / √[(R − G)² + (R − B)(G − B)] },    (27)

after which the hue is

H = θ,  G ≥ B;   H = 2π − θ,  G < B.    (28)

Figure 20. The x, y plane of color space: (a) unnormalized polar coordinates; (b) normalized saturation.
Figure 21 shows the HSI components of the Lenna image. Notice how different saturation is from that in the IHS system (Fig. 18). Hue differs only in that it is referenced to red at 0° .
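The following Python sketch implements this RGB-to-HSI conversion for a single pixel using Eqs. (24), (26), (27), and (28); it is illustrative only and omits the vectorization one would use for whole images.

import numpy as np

def rgb_to_hsi(r, g, b):
    """RGB to HSI following Eqs. (24), (26), (27), and (28)."""
    intensity = (r + g + b) / np.sqrt(3.0)                    # z axis of Eq. (24)
    total = r + g + b
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total   # Eq. (26)
    # Hue from the angle formula of Eqs. (27) and (28).
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b))
    theta = np.arccos(np.clip(num / den, -1.0, 1.0)) if den > 0 else 0.0
    hue = theta if g >= b else 2.0 * np.pi - theta
    return np.degrees(hue), saturation, intensity

print(rgb_to_hsi(1.0, 0.0, 0.0))   # pure red: hue 0 deg, saturation 1
print(rgb_to_hsi(0.5, 0.5, 0.0))   # yellow: hue 60 deg, saturation 1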
HSI to RGB Conversion The formulas for converting from HSI to RGB take on slightly different form, depending upon which sector of the color circle the point falls (28–30). Assuming that Eqs. (24)–(26) have been used for the RGB to HSI conversion, the reverse conversion proceeds as follows. For 0° ≤ H < 120° ,
R = (I/√3)[1 + S cos(H)/cos(60° − H)],
B = (I/√3)(1 − S),   and    (29)
G = √3 I − R − B,

but for 120° ≤ H < 240°,

G = (I/√3)[1 + S cos(H − 120°)/cos(180° − H)],
R = (I/√3)(1 − S),   and    (30)
B = √3 I − R − G,

and for 240° ≤ H < 360°,

B = (I/√3)[1 + S cos(H − 240°)/cos(300° − H)],
G = (I/√3)(1 − S),   and    (31)
R = √3 I − G − B.

There are several more variations of HSI color spaces (14,31). From the viewpoint of color image processing, the specific choice may not materially affect the result, as long as hue is an angle, hue and saturation are independent of intensity, and the transformation is accurately invertible.

Figure 21. The HSI cylindrical color space: Upper left: color image; Upper right: H; Lower left: S; Lower right: I component. Here, red is the reference for zero hue. See color insert.

COLOR CORRECTION

Color Balance

Sometimes, a digitized color image is obtained without the proper balance between the red, green, and blue components, or the RGB values may not truly represent the characteristics of the scene. Then, it may be necessary to correct color before processing or display.
Different sensitivities, offsets, gain factors, etc., in the three color channels perform linear transformations on the three color components during digitizing. The result of this can be a color image where its primary colors are ‘‘out of balance.’’ All of the objects in the scene are shifted in color from the way they should appear. Most noticeably, objects that should be gray take on unwarranted color. The first test of color balance is whether all of the gray objects appear gray. The second test is whether the highly saturated colors have the proper hue. If the image has a prominent black or white background, this will produce a discernible peak in the gray-level histograms of the RGB component images. If these peaks occur at different gray levels in the three channels, this signals a color imbalance. The remedy for color imbalance of this type is to use linear gray-scale transformations on the individual R, G, and B images to give equal gray levels to gray areas and thus bring about color balance. Normally, only two of the component images need to be transformed to match the third. The simplest way to design the required grayscale transformation function is (1) to select relatively uniform light gray and dark gray areas of the image, (2) to
Figure 22. Color balance example. Top row: unbalanced image and RGB histograms. Bottom row: balanced image and RGB histograms. This exercise assumes that the (white) hatband and the (black) mirror frame are neutral colors. See color insert.
compute the mean gray level of both areas in all three component images, and then (3) to use a linear gray-scale transformation on two of them to make them match the third. If each of the two areas has the same gray level in all three component images, color balance has been effected, at least to a first approximation. Figure 22, at the upper left, shows the Lenna picture as it is available on the Internet, and gray-level histograms of its RGB components are to the right. The leftmost peak in each histogram corresponds to the mirror frame on the right side of the picture. The average gray levels there are approximately [R,G,B] = [95, 22, 61]. To effect color balance, we took both the mirror frame and the hatband as neutral colors. First, constants were subtracted from the red and blue components to make their leftmost peaks match that of the green component. Then, the three component images were multiplied by constants designed to place the hatband at an average
gray level of approximately 240. The resulting histograms are seen in the lower row of Fig. 22, and the color-balanced image appears at the lower left. More accurate color balance can be obtained if a gray step-wedge test target is present in the image, and nonlinear gray-scale transformations are used to give each step of the wedge the same gray level in each of the three color component images. Figure 23 illustrates the process. If the camera and digitizing system are stable, the required transformations can be determined before the digitizing session simply by digitizing a black-and-white step-wedge test target. Then, all images digitized during that session can be balanced by using the same set of transformations. Proper color balance is important when converting from RGB to cylindrical color spaces. Because hue is computed as an arc tangent [Eqs. (21), (25)], it is quite susceptible to color imbalance, especially in low-intensity (dark) areas. Even slight color imbalance can result in
Figure 23. Nonlinear color balance illustration. RGB gray level plots along a line that cuts through a step-wedge target. Left: Before color balancing. Right: After color balancing.
assigning inappropriate hue values and excess saturation to the dark regions of an image.

COLOR IMAGE CALIBRATION
Color Compensation In some applications, the goal is to isolate different types of objects that differ primarily, or exclusively, in chrominance. In fluorescence microscopy, for example, different constituents of a biological specimen (e.g., different cell structures) are stained with differently colored fluorescent dyes. The analysis often involves being able to visualize these objects separately but in correct spatial relationship to one another. If the preparative procedure stains three chemical components of the specimen, for example, using red, green, and blue fluorescent dyes, one can digitize and display this as a normal tricolor image. Then, the RGB component images, form a set of registered monochrome images, each of which shows objects of a specific type. This paves the way for image segmentation and object measurement by standard monochrome image processing techniques. Given the broad and overlapping sensitivity spectra of commonly used color image digitizers and the varied emission spectra of available fluorescent dyes (fluorophores or fluorochromes), one seldom completely isolates the three types of objects in the three component images. Normally, each type of object will be visible in all three of the color component images, although at reduced contrast in two of them. We refer to this phenomenon as ‘‘color spreading.’’ We can model color spreading as a linear transformation (32,33). Let the matrix C specify how the three colors are spread among the three channels. Each element cij is the proportion of the intensity of fluorophore j that appears in color channel i of the digitized image. Let x be the three-element vector of actual fluorophore intensities at a particular pixel, scaled as gray levels that would be produced by an ideal digitizer (one with no color spread and no black level offset). Then, y = Cx + b
(32)

is the vector of RGB gray levels recorded at that pixel by the digitizer. The matrix C accounts for the color spread, and the vector b accounts for the black level offset of the digitizer, that is, bi is the gray level that corresponds to black (zero intensity) in channel i. This is easily solved for the true intensity values by

x = C⁻¹[y − b].    (33)

Thus, color spreading can be eliminated by premultiplying the RGB gray-level vector for each pixel by the inverse of the color spread matrix, after the black level has been subtracted from each channel. This development assumes that gray levels are linear with intensity. For some cameras, a gray-scale transformation of the RGB components might be required to establish this condition before color compensation. The foregoing analysis assumes that the exposure time in each channel is the same as it was when the color spread matrix was determined. The color compensation matrix can be modified to account for differences in exposure time among channels (33).

Color Compensation Example

Figure 24 shows an RGB image of human bone marrow cells that have been digitized by a color television camera mounted on a fluorescence microscope. In this preparation, all cells have been stained with DAPI, a blue fluorescent dye. Cells that are in the process of dividing also absorb FITC, a green fluorescent dye. Finally, the DNA located at the centromeres of the two number 8 chromosomes is labeled with Texas Red, a red fluorescent dye. Ideally, all of the cells would be visible in the blue channel, dividing cells in the green channel as well, and two dots per cell, corresponding to the two number 8 chromosomes, would appear in the red channel. In Fig. 24, however, all cellular components appear in all channels due to overlapping sensitivity spectra of the image digitizer. The color spread matrix for the instrument that recorded this image appears in Table 3. It states, for example, that only 44% of a DAPI molecule's intensity
Figure 24. Three-color fluorescence microscopy of cells: (a) color; (b) red; (c) green; (d) blue components. See color insert.
is recorded in the blue channel, whereas 32% of it shows up in the green channel and 24% ends up in the red channel. The values in this matrix were determined experimentally from digitized images of cells stained with single fluorophores, and they are specific to this particular combination of dyes, camera, and optics. The relatively large degree of color smear evident in this example resulted from spectral overlap among the RGB band-pass filters built into the color camera. Less severe color smear could be obtained with carefully designed optics, but this example illustrates the technique of color compensation. The color compensation matrix C⁻¹ specifies what must be done to correct the color spread. Using the smear matrix in Table 3 and assuming that the background is zero, color compensation is given by

x = C⁻¹ y, where
x1 = 1.24R + 0.05G − 0.29B,
x2 = −0.45R + 1.69G − 0.24B,    (36)
x3 = −0.35R − 1.26G + 2.61B.

Thus, to correct the red channel image, at each pixel, one would take 124% of the gray level in the red channel image, add 5% of the green channel value, and subtract 29% of the blue. The second and third rows, likewise, specify how to correct the green and blue channel images. Figure 25 shows the result of color compensation applied to the image in Fig. 24. Here, the three differently labeled types of objects have been effectively isolated to the RGB component images. This makes image segmentation and measurement a much simpler task. Color compensation also increases the saturation of the displayed color image (Fig. 25a) because color spread tends to desaturate the image.

Table 3. Color Spread Matrix

           Texas Red    FITC    DAPI
Red          0.85       0.26    0.24
Green        0.05       0.65    0.32
Blue         0.10       0.09    0.44

Figure 24. (Continued)

COLOR IMAGE ENHANCEMENT

Contrast and Color Enhancement

When working with the RGB components of a tricolor digital image, one must be careful to avoid upsetting the color balance. Essentially all of the image processing techniques discussed in this volume, however, will produce very predictable results if applied to the intensity component of an image in HSI format. In many ways, the intensity component can be treated as a monochrome image. The color information, embedded in the hue and saturation components, will usually tag along without protest. Any transformations that alter the geometry of the image must be carried out in exactly the same way on all three components, whether these are in RGB or HSI format.
Saturation Enhancement One can make the colors in an image bolder by multiplying the saturation at each pixel by a constant greater than one. Likewise, a constant less than one reduces the colorfulness of the image. A nonlinear point operation can be used on the saturation image, as long as it is zero at the origin. Changing the saturation of pixels that have near-zero saturation upsets the color balance. Hue Alteration Because hue is an angle, one logical thing to do is to add a constant to the hue of each pixel. This shifts the color of each object up or down the rainbow. If the angle added or subtracted is only a few degrees, this process will ‘‘cool’’ or ‘‘warm’’ the color image, respectively. Larger angles will drastically alter its appearance. A general point operation, performed on the hue image, will exaggerate color differences between objects in portions of the spectrum where its slope is greater than one, and conversely. Because hue is an angle, operations processing that component must treat the hue scale as periodic, using modulo-2π arithmetic. For example, adding 20° to a hue of
Figure 25. The result of color compensation: (a) color; (b) red; (c) green; (d) blue components. See color insert.
350° yields a hue of 10°. One must sometimes add or subtract 360° to the result to constrain hue to the range [0°, 360°].

Color Image Restoration

One can apply the standard image restoration techniques to the R, G, and B images individually in a straightforward extension from monochrome to color. Some special considerations, however, apply to tricolor images.
If an image is being restored or enhanced for the sake of its appearance, one does well to note the strengths and weaknesses of the human eye. Detail, for example, is much more visible in intensity than in color. Blurring of edges, then, is much more disturbing if it affects intensity rather than hue or saturation. Similarly, graininess (random noise) of reasonable amplitude is more apparent in intensity than in color. Finally, the eye is much more sensitive to graininess in flat areas than in "busy" areas that contain high-contrast detail. This applies to both intensity and color (hue and saturation) noise. Taking the foregoing into account, we can construct a general outline for approaching a color image enhancement or restoration project.

1. Use a linear point operation to ensure that the RGB image fits properly within the gray scale (e.g., 0–255) and is balanced in color.
2. Convert to HSI format.
3. Use a low-pass filter or, better yet, a median filter on the hue and saturation components to reduce the random color noise within objects. Some blurring of edges in these images will not be noticeable in the final product, so this step might involve significant noise reduction. The filter used must preserve average gray level [i.e., MTF(0,0) = 1.0], and hue must be treated as an angle.
4. Use a spatially variant approach (e.g., linear combination filters) to restore the intensity image. This step sharpens edges and enhances detail while reducing graininess in flat areas. Again, MTF(0,0) = 1.0.
5. Use a linear point operation on the intensity component, as required, to ensure proper utilization of the gray scale.
6. Convert to RGB format, and display or print the image.

Figure 26 shows an example of image restoration. The top row shows the Lenna image after it has been degraded by blurring, hue shift, and random noise addition. To the left of the color image are hue, saturation, and intensity images. The bottom row shows the same thing after restoration. For the restoration, the hue at each pixel was reduced by 14° to effect color balance. Saturation was increased by 50% and subjected to a 3 × 3 median filter to reduce color noise. The intensity image was synthesized as a weighted sum of a low-pass- and a high-pass-filtered intensity image. The weighting was done by the smoothed output of a Sobel edge detector. This ensured that the high-pass-filtered image contributed most to areas that contain detail and the low-pass-filtered image filled in flat areas. Each pixel in the intensity image was multiplied by 1.3 to increase the contrast by 30%. When processing color components, the filtering operations must account for the periodicity of the hue scale. For example, the average of hues 10° and 50° is 30°, but the average of 10° and 330° is 350°. Here we add 360° to the first hue to place it (370°) nearer the second hue before averaging.
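As a sketch of the hue-averaging caveat in step 3, the following Python function averages hues by converting them to unit vectors first, an alternative to the add-360° bookkeeping described above; the sample values reproduce the 10°/50° and 10°/330° examples.

import numpy as np

def mean_hue(hues_deg):
    """Average hues on the color circle: convert each hue to a unit vector,
    average the vectors, and convert back, so 10 deg and 330 deg average to 350 deg."""
    radians = np.radians(np.asarray(hues_deg, float))
    avg = np.arctan2(np.sin(radians).mean(), np.cos(radians).mean())
    return np.degrees(avg) % 360.0

print(mean_hue([10.0, 50.0]))    # 30.0
print(mean_hue([10.0, 330.0]))   # 350.0, not 170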
Figure 26. Color image restoration example. Top row: degraded image, Bottom row: processed image. Columns: HSI components, color image. See color insert.
For a different approach to filtering the color components, one can convert from H, S (polar) coordinates to x, y (rectangular chromaticity) coordinates before filtering. Here, there is no periodicity, and the results will be quite different from filtering hue and saturation separately. For example, suppose we average two adjacent pixels where [H, S] = [0° , 1] and [180° ,1]. These are fully saturated red and cyan, respectively. If we average hue and saturation separately, we obtain [H, S] = [90° , 1], which is a vivid yellow-green. Instead, if we average the corresponding [x, y] = [1, 0] and [−1, 0], we obtain [x, y] = [0, 0], which is a neutral (gray) color. COLOR IMAGE SEGMENTATION Segmentation of a color image is basically a process of partitioning the (three-dimensional) color space into regions that correspond to the different types of objects and background. The different objects in the image often correspond to separate clusters of points in a threedimensional histogram defined in RGB or HSI space. The background upon which the objects reside will also produce one or more clusters. A three-feature (e.g., RGB or HSI) classifier design exercise can prove helpful in defining the surfaces that partition the space best. The hue and saturation of an object are normally dictated by the light absorbing or reflecting properties of the material of which it is composed, combined with the illumination spectrum. The intensity, however, is seriously affected by both illumination intensity and viewing angle. A shadow, for example, that falls across an object would normally affect the intensity of the pixels therein much more than it would affect their hue and saturation. Thus, it may be productive to segment the image in the
hue–saturation plane (i.e., on the color circle), completely ignoring intensity, rather than in three-dimensional color space. One may find, however, that the hue and saturation images are noisy and lack distinct edges. Some noise reduction, implemented by smoothing or median filtering the two chrominance channels, may prove helpful before segmentation. Figure 27 shows how the chromaticity (hue-saturation) plane can be used in image segmentation. Figure 27 has an arrangement of differently colored flowers at the upper left. At the upper right is a hue versus saturation histogram of that image. The hue axis has been shifted 120° to center the clusters of points in the display window. The differently colored objects in the scene produce separate clusters of points in the two-dimensional histogram. The highly saturated yellow flowers produce cluster 1, the red flowers create cluster 2, and the blue flowers are responsible for cluster 3. Cluster 4 is due to the basket, and the background produces the very dense cluster 5 at a hue of 260° and near-zero saturation. The white flowers are hopelessly confounded with the background cluster. Due to the hue shift mentioned, 120° (green), not 0° (red), appears at both ends of the horizontal scale. In the HSI components at the bottom of Fig. 27, the different flowers are largely constant in hue and saturation. One could the image segment either by thresholding the hue and saturation images or by treating the 2-D histogram as a probability density function and partitioning it into regions by standard pattern recognition techniques. A more ambitious approach, not necessarily destined for better performance, would be to partition the 3-D HSI color space in a similar fashion.
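A minimal Python sketch of this approach: build a hue–saturation histogram and apply a simple threshold rule. The bin counts, image sizes, and the 40–80° "yellow" range are arbitrary assumptions for illustration, not values taken from Fig. 27.

import numpy as np

def hue_saturation_histogram(hue_deg, saturation, hue_bins=36, sat_bins=10):
    """2-D chromaticity histogram used for segmentation: hue (degrees) vs. saturation."""
    hist, _, _ = np.histogram2d(hue_deg.ravel(), saturation.ravel(),
                                bins=[hue_bins, sat_bins],
                                range=[[0.0, 360.0], [0.0, 1.0]])
    return hist

# Hypothetical per-pixel hue and saturation images (e.g., from the HSI conversion above).
rng = np.random.default_rng(0)
hue = rng.uniform(0.0, 360.0, size=(128, 128))
sat = rng.uniform(0.0, 1.0, size=(128, 128))
hist = hue_saturation_histogram(hue, sat)

# A simple segmentation rule: label pixels whose chromaticity falls in a chosen region,
# e.g., highly saturated yellows between 40 and 80 degrees of hue.
mask = (hue >= 40.0) & (hue <= 80.0) & (sat > 0.5)
print(hist.shape, mask.sum())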
Figure 27. Color image segmentation example. Upper left: image; Upper right: hue vs. saturation chromaticity histogram; bottom row: HSI components. In the histogram the numbered clusters correspond to (1) yellow flowers, (2) red flowers, (3) blue flowers, (4) the basket, and (5) the background. See color insert.
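The thresholding approach just described can be sketched as follows. This is an illustrative outline only, assuming NumPy and SciPy are available and that the image has already been converted to hue (degrees) and saturation arrays; the hue interval and saturation cutoff are made-up values, not those used for Figure 27. The final loop anticipates the per-object measurements discussed in the next section.

```python
import numpy as np
from scipy.ndimage import median_filter, label

def segment_by_chromaticity(hue_deg, sat, hue_range, min_sat=0.4):
    """Threshold one cluster in the hue-saturation plane.

    hue_deg, sat : 2-D arrays of hue (degrees) and saturation.
    hue_range    : (low, high) hue interval, in degrees, for the target cluster.
    min_sat      : reject near-neutral pixels, whose hue is unreliable.
    """
    # Median-filter the noisy chrominance channels before thresholding.
    hue_s = median_filter(hue_deg, size=3)
    sat_s = median_filter(sat, size=3)

    lo, hi = hue_range
    if lo <= hi:
        in_hue = (hue_s >= lo) & (hue_s <= hi)
    else:                                 # interval wraps around 360 degrees
        in_hue = (hue_s >= lo) | (hue_s <= hi)
    mask = in_hue & (sat_s >= min_sat)

    # Label connected objects and report their mean chromaticity.
    # (A plain mean hue is adequate for intervals far from the 0/360 wrap.)
    labels, n = label(mask)
    for i in range(1, n + 1):
        obj = labels == i
        print("object %d: mean hue %.1f deg, mean saturation %.2f, %d pixels"
              % (i, hue_deg[obj].mean(), sat[obj].mean(), obj.sum()))
    return mask

# Example call (hue interval chosen only for illustration, e.g. yellowish objects):
# mask = segment_by_chromaticity(hue_deg, sat, hue_range=(40, 70))
```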
COLOR IMAGE MEASUREMENT

Once segmentation is complete, size and shape can be measured in the same way as they are for monochrome images. Now, however, brightness includes the added aspect of color. One can compute, for example, the average hue and average saturation of each object in addition to its average intensity. This corresponds to the location, in color space, of the center of the cluster of points due to pixels that fall inside the object. Once measured, then, the objects can be classified by standard techniques.

The texture features that have been defined for monochrome images can be generalized to color. The standard deviation of gray level inside the object, for example, is a measure of the amplitude of the textural pattern. Given a color image, one can compute this on the R, G, and B channel images separately, or on the H, S, and I images.

COLOR IMAGE DISPLAY

Pseudocolor

The term pseudocolor refers to a color image generated from a monochrome image by mapping each of the gray levels (located along the axis of the color cylinder) to a point elsewhere in color space (34). This is simply assigning a color to each gray level by some rule that can be stored in a lookup table. Thus, pseudocolor is basically a detail of the monochrome image display process rather than a color image processing technique unto itself. The attraction of pseudocolor stems from the fact that the human eye can reliably distinguish many more colors than it can differentiate shades of gray. Thus, although one might be able to discern only 40 or so distinct shades of gray in a monochrome display, many more shades might be visible when mapped to different colors.

Figure 28 shows an example of pseudocolor display. Figure 28a is a monochrome image used as the input. Figure 28b shows how the gray levels in the input image map to gray levels in the RGB components of the output image, which appears in Fig. 28c. This particular pseudocolor mapping encodes low intensity as blue, high intensity as red, and leaves green in the middle. Increasing the gray level simply cycles through the fully saturated colors of the rainbow. This pseudocolor mapping essentially converts gray level to hue.

A poorly chosen pseudocolor mapping can obscure, rather than enhance, the information in a displayed image. The mapping is usually more satisfactory if it employs some pattern, rather than simply a random assignment of colors to gray levels. Normally, the gray-scale axis maps to a continuous line that spirals through RGB or HSI color space. Mapping the black-and-white points to themselves is often useful as well. In general, the more conservative mappings are the more successful. Considerable visual discomfort can result from more aggressive pseudocolor mapping schemes.
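As an illustration of the lookup-table mechanism described above, the following minimal sketch builds a simple blue-to-green-to-red mapping similar in spirit to the one in Fig. 28. The particular ramps are arbitrary choices made for this sketch, not the published mapping.

```python
import numpy as np

def rainbow_lut(n_levels=256):
    """Build an n_levels x 3 RGB lookup table mapping gray level to hue:
    low intensities to blue, mid-scale to green, high intensities to red."""
    lut = np.zeros((n_levels, 3), dtype=np.uint8)
    x = np.linspace(0.0, 1.0, n_levels)
    # Piecewise-linear ramps giving fully saturated colors along the scale.
    lut[:, 0] = np.clip(255 * (2.0 * x - 1.0), 0, 255)               # red rises
    lut[:, 1] = np.clip(255 * (1.0 - 2.0 * np.abs(x - 0.5)), 0, 255)  # green peaks mid-scale
    lut[:, 2] = np.clip(255 * (1.0 - 2.0 * x), 0, 255)               # blue falls
    return lut

def apply_pseudocolor(gray_image, lut):
    """Map an 8-bit monochrome image through the lookup table."""
    return lut[gray_image]          # fancy indexing: (H, W) -> (H, W, 3)

# Example: a horizontal gray ramp rendered in pseudocolor.
ramp = np.tile(np.arange(256, dtype=np.uint8), (32, 1))
rgb = apply_pseudocolor(ramp, rainbow_lut())
```

Because the mapping lives entirely in the table, any other rule (including one that maps the black and white points to themselves) can be substituted without touching the display code.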
Figure 28. Pseudocolor image display example. (a) monochrome input image. (b) RGB gray-scale mapping. (c) output image. See color insert.
The usefulness of pseudocolor can be illustrated by a microscope image digitizing application developed by Donald Winkler at the NASA Johnson Space Center. A real-time pseudocolor display was used to assist operators in setting the light level before digitizing images in a microscope. The entire gray scale was mapped onto itself, except for level 255, which was mapped to a brilliant, saturated red. The operators' instructions were to increase the lamp current until red appeared and then to decrease it slowly until the red went away. This visual aid proved very useful in practice.

CONCLUSION

A wealth of color and vision data is available on the World Wide Web for those who wish to pursue further study in this area. Worthwhile starting points are (35–37).

ABBREVIATIONS AND ACRONYMS

SMPTE   Society of Motion Picture and Television Engineers
CIE     Commission Internationale de l'Eclairage
NTSC    National Television System Committee
ITU     International Telecommunication Union
CRT     Cathode Ray Tube
RGB     red, green, blue
PIXEL   Picture Element
SPD     spectral power density
HSI     hue, saturation, intensity
IHS     intensity, hue, saturation
FITC    a green fluorescent dye
DAPI    a blue fluorescent dye

BIBLIOGRAPHY

1. H. J. A. Dartnall, J. K. Bowmaker, and J. D. Mollon, Proc. R. Soc. London B220, 115–130 (1983).
2. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, 2nd ed., Wiley, New York, 1982.
3. G. Wald, Science 101(2635), 653–658 (1945).
4. G. Wald, Science 145(3636), 1007–1017 (1964).
5. B. H. Crawford, Proc. Phys. Soc. B62, 321–334 (1949).
6. CIE Proceedings, Bureau Central de la CIE, Paris, 1951, Vol. 1, Sec. 4; Vol. 3, p. 37.
7. W. Niblack, An Introduction to Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1985.
8. CIE, Commission Internationale de l'Eclairage Proceedings, 1931, Cambridge University Press, Cambridge, 1932.
9. ITU-R Recommendation BT.601, Studio Encoding Parameters of Digital Television for Standard 4:3 and Wide-Screen 16:9 Aspect Ratios, International Telecommunication Union, Geneva, 1982.
10. ITU-R Recommendation BT.709, Basic Parameter Values for the HDTV Standard for the Studio and for International Programme Exchange, International Telecommunication Union, Geneva, 1990.
11. H. Grassmann, Philos. Mag. 4(7), (1854).
12. J. Guild, Philos. Trans. R. Soc. London A230, 149–187 (1931).
13. W. D. Wright, Trans. Opt. Soc. 30, 141–164 (1928).
14. W. K. Pratt, Digital Image Processing, Wiley, New York, 1991.
15. R. W. G. Hunt, The Reproduction of Colour, Wiley, New York, 1957.
16. W. S. Stiles and J. M. Burch, Optica Acta 6, 1–26 (1959).
17. F. J. Bingley, in D. G. Fink, ed., Television Engineering Handbook, McGraw-Hill, New York, 1957.
18. The Science of Color, Crowell, New York, 1953.
19. SMPTE RP 177-1993, Derivation of Basic Television Color Equations, Society of Motion Picture and Television Engineers, White Plains, NY, 1993.
20. P. S. Carnt and G. B. Townsend, Color Television, Vol. 2: PAL, SECAM and Other Systems, Iliffe, London, 1969.
21. D. L. MacAdam, J. Opt. Soc. Am. 27, 294–299 (1937).
22. G. Wyszecki, J. Opt. Soc. Am. 53, 1318–1319 (1963).
23. CIE Colorimetry Committee, Proposal for Study of Color Spaces, Tech. Note, J. Opt. Soc. Am. 64, 896–897 (1974).
24. A. H. Munsell, A Color Notation, 8th ed., Munsell Color Company, Boston, 1939.
25. A. H. Munsell, A Grammar of Color, Van Nostrand-Reinhold, New York, 1969.
26. Colorimetry, 2nd ed., Pub. No. 15.2, Commission Internationale de l'Eclairage, Vienna, Austria, 1986.
27. D. L. MacAdam, Color Measurement, Springer-Verlag, Berlin, 1985.
28. R. S. Ledley, M. Buas, and T. J. Golab, Proc. 10th Int. Conf. Pattern Recognition, IEEE Comput. Soc. Press, Los Alamitos, CA, 1, 791–795 (1990).
29. K. R. Castleman, Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1996.
30. R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1992.
31. A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.
32. K. R. Castleman, Bioimaging 1(3), 159–165 (1993).
33. K. R. Castleman, Bioimaging 2(3), 160–162 (1994).
34. J. J. Sheppard Jr., R. H. Stratton, and C. Gazley Jr., Arch. Am. Acad. Optometry 46, 735–754 (1969).
35. A. Stockman and L. T. Sharpe, Color and Vision Research Laboratories, 1999.
36. C. Poynton, Color FAQs, 1999.
37. D. Bruton, Color Science, 1999.
COLOR PHOTOGRAPHY
JON A. KAPECKI
JAMES RODGERS
Eastman Kodak Company, Rochester, NY

INTRODUCTION

Color photography is a technology by which the visual appearance of a three-dimensional subject may be reproduced on a two-dimensional surface, pleasingly balanced in brightness, hue, and color saturation. There are two essentials in the practice of color photography: the camera and the light-sensitive sensor. The task of the camera is to present an undistorted image to the plane of the sensor that has an intensity level and exposure time appropriate to the sensitivity of the sensor being used. The sensor can be either silver halide based photographic film or an electronic sensor such as a charge-coupled device (CCD). A digital record of the scene may also be obtained by scanning the processed film. The technology of silver halide color photography is discussed herein. The physical record of the image is expected to have high permanence. The image may be viewed directly as a reflection color print, by projection as a color slide, or by back-illumination as a display transparency. A digital image may also be viewed on a video monitor. References 1–20 are general sources of information on the science and technologies supporting color photography (see also PHOTOGRAPHY; COLOR PHOTOGRAPHY, INSTANT PHOTOGRAPHY). A detailed history of color photography may be found in Refs. 1 and 2.

COLOR VISION AND THREE-COLOR PHOTOGRAPHY

Although color has been used for graphic purposes since prehistoric times, it was Isaac Newton's discovery of the solar spectrum in 1666 that led to an understanding of color with regard to the properties of light. From the observation that white light is not a pure entity, but consists of a mixture of colors, the idea developed that any color could be obtained from a mixture of three primary colors. A wide range of colors could be obtained by mixing red, green, and blue lights. At the start of the nineteenth century, the Young–Helmholtz theory of color vision proposed that there were three types of receptors in the human retina; each responds over a certain wavelength range in proportion to the amount of light absorbed. Because the concept of color is meaningless in the absence of an observer, the physiological basis of color is used as a starting point for trichromatic color reproduction.

Human vision is sensitive to electromagnetic radiation in the 400–700 nm range. Figure 1 is an illustration of the colors perceived over this wavelength range, together with the representative spectral sensitivities of human retinal receptors (21). There are no abrupt transitions between the spectral colors, as indicated by the vertical lines, but rather a gradual merging.

Figure 1. Representative spectral sensitivities of the human retinal receptors, (- - - -) scotopic (rod) vision, and β, γ, and ρ cone sensitivities. The wavelengths of the perceived colors are also shown. Reproduced with permission (20).

The human eye contains two main types of light-sensitive cells: rods and cones. These names come from the shape of the outer segment of the cell. The rods are by far the most numerous; they outnumber the cones by a factor of nearly 20 and operate as the main sensors at low illumination levels. Scotopic (rod) vision as shown in Fig. 1 centers at about 510 nm and extends from 400–600 nm. Because there is only one type of rod, it lacks the ability to discriminate colors, and its electrical output varies as the integrated response over this wavelength range. Cones are the primary sensors of color vision, and there is strong evidence that there are three types, shown as ρ, γ, and β. Over most of the visible range, any single wavelength stimulates more than one receptor type. For example, the
peak sensitivities of the ρ and γ receptors at 580 nm and 540 nm, respectively, are quite close together. The ability to distinguish small differences in spectral wavelength is believed to rely on the difference signals between the responses of the three cone detectors. For example, stimulation of the ρ and γ receptors by monochromatic light of 570 nm has no difference signal and thus is interpreted by the brain as yellow light. However, stimulation of the γ receptor is increased by monochromatic light of 550 nm, that of the ρ receptor is decreased, and the difference signal between the two is interpreted by the brain as greenish light. Most colors consist of many wavelengths of varying intensities; these signals are additive for each receptor.

This understanding of human color vision makes a definition of the characteristics of a perfect color reproduction system possible. For simplicity, assume that a reproduction of any uniform patch of color is desired, such that the original and the reproduction are indistinguishable under any illuminant. The first requirement is selecting three sensors that match the cone sensitivity curves shown in Fig. 1. The second requirement is for these sensors to record faithfully the intensity or brightness of the stimulus. The third requirement is that in viewing the reproduction, the physical record of each sensor's response must be capable of stimulating only the cone receptor it is designed to mimic. This last requirement cannot be met in practice because even monochromatic light over most of the visible range stimulates more than one receptor type. Thus, the construction in color vision that permits exquisite wavelength discrimination also makes perfect color reproduction using three-color mixing impossible. With this limitation, the design challenge of color photography is to reduce or correct for unwanted stimulations as much as possible.

In 1861, James Clerk Maxwell demonstrated that colored objects could be reproduced by photography, using three-color mixing (1). He produced three separate negative images in silver on glass plates, ostensibly taken through red, green, and blue filters. After the three negative plates were converted to positive plates, the positives were loaded into three separate projectors, each of which had the corresponding exposure filter in front of its lens, and the images were brought into registration on a white screen. A tolerable color reproduction was obtained. Although considered a landmark, the success of the experiment was fortuitous, because the silver halide crystals used as the light-sensitive element were sensitive only to ultraviolet and blue light. It was subsequently shown (22) that the green exposure resulted from the sensitivity of the silver halide tailing into the blue-green region of the spectrum and that the red exposure resulted from ultraviolet radiation passed by the red filter employed. Thus, the ability to sensitize silver halide crystals to green and red light, in addition to the inherent blue sensitivity, is critical. Intrinsic sensitivity depends on the halide chosen: useful sensitivity does not extend beyond 410 nm for AgCl and 490 nm for AgBr. It is axiomatic that only absorbed radiation can produce chemical action (Grotthuss–Draper law), in this case, the
promotion of an electron from the valence to the conduction band of silver halide. It was discovered in 1873 that when certain organic dyes are adsorbed on silver halide crystals, they induce sensitivity to the longer wavelengths of the visible spectrum. In the course of testing photographic plates that contained a yellow dye to prevent backreflection of light (halation), some sensitivity to green light was observed. Many thousands of dyes have been tested as spectral sensitizers, and materials exist for selectively sensitizing of any region of the visible spectrum and also the near infrared. Besides permitting better tone reproduction in black-and-white photography, this technology is the key to providing the red, green, and blue discrimination required in modern color photography. Maxwell’s demonstration was an example of an additive trichromatic process: the final image was produced by combining red, green, and blue lights in registration, and each light corresponded to its amount in the original scene. The various colors produced by the additive combination of red, green, and blue lights are shown in Fig. 2. To simplify the discussion, blue light is considered the 400–500 nm range of the spectrum, green the 500–600 nm range, and red the 600–700 nm range. White light is produced when all three colors are combined in equal amounts. The twoway combinations of red, green, and blue light produce three colors; each lacks one-third of the spectrum. The color cyan is generated by combining blue and green lights and lacks light in the red part of the spectrum. Similarly, the color magenta lacks light in the green part of the spectrum, and the color yellow lacks light in the blue part of the spectrum. For this reason cyan, magenta, and yellow are called subtractive primaries. Early experimenters focused on the problem of converting the three separation positives produced by Maxwell’s method into a form that could be printed with colored inks on a paper surface. The first clear statements of the basic principles that underlie modern three-color photography
Figure 2. Addition mixing of red, green, and blue lights. See color insert.
Figure 3. Superimposed subtractive dyes. See color insert.
are credited to two independent workers: du Hauron, who applied for a French patent in 1868, and Cros, who wrote an article in Les Mondes in 1869. Both noted that printing colors should be complementary or "antichromatic" to the red, green, and blue taking filters. The complementary colors are the subtractive primaries. Cyan is complementary to red, magenta is complementary to green, and yellow is complementary to blue. Each subtractive primary dye absorbs the color to which it is complementary. Figure 3 illustrates the superposition of equal amounts of the three subtractive primaries as dyes. The three possible two-way combinations yield red, green, and blue. The three-way combination yields black. Thus, the three subtractive primary dyes used singly or together can create a wide variety of colors by selectively modulating the red, green, and blue components in white light.

The Light-Recording Element

The primary element for light capture in color photography is the silver halide crystal. By common usage, the term emulsion has come to denote in the photographic literature what is actually a dispersion of silver halide crystals (grains) in gelatin. Four discrete steps can be identified in forming a colored image: (1) light absorption by the crystal; (2) the solid-state processes that lead to the formation of a latent image; (3) reduction of all or part of a crystal bearing a latent image to metallic silver by a mild reducing agent; and (4) use of the by-products of silver reduction to create a colored image in register by dye formation, dye destruction, or dye transfer.

Sensitivity, or photographic speed, is one of the most important attributes of the light-sensitive element. Practical color photography using a handheld camera is possible in conditions from bright sunlight to night street lighting. These conditions span a factor of about 10⁵ in illuminance and must be accommodated by the combination of camera shutter speed, lens aperture, and choice of film speed. Most
commercial emulsions contain a population of silver halide crystals that vary widely in size and shape. Although the structure of the AgBr and AgCl lattice is simple face-centered cubic, an enormous variety of crystal shapes can be obtained, depending on the number and orientation of twin planes in the crystal, the silver ion concentration during growth, and the presence of growth modifiers (23,24). To add to this complexity, the crystals in commercial emulsions usually contain mixed-halide phases. Films suitable for a handheld camera generally contain silver bromoiodide, in which iodide ions are incorporated into the AgBr lattice during crystal growth.

The intrinsic sensitivity of silver halide is enhanced during manufacture by heat treatment, usually in the presence of tiny amounts of sulfur and gold compounds, a process known as chemical sensitization. This creates subdevelopable chemical sensitization centers of mixed silver/gold sulfides on the grain surface. The gold incorporated into the latent image formed during exposure enhances the developability of the silver halide crystal and effectively increases its light sensitivity, without affecting the wavelength dependence of sensitivity. The key to efficient chemical sensitization is to trap the photoelectron effectively at a latent-image site while minimizing loss processes such as electron–hole recombination. The important subject of chemical sensitization is dealt with more fully in PHOTOGRAPHIC COLOR DISPLAY TECHNOLOGY.

Sensitizing Dyes

Sensitizing dyes (25,26) are essential to color photography. Although the initial discovery was made in 1873, it was not until the 1930s that photographic scientists began to develop a systematic series of dyes that adsorbed on the silver halide crystal and then transferred to it the energy of green and red light necessary to create a latent image. The most widely used of these materials are the cyanine dyes, which are heterocyclic moieties linked by a conjugated chain of atoms. More than 20,000 dyes of this class have been synthesized. In the example shown in Fig. 4, when n = 0, the dye is yellow and provides sensitization to blue light. Dyes that sensitize in the blue are particularly important for AgCl emulsions, which lack intrinsic blue sensitivity. When n = 1, the dye, a carbocyanine, is magenta and absorbs green light. For n = 2, a dicarbocyanine, the dye appears cyan and sensitizes silver halide to red light. Extending the conjugation further produces dyes that absorb in the infrared and produce films useful for aerial photography and thermal analysis.

There are several requirements for a good sensitizing dye. A good dye is adsorbed strongly on silver halide. The dye molecules attach themselves to the surface of the silver halide crystals, usually up to monolayer coverage. This amount can be determined by measuring the adsorption isotherm for the dye. Dye in excess of this amount can cause the emulsion to lose sensitivity, so-called dye-induced desensitization. In color films, simple positively charged cyanine dyes can also dissolve in the organic phase used to solubilize the image-forming coupler and lead to a phenomenon known as unsensitization. This can be overcome by adding a negatively charged acidic group, such as sulfonate, to turn the dye into a zwitterion that is
Figure 4. (a) Cyanine sensitizing dye structure. (b) Sensitivity curves: (- - - -) intrinsic silver halide; ( — ) the dye in (a), where A corresponds to n = 0; B to n = 1; C to n = 2; and D to n = 3.
insoluble in organic media. Adsorption is also influenced by the composition and crystallographic surfaces of the silver halide crystal and by the procedure of dye addition. The optical properties of the dye change when it is attached to silver halide. Typically, the light absorption peak shifts by 30–50 nm to longer wavelengths. Frequently, new absorption peaks occur as a result of different stacking arrangements of the dyes on the crystal surface.

A good sensitizing dye absorbs light of the desired wavelength range very efficiently. The absorption spectrum of the sensitizing dye depends on a number of factors. Each heterocyclic nucleus has a characteristic color value associated with it that can be modified by chemical substitution. The chromophore length is another important variable. A useful empirical rule is that for each increment in n, the absorption peak of the dye shifts to longer wavelengths by 100 nm.

A good sensitizing dye sensitizes very efficiently. A modern sensitizing dye transfers an electron from the dye's excited singlet state to the conduction band of silver halide, which can subsequently participate in latent-image formation. The dye's ability to serve as an efficient sensitizer is often proportional to the electrochemical reduction potential of the dye measured in solution. This has proved to be a useful tool in designing spectral sensitizers. A practical measure of dye efficiency is the relative quantum efficiency of sensitization (27), defined as the ratio of the number of quanta absorbed in the intrinsic region, usually 400 nm, of the silver halide to the number of quanta absorbed only by dye at a wavelength within its absorption band, both to produce the same specified developed density. For the best dyes this value is only slightly less than 1.0.

A good sensitizing dye does not interfere with other system properties. Sensitizing dyes can sometimes influence the intrinsic response of a chemically sensitized emulsion and lead to desensitization or additional sensitization. The dye can also interfere with the
development rate, increase or decrease unwanted fog density, and remain as an unwanted stain in the film after processing. The dye should have adequate solubility for addition to the emulsion but should not wander between layers of the final coating.

The relationship between crystal size and photographic speed can be understood by using simple geometric arguments. The sensitivity of an individual crystal is defined as the reciprocal of the minimum light absorption required to generate a developable latent image. For a silver halide crystal without sensitizing dye, blue light absorption is proportional to volume. If it is assumed that the crystal is a sphere and that the latent image can be formed with equal efficiency in all grain sizes, the relationship shown in Fig. 5 is obtained. The adsorption of sensitizing dyes is necessary to confer sensitivity on the green and red regions of the spectrum; this is frequently called "minus-blue" speed. To a first approximation, minus-blue speed depends on the surface area available for dye adsorption. Again, assuming sphericity, the line shown in Fig. 5 is the expected change of minus-blue speed with crystal size. Even for the highest speed films, the crystals do not usually exceed 3 µm in linear dimension.

Crystal Morphology

Over the years, obtaining greater uniformity in silver halide crystal size and habit in the grain population has been emphasized, in the belief that the chemical sensitization process can then yield a higher average imaging efficiency. One way of doing this is to adjust the nucleation conditions so that untwinned crystals are favored, and then to ensure that no new crystals are formed during the growth of the starting population. Crystals that contain twin planes grow anisotropically, and it is more difficult to obtain a uniform population.
Figure 5. Calculated relationship between log (relative sensitivity) and log (crystal volume) for ( — ) intrinsic response (blue) and (- - - -) dyed response (minus blue), assuming crystals are spherical.
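The geometric argument behind Fig. 5 can be reproduced numerically. The sketch below (not from the original article) assumes spherical grains and arbitrary proportionality constants, so only the slopes of the two log–log lines are meaningful: blue speed tracks crystal volume (slope 1), while minus-blue speed tracks surface area (slope 2/3 against volume).

```python
import numpy as np

# Equivalent-sphere diameters from 0.2 to 3 micrometers (the practical upper limit).
d = np.linspace(0.2, 3.0, 15)                 # micrometers
volume = (np.pi / 6.0) * d**3                 # proportional to blue-light absorption
surface = np.pi * d**2                        # proportional to adsorbed dye, hence minus-blue speed

# Relative speeds, normalized to the smallest grain (arbitrary units).
blue_speed = volume / volume[0]
minus_blue_speed = surface / surface[0]

# Slopes of log(speed) versus log(volume): 1 for blue, 2/3 for minus blue.
slope_blue = np.polyfit(np.log10(volume), np.log10(blue_speed), 1)[0]
slope_minus = np.polyfit(np.log10(volume), np.log10(minus_blue_speed), 1)[0]
print("log-log slope, blue: %.2f   minus-blue: %.2f" % (slope_blue, slope_minus))
```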
Commercial materials are available that contain cubic and octahedral crystals of narrow size dispersity. Along with better control of the crystal size and shape in the population, the placement of iodide in the AgBr crystal has received a great deal of attention. Iodide incorporation increases blue light absorption, and its selective placement within the crystal allows controlling the rate of the development reaction, because iodide also acts as a development inhibitor. In color negative films, the color image is formed by reducing less than one-quarter of the total silver available to metallic silver during development. This strategy is necessary so that the crystals are large enough to have the required sensitivity, yet each crystal by its partial reduction contributes only a small amount of dye to the final image and leads to an image of low graininess. An approach has been devised (28) to break out of the surface-to-volume relationship imposed by crystal shapes that are nearly spherical. Conditions have been
established to favor the growth of crystals that have multiple parallel twin planes (29), and emulsions that contain mostly hexagonal tabular crystals, such as those shown in Fig. 6b, can be prepared. Figure 6 compares the crystals of a newer emulsion to a more traditional one. Tabular thicknesses of about 0.1 µm are commonly employed. By adjusting the projective area and thickness of these tabular crystals, it is possible to create a series of emulsions in which the surface area and hence minus-blue speed increases, but the crystal volume and hence blue speed remain constant. Because an emulsion that is dyed to be sensitive to green or red light still retains its intrinsic blue sensitivity, the greater minus-blue/blue sensitivity ratio afforded by tabular crystals reduces unwanted blue light sensitivity in those layers of the film designed to record green and red light. Another advantage cited for this technology is improved optical transmission properties of layers that contain tabular emulsions. Because the crystals align themselves nearly parallel to the film support during the coating and drying operations, the transmitted light is less scattered sideways. This is advantageous for image sharpness in multilayer color films.

Figure 6. Emulsions: (a) traditional crystals and (b) hexagonal tabular crystals (scale bars, 1 µm).

Color Processes

Additive Mixing
The first commercially successful color photographic systems used additive color mixing. Simultaneous recording of red, green, and blue components of an image was achieved in the chromoscope camera in 1898. The image beam was split into three by semitransparent mirrors and prisms; the three beams passed through red, green, and blue filters before striking the emulsion surface. One of the most serious practical difficulties encountered was nonuniformity of intensities at the film plane as a result of the optics of semireflecting mirrors. If different colors are presented rapidly enough to the eye, they are additively fused by the visual system. This principle has been applied to the generation of moving color images, where successive frames contain the red, green, and blue records of the scene. Photography and projection are accomplished by moving the appropriate colored filters synchronously with the image frames. This method places great mechanical demands on apparatus, however, and requires much higher than the normal projection rate of 24 frames per second. The results are often unsatisfactory because of color fringing of moving objects and flicker from the different transmission characteristics of the filters used. Mixing additive primaries as dyes by superposition does not work because each primary absorbs two-thirds of the spectrum. The two difficulties of registration and additive superposition were overcome by what is known historically as the screen-plate process. The additive primaries are presented to the eye as a mosaic of very small colored dots in juxtaposition, as in pointillist painting. Although close inspection reveals a pattern of colored dots, additive blending occurs by increasing the viewing distance. The retina of the eye is itself a random mosaic of red, green, and blue receptors, and if the dot pattern is fine enough,
the eye interprets the image as smooth. The photographic record is obtained by exposing a silver halide emulsion, which is sensitized throughout the visible spectrum, that is, it is panchromatic, through a mosaic of very small red, green, and blue filters. The film is then processed to give a positive image in silver. When viewed or projected in registration with the original mosaic, a colored image is created. The amount of light transmitted through each filter is controlled by the developed silver optical density. The first commercially successful screen-plate process was the Autochrome Plate made by the Lumiere Co. in France in 1907. The mosaic of filters was integral to the photographic plate, it consisted of starch granules about 15 µm in diameter, which were dyed red, green, and blue. The individually dyed granules were mixed and pressed onto the plate so that their edges touched. The spatial distribution of red, green, and blue granules was random. Any gaps in the mosaic were filled by a carbon paste. This filter screen was then overcoated with the emulsion. Exposure was made through the reverse side so that the exposing light passed through the filters first. The final image was intended for direct viewing or projection. The relative surface areas of the three primary colors were such as to give a satisfactory neutral with the viewing illuminant. The Autochrome process survived commercially until the mid-1930s. Mottle sometimes appeared in the Autochrome image because of clumping of the starch granules. A regular grid of filters gave more satisfactory results, as in Dufaycolor (1908), which employed a very fine square grid of filter elements. The lenticular method (30) was also commercially successful. The reverse side of a panchromatic blackand-white film was embossed with a very fine (about 25 per mm) pattern of cylindrical lenses or lenticules. The camera had a red filter over the top third of its lens, a green filter over the middle third, and a blue filter over the bottom third. During exposure, the lenticules were parallel to the camera filters. Light from any tiny area of the subject is focused onto the lenticular surface, which faces the lens. The lenticule then focuses a tiny image of the lens aperture with its three filters onto the panchromatic emulsion coated on the reverse side of the film. The relative intensities coming through each filter depend on the color of that tiny area of the subject. This process occurs for every point of the image, resulting in three horizontal bands for each lenticule. In this system, the variables to be optimized were line spacing, the thickness of the support, and the curvature of the lenticules. When the film is given a black-and-white reversal process and the optical path is reversed in a projector where the lens has the same arrangement of filters as the camera, a colored image is obtained. This system was introduced in 1928 by Eastman Kodak Company as Kodacolor and was available until 1935 as a 16-mm motion picture film. At present, the additive process is used in color television, in which light emitted from a tiny regular mosaic of red, green, and blue phosphors blends to give the colored image. Another modern additive color system is Polaroid’s Polachrome 35-mm transparency film that consists of a positive silver image overlying an additive screen that has 394 triplets of red, green,
and blue lines per centimeter of film. However, because additive photographic systems inherently waste light (each additive filter absorbs two-thirds of the light energy), most modern systems rely on the subtractive primaries.

Subtractive Mixing

There are mechanical difficulties in separating a photographic image into three images to record red, green, and blue information, only to recombine them later. Perfect registration of the color information can be preserved in a film if the three-color records are stacked on top of each other on the support. This film structure is known as an integral tri-pack. Photographic systems have been designed in which the three subtractive primary dyes (cyan, magenta, and yellow) can be formed in register, destroyed in register, or transferred in register to create the full-color image.

Dye destruction technology, Silver Dye-Bleach, is used in the Cibachrome process. After a black-and-white development step, the film is subjected to an acidic dye-bleach solution that destroys the incorporated azo dyes in proportion to the amount of developed silver and leaves a residual positive color image. Because the presence of light-absorbing dyes during exposure severely limits the photographic speed of these materials, they are used to make display transparencies and prints, usually from a camera transparency original. The azo dyes used in this process offer very good light and dark stability.

The first instant color photography system, introduced by the Polaroid Corp. in 1963 as Polacolor, used the transfer of subtractive dyes to a receiver sheet to produce a positive image. The incorporated dye-developers that contain a hydroquinone moiety are soluble in the alkaline activator solution, except where silver development occurs, and then they are immobilized as the quinone form. Another dye diffusion method is the dye transfer system in which three-color separation negatives are prepared from an original positive color transparency. These are printed onto a special matrix film, which is processed in a tanning developer and washed in hot water to remove unhardened gelatin, giving three positive gelatin relief images. The depth of the gelatin relief is inversely proportional to the original camera exposure received. The corresponding subtractive primary dye is imbibed into each matrix and then transferred in register to a receiver sheet. Technicolor motion picture prints have been made by this process, which is used when exceptional color quality and dye stability are demanded.

The first commercially successful film using the in situ formation of three subtractive dyes was Kodachrome film, introduced in 1935. The film has a multilayer structure in which red-sensitive, green-sensitive, and blue-sensitive emulsions are successively coated on the same support. Because the compounds necessary for dye formation are not incorporated in the film, an elaborate process is required to produce each dye in its correct layer. The first step is conventional black-and-white development to give a silver image. In the first commercial Kodachrome process, the silver image was removed (bleached) to leave a reversal image in residual silver halide. As the initial step in color image formation, cyan dye was formed in all layers
by reducing this (residual) silver halide. The film was then dried and subjected to a slowly penetrating bleach solution that decolorized the cyan dye and oxidized the silver in the blue- and green-sensitive layers. The process was repeated for magenta dye formation in the green- and bluesensitive layers. The film was again dried, and there was subsequent selective bleaching of the magenta image and silver oxidation in the blue-sensitive layer. A final colorforming step generated yellow dye in the blue-sensitive layer. Removal of all image silver then left a positive three-color image. The modern Kodachrome process relies on selective layer exposure and dye formation steps, again using silver reduction to drive the dye formation reactions. Subtractive dye precursors (couplers) that could be immobilized in each of the silver containing layers were sought, so that dye formation in all layers could proceed simultaneously rather than successively. The first of these to be commercialized were in Agfacolor Neue and Ansco Color films, introduced soon after Kodachrome film. These reversal working films contained colorless couplers that were immobilized (ballasted) by attaching long paraffinic
chains. The addition of sulfonic or carboxylic acid groups provided the necessary hydrophilicity to make them dispersible as micelles in aqueous gelatin. A different approach was taken in Kodacolor film, introduced by Eastman Kodak Company in 1942. The couplers were ballasted but, instead of having hydrophilic functional groups, were dissolved in a sparingly watersoluble oily solvent. This oily phase was then dispersed by high agitation into a gelatin solution as fine droplets less than one micrometer in diameter. Kodacolor film is negative working and was designed to be printed onto a companion color paper, which because it is also negative working, produces a positive color print. The whole system is known as the negative–positive process. Figure 7 illustrates two ways in which the integral tripack can be processed. One leads directly to a positive image; the other leads to a negative image that can be subsequently printed on a negative-working paper. In the positive-working or ‘‘reversal’’ process used in slide films, the first step is black-and-white development to yield a negative silver image. After light or chemical fogging
Figure 7. Positive and negative-working integral tri-packs. See color insert.
of the unexposed silver halide, subsequent development is carried out in a color developer that simultaneously reduces the unreacted silver halide and generates dye. Removal of all of the silver leaves the positive color image. The negative-working process uses the initial silver development reaction to drive color formation. Because the camera negative is not the final image, the system tolerates underexposures and overexposures by as much as two stops (a factor of 4), and density differences in the negative can be allowed for in the printing stage. The key process steps for color negative are develop/bleach/fix, which are carried out under carefully controlled temperature and agitation. Water washes follow the bleach and fix steps. The negative–positive system enjoys great commercial success. In 1949, color purity was improved by introducing colored masking couplers to the camera negative film, which partially correct for unwanted absorption of the image dyes (31). Masking couplers account for the yelloworange color seen in the unexposed parts of modern color negative films after processing. A later improvement was the introduction of development-inhibitor-releasing (DIR) couplers, in which a silver development inhibitor released as a function of exposure in one layer can influence the degree of development in adjacent layers (32). Using masking couplers and DIR couplers in concert substantially improves the quality of color reproduction in the final print. The principal features of an integral tri-pack are shown in Fig. 8. The color records are stacked in the order shown, the red record is on the bottom, the green record next, and the blue record on the top. The blue record is on the top because it is necessary to interpose a blue light filter to remove blue light, which would otherwise form latent images in the underlying red and green records. Silver halides, except for AgCl, have an intrinsic blue light sensitivity even when spectrally sensitized to green or red. The traditional filter material has been Carey Lea silver, a finely dispersed colloidal form of metallic silver, which removes most of the blue light at wavelengths
Figure 8. Features of an integral tri-pack. From top to bottom: overcoat; UV filter; fast and slow yellow layers (blue light record); blue light filter; fast and slow magenta layers (green light record); interlayer; fast and slow cyan layers (red light record); antihalation layer; support.
Each color record is separated from its neighbor by a gelatin interlayer to prevent silver development in one record causing unwanted dye formation in another. It is also common practice to divide each color record into two or more separate layers; the most sensitive emulsion of each record is on the top. Optical screening of the underlying less sensitive layer allows a wider exposure latitude. A UV-filter layer is placed on top of the pack. Because silver halide is sensitive to ultraviolet light, this layer is designed to minimize transmission of wavelengths
from the crystal are reduced to metallic silver, halide ions depart into solution to maintain electrical neutrality. The developing agent of choice for modern color photographic systems is an N,N-disubstituted p-phenylenediamine (PPD) (34,36). In the initial heterogeneous reaction with silver halide, the PPD is oxidized by losing one electron to form the semiquinone (Eq. 1). Following desorption, two molecules of the semiquinone undergo a fast dismutation reaction to produce the charged quinonediimine (QDI) and a molecule of the original reduced developer for an overall two-electron oxidation (Eq. 2). Alternatively, the adsorbed semiquinone can give up a second electron to the silver halide crystal to produce the QDI (37). This species is the active dye-forming agent (37,38). The overall development reaction produces protons, so the process is favored by an alkaline environment, and the developer solutions are often maintained around pH 10 by a carbonate buffer.
PPD + Ag+ → semiquinone + Ag0   (1)

2 semiquinone → QDI + PPD (dismutation)   (2)

(Equations 1 and 2 appear as full structural formulas in the original.)
Figure 9. Corrosion model of silver development. As the halide ion X− is removed into solution at the etch pit, the silver ion Ag+ travels interstitially, Agi + , to the site of the latent image where it is converted to silver metal by reaction with the color developer, Dev. Devox represents oxidized developer.
In most color photographic systems, development is the rate-determining step, and within that step the formation of semiquinone is the slow process (39). The fate of the highly reactive QDI is determined by the relative rates of a number of competing processes (40). The desired outcome is reaction with ionized coupler to produce dye (Eq. 3). Typically, the second-order rate constant for this process with ionized coupler is about 10³ to 10⁴ L/(mol·s) (41,42). QDI is also attacked by hydroxide ion (Eq. 4) to produce a quinone monoimine (QMI), itself an oxidized developer derived from p-aminophenol. Such compounds can further react with coupler, albeit at a slower rate than QDI, to form a dye and were cited in the seminal patent as color developers (34). However, the dyes derived from this deaminated developer have hues different from the QDI dyes, and these hues are pH-dependent as a consequence of the phenolic group contributed by the developer. Although the deamination reaction to produce QMI is fast (the rate constant is 10³ to 10⁴ L/(mol·s) (42–44)), its effect is somewhat offset by the redox reaction of the QMI with the reduced developer, present in large excess, to regenerate the desired QDI. The primary net effect of the deamination reaction is to enlarge the resulting dye cloud (45).

A nucleophile even more potent toward oxidized developer is the sulfite anion (Eq. 5) present in most developer formulations (44,46). Sulfite is used to scavenge excess oxidized developer that otherwise would undergo a number of self-condensation reactions resulting in stain. Sulfite also drives the redox reaction and acts as an antioxidant and mild silver halide solvent. The sulfite reaction is complicated, involving acid and base catalysis and a multiplicity of products, but the half-life for oxidized color developer in a typical processing solution containing sulfite is only about 2 ms. This reaction with sulfite is the principal reaction that limits the size of the resulting dye cloud (47).

(Equations 3–5 appear as structural formulas in the original: coupling of QDI with the coupler anion to give dye (3), attack of hydroxide on QDI to give QMI and the free dialkylamine (4), and addition of sulfite to QDI to give a sulfonate adduct (5).)
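The competition for QDI can be illustrated with pseudo-first-order estimates. In the sketch below, the 2 ms half-life toward sulfite is taken from the text and the coupling rate constant is drawn from the range quoted above, but the coupler concentration is a purely illustrative assumption; the numbers show only how relative rates translate into a capture fraction, not the behavior of any real formulation.

```python
import numpy as np

# Half-life of oxidized developer (QDI) toward sulfite, from the text: ~2 ms.
half_life_sulfite = 2.0e-3                      # s
r_sulfite = np.log(2.0) / half_life_sulfite     # pseudo-first-order rate, ~347 s^-1

# Coupling path: second-order rate constant from the 10^3-10^4 L/(mol s) range
# quoted above, times an assumed (illustrative) ionized-coupler concentration.
k_coupling = 5.0e3                              # L/(mol s)
c_coupler = 1.0e-3                              # mol/L, assumed for illustration
r_coupling = k_coupling * c_coupler             # 5 s^-1

# With two competing first-order fates, the capture fraction and overall
# lifetime follow directly from the rate ratio.
fraction_as_dye = r_coupling / (r_coupling + r_sulfite)
overall_half_life = np.log(2.0) / (r_coupling + r_sulfite)

print("fraction of QDI captured by coupler: %.3f" % fraction_as_dye)
print("overall QDI half-life: %.2f ms" % (1e3 * overall_half_life))
```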
Practical developers must possess good image discrimination: rapid reaction with exposed silver halide but slow reaction with unexposed grains. This is possible because the silver of the latent image provides a conducting site where the developer can easily give up its electrons, but this requires that the electrochemical potential of the developer be properly poised. For most systems, this means a developer overpotential of between −40 and +50 mV versus the normal hydrogen electrode. Developing agents must also be soluble in the aqueous alkaline processing solutions. Typically, such solutions are maintained at about pH 10 by a carbonate buffer. Other buffers used include borate and, less frequently, phosphate. Developer solubility can be enhanced by hydroxyl or sulfonamide groups, usually in the N-alkyl substituent. The solubilization also reduces developer allergenicity by reducing partitioning into the lipophilic phase of the skin (48). Some of the color developers in commercial use are CD-2 [2051-79-8], CD-3 [25646-71-3], and CD-4 [25646-77-9]. The various substituents control the rates of the various developer reactions as well as the hue, extinction, and stability of the resulting image dyes. The presence of an alkyl group ortho to the primary amino moiety serves several purposes, including steric minimization of the condensation reaction with reduced developer (49) and improved dye stability. Electron-donating groups either on the ring or on the tertiary amino group increase the development rate and decrease to a lesser extent the rate of reaction of oxidized developer with coupler (35,42). Electron-donating groups on the developer also tend to shift the absorption of the dye formed to longer wavelengths (50).
In addition to the developer, sulfite, and pH buffer, commercial developer solutions often contain antifoggants or restrainers that reduce the rate of development of unexposed silver halide relative to exposed grains (51). The halide ion of the same type as in the film's silver halide emulsion is a common restrainer. The excess halide controls the silver ion concentration or pAg, where pAg = −log [Ag+]. Other antifoggants include organic materials such as benzotriazoles, benzyltriazolium salts, and 3,5-dinitrobenzoic acid [99-34-3], C7H4N2O6. Developer solutions can also contain antioxidants such as hydroxylamine sulfate [10039-54-0], metal sequestrants, and materials to aid in sensitizing dye removal or to boost coupling rates.

Coupler Types

A photographic coupler is a weakly acidic organic compound that reacts with oxidized p-phenylenediamine to produce a dye, usually one of the subtractive primaries, cyan, magenta, or yellow (52) (Fig. 10). In addition to the dye-forming portion of the molecule, most couplers also bear an organic ballast, a long aliphatic chain or a combination of aliphatic and aromatic groups. This allows the coupler to be suspended in droplets of a high boiling organic liquid, called the coupler solvent, which anchors the coupler in its appropriate film layer. In some recent formulations, the dye-forming portion of the coupler is attached to a polymeric backbone (53). In both cases, such films are called incorporated coupler systems. Kodachrome film is unique, however, because its couplers are not contained in the film, but diffuse in during the development steps. Although functionally similar to the incorporated couplers, couplers for Kodachrome films are water-soluble and do not contain an organic ballast.

Cyan dyes are derived typically from phenols or naphthols. These so-called indoaniline dyes absorb in the red, from 600 to 700 nm and beyond, and have unwanted absorptions in the blue-green. Magenta dyes are formed from pyrazolones, pyrazolotriazoles, and other species that have either an active methylene group as part of a conjugated ring or, less frequently, an active methylene flanked by electron-withdrawing groups. Magenta dyes are designed to absorb in the green between 500 and 600 nm, but often have additional absorbances in the blue. Yellow dyes are produced from couplers that contain an active methylene group that is not part of a ring. They absorb blue light between 400 and 500 nm and typically strongly absorb in the ultraviolet, which affords some protection to underlying layers from light-induced dye fade.

Mechanisms of Coupling

Because the active coupling species is the ionized coupler (37,54), the rate of the coupling reaction and hence its ability to compete for oxidized developer depends on the pH of the process, the pKa or acidity of the coupler or, less frequently, the rate of coupler ionization, and the reactivity of the resulting coupler anion with QDI (42). Ionization is rapid for most cyan and magenta couplers, and can be characterized by the pKa of the coupler. For yellow couplers, which can exist as both keto (carbon acid) and enol (oxygen acid) tautomers, ionization can be measurably slow, even rate-determining in some processes (55). This tautomeric equilibrium and hence the degree and rate of ionization can depend on both structure and environment. Once ionized, the coupler reacts with oxidized developer to produce an intermediate or leuco (colorless) dye (Fig. 11). If X is hydrogen, an oxidation that involves a
Figure 10. In each image-forming layer, developer oxidized by the exposed silver halide (Devox) reacts with the appropriate coupler to form the corresponding subtractive primary dye, yellow, cyan, or magenta. Ar represents an aryl group, and the various R's are undefined organic segments.

Figure 11. Reaction of ionized coupler and oxidized developer (Devox+) to produce the intermediate leuco dye. If X is a good leaving group, the reaction proceeds spontaneously to dye, and the coupler is said to be two-equivalent. If oxidation by a second molecule of oxidized developer is required, the coupler is four-equivalent. The scheme in the original also gives the observed rate constant as kc− = kf ke/(kr + ke), which reduces to kc− ≈ kf when ke >> kr.
second molecule of QDI is required to produce the image dye. Because formation of each mole of QDI requires the reduction of two moles of silver ion (for a total of four moles), such couplers are referred to as ‘‘four-equivalent’’ couplers. However, if X is a good leaving group, such as a halide, or a low pKa phenol or heterocycle, departure of X as an anion to form the dye proceeds spontaneously, and the coupler is said to be ‘‘two-equivalent’’ (45). Although the final dye yield, that is, the ratio of dye formed to silver reduced, depends on the degree of competition in the system, two-equivalent couplers are preferred because they undergo fewer side reactions with oxidized developer and require less silver halide to produce an equivalent amount of dye. Thus, there is an obvious economic benefit, and it can also lead to increased film sharpness by reducing optical scatter by the coated silver halide. The experimentally observed second-order rate constant kc− , for two-equivalent couplers, where the conversion of the leuco dye to image dye is rapid, can be equated with kf the rate of nucleophilic attack of coupler anion on oxidized developer. Thus, when the pH of the process is specified, two parameters, pKa and kc− , can be
Figure 12. The relationship between acidity and anion reactivity for naphthol couplers that differ in the 2-position ballast where A is ballast 1; B, ballast 2; C, ballast 3. For each coupler, data points represent different leaving groups X.
conveniently used to characterize the molecular reactivity of a large variety of photographically well-behaved couplers (42,56). When couplers are grouped together in structurally similar families, such as the naphthols in Fig. 12, it is found that a linear free-energy relationship of the form log kc− = α + βpKa often exists between the pKa of the coupler and the log kc− of its resulting anion (42), that is, a substituent X that raises the pKa of the coupler and thus reduces the concentration of active anion at a given pH also increases the activity of that anion. The constant β is a measure of the trade-off between those two opposing factors on coupler reactivity. For β > 1, increasing the electron-donating ability of the substituent leads to an increase in anion reactivity that more than compensates for the loss in anion concentration. For β < 1, the
opposite is true; overall reactivity results from decreasing the pKa of the coupler, until the pKa drops below the pH of the process. At that point, when the coupler is fully ionized, changes in the molecule that lower pKa do not increase anion concentration but only decrease anion reactivity (56). Studies of the naphthol and phenol couplers indicate that the field and resonance, but not steric, effects of the substituents are important in determining pKa and kc , suggesting that formation of the leuco dye proceeds through an early transition state where there is overlap between the electron-deficient p orbitals of the developer ring and the electron-rich p orbitals of the naphthol ring. This geometry is quite different from that of the product (57). For coupling reactions where conversion of the leuco dye to image dye is not fast, the rate constant of the reverse reaction, kr , is important. If kr is much less than ke , dye formation is slow, but dye yield can still be high, although the leaving group may not be released in time to affect development. If, however, kr is faster than ke , much of the oxidized developer originally captured by the coupler is lost to other reactions, and dye yield is low (56). The effect of the pH of the processing solution on coupling is most often considered in the context of the coupler, whose structure can be varied, but pH can also affect the activity of the oxidized developer (45,58). In general, oxidized developers fall into three categories. The first, like CD-2, has a constant charge, either cationic or zwitterionic, at normal processing pH values, and coupling rates are little affected by pH changes. The second class exists in cationic or neutral forms as a function of pH. For example, the oxidized form of the commercially important CD-4 reacts reversibly with hydroxide ion to form a neutral pseudobase, which is unreactive toward coupling (43). Finally, the quinonedimine derived from developers like CD-3 can exist in cationic and zwitterionic forms as a function of pH. Both forms can react with ionized coupler, but at different rates. Though the molecular reactivities of the coupler and the developer are important, both can be severely constrained by the polyphasic nature of the film system, where the coupler is most often embedded in a lipophilic phase and the oxidized developer is generated in the aqueous gelatin phase (45). Because the organic solvent invariably has a lower dielectric constant than the aqueous phase, the effect is to suppress ionization of the coupler and slightly increase the rate at which the positively charged QDI and coupler anion react with one another. The overall result is to reduce the rate of dye formation. For some yellow couplers, the rate-determining step can change from coupling to ionization as a function of solvent or even substituent. The structure of the oxidized developer and its charge can also control its ability to partition into the organic phase (44,58). In some processes, development additives such as benzyl alcohol are added to the developer to increase the hydrophilicity of the organic phase. More frequently, higher pKa couplers are designed to have additional ionizable sites, such as carboxyl, sulfo, or phenolic groups, to accomplish the same end (59).
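A small numerical sketch can make the pKa/reactivity trade-off discussed above concrete. It assumes a linear free-energy relationship log kc− = α + β·pKa with made-up α and β values (β < 1, not taken from Fig. 12) and treats the observable reactivity as the anion rate constant weighted by the fraction of coupler ionized at the process pH; with these assumptions the effective reactivity peaks near pKa ≈ pH, echoing the behavior described in the text.

```python
import numpy as np

def effective_rate(pka, ph=10.0, alpha=0.0, beta=0.8):
    """Observable coupling reactivity for a coupler family obeying
    log k_anion = alpha + beta * pKa, weighted by the ionized fraction
    1 / (1 + 10**(pKa - pH)) at the process pH."""
    k_anion = 10.0 ** (alpha + beta * pka)
    ionized_fraction = 1.0 / (1.0 + 10.0 ** (pka - ph))
    return k_anion * ionized_fraction

# With beta < 1, lowering pKa raises overall reactivity until pKa falls
# below the process pH, after which it only lowers the anion reactivity.
for pka in (6.0, 8.0, 10.0, 12.0):
    print("pKa %.0f: log(effective rate) = %.2f"
          % (pka, np.log10(effective_rate(pka))))
```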
In any case, the kinetics of such systems can be very complex, depending on the identity of the coupler and developer, and also on the nature of the dispersed organic phase. Often, the rate of coupling is proportional to the total surface area of the droplets that contain the coupler. Less frequently, for example, when the droplet integrity is destroyed in the developer bath, the rate can be independent of droplet size. Surfactants and mechanical milling techniques are used to form these oil-in-water emulsions, or dispersions as they are called in the photographic literature, that are highly controlled in particle size and have the requisite stability for coating and storage. In some cases, dye-forming moieties attached to a polymeric backbone, called a polymeric coupler, can replace the monomeric coupler in coupler solvent (53). In other reports, very small particles of coupler solubilized by surfactant micelles can be formed through a catastrophic precipitation process (60). Though both approaches have the potential to eliminate the need for mechanical manipulation of the coupler phase, as well as the detrimental heating needed to effect coupler dissolution, in practice they have seen only limited application.

Cyan Couplers

Substituted phenols and α-naphthols are the primary classes of cyan dye-forming couplers (Fig. 13). Naphthols of structural types (1) and (2), the 1-hydroxy-2-naphthamides, have proved very useful and are easily and inexpensively prepared. Hydrogen bonding between the naphtholic oxygen and the hydrogen of the 2-amido group shifts the dye hue bathochromically (to longer wavelengths), increases its extinction, and contributes to the dye's stability. Electron-withdrawing groups on the amide nitrogen also shift the hue bathochromically and increase the extinction coefficient (61). Substitution in the 4-position provides accessibility to image-modifying couplers of various types. However, some dyes of this class are prone to chemical reduction, which returns them to the colorless leuco form in the presence of ferrous ion during the bleaching step (62). Naphthols of type (3) reportedly show less fade to the leuco dye (63), probably due to stabilization of the dye by internal hydrogen bonding. Phenols of structure (4) are also claimed to show markedly improved dye stability, both in the presence of ferrous ion and, when a second carbonamido group is present in the 5-position, to simple thermal fade (64). Numerous substituent variations are described in the literature to adjust dye hue. A perfluoroacylamido group in the 2-position shifts the hues bathochromically and maintains the thermal stability of the dyes (65). Phenols of structure (5) are said to show outstanding light stability, which makes them especially suitable for display materials like color paper (66). Some cyan dyes derived from both naphthols and phenols reportedly show thermochromism, a reversible shift in the dye hue as a function of temperature. This can occur in a negative while prints are being made (67). Derivatives of the pyrazolotriazole nucleus, more commonly used in magenta dye-forming couplers (see below), and the related pyrrolotriazole ring have recently been described as new parents for cyan couplers.
Figure 13. Cyan dye-forming couplers (1)–(5), where X can be H, Cl, OAr, OR, or SAr. Ar is aryl; R and R′ are undefined organic segments.
They have the reported advantage of yielding dyes of very high extinction (ε > 60,000). The hue of the dye is shifted into the cyan by using extended conjugation and electron-withdrawing groups (68).

Yellow Couplers

The most important classes of yellow dye-forming couplers are derived from β-ketocarboxamides, specifically the benzoylacetanilides (6) (69) and pivaloylacetanilides (7) (70). Substituents Y and Z can be used to attach ballasting or solubilization groups and to alter the reactivity of the coupler and the hue of the resulting dyes. Typical coupling-off groups (X) cited in the literature are also shown.
[Structures (6) and (7): benzoylacetanilide (6) and pivaloylacetanilide (7) couplers bearing a ballast group, where the coupling-off group X shown includes H, Cl, OSO2R, OOCR, SO2R, S-aryl, and oxygen- or nitrogen-linked cyclic groups.]
For the widely studied benzoylacetanilides, coupler acidities can be correlated using a two-term Hammett equation that involves substituents in either or both rings. As in the naphthols, there is a linear correlation of the log kc− and pKa values, but β equals 0.55, suggesting that increased reactivity comes with reduced pKa until the coupler is nearly fully ionized (71). The hues of the
resulting dyes can be shifted bathochromically by electron-withdrawing groups in either ring. Ortho substitution in the anilide ring increases the extinction coefficient and narrows the bandwidth of the dyes. Both the couplers and their dyes have significant absorptions in the ultraviolet that offer protection to dyes in the underlying layers (72). The relatively low pKa values for benzoylacetanilides, especially as two-equivalent couplers, minimize concerns over slow ionization rates and contribute to the couplers' overall reactivity. But this same property often results in slow reprotonation in the acidic bleach, where developer carried over from the previous step can be oxidized and react with the still ionized coupler to produce unwanted dye in a nonimage-related fashion. This problem can be eliminated by an acidic stop bath between the developer and the bleach steps or minimized by careful choice of coupling-off group, coupler solvent, or dispersion additives.

The second widely used class of yellow couplers is the pivaloylacetanilides (7) and related compounds that bear a fully substituted carbon adjacent to the keto group. The dyes from these couplers tend to show significantly improved light stability, and so these couplers have been widely adopted for use in color papers as well as in many projection materials. In general, the dyes have narrowed bandwidths and less unwanted green absorption (70). The lack of a second aryl group flanking the active methylene site, however, means that the pKa values of pivaloylacetanilides tend to be considerably higher than those of benzoylacetanilides. As a result, these couplers are rarely used as their four-equivalent parents. Rather, the coupling site is substituted with electron-withdrawing groups to increase the acidity of the coupler or with hydrophilic groups to aid the rate and extent of oil-phase ionization. Both electron-withdrawing groups and hydrophilic groups can also appear in the anilide ring. An interesting variation on this is the use of polarizable groups, such as C=O or S=O, in the ortho position of an aryloxy group attached to the coupling site. These groups reduce the pKa of the coupler by increasing the rate of the ionization process (73). Another technique for increasing the reactivity of these couplers is reducing the steric
hindrance of the tertiary-butyl group by constraining two of the pendant carbons in a cyclopropyl ring (74). The higher pKa of the coupling site in pivaloylacetanilides does mean that undesirable dye formation in the bleach is minimized.

A deficiency of many yellow couplers is the relatively low extinction coefficient of the dyes that they form, as low as about 15,000 and usually no higher than 22,000. This can be compared to typical cyan and magenta dyes, whose extinctions are greater than 30,000. To create a neutral in the film therefore requires coating considerably more coupler and silver halide, resulting in a thicker, more expensive yellow layer and consequently reduced sharpness. One approach to improved extinction has been to substitute an indolinyl or 2-aryl-3-indolyl group for the t-butyl group of pivaloylacetanilides. These materials combine good reactivity and extinction coefficients as high as 27,000 (75). A second ingenious, albeit more complicated, approach is to make the coupling-off group itself a preformed yellow dye whose hue does not manifest itself until the dye is released on coupling. Thus each coupling event creates two molecules of dye, whose combined extinction is more than 50,000 (76).

Other classes of yellow couplers reported in the literature include indazolones (77) and benzisoxazolones (78). Neither of these structures contains an active methylene group; dye formation, it is believed, occurs through a ring-opening process.

Magenta Couplers

For many years, the most widely used magenta couplers have been derived from 1-aryl-2-pyrazolin-5-ones (79). Substituents in the aryl ring or at the 3-position have been used to alter dye hue and stability and to control coupler reactivity. Ballasting groups are usually attached through the 3-position as well. Electron-withdrawing groups at either site tend to shift the hue of the resulting dye bathochromically (80). The principal absorption of these dyes is in the region from 500–570 nm, but most pyrazolinone dyes also show significant unwanted absorptions in the blue. Because these can be minimized by using 3-arylamino or 3-acylamino substituents [see structures (8), (9)] while also increasing the extinction coefficient of the primary absorptions, such couplers have been extensively described in the literature (81). Ar represents an aryl group.
[Structures (8) and (9): pyrazolinone couplers bearing 3-arylamino (8) or 3-acylamino (9) substituents, where the coupling-off group X shown includes OOCR, OOCNHR, NHSO2R, SR, S-aryl, and nitrogen-linked heterocyclic groups.]
Although these pyrazolinone couplers can exist in several tautomeric forms, ionization of the couplers is rapid, and the four-equivalent parents have been widely used for decades. The aryl ring is often trisubstituted in the 2,4,6-positions, and one or more of the substituents are chlorine (82). The pKa values of the 3-arylamino pyrazolinones (8) are higher than those of their 3-acylamino (9) counterparts, with dyes whose hues are shifted toward shorter wavelengths and show less unwanted blue absorption. Coupler (10) [61354-99-2] is unique because it carries its own stabilizer against photochemical dye fade in the form of the 4-alkoxyphenol ballast, which makes it especially suitable for color paper (83).
The four-equivalent pyrazolinones suffer from a number of disadvantages. Before processing, the couplers can react with ambient formaldehyde to yield a non-dye-forming condensation product. Formaldehyde scavengers, such as ethylenediurea, can be added to control this problem (84). After processing, the residual coupler can react with image dye to form colorless products (85). A formaldehyde stabilizer, which reacts with the residual coupler in the same fashion as the undesirable preprocess reaction, eliminates this problem. Finally, though these couplers react rapidly with oxidized developer, the intermediates undergo side reactions that result in reduced dye yields. The 3-acylamino pyrazolinones are more prone than 3-arylamino couplers to these problems, but both are susceptible (86).

The two-equivalent counterparts to these couplers are largely devoid of these problems. Both halogen and sulfo groups are unsatisfactory leaving groups because of various side reactions, whereas the otherwise attractive aryloxy moiety causes the coupler to undergo an often rapid radical chain decomposition (87). Useful coupling-off groups cited in the literature include aryl, alkyl, and hetero thiol groups (88), nitrogen heterocycles (89), and arylazo groups (90). Because the thiols only depress the reactivity of the already low pKa acylamino pyrazolinones, they tend to be most useful on the arylamino couplers. Alkyl thiols are generally reluctant leaving groups in the developer because of their high pKa values, but eventually form image dye in the low pH bleach (56,91). Most of the arylthiol leaving groups cited in the literature show substitution at least in the 2-position. Bis-pyrazolinones, linked through the coupling site by methylene, substituted methylene, sulfide, and disulfide groups, reportedly give two-equivalent couplers, but only one of the couplers yields a dye (92).

A more recent class of magenta dye-forming couplers is the pyrazolo-(3,2-c)-s-triazoles (11) and related isomers (93), where X can be Cl, SR, S-aryl, or O-aryl.
Dyes from this class of couplers are exceedingly attractive; they have good thermal stability and much lower unwanted blue and red absorptions than dyes from the pyrazolinones. However, the high pKa values of the four-equivalent parents translate into unacceptably low reactivity. Two-equivalent analogs that have chloro or aryloxy leaving groups and hydrophilic groups elsewhere in the molecule provide good reactivity and dye yield, are resistant to ambient formaldehyde, and form dyes that do not require postprocess stabilization (94). A related class of pyrazolotriazoles (12) reportedly yields dyes that have improved light stability (95). The higher pKa values of pyrazolotriazole couplers have led to concerns over image variability induced by small changes in developer pH.
Other classes of magenta dye-forming couplers reported in the literature include pyrazolobenzimidazoles (96) and indazolones (97). The latter are unique because they do not contain an active methylene group and, it is proposed, form magenta dyes with a zwitterionic structure.

[Structures (11) and (12): ballasted pyrazolotriazole magenta couplers.]

Colored Masking Couplers

The dyes produced in chromogenic development have unwanted absorptions. For example, the cyan dye is expected to control or modulate red light alone and thus should absorb only between 600 and 700 nm, but it also shows lesser absorptions in the blue (400–500 nm) and green (500–600 nm) regions. Thus, exposure of the red-sensitive layer of the film produces the desired density to red light in the negative, and also undesirable densities to blue and green light, resulting in desaturation or ''muddying'' of the color. For materials that are not directly viewed, like a color negative film, masking couplers provide an ingenious solution (31,98). Unlike a normal cyan dye-forming coupler, which is colorless, a cyan masking coupler bears a colored, preformed (usually azo) dye in the coupling-off position. The hue of this dye is chosen to match the unwanted blue-green absorption of the cyan dye that is generated. When coupling occurs, the preformed dye is released and washes out of the film or is destroyed. The result is a negative image formed by the cyan dye with its unwanted absorptions and an entirely complementary positive image left by the preformed dye that remains on the residual coupler (Fig. 14). This is equivalent to a perfect cyan image overlaid with a uniform blue-green density. Because the negative is printed onto color paper using separate cyan, magenta, and yellow exposures, a somewhat longer cyan exposure is required. Similar chemistry is employed to deal with the unwanted blue density of the magenta coupler (99).

Figure 14. Masking coupler used in the cyan layer, showing, as density versus log exposure, ( — ) the unwanted density to blue-green light that accompanies cyan dye formation, matched by (- - - -) a complementary density to blue-green from the unreacted coupler, and ( — · — ) the density to red light.
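A small numerical sketch can illustrate why the residual masking dye leaves a uniform unwanted density, as described above and indicated in Fig. 14. The numbers are hypothetical; the only assumption carried over from the text is that the blue-green absorption of the preformed dye matches the unwanted blue-green side absorption of the cyan dye that forms.

```python
# Hypothetical illustration of the masking principle: as more cyan dye forms,
# the unwanted blue-green absorption it contributes is matched by the loss of
# preformed masking dye, so the total unwanted density stays constant.

UNWANTED_PER_UNIT_CYAN = 0.15   # blue-green density per unit of cyan dye formed (assumed)
MASK_PER_UNIT_COUPLER = 0.15    # blue-green density of preformed dye on unreacted coupler (assumed equal)
TOTAL_COUPLER = 1.0             # relative amount of masking coupler coated

for fraction_coupled in (0.0, 0.25, 0.5, 0.75, 1.0):
    cyan_unwanted = UNWANTED_PER_UNIT_CYAN * fraction_coupled * TOTAL_COUPLER
    residual_mask = MASK_PER_UNIT_COUPLER * (1.0 - fraction_coupled) * TOTAL_COUPLER
    total_bluegreen = cyan_unwanted + residual_mask
    print(f"coupled={fraction_coupled:.2f}  cyan side-absorption={cyan_unwanted:.3f}  "
          f"residual mask={residual_mask:.3f}  total blue-green density={total_bluegreen:.3f}")
```

Whatever fraction of the coupler reacts, the summed blue-green density is the same, which is the uniform overlay described in the text.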
The use of colored masking couplers does not come without a price, however. The preformed dye can absorb light during exposure, reducing effective film speed, and the background density of the masking dye can reduce the dynamic range of the negative. The first problem can be attacked by using a process-removable group, which shifts the hue of the masking dye out of the visible region in the unprocessed film (100).

DIR Couplers

Masking couplers cannot be used for directly viewed materials because of the objectionable color of the mask itself. But similar advantages, and more, can be achieved by using development-inhibitor-releasing (DIR) couplers (32,101). These materials are usually image couplers that carry a silver development inhibitor (In) linked directly or indirectly to the coupling site (Eq. 6). When released as a function of dye formation, the development inhibitors migrate to the silver halide grain and either slow or stop further development. In addition to correcting unwanted dye absorptions, DIR couplers can be used to improve the sharpness and reduce the granularity of films (102).
[Eq. (6): a DIR coupler reacts with oxidized developer to form image dye and release the development inhibitor In, typically a sulfur- or nitrogen-containing heterocycle.]
Development inhibitors have long been used to control photographic processes, but the mechanism by which they work remains in dispute. In one model of the developing silver halide grain, the inhibitor reacts with the surface silver ions at the etch pit, a defect region on the crystal where halide ions depart into solution (Fig. 9). This complexation prevents these silver ions from migrating interstitially to the root of the developing silver filament and slows down or stops development (103). Another model shows the inhibitor complexing with the silver metal filament, where its insulating properties prevent electron injection by the oxidizing developer (104).

For color correction of the unwanted green absorptions of a cyan dye, for example, a cyan DIR coupler is added to the imaging layer along with the cyan image coupler. The inhibitor is released in proportion to cyan dye formation and is designed to migrate into the adjacent magenta dye-forming layer, where it inhibits silver development and reduces the production of green density to the same degree that unwanted green density is produced by the cyan dye. This is sometimes referred to as a red-onto-green interlayer interimage effect and can occur only when the predominantly red image in the scene has some green component. Similar color correction can originate from the yellow and magenta dye-forming layers, and the overall correction can be described by a 3 × 3 color matrix (101).

DIR couplers can also be used to control granularity, a measure of the visual nonuniformity of the dye image resulting from the random distribution of the dye clouds. Granularity, the noise in the photographic signal, is proportional to the density divided by the square root of the number of signal-generating centers, in this case, silver halide grains. Increasing the amount of silver halide does not in itself reduce granularity, however, because this also generates more oxidized developer and greater density. However, a DIR coupler can permit the coating of more silver halide centers without increased dye density by releasing an inhibitor that allows each grain to develop only partially.
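The 3 × 3 color-matrix description of the interlayer corrections discussed above can be sketched numerically. The matrix entries below are hypothetical, chosen only to show how a red-onto-green interimage effect appears as a negative off-diagonal term; they are not values for any real film.

```python
# Illustrative 3 x 3 interimage matrix: rows are the cyan, magenta, and yellow
# dye amounts produced; columns are the effective red, green, and blue exposures.
# Negative off-diagonal terms represent development inhibition of one layer
# caused by inhibitor released in another (all values hypothetical).

import numpy as np

M = np.array([
    [ 1.00, -0.05, -0.02],   # cyan layer: driven by red, slightly inhibited by the other records
    [-0.10,  1.00, -0.05],   # magenta layer: the -0.10 term is the red-onto-green correction
    [-0.02, -0.08,  1.00],   # yellow layer
])

log_exposures = np.array([0.6, 0.3, 0.1])   # hypothetical red, green, blue log exposures of a reddish patch
dye_amounts = M @ log_exposures
print("relative cyan, magenta, yellow dye formed:", np.round(dye_amounts, 3))
```

The negative entry in the magenta row cuts magenta (green-controlling) dye in proportion to the red exposure, which is exactly the red-onto-green interimage effect described in the text.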
Figure 15. Light scatter and chemical diffusion lead to a loss of sharpness at (a) a photographic edge, where ( — ) represents the image and (- - - -) the developed image. In the case of (b), an inhibitor-modified edge, the sharpness of the (- - - -) developed image can be restored by preferentially releasing a development inhibitor in image areas; ( — ) represents the inhibitor concentration profile. Diffusion of the inhibitor accentuates the edge relative to the macro portion of the image. Both panels plot density versus distance.

Figure 16. Reaction of a delayed-release DIR coupler and oxidized developer (Devox). A delayed-release DIR coupler permits fine-tuning of where and when the development inhibitor (In) is generated by releasing a diffusible inhibitor precursor or ''switch'' as a function of image formation. This permits control of inter- and intralayer effects.
Perhaps the most intriguing use of DIR couplers is their ability to improve the perceived sharpness of an image. If a sharp photographic edge or square-wave signal is imposed on a piece of film, for example, by placing an opaque material over part of the film and exposing it to light, the resulting developed image shows some degradation of the edge, mostly because of light scatter, but also because of diffusion processes (Fig. 15a). This ''softening'' of the edge is perceived as a loss in sharpness. If, however, a DIR coupler is coated with the image coupler, it produces the inhibitor concentration profile of Fig. 15b. The resulting inhibition of development leads to the final dye image of Fig. 15b, where the density at the edge is enhanced relative to the macro density. This increased rate of density change at the edge is seen as sharpness (102).

Although the varied uses for which DIR couplers are employed call for precise control over where the inhibitor diffuses, the very complexation mechanism by which inhibitors work precludes such control. The desired ability to target the inhibitor can be attained by using delayed-release DIR couplers, which release not the inhibitor itself, but a diffusible inhibitor precursor or ''switch'' (Fig. 16) (105). Substituents (X, R) and the structural design of the precursor permit control over both the diffusivity and the rate of inhibitor release. Increasing the effective diffusivity of the inhibitor, however, means that more of it can diffuse into the developer solution, where it can affect film in an undesirable, nonimagewise fashion. This can be minimized by using self-destructing inhibitors that are slowly destroyed by developer components and do not build up or ''season'' the process (106).

Although most DIR couplers are based on image dye-forming parents, universal DIR couplers have appeared in the literature. These materials react with oxidized developer to produce the inhibitor (or precursor) and either a colorless dye, an unstable dye, or a washout dye (107). Universal DIR couplers could be used in any layer where there is a need to match only the image-modifying properties, not the hue, to the given layer. In addition to inhibitors and masking dyes, similar coupling mechanisms have been used to release other photographically useful fragments, including development accelerators, image dyes, bleaching accelerators, development competitors, and bleaching inhibitors (108).

Other mechanisms for imagewise release of inhibitors include inhibitor-releasing developers (IRDs) (13) (109) and the related IRD-releasing couplers (14) (110). The former are of particular interest as a potential means of achieving image modification in reversal systems, where the image structure is determined in a black-and-white development
step before chromogenic development.
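The edge enhancement of Fig. 15 can be mimicked with a one-dimensional toy model: blur an ideal exposure step to represent optical scatter, let the imagewise-released inhibitor diffuse somewhat farther, and subtract its effect. The kernel widths and the inhibition strength below are arbitrary assumptions for illustration only.

```python
import numpy as np

def box_blur(signal, width):
    """Crude blur: moving average with a flat kernel of the given width."""
    kernel = np.ones(width) / width
    return np.convolve(signal, kernel, mode="same")

edge = np.concatenate([np.ones(100), np.zeros(100)])  # ideal exposure step

optical_image = box_blur(edge, 9)        # light scatter softens the edge slightly
inhibitor = 0.35 * box_blur(edge, 31)    # inhibitor released imagewise but diffuses farther (assumed widths)
developed = np.clip(optical_image - inhibitor, 0.0, None) / (1.0 - 0.35)

# Far from the edge the inhibitor fully tracks the image and the normalized
# density is 1.0; just inside the edge, some inhibitor has diffused away into
# the unexposed region, so development there is suppressed less and the
# density overshoots -- the enhanced edge of Fig. 15b.
print("plateau density:", round(float(developed[40:60].mean()), 3))            # ~1.0
print("peak density near the edge:", round(float(developed[60:140].max()), 3)) # > 1.0
```

The overshoot at the boundary is the edge effect that registers as increased apparent sharpness, and it is also what allows the measured MTF to exceed 100% at low frequencies, as noted later in the text.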
Post-Development Chemistry

The silver and silver halide that remain in the film after color development must be removed to improve the appearance of the color image and to prevent the appearance from changing as silver halide is slowly photoreduced to silver metal. This is generally accomplished in two steps. The first, called bleaching, is an oxidation that converts silver metal to silver salts. The second, called fixing, is solubilization of the silver salts by complexation with a silver ligand. In some processes, particularly those used for color paper, the two steps can be combined in a single step called a bleach-fix or blix (62,111).

The most common color film bleaches are of the rehalogenating type; they contain halide ion, often bromide, to complex the silver ion being formed and drive the reaction to its conclusion. For many years, the most common bleaching agent was ferricyanide, a powerful bleaching agent that also can convert residual leuco dye into image dye (62). However, the high oxidation potential, E0 = 356 mV versus the normal hydrogen electrode (NHE), also makes ferricyanide capable of oxidizing the color developer carried over with the film, which can then couple and form nonimagewise dye. For this reason, an intervening acidic ''stop'' bath must be used with ferricyanide bleaches. Rehalogenating bleaches based on ferric ethylenediaminetetraacetic acid (ferric EDTA) are lower in potential, E0 = 117 mV versus NHE, do not require a stop bath, and permit a shorter and simpler process (112). The bleaching efficiency of ferric EDTA can be improved by lowering the solution pH below 6. However, as the pH is dropped, the propensity for reducing indoaniline cyan dyes to the colorless leuco form increases. Thus, most iron ligand bleaches are designed to operate in a window bounded by the pH of the bath and the oxidation potential of the bleaching agent.
The rates of most color bleaching processes are limited by diffusion over at least part of the reaction. Color negative films tend to pass from a diffusion-limited into a chemically limited regime as sensitizing dye and other passivating species accumulate on the remaining silver surfaces (113). Persulfate anion is used as a bleaching agent in some motion picture film processes (114). Although it is thermodynamically attractive because of its very high oxidation potential (E0 = 2010 mV vs NHE), persulfate is a kinetically slow bleach in the absence of a catalyst. Commonly, a prebath containing dimethylaminoethanethiol [108-02-1] is used; the use of bleach accelerator-releasing couplers or blocked bleach accelerators incorporated in the film has also been proposed (108,115).

Thiosulfate, usually as its sodium or ammonium salt, is almost universally employed as the fixing agent for color films (111). Thiocyanate can be used as a fixing accelerator. Fixing performance is often defined by two parameters: the clearing time necessary to dissolve the silver halide and render the film optically transparent and the fixing time required to remove the complexed silver from the film. Fixing must be complete and the fixing agents thoroughly washed from the film because thiosulfate can destroy image dye by reduction or by other reactions in long-term storage. The complexation constants of thiosulfate with silver ion are sufficiently large to keep silver salts from being redeposited on the film when diluted in the wash.

Film Quality

Speed

Standards for photographic speed are now coordinated worldwide by the International Standards Organization (ISO) in Geneva, Switzerland. ISO speed is determined under specified conditions of exposure, processing, and measurement. Standards are published for color negative and color reversal films (116,117). In amateur color photography, the most popular film speeds are in the ISO 100–400 range. For modern 35-mm cameras, these speeds offer a good compromise between image quality, depth of field, and the ability to arrest motion, particularly when coupled with electronic flash. Several high-speed color films of ISO 1000 or greater, now available from primary manufacturers, are recommended for conditions of low illumination or fast subject motion. The larger silver halide grains necessary for high speed become progressively less efficient in converting absorbed radiation into a latent image, and in practice the linear relationship shown in Fig. 5 rolls off at the larger sizes. Image graininess becomes objectionable if high magnification is required. Sensitivity to ambient ionizing radiation of both cosmic and terrestrial origin also increases with crystal size, resulting in a proportion of crystals fogging during storage and increasing graininess (118,119).

The human visual system can adapt to changes in the spectral balance of the scene illuminant. For example, a gray color appears gray under both daylight and tungsten illumination. However, a color film designed for use in daylight produces prints that have a yellowish cast if used with tungsten illumination. This effect can be reduced by
placing an appropriate filter over the camera lens or by adjusting the light filtration when printing a color negative. In practice, designing a film for a particular illuminant involves adjusting its sensitivity to blue, green, and red spectral light to account for the relative abundance of each in that illuminant. Most amateur films are balanced for optimum performance in daylight at a correlated color temperature of 5500 K (4,21).

Color Reproduction

Color has three basic perceptual attributes: brightness, hue, and saturation. Saturation relates to the amount of the hue that is exhibited. The primary influences on the color quality of the final image are the red, green, and blue spectral sensitivities of the film or paper and the spectral absorption characteristics of the image dyes. If the green record is designed to match the sensitivity of the eye's γ receptor and the red record the ρ receptor (Fig. 1), channel overlap is so severe that a satisfactory reproduction cannot be obtained. In practice, the peak sensitivity of the red record is moved to longer (ca 650 nm) wavelengths to reduce the overlap. A penalty is that colors that have strong reflectances at 650 nm or longer appear excessively reddish in the reproduction.

Ideally, subtractive image dyes should exhibit no absorption outside the spectral ranges intended. In reality, considerable unwanted absorptions occur, even using the best dyes, and colored masking couplers (see Fig. 14) are designed to alleviate this problem. However, to avoid serious speed losses, a colored coupler is normally employed only under layers in the film that absorb in a similar range. Thus, a magenta-colored coupler, which absorbs green light, is used in the cyan pack; a yellow-colored coupler, which absorbs blue light, may be used in the magenta or cyan packs. Color may also be corrected by selectively suppressing dye generation in one color record as a function of exposure in another by using a diffusible development inhibitor generated by a development-inhibitor-releasing (DIR) coupler. Masking couplers and DIR couplers are the main tools for achieving the high color saturation of modern color negative films.

Reversal films are designed so that the camera film itself contains the final image, so they are subject to additional constraints. The minimum densities in the image must be low to accommodate scene highlights, and colored masking couplers cannot be used. In addition, because the color development step is exhaustive, that is, it develops all of the residual silver halide, it is difficult to gain color correction from DIR couplers, which rely on influencing the relative development rates in a system that is only partially developed. Some degree of interlayer development inhibition can be obtained in the black-and-white developer by the migration of iodide released from the developing silver halide (120). Some of the factors important in designing modern color reversal films are the ability to recover good quality images from underexposed film (push-processing), high emulsion efficiency for good speed and grain, and film technology for improved sharpness (121).
Dye Stability

The dyes used in photographic systems can degrade over time, both by thermal reactions and, if the image is displayed for extended periods of time, by photochemical processes. The relative importance of these two mechanistic classes, known as dark fade and light fade, respectively, depends on the way the product is to be used (122). Meaningful evaluations of dye stability on a practical time scale are difficult because the reactions themselves are by design either slow or inefficient. The t1/10 value, defined as the time for a 0.1 density loss from a dye patch of 1.0 initial density, ranges for dark fade from 30 to greater than 100 years for chromogenically generated dyes, whereas the quantum yields of photochemical fading are often on the order of 1 × 10−7. Accelerated testing conditions that involve high-temperature storage or high-intensity illumination must be used (123), although these are sometimes unreliable predictors of ambient fade (124).

The perceived color stability of a photographic system is usually limited by the fading of its least stable dye, which can produce an undesirable shift in color balance. Whereas recovery of such faded images is often possible, a so-called neutral fade, in which all three color records lose density at approximately the same rate, is usually preferred. For light fade, the magenta dye has usually been limiting. Numerous studies support the hypothesis that the fading mechanism is photooxidative (125). Stabilization techniques have included exclusion of oxygen by barrier or encapsulation technologies; elimination of ultraviolet light by incorporated UV absorbers; quenching of one or more of the excited-state species in the reaction sequence, that is, dye singlet, dye triplet, or singlet molecular oxygen, via incorporated quenching agents; and scavenging of reactive intermediates such as free radicals. Cyan light fade includes both reductive and oxidative components (126).

The dark fade of yellow dyes is largely a hydrolytic process (127). Cyan dark fade results mostly from reduction of the dye to its leuco form. For the magenta record, dark storage can be dominated by yellowing reactions of the residual coupler. A light-induced yellowing has also been observed. These problems have been successfully addressed by the design of new couplers and coupling-off groups. The stabilities of modern photographic products have vastly improved over those of the past. For example, the magenta light stability of color negative print papers has improved by about two orders of magnitude between 1942 and the present (128). Although such products offer very acceptable image stability in most customer uses, other processes, such as silver dye bleach and dye-transfer processes, can offer even greater stability, albeit at a significant cost in processing convenience.
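For orientation, the t1/10 figure quoted above can be tied to a simple rate law. The sketch below assumes, purely for illustration, first-order loss of dye at a hypothetical annual rate; real dark fade need not follow such kinetics.

```python
import math

def t_one_tenth(annual_loss_fraction):
    """Years for a patch of density 1.0 to lose 0.1 density, assuming
    first-order loss of dye at the given fractional rate per year
    (an illustrative assumption, not a measured mechanism)."""
    k = -math.log(1.0 - annual_loss_fraction)   # first-order rate constant, per year
    return -math.log(0.9) / k                   # time for density to fall from 1.0 to 0.9

for loss in (0.0035, 0.0010):   # hypothetical 0.35 %/yr and 0.10 %/yr dye losses
    print(f"{loss * 100:.2f} %/yr loss -> t1/10 = {t_one_tenth(loss):.0f} years")
```

Losses of only a few tenths of a percent per year already correspond to t1/10 values of several decades, which is why accelerated testing is needed to characterize such slow reactions.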
Image Structure

Because the primary photographic sensors are a population of silver halide crystals whose spatial distribution is random, the final image is also particulate. In black-and-white photography the image consists of opaque deposits of silver. In chromogenic photography using incorporated couplers, the image is formed by the coupler-containing hydrophobic droplets dispersed with the silver halide. The droplet size, typically 0.2 µm in diameter, is usually less than the crystal size. Dye is formed in a cloud of droplets around each developing crystal as oxidized developer is released. Individual droplets cannot be resolved under usual viewing conditions, but the dye clouds can be seen under magnification and convey a visual sensation of nonuniformity or graininess. The objective correlate of graininess is granularity, which is the spatial variation in density observed when numerous readings are taken on a uniformly exposed patch using a densitometer that has a very small aperture. The distribution of such measurements is approximately Gaussian and can be characterized by its standard deviation, σD. This quantity, called root-mean-square (RMS) granularity, is published by film manufacturers as a figure of merit.

The features that differentiate a color image from a black-and-white image are the transparency of the dyes and the spreading of the dye cloud around the developing crystal. Dye clouds continue to grow during development as the oxidized developer diffuses farther to find unreacted coupler. Overlapping of adjacent dye clouds leads to a reduction in granularity as more and more of the voids are filled in. The size of the dye clouds can be controlled by reducing the amount of silver development per crystal through the use of development-inhibitor-releasing couplers, or by having a soluble coupler in the developer that competes with the incorporated coupler for oxidized developer (129); the dye so formed is subsequently washed out.

Sharpness is the edge detail in the final image. There are many opportunities to lose sharpness in a photographic system. Camera focus, depth of field, subject movement, camera movement during exposure, and the optical properties of the film all contribute to image sharpness. In the negative–positive system, printer focus and the optics of the color paper stock are also important. Modern multilayers of the type illustrated in Fig. 8 are typically 20–25 µm thick. As light penetrates the front surface, it is scattered by the embedded silver halide crystals, which differ in refractive index from the supporting gelatin medium. This optical spread increases with depth, and thus the uppermost blue-sensitive layer records the sharpest image and the lowest red-sensitive layer the least sharp image. Over the years, film manufacturers have reduced coated thickness to improve sharpness and reduce material costs. Absorbing dyes may also be included in the film to reduce sideways light scatter, although at some cost in speed. These are generally removed by the processing solutions. Another method of improving sharpness is to change the layer order. Motion picture theatrical print film, for example, has the green record on top to improve sharpness, because the eye is most sensitive to differences in midspectrum light.

The sharpness of a film is often assessed by the modulation transfer function (MTF), which measures how sinusoidal test patterns of different frequencies are reproduced by the photographic material (130). A perfect reproduction has an MTF of 100% at all frequencies. In practice, a decrease of MTF with increasing frequency occurs as a result of optical degradation.
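Both figures of merit introduced above, RMS granularity and MTF, lend themselves to a compact numerical illustration. The simulated densitometer readings, grain counts per aperture, and modulation values below are arbitrary assumptions rather than measured film data.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- RMS granularity: sigma_D of micro-densitometer readings on a uniform patch.
# Model each reading as counting the grains developed inside a tiny aperture;
# with N grains per aperture on average, the readings fluctuate as 1/sqrt(N),
# so sigma_D scales as density / sqrt(N).
mean_density = 1.0
for grains_per_aperture in (100, 400, 1600):              # assumed counts, for illustration
    counts = rng.poisson(grains_per_aperture, size=100_000)
    readings = mean_density * counts / grains_per_aperture
    print(f"N = {grains_per_aperture:5d}  RMS granularity sigma_D = {readings.std():.4f}")

# --- MTF: ratio of reproduced to original modulation of a sinusoidal test pattern.
def mtf(input_amplitude, output_amplitude):
    return 100.0 * output_amplitude / input_amplitude     # percent

print("MTF at low frequency :", mtf(0.30, 0.33), "%")     # >100% from edge effects (illustrative)
print("MTF at high frequency:", mtf(0.30, 0.12), "%")     # optical scatter attenuates fine detail
```

Quadrupling the number of grains per aperture halves the simulated σD while leaving the mean density unchanged, which is the statistical basis of the granularity arguments made earlier for DIR couplers.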
Films that contain couplers that release development inhibitors can display MTF values above 100% at low frequencies because of edge effects that increase the output signal amplitude beyond that of the reference signal. The transport of inhibitor fragments across image boundaries leads to development suppression or enhancement on a microscopic scale (Fig. 15). Such edge effects are particularly useful in boosting the apparent sharpness of the lower layers of the film, which are most degraded by optical scatter. However, if chemical enhancement of edges is carried too far, the effects appear unnatural.

Environmental Aspects

Photographic processing (photofinishing) is a geographically dispersed chemical industry. An increasing proportion of images is processed in minilabs, rather than large, centralized photofinishers, so manufacturers have responded to the need for more environmentally benign chemistry. Processing machines are rarely emptied and refilled with fresh solutions but require replenishment because of chemical use, evaporation, and overflow. New films have been designed that make the use of coated silver more efficient and thus require less processing chemicals. The solution overflow that occurs is usually chemically regenerated and returned to the tank. Silver is most commonly recovered by electrolysis or metallic replacement from the processing solutions or by ion exchange from the wash water (131). Loss of chemicals from one tank into the next has been minimized.

The color paper process has progressed from five chemical solutions, three washes, and a replenishment rate of 75 µL/cm2 (70 mL/ft2) of film for each of the five solutions to two chemical solutions, one wash, and replenishment rates of 6 µL/cm2 (6 mL/ft2) and less than 3 µL/cm2 (3 mL/ft2). For color negative films, developer replenishment has dropped from more than 300 to 43 µL/cm2 (40 mL/ft2). Regeneration of the now reduced overflow has decreased chemical discharge by as much as 55% (132). The new chemistry of the RA-4 paper process permits elimination of benzyl alcohol from the developer and a 50% reduction of the biological or chemical oxygen demand (BOD/COD) of the film/paper effluent. Substitution of the more powerful ferric 1,3-propylenediaminetetraacetic acid (ferric 1,3-PDTA) bleaching agent for ferric EDTA allowed a 60% reduction in both iron and ligand concentrations and total elimination of ammonia from the bleaching formulation (133). Although ferric EDTA and related materials are rapidly photodegraded in the environment (134), concerns over the potential for persistent chelating agents to translocate heavy metals led to the development of photobleaches based on biodegradable chelating agents such as methyliminodiacetic acid (MIDA) and ethylenediaminodisuccinic acid (135,136). During the period 1985–1998, some components in the effluent have been reduced to near zero, and an overall reduction in effluent concentration of as much as 80% has been attained (132). Some of these reductions have been achieved by using solutions of highly concentrated, precisely metered chemicals, culminating in a system that uses dry tablets to deliver the replenishment chemistry, with plain water as the only liquid.
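The paired replenishment units quoted above can be cross-checked with a one-line unit conversion (1 ft2 = 929 cm2); the sketch below simply re-expresses the per-square-foot rates in µL/cm2.

```python
CM2_PER_FT2 = 929.0304   # 1 square foot in square centimetres

def ml_per_ft2_to_ul_per_cm2(rate_ml_per_ft2):
    """Convert a replenishment rate from mL/ft^2 to uL/cm^2."""
    return rate_ml_per_ft2 * 1000.0 / CM2_PER_FT2

for rate in (70, 40, 6, 3):   # mL/ft^2 values quoted in the text
    print(f"{rate:3d} mL/ft^2  =  {ml_per_ft2_to_ul_per_cm2(rate):5.1f} uL/cm^2")
```

The results (about 75, 43, 6, and 3 µL/cm2) match the rounded metric figures given in the text.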
Economic Aspects

At present, more than 3 billion rolls of film are processed worldwide every year. Despite its great size, there is still a tremendous opportunity for growth in the consumer photographic business. The countries whose consumption is highest (U.S. and Japan) average 7–8 rolls of film per household per year, whereas in developing markets such as China and Russia, film use averages less than one-half roll per household per year. The last two decades have seen an increasing dominance of the 35-mm color negative at the expense of the color slide because of its greater exposure latitude, the convenience of viewing prints, and the ease of obtaining duplicate prints. The growth in the popularity of picture taking has been fueled by the increasing sophistication of 35-mm point-and-shoot cameras. These offer features such as autoexposure, autofocus, motor drive, and built-in flash, all within a very compact camera body. The single-use camera, in which the film box itself is equipped with a lens and shutter, has also been very successful.

In 1996, a camera-film format called the Advanced Photo System (APS) was introduced as a new industry standard. This has a smaller negative size than 35-mm film but offers advantages to the customer such as easy loading, smaller camera size, a choice of three print formats including panoramic, and an index print with each order. APS negatives are returned in their original cassette and provide a convenient interface to a film scanner and therefore to images in digital form.

The relative volumes of different film speeds have changed over the years; there is a trend toward higher speeds because of the photographic flexibility that they offer in improved ability to arrest motion, better zoom capability, and better depth of field. For example, dividing the color negative population into two categories (those of 400 speed and higher and those of 200 speed and lower), the relative volume of the higher speed films has increased by about 50% from 1995 to 1999. The photofinishing industry has been growing at more than 5% annually. Since the mid-1970s a shift has occurred to favor local minilabs over large centralized processing laboratories. Although minilabs are generally more expensive, consumers appreciate the convenience, rapid access, and personal service.

It has become clear in the 1990s that the photographic business has gone through a period of transition, driven both by the threats and opportunities provided by digital technology. Digital still imaging is starting to reach the consumer through digital still cameras based on charge-coupled device (CCD) sensors. The price for an equivalently featured digital camera is still considerably higher than that of a film camera, but prices are declining rapidly. An advantage of digital photography is that the image can be modified and transmitted easily, and people are becoming aware that there is more that they can do with their pictures. However, high-resolution scanning puts silver halide film on the same playing field as electronic capture for image manipulation and reuse for different purposes. The scanning or ''digitization'' of conventional films and papers is becoming an increasingly important pathway in the industry. In the future, every film may
be scanned as a matter of course after processing. Stand-alone ''kiosks'' provide a way to scan, copy, or enhance conventional photographic prints. There are now more than 20,000 such kiosks in the United States. Because of the growth of the Internet, many new businesses have started up offering photographic services, including storage of customers' pictures. It is too early to judge the long-term viability of their business models. However, it is clear that convenient hard copy output will be even more important in the future as people learn that they can use their images for different purposes. There will be more options for getting high-quality prints: through wholesale photofinishing, minilabs, inkjet printing at home, and the Internet.

The professional segment of color photography includes portrait and wedding photography, advertising photography, and photojournalism. Films tailored to these markets are offered by the major photographic film manufacturers. Most photojournalists now use digital still cameras because they need rapid access to the image.

The motion picture industry has enjoyed continued growth in the last two decades. In spite of videocassette recorders, demand continues strong for the theatrical experience of first-run movies. This experience is enhanced by the improved picture quality of 70-mm origination and projection. Color negative film is often the preferred medium for prerecorded television shows and advertising; telecine transfer converts the images to electronic form for transmission.
ABBREVIATIONS AND ACRONYMS

APS    Advanced Photo System
BOD    Biological Oxygen Demand
CCD    Charge-Coupled Device
CD     Color Developer
COD    Chemical Oxygen Demand
DIR    Development-Inhibitor-Releasing (Coupler)
Dox    Oxidized Developer
EDTA   Ethylenediaminetetraacetic Acid
IRD    Inhibitor-Releasing Developer
ISO    International Standards Organization
MIDA   Methyliminodiacetic Acid
MTF    Modulation Transfer Function
pAg    Negative of the log of the silver ion concentration
PDTA   1,3-Propylenediaminetetraacetic Acid
PPD    para-Phenylenediamine
QDI    Quinone diimine
QMI    Quinone monoimine
RMS    Root Mean Square
BIBLIOGRAPHY

Color Photography under ''Photography'' in ECT, 1st ed., vol. 10, pp. 577–584, by T. H. James, Eastman Kodak Company; ''Color Photography'' in ECT, 2nd ed., vol. 5, pp. 812–845, by J. R. Thirtle and D. M. Zwick, Eastman Kodak Company; in ECT, 3rd ed., vol. 6, pp. 617–646, by J. R. Thirtle and D. M. Zwick, Eastman Kodak Company; in ECT, 4th ed., vol. 6, pp. 965–1002, by J. A. Kapecki and J. Rodgers, Eastman Kodak Company.
1. E. J. Wall, History of Three-Color Photography, American Photographic, Boston, MA, 1925. 2. J. S. Friedman, History of Color Photography, 2nd ed., Focal Press, London, 1968. 3. R. M. Evans, W. T. Hanson, and W. L. Brewer, Principles of Color Photography, Wiley, NY, 1953. 4. R. W. G. Hunt, The Reproduction of Colour, 4th ed., Fountain Press, Tolworth, UK, 1987. 5. T. H. James, ed., Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977. 6. J. Sturge, V. Walworth, and A. Shepp, eds., Imaging Processes and Materials — Neblette’s Eighth Edition, Van Nostrand Reinhold, NY, 1989. 7. J. M. Eder, History of Photography, Columbia University Press, NY, reprinted by Dover, NY, 1978. 8. W. T. Hanson, Photogr. Sci. Eng. 21, 293–296 (1977). 9. A. Weissberger, Sci. Am. 58, 648–660 (1970). 10. P. Glafkides, Photographic Chemistry, vol. II, Fountain Press, London, 1960. 11. G. F. Duffin, Photographic Emulsion Chemistry, Focal Press, London, 1966. 12. G. Haist, Modern Photographic Processing, Wiley, NY, 1979. 13. L. F. A. Mason, Photographic Processing Chemistry, Focal Press, London, 1966. 14. F. W. H. Mueller, Photogr. Sci. Eng. 6, 166 (1962). 15. J. Pouradier, in E. Ostroff, ed., Pioneers of Photography, SPSE, Springfield, VA, 1987, Chap. 3. 16. T. J. Dagon, J. Appl. Photogr. Eng. 2, 42 (1976). 17. R. L. Heidke, L. H. Feldman, and C. C. Bard, J. Imag. Technol. 11, 93 (1985). 18. P. Krause, Modern Photogr. 48(4), 72 (1984); ibid. 49(11), 47 (1985). 19. R. D. Theys and G. Sosnovsky, Chem. Rev. 97, 83 (1997). 20. T. Tani, Photographic Sensitivity, Oxford University Press, NewYork, 1995. 21. R. W. G. Hunt, Measuring Color, Ellis Horwood, Chichester, UK, 1987, p. 21. 22. R. M. Evans, J. Photogr. Sci. 9, 243 (1961). 23. C. R. Berry, in Ref. 5, p. 98. 24. J. E. Maskasky, J. Imag. Sci. 30, 247 (1986). 25. B. H. Carroll, Photogr. Sci. Eng. 21, 151 (1977). 26. S. Dahne, Photogr. Sci. Eng. 23, 219 (1979). 27. J. Spence and B. H. Carroll, J. Phys. Colloid Chem. 52, 1,090 (1948). 28. J. T. Kofron and R. E. Booms, J. Soc. Photogr. Sci. Technol. Jpn. 49(6), 499 (1986). 29. U.S. Pat. 4,439,520 (1984), J. T. Kofron and co-workers (to Eastman Kodak Company). 30. C. E. K. Mees, J. Chem. Ed. 6, 286 (1929). 31. W. T. Hanson, J. Opt. Soc. Am. 40, 166 (1950). 32. C. R. Barr, J. R. Thirtle, and P. W. Vittum, Photogr. Sci. Eng. 13, 214 (1969). 33. U.S. Pat. 4,923,788 (1990), L. Shuttleworth, P. B. Merkel, G. M. Brown (to Eastman Kodak Company). 34. Ger. Pat. 253,335 (1912), R. Fischer, R. Fischer, and H. Siegrist, Photogr. Korresp. 51, 18 1914. 35. L. E. Friedrich and J. E. Eilers, in F. Granzer and E. Moisar, eds., Progress in Basic Principles of Imaging Systems, Vieweg & Sohn, Wiesbaden, Germany, 1987, p. 385. 36. R. L. Bent and co-workers, J. Am. Chem. Soc. 73, 3,100 (1951).
COLOR PHOTOGRAPHY 37. L. K. J. Tong and M. C. Glesmann, J. Am. Chem. Soc. 79, 583,592 (1957). 38. J. Eggers and H. Frieser, Z. Electrochem. 60, 372,376 (1956). 39. L. K. J. Tong, M. C. Glesmann, and C. A. Bishop, Photogr. Sci. Eng. 8, 326 (1964); L. K. J. Tong and M. C. Glesmann, Photogr. Sci. Eng. 8, 319 (1964); R. C. Baetzold and L. K. J. Tong, J. Am. Chem. Soc. 93, 1347 (1971). 40. J. C. Weaver and S. J. Bertucci, J. Photogr. Sci. 30, 10 (1988). 41. E. R. Brown, in S. Patai and Z. Rappoport, eds., The Chemistry of Quinonoid Compounds, vol. 2, Wiley-Interscience, NY, 1988. 42. L. K. J. Tong and M. C. Glesmann, J. Am. Chem. Soc. 90, 5,164 (1968). 43. L. K. J. Tong, M. C. Glesmann, and R. L. Bent, J. Am. Chem. Soc. 82, 1,988 (1960). 44. J. Texter, D. S. Ross, and T. Matsubara, J. Imag. Sci. 34, 123 (1990). 45. L. K. J. Tong, in Ref. 5, pp. 345–351. 46. E. R. Brown and L. K. J. Tong, Photogr. Sci. Eng. 19, 314 (1975). 47. J. Texter, J. Imag. Sci. 34, 243 (1990). 48. U.S. Pat. 2,108,243 (1938), B. Wendt (to Agfa Ansco Corp.); U.S. Pat. 2,193,015 (1940), A. Weissberger (to Eastman Kodak Company); Brit. Pat. 775,692 (1957), D. W. C. Ramsay (to Imperial Chemical Ind.); Fr. Pat. 1,299,899 (1962), W. Pelz and W. Puschel (to Agfa-Gevaert), for examples. 49. C. A. Bishop and L. K. J. Tong, Photogr. Sci. Eng. 11, 30 (1967). 50. R. Bent and co-workers, Photogr. Sci. Eng. 8, 125 (1964). 51. F. W. H. Mueller, in S. Kikuchi, ed., The Photographic Image, Focal Press, London, 1970, p. 91. 52. J. R Thirtle, Chemtech. 9, 25 (1979), for a tutorial. 53. Brit. Pat. 701,237 (1953), (to DuPont); U.S. Pat. 3,767,412 (1973), M. J. Monbaliu, A. Van Den Bergh, and J. J. Priem (to Agfa-Gevaert); Brit. Pat. 2,092,573 (1982), M. Yagihara, T. Hirano, and K. Mihayashi (to Fuji Photo Film); U.S. Pat. 4,612,278 (1986), P. Lau and P. W. Tang (to Eastman Kodak Company). 54. F. Sachs, Berichte 33, 959 (1900). 55. C. A. Bishop and L. K. J. Tong, J. Phys. Chem. 66, 1,034 (1962). 56. J. A. Kapecki, D. S. Ross, and A. T. Bowne, Proc. Inte. EastWest Symp. II, The Society for Imaging Science and Technology, Springfield, VA, 1988, p. D-11. 57. L. E. Friedrich, unpublished data, 1983. 58. T. Matsubara and J. Texter, J. Colloid Sci. 112, 421 (1986); J. Texter, T. Beverly, S.R. Templar, and T. Matsubara, Ibid. 120, 389 (1987). 59. U.S. Pat. 4,443,536, 1984, G. J. Lestina (to Eastman Kodak Company). 60. Brit. Pat. 1,193,349, 1970, J. A. Townsley and R. Trunley (to Ilford, Ltd.); W. J. Priest, Res. Discl. 16468, 75–80, 1977; U.S. Pat. 4,957,857, 1990, K. Chari (to Eastman Kodak Company). 61. C. Barr, G. Brown, J. Thirtle, and A. Weissberger, Photogr. Sci. Eng. 5, 195 (1961). 62. K. H. Stephen, in Ref. 5, pp. 462–465. 63. U.S. Pat. 4,690,889, 1987, N. Saito, K. Aoki, and Y. Yokota (to Fuji Photo Film).
143
64. U.S. Pat. 2,367,531, 1945,I. L. Salminen, P. Vittum, and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,423,730, 1947, I. L. Salminen and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 4,333,999, 1982, P. T. S. Lau (to Eastman Kodak Company). 65. U.S. Pat. 2,895,826, 1959, I. Salminen, C. Barr, and A. Loria (to Eastman Kodak Company). 66. U.S. Pat. 2,369,929, 1945, P. Vittum and W. Peterson (to Eastman Kodak Company); U.S. Pat. 2,772,162, 1956, I. Salminen and C. Barr (to Eastman Kodak Company); U.S. Pat. 3,998,642, 1976, P. T. S. Lau, R. Orvis, and T. Gompf (to Eastman Kodak Company). 67. E. Wolff, in Proc. SPSE 43rd Conf., The Society for Imaging Science and Technology, Springfield, VA, 1990, pp. 251–253. 68. Eur. Pat. Apps. 744,655 and 717,315, 1996, S. Ikesu, V. B. Rudchenko, M. Fukuda Y. Kaneko (to Konica Corp.) 69. U.S. Pat. 2,407,210, 1946, A. Weissberger, C. J. Kibler, and P. W. Vittum (to Eastman Kodak Company); Brit. Pat. 800,108, 1958, F. C. McCrossen, P. W. Vittum, and A. Weissberger (to Eastman Kodak Company); Ger. Offen. 2,163,812, 1970, M. Fujiwhara, T. Kojima, and S. Matsuo (to Konishiroku KK.); Ger. Offen. 2,402,220, 1974, A. Okumura, A. Sugizaki, and A. Arai (to Fuji Photo Film); Ger. Offen. 2,057,941, 1971, I. Inoue, T. Endo, S. Matsuo, and M. Taguchi (to Konishiroku KK.); Eur. Pat. 296793 A2, 1988, B. Clark, N. E. Milner, and P. Stanley (to Eastman Kodak Company); Eur. Pat. 379,309, 1990, S. C. Tsoi (to Eastman Kodak Company). 70. U.S. Pat. 3,265,506, 1966, A. Weissberger and C. Kibler (to Eastman Kodak Company); U.S. Pat. 3,770,446, 1973, S. Sato, T. Hanzawa, M. Furuya, T. Endo, and I. Inoue (to Konishiroku KK.); Fr. Pat. 1,473,553, 1967, R. Porter (to Eastman Kodak Company); U.S. Pat. 3,408,194, 1968, A. Loria (to Eastman Kodak Company); U.S. Pat. 3,894,875, 1975, R. Cameron and W. Gass (to Eastman Kodak Company). 71. E. Pelizzetti and C. Verdi, J. Chem. Soc. Perkin II 806 (1973); E. Pelizetti and G. Saini, J. Chem. Soc. Perkin II 1,766 (1973); D. Southby, R. Carmack, and J. Fyson, in Ref. 67, pp. 245–247. 72. G. Brown and co-workers, J. Am. Chem. Soc. 79, 2,919 (1957). 73. J. A. Kapecki, unpublished data, 1981; U.S. Pat. 4,401,752, 1981, P. T. S. Lau (to Eastman Kodak Company). 74. U.S. Pat. 5,359,080, 1984, Y. Shimura, H. Kobayashi, Y. Yoshioka (to Fuji Photo Film Co.) 75. U.S. Pat. 5,213,958, 1993, M. Motoki, S. Ichijima, N. Saito, T. Kamio, and K. Mihayashi (to Fuji Photo Film Co.); U. S. Pat. 6,083,677, 2000, T. Welter and J. Reynolds (to Eastman Kodak Company). 76. U.S. Pat. 5,457,004, 1995, J. B. Mooberry, J. J. Siefert, D. Hoke, D. T. Southby, and F. D. Coms (to Eastman Kodak Company); D. Hoke, J. B. Mooberry, J. J. Siefert, D. T. Southby, and Z. Z. Wu, J. Imaging Sci. Technol. 42, 528 (1988). 77. Brit. Pat. 875,470, 1961, E. MacDonald, R. Mirza, and J. Woolley (to Imperial Chemical Ind.). 78. Brit. Pat. 778,089, 1957, J. Woolley (to Imperial Chemical Ind.). 79. U.S. Pat. 2,600,788, 1952, A. Loria, A. Weissberger, and P. W. Vittum (to Eastman Kodak Company); U.S. Pat. 2,369,489, 1945, H. D. Porter and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 1,969,479, 1934, M. Seymour (to Eastman Kodak Company).
144
COLOR PHOTOGRAPHY
80. G. Brown, B. Graham, P. Vittum, and A. Weissberger, J. Am. Chem. Soc. 73, 919 (1951). 81. U.S. Pat. 2,343,703, 1944, H. Porter and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,829,975, 1958, S. Popeck and H. Schulze (to General Aniline and Film); U.S. Pat. 2,895,826, 1959, I. Salminen, C. Barr, and A. Loria (to Eastman Kodak Company); U.S. Pat. 2,691,659, 1954, B. Graham and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,803,544, 1957, C. Greenhalgh (to Imperial Chemical Ind.); Brit. Pat. 1,059,994, 1967, C. Maggiulli and R. Paine (to Eastman Kodak Company). 82. U.S. Pat. 2,600,788, 1952, A. Loria, A. Weissberger, and P. Vittum (to Eastman Kodak Company); U.S. Pat. 3,062,653, 1962, A. Weissberger, A. Loria, and I. Salminen (to Eastman Kodak Company). 83. U.S. Pat. 3,127,269, 1964, C. W. Greenhalgh (to Ilford, Ltd.); U.S. Pat. 3,519,429, 1970, G. Lestina (to Eastman Kodak Company). 84. U.S. Pat. 3,582,346, 1971, F. Dersch (to Eastman Kodak Company); U.S. Pat. 3,811,891, 1974, R. S. Darlak and C. J. Wright (to Eastman Kodak Company). 85. P. W. Vittum and F. C. Duennebier, J. Am. Chem. Soc. 72, 1,536 (1950). 86. A. Wernberg, unpublished data, 1986. 87. L. E. Friedrich, in Ref. 67, p. 261. 88. U.S. Pat. 3,227,554, 1966, C. R. Barr, J. Williams, and K. E. Whitmore (to Eastman Kodak Company); U.S. Pat. 4,351,897, 1982, K. Aoki and co-workers (to Fuji Photofilm Co.); U.S. Pat. 4,853,319, 1989, S. Krishnamurthy, B. H. Johnston, K. N. Kilminster, D. C. Vogel, and P. R. Buckland, (to Eastman Kodak Company). 89. U.S. Pat. 4,241,168, 1980, A. Arai, K. Shiba, M. Yamada, and N. surFurutachi (to Fuji Photofilm Co.). 90. U.S. Pat. 3,519,429, 1970, G. J. Lestina (to Eastman Kodak Company). 91. N. Furutachi, Fuji Film Res. Dev. 34, 1 (1989). 92. U.S. Pat. 2,213,986, 1941, J. Kendall and R. Collins (to Ilford, Ltd.); U.S. Pat. 2,340,763, 1944, D. McQueen (to DuPont); U.S. Pat. 2,592,303, 1952, A. Loria, P. Vittum, and A. Weissberger (to Eastman Kodak Company); U.S. Pat. 2,618,641, 1952, A. Weissberger and P. Vittum (to Eastman Kodak Company); U.S. Pat. 3,834,908, 1974, H. Hara, Y. Yokota, H. Amano, and T. Nishimura (to Fuji Photo Film Co.). 93. U.S. Pat. 3,725,067, 1973, J. Bailey (to Eastman Kodak Company); U.S. Pat. 3,810,761, 1974, J. Bailey, E. B. Knott, and P. A. Marr (to Eastman Kodak Company). 94. U.S. Pat. 4,443,536, 1984, G. J. Lestina (to Eastman Kodak Company); Eur. Pat. 284,240, 1990, A. T. Bowne, R. F. Romanet, and S. E. Normandin (to Eastman Kodak Company); Eur. Pat. 285,274, 1990, R. F. Romanet and T. H. Chen (to Eastman Kodak Company). 95. U.S. Pat. 4,540,654, 1985, T. Sato, T. Kawagishi, and N. Furutachi (to Fuji Photo Film). 96. K. Menzel, R. Putter, and G. Wolfrum, Angew. Chem. 74, 839 (1962); Brit. Pat. 1,047,612, 1966, K. Menzel and R. Putter (to Agfa-Gevaert). 97. J. Jennen, Chim. Ind. 67(2), 356 (1952). 98. W. T. Hanson Jr. and P. W. Vittum, PSA J. 13, 94 (1947); U.S. Pat. 2,449,966, 1948, W. T. Hanson Jr. (to Eastman Kodak Company); K. O. Ganguin and E. MacDonald, J. Photogr. Sci. 14, 260 (1966). 99. Ref. 4; pp. 270 ff; pp. 320–321.
100. U. S. Pat. 5,364,745, 1994, J. B. Mooberry, S. P. Singer, J. J. Seifert, R. J. Ross, and D. L. Kapp (to Eastman Kodak Company) 101. U.S. Pat. 3,227,554, 1966, C. R. Barr, J. Williams, and K. E. Whitmore (to Eastman Kodak Company); C. R. Barr, J. R. Thirtle, and P. W. Vittum, Photogr. Sci. Eng. 13, 74,214 (1969). 102. M. A. Kriss, in Ref. 5, pp. 610–614; P. Kowaliski, Applied Photographic Theory, Wiley, London, UK, 1972, pp. 466–468. 103. L. E. Friedrich, unpublished data, 1985; K. Liang, in Ref. 35, p. 451. 104. T. Kruger, J. Eichmans, and D. Holtkamp, J. Imag. Sci. 35, 59 (1991). 105. U.S. Pat. 4,248,962, 1981, P. T. S. Lau (to Eastman Kodak Company); Brit. Pat. 2,072,363, 1981, R. Sato, Y. Hotta, and K. Matsuura (to Konishiroku KK.). 106. Brit. Pat. 2,099,167, 1982, K. Adachi, H. Kobayashi, S. Ichijima, and K. Sakanoue (to Fuji Photo Film Co.); U.S. Pat. 4,782,012, 1988, R. C. DeSelms and J. A. Kapecki (to Eastman Kodak Company). 107. U.S. Pat. 3,632,345, 1972, P. Marz, U. Heb, R. Otto, W. Puschel, and W. Pelz (to Agfa-Gevaert); U.S. Pat. 4,010,035, 1977, M. Fujiwhara, T. Endo, and R. Satoh (to Konishiroku Photo); U.S. Pat. 4,052,213, 1977, H. H. Credner and co-workers (to Agfa-Gevaert); U.S. Pat 4,482,629, 1984, S. Nakagawa, H. Sugita, S. Kida, M. Uemura, and K. Kishi (to Konishiroku KK.). 108. T. Kobayashi, in Proc. 16th Symp. Nippon Shashin Kokkai, Japan, 1986; U.S. Pat. 4,390,618, 1983, H. Kobayashi, T. Takahashi, S. Hirano, T. Hirose, and K. Adachi (to Fuji Photo Film); U.S. Pat. 4,859,578, 1989, D. Michno, N. Platt, D. Steele, and D. Southby (to Eastman Kodak Company); U.S. Pat. 4,912,025, 1990, N. Platt, D. Michno, D. Steele, and D. Southby (to Eastman Kodak Company); Eur. Pat. 193,389, 1990, J. L. Hall, R. F. Romanet, K. N. Kilminster, and R. P. Szajewski (to Eastman Kodak Company). 109. Brit. Pat. 1,097,064, 1964 C. R. Barr (to Eastman Kodak Company); Belg. Pat. 644,382, 1964, R. F. Porter, J. A. Schwan, and J. W. Gates Jr. (to Eastman Kodak Company). 110. K. Mihayashi, K. Yamada, and S. Ichijima, in Ref. 67, p. 254. 111. G. I. P. Levenson, in Ref. 5, pp. 437–461. 112. K. H. Stephen and C. M. MacDonald, Res. Disclosure 240, 156 (1984); J. L. Hall and E. R. Brown, in Ref. 35, p. 438. 113. S. Matsuo, Nippon Shashin Gak. 39(2), 81 (1976). 114. Manual for Processing Eastman Color Films, Eastman Kodak Company, Rochester, NY, 1988. 115. U.S. Pat. 4,684,604, 1987, J. W. Harder (to Eastman Kodak Company); U.S. Pat. 4,865,956, 1989, J. W. Harder and S. P. Singer (to Eastman Kodak Company); U.S. Pat. 4,923,784, 1990, J. W. Harder (to Eastman Kodak Company). 116. ISO 5800:1987(E), Photography — Color Negative Films for Still Photography — Determination of ISO Speed. 117. ISO 2240:1982(E), Photography — Color Reversal Camera Films — Determination of ISO Speed. 118. A. F. Sowinski and P. J. Wightman, J. Imag. Sci. 31, 162 (1987). 119. D. English, Photomethods, 16 (May 1988). 120. W. T. Hanson Jr. and C. A. Horton, J. Opt. Soc. Am. 42, 663 (1952).
121. J. D. Baloga, Proc. P.I.C.S. Conf. (Society for Imaging Science and Technology), Portland, Oregon, May 17–20, 1998, p. 299; J. D. Baloga and P. D. Knight, J. Soc. Photogr. Sci. Technol. Jpn. 62, 111 (1999). 122. R. J. Tuite, J. Appl. Photogr. Eng. 5, 200 (1979). 123. K. O. Ganguin, J. Photogr. Sci. 9, 172 (1961); D. C. Hubbell, R. G. McKinney, and L. E. West, Photogr. Sci. Eng. 11, 295 (1967). 124. Y. Seoka and K. Takahashi, Second Symp. Photogr. Conserv., Society of Scientific Photography, Japan, Tokyo, July 1986, pp. 13–16. 125. P. Egerton, J. Goddard, G. Hawkins, and T. Wear, Royal Photogr. Soc. Colour Imaging Symp., Cambridge, UK, Sept. 1986, p. 128. 126. K. Onodera, T. Nishijima, and M. Sasaki, in Proc. Int. Symp.: The Stability Conserv. Photogr. Images, Bangkok, Thailand, Nov. 1986. 127. E. Hoffmann and A. Bruylants, Bull. Soc. Chim. Belges 75, 91 (1966); K. Sano, J. Org. Chem. 34, 2,076 (1969). 128. R. L. Heidke, L. H. Feldman, and C. C. Bard, J. Imag. Technol. 11(3), 93 (1985). 129. U.S. Pat. 2,689,793, 1954, W. R. Weller and N. H. Groet (to Eastman Kodak Company).
130. J. C. Dainty and R. Shaw, Image Science, Academic Press, London, 1974; M. Kriss, in Ref. 5, Chap. 21, p. 596. 131. CHOICES — Choosing the Right Silver-Recovery Method for Your Needs, Kodak Publication J-21, Rochester, NY, 1989. 132. T. Cribbs, Sixth Int. Symp. Photofinishing Technology, The Society for Imaging Science and Technology, Springfield, VA, 1990, p. 53; Kodak Publication Z130, revision of July 1998, at www.kodak.com. 133. D. G. Foster and K. H. Stephen, in Ref. 132, p. 7. 134. F. G. Kari, S. Hilger, and S. Canonica, Environ. Sci. Technol. 29, 1,008 (1995); F. G. Kari and W. Giger, ibid. 29, 2,814 (1995). 135. U.S. Pat. 4,294,914, 1981, J. Fyson (to Eastman Kodak Company); U.S. Pat. 5,652,085 and 5,691,120, 1997, D. A. Wilson, D. K. Crump, and E. R. Brown (to Eastman Kodak Company); Eur. Pat. 532,003B1, 1998, Y. Ueda (to Konica Corporation). 136. S. Koboshi, M. Hagiwara, and Y. Ueda, IS&T Tenth Int. Symp. on Photofinishing Technol., Paper Summaries, p. 17, The Society for Imaging Science and Technology, Springfield, VA, 1998.
D

DIGITAL VIDEO TELEVISION

ARUN N. NETRAVALI
Bell Labs, Lucent Technologies
Murray Hill, NJ

Analog television was developed and standardized in the 1940s mainly for over-the-air broadcast of entertainment, news, and sports. A few upward compatible changes have been made in the intervening years, such as color, multichannel sound, closed captioning, and ghost cancellation, but the underlying analog system has survived a continuous technological evolution that has pervaded all other media. Television has stimulated the development of a global consumer electronics industry that has brought high-density magnetic recording, high-resolution displays, and low-cost imaging technologies from the laboratory into the living room. A vast array of video production and processing technologies make high-quality programming an everyday reality, real-time on-site video the norm rather than the exception, and video the historical medium of record throughout the world. More recently, the emergence of personal computers and high-speed networks has given rise to desktop video to improve productivity for businesses. In spite of this impressive record and a large invested base, we are on the threshold of a major disruption in the television industry. After fifty years of continuous refinement, the underlying technology of television is going to be entirely redone. Digital video is already proliferating in a variety of applications such as videoconferencing, multimedia computing, and program production; the impediments that have held it back are rapidly disappearing. The key enabling technologies are (1) mature and standardized algorithms for high-quality compression; (2) inexpensive and powerful integrated circuits for processing, storing, and reconstructing video signals; (3) inexpensive, high-capacity networks for transporting video; (4) uniform methods for storing, addressing, and accessing multimedia content; and (5) the evolution of computer architecture to support video I/O. The market drivers include (1) direct consumer access for content providers, (2) convergence of video with other information sources such as print, (3) the emergence of a fast-growing consumer market for personal computing, (4) the evolution of the Internet and other networks in the commercial domain, and (5) the removal of various regulatory barriers. This article deals with the technology of digital television. We first start with the way the television signal is sampled (scanning) and digitized. Then we discuss techniques of compression to reduce the bit rate to a manageable level and describe briefly the emerging standards for compression.

SCANNING

The image information captured by a television camera conveys color intensity (in terms of red, green, and blue primary colors) at each spatial location (x, y) and for each time instance t. Thus, the image intensity is multidimensional (x, y, t). However, it needs to be converted to a one-dimensional signal for processing, storage, communications, and display. Raster scanning is the process used to convert a three-dimensional (x, y, t) image intensity into a one-dimensional television waveform (1). The first step is to sample the television scene many times per second (1/T, where T is the frame period in seconds) to create a sequence of still images (called frames). Then, within each frame, scan lines are created by vertical sampling. Scanning proceeds sequentially, left to right for each scan line and from top to bottom, one line at a time, within a frame. In a television camera, an electron beam scans across a photosensitive target upon which the image is focused. In more modern cameras, charge-coupled devices (CCDs) are used to image an area of the picture, such as an entire scan line. At the other end of the television chain, using raster-scanned displays, an electron beam scans and lights up the picture elements in proportion to the light intensity. Although it is convenient to think that all of the samples of a single frame occur at one time (similar to the simultaneous exposure of a single frame for film), the scanning in a camera and in a display makes every sample correspond to a different time.
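Because every sample in a raster-scanned frame is captured at a slightly different instant, it can help to write the timing out explicitly. The short sketch below is illustrative only and is not from the article; the frame rate, line count, and blanking fraction used are assumptions chosen for the example.

# Illustrative only: map (frame, line, fraction across the active line) to a
# capture time for a progressively scanned raster. Parameter values are
# assumptions, not values mandated by the article.
FRAME_PERIOD = 1 / 29.97      # T, seconds per frame
LINES_PER_FRAME = 525
BLANKING_FRACTION = 0.17      # roughly 16-18% of each line period is retrace

def sample_time(frame_index, line_index, x_fraction):
    """Time at which the scanning spot reaches a given position (seconds)."""
    line_period = FRAME_PERIOD / LINES_PER_FRAME
    active = line_period * (1 - BLANKING_FRACTION)
    return frame_index * FRAME_PERIOD + line_index * line_period + x_fraction * active

# The first and last pels of one frame differ by almost a full frame period.
print(sample_time(0, 0, 0.0))
print(sample_time(0, LINES_PER_FRAME - 1, 1.0))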
Progressive and Interlaced Scan
There are two types of scanning: progressive (also called sequential) and interlaced. In progressive scanning, the television scene is first sampled in time to create frames, and within each frame, all of the raster lines are scanned in order from top to bottom. Therefore, all of the vertically adjacent scan lines are also temporally adjacent and are highly correlated even in the presence of rapid motion in the scene. Almost all computer displays, especially those of high-end computers, are sequentially scanned. In interlaced scanning (Fig. 1), all of the odd-numbered lines in the entire frame are scanned first during the first half of the frame period T, and then the even-numbered lines are scanned during the second half. This process produces two distinct images per frame at different times. The set of odd-numbered lines constitutes the odd field, and the even-numbered lines make up the even field. All current TV systems (National Television System Committee [NTSC], PAL, SECAM) use interlaced scanning. One of the principal benefits of interlaced scanning is reducing the scan rate (or the bandwidth) without significantly reducing image quality. This is done with a relatively high field rate (a lower field rate would cause flicker), while maintaining a high total number of scan lines in a frame (a lower number of lines per frame would reduce the resolution of static images). Interlace cleverly preserves the high-detail visual information and, at the same time, avoids visible large-area flicker at the display due to insufficient temporal postfiltering by the human eye.

Figure 1. A television frame is divided into an odd field (containing odd-numbered scan lines) and an even field (containing even-numbered scan lines).

The NTSC system has 525 lines/frame and 29.97 frames/s, and therefore 15,735 scan lines/s. For each scan line, a small period of time (16% to 18% of the total line time), called blanking or retrace, is allocated to return the scanning beam to the left edge of the next scan line. European systems (PAL and SECAM) have 625 lines/frame but 50 fields/s. The larger number of lines results in better vertical resolution, whereas a larger number of frames results in better motion rendition and lower flicker. There is no agreement worldwide yet, but high-definition TV (HDTV) will have approximately twice the horizontal and vertical resolution of standard television. In addition, HDTV will be digital; the television scan lines will also be sampled horizontally in time and digitized. Such sampling will produce an array of approximately 1,000 lines and as many as 2,000 pixels per line. If the height/width ratio of the TV raster is equal to the ratio of the number of scan lines to the number of samples per line, the array is referred to as having "square pixels," that is, the electron beam is spaced equally in the horizontal and vertical directions, or has a square shape. This facilitates digital image processing as well as computer synthesis of images.

One of the liveliest debates regarding the next generation of television systems involves the type of scanning to be employed: interlaced or progressive. Interlaced scanning was invented in the 1930s when signal processing techniques, hardware, and memory devices were all in a state of infancy. Because all current TV systems were standardized more than five decades ago, they use interlace, and therefore, the technology and the equipment (e.g., cameras) using interlace are mature. However, interlace often shows flickering artifacts in scenes with sharp detail and has poor motion rendition, particularly for fast vertical motion of small objects. In addition, digital data are more easily compressed in progressively scanned frames. Compatibility with film and computers also favors progressive scanning. In the future, because different stages of the television chain have different requirements, it is likely that creation (production studios), transmission, and display may employ different scanning methods. Production studios require high-quality cameras and compatibility with film and computer-generated material, all of very high quality. If good progressive cameras were available and inexpensive, this would favor progressive scanning at even higher scan rates (>1,000 lines/frame). However, transmission bandwidth, particularly for terrestrial transmission, is expensive and limited, and even with bandwidth compression, current technology can handle only up to 1,000 lines/frame. Display systems can show a better picture by progressive scanning and refreshing at higher frame rates (even if the transmission is interlaced and at lower frame rates), made possible by frame buffers. Thus, though there are strong arguments in favor of progressive scanning in the future, more progress is needed on the learning curve of progressive equipment. The Federal Communications Commission (FCC) in the United States therefore decided to support multiple scanning standards for terrestrial transmission, one interlaced and five progressive, but using a migration path toward the exclusive use of progressive scanning in the future.
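The line and field rates quoted above follow directly from the frame parameters; a small check, using only numbers given in the text, is sketched below.

# Check the scan-rate figures quoted in the text.
ntsc_lines, ntsc_frames = 525, 29.97
pal_lines, pal_fields = 625, 50

print(ntsc_lines * ntsc_frames)        # ~15,734 scan lines/s, the quoted 15,735 lines/s
print(ntsc_frames * 2)                 # ~59.94 fields/s with 2:1 interlace
print(pal_lines * pal_fields / 2)      # 15,625 lines/s for the 625-line, 50-field systems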
Image Aspect Ratio
The image aspect ratio is generally defined as the ratio of picture width to height. It impacts the overall appearance of the displayed image. For standard TV, the aspect ratio is 4 : 3. This value was adopted for TV because this format was already used and found acceptable in the film industry prior to 1953. However, since then, the film industry has migrated to wide-screen formats with aspect ratios of 1.85 or higher. Subjective tests on viewers show a significant preference for a wider format than that used for standard TV, so HDTV plans to use an aspect ratio of 1.78, which is quite close to that of the wide-screen film format.

Image Intensity
Light is a subset of electromagnetic energy. The visible spectrum ranges from 380 to 780 nm in wavelength. Thus, the visible light in a picture element (pel) can be specified completely by its wavelength distribution S(λ). This radiation excites three different receptors in the human retina that are sensitive to wavelengths near 445 (blue), 535 (green), and 570 (red) nm. Each type of receptor measures the energy in the incident light at wavelengths near its dominant wavelength. The three resulting energy values uniquely specify each visually distinct color C. This is the basis of the trichromatic theory of color, which states that for human perception, any color can be synthesized by an appropriate mixture of three properly chosen primary colors R, G, and B (2). For video, the primaries are usually red, green, and blue. The amounts of each primary required are called the tristimulus values. If a color C has tristimulus values Rc, Gc, and Bc, then C = RcR + GcG + BcB. The tristimulus values of a wavelength distribution S(λ) are given by

Rc = ∫ S(λ) r(λ) dλ,   Gc = ∫ S(λ) g(λ) dλ,   Bc = ∫ S(λ) b(λ) dλ,    (1)

where {r(λ), g(λ), b(λ)} are called the color matching functions for primaries R, G, and B. These are also the tristimulus values of unit intensity monochromatic light of wavelength λ. Figure 2 shows color matching functions for the primary colors chosen to be spectral (light of a single wavelength) colors of wavelengths 700.0, 546.1, and 435.8 nm. Equation (1) allows us to compute the tristimulus values of any color that has a given spectral distribution S(λ) by using color matching functions. One consequence of this is that any two colors whose spectral distributions are S1(λ) and S2(λ) match if and only if

R1 = ∫ S1(λ) r(λ) dλ = ∫ S2(λ) r(λ) dλ = R2,
G1 = ∫ S1(λ) g(λ) dλ = ∫ S2(λ) g(λ) dλ = G2,
B1 = ∫ S1(λ) b(λ) dλ = ∫ S2(λ) b(λ) dλ = B2,    (2)

where {R1, G1, B1} and {R2, G2, B2} are the tristimulus values of the two distributions S1(λ) and S2(λ), respectively. This could happen even if S1(λ) were not equal to S2(λ) for all wavelengths in the visible region.

Figure 2. The color-matching functions for the 2° standard observer, based on primaries of wavelengths 700 (red), 546.1 (green), and 435.8 nm (blue), have units such that equal quantities of the three primaries are needed to match the equal-energy white.

Instead of specifying a color by its tristimulus values {R, G, B}, normalized quantities called chromaticity coordinates {r, g, b} are often used:

r = R/(R + G + B),   g = G/(R + G + B),   b = B/(R + G + B).    (3)

Because r + g + b = 1, any two chromaticity coordinates are sufficient. However, for complete specification a third dimension is required. The luminance (Y) is usually chosen.

Luminance is an objective measure of brightness. Different contributions of wavelengths to the sensation of brightness are represented by the relative luminance efficiency y(λ). Then, the luminance of any given spectral distribution S(λ) is given by

Y = k ∫ S(λ) y(λ) dλ,    (4)

where k is a normalizing constant. For any given choice of primaries and their corresponding color matching functions, luminance can be written as a linear combination of the tristimulus values {R, G, B}. Thus, complete specification of a color is given by either the three tristimulus values or by the luminance and two chromaticities. Then, a color image can be specified by luminance and chromaticities at each pel.

COMPOSITE TV SYSTEMS

A camera that images a scene generates for each pel the three color tristimulus values RGB, which may be further processed for transmission or storage. At the receiver, the three components are sent to the display, which regenerates the contents of the scene at each pel from the three color components. For transmission or storage between the camera and the display, a luminance signal Y that represents brightness and two chrominance signals that represent color are used. The need for such a transmission system arose with NTSC, the standard used in North America and Japan, where compatibility with monochrome receivers required a black-and-white signal, which is now referred to as the Y signal. It is well known that the sensitivity of the human eye is highest to green light, followed by that of red, and the least to blue light. The NTSC system exploited this fact by assigning a lower bandwidth to the chrominance signals, compared to the luminance Y signal. This made it possible to save bandwidth without losing color quality. The PAL and SECAM systems also employ reduced chrominance bandwidths (3).

The NTSC System

The NTSC color space of YIQ can be generated from the gamma-corrected RGB components (written here as R'G'B') or from the YUV components as follows:

Y = 0.299R' + 0.587G' + 0.114B',
I = 0.596R' − 0.274G' − 0.322B' = −(sin 33°)U + (cos 33°)V,
Q = 0.211R' − 0.523G' + 0.311B' = (cos 33°)U + (sin 33°)V,    (5)

where U = (B' − Y)/2.03 and V = (R' − Y)/1.14. (Gamma correction is performed to compensate for the nonlinear relationship between signal voltage V and light intensity B, that is, B ≈ V^γ.)
The inverse operation, that is, generation of the gamma-corrected RGB components from the YIQ composite color space, can be accomplished as follows:

R' = 1.0Y + 0.956I + 0.621Q,
G' = 1.0Y − 0.271I − 0.649Q,
B' = 1.0Y − 1.106I + 1.703Q.    (6)
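Equations (5) and (6) amount to a 3 × 3 matrix and its approximate inverse. The minimal sketch below is illustrative, not part of the article; it uses the coefficients printed above and assumes gamma-corrected R'G'B' inputs scaled to the range 0 to 1.

# Sketch of the YIQ conversion of Eqs. (5) and (6); inputs are gamma-corrected
# R'G'B' values in the range 0-1, and the coefficients are those printed above.
import numpy as np

RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523,  0.311]])
YIQ_TO_RGB = np.array([[1.0,  0.956,  0.621],
                       [1.0, -0.271, -0.649],
                       [1.0, -1.106,  1.703]])

rgb = np.array([0.8, 0.4, 0.2])          # an arbitrary test color
yiq = RGB_TO_YIQ @ rgb
print(yiq)                                # luminance Y is the first component
print(YIQ_TO_RGB @ yiq)                   # round-trips to approximately [0.8, 0.4, 0.2]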
In NTSC, the Y, I, and Q signals are multiplexed into a 4.2-MHz bandwidth. Although the Y component itself takes the 4.2-MHz bandwidth, multiplexing all three components into the same 4.2 MHz becomes possible by interleaving luminance and chrominance frequencies without too much "crosstalk" between them. This is done by defining a color subcarrier at approximately 3.58 MHz. The two chrominance signals I and Q are quadrature amplitude modulation (QAM) modulated onto this carrier. The envelope of this QAM signal is approximately the saturation of the color, and the phase is approximately the hue. The luminance and modulated chrominance signals are then added to form the composite signal. The process of demodulation involves first comb filtering (horizontal and vertical filtering) of the composite signal to separate the luminance and the chrominance signals, followed by further demodulation to separate the I and Q components.

The Phase Alternate Line System
The YUV color space of PAL is employed in one form or another in all three color TV systems. The basic YUV color space can be generated from gamma-corrected RGB (referred to in equations as R'G'B') components as follows:

Y = 0.299R' + 0.587G' + 0.114B',
U = −0.147R' − 0.289G' + 0.436B' = 0.492(B' − Y),
V = 0.615R' − 0.515G' − 0.100B' = 0.877(R' − Y).    (7)

The inverse operation, that is, generation of gamma-corrected RGB from YUV components, is accomplished by the following:

R' = 1.0Y + 1.140V,
G' = 1.0Y − 0.394U − 0.580V,
B' = 1.0Y + 2.030U.    (8)
The Y, U, and V signals in PAL are multiplexed in a total bandwidth of either 5 or 5.5 MHz. With PAL, both U and V chrominance signals are transmitted in a bandwidth of 1.5 MHz. A color subcarrier is modulated with U and V via QAM, and the composite signal is limited to the allowed frequency band which ends up truncating part of the QAM signal. The color subcarrier for PAL is located at 4.43 MHz. PAL transmits the V chrominance component as +V and -V on alternate lines. The demodulation of the QAM chrominance signal is similar to that of NTSC. The recovery of the PAL chrominance signal at the receiver includes averaging successive demodulated scan lines to derive the U and V signals.
COMPONENT TELEVISION
In a component TV system, the luminance and chrominance signals are kept separate, such as on separate channels or multiplexed in different time slots. The use of a component system is intended to prevent the cross talk that causes cross-luminance and cross-chrominance artifacts in the composite systems. The component system is preferable in all video applications that are without the constraints of broadcasting, where composite TV standards were made before the advent of high-speed electronics. Although a number of component signals can be used, the CCIR-601 digital component video format is of particular significance. The color Y, Cr, Cb space of this format is obtained by scaling and offsetting the Y, U, V color space. The conversion from gamma-corrected R, G, B components represented as eight bits (0 to 255) to Y, Cr, Cb is specified as follows:

Y = 0.257R' + 0.504G' + 0.098B' + 16,
Cr = 0.439R' − 0.368G' − 0.071B' + 128,
Cb = −0.148R' − 0.291G' + 0.439B' + 128.    (9)

In these equations, Y is allowed to take values in the 16 to 235 range, whereas Cr and Cb can take values in the range from 16 to 240 centered at a value of 128, which indicates zero chrominance. The inverse operation generates gamma-corrected RGB from Y, Cr, Cb components by

R' = 1.164(Y − 16) + 1.596(Cr − 128),
G' = 1.164(Y − 16) − 0.813(Cr − 128) − 0.392(Cb − 128),
B' = 1.164(Y − 16) + 2.017(Cb − 128).    (10)
The sampling rates for the luminance component Y and the chrominance components are 13.5 MHz and 6.75 MHz, respectively. The number of active pels per line is 720; the number of active lines is 486 for the NTSC version (with 29.97 frames/s) and 576 for the PAL version (with 25 frames/s). At eight bits per sample, the bit rate of the uncompressed CCIR-601 signal is 216 Mbps.
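The 8-bit studio-range conversion of Eq. (9) and the 216-Mbps figure can both be checked with a few lines. The sketch below is illustrative and assumes R'G'B' inputs already in the 0 to 255 range.

# Eq. (9) for one pel, plus the CCIR-601 bit-rate bookkeeping quoted in the text.
def rgb_to_ycrcb(r, g, b):
    """Gamma-corrected 8-bit R'G'B' (0-255) to studio-range Y (16-235), Cr/Cb (16-240)."""
    y  =  0.257 * r + 0.504 * g + 0.098 * b + 16
    cr =  0.439 * r - 0.368 * g - 0.071 * b + 128
    cb = -0.148 * r - 0.291 * g + 0.439 * b + 128
    return round(y), round(cr), round(cb)

print(rgb_to_ycrcb(255, 255, 255))   # white -> (235, 128, 128)
print(rgb_to_ycrcb(0, 0, 0))         # black -> (16, 128, 128)

# 13.5 MHz luminance plus two 6.75 MHz chrominance streams at 8 bits per sample:
print((13.5e6 + 2 * 6.75e6) * 8 / 1e6)   # 216 Mbps, as stated above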
Digitizing Video
Video cameras create either analog or sampled analog signals. The first step in processing, storage, or communication is usually to digitize the signals. Analog-to-digital converters that have the required accuracy and speed for video signals have become inexpensive in recent years. Therefore, the cost and quality of digitization is less of an issue. However, digitization with good quality results in a bandwidth expansion, in the sense that transmitting or storing of these bits often takes up more bandwidth or storage space than the original analog signal. In spite of this, digitization is becoming universal because of the relative ease of handling the digital signal compared to analog. In particular, enhancement, removal of artifacts, transformation, compression, encryption, integration with
computers, and so forth are much easier to do in the digital domain using digital integrated circuits. One example of this is the conversion from one video standard to another (e.g., NTSC to PAL). Sophisticated adaptive algorithms required for good picture quality in standards conversion can be implemented only in the digital domain. Another example is the editing of digitized signals. Edits that require transformation (e.g., rotation, dilation of pictures, or time warp for audio) are significantly more difficult in the analog domain. Additionally, encrypting bits is a lot easier and safer than encrypting analog signals. Using digital storage, the quality of the retrieved signal does not degrade in an unpredictable manner after multiple reads as it often does in analog storage. Also, based on today’s database and user interface technology, a rich set of interactions is possible only with stored digital signals. Mapping the stored signal to displays with different resolutions in space (number of lines per screen and number of samples per line) and time (frame rates) can be done easily in the digital domain. A familiar example of this is the conversion of film, which is almost always at a different resolution and frame rate than the television signal. Digital signals are also consistent with the evolving network infrastructure. Digital transmission allows much better control of the quality of the transmitted signal. In broadcast television, for example, if the signal were digital, the reproduced picture in the home could be identical to the picture in the studio, unlike the present situation where the studio pictures look far better than pictures at home. Finally, analog systems dictate that the entire television chain from camera to display operates at a common clock with a standardized display. In the digital domain, considerable flexibility exists by which the transmitter and the receiver can negotiate the parameters for scanning, resolution, and so forth, and thus create the best picture consistent with the capability of each sensor and display. The process of digitization of video consists of prefiltering, sampling, quantization, and encoding (see Fig. 3).
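As a concrete illustration of the quantization and PCM encoding steps described in the subsections that follow, the sketch below (an illustration, not the article's method; the full-scale value is an arbitrary assumption) uniformly quantizes an analog amplitude to eight bits and prints its PCM codeword.

# Illustrative 8-bit uniform quantization and PCM encoding of an analog sample.
def pcm_encode(amplitude, full_scale=1.0, bits=8):
    """Map an amplitude in [0, full_scale) to an n-bit PCM codeword string."""
    levels = 2 ** bits
    index = min(levels - 1, max(0, int(amplitude / full_scale * levels)))
    return index, format(index, "0{}b".format(bits))

print(pcm_encode(0.2671))   # quantized level 68 -> codeword '01000100'
print(pcm_encode(0.999))    # clipped to the top level 255 -> '11111111'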
Figure 3. Conversion of component analog TV signals to digital TV signals.

Filtering

This step is also referred to as prefiltering because it is done prior to sampling. Prefiltering reduces the unwanted frequencies as well as noise in the signal. The simplest filtering operation involves simply averaging the image intensity within a small area around the point of interest and replacing the intensity of the original point by the computed average intensity. Prefiltering can sometimes be accomplished by controlling the size of the scanning spot in the imaging system. In dealing with video signals, the filtering applied to the luminance signal may be different from that applied to the chrominance signals due to the different bandwidths required.

Sampling

Next, the filtered signal is sampled at a chosen rate and location on the image raster. The minimum rate at which an analog signal must be sampled, called the Nyquist rate, corresponds to twice the highest frequency in the signal. For the NTSC system this rate is 2 × 4.2 = 8.4 MHz, and for PAL, this rate is 2 × 5 = 10 MHz. It is normal practice to sample at a rate higher than this for ease of signal recovery when using practical filters. The CCIR-601 signal employs 13.5 MHz for luminance and half that rate for the chrominance signals. This rate is an integral multiple of both the NTSC and PAL line rates but is not an integral multiple of either the NTSC or PAL color subcarrier frequency.

Quantization

The sampled signal is still in analog form and is quantized next. The quantizer assigns each pel, whose value is in a certain range, a fixed value representing that range. The process of quantization results in loss of information because many input pel values are mapped into a single output value. The difference between the value of the input pel and its quantized representation is the quantization error. The choice of the number of levels of quantization involves a trade-off between accuracy of representation and the resulting bit rate.

PCM Encoder

The last step in analog-to-digital conversion is encoding the quantized values. The simplest type of encoding is called pulse code modulation (PCM). Video pels are represented by eight-bit PCM codewords, that is, each pel is assigned one of the 2^8 = 256 possible values in the range of 0 to 255. For example, if the quantized pel amplitude is 68, the corresponding eight-bit PCM codeword is the bit sequence 01000100.

WHAT IS COMPRESSION?
Most video signals contain a substantial amount of “redundant” or superfluous information. For example, a television camera that captures 30 frames/s from a stationary scene produces very similar frames, one after the other. Compression removes the superfluous information so that a single frame can be represented by a smaller amount of finite data, or in the case of audio or time-varying images, by a lower data rate (4,5). Digitized audio and video signals contain a significant amount of statistical redundancy, that is, “adjacent” pels are similar to each other, so that one pel can be predicted fairly accurately from another. By removing the predictable component from a stream of pels, the data rate can be reduced. Such statistical redundancy can be removed without loss of any information. Thus, the original data can be recovered exactly by inverse operation, called decompression. Unfortunately, the techniques for accomplishing this efficiently require probabilistic characterization of the signal. Although many excellent probabilistic models of audio and video signals
have been proposed, serious limitations exist because of the nonstationarity of the statistics. In addition, video statistics may vary widely from application to application. A fast-moving football game shows smaller frame-to-frame correlation compared to a head-and-shoulders view of people using video telephones. Current practical compression schemes do result in a loss of information, and lossless schemes typically provide a much smaller compression ratio (2 : 1 to 4 : 1). The second type of superfluous data, called perceptual redundancy, is the information that the human visual system cannot see. If the primary receiver of the video signal is a human eye (rather than a machine as in some pattern recognition applications), then transmission or storage of the information that humans cannot perceive is wasteful. Unlike statistical redundancy, the removal of information based on the limitations of human perception is irreversible. The original data cannot be recovered following such a removal. Unfortunately, human perception is very complex, varies from person to person, and depends on the context and the application. Therefore, the art and science of compression still have many frontiers to conquer, even though substantial progress has been made in the last two decades.

ADVANTAGES OF COMPRESSION

Table 1. Bit Rates of Compressed Video Signals

The biggest advantage of compression is data rate reduction. Data rate reduction reduces transmission costs, and when a fixed transmission capacity is available, results in better quality of video presentation (4). As an example, a single 6-MHz analog cable TV channel can carry between four and ten digitized, compressed programs, thereby increasing the overall capacity (in terms of the number of programs carried) of an existing cable television plant. Alternatively, a single 6-MHz broadcast television channel can carry a digitized, compressed high-definition television (HDTV) signal to give significantly better audio and picture quality without additional bandwidth. Data rate reduction also has a significant impact on reducing the storage requirements for a multimedia database. A CD-ROM can carry a full-length feature movie compressed to about 4 Mbps. The latest optical disk technology, known as the digital versatile disk (DVD), which is the same size as the CD, can store 4.7 GB of data on a single layer. This is more than seven times the capacity of a CD. Furthermore, the potential storage capabilities of DVD are even greater because it is possible to accommodate two layers of data on each side of the DVD, resulting in 17 GB of data. The DVD can handle many hours of high-quality MPEG-2 video and Dolby AC-3 audio. Thus, compression reduces the storage requirement and also makes stored multimedia programs portable in inexpensive packages. In addition, the reduction of data rate allows transfer of video-rate data without choking various resources (e.g., the main bus) of either a personal computer or a workstation.

Another advantage of digital representation/compression is packet communication. Much of the data communication in the computer world is by self-addressed packets. Packetization of digitized audio-video and the reduction of packet rate due to compression are important in sharing a transmission channel with other signals as well as maintaining consistency with the telecom/computing infrastructure. The desire to share transmission and switching has created a new evolving standard, called asynchronous transfer mode (ATM), which uses packets of small size, called cells. Packetization delay, which could otherwise hinder interactive multimedia, becomes less of an issue when packets are small. High compression and large packets make interactive communication difficult, particularly for voice.
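The storage figures above translate directly into playing time. The quick calculation below uses only numbers quoted in the text (the 4.7-GB and 17-GB capacities and a movie compressed to about 4 Mbps) and is illustrative rather than a specification.

# Playing time implied by the capacities and bit rate quoted above.
def hours(capacity_gb, rate_mbps):
    bits = capacity_gb * 1e9 * 8          # decimal gigabytes, as disk capacities are quoted
    return bits / (rate_mbps * 1e6) / 3600

print(hours(4.7, 4))    # ~2.6 hours on a single layer at about 4 Mbps
print(hours(17, 4))     # ~9.4 hours using both layers of both sides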
COMPRESSION REQUIREMENTS
The algorithms used in a compression system depend on the available bandwidth or storage capacity, the features required by the application, and the affordability of the hardware required for implementing the compression algorithm (encoder as well as decoder) (4,5). Various issues arise in designing the compression system.

Quality

The quality of presentation that can be derived by decoding a compressed video signal is the most important consideration in choosing a compression algorithm. The goal is to provide acceptable quality for the class of multimedia signals that are typically used in a particular service. The three most important aspects of video quality are spatial, temporal, and amplitude resolution. Spatial resolution describes the clarity or lack of blurring in the displayed image, and temporal resolution describes the smoothness of motion. Amplitude resolution describes graininess or other artifacts that arise from coarse quantization.
Uncompressed Versus Compressed Bit Rates
The NTSC video has approximately 30 frames/s, 480 visible scan lines per frame, and 480 pels per scan line in three color components. If each color component is coded using eight bits (24 bits/pel total), the bit rate would be approximately 168 Mbps. Table I shows the raw uncompressed bit rates for film, several audio, and video formats. Robustness
As the redundancy from the video signal is removed by compression, each compressed bit becomes more important in the sense that it affects a large number of samples of the video signal. Therefore, an error in either transmission or storage of the compressed bit can have deleterious effects for either a large region of the picture or during an extended period of time. For noisy digital transmission channels, video compression algorithms that sacrifice efficiency to allow graceful degradation of the images in the presence of channel errors are better candidates. Some of these are created by merging source and channel coding to optimize the endto-end service quality. A good example of this is portable video over a wireless channel. Here, the requirements for compression efficiency are severe due to the lack of available bandwidth. Yet, a compression algorithm that is overly sensitive to channel errors would be an improper choice. Of course, error correction is usually added to an encoded signal along with a variety of error concealment techniques, which are usually successful in reducing the effects of random isolated errors. Thus, the proper choice of the compression algorithm depends on the transmission environment in which the application resides. Interactivity
Both consumer entertainment and business video applications are characterized by picture switching and browsing. In the home, viewers switch to the channels of their choice. In the business environment, people get to the information of their choice by random access using, for example, on-screen menus. In the television of the future, a much richer interaction based on content rather than channel switching may become possible. Many multimedia offerings and locally produced video programs often depend on the concatenation of video streams from a variety of sources, sometimes in real time. Commercials are routinely inserted into nationwide broadcasts by network affiliates and cable headends. Thus, the compression algorithm must support a continuous and seamless assembly of these streams for distribution and rapid switching of images at the point of final decoding. It is also desirable that simple edits as well as richer interactions occur on compressed data rather than reconstructed sequences. In general, a higher degree of interactivity requires a compression algorithm that operates on a smaller group of pels. MPEG, which operates on spatiotemporal groups of pels, is more difficult to interact with than JPEG, which operates only on spatial groups of pels. As an example, it is much easier to fast forward a compressed JPEG
bitstream than a compressed MPEG bitstream. This is one reason that current digital camcorders are based on motion JPEG. In a cable/broadcast environment or in an application requiring browsing through a compressed multimedia database, a viewer may change from program to program with no opportunity for the encoder to adapt itself. It is important that the buildup of resolution following a program change take place quite rapidly, so that the viewer can decide either to stay on the program or change to another, depending on the content. Compression
and
Packetization
Delay
Advances in compression have come predominantly through better analysis of the video signal arising from the application at hand. As models have progressed from pels to picture blocks to interframe regions, efficiency has grown rapidly. Correspondingly, the complexity of the analytic phase of encoding has also grown, resulting in an increase of encoding delay. A compression algorithm that looks at a large number of samples and performs very complex operations usually has a larger encoding delay. For many applications, such encoding delay at the source is tolerable, but for some it is not. Broadcast television, even in real time, can often admit a delay of the order of seconds. However, teleconferencing or multimedia groupware can tolerate a much smaller delay. In addition to the encoding delay, modern data communications introduce packetization delay. The more efficient the compression algorithm, the larger is the delay introduced by packetization, because the same size packet carries information about many more samples of the video signal. Symmetry
A cable, satellite, or broadcast environment has only a few transmitters that compress, but a large number of receivers that have to decompress. Similarly, video databases that store information usually compress it only once. However, different viewers may retrieve this information thousands of times. Therefore, the overall economics of many applications is dictated to a large extent by the cost of decompression. The choice of the compression algorithm ought to make the decompression extremely simple by transferring much of the cost to the transmitter, thereby creating an asymmetrical algorithm. The analytic phase of a compression algorithm, which routinely includes motion analysis (done only at the encoder), naturally makes the encoder more expensive. In a number of situations, the cost of the encoder is also important (e.g., camcorder, videotelephone). Therefore, a modular design of the encoder that can trade off performance with complexity, but that creates data decodable by a simple decompressor, may be the appropriate solution. Multiple
Encoding
In a number of instances, the original signal may have to be compressed in stages or may have to be compressed and decompressed several times. In most television studios, for example, it is necessary to store the compressed data and then decompress it for editing, as required. Such an edited
signal is then compressed and stored again. Any multiple coding-decoding cycle of the signal is bound to reduce the quality of the signal because artifacts are introduced every time the signal is coded. If the application requires such multiple codings, then higher quality compression is required, at least in the several initial stages.

Scalability

A compressed signal can be thought of as an alternative representation of the original uncompressed signal. From this alternative representation, it is desirable to create presentations at different resolutions (in space, time, amplitude, etc.) consistent within the limitations of the equipment used in a particular application. For example, if an HDTV signal compressed to 24 Mbps can be simply processed to produce a lower resolution and lower bit-rate signal (e.g., NTSC at 6 Mbps), the compression is generally considered scalable. Of course, the scalability can be achieved by brute force by decompressing, reducing the resolution, and compressing again. However, this sequence of operations introduces delay and complexity and results in a loss of quality. A common compressed representation from which a variety of low-resolution or higher resolution presentations can be easily derived is desirable. Such scalability of the compressed signal puts a constraint on the compression efficiency in the sense that algorithms that have the highest compression efficiency usually are not very scalable.

BASIC COMPRESSION TECHNIQUES
A number of compression techniques have been developed for coding video signals (1). A compression system typically consists of a combination of these techniques to satisfy the type of requirements that we listed in the previous section. The first step in compression usually consists of decorrelation, that is, reducing the spatial or temporal redundancy in the signal (4,5). The candidates for doing this are

1. Making a prediction of the next sample of the picture signal using some of the past samples and subtracting it from that sample. This converts the original signal into its unpredictable part (usually called the prediction error).
2. Taking a transform of a block of samples of the picture signal so that the energy would be compacted in only a few transform coefficients.

The second step is selection and quantization to reduce the number of possible signal values. Here, the prediction error may be quantized a sample at a time, or a vector of prediction errors of many samples may be quantized all at once. Alternatively, for transform coding, only important coefficients may be selected and quantized. The final step is entropy coding, which recognizes that different values of the quantized signal occur with different frequencies and, therefore, representing them with unequal length binary codes reduces the average bit rate. We give here more details of the following techniques because they have formed the basis of most compression systems:
Predictive coding (DPCM)
Transform coding
Motion compensation
Vector quantization
Subband/wavelet coding
Entropy coding
Incorporation of perceptual factors

Predictive Coding (DPCM)
In predictive coding, the strong correlation between adjacent pels (spatially as well as temporally) is exploited (4). As shown in Fig. 4, an approximate prediction of the sample to be encoded is made from previously coded information that has already been transmitted. The error (or differential signal) resulting from subtracting the prediction from the actual value of the pel is quantized into a set of discrete amplitude levels. These levels are then represented as binary words of fixed or variable lengths and are sent to the channel for transmission. The predictions may use the correlation in the same scanning line or adjacent scanning lines or previous fields. A particularly important method of prediction is motion-compensated prediction. If a television scene contains moving objects and a frame-to-frame translation of each moving object is estimated, then more efficient prediction can be performed using elements in the previous frame that are appropriately spatially displaced. Such prediction is called motion-compensated prediction. The translation is usually estimated by matching a block of pels in the current frame to a block of pels in the previous frames at various displaced locations. Various criteria for matching and algorithms to search for the best match have been developed. Typically, such motion estimation is done only at the transmitter, and the resulting motion vectors are used in the encoding process and are also separately transmitted for use in the decompression process.
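To make the predictive loop concrete, here is a minimal sketch, not the article's codec, of one-dimensional DPCM along a scan line using a previous-pel predictor and a uniform quantizer; the step size and the initial prediction of 128 are arbitrary choices made for the example.

# Hypothetical illustration of one-dimensional DPCM along a scan line: predict
# each pel from the previously reconstructed pel, quantize the prediction error,
# and reconstruct exactly as the decoder would.
def dpcm_encode_line(pels, step=8):
    """Return quantized prediction errors for one scan line (previous-pel predictor)."""
    prediction = 128            # assume mid-gray as the initial prediction
    codes = []
    for pel in pels:
        error = pel - prediction
        q = round(error / step)                                # uniform quantizer
        codes.append(q)
        prediction = min(255, max(0, prediction + q * step))   # track the decoder's reconstruction
    return codes

def dpcm_decode_line(codes, step=8):
    prediction = 128
    out = []
    for q in codes:
        prediction = min(255, max(0, prediction + q * step))
        out.append(prediction)
    return out

line = [100, 102, 101, 105, 150, 152, 151, 149]
codes = dpcm_encode_line(line)
print(codes)                    # small values except at the sharp transition
print(dpcm_decode_line(codes))  # close to the original, within one quantizer step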
Figure 4. Block diagram of a predictive encoder and decoder.

Transform Coding

Figure 5. Block diagram of a transform coder.

In transform coding (Fig. 5), a block of pels is transformed by a transform T into another domain called the transform domain, and some of the resulting coefficients are quantized and coded for transmission. The blocks may contain pels from one, two, or three dimensions. The most common technique is to use a block of two dimensions. Using one dimension does not exploit vertical correlation, and using three dimensions requires several frame stores. It has been generally agreed that the discrete cosine transform (DCT) is best matched to the statistics of the picture signal, and moreover, because it has a fast implementation, it has become the transform of choice. The advantage of transform coding (4) comes mainly from two mechanisms. First, not all of the transform coefficients need to be transmitted to maintain good image quality, and second, the coefficients that are selected need not be represented with full accuracy. Loosely speaking, transform coding is preferable to predictive coding for lower compression rates and where cost and complexity are not extremely serious issues. Most modern compression systems have used a combination of predictive and transform coding. In fact, motion-compensated prediction is performed first to remove the temporal redundancy, and then the resulting prediction error is compressed by two-dimensional transform coding using the discrete cosine transform as the dominant choice.
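The energy-compaction property that motivates transform coding is easy to demonstrate. The sketch below is illustrative and not tied to any particular standard; it builds the orthonormal 8 × 8 DCT-II matrix and applies it to a smooth block.

# A minimal sketch of the two-dimensional DCT used in block transform coding:
# most of the energy of a smooth 8 x 8 block ends up in a few low-frequency
# coefficients, which is what makes coefficient selection work.
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix C, so that coefficients = C @ block @ C.T."""
    k = np.arange(n)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix(8)
block = np.tile(np.linspace(100, 140, 8), (8, 1))   # a smooth horizontal ramp
coeff = C @ block @ C.T                              # forward 2-D DCT
print(np.sum(np.abs(coeff) > 1.0))                   # only a handful of significant coefficients
reconstructed = C.T @ np.round(coeff) @ C            # inverse DCT after coarse rounding
print(np.max(np.abs(reconstructed - block)))         # small reconstruction error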
simple
because
it does a simple
Coding
Subband coding, more recently generalized using the theory of wavelets, is a promising technique for video and it has already been shown, outperforms still image coding techniques based on block transforms such as in JPEG. Although subband techniques have been incorporated into audio coding standards, the only image standard based on wavelets currently is the FBI standard for fingerprint compression. There are several compelling reasons to investigate subband/wavelet coding for image and video compression. One reason is that unlike the DCT, the wavelet framework does not transform each block of data separately. This results in graceful degradation, as the bit rate is lowered, without the traditional “tiling effect” that is characteristic of block-based approaches. Wavelet coding also allows one to work in a multiresolution framework which is a natural choice for progressive transmission or applications where scalability is desirable. One of the current weaknesses in deploying wavelet schemes for video compression is that a major component for efficient video compression is block-based motion estimation which makes block-based DCT a natural candidate for encoding spatial information. Entropy
Coding
If the quantized output values of either a predictive or a transform coder are not all equally likely, then the average bit rate can be reduced by giving each one of the values a different word length. In particular, those values that occur more frequently are represented by a smaller length code word (4,5). If a code of variable length is used and the resulting code words are concatenated to form a stream of bits, then correct decoding by a receiver requires that
Quantization
In predictive coding described in the previous section, each pixel is quantized separately using a scalar quantizer. The concept of scalar quantization can be generalized to vector quantization (5) in which a group of pixels are quantized at the same time by representing them as a code vector. Such vector quantization can be applied to a vector of prediction errors, original pels, or transform coefficients. As in Fig. 6, a group of nine pixels from a 3 x 3 block is represented as one of the K vectors from a codebook of vectors. The problem of vector quantization is then to design the codebook and an algorithm to determine the vector from the codebook that offers the best match to the input data. The design of a codebook usually requires a set of training pictures and can grow to a large size for a large block of pixels. Thus, for an 8 x 8 block compressed to two bits per pel, one would need a 2128 size codebook. Matching the original image with each vector of such a large codebook requires a lot of ingenuity. However, such matching is done only at the transmitter, and the
Figure
6.
Block diagram of vector quantlzation.
DIGITAL
every combination of concatenated code words be uniquely decipherable. A variable word length code that achieves this and at the same time gives the minimum average bit rate is called the Huffman code. Variable word length codes are more sensitive to the effect of transmission errors because synchronization would be lost in the event of an error. This can result in decoding several code words incorrectly. A strategy is required to limit the propagation of errors when Huffman codes are used. Incorporation
of Perceptual
Factors
Perception-based coding attempts to match the coding algorithm to the characteristics of human vision. We know, for example, that the accuracy with which the human eye can see coding artifacts depends on a variety of factors such as the spatial and temporal frequency, masking due to the presence of spatial or temporal detail, and so on. A measure of the ability to perceive the coding artifact can be calculated on the basis of the picture signal. This is used, for example, in transform coding to determine the precision needed for quantization of each coefficient. Perceptual factors control the information that is discarded, on the basis of its visibility to the human eye. Therefore, it can be incorporated in any of the previously stated basic compression schemes. Comparison
of Techniques
Figure 7 represents an approximate comparison of different techniques using compression efficiency versus complexity as a criterion under the condition that the picture quality is held constant at an eight-bit PCM level. The complexity allocated to each codec is an approximate estimate relative to the cost of a PCM codec which is given a value of 5. Furthermore, it is the complexity only of the decoder portion of the codec, because that is the most important cost element of digital television. Also, most of the proposed systems are a combination of several different techniques of Fig. 7, making such comparisons difficult. As we remarked before, the real challenge is to combine the different techniques to engineer a cost-effective solution for a given service. The next section describes one example of such a codec.
Figure 7. Bits/pel versus complexity ofvideo decoding for several video compression algorithms.
A COMPRESSION
VIDEO
155
SCHEME
In this section, we describe a compression scheme that combines the previous basic techniques to satisfy the requirements that follow. Three basic types of redundancy are exploited in the video compression process. Motion compensation removes two-dimensional DCT removes temporal redundancy, spatial redundancy, and perceptual weighting removes amplitude irrelevancy by putting quantization noise in less visible areas. Temporal processing occurs in two stages. The motion of objects from frame-to-frame is estimated by using hierarchical block matching. Using the motion vectors, a displaced frame difference (DFD) is computed which generally contains a small fraction of the information in the original frame. The DFD is transformed by using DCT to remove the spatial redundancy. Each new frame of DFD is analyzed prior to coding to determine its rate versus perceptual distortion characteristics and the dynamic range of each coefficient (forward analysis). The transform coefficients are quantized performed based on the perceptual importance of each coefficient, the precomputed dynamic range of the coefficients, and the rate versus distortion characteristics. The perceptual criterion uses a model of the human visual system to determine a human observer’s sensitivity to color, brightness, spatial frequency, and spatiotemporal masking. This information is used to minimize the perception of coding artifacts throughout the picture. Parameters of the coder are optimized to handle the scene changes that occur frequently in entertainment/sports events, and channel changes made by the viewer. The motion vectors, compressed transform coefficients, and other coding overhead bits are packed into a format that is highly immune to transmission errors. The encoder is shown in Fig. 8a. Each frame is analyzed before being processed in the encoder loop. The motion vectors and control parameters resulting from the forward analysis are input to the encoder loop which outputs the compressed prediction error to the channel buffer. The encoder loop control parameters are weighed by the buffer state which is fed back from the channel buffer. In the predictive encoding loop, the generally sparse differences between the new image data and the motioncompensated predicted image data are encoded using adaptive DCT coding. The parameters of the encoding are controlled in part by forward analysis. The data output from the encoder consists of some global parameters of the video frame computed by the forward analyzer and transform coefficients that have been selected and quantized according to a perceptual criterion. Each frame is composed of a luminance frame and two chrominance difference frames which are half the resolution of the luminance frame horizontally. The compression algorithm produces a chrominance bit rate which is generally a small fraction of the total bit rate, without perceptible chrominance distortion. The output buffer has an output rate from 2 to 7 Mbps and a varying input rate that depends on the image content. The buffer history is used to control the
156
DIGITAL
VIDEO
Figure
8. Block
diagram
parameters of the coding algorithm, so that the average input rate equals the average output rate. The feedback mechanism involves adjusting the allowable distortion level because increasing the distortion level (for a given image or image sequence) causes the encoder to produce a lower output bit rate. The encoded video is packed into a special format before transmission which maximizes immunity to transmission errors by masking the loss of data in the decoder. The duration and extent of picture degradation due to any one error or group of errors is limited. The decoder is shown in Fig. 8b. The compressed video data enter the buffer which is complementary to the compressed video buffer at the encoder. The decoding loop uses the motion vectors, transform coefficient data, and other side information to reconstruct the NTSC images. Channel changes and severe transmission errors detected in the decoder initiate a fast picture recovery process. Less severe transmission errors are handled gracefully by several algorithms, depending on the type of error. Processing and memory in the decoder are minimized. Processing consists of one inverse spatial transform and a variable length decoder which are realizable in a few very large scale integration (VLSI) chips. Memory in the decoder consists of one full frame and a few compressed frames. COMPLEXITY/COST
COMPLEXITY/COST

Because cost is directly linked to complexity, this aspect of a compression algorithm is the most critical one for the asymmetrical situations described previously; the decoder cost is the most critical of all. Figure 8 represents an approximate
trade-off between compression efficiency and complexity under the condition that picture quality is held constant at an eight-bit PCM level. The compression efficiency is in terms of compressed bits per Nyquist sample; therefore, pictures that have different resolution and bandwidth can be compared simply by proper multiplication to get the relevant bit rates. The complexity allocated to each codec should not be taken too literally. Rather, it is an approximate estimate relative to the cost of a PCM codec, which is given a value of 5. The relation of cost to complexity is controlled by an evolving technology; codecs of high complexity are quickly becoming inexpensive through the use of application-specific video DSPs and submicron device technology. In fact, very soon, fast microprocessors will be able to decompress the video signal entirely in software. It is clear that in the near future, a standard-resolution TV signal (roughly 500 lines by 500 pels) will be decoded entirely in software, even for the MPEG compression algorithm. Figure 9 shows the computational requirements of video encoding and decoding at various image resolutions.

VIDEOPHONE AND COMPACT DISK STANDARDS - H.320 AND MPEG-1
Digital compression standards (DCS) for videoconferencing were developed in the 1980s by the CCITT, which is now known as the ITU-T. Specifically, the ISDN video-conferencing standards are known collectively as H.320, or sometimes P*64 to indicate that they operate at multiples of 64 kbits/s. The video coding portion of the standard, called H.261, codes pictures at the common intermediate format (CIF) of 352 pels by 288 lines.
Figure 9. Computational requirements in millions of instructions per second (mips) for video encoding and decoding at different image resolutions.
A lower resolution of 176 pels by 144 lines, called QCIF, is available for interoperating with PSTN videophones. The H.263 standard is built upon the H.261 framework but modified to optimize video quality at rates lower than 64 kb/s. H.263+ is focused on adding features to H.263, such as scalability and robustness to packet loss on packet networks such as the Internet.

In the late 1980s, a need arose to place motion video and its associated audio onto first-generation CD-ROMs at 1.4 Mbps. For this purpose, in the late 1980s and early 1990s the ISO MPEG committee developed digital compression standards for both video and two-channel stereo audio. The standard is known colloquially as MPEG-1 and officially as ISO 11172. The bit rate of 1.4 Mbps available on first-generation CD-ROMs is not high enough to allow full-resolution TV. Thus, MPEG-1 was optimized for the reduced CIF resolution of H.320 video-conferencing. It was designed to handle only progressive formats; MPEG-2 later incorporated both progressive and interlaced formats effectively.

THE DIGITAL ENTERTAINMENT TV STANDARD - MPEG-2
Following MPEG-1, the need arose to compress entertainment TV for such transmission media as satellite, cassette tape, over-the-air, and CATV (5). Thus, to have digital compression methods available for full-resolution standard definition TV (SDTV) pictures, such as shown in Fig. 4a, or high definition TV (HDTV) pictures, such as shown in Fig. 4b, the ISO (International Standards Organization) developed a second standard known colloquially as MPEG-2 and officially as ISO 13818. Because the resolution of entertainment TV is approximately four times that of videophone, the bit rate chosen for optimizing MPEG-2 was 4 Mbps.

SUMMARY

A brief survey of digital television has been presented in this article. Digitizing television and compressing it to a manageable bit rate creates significant advantages and major disruption in existing television systems. The future is bright for a variety of systems based on digital television technology.

BIBLIOGRAPHY
1. P. Mertz and F. Gray, Bell Syst. Tech. J. 13, 464-515 (1934).
2. W. T. Wintringham, Proc. IRE 39(10) (1951).
3. K. B. Benson, ed., Television Engineering Handbook, McGraw-Hill, NY, 1986.
4. A. Netravali and B. G. Haskell, Digital Pictures, Plenum, NY, 1988.
5. B. G. Haskell, A. Puri, and A. N. Netravali, Digital Video: An Introduction to MPEG-2, Chapman & Hall, London, 1996.
DIGITAL WATERMARKING

R. CHANDRAMOULI
Stevens Institute of Technology, Hoboken, NJ

NASIR MEMON
Polytechnic University, Brooklyn, NY

MAJID RABBANI
Eastman Kodak Company, Rochester, NY
INTRODUCTION

The advent of the Internet has resulted in many new opportunities for creating and delivering content in digital form. Applications include electronic advertising, real-time video and audio delivery, digital repositories and libraries, and Web publishing. An important issue that arises in these applications is protection of the rights of content owners. It has been recognized for quite some time that current copyright laws are inadequate for dealing with digital data. This has led to an interest in developing new copy deterrence and protective mechanisms. One approach that has been attracting increasing interest is based on digital watermarking techniques. Digital watermarking is the process of embedding information into digital multimedia content such that the information (which we call the watermark) can later be extracted or detected for a variety of purposes, including copy prevention and control. Digital watermarking has become an active and important area of research, and development and commercialization of watermarking techniques is deemed essential to help address some of the challenges faced by the rapid proliferation of digital content. In the rest of this article, we assume that the content being watermarked is a still image, though most digital watermarking techniques are, in principle, equally applicable to audio and video data.

A digital watermark can be visible or invisible. A visible watermark typically consists of a conspicuously visible message or a company logo indicating the ownership of the image, as shown in Fig. 1. On the other hand, an invisibly watermarked image appears very similar to the original. The existence of an invisible watermark can be determined only by using an appropriate watermark extraction or detection algorithm. In this article, we restrict our attention to invisible watermarks.

An invisible watermarking technique generally consists of an encoding process and a decoding process. A generic watermark encoding process is shown in Fig. 2. The watermark insertion step is represented as

X' = E_K(X, W),    (1)
where X is the original image, X’ is the watermarked image, W is the watermark information being embedded, K is the user’s insertion key, and E represents the watermark insertion function. Depending on the way the watermark is inserted and depending on the nature of
Figure 1. An image that has a visible watermark.
Figure 2. Watermark encoding process.
the watermarking algorithm, the detection or extraction method can take on very distinct approaches. One major difference between watermarking techniques is whether the watermark detection or extraction step requires the original image. Watermarking techniques that do not require the original image during the extraction process are called oblivious (or public or blind) watermarking techniques. For oblivious watermarking techniques, watermark extraction is represented as

Ŵ = D_K'(X̂),    (2)
where X̂ is a possibly corrupted watermarked image, K' is the extraction key, D represents the watermark extraction/detection function, and Ŵ is the extracted watermark information (see Fig. 3). Oblivious schemes are attractive for many applications where it is not feasible to require the original image to decode a watermark.

Invisible watermarking schemes can also be classified as either robust or fragile. Robust watermarks are often used to prove ownership claims and so are generally designed to withstand common image processing tasks
Figure 3. Watermark decoding process.
such as compression, cropping, scaling, filtering, contrast enhancement, and printing/scanning, in addition to malicious attacks aimed at removing or forging the watermark. In contrast, fragile watermarks are designed to detect and localize small changes in the image data.
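As a concrete (and deliberately simplified) illustration of Eqs. (1) and (2), the sketch below embeds a key-seeded pseudo-random pattern additively and detects it blindly by correlation. The embedding strength alpha, the plain correlation score, and the detection threshold are assumptions made for illustration only; they do not correspond to a specific published algorithm.

```python
import numpy as np

def embed(X, w_bit, key, alpha=2.0):
    # Eq. (1): X' = E_K(X, W).  A key-seeded +/-1 pattern is added to
    # (or subtracted from) the image to encode a single watermark bit.
    rng = np.random.default_rng(key)
    P = rng.choice([-1.0, 1.0], size=X.shape)
    return np.clip(X + (1.0 if w_bit else -1.0) * alpha * P, 0, 255)

def detect(X_hat, key, threshold=1.0):
    # Eq. (2): W_hat = D_K'(X_hat).  Blind detection: correlate the
    # (possibly corrupted) image with the same key-seeded pattern.
    rng = np.random.default_rng(key)
    P = rng.choice([-1.0, 1.0], size=X_hat.shape)
    score = float(np.mean((X_hat - X_hat.mean()) * P))
    return score > threshold, score

# Toy usage on a random "image", with mild additive noise as the attack.
rng = np.random.default_rng(123)
X = rng.integers(0, 256, (256, 256)).astype(float)
X_attacked = embed(X, w_bit=True, key=42) + rng.normal(0, 2, X.shape)
print(detect(X_attacked, key=42))    # (True, score near alpha)
print(detect(X_attacked, key=99))    # (False, score near zero)
```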
Applications

Digital watermarks are potentially useful in many applications, including the following.
Ownership Assertion. To assert ownership of an image, Alice can generate a watermarking signal using a secret private key and embed it in the original image. She can then make the watermarked image publicly available. Later, when Bob contests the ownership of an image derived from this public image, Alice can produce the unmarked original image and also demonstrate the presence of her watermark in Bob's image. Because Alice's original image is unavailable to Bob, he cannot do the same. For such a scheme to work, the watermark has to survive image processing operations aimed at malicious removal. In addition, the watermark should be inserted so that it cannot be forged, because Alice would not want to be held accountable for an image that she does not own.

Fingerprinting. In applications where multimedia content is electronically distributed across a network, the content owner would like to discourage unauthorized duplication and distribution by embedding a distinct watermark (or a fingerprint) in each copy of the data. If unauthorized copies of the data are found at a later time, then the origin of the copy can be determined by retrieving the fingerprint. In this application, the watermark needs to be invisible and must also be invulnerable to deliberate attempts to forge, remove, or invalidate it. The watermark should also be resistant to collusion; that is, a group of users who have the same image containing different fingerprints should not be able to collude to invalidate any fingerprint or to create a copy without any fingerprint. Another example is in digital cinema, where information can be embedded as a watermark in every frame or sequence of frames to help investigators locate the scene of the piracy more quickly and point out security weaknesses in the movie's distribution. The information could include data such as the name of the theater and the date and time of the screening. The technology would be most useful in
fighting a form of piracy that is surprisingly common, for example, when someone uses a camcorder to record the movie as it is shown in a theater and then duplicates it onto optical disks or VHS tapes for distribution.
Copy Prevention or Control. Watermarks can also be used for copy prevention and control. For example, in a closed system where the multimedia content needs special hardware for copying and/or viewing, a digital watermark can be inserted indicating the number of copies that are permitted. Every time a copy is made, the watermark can be modified by the hardware, and after a certain number of copies, the hardware would not create further copies of the data. An example of such a system is the digital versatile disk (DVD). In fact, a copy protection mechanism that includes digital watermarking at its core is currently being considered for standardization, and second-generation DVD players may well include the ability to read watermarks and act based on their presence or absence (1).

Fraud and Tampering Detection. When multimedia content is used for legal purposes, news reporting, medical applications, and commercial transactions, it is important to ensure that the content originated from a specific source and that it was not changed, manipulated, or falsified. This can be achieved by embedding a watermark in the data. Subsequently, when the photo is checked, the watermark is extracted using a unique key associated with the source, and the integrity of the data is verified through the integrity of the extracted watermark. The watermark can also include information from the original image that can aid in undoing any modification and recovering the original. Clearly, a watermark used for authentication should not affect the quality of an image and should be resistant to forgeries. Robustness is not critical because removing the watermark renders the content inauthentic and hence valueless.

ID Card Security. Information in a passport or ID (e.g., passport number or person's name) can also be included in the person's photo that appears on the ID. The ID card can be verified by extracting the embedded information and comparing it to the written text. The inclusion of the watermark provides an additional level of security in this application. For example, if the ID card is stolen and the picture is replaced by a forged copy, failure in extracting the watermark will invalidate the ID card.

These are a few examples of applications where digital watermarks could be of use. In addition, there are many other applications in digital rights management (DRM) and protection that can benefit from watermarking technology. Examples include tracking use of content, binding content to specific players, automatic billing for viewing content, and broadcast monitoring. From the variety of potential applications exemplified, it is clear that a digital watermarking technique needs to satisfy a number of requirements. The specific requirements vary with the application, so watermarking techniques need to be designed within the context of the entire system in which they are to be employed. Each application
imposes different requirements and requires different types of invisible or visible watermarking schemes or a combination thereof. In the remaining sections of this article, we describe some general principles and techniques for invisible watermarking. Our aim is to give the reader a better understanding of the basic principles, inherent trade-offs, strengths, and weaknesses of digital watermarking. We will focus on image watermarking in our discussions and examples. However, as we mentioned earlier, the concepts involved are general and can be applied to other forms of content such as video and audio.

Relationship to Information Hiding and Steganography
In addition to digital watermarking, the general idea of hiding some information in digital content has a wider class of applications that go beyond copyright protection and authentication. The techniques involved in such applications are collectively referred to as information hiding. For example, an image printed on a document could be annotated by information that could lead a user to its high resolution version as shown in Fig. 4. Metadata provide additional information about an image. Although metadata can also be stored in the file header of a digital image, this approach has many limitations. Usually, when a file is transformed to another format (e.g., from TIFF to JPEG or to bmp), the metadata are lost. Similarly, cropping or any other form of image manipulation typically destroys the metadata. Finally, the metadata can be attached only to an image as long as the image exists in digital form and is lost once the image is printed. Information hiding allows the metadata to travel with the image regardless of the file format and image state (digital or analog). Metadata information embedded in an image can serve many purposes. For example, a business can embed the website URL for a specific product in a picture that shows an advertisement of that product. The user holds the magazine photo in front of a low-cost CMOS camera that is integrated into a personal computer, cell phone, or Palm Pilot. The data are extracted from the low-quality picture and are used to take the browser to the designated website. Another example is embedding GPS data (about 56 bits) about the capture location of a picture. The key difference between this application and many other watermarking applications is the absence of an active adversary. In watermarking applications, such as copyright protection and authentication, there is an active adversary that would attempt to remove,
Figure 4. Metadata tagging using information hiding.
invalidate, or forge watermarks. In information hiding, there is no such active adversary because there is no value in removing the information hidden in the content. Nevertheless, information hiding techniques need to be robust to accidental distortions. For example, in the application shown in Fig. 4, the information embedded in the document image needs to be extracted despite distortions from the print and scan process. However, these distortions are just a part of a process and are not caused by an active adversary.

Another topic that is related to watermarking is steganography (meaning covered writing in Greek), which is the science and art of secret communication. Although steganography has been studied as part of cryptography for many decades, the focus of steganography is secret communication. In fact, the modern formulation of the problem goes by the name of the prisoner's problem. Here, Alice and Bob are trying to hatch an escape plan while in prison. The problem is that all communication between them is examined by a warden, Wendy, who will place both of them in solitary confinement at the first hint of any suspicious communication. Hence, Alice and Bob must trade seemingly inconspicuous messages that actually contain hidden messages involving the escape plan. There are two versions of the problem that are usually discussed: one where the warden is passive and only observes messages, and the other where the warden is active and modifies messages in a limited manner to guard against hidden messages. Clearly, the most important issue here is that the very presence of a hidden message must be concealed, whereas in digital watermarking, it is not always necessary that a good watermarking technique also be steganographic.
Watermarking Issues

The following important issues arise in studying digital watermarking techniques:

• Capacity: What is the optimum amount of data that can be embedded in a given signal? What is the optimum way to embed and then later extract this information?
• Robustness: How do we embed and retrieve data so that it survives malicious or accidental attempts at removal?
• Transparency: How do we embed data so that it does not perceptually degrade the underlying content?
Security: How do we determine that the information embedded has not been tampered with forged, or even removed?
l
These questions have been the focus of intense study in the past few years, and some remarkable progress has already been made. However, there are still more questions than answers in this rapidly evolving research area. Perhaps a key reason for it is that digital watermarking is inherently a multidisciplinary topic that builds on developments in diverse subjects. The areas that contribute to the development of digital watermarking include at least the following: Information and communication Decision and detection theory 0 Signal processing l Cryptography and cryptographic
l
theory
l
protocols
Each of these areas deals with a particular aspect of the digital watermarking problem. Generally speaking, information and communication theoretic methods deal with the data embedding (encoder) side of the problem. For example, information theoretic methods are useful in computing the amount of data that can be embedded in a given signal subject to various constraints such as the peak power (square of the amplitude) of the embedded data or the embedding-induced distortion. The host signal can be treated as a communication channel, and various operations such as compression/decompression, and filtering can be treated as noise. Using this framework, many results from classical information theory can be successfully applied to compute the data-embedding capacity of a signal. Decision theory is used to analyze data-embedding procedures from the receiver (decoder) side. Given a dataembedding procedure, how do we extract the hidden data from the host signal which may have been subjected to intentional or unintentional attacks? The data extraction procedure must guarantee a certain amount of reliability. What are the chances that the extracted data are indeed the original embedded data? Even if the dataembedding algorithm is not intelligent or sophisticated, a good data extraction algorithm can offset this effect. In watermarking applications where the embedded data is used for copyright protection, decision theory is used to detect the presence of embedded data. In applications such as media bridging, detection theoretic methods are needed to extract the embedded information. Therefore, decision theory plays a very important role in digital watermarking for data extraction and detection. In fact, it is shown that when ustng invisible watermarks for resolving rtghtful ownership, uniqueness problems arise due to the data detection process, irrespective of the data-embedding process. Therefore, there is a real and immediate need to develop reliable, eficient, and robust detectors for digital watermarking applications. A variety of signal processing tools and algorithms can be applied to digital watermarking. Such algorithms are based on aspects of the human visual system, properties of signal transforms [e.g., Fourier and discrete cosine
WATERMARKING
161
transform (DCT)] , noise characteristics, properties of various signal processing attacks, etc. Depending on the nature of the application and the context, these methods can be implemented at the encoder, at the decoder, or both. The user has the flexibility to mix and match different techniques, depending on the algorithmic and computational constraints. Although issues such as visual quality, robustness, and real-time constraints can be accommodated, it is still not clear if all of the properties desirable for digital watermarking discussed earlier can be achieved by any single algorithm. In most cases, these properties have an inherent trade-off. Therefore, developing signal processing methods to strike an optimal balance between the compettng properties of a digital watermarking algorithm is necessary. Cryptographic issues lie at the core of many applications of information hiding but have unfortunately received little attention. Perhaps this is due to the fact that most work in digital watermarking has been done in the signal processing and communications community, whereas cryptographers have focused more on issues like secret communication (covert channels, subliminal channels) and collusion-resistant fingerprinting. It is often assumed that simply using appropriate cryptographic primitives like encryption, time-stamps, digital signatures, and hash functions would result in secure information hiding applications. We believe that this is far from the truth. In fact, designing secure digital watermarking techniques requires an intricate blend of cryptography along with information theory and signal processing. The rest of this article is organized as follows. In the next section, we describe fragile and semifragile watermarking; the following section deals with robust watermarks. Communication and information theoretic approaches to watermarking are discussed in the subsequent section, and concluding remarks are provided in the last section. FRAGILE AND SEMIFRAGILE
WATERMARKS
In the analog world, an image (a photograph) has generally been accepted as a proof of occurrence “of the event depicted. The advent of digital images and the relative ease with which they can be manipulated has changed this situation dramatically. Given an image in digital or analog form, one can no longer be assured of its authenticity. This has led to the need for image authenttcation techniques. Data authentication techniques have been studied in cryptography for the past few decades. They provide a means of ensuring the integrity of a message. At first, the need for image authentication techniques may not seem to pose a problem because efficient and effective authentication techniques are found in the field of cryptography. However, authentication applications for images present some unique problems that are not addressed by conventional cryptographic authentication techniques. Some of these issues are listed here: l
It is desirable in many applications to authenticate the image content, rather then the representation of
162
DIGITAL
l
l
l
WATERMARKING
the content. For example, converting an image from JPEG to GIF is a change in representation. One would like the authenticator to remain valid across different representations, as long as the perceptual content has not been changed. Conventional authentication techniques based on cryptographic hash functions, message digests, and digital signatures authenticate only the representation. When authenticating image content, it is often desirable to embed the authenticator in the image itself. This has the advantage that authentication will not require any modifications to the large number of existing representational formats for image content that do not provide any explicit mechanism for including an authentication tag (like the GIF format). More importantly, the authentication tag embedded in the image would survive transcoding of the data across different formats, including analog-to-digital and digital-to-analog conversions, in a completely transparent manner. In addition to detecting any tampering with the original content, it is also desirable to detect the exact location of the tampering. Given the highly data-intensive nature of image content, any authentication technique has to be computationally efficient to the extent that a simple real-time implementation should be possible in both hardware and software.
These issues can be addressed by designing image authentication techniques based on digital watermarks. There are two kinds of watermarking techniques that have been developed for authentication applications -fragile watermarking techniques and semifragile watermarking techniques. In the rest of this section, we describe the general approach taken by each and give some illustrative examples. Fragile
Watermarks
A fragile watermark is designed to indicate and even pinpoint any modification made to an image. To illustrate the basic workings of fragile watermarking, we describe a technique recently proposed by Wong and Memon (2). This technique inserts an invisible watermark W into an
Figure mark
5. Public key verification insertion procedure.
water-
m x n image, X. The original image X and the binary watermark W are partitioned into k x I blocks, where the rth image block and the watermark block are denoted by X, and W,, respectively. For each image block X,, a corresponding block X,. is formed, identical to X,., except that the least significant bit of every element in XT is set to zero. For each block X,, a cryptographic hash H(K, m, n, &) (such as MD5) is computed, where K is the user’s key. The first kl bits of the hash output, treated as k x I rectangular array, are XOR’ed with the current watermark block W, to form a new binary block C,. Each element of C, is inserted into the least significant bit of the corresponding element in X,., generating the output Xi. Image authentication is performed by extracting C, from each block Xi of the watermarked image and by XOR’ing that array with the cryptographic hash H(K, m, n, X,), as before, to produce the extracted watermark block. Changes in the watermarked image result in changes in the corresponding binary watermark region, enabling using the technique to localize unauthorized alterations of an image. The watermarking algorithm can also be extended to a public key version, where the private key of a public key algorithm Ki is required to insert the watermark. However, the extraction requires only the public key of user A. More specifically, in the public key version of the algorithm, the MSBs of an image data block X,. and the image size parameters are hashed, and then the result is encrypted using the private key of a public key algorithm. The resulting encrypted block is then XOR’ed with the corresponding binary watermark block W, before the combined results are embedded in the LSB of the block. In the extraction step, the same MSB data and the image size parameters are hashed. The LSB of the data block (cipher text) is decrypted using the public key and then XOR’ed with the hash output to produce the watermark block. Refer to Figs. 5 and 6 for public key verification watermark insertion and extraction processes, respectively. This technique is one example of a fragile watermarking technique. Many other techniques are proposed in the literature. Following are the main issues that need to be addressed in designing fragile watermarking techniques:
DIGITAL
Figure
6.
procedure.
Figure
cedure.
7.
Public
key
verification
SARI image authentication
watermark
extraction
system - verification
pro-
Locality: How well does the technique identify the exact pixels that have been modified. The Wong and Memon technique described before, for example, can localize only changes in image blocks (at least 12 x 12), even if only one pixel has been changed in the block. Any region smaller than this cannot be pinpointed as modified. Transparency: How much degradation in image quality is suffered by inserting of a watermark? Securzty: How difficult is it for someone without the knowledge of the secret key (the user keyK in the first scenario or the private key Ki in the second scenario) used in the watermarking process to modify an image without modifying the watermark or to insert a new but valid watermark. Semifragile
Watermarks
The methods described in the previous subsection authenticate the data that form the multimedia content; the authentication process does not treat the data as distinct from any other data stream. Only the process of inserting the signature into the multimedia content treats the data stream as an object that is to be viewed by a human observer. For example, a watermarking scheme
WATERMARKING
163
may maintain the overall average image color, or it may insert the watermark in the least significant bit, thus discarding the least significant bits of the original data stream and treating them as perceptually irrelevant. All multimedia content in current representations have a fair amount of built-in redundancy, that is to say that the data representing the content can be changed without effecting a perceptual change. Further, even perceptual changes in the data may not affect the content. For example, when dealing with images, one can brighten an image, compress it in a lossy fashion, or change contrast settings. The changes caused by these operations could well be perceptible, even desirable, but the image content is not considered changed. Objects in the image are in the same positions and are still recognizable. It is highly desirable that authentication of multimedia documents takes this into account, that is, there is a set of allowed operations that can be applied to the image content without affecting the authenticity of the image. There have been a number of recent attempts at techniques that address authentication of “image content”, not just the image data representation. One approach is to use feature points in defining image content that are robust to image compression. Cryptographic schemes such as digital signatures can then be used to authenticate these feature points. Typical feature points include, for example, edge maps (3), local maxima and minima, and low-pass wavelet coefficients (4). The problem with these methods is that it is hard to define image content in terms of a few features; for example, edge maps do not sufficiently define image content because it may be possible for two images, to have fairly different content (the face of one person replaced by that of another) but identical edge maps. Image content remains an ill-defined attribute that defines quantification despite the many attempts by the image processing and vision communities. Another interesting approach to authenticating image content is to compute an image digest (or hash or fingerprint) of the image and encrypt the digest using a secret key. For public key verification of the image, the secret key is the user’s private key and hence the verification can be done by anyone who has the user’s public key, much like digital signatures. Note that the image digest that is computed is much smaller than the image itself and can be embedded in the image by using a robust watermarking technique. Furthermore, the image digest has the property that, as long as the image content has not changed, the digest that is computed from the image remains the same. Clearly, constructing such an image digest function is a difficult problem. Nevertheless, there have been a few such functions proposed in the literature, and image authentication schemes based on them have been devised. Perhaps the most widely cited image digest function/authentication scheme is SARI, proposed by Lin and Chang (5). The SARI authentication scheme contains an image digest function that generates hash bits that are invariant to JPEG compression, that is, the hash bits do not change if the image is JPEG compressed but do change for any other significant or malicious operation.
DIGITAL
WATERMARKING
The image digest component of SARI is based on the invariance of the relationship between selected DCT coefficients in two given image blocks. It can be proven that this relationship is maintained even after JPEG compression by using the same quantization matrix for the whole image. Because the image digest is based on this feature, SARI can distinguish between JPEG compression and other malicious operations that modify image content. More specifically, in SARI, the image to be authenticated is first transformed to the DCT domain. The DCT blocks are grouped into nonoverlapping sets Pp and P4 as defined here: pa
=
{plf2,p3,.
P,
=
t&l,
Q2,
. . ,&I/2)},
Q3,
..
,
Qc~,2)),
where N is the total number of DCT blocks in the input image. An arbitrary mapping function 2 is defined between Pp = Z(K, P,), these two sets that satisfies the criteria = P, where P is the set of all DCT PPnPq =$andP,UP, blocks of the input image. The mapping function 2 is central to the security of SARI and is based on a secret key K. The mapping effectively partitions image blocks into pairs. Then for each block pair, a number of DCT coefficients is selected. Feature code or hash bits are then generated by comparing the corresponding coefficients in the paired block. For example, if the DCT coefficient in block Pm is greater than the DCT coefficient in block P, in the block pair (Pm, P,), then the hash bit generated is “1”. Otherwise, a “0” is generated. It is clear that a hash bit preserves the relationship between the selected DCT coefficients in a given block pair. The hash bits generated for each block are concatenated to form the digest of the input image. This digest can then either be embedded in the image itself or appended as a tag. The authentication procedure at the receiving end involves extracting the embedded digest. The digest for the received image is generated in the same manner as at the encoder and is compared with the extracted and decrypted digest. Because relationships between selected DCT coefficients is maintained even after JPEG compression, this authentication system can distinguish JPEG compression from other malicious manipulations of the authenticated image. However, it was recently shown that if a system uses the same secret key K and hence the same mapping function 2 to form block pairs for all of the images authenticated by it, an attacker who has access to a sufficient number of images authenticated by this system can produce arbitrary fake images (6). SARI is limited to authentication that is invariant only to JPEG compression. Although JPEG compression is one of the most common operations performed on an image, certain applications may require authentication that is invariant to other simple image processing operations such as contrast enhancement or sharpening. As a representative of the published literature to achieve this purpose, we review a promising technique proposed by Fridrich (7). In this technique, N random matrices are generated whose entries are uniformly distributed in [O,l] , using a secret key. Then, a low-pass filter is applied to each of these random matrices to obtain N random smooth
Figure
Fridrich
8. Random patterns and their smoothed versions used in semifragile watermarking technique.
patterns, as shown in Fig. 8. These are then made DC free by subtracting their respective means to obtain P, where i=l,... , N. Then image block B is projected onto each of these random smooth patterns. If a projection is greater than zero, then the hash bit generated is a “1” otherwise a “0” is generated. In this way, N hash bits are generated for image authentication. Because the patterns P, have zero mean, the projections do not depend on the mean gray value of the block but depend only on the variations within the block itself. The robustness of this bit extraction technique was tested on real imagery, and it was shown that it can reliably extract more than 48 correct bits (out of 50 bits) from a small 64 x 64 image for the following image processing operations: 15% quality JPEG compression (as in PaintShop Pro); additive uniform noise that has an amplitude of 30 gray levels; 50% contrast adjustment; 25% brightness adjustment, dithering to 8 colors; multiple applications of sharpening, blurring, median, and mosaic filtering; histogram equalization and stretching; edge enhancement; and gamma correction in the range 0.7-1.5. However, operations, such as embossing, and geometric modifications, such as rotation, shift, and change of scale, lead to a failure to extract the correct bits. In summary, image content authentication using a visual hash function and then embedding this hash by using a robust watermark is a promising area and will see many developments in the coming years. This is a difficult problem, and there may never be a completely satisfactory solution because there is no clear definition of image content and relatively small changes in image representation could lead to large variations in image content. ROBUST WATERMARKS Unlike fragile watermarks, robust watermarks are resilient to intentional or unintentional attacks or signal processing operations. Ideally, a robust watermark must withstand attempts to destroy or remove it. Some of the desirable properties of a good, robust watermark include the following: l
Perceptual transparency: Robustness must not be achieved at the expense of perceptible degradation of the watermarked data. For example, a high-energy watermark can withstand many signal processing
DIGITAL
l
l
l
l
l
attacks; however, even in the absence of any attacks this can cause significant loss in the visual quality of the watermarked image. Higher payload: A robust watermark must be able to carry a higher number of information bits reliably, even in the presence of attacks. Resilience to common signal processing operations such as compression, linear and nonlinear filtering, additive random noise, and digital-to-analog conversion. Resilience to geometric attacks such as translation, rotation, cropping, and scaling. Robustness to collusion attacks where multiple copies of the watermarked data can be used to create or remove a valid watermark. Computational simplicity: Consideration for computational complexity is important when designing robust watermarks. If a watermarking algorithm is robust but computationally very intensive during encoding or decoding, then its usefulness in real life may be limited.
In general, most of these above properties conflict with one another, so a number of trade-offs is needed. Three major trade-offs in robust watermarking and the applications that are impacted by each of these trade-off factors are shown in Fig. 9. It is easily understood that placing a watermark in perceptually insignificant components of an image imperceptibly distorts the watermarked image. However, such watermarking techniques are generally not robust to intentional or unintentional attacks. For example, if the watermarked image is lossy compressed, then the perceptually insignificant components are discarded by the compression algorithm. Therefore, for a watermark to be robust, it must be placed in the perceptually significant components of an image, even though we run a risk of causing perceptible distortions. This gives rise to two important questions: (a) what are the perceptually significant components of a signal, and (b) how can the perceptual degradation due to robust water-marking be minimized? The answer to the first question depends on the type of mediumaudio, image, or video. For example, certain spatial frequencies and some spatial characteristics such as edges in an image are perceptually
Figure
9. Trade-offs
in robust
watermarking.
WATERMARKING
165
significant. Therefore, choosing these components as carriers of a watermark will add robustness to operations such as lossy compression. There are many ways in which a watermark can be inserted into perceptually significant components. But care must be taken to shape the watermark to match the characteristics of the carrier components. A common technique that is used in most robust watermarking algorithms is adaptation of the watermark energy to suit the characteristics of the carrier. This is usually based on certain local statistics of the original image so that the watermark is not visually perceptible. A number of robust watermarking techniques have been developed during the past few years. Some of them apply the wattermark in the spatial domain and some in the frequency domain. Some are additive watermarks, and some use a quantize and replace strategy. Some are linear and some are nonlinear. The earliest robust spatial domain techniques were the MIT patchwork algorithm (8) and another one by Digimarc (9). One of the first and still the most cited frequencydomain techniques was proposed by Cox et al. (10). Some early perceptual watermarking techniques using linear transforms in the transform domain were proposed a recent spatial-domain algorithm that in (11). Finally, is remarkably robust was proposed by Kodak (12-16). Instead of describing these different algorithms independently, we chose to describe Kodak’s technique in detail because it clearly identifies the different elements that are needed in a robust watermarking technique . Kodak’s
Watermarking
Technique
A spatial watermarking technique based on phase dispersion was developed by Kodak (12- 16). The Kodak method is noteworthy for several reasons. First, it can be used to embed either a gray-scale iconic image or binary data. Iconic images include trademarks, corporate logos, or other arbitrary small images; an example is shown in Fig. 10a. Second, the technique can determine cropping coordinates without the need for a separate calibration signal. Furthermore, the strategy that is used to detect rotation and scale can be applied to other watermarking methods in which the watermark is inserted as a periodic pattern in the image domain. Finally, the Kodak algorithm reportedly scored 0.98 using StirMark 3.0 (15). The following is a brief description of the technique. For brevity, only the embedding of binary data is considered. The binary digits of the message are represented by positive and negative delta functions (corresponding to ones and zeros) that are placed in unique locations within a message image M. These locations are specified by a predefined message template T, an example of which is shown in Fig. lob. The size of the message template is typically only a portion of the original image size (e.g., 64 x 64, or 128 x 128). Next, a carrier image c~, which is the same size as the message image, is generated by using a secret key. The carrier image is usually constructed in the Fourier domain by assigning a uniform amplitude and a random phase (produced by
DIGITAL
WATERMARKING
Figure
11.
Schematic
of the watermark
a random number generator initialized by the secret key) to each spatial-frequency location. The carrier image is convolved with the message image to produce a dispersed message image, which is then added to the original image. Because the message image is typically smaller than the original image, the original image is partitioned into contiguous nonoverlapping rectangular blocks X,., which are the same size as the message image. The message embedding process creates a block of the watermarked image, X;(~G, y), according to the following relationship: x;(&Y)
= dW~,Y>
* &(x,Y)l
+mw~,
(3)
where the symbol * represents cyclic convolution and a is an arbitrary constant chosen to make the embedded message simultaneously invisible and robust to common processing. This process is repeated for every block in the original image, as depicted in Fig. 11. It is clear from Eq. (3) that there are no restrictions on the message
insertion
process.
image and its pixel values can either be binary or multilevel. The basic extraction process is straightforward and consists of correlating a watermarked image block with the same carrier image used to embed the message. The extracted message image M(Lc, y) is given by
where the symbol 8 represents cyclic correlation. The correlation of the carrier with itself can be represented by a point-spread function p (x, y) = Ed (LV,y) @ c~ (x, y) , and because the operations of convolution and correlation commute, Eq. (4) reduces to
DIGITAL
The extracted message is a linearly degraded version of the original message plus a low-amplitude noise term resulting from the cross correlation of the original image with the carrier. The original message can be recovered by using any conventional restoration (deblurring) technique such as Wiener filtering. However, for an ideal carrier, p(~, y) is a delta function, and the watermark extraction process results in a scaled version of the message image plus low-amplitude noise. To improve the signal-to-noise ratio of the extraction process, the watermarked image blocks are aligned and summed before the extraction process, as shown in Fig. 12. The summation of the blocks reinforces the watermark component (because it is the same in each block), and the noise component is reduced because the image content typically varies from block to block. To create a system that is robust to cropping, rotation, scaling, and other common image processing tasks such as sharpening, blurring, and compression, many factors need to be considered in designing of the carrier and the message template. In general, designing the carrier requires considering the visual transparency of the embedded message, the extracted signal quality, and the robustness to image processing operations. For visual transparency, most of the carrier energy should be concentrated in the higher spatial frequencies because the contrast sensitivity function (CSF) of the human visual system falls off rapidly at higher frequencies. However, to improve the extracted signal quality, the autocorrelation function of the carrier, p(x, y), should be as close as possible to a delta function, which implies a flat spectrum. In addition, it is desirable to spread out the carrier energy across all frequencies to improve robustness to both friendly and malicious attacks because the power spectrum of typical imagery falls off with spatial frequency and concentration of the carrier energy in high frequencies would create little frequency overlap between the image and the embedded watermark. This would render the watermark vulnerable to removal by simple low-pass filtering. The actual design of the carrier is a balancing act between these concerns. The design of an optimal message template is guided by two requirements. The first is to maximize the quality of the extracted signal, which is achieved by placing the
WATERMARKING
167
message locations maximally apart. The second is that the embedded message must be recoverable from a cropped version of the watermarked image. Consider a case where the watermarked image has been cropped so that the watermark tiles in the cropped image are displaced with respect to the tiles in the original image. It can be shown that the message extracted from the cropped image is a cyclically shifted version of the message extracted from the uncropped image. Because the message template is known, the amount of the shift can be unambiguously determined by ensuring that all of the cyclic shifts of the message template are unique. This can be accomplished by creating a message template that has an autocorrelation equal to a delta function. Although in practice it is impossible for the autocorrelation of the message template to be an ideal delta function, optimization techniques such as simulated annealing can be used to design a message template that has maximum separation and minimum sidelobes. The ability to handle rotation and scaling is a fundamental requirement of robust data embedding techniques. Almost all applications that involve printing and scanning result in some degree of scaling and rotation. Many algorithms rely on an additional calibration signal to correct for rotation and scaling, which taxes the information capacity of the embedding system. Instead, the Kodak approach uses the autocorrelation of the watermarked image to determine the rotation and scale parameters, which does not require a separate calibration signal. This method can also be applied to any embedding technique where the embedded image is periodically repeated in tiles. It can also be implemented across local regions to correct for low-order geometric warps. To see how this method is applied, consider the autocorrelation function of a watermarked image that has not been rotated or scaled. At zero displacement, there is a large peak due to the image correlation with itself. However, because the embedded message pattern is repeated at each tile, lower magnitude correlation peaks are also expected at regularly spaced horizontal and vertical intervals equal to the tile dimension. Rotation and scaling affect the relative position of these secondary peaks in exactly the same way that they affect the image.
Figure
12.
tion process.
Schematic
of the watermark
extrac-
168
DIGITAL
WATERMARKING
on communication and information theory is an ongoing process where theories are proposed and refined based on feedback from engineering applications of watermarks. In this section, we describe some communication and information theoretic aspects of digital watermarking. First, we describe the similarities and differences between classical communication and current watermarking systems. Once this is established, it becomes easier to adapt the theory of communications to watermarking and make theoretical predictions about the performance of a watermarking system. Following this discussion, we describe some information theoretic models applied to watermarking. Watermarking
Figure 13. (a) Example of a watermarked image tion and scale transformation and its corresponding tlon. (b) Image in top row after scale and rotational tion and its corresponding autocorrelation.
without rotaautocorrelatransforma-
By properly detecting these peaks, the exact amount of the rotation and scale can be determined. An example is shown in Figure 13. Not surprisingly, the energy of the original image is much larger than that of the embedded message, and the autocorrelation of the original image can mask the detection of the periodic peaks. To minimize this problem, the watermarked image needs to be processed, before computing the autocorrelation function. Examples of such preprocessing include removing the local mean by a spatially adaptive technique or simple high-pass filtering. In addition, the resulting autocorrelation function is highpass filtered to amplify the peak values. COMMUNICATION ASPECTS
AND
INFORMATION
THEORETIC
Communication and information theoretic approaches focus mainly on the theoretical analysis of watermarking systems. They deal with abstract mathematical models for watermark encoding, attacks, and decoding. These models enable studying watermarks at a high level without resorting to any specific application (such as image authentication, etc.). Therefore, the results obtained by using these techniques are potentially useful in a wide variety of applications by suitably mapping the application to a communication or information theoretic model. The rich set of mathematical models based primarily on the theory of probability and stochastic processes allows rigorous study of watermarking techniques; however, a common complaint from practitioners suggests that some of these popular mathematical theories are not completely valid in practice. Therefore, studying watermarks based
as Communication
Standard techniques from communication theory can be adapted to study and improve the performance of watermarking algorithms (17). Figure 14 shows an example of a communication system where the information bits are first encoded to suit the modulation type, error control, etc., followed by the modulation of a carrier signal to transmit this information across a noisy channel. At the decoder, the carrier is demodulated, and the information bits (possibly corrupted due to channel noise) are decoded. Figure 15 shows the counterpart system for digital watermarking. The modulator in Fig. 14 has been replaced by the watermark embedder that places the watermark in the media content. The channel noise has been replaced by the distortions of the watermark media induced by either malicious attacks or by signal processing operations such as compression/decompression, cropping, filtering, and scaling. The embedded watermark is extracted by the watermark decoder or detector. However, note that a major difference between the two models exists on the encoder side. In communication systems, encoding is done to protect the information bits from channel distortion, but in watermarking, emphasis is usually placed on techniques that minimize perceptual distortions of the watermarked content. Some analogies between the traditional communication system and the watermarking system are summarized in Table 1. We note from this table that the theory and algorithms developed for studying digital communication systems may be directly applicable to studying some aspects of watermarking. Note that though these two systems have common requirements, such as power and reliability constraints, these requirements may be motivated by different factors, for example, power constraint in a communication channel is imposed from a cost perspective, whereas in watermarking, it is motivated by perceptual issues. Information
Theoretic
Analysis
Information theoretic methods have been successfully applied to information storage and transmission (18). Here, messages and channels are modelled probabilistically, and their properties are studied analytically. A great amount of effort during the past five decades has produced many interesting results regarding the capacity of various channels, that is, the maximum amount
DIGITAL
Figure
Table
1. Analogies
Between
Communication
15.
Watermarking
and
Watermarking
of information that can be transmitted through a channel so that decoding this information with an arbitrarily small probability of error is possible. Using the analogy between communication and watermarking channels, it is possible to compute fundamental information-carrying capacity limits of watermarking channels using information theoretic analysis. In this context, the following two important questions arise: l
l
What is the maximum length (in bits) of a watermark message that can be embedded and distinguished reliably in a host signal? How do we design watermarking algorithms that can effectively achieve this maximum?
Answers to these questions can be found based on certain assumptions (19-28). We usually begin by assuming probability models for the watermark signal, host signal, and the random watermark key. A distortion constraint is then placed on the watermark encoder. This constraint is used to model and control the perceptual distortion induced due to watermark insertion. For example, in image or video watermarking, the distortion metric could be based on human visual perceptual criteria. Based on the application, the watermark encoder can use a suitable distortion metric and a value for this metric that must be met during encoding. A watermark attacker has a similar distortion constraint, so that the attack does not result in a completely corrupted watermarked signal that makes it useless to all parties concerned. The information that is known to the encoder, attacker,
and the decoder is incorporated into the mathematical model through joint probability distributions. Then, the watermarking capacity is given by the maximum rate at which the watermark can be embedded reliably by any watermarking strategy under any attack that satisfies the specified constraints. This problem can also be formulated as a stochastic game whose players are the watermark encoder and the attacker (29). The common payoff function of this game is the mutual information between the random variables representing the input and the received watermark.

Now we discuss the details of the mathematical formulation described above. Let a watermark (or message) W ∈ 𝒲 be communicated to the decoder. This watermark is embedded in a length-N sequence X^N = (X_1, X_2, ..., X_N) representing the host signal. Let the watermark key known to both the encoder and the decoder be K^N = (K_1, K_2, ..., K_N). Then, using W, X^N, and K^N, the encoder produces a watermarked signal X'^N = (X'_1, X'_2, ..., X'_N). For instance, in transform-based image watermarking, each X_i could represent a block of 8 × 8 discrete cosine transform coefficients, W could be the spread spectrum watermark (10), and K^N could be the locations of the transform coefficients where the watermark is embedded. Therefore, N = 4096 for a 512 × 512 image. Usually, it is assumed that the elements of X^N are independent and identically distributed (i.i.d.) random variables whose probability mass function is p(x), x ∈ 𝒳. Similarly, the elements of K^N are i.i.d., and their probability mass function is p(k), k ∈ 𝒦. If X and K denote generic random variables in the random vectors
X^N and K^N, respectively, then any dependence between X and K is modeled by the joint probability mass function p(x, k). Usually, it is assumed that W is independent of (X, K). Then, a length-N watermarking code that has distortion D_1 is a triple (𝒲, f_N, φ_N), where 𝒲 is a set of messages whose elements are uniformly distributed, f_N is the encoder mapping, and φ_N is the decoder mapping, which satisfy the following (25):

• The encoder mapping x'^N = f_N(x^N, w, k^N) is such that the expected value of the distortion satisfies E[d_N(X^N, X'^N)] ≤ D_1.
• The decoder mapping is given by ŵ = φ_N(y^N, k^N) ∈ 𝒲, where y^N is the received watermarked signal.
The attack channel is modeled as a sequence of conditional probability mass functions A_N(y^N | x^N) such that E[d_N(X^N, Y^N)] ≤ D_2. Throughout, it is assumed that d_N(x^N, y^N) = (1/N) Σ_{i=1}^{N} d(x_i, y_i), where d is a bounded, nonnegative, real-valued distortion function. A watermarking rate R = (1/N) log |𝒲| is said to be achievable for (D_1, D_2) if there exists a sequence of watermarking codes (𝒲, f_N, φ_N), subject to distortion D_1, that have respective rates R_N > R such that the probability of error P_e = (1/|𝒲|) Σ_{w∈𝒲} Pr(Ŵ ≠ w | W = w) → 0 as N → ∞ for any attack subject to D_2. Then, the watermarking capacity C(D_1, D_2) is defined as the maximum (or supremum, in general) of all achievable rates for given D_1 and D_2. This information theoretic framework has been successfully used to compute the watermarking capacity of a wide variety of channels. We discuss a few of them next.

When N = 1 in the information theoretic model, we obtain a single-letter channel. Consider the single-letter, discrete-time, additive channel model shown in Fig. 16. In this model, the message W is corrupted by additive noise J. Suppose that E(W) = E(J) = 0; then, the watermark power is given by E(W²) = σ²_W, and the channel noise power is E(J²) = σ²_J. If W and J are Gaussian distributed, then it can be shown that the watermarking capacity is given by (1/2) ln(1 + σ²_W/σ²_J) (28). For the Gaussian channel, a surprising result has also been found recently (25). Let 𝒲 = ℝ be the space of the watermark signal, and let d(w, y) = (w − y)² be the squared-error distortion measure. If X ~ Gaussian(0, σ²_X), then the capacities of the blind and nonblind watermarking systems are equal! This means that, irrespective of whether or not the original signal is available at the decoder, the watermarking rate remains the same.
Figure 16. Discrete-time additive channel noise model.
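To make the single-letter Gaussian result above concrete, the following minimal sketch (our illustration, not part of the cited analyses) evaluates the capacity expression (1/2) ln(1 + σ²_W/σ²_J); the example powers are arbitrary.

```python
import math

def gaussian_watermark_capacity(watermark_power: float, noise_power: float) -> float:
    """Capacity, in nats per sample, of the single-letter additive Gaussian
    watermarking channel: C = 0.5 * ln(1 + sigma_W^2 / sigma_J^2)."""
    return 0.5 * math.log(1.0 + watermark_power / noise_power)

# Example with arbitrary powers sigma_W^2 = 4 and sigma_J^2 = 1.
print(gaussian_watermark_capacity(4.0, 1.0))  # about 0.80 nats per sample
```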
Watermark capacity has received considerable attention for the case where the host signal undergoes specific processing or attacks that can be modeled using well-known probability distributions. It is also a popular assumption that the type of attack which the watermarked signal undergoes is completely known at the receiver and is usually modeled as additive noise. But, in reality, it is not guaranteed that the attack is known at the receiver, and it need not be only additive; for example, scaling and rotational attacks are not additive. Therefore, a more general mathematical model, as shown in Fig. 17, is required to improve the capacity estimates for many nonadditive attack scenarios (20). In Fig. 17, we see that a random multiplicative component is also introduced to model an attack. Using the model in Fig. 17, where G_d and G_r, respectively, denote the deterministic and random components of the multiplicative channel noise attack, it has been shown (20) that a traditional additive channel model such as that shown in Fig. 16 tends either to over- or underestimate the watermarking capacity, depending on the type of attack. A precise estimate of the loss in capacity due to the decoder's uncertainty about the channel attack can be computed by using this model. Extensions of this result to multiple watermarks in a host signal show that, to improve capacity, a specific watermark decoder has to cancel the effect of the interfering watermarks rather than treating them as known or unknown interference. It has also been observed (20) that an unbounded increase in watermark energy does not necessarily produce unbounded capacity. These results give us intuitive ideas for optimizing the capacity of watermarking systems.
Figure 17. Multiplicative and additive watermarking channel noise model.
Computations of information theoretic watermarking capacity do not tell us how to approach this capacity effectively. To address this important problem, a new set of techniques is required. Approaches such as quantization index modulation (QIM) (23) address some of these issues. QIM deals with characterizing the inherent trade-offs among embedding rate, embedding-induced degradation, and robustness of embedding methods. Here, the watermark embedding function is viewed as an ensemble of functions indexed by w that satisfies the following property:

x ≈ x′  ∀ w.    (6)

It is clear that robustness can be achieved if the ranges of these functions are sufficiently separated from each other. If not, identifying the embedded message uniquely, even in the absence of any attacks, will not be possible. Equation (6) and the nonoverlapping ranges of the embedding functions suggest that the ranges of the embedding functions must cover the range space of x′ and that the functions must be discontinuous. QIM embeds information by first modulating an index or a sequence of indexes by using the embedding information and then quantizing the host signal by using an associated quantizer or a sequence of quantizers. We explain this by an example (a numerical sketch appears at the end of this section). Consider the case where one bit is to be embedded, that is, w ∈ {0, 1}. Thus two quantizers are required, and their corresponding reconstruction points in ℝ^N must be well separated to provide robustness to attacks. If w = 1, the host signal is quantized by the first quantizer. Otherwise, it is quantized by the second quantizer. Therefore, we see that the quantizer reconstruction points also act as constellation points that carry information. Thus, QIM design can be interpreted as the joint design of an ensemble of source codes and channel codes. The number of quantizers determines the embedding rate. It is observed that QIM structures are optimal for memoryless watermark channels when energy constraints are placed on the encoder. As we can see, a fundamental principle behind QIM is the attempt to trade off embedding rate optimally for robustness.

As discussed in previous sections, many popular watermarking schemes are based on signal transforms such as the discrete cosine transform and the wavelet transform. The transform coefficients play the role of carriers of watermarks. Naturally, different transforms possess widely varying characteristics. Therefore, a natural question to ask is, what is the effect of the choice of transform on the watermarking capacity? Note that good energy-compacting transforms such as the discrete cosine transform produce transform coefficients that have unbalanced statistical variances. This property, it is observed, enhances watermarking capacity in some cases (26). Results such as these could help us design high-capacity watermarking techniques that are compatible with transform-based image compression standards such as JPEG 2000 and MPEG-4.

To summarize, communication and information theoretic approaches provide valuable mathematical tools for analyzing watermarking techniques. They make it possible to predict or estimate the theoretical performance of a watermarking algorithm independently of the underlying application. But the practical utility of these models and analyses has been questioned by application engineers. Therefore, it is important that watermarking theoreticians and practitioners interact with each other through a constructive feedback mechanism to improve the development and implementation of state-of-the-art digital watermarking systems.
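The sketch below illustrates the one-bit QIM example described above, using two uniform scalar quantizers whose reconstruction lattices are offset by half a step. The step size and signal values are arbitrary choices of ours, and the decoder simply picks the quantizer whose reconstruction lies closest to the received signal; it is a schematic illustration of the QIM idea rather than the specific constructions of (23).

```python
import numpy as np

DELTA = 8.0  # quantizer step; larger steps give more robustness but more distortion

def qim_embed(x: np.ndarray, bit: int) -> np.ndarray:
    """Quantize the host samples with the quantizer indexed by the bit.
    Bit 0 uses reconstruction points k*DELTA; bit 1 uses k*DELTA + DELTA/2."""
    offset = 0.0 if bit == 0 else DELTA / 2.0
    return np.round((x - offset) / DELTA) * DELTA + offset

def qim_decode(y: np.ndarray) -> int:
    """Pick the quantizer (hence the bit) whose reconstruction is closest to y."""
    d0 = np.abs(y - qim_embed(y, 0)).sum()
    d1 = np.abs(y - qim_embed(y, 1)).sum()
    return 0 if d0 <= d1 else 1

host = np.array([13.2, 40.7, 98.1, 55.5])
marked = qim_embed(host, 1)
noisy = marked + np.random.uniform(-DELTA / 4, DELTA / 4, size=marked.shape)
print(qim_decode(noisy))  # recovers 1 as long as the noise stays below DELTA/4
```

Larger steps separate the two lattices further, increasing robustness at the cost of greater embedding distortion, which is exactly the trade-off discussed above.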
CONCLUSIONS

Digital watermarking is a rapidly evolving area of research and development. We discussed only the key problems in this area and presented some known solutions. One key research problem that we still face today is the development of truly robust, transparent, and secure watermarking techniques for different digital media, including images, video, and audio. Another key problem is the development of semifragile authentication techniques. The solution to these problems will require applying known results and developing new results in the fields of information and coding theory, adaptive signal processing, game theory, statistical decision theory, and cryptography.

Although significant progress has already been made, there still remain many open issues that need attention before this area becomes mature. This article has provided only a snapshot of the current state of the art. For details, the reader is referred to the survey articles (30-43) that deal with various important topics and techniques in digital watermarking. We hope that these references will be of use to both novices and experts in the field.

BIBLIOGRAPHY

1. M. Maes et al., IEEE Signal Process. Mag. 17(5), 47-57 (2000).
2. P. Wong and N. Memon, IEEE Trans. Image Process. (in press).
3. S. Bhattacharjee, Proc. Int. Conf. Image Process., Chicago, Oct. 1998.
4. D. Kundur and D. Hatzinakos, Proc. IEEE, Special Issue on Identification and Protection of Multimedia Information 87(7), 1,167-1,180 (1999).
5. C. Y. Lin and S. F. Chang, SPIE Storage and Retrieval of Image/Video Databases, San Jose, January 1998.
6. R. Radhakrishnan and N. Memon, Proc. Int. Conf. Image Process., pp. 971-974, Thessaloniki, Greece, Oct. 2001.
7. J. Fridrich, Proc. Int. Conf. Image Process., Chicago, Oct. 1998.
8. W. Bender, D. Gruhl, N. Morimoto, and A. Lu, IBM Syst. J. 35(3-4), 313-336 (1996).
9. Digimarc Corporation, http://www.digimarc.com.
10. I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, IEEE Trans. Image Process. 6(12), 1,673-1,687 (1997).
11. R. B. Wolfgang, C. I. Podilchuk, and E. J. Delp, Proc. IEEE 87(7), 1,108-1,126 (1999).
12. Method and apparatus for hiding one image within another, US Pat. 5,905,819, 1999, S. J. Daly.
13. Method for detecting rotation and magnification in images, US Pat. 5,835,639, 1998, C. W. Honsinger and S. J. Daly.
14. Method for embedding digital information or pattern in an image, US Pat. 5,859,920, 1999, S. J. Daly et al.
15. C. Honsinger, IS&T PICS 2000, Portland, March 2000, pp. 264-268; C. W. Honsinger and M. Rabbani, Int. Conf. Inf. Tech.: Coding Comput., March 2000.
16. Method for generating an improved carrier for the data embedding problem, US Pat. 6,044,156, 2000, C. W. Honsinger and M. Rabbani.
17. I. J. Cox, M. L. Miller, and A. L. McKellips, Proc. IEEE 87, 1,127-1,141 (1999).
18. C. E. Shannon, Bell Syst. Tech. J. 27, 379-423 (1948).
19. C. Cachin, Proc. 2nd Workshop Inf. Hiding, 1998.
20. R. Chandramouli, Proc. SPIE Security and Watermarking of Multimedia Contents III, 2001.
21. R. Chandramouli, Proc. SPIE Multimedia Syst. Appl. IV, Aug. 2001, p. 4,518.
22. B. Chen and G. W. Wornell, IEEE 2nd Workshop Multimedia Signal Process., 1998, pp. 273-278.
23. B. Chen and G. W. Wornell, IEEE Int. Conf. Multimedia Comput. Syst. 1, 13-18 (1999).
24. B. Chen and G. W. Wornell, IEEE Int. Conf. Acoust. Speech Signal Process. 4, 2,061-2,064 (1999).
25. P. Moulin and M. K. Mihcak, http://www.ifp.uiuc.edu/~moulin/paper.html, June 2001.
26. M. Ramkumar and A. N. Akansu, IEEE 2nd Workshop Multimedia Signal Process., Dec. 1998, pp. 267-272.
27. M. Ramkumar and A. N. Akansu, SPIE Multimedia Syst. Appl. 3528, 482-492 (1998).
28. S. D. Servetto, C. I. Podilchuk, and K. Ramachandran, Int. Conf. Image Process. 1, 445-448 (1998).
29. A. Cohen and A. Lapidoth, Int. Symp. Inf. Theory, 2000, p. 48.
30. P. Jessop, Int. Conf. Acoust. Speech Signal Process., pp. 2,077-2,080.
31. F. Mintzer and G. W. Braudaway, Signal Process. 80, 2,067-2,070.
32. M. Holliman, N. Memon, and M. M. Yeung, SPIE Security and Watermarking of Multimedia Contents, Jan. 1999, pp. 134-146.
33. F. Hartung, J. K. Su, and B. Girod, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 147-158.
34. J. Dittmann et al., SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 171-182.
35. J. Fridrich and M. Goljan, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 214-225.
36. M. Kutter and F. A. P. Petitcolas, SPIE Security and Watermarking of Multimedia Contents, 1999, pp. 226-239.
37. Special Issue, IEEE J. Selected Areas Commun. (May 1998).
38. W. Zhu, Z. Xiong, and Y. Q. Zhang, IEEE Trans. Circuits Syst. Video Technol. 9(4), 545-550 (1999).
39. Proc. Int. Workshop Inf. Hiding.
40. Special Issue, Proc. IEEE 87(7) (1999).
41. M. D. Swanson, M. Kobayashi, and A. H. Tewfik, Proc. IEEE 86(6), 1,064-1,087 (June 1998).
42. G. C. Langelaar et al., IEEE Signal Process. Mag. 17(5), 20-46 (2000).
43. C. Podilchuk and E. Delp, IEEE Signal Process. Mag. 18(4), 33-46 (2001).
DISPLAY CHARACTERIZATION

DAVID H. BRAINARD
University of Pennsylvania
Philadelphia, PA

DENIS G. PELLI
New York University
New York, NY

TOM ROBSON
Cambridge Research Systems, Ltd.
Rochester, Kent, UK
INTRODUCTION

This article describes the characterization and use of computer-controlled displays.1 Most imaging devices are now computer controlled, and this makes it possible for the computer to take into account the properties of the imaging device to achieve the intended image. We emphasize CRT (cathode ray tube) monitors and begin with the standard model of CRT imaging. We then show how this model may be used to render the desired visual image accurately from its numerical representation. We discuss the domain of validity of the standard CRT model. The model makes several assumptions about monitor performance that are usually valid but can fail for certain images and CRTs. We explain how to detect such failures and how to cope with them.

Here we address primarily users who will be doing accurate imaging on a CRT. Inexpensive color CRT monitors can provide spatial and temporal resolutions of at least 1024 × 768 pixels and 85 Hz, and the emitted intensity is almost perfectly independent of viewing angle. CRTs are very well suited for accurate rendering. Our treatment of LCDs (liquid crystal displays) is brief, in part because this technology is changing very rapidly and in part because the strong dependence of emitted light on viewing angle in current LCD displays is a great obstacle to accurate rendering. Plasma displays seem more promising in this regard. We present all the steps of a basic characterization that will suffice for most readers and cite the literature for the fancier wrinkles that some readers may need, so that all readers may render their images accurately. The treatment emphasizes accuracy both in color and in space. Our standard of accuracy is visual equivalence: substituting the desired for the actual stimulus would not affect the observer.2 We review the display characteristics that need to be taken into account to present an arbitrary spatiotemporal image accurately, that is, luminance and chromaticity as a function of space and time. We also treat a number of topics of interest to the vision scientist who requires precise control of a displayed stimulus.

The International Color Consortium (ICC, http://www.color.org/) has published a standard file format (4) for storing "profile" information about any imaging device.3 It is becoming routine to use such profiles to achieve accurate imaging (e.g., by using the popular Photoshop® program).4 The widespread support for profiles allows most users to achieve characterization and correction without needing to understand the underlying characteristics of the imaging device. ICC monitor profiles use the standard CRT model presented in this article. For applications where the standard CRT model and instrumentation designed for the mass market are sufficiently accurate, users can simply buy a characterization package consisting of a program and a simple colorimeter that automatically produces ICC profiles for their monitors.5 The data required in an ICC monitor profile are just enough6 to specify the standard CRT model, as presented here. The ICC standard also allows extra data to be included in the profile, making it possible to specify extended versions of the standard CRT model and other relevant aspects of the monitor (e.g., modulation transfer function and geometry).

This article explains the standard CRT model (and necessary basic colorimetry) and describes simple visual tests (available online at http://psychtoolbox.org/tips/displaytest.html) that establish the model's validity for your monitor. Then, we show how to use the standard model to characterize your display for accurate rendering. Finally, the Discussion briefly presents several more advanced topics, including characterization of non-CRT displays.7

1 The display literature often distinguishes between calibration and characterization (e.g., 1-3): calibration refers to the process of adjusting a device to a desired configuration, and characterization refers to modeling the device and measuring its properties to allow accurate rendering. We adopt this nomenclature here.
2 The International Color Consortium (4) calls this "absolute colorimetric" rendering intent, which they distinguish from their default "perceptual" rendering intent. Their "perceptual" intent specifies that "the full gamut of the image is compressed or expanded to fill the gamut of the destination device. Gray balance is preserved but colorimetric accuracy might not be preserved."
3 On Apple computers, ICC profiles are called "ColorSync" profiles because the ICC standard was based on ColorSync. Free C source code is available to read and write ICC profiles (5,6).
4 Photoshop is a trademark of Adobe Systems Inc.

CRT MONITOR BASICS
We begin with a few basic ideas about the way CRT monitors produce light and computers control displays. The light emitted from each location on a monitor is produced when an electron beam excites a phosphor coating at the front of the monitor. The electron beam scans the monitor faceplate rapidly in a raster pattern (left to right, top to bottom), and the intensity of the beam is modulated during the scan so that the amount of light varies with the spatial position on the faceplate. It is helpful to think of the screen faceplate as being divided up into contiguous discrete pixels. The video voltage controlling the beam intensity is usually generated by a graphics card, which emits a new voltage on every tick of its pixel clock (e.g., 100 MHz). The duration of each voltage sample (e.g., 10 ns) determines the pixel's width (e.g., 0.3 mm). Each pixel is a small fraction of a raster line painted on the phosphor as the beam sweeps from left to right. The pixel height is the spacing of raster lines. Most monitors today are "multisync," allowing the parameters of the raster (pixels per line, lines per frame, and frames per second) to be determined by the graphics card. Color monitors contain three interleaved phosphor types (red, green, and blue) periodically arranged as dots or stripes across the face of the monitor. There are three electron beams and a shadow mask arranged so that each beam illuminates only one of the three phosphor types.

5 Vendors of such packages include Monaco (http://www.monacosys.com/) and ColorBlind (http://www.color.com/).
6 Actually, as noted by Gill (7), it is "a pity that the media black point is not a mandatory part of the profile in the same way [that] the media white point is, since the lack of the black point makes absolute colorimetric profile interpretation inaccurate". Thus, before purchasing monitor characterization software, one should consider whether the software will include the optional media black point tag in the monitor profiles it produces. We have not found a precise definition of the term "media black point" in the ICC documentation, but we infer that it refers to a specification of the ambient light A(λ) defined in Eq. (1) below.
7 More detailed treatments of CRTs (e.g., 2,8-11; see 12), colorimetry (e.g., 13-15), and use of ICC profiles (3) may be found elsewhere.
The phosphor interleaving is generally much finer than a pixel, so that the fine red-green-blue dots or stripes are not resolved by the observer at typical viewing distances. We will not discuss it here, but for some applications, for example, rendering text, it is useful to take into account the tiny displacements of the red, green, and blue subpixel components (16,17).

The stimulus for color is light. The light from each pixel may be thought of as a mixture of the light emitted by the red, green, and blue phosphors. Denote the spectrum of the light emitted from a single monitor pixel by C(λ). Then,

C(λ) = rR(λ) + gG(λ) + bB(λ) + A(λ),    (1)
where λ represents wavelength; R(λ), G(λ), and B(λ) are the spectra of light emitted by each of the monitor's phosphors when they are maximally excited by the electron beam; r, g, and b are real numbers in the range [0, 1]; and A(λ) is the ambient (or "flare") light emitted (or reflected) by the monitor when the video voltage input to the monitor is zero for each phosphor. We refer to the values r, g, and b as the phosphor light intensities.8 Later, we discuss the kind of equipment one can use to measure light, but bear in mind that the characterization of a display should be based on the same light as the observer will see. Thus the light sensor should generally be in approximately the same position as the observer's eye. Happily, the luminance of CRTs is almost perfectly independent of viewing angle (i.e., a CRT is nearly a lambertian light source), allowing characterization from one viewing point to apply to a wide range. (LCDs lack this desirable property. From a fixed head position, you can see variations in hue across a flat panel caused by the different viewing angles of the nearer and farther parts. This is a serious obstacle to characterization of LCDs for accurate rendering unless the viewpoint can be fixed. Plasma displays are much better in this regard.) Note that Eq. (1) depends on the assumption of channel constancy: to a very good approximation, the relative spectrum of light emitted by each (R, G, or B) monitor channel is independent of the degree to which it is excited.9 To simplify the main development below, we assume that A(λ) = 0. Correcting for nonzero ambient is

8 Spectrally, Eq. (1) says that the light is a linear combination of contributions from the three phosphors. Similarly, spatially, the light measured at a particular pixel location is a linear combination of contributions from several pixels. The weighting of the contributions is the point-spread function. The point spread may be neglected while characterizing the display's color properties if you use a uniform test patch that is much bigger than the point spread. Thus, Eq. (1) does not take into account blur introduced by the optics of the electron beam and the finite size of the phosphor dots: the intensities r, g, and b describe the total light intensity emitted at a pixel but not how this light is spread spatially. Treatment of the point-spread function is provided in the Discussion (see Modulation Transfer Function).
9 Strictly speaking, Eq. (1) only requires phosphor constancy, the assumption that the relative spectrum of light emitted by each of the CRT's phosphors is invariant. It is possible for phosphor constancy to hold and channel constancy to be violated, for example, when the degree to which electrons intended for one phosphor dot excite another varies with digital video value (see, e.g., 1; also 18,19). For the presentation here, it is more convenient not to distinguish explicitly between phosphor and channel constancy.
10 This mode is also sometimes referred to as 32-bit mode: the 24 bits of data per pixel are usually stored in 32-bit words because this alignment provides faster memory access on most computers.
both straightforward and recommended, as explained in the Discussion.

Figure 1 shows how a pixel is processed. The graphics card generates the video voltages based on values stored in the onboard memory. Typically, user software can write digital values into two components of graphics card memory: a frame buffer and a color lookup table.

Figure 1. Schematic of graphics card and CRT monitor. The figure illustrates the video chain from digital pixel value to emitted light. As mentioned in the text and illustrated in Fig. 2, most graphics cards can run either in 24-bit mode, in which m_red, m_green, and m_blue are independent, or in 8-bit mode, in which m_red = m_green = m_blue. The digital video output of each lookup table is 8 bits in most commercial graphics cards, but a few cards have more than 8 bits to achieve finer steps at the digital-to-analog converter (DAC) output. See color insert.

The top panel of Fig. 2 illustrates the operation of the frame buffer and lookup table for what is referred to as "true color" or 24-bit ("millions of colors") mode.10 Three 8-bit bytes in the frame buffer are allocated for each monitor pixel. As shown in the figure, this memory can be thought of as three image planes, each arranged in a spatial grid corresponding to the layout of the monitor pixels. These are labeled Red, Green, and Blue in the figure. Each image plane controls the light emitted by one monitor phosphor. Consider a single pixel. The 8-bit (256-level) value m_red in the Red image plane is used as an index to the red lookup table F_red(). This table has 256 entries, and each entry specifies the digital video value R used to generate the video voltage v_red that goes to the monitor to control the intensity of the electron beam as it excites the red phosphor at the pixel of interest. Similar indirection is used to obtain the digital video values G and B for the
green and blue phosphors:

R = F_red(m_red),
G = F_green(m_green),    (2)
B = F_blue(m_blue).
In most graphics cards sold today, R, G, and B are 8-bit numbers driving 8-bit digital-to-analog converters (DACs). This is very coarse compared to the 12- to 14-bit analog-to-digital converters used in practically all image scanners sold today. It is odd that the digital imaging industry does not allow us to display with the same fidelity as we capture. However, some graphics cards for demanding applications, such as radiology and vision science, do provide higher precision, more-than-8-bit DACs, using the lookup table F to transform the 8-bit values of m_red, m_green, and m_blue into values of R, G, and B selected from a palette of more finely quantized numbers. The numbers R, G, and B drive the graphics card's digital-to-analog converters, which produce video voltages proportional to the numbers, except for small errors in the digital-to-analog conversion:
v_red = R + e_red(R, R′),
v_green = G + e_green(G, G′),    (3)
v_blue = B + e_blue(B, B′),
where v represents voltage on a dimensionless scale proportional to the actual voltage, e is the error, and R’, G’, and B’ are the values of R, G, and B for the immediately preceding pixel in the raster scan. The error e has a static and a dynamic component. The static error depends
only on the current number and is sometimes specified by the manufacturer. It is generally less than one-half of the smallest step, that is, ±0.5 on the dimensionless scale we are using here (e.g., 0-255 for 8-bit DACs). The dynamic error is called a "glitch" and is a brief (few ns) error that depends on the relationship between the current value and the value immediately preceding it. The glitch is caused by the change of state of the switches (one per bit) inside the digital-to-analog converter. The glitch has a fixed duration, whereas the pixel duration is determined by the pixel clock rate, so the contribution of the glitch to the light intensity is proportionally reduced as the pixel duration is increased. The glitch is usually negligible.

Figure 2. Graphics card operation. Top panel: 24-bit mode. The frame buffer may be thought of as three separate planes, one each for the red, green, and blue channels. Each plane allows specification of an 8-bit number (0-255) for each image location. In the figure, values for the pixel at x = 4, y = 3 are shown: these are 0 for the red plane, 2 for the green plane, and 4 for the blue plane. For each plane, the frame buffer value is used as an index to a lookup table. A frame buffer value of 0 indexes the first entry of the lookup table, and a frame buffer value of 255 indexes the last entry. Thus the configuration shown in the figure causes digital RGB values (17, 240, 117) to be sent to the DACs for the pixel at x = 4, y = 3. Bottom panel: 8-bit mode. The frame buffer is a single image plane, allowing specification of a single 8-bit number for each image location. This number is used as an index to a color lookup table that provides the RGB values to be sent to the DACs. The 8-bit configuration shown in the bottom panel displays the same color for the pixel at x = 4, y = 3 as the 24-bit configuration shown in the top panel.
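To make the indexing of Eq. (2) and Fig. 2 concrete, the sketch below walks one pixel of a 24-bit frame buffer through the red, green, and blue lookup tables. The lookup-table contents are hypothetical, chosen only so that the frame-buffer triplet (0, 2, 4) from the figure maps to the DAC values (17, 240, 117) quoted in the caption; real tables come from gamma correction.

```python
import numpy as np

# Hypothetical 256-entry lookup tables: identity ramps except for the
# three entries exercised in the example below.
F_red = np.arange(256, dtype=np.uint8);   F_red[0] = 17
F_green = np.arange(256, dtype=np.uint8); F_green[2] = 240
F_blue = np.arange(256, dtype=np.uint8);  F_blue[4] = 117

def dac_values(m_red: int, m_green: int, m_blue: int) -> tuple:
    """Eq. (2): each frame-buffer value indexes its channel's lookup table."""
    return int(F_red[m_red]), int(F_green[m_green]), int(F_blue[m_blue])

print(dac_values(0, 2, 4))  # -> (17, 240, 117), the pixel at x = 4, y = 3 in Fig. 2
```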
The three video voltages produced by the graphics card, as described in Eq. (3), are transmitted by a cable to the CRT. Within the monitor, a video amplifier drives a cathode ray tube that emits light. The light emitted by the phosphors is proportional to the electron beam intensity, but that intensity is nonlinearly related to the video voltage, so the light intensity is a nonlinear function of video voltage:

r = f_red(v_red),
g = f_green(v_green),    (4)
b = f_blue(v_blue),
where r, g, and b have the same meaning as in Eq. (1), and f_red(), f_green(), and f_blue() are so-called gamma functions11 that characterize the input-output relationship for each monitor primary. Note that the term "gamma function" is often used today to describe the single-pixel nonlinear transformation of many display devices, including printers. There are differences in both the form of the nonlinearities and the mechanisms that cause them. The gamma function of a CRT is caused by a space-charge effect in the neck of the tube, whereas the gamma function of printers is usually caused by the spreading and overlapping of ink dots and the scatter of light within the paper. Different models are needed to account satisfactorily for the nonlinearities of different devices. Some analog-input LCD displays and projectors process the incoming signal electronically to provide a nonlinear response that approximates that of a traditional CRT, but they are being displaced by newer digital-input LCDs.

Equations (3) and (4) incorporate further assumptions about monitor performance, including assumptions of pixel independence and channel independence. These assumptions, particularly pixel independence, greatly simplify the CRT model and make it practical to invert it. Pixel independence is the assumption that the phosphor light intensities (r, g, and b) at each pixel depend solely on the digital video values for that pixel, independent of the digital video values for other pixels. Recall that, as a matter of convenience, Eq. (1) defines the phosphor intensities r, g, and b before blur by the point-spread function of the CRT. The real blurry pixels overlap, but pixel independence guarantees that the contribution of each pixel to the final image may be calculated independently of the other pixels. The final image is the sum of all of the blurry pixels (21). Channel independence is the assumption that the light intensity for one phosphor channel depends solely on the digital video value for that channel, independent of the digital video values for the other channels. The validity of these assumptions is discussed later.

There is a second common mode for graphics cards. This is typically referred to as "indexed color" or 8-bit ("256 colors") mode. It is illustrated at the bottom of Fig. 2. Here, there is only a single plane of 8-bit frame buffer values. These index a single color lookup table to obtain R, G, and B values; at each pixel, the same index value is used for all three channels, so that m_red = m_green = m_blue. Thus, for 8-bit mode, only 256 colors (distinct R, G, B combinations) are available to paint each frame. It is usually possible to load an entirely new lookup table for

11 The term gamma function is also used to describe the relationship between phosphor intensities (i.e., r, g, and b) and digital video values (i.e., R, G, and B) because this latter relationship is often the one measured. It is called a "gamma function" because the relationship has traditionally been described by power-law-like functions where the symbol gamma denotes the exponent, for example, r ∝ [(R − R_0)/(255 − R_0)]^γ for R > R_0 and r = 0 otherwise [see Eq. (16) later]. Gamma is usually in the range 2 to 3, with a value of 2.3 quoted as typical (11; see also 20).
each frame. Most current graphics cards can be configured to operate in either 8-bit or 24-bit mode. The advantage of 8-bit mode is that images can be moved into the frame buffer more rapidly, because less data must be transferred. The disadvantage is that only 256 colors are available for each frame. Both 8- and 24-bit modes are important and widely used in visual psychophysics (see 22). Although it is useful to understand the difference between 8- and 24-bit graphics modes, the distinction is not crucial here. The digital part of the CRT video chain is simple and does not introduce distortions. As illustrated by Fig. 3, the key point is that, for each pixel, we must compute the appropriate R, G, and B values to produce arbitrary desired phosphor intensities r, g, and b. This computation relies on measurements of the analog portion of the video chain and on the properties of the display. In particular, it is necessary to characterize the spectral properties of the light emitted by the monitor phosphors [Eq. (1)] and the gamma functions [Eqs. (3) and (4)].

Figure 3. The standard CRT model. Based on Fig. 1, this reduced schematic shows the subsection of the video chain described by the standard CRT model. This subsection takes the digital video values R, G, and B as input and produces intensities r, g, and b as output. See color insert.

BASIC COLORIMETRY

Equation (1) shows that monitors can produce only a very limited set of spectra C(λ), those that consist of a weighted sum of the three fixed primaries. But that is enough, because human color vision is mediated by three classes of light-sensitive cones, referred to as the L, M, and S cones (most sensitive to long, middle, and short wavelengths, respectively). The response of each class of cones to a spectrum incident at the eye depends on the rate at which the cone pigment absorbs photons, and this absorption rate may be computed via the spectral sensitivity of that class of cones. Denote the spectral sensitivities of the L, M, and S cones as L(λ), M(λ), and S(λ), respectively. Then the quantal absorption rates l, m, and s of the L, M, and S cones for a color stimulus whose spectrum is C(λ) are given by the integrals

l = ∫_{380 nm}^{780 nm} L(λ)C(λ) dλ,
m = ∫_{380 nm}^{780 nm} M(λ)C(λ) dλ,    (5)
s = ∫_{380 nm}^{780 nm} S(λ)C(λ) dλ,
where each integral is taken across the visible spectrum, approximately 380 nm to 780 nm. We refer to these quantal absorption rates as the cone coordinates of the spectrum C(λ). The integrals of Eq. (5) may be approximated by the sums
l = Σ_{i=1}^{n} L(λ_i)C(λ_i)Δλ,
m = Σ_{i=1}^{n} M(λ_i)C(λ_i)Δλ,    (6)
s = Σ_{i=1}^{n} S(λ_i)C(λ_i)Δλ,

across wavelengths λ_i evenly spaced across the visible spectrum, where Δλ is the interval between wavelength samples. The CIE recommends sampling from 380 nm to 780 nm at 5-nm intervals, making n = 81. Using matrix notation,
s = [l, m, s]^T,    c = [C(λ_1), C(λ_2), ..., C(λ_n)]^T Δλ,

S = [ L(λ_1)  L(λ_2)  ...  L(λ_n)
      M(λ_1)  M(λ_2)  ...  M(λ_n)
      S(λ_1)  S(λ_2)  ...  S(λ_n) ],    (7)

we can rewrite Eq. (6) as

s = S c.    (8)
Equation (8) computes the cone coordinates s from the spectrum c of the color stimulus. When two physically distinct lights that have the same spatial structure result in the same quantal absorption rates for the L, M, and S cones, the lights are indistinguishable to cone-mediated vision.12 Thus, accurately rendering a desired image on a characterized monitor means choosing R, G, and B values so that the spectrum c emitted by the monitor produces the same cone coordinates s as the desired image.

12 We neglect for now a possible effect of rod signals that can occur at light levels typical of many color monitors. See Discussion: Use of Standard Colorimetric Data.
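Numerically, Eq. (8) is a single matrix-vector product. The sketch below assumes the cone sensitivities and the measured spectrum are already sampled on the CIE-recommended grid (380-780 nm in 5-nm steps, n = 81); the arrays here are placeholders standing in for tabulated sensitivities and a spectroradiometer measurement.

```python
import numpy as np

wavelengths = np.arange(380, 781, 5)   # n = 81 samples at 5-nm spacing
delta_lambda = 5.0                     # wavelength interval (nm)

# Placeholder data: in practice, load tabulated cone sensitivities (3 x 81)
# and a measured spectrum C(lambda) (length 81) from your own files.
S = np.ones((3, wavelengths.size))     # rows: L, M, S spectral sensitivities
C = np.ones(wavelengths.size)          # spectrum of the color stimulus

c = C * delta_lambda                   # vector c of Eq. (7)
s = S @ c                              # Eq. (8): cone coordinates (l, m, s)
print(s)
```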
CHARACTERIZATION USING THE STANDARD CRT MODEL

The CRT model presented above is the standard CRT model used for color characterization (e.g., 1,2,23).13 Using the standard CRT model, we can find the digital video values R, G, and B required to render a spectrum C(λ) through the following computational steps:

1. Computing cone coordinates. Use Eq. (8) to find the l, m, and s quantal absorption rates corresponding to C(λ).
2. Computing phosphor light intensities. Find r, g, and b such that the mixture expressed on the right side of Eq. (1) produces the same quantal absorption rates. This computation relies on measurements of the phosphor spectra but is independent of the gamma functions.
3. Gamma correction. Find DAC values R, G, and B that will produce the desired phosphor intensities r, g, and b. This computation relies on measurements of the gamma functions but is independent of the phosphor spectra.

Because each intensity r, g, and b under the standard CRT model is a simple monotonic function of the corresponding digital video value (R, G, or B), it is straightforward to invert each gamma function to find the digital video value necessary to produce any desired output (assuming that the DAC error e is negligible). The measurements and computations required for each of these steps are discussed in more detail below. Note that before actually using the standard model, you will want to make sure it is valid for your monitor and stimuli.

13 Both Post (1) and Cowan (2) provide systematic developments of the standard model. Different authors have used different terms for the assumptions embodied by the standard model. Our nomenclature here is similar to Post's (1).
Detecting failure of the standard CRT model and possible remedies are also discussed below. Most CRT monitors have controls labeled "contrast" and "brightness" that affect the gamma function, the ambient light A(λ), and the maximum luminance. These should be adjusted before beginning the characterization process. Write the date and "do not touch" next to the controls once they are adjusted satisfactorily.

Computing Cone Coordinates
Recall that, given the spectrum of a color stimulus c, Eq. (8) allows us to compute the cone coordinates s. Measuring spectra requires a spectroradiometer. But spectroradiometers are expensive, so it is more common to use a less expensive colorimeter to characterize the spectral properties of the desired light. A colorimeter measures the CIE XYZ tristimulus coordinates of light. XYZ tristimulus coordinates are closely related to cone coordinates. Indeed, the cone coordinates s of the light may be obtained, to a good approximation, by the linear transformation

s = T x,    (9)

where

T = [  0.2420   0.8526  -0.0445
      -0.3896   1.1601   0.0853
       0.0034  -0.0018   0.5643 ]    and    x = [X, Y, Z]^T.    (10)
The matrix T was calculated from the Smith-Pokorny estimates (24,25) of cone spectral sensitivities; each cone sensitivity was normalized to a peak sensitivity of one, and the XYZ functions were as tabulated by the CIE (26,27). The appropriate matrix may be easily computed for any set of cone spectral sensitivities (14). A detailed discussion of the relationship between cone coordinates and tristimulus coordinates is available elsewhere (14,15).
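As a sketch of Eq. (9), the function below converts a colorimeter's XYZ reading to cone coordinates using the matrix T given in Eq. (10); the example reading is arbitrary.

```python
import numpy as np

# Matrix T from Eq. (10): rows give the L-, M-, and S-cone coordinates
# as linear combinations of CIE XYZ tristimulus values.
T = np.array([
    [ 0.2420,  0.8526, -0.0445],
    [-0.3896,  1.1601,  0.0853],
    [ 0.0034, -0.0018,  0.5643],
])

def cone_coordinates_from_xyz(x: float, y: float, z: float) -> np.ndarray:
    """Eq. (9): s = T x, converting a colorimeter reading to cone coordinates."""
    return T @ np.array([x, y, z])

# Example with an arbitrary colorimeter reading.
print(cone_coordinates_from_xyz(30.0, 32.0, 25.0))
```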
Computing Phosphor Intensities

Adopting matrix notation for the phosphor spectra P and intensities w,

P = [ R(λ_1)  G(λ_1)  B(λ_1)
      R(λ_2)  G(λ_2)  B(λ_2)
        ...     ...     ...
      R(λ_n)  G(λ_n)  B(λ_n) ],    w = [r, g, b]^T,    (11)

allows us to rewrite Eq. (1) [neglecting A(λ)] as

c = P w.    (12)

Using this to substitute for c in Eq. (8) yields

s = S P w = M w,    (13)

where the 3 × 3 matrix M equals S P. We can compute the matrix inverse of M and solve for w,

w = M⁻¹ s.    (14)

Equation (14) computes the phosphor intensities w (r, g, and b) that will produce the desired cone coordinates s (l, m, and s). Calculation of Eq. (14) requires knowledge of the matrix M,

M = S P = [ Σ_{i=1}^{n} L(λ_i)R(λ_i)Δλ   Σ_{i=1}^{n} L(λ_i)G(λ_i)Δλ   Σ_{i=1}^{n} L(λ_i)B(λ_i)Δλ
            Σ_{i=1}^{n} M(λ_i)R(λ_i)Δλ   Σ_{i=1}^{n} M(λ_i)G(λ_i)Δλ   Σ_{i=1}^{n} M(λ_i)B(λ_i)Δλ
            Σ_{i=1}^{n} S(λ_i)R(λ_i)Δλ   Σ_{i=1}^{n} S(λ_i)G(λ_i)Δλ   Σ_{i=1}^{n} S(λ_i)B(λ_i)Δλ ].    (15)

Each element of M is the quantal absorption rate of one cone class for one phosphor spectrum at maximum excitation. Each column of M is the set of cone coordinates s of one phosphor spectrum [Eqs. (6) and (7)]. Thus, to implement Eq. (14), it is necessary to know these cone coordinates for each phosphor. The best way to find them is to use a spectroradiometer to measure the phosphor emission spectra R(λ), G(λ), and B(λ). If such measurements are not feasible and only the XYZ tristimulus coordinates of each phosphor are available, then the matrix M may be computed using the relationship between the cone and tristimulus coordinates given in Eqs. (9) and (10).

Alternatively, if one prefers to specify color stimuli in terms of XYZ coordinates rather than cone coordinates, one may set T as the identity matrix and work directly in XYZ coordinates, starting with the colorimetric measurements and replacing the cone spectral sensitivities used to form the matrix S in the previous derivations by the CIE XYZ color matching functions.
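A minimal sketch of Eqs. (14) and (15): build M from the cone coordinates of the three phosphor spectra at maximum excitation, invert it, and solve for the phosphor intensities. The spectra here are random placeholders; in practice they are spectroradiometer measurements on the wavelength grid used above, and intensities outside [0, 1] indicate that the requested color lies outside the monitor's gamut.

```python
import numpy as np

n = 81                                   # 380-780 nm in 5-nm steps
delta_lambda = 5.0

# Placeholder sampled data; replace with measured cone sensitivities (3 x n)
# and phosphor emission spectra R(lambda), G(lambda), B(lambda) (each length n).
S = np.random.rand(3, n)
P = np.random.rand(n, 3)                 # columns: R, G, B phosphor spectra

M = (S @ P) * delta_lambda               # Eq. (15): M = S P
M_inv = np.linalg.inv(M)

def phosphor_intensities(s_desired: np.ndarray) -> np.ndarray:
    """Eq. (14): w = M^-1 s. Values outside [0, 1] are out of gamut."""
    return M_inv @ s_desired

s_target = M @ np.array([0.3, 0.5, 0.2])  # a target known to be in gamut
print(phosphor_intensities(s_target))     # -> approximately [0.3, 0.5, 0.2]
```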
Gamma Correction
Figure 4. Typical gamma function. The plot shows the gamma function for a typical CRT monitor. The solid circles show measured data. The measurement procedure is described in the text. The intensity data were normalized to a maximum of one for a digital video value of 255. The line shows the best fit obtained using Eq. (16), with parameter estimates γ = 2.11 and G_0 = 0.

Figure 4 shows a typical gamma function measurement. To obtain this curve, the spectrum of light emitted by the green phosphor was measured for 35 values of the digital video value G, where R = B = 0. A measurement of the ambient light (R = G = B = 0) was subtracted from each individual measurement. Then, for each measured spectrum, a scalar value was found that expressed that spectrum as a fraction of the spectrum G(λ) obtained for the maximum digital video value (G = 255). These scalars take on values between 0 and 1 and are the measured gamma function f_green(). Given a desired value for the green-phosphor intensity g, gamma correction consists of using the measured gamma function to find the digital value G that produces the best approximation to g. This is conceptually straightforward, but there are several steps. The first step is to interpolate the measured values. Although exhaustive measurement is possible, fitting a parametric function to the measured values has the advantage of smoothing any error in the gamma function measurements.14 For most CRT monitors, measured gamma functions are well fit by the functional form (e.g., for the green phosphor)
g = f_green(G + e_green) = [(G + e_green − G_0)/(255 − G_0)]^γ,    G > G_0,
  = 0,    otherwise.    (16)

In Eq. (16), parameter G_0 represents a cutoff digital video value below which no incremental light is emitted, and parameter γ describes the nonlinear form of the typical gamma function. The constant 255 in Eq. (16) normalizes the digital video value and is appropriate when G is specified by 8-bit numbers. To be consistent with Eq. (4), we retain e_green, which is usually negligible. The curve through the data in Fig. 4 represents the best fit of Eq. (16) to the measured data. Note that Eq. (16) requires that the phosphor intensity g = 0 when the digital video value G = 0. This is appropriate because both our definition of phosphor intensities [Eq. (1)] and our measurement procedure (see above) account explicitly for the ambient light. Equation (16) is easily inverted to determine, for example, G from g:

G = (255 − G_0) g^{1/γ} + G_0.    (17)

Because G must be an integer and the value computed by Eq. (17) is real, the computed value of G should be rounded up or down to minimize the error in g predicted by Eq. (16).

14 In a demanding application using 8-bit DACs, it may be desirable to measure light intensity accurately for all 256 digital video values and to use the measured data directly instead of fitting a smooth function, so that the DAC's static error is taken into account in the gamma correction.
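The sketch below implements Eqs. (16) and (17) for one channel, assuming e_green is negligible (as the text notes) and using the illustrative parameters γ = 2.11 and G_0 = 0 reported for Fig. 4; for a real monitor, these parameters come from fitting Eq. (16) to your own measurements, and the final step rounds up or down to minimize the predicted error, as recommended above.

```python
def gamma_function(G: int, gamma: float = 2.11, G0: float = 0.0) -> float:
    """Eq. (16): forward gamma function, digital value G -> intensity g in [0, 1]."""
    if G <= G0:
        return 0.0
    return ((G - G0) / (255.0 - G0)) ** gamma

def gamma_correct(g: float, gamma: float = 2.11, G0: float = 0.0) -> int:
    """Eq. (17): find the integer digital value G whose predicted output,
    via Eq. (16), is closest to the desired intensity g."""
    G_real = (255.0 - G0) * (g ** (1.0 / gamma)) + G0
    candidates = (int(G_real), int(G_real) + 1)
    return min(candidates, key=lambda G: abs(gamma_function(G, gamma, G0) - g))

print(gamma_correct(0.5))  # digital value whose output best approximates g = 0.5
```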
DETECTING FAILURES OF THE STANDARD CRT MODEL
The standard CRT model is useful because it generally provides an acceptably accurate description of the performance of actual CRT monitors, because measurement of its parameters is straightforward, and because it is easily inverted. Use of the standard CRT model is implicit in the information stored in ICC monitor profiles: these profiles store information about the XYZ tristimulus coordinates and gamma function for each channel (R, G, and B). For very accurate rendering, however, it is important to be aware of the assumptions that the model embodies and how these assumptions can fail. A number of authors provide detailed data on how far actual CRT monitors deviate from the assumptions of the standard CRT model (1,2,18,19,21,28-31). As noted above, the key assumption of the standard CRT model is pixel independence, expressed implicitly by Eqs. (3) and (4). Pixel independence is particularly important because, though it is easy to extend the standard CRT model to deal with dependent pixels, the extended model is hard to invert, making it impractical for fast image rendering. The next section (Extending the Standard CRT Model) briefly discusses other assumptions of the standard CRT model. It is generally possible to cope with the failures of these other assumptions because the standard CRT model can be extended to handle them in a fashion that still allows it to be inverted rapidly.

Pixel Independence
There are at least six causes for the failure of pixel independence (also see 21). Dependence on other pixels in the same frame (failures 1-4, below) has been called "spatial dependence," and dependence on pixels in preceding frames (failures 5 and 6) has been called "temporal dependence" (1). Failures 1 and 2 are short range, causing each pixel's color to depend somewhat on the preceding pixel in the raster scan. Failures 3-6 are long range, in the current frame (3 and 4) or in preceding frames (5 and 6).

1. Glitch. As mentioned earlier, the glitch (the dynamic component of the DAC error) depends on the preceding as well as the current pixel. The DAC manufacturer generally specifies that the contribution of the glitch to the average voltage for the duration of a pixel at the maximum clock rate is a fraction of a least significant step, so it is usually negligible.

2. Finite video bandwidth and slew rate. The whole video signal chain, including the DACs on the graphics card, the video cable, and the monitor's amplifiers, has a finite bandwidth, so that it cannot instantly follow the desired voltage change from pixel to pixel, resulting in more gradual voltage
transitions. Because this filtering effect precedes the nonlinear gamma function, it is not equivalent to simply blurring the final image horizontally or averaging along the raster lines. Smoothing before the nonlinear gamma function makes the final image dimmer. (Because the gamma function is accelerating, the average of two luminances produced by two voltages is greater than the luminance produced by the average voltage.) An excellent test for this effect is to compare the mean luminance of two fine gratings, one vertical and one horizontal (21,29,31,32); the sketch following this list illustrates the effect numerically. This test is available online at http://psychtoolbox.org/tips/displaytest.html. Each grating consists of white and black one-pixel lines. Only the vertical grating is attenuated by the amplifier's bandwidth, so it is dimmer. Like the glitch, this effect can be reduced by slowing the pixel clock in the graphics card or, equivalently, by using two display pixels horizontally (and thus two clock cycles) to represent each sample in the desired image. Video bandwidth is normally specified by the manufacturer and should be considered when choosing a CRT monitor.

3. Poor high-voltage regulation. The electron beam current is accelerated by a high-voltage (15 to 50 kV) power supply, and on cheaper monitors, the voltage may slowly drop when the average beam current is high. This has the effect of making the intensity of each pixel dependent on the average intensity of all of the pixels that preceded it. (The high-voltage supply will generally recuperate between frames.) You can test for such long-distance effects by displaying a steady white bar in the center of your display surrounded by a uniform field of variable luminance. Changing the surround from white to black ideally would have no effect on the luminance of the bar. To try this informally without a photometer, create a cardboard shield with a hole smaller than the bar to occlude a flickering surround, and observe whether the bar is steady. This effect depends on position: it is negligible at the top of the screen and maximal at the bottom. A single high-voltage supply generally provides the current for all three channels (R, G, and B), so that the effect on a particular test spot is independent of the channel used for background modulation. When the high voltage is very poorly regulated, the whole screen image expands as the image is made brighter, because as the increased current pulls the high voltage down, the electrons take longer to reach the screen and deflect more.

4. Incomplete DC restoration. Unfortunately, the video amplifier in most CRT monitors is not DC coupled (21). Instead it is AC coupled most of the time, and momentarily DC coupled to make zero volts produce black at the end of the vertical blanking interval. (DC, "direct current," refers to zero temporal frequency; AC, "alternating current," refers to all higher frequencies.) This is called "DC restoration," which is slightly cheaper to design and build than a fully DC coupled video circuit. If the AC
time constant were much longer than a frame, the DC restoration would be equivalent to DC coupling, but, in practice, the AC time constant is typically short relative to the duration of a frame, so that the same video voltage will produce different screen luminances depending on the average voltage since the last blanking interval. As for failure 3, this effect is negligible at the top of the screen and maximal at the bottom. However, this effect can be distinguished from failure 3 by using silent substitution. To test, say, the green primary, use a green test spot, and switch the background (the rest of the screen) back and forth between green and blue. The green and blue backgrounds are indistinguishable to the high-voltage power supply (it serves all three guns) but are distinct to the video amplifiers (one per gun).

5. Temporal dependence. This is a special case of pixel dependence, in which the pixel intensities depend on pixels in preceding frames. A typical example occurs when a full-field white screen is presented after a period of dark. Among other things, the large change in energy delivered to the screen results in distortions of the shadow mask and displayed luminances that are temporarily higher than desired. This effect may persist for several frames. It is normally difficult to characterize these phenomena precisely, especially in a general model, and the best line of defense is to avoid stimuli that stress the display. Measurement of temporal dependence requires an instrument that can measure accurately frame by frame.

6. Dynamic brightness stabilization. This is another cause of temporal dependence. Since the mid-1990s, some CRT manufacturers (e.g., Eizo) have incorporated a new "feature" in their CRT monitors. The idea was to minimize objectionable flicker when switching between windows by stabilizing the total brightness of the display. To this end, the monitor compensates for variations in the mean video voltage input to the display. This has the effect of making the intensity of each pixel dependent on the average voltage of all of the pixels in the preceding frame. This artifact, too, will be detected by the test described in item 3 above. However, this effect is independent of spot position and thus can be distinguished from failures 3 and 4 by testing at the top of the screen, where failures 3 and 4 are negligible.

Different monitors conform in different degrees to the standard CRT model. More expensive monitors tend to have higher video bandwidth and better regulation of the high-voltage power supply. If accuracy matters, it is worth performing tests like those described here to find a monitor whose performance is acceptable. A suite of test patterns for display in your web browser is available at http://psychtoolbox.org/tips/displaytest.html.
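The dimming described in item 2 follows directly from smoothing before an accelerating nonlinearity. The sketch below makes the point numerically with the gamma function of Eq. (16) (γ = 2.11, G_0 = 0, as in Fig. 4); it is an idealization that treats the bandwidth limit as complete averaging of adjacent black and white pixels of a fine vertical grating.

```python
gamma = 2.11

def intensity(v: float) -> float:
    """Normalized gamma nonlinearity applied to a normalized video voltage."""
    return v ** gamma

v_black, v_white = 0.0, 1.0

# Horizontal grating: alternate raster lines are unaffected, so the emitted
# luminances simply average.
mean_of_intensities = (intensity(v_black) + intensity(v_white)) / 2   # 0.5

# Vertical grating at the bandwidth limit: the voltages blur together first,
# and the averaged voltage is then pushed through the accelerating nonlinearity.
intensity_of_mean = intensity((v_black + v_white) / 2)                # about 0.23

print(mean_of_intensities, intensity_of_mean)  # the vertical grating is dimmer
```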
Coping with Pixel Dependence
The standard CRT model is easily extended to deal with dependent pixels, but the resulting model is hard to invert,
making it impractical for many applications. However, an extended model may be useful in estimating the errors in your stimuli. Once you understand the nature of pixel dependence, you may be able to minimize its effect by avoiding problematic stimuli. For example, the preceding-pixel effects (failures 1 and 2 of pixel independence) can be reduced by reducing the rate of change of the video signal, either by spacing the changes out (e.g., double the viewing distance, and use a block of four pixels, 2 × 2, in place of each original pixel) or by making a fine grating horizontal (parallel to the raster lines) instead of vertical. The spatial-average effects (failures 3-6 of pixel independence) can be reduced by maintaining a constant mean luminance during characterization and display. If the image to be displayed does not occupy the whole screen, then it is generally possible to mask a small section of the screen from view and to modulate the light emitted from this section to counterbalance modulations of the visible image. In this way, the total light output from the entire display may be held constant.

EXTENDING THE STANDARD CRT MODEL

Channel Constancy
As noted above, Eq. (1) embodies an assumption of channel constancy, that the relative spectrum emitted by an RGB monitor channel is independent of its level of excitation.
In our experience, channel constancy holds quite well for CRTs. However, for other display technologies, there can be significant violations of this assumption. Figure 5, for example, shows the relative spectrum of one primary of an LCD monitor (see the discussion of LCD displays later) at two digital video value levels, revealing a substantial change in the spectrum of this primary. It is possible to measure such changes systematically and correct for them. For example, Brainard et al. (33) describe a method based on using small-dimensional linear models to characterize spectral changes. This method was developed for handling spectral changes that occur when the voltage to filtered lamps is varied, but it also works well for LCD displays. Other authors have also proposed methods designed explicitly to deal with failures of channel constancy (28). Lookup table methods (see Channel Independence) may also be used to handle the effects of spectral changes.

Figure 5. LCD primary spectra. A plot of the relative spectral power distribution of the blue primary of an LCD monitor (Marshall, V-LCD5V) in the spectral range 400-600 nm, measured at two digital video values. Measurements were made in David Brainard's lab. The spectrum of the ambient light was subtracted from each measurement. The solid curve shows the spectrum for a digital video value of 255. The dashed curve shows the spectrum for a digital video value of 28, scaled to provide the best least-squares fit to the spectrum for a digital video value of 255. The plot reveals a change in the relative spectrum of the emitted light as the digital video value changes from 28 to 255. If there were no change, the two curves would coincide.

Channel Independence
In Eq. (4), the form of the gamma function for the green phosphor is g = fgreen(ugreen). A more general form for this function would be g = fgreen(ured, ugreen, ublue), so that g depends on its corresponding digital video value G and also on R and B. Failures of the simple form [Eq. (4)] are called failures of channel independence. Channel independence can fail because of inadequate power supply regulation (see item 3 under pixel independence above). Cowan and Rowell (18) and Brainard (19) describe procedures for testing for failures of channel independence. Channel inconstancy and dependence go beyond the standard CRT color model but still obey pixel independence and are within the domain of a general color transformation model. Lookup tables may be used to characterize CRT monitors whose performance violates the assumptions of channel constancy and independence. To create a table, one measures the monitor output (e.g., XYZ) for a large number of monitor digital video value triplets (i.e., RGB values). Then, interpolation is employed to invert the table and find the appropriate digital video value for any desired output. Cowan (2) provides an introduction to lookup table methods (see also 34,35). The ICC profile standard allows for storing and inverting a sampled representation of a general three-space to three-space nonlinear transform.
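To make the lookup-table idea concrete, here is a minimal MATLAB sketch. It assumes a hypothetical helper measureXYZ(rgb) that displays a digital video triplet and returns the measured CIE XYZ; the coarse 6 x 6 x 6 grid and the simple linear interpolation of the inverse mapping are illustrative choices, not a prescription from the article.

```matlab
% Sketch: build and invert a lookup table relating RGB digital video values
% to measured XYZ, without assuming channel constancy or independence.
vals = 0:51:255;                         % 6 x 6 x 6 grid of digital video values
[R, G, B] = ndgrid(vals, vals, vals);
rgb = [R(:), G(:), B(:)];
XYZ = zeros(size(rgb));
for k = 1:size(rgb, 1)
    XYZ(k, :) = measureXYZ(rgb(k, :));   % hypothetical measurement routine
end

% Invert the table: interpolate each digital video value as a function of XYZ.
Fr = scatteredInterpolant(XYZ, rgb(:, 1));
Fg = scatteredInterpolant(XYZ, rgb(:, 2));
Fb = scatteredInterpolant(XYZ, rgb(:, 3));

desiredXYZ = [30, 32, 28];               % example target tristimulus values
rgbOut = round([Fr(desiredXYZ), Fg(desiredXYZ), Fb(desiredXYZ)]);
rgbOut = min(max(rgbOut, 0), 255);       % stay within the 8-bit range
```

A finer measurement grid and a more careful interpolation scheme would be used in practice; the point is only that the inverse mapping is recovered from the table rather than from a parametric model.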
Figure 5. LCD primary spectra. A plot of the relative spectral power distribution of the blue primary of an LCD monitor (Marshall, V-LCD5V) in the spectral range 400-600 nm measured at two digital video values. Measurements were made in David Brainard's lab. The spectrum of the ambient light was subtracted from each measurement. The solid curve shows the spectrum for a digital video value of 255. The dashed curve shows the spectrum for a digital video value of 28, scaled to provide the best least-squares fit to the spectrum for a digital video value of 255. The plot reveals a change in the relative spectrum of the emitted light as the digital video value changes from 28 to 255. If there were no change, the two curves would coincide.
Spatial Homogeneity
In general, all of the properties of a CRT may vary across the face of the CRT, and, for some applications, it may be necessary to measure this variation and correct for it. Luminance at the edge of a CRT is often 20% less than in the center (19). Failures of spatial homogeneity can be even larger for projectors. Brainard (19) found that the spatial inhomogeneity of a monitor could be characterized by a single light-attenuation factor at each location. Spatial inhomogeneity does not violate pixel independence, because the light output at each pixel is still independent of the light output at other pixel locations. Spatial homogeneity may be handled by allowing the parameters of the standard model to depend on the pixel location.
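A minimal sketch of this kind of correction, under the assumption that a single light-attenuation factor per location suffices (as Brainard reported): measure a fixed test patch at a grid of screen positions, convert the readings to relative attenuation factors, and interpolate them to any pixel position. The 5 x 5 grid and the helper measureLuminanceAt are assumptions made for the example.

```matlab
% Sketch: characterize spatial inhomogeneity as one attenuation factor per location.
[xs, ys] = meshgrid(linspace(0, 1, 5), linspace(0, 1, 5));   % 5 x 5 grid of normalized positions
lum = zeros(size(xs));
for k = 1:numel(xs)
    lum(k) = measureLuminanceAt(xs(k), ys(k));               % hypothetical measurement helper
end
atten = lum / max(lum(:));            % relative attenuation factor at each grid point

% Attenuation at an arbitrary normalized pixel position (xq, yq).
xq = 0.8; yq = 0.15;
a = interp2(xs, ys, atten, xq, yq);   % ~1 near the brightest point, <1 elsewhere

% To render a desired luminance L at (xq, yq), request L / a from the standard model.
```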
Temporal Homogeneity
There are two common causes of temporal variation in many types of display: warm-up and lifetime decay. CRTs and LCD panels (as a result of their backlight) take a significant time to warm up after being turned on. This can be as long as 45 to 60 minutes, during which time the luminance of a uniform patch will gradually increase by as much as 20%. There may also be significant color changes during this period. CRTs have finite lifetimes, partly because of browning of the phosphors by X rays. This causes the phosphors to absorb some light that would otherwise have been emitted. The same process affects the backlights used in LCD panels. The magnitude of this effect is proportional to the total light that has ever been emitted from the tube. Thus, presenting very bright stimuli for long periods will cause it to happen faster. Typical magnitudes of this phenomenon can be a 50% decrease in luminance across a few thousand hours. In addition to an overall decrease in the amount of emitted light, repeated measurements of the gamma function of a 1989-vintage monochrome CRT monitor during its life revealed an increase in the cutoff parameter G0 [Eq. (16)] by about six per month (on a scale of 0 to 255) when the monitor was left on continuously. We tentatively attribute this to loss of metal from the cathode, which may affect the MTF as well.
DISCUSSION

The preceding development has specified the steps of basic display characterization. Here we touch on several more advanced points.

Further Characterization

There are other imaging properties, beyond color transformation, that you may want to characterize.

Modulation Transfer Function (MTF). The optics of the electron beam and phosphor blur the displayed image somewhat. This is linear superposition and does not violate the pixel independence assumption. (The contribution of each blurry pixel may be calculated independently, and the final image is the sum of all of the blurry pixels.) Although it is easier to think of blur as convolution by a point spread, it is usually best to characterize it by measuring the Fourier analog of the point spread, the MTF. To do this, we recommend displaying drifting sinusoidal gratings and using a microphotometer to measure the light from a small slit (parallel to the gratings) on the display. (Your computer should read the photometer once per frame.) This should be done for a wide range of spatial frequencies. Adjust the drift rate to maintain a fixed temporal frequency, so that the results will not be affected by any temporal frequency dependence of the photometer. Results of such measurements are shown in Fig. 6. The monitor's MTF is given by the amplitude of the sinusoidal luminance signal as a function of spatial frequency. When the grating is horizontal (no modulation along individual raster lines), the blur is all optical and is well characterized as a linear system that has the specified MTF. When the grating is vertical, some of the "blur" is due to the finite video bandwidth (failure 2 of pixel independence, above), and because the effect is nonlinear, the MTF is only an approximate description.

Figure 6. CRT monitor Modulation Transfer Function (MTF). The measured contrast of a drifting sinusoidal grating as a function of spatial frequency (normalized to 1 at zero frequency) for a CRT monitor, a 1989-vintage Apple High-Resolution Monochrome Monitor made by Sony. It has a fixed scan resolution of 640 x 480, a gamma of 2.28, and a maximum luminance of 80 cd/m2. As explained in the text, the contrast gain using horizontal gratings is independent of contrast and thus is the MTF. The "contrast gain" using vertical gratings, normalized by the horizontal-grating result, characterizes the nonlinear effect of the limited bandwidth of the video amplifier, which will depend on contrast. It is best to avoid presenting stimuli at frequencies at which the video amplifier limits performance, that is, where the dashed curve falls below the solid curve.

Geometry. For some experiments, for example, judging symmetry, it may be important to produce images whose shape is accurately known. For this, you may wish to measure and correct for the geometric distortions across the face of your monitor (36). Some monitors allow adjusting the geometry of the displayed image, and such adjustment may be used to minimize spatial distortions.

Ambient Light or "Flare"

Correcting for the presence of ambient light [the term A(λ) in Eq. (1)] is easy. If we want to produce C(λ), first we compute C'(λ) = C(λ) - A(λ) and simply proceed as described previously (see Characterizing with the Standard CRT Model), using C'(λ) in place of C(λ). The same correction can also be applied if one starts with cone coordinates s or tristimulus coordinates x. In these cases, one computes s' or x' by subtracting the coordinates of the ambient light from the desired coordinates and then proceeds as before. When a monitor viewed in an otherwise dark room has been adjusted to minimize emitted light when R = G = B = 0, and only a negligible fraction of its emission is reflected back to it from surfaces in the room, correcting for the ambient light is probably not necessary. Under other conditions, the correction can be quite important.
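As a small illustration of this correction in tristimulus coordinates, the sketch below subtracts the measured XYZ of the ambient light before the usual rendering step. The numerical values and the renderXYZ routine standing in for the standard-CRT-model inversion are assumptions made for the example.

```matlab
% Sketch: correct for ambient light ("flare") when rendering a target XYZ.
ambientXYZ = [0.8, 0.9, 1.1];          % measured with the monitor displaying R = G = B = 0 (assumed)
desiredXYZ = [30.0, 32.0, 28.0];       % what the observer should receive

correctedXYZ = desiredXYZ - ambientXYZ;   % light the monitor itself must emit
if any(correctedXYZ < 0)
    warning('Requested color is darker than the ambient light; it cannot be produced.');
end
rgb = renderXYZ(correctedXYZ);         % hypothetical: inversion of the standard CRT model
```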
What If I Care Only About Luminance?
For many experiments it is desirable to use just one primary and set the others to zero. Or, because observers typically prefer white to tinted displays, you may wish to set R = G = B and treat the display as monochrome, using a one-channel version of the standard CRT model.

Fine Control of Contrast
As noted above, both 8-bit and 24-bit modes on graphics cards usually use 8-bit DACs to produce the video voltages that drive the CRT. The 256-level quantization of the 8-bit DACs is too coarse to measure threshold on a uniform background. A few companies sell video cards that have more-than-8-bit DACs (a list of such video cards may be found at http://psychtoolbox.org/tips/videocards.html). There are several ways to obtain finer contrast steps using 8-bit DACs, but none is trivial to implement:

1. Use a half-silvered mirror to mix two monitors optically (37).
2. Optically mix a background light with the monitor image, for example, by using a slide projector to illuminate the face of the CRT (38).
3. For gray-scale patterns, add together the three video voltages produced by a graphics card to produce a single-channel signal that has finer steps (see 39). For color patterns, add together the corresponding ured, ugreen, and ublue voltages from two synchronized graphics cards. (Synchronizing monitors can be technically difficult, however.)
4. Move the observer far enough away so that individual pixels cannot be resolved, and use dithering or error diffusion (40,41).
5. Use a high frame rate, and dither over time.
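As an illustration of item 4, the following sketch applies simple Floyd-Steinberg error diffusion to quantize a finely specified gray-scale image to the 8-bit values the DAC can actually produce, pushing the quantization error into high spatial frequencies that a distant observer cannot resolve. This is a generic dithering sketch with invented example values, not code from the references cited above.

```matlab
% Sketch: Floyd-Steinberg error diffusion of an ideal (continuous) image
% onto the 256 levels available from an 8-bit DAC.
ideal = 128 + 0.4*randn(256, 256);      % desired values, finer than one digital step (example)
out = zeros(size(ideal));
err = ideal;                            % working copy that accumulates diffused error
for r = 1:size(err, 1)
    for c = 1:size(err, 2)
        q = min(max(round(err(r, c)), 0), 255);   % nearest producible level
        e = err(r, c) - q;                        % quantization error
        out(r, c) = q;
        if c < size(err, 2), err(r, c+1) = err(r, c+1) + e*7/16; end
        if r < size(err, 1) && c > 1, err(r+1, c-1) = err(r+1, c-1) + e*3/16; end
        if r < size(err, 1), err(r+1, c) = err(r+1, c) + e*5/16; end
        if r < size(err, 1) && c < size(err, 2), err(r+1, c+1) = err(r+1, c+1) + e*1/16; end
    end
end
```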
Use of Standard Colorimetric Data
Standard estimates of color matching functions represent population averages for color-normal observers and are typically provided for foveal viewing at high light levels. These estimates may not be appropriate for individual observers or particular viewing conditions. First, there is variation among observers in which lights match, even among color-normal observers (42,43; see also 25,27,44-50). Second, the light levels produced by typical CRTs may not completely saturate the rods. If rods participate significantly in the visual processing of the stimulus, then color matches computed on the basis of standard colorimetry will be imperfect for lights viewed outside the fovea. Some guidance is available for determining whether rods are likely to have an effect at typical CRT levels (see 51; also 13).

Unfortunately, correcting for individual variation is not trivial, because determining the appropriate color matching functions for a given individual and viewing conditions requires substantial measurement. If individual cone sensitivities or color matching functions are known, then it is straightforward to employ them by substituting them for the standard functions in the matrix S of Eq. (7). In some experiments, stimuli are presented on monitors to nonhuman observers. Here it is crucial to customize the rendering calculations for the spectral sensitivities appropriate for the species of interest (see 52). For psychophysical experiments where taking the role of rods into account is critical, one can consider constructing custom four-primary displays, restricting stimuli to the rod-free fovea, or employing an auxiliary bleaching or adapting light to minimize rod activity. For experiments where silencing luminance is of primary interest, individual variation can be taken into account by using a psychophysical technique such as flicker photometry to equate the luminance of stimuli that differ in chromaticity (15).
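Computationally, substituting custom sensitivities amounts to changing the matrix that maps the three primaries into the observer's three channel responses. The sketch below assumes the primary spectra at maximum excitation are in the columns of P and the sensitivity functions in the rows of S, both sampled at the same wavelengths; these arrays, and the target coordinates, stand in for quantities defined earlier in the article.

```matlab
% Sketch: render a desired set of cone (or other channel) coordinates using
% custom spectral sensitivities instead of the standard ones.
% S: 3 x N matrix, rows are the observer's sensitivity functions (e.g., L, M, S cones)
% P: N x 3 matrix, columns are the spectra of the R, G, B primaries at full intensity
M = S * P;                     % 3 x 3: channel responses produced by unit intensity of each primary
desired = [12.0; 9.5; 1.2];    % target channel coordinates (example values)
intensities = M \ desired;     % linear phosphor intensities, nominally in [0, 1]
if any(intensities < 0 | intensities > 1)
    warning('Requested coordinates are outside the monitor gamut.');
end
% The gamma functions of the standard CRT model then convert these linear
% intensities into digital video values.
```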
Other Display Devices
Although CRT monitors remain the dominant technology for soft copy display, they are by no means the only display technology. Liquid crystal display (LCD) devices are becoming increasingly popular and in fact are the dominant display technology used in laptop computers and data projection systems. Other emerging technologies include the digital light processor (DLP; 53) and plasma displays. Printers, obviously, are also important. Characterization of other display devices requires applying the same visual criterion of accuracy. For each type of display device, a model of its performance must be developed and evaluated. The standard CRT model is a reasonable point of departure for LCD, DLP, and plasma displays. We discuss these in more detail below. Printers are more difficult to characterize. The standard CRT model does not provide an acceptable description of printers. First, printers employ subtractive (adding ink causes more light to be absorbed) rather than additive color mixture, so that Eq. (1) does not hold. Second, the absorption spectrum of a mixture of inks is not easy to predict from the absorption spectra of individual inks. Third, the spectrum of reflected light depends on the inks laid down by the printer and also on the spectrum of the ambient illumination. The illumination under which the printed paper will be viewed is often not under the control of the person who makes the print. Lookup tables (see above) are generally employed to characterize the relationship between digital values input to the printer and the reflected light, given a particular reference illuminant. Accordingly, ICC printer profiles provide for specification of tabular data to characterize printer performance. A detailed treatment of printer characterization is beyond the scope of this article (see 2,3,35,54,55).
LCD Panels. LCDs appear commonly in two manifestations: as flat-panel equivalents of the CRT for desktop and laptop computers and as the light-control element in projection displays. When LCDs are designed to replace monitors, manufacturers have tried to imbue them with some of the characteristics of CRT displays, so LCDs have typically accepted the same analog video signal from the computer that is used to drive a CRT. However, since the 1999 publication of the Digital Visual Interface (DVI) standard (http://www.ddwg.org/; see also http://www.dell.com/us/en/arm/topics/vectors~000dvi.htm), a rapidly increasing proportion of LCD displays accept digital input. At least in principle, digital-input LCDs can provide independent digital control of each pixel and promise excellent pixel independence. LCDs can be characterized similarly to CRT displays, but bear in mind the following points:
1. Angular dependence. The optical filtering properties of LCD panels can have a strong angular dependence, so it is important to consider the observer's viewing position when characterizing LCD displays. This is especially important if the observer will be off-axis or there will be multiple observers. Angular dependence may be the greatest obstacle to the use of LCDs for accurate rendering.

2. Temporal dependencies. The temporal response of LCD panels can be sluggish, resulting in more severe violations of the assumption of temporal independence than typically observed in CRT displays. For example, when measuring the gamma function, a different result will be obtained if one measures the output for each digital video value after a reasonable settling time than if one sandwiches one frame at the test level between two frames of full output. It is important to try to match the timing of the characterization procedure to the timing of the target display configuration.

3. Warm-up. The LCD panel achieves color by filtering a backlight normally provided by a cold-cathode fluorescent device. These behave similarly to CRTs when warming up, so be prepared to wait for 45 minutes after turning one on before expecting consistent characterization.

4. Channel constancy. As illustrated by Fig. 5, some LCD displays do not exhibit channel constancy. This does not appear to be inherent in LCD technology, however. Measurements of other LCD panels (56,57) indicate reasonable channel constancy, at least as assessed by examining variation in the relative values of XYZ tristimulus coordinates.

5. Output timing. It can be important to know when the light is emitted from the display with respect to the generation of the video signals. In a CRT-based display, the video signal modulates the electron beam directly, so it is easy to establish that the light from one frame is emitted in a 10-ms burst (assuming a 100 Hz frame rate) starting somewhere near the time of the frame synchronization pulse and the top of the screen, but this need not be so in an LCD system. To evaluate the output timing, arrange to display a single light frame followed by about ten dark frames. Connect one of the video signals (say the green) to one channel of an oscilloscope and connect the other channel to a photodiode placed near the top of the screen. (Note that no electronics are needed to use the photodiode, but the signal may be inverted, and the voltage produced is logarithmically, rather than linearly, related to the incident light.) When observing a CRT, it will be possible to identify a short pulse of light about 1 ms or so wide located somewhere near the beginning of the video stream. If the detector is moved down the screen, the pulse of light will move correspondingly toward the end of the video frame. When observing an LCD panel, the signals look completely different. The light pulse is no longer a pulse but a frame-length block, and there may be a significant delay between the video stream and the arrival of the light. In fact, in some displays, the two may have no fixed relationship at all (see Fig. 7).

6. Resolution. Analog-input LCD panels (and projectors) contain interface electronics that automatically resample the video signal and interpret it in a manner suitable for their own internal resolution and refresh rate. This is desirable for easy interfacing to different computers, but the resampling can introduce both spatial and temporal dependencies that make accurate imaging more difficult. If possible, LCD displays should be run at their native spatial and temporal resolution. Even then, it is not guaranteed that the electronics pass the video signal unaltered, and one should be alert for spatial and temporal dependencies. This consideration also applies to DLP displays.

7. Internal quantization. Analog-input LCDs may actually digitize the incoming video voltages with a resolution of only 6 or 8 bits before displaying them, so be prepared to observe quantization beyond that created by the graphics card.

8. Gamma. There is no inherent mechanism in an LCD panel to create the power-law nonlinearity of a CRT; therefore a power-type function [e.g., Eq. (16)] does not work very well to describe the function relating digital video value to emitted light intensity. One way to deal with this is to measure the light output for every possible digital video value input and invert the function numerically.

LCD Projectors. Analog-input LCD displays currently offer little advantage in terms of image quality over a CRT display, but the unique problems posed by some environments, such as the intense magnetic fields inside an MRI scanner, mandate the use of a projector. Apart from CRT-based projectors, there are two types of projection displays commonly available: those based on LCDs and those based on digital light processors (DLPs). LCD projectors have properties very similar to LCD panels, except that the light source is often an arc lamp rather than a cold-cathode fluorescent device, and the pixel processing is more elaborate. Therefore, the same considerations that apply to LCD panels generally apply to LCD projectors as well. Projectors are often used in conjunction with "gain" screens, which have nonlambertian viewing characteristics, and, as for LCD panels, the light that reaches the observer varies with viewing position.

Figure 7. Delay in LCD projection. Diagram showing how the light output from the center of a projection screen, illuminated by an analog-input LCD projector, varies with time when the patch is nominally illuminated for two frames before being turned off. This is compared to the light from a CRT driven by the same video signal. Note the extra 12-ms delay imposed by the LCD projector's electronics.

DLP Projectors. Digital light processing (DLP) projectors work by using a two-dimensional array of tiny mirrors (digital micromirrors) that can be deflected by about 20° at high speed; 15-μs switching times are usual. These are used as light valves to direct the light from a high-intensity source either through the projection lens or off to the side. A color display is generated either by using three of these devices to deflect colored light filtered from the main source (as in modern LCD projectors) or by using one array operating at three times the net frame rate to deflect colored light provided by a rotating filter wheel. Typically, this rotates at about 150 Hz to give a 50 Hz display. The picture breakup that occurs when the eye makes saccadic movements makes projectors based on the filter-wheel design difficult to use. Currently, three-mirror DLP projectors are quite expensive in comparison to LCD projectors, and many are designed for large-scale projection (e.g., digital cinema). Packer et al. (58) evaluate how well a three-mirror DLP device conforms to a number of the assumptions of the standard CRT model.

Plasma. Plasma displays are much more expensive than CRTs and LCDs but are currently favored by museums (e.g., the Museum of Modern Art and the Metropolitan Museum of Art in New York City) because the plasma display is both flat like an LCD and nearly lambertian (i.e., emitted light is independent of viewing angle) like a CRT.
Instrumental Considerations
Three types of instrument are typically used to characterize display devices: spectroradiometers, colorimeters, and photometers. Spectroradiometers measure the full spectral power distribution of light. Colorimeters typically allow measurement of CIE XYZ tristimulus coordinates. Photometers measure only luminance (proportional to CIE Y). Zalewski (59) provides a detailed treatment of light measurement. Several sources (1,11,60,61; http://psychtoolbox.org/tips/lightmeasure.html) discuss instrumentation for display characterization. A spectroradiometer provides the most general characterization of the spectral properties of a display. One may compute cone coordinates or XYZ tristimulus coordinates from the spectral power distributions measured by spectroradiometers. If one wishes to use a description of color vision that differs from that specified by the CIE system (e.g., 24,25,27,42,62,63), then it is necessary to characterize the display by using a spectroradiometer. However, spectroradiometers are expensive (thousands of dollars), whereas colorimeters often cost less than a thousand dollars, and the XYZ measurements they provide are sufficient for many applications. Photometers measure only luminance and thus are suitable only for applications where the color of the stimuli need not be characterized.

How much instrumental precision is required? The answer, of course, depends on the application. A simple calculation can often be used to convert the effect of instrumental error into a form more directly relevant to a given application. As an example, suppose that we wish to modulate a uniform field so that only one of the three classes of cones is stimulated, and the modulation is invisible to the other two. Such "silent substitution" techniques are commonly used in visual psychophysics to allow isolation of individual mechanisms (e.g., individual cone types; see 64,65). Figure 8 shows an estimate of the spectral sensitivity of the human L cone along with the spectral power distribution of a typical CRT red phosphor emission. Considerable phosphor power is concentrated in the spectral interval where the slope of the L cone sensitivity function is steep. One might imagine that calculations of the L cone response to light from this phosphor are quite sensitive to imprecisions in the phosphor measurement. To investigate, we proceeded as follows. We started with measurements of a CRT's phosphor emissions that had roughly 1-nm resolution. Then, we simulated two types of imprecision. To simulate a loss of spectral resolution, we convolved the initial spectra using a Gaussian kernel 5 nm wide (standard deviation). To simulate an error in spectral calibration, we shifted the spectra 2 nm toward the longer wavelengths. We computed three stimulus modulations for each set of simulated measurements. Each modulation was designed to generate 20% contrast for one cone type and silence the other two. Then, we used the original spectral measurements to compute the actual effect of the simulated modulations. The effect is very small for the reduced-resolution case. The maximum magnitude of modulation in cone classes that should be silenced is 0.25%. However, the 2-nm wavelength shift had a larger effect. Here, a nominally silenced cone class can see almost 2% contrast. For this application, spectral calibration is more critical than spectral resolution.

Figure 8. CRT red primary emission compared with L cone sensitivity. The solid line is a plot of the spectral power distribution of a typical CRT red primary. The measurements were made in David Brainard's lab and have approximately 1-nm resolution. The dashed line is a plot of an estimate (24) of the human L cone spectral sensitivity. Both curves have been normalized to a maximum of 1. Note that considerable light power is concentrated in a region where the slope of the L cone sensitivity is steep.

One point that is often overlooked when considering the accuracy of colorimeters and photometers is how well the instrument's spectral sensitivities match those of their target functions: the X, Y, and Z color matching functions for colorimeters or luminance sensitivity (Y) for photometers. To characterize the accuracy of a photometer, for example, it is typical to weight differences between instrumental and luminance spectral sensitivity at each wavelength in proportion to luminance sensitivity at that wavelength. This means that the specified accuracy is predominantly affected by discrepancies between instrumental and luminance sensitivity in the middle of the visible spectrum, where luminance sensitivity is high. The specified accuracy is not very sensitive to discrepancies in the short- or long-wavelength regions of the spectrum, where luminance sensitivity is low. A photometer that is specified to have good agreement with the luminance sensitivity function will accurately measure the luminance of a broadband source, but it may perform very poorly when measurements are made of spectrally narrower light, such as that emitted by CRT phosphors. The indicated luminance in such a situation can be wrong by 50% or more. A final question that arises when using any type of light-measuring instrument for a CRT is whether the instrument was designed to measure pulsatile sources. The very short bursts of light emitted as the raster scans the faceplate of the CRT have peak intensities of 10 to 100 times the average intensity. This can distort the result obtained by using electronics not designed for this possibility.
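The calibration-error exercise described earlier in this section is easy to reproduce in outline. The sketch below assumes the primary spectra are in the columns of P and the cone sensitivities in the rows of S, sampled at 1-nm steps; it shifts the spectra by 2 nm, computes a modulation that is nominally L-cone-isolating under the shifted data, and evaluates the contrast it actually produces under the original data. It is a schematic of the logic, not the authors' code, and the background values are invented.

```matlab
% Sketch: how a 2-nm wavelength-calibration error corrupts a "silent" modulation.
% S: 3 x N cone sensitivities (L, M, S); P: N x 3 primary spectra; 1-nm sampling.
Pshift = [P(1, :); P(1, :); P(1:end-2, :)];   % crude 2-nm shift toward longer wavelengths

bg     = [0.5; 0.5; 0.5];              % background phosphor intensities (assumed)
Mtrue  = S * P;                        % true primary-to-cone matrix
Mshift = S * Pshift;                   % matrix implied by the miscalibrated spectra

coneBg  = Mshift * bg;                 % background cone coordinates under the shifted model
coneMod = coneBg .* [0.20; 0; 0];      % nominal 20% L-cone contrast, M and S silenced
rgbMod  = Mshift \ coneMod;            % phosphor modulation computed from the shifted model

actual = (Mtrue * rgbMod) ./ (Mtrue * bg);   % contrast actually seen by each cone class
fprintf('Nominally silenced cone contrasts: %.3f%% (M), %.3f%% (S)\n', ...
        100*actual(2), 100*actual(3));
```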
Recommended Software
ICC. For many purposes, sufficiently accurate imaging may be obtained simply by using any available commercial package to characterize the CRT and create an ICC profile and then using an ICC-compatible image display program.

Vision Research. After using a wide variety of software packages (e.g., see directories 66,67) for running vision experiments, we have come to the conclusion that it is very desirable that the top level, in which the experimenter designs new experiments, be a full-fledged language, preferably interactive, like BASIC or MATLAB. Writing software from scratch is hard. Some packages overcome this limitation by allowing users to design experiments by just filling out a form rather than actually programming. However, this limits the experiment to what the author of the form had in mind and makes it impossible to do a really new experiment. The best approach seems to be to program in a general-purpose language and write as little new code as possible for each new experiment by modifying the most similar existing program. In that spirit, the free Psychophysics Toolbox (http://psychtoolbox.org/) provides a rich set of extensions to MATLAB to allow precise control and synthesis of visual stimuli within a full-featured interactive programming environment, along with a suite of demo programs (68,69). Psychophysica (http://vision.arc.nasa.gov/mathematica/psychophysica/), also free, provides similar extensions for Mathematica (70).

Rush - "Hogging" the Machine. Today's popular computers interrupt the user's program frequently to grant
time to other processes. This is how the computer creates the illusion of doing everything at once. The difficulty with this is that the interruptions can cause unwanted pauses in the temporal presentation of a stimulus sequence. Although it is difficult to shut down the interrupting processes completely, we have found it both possible and desirable to suppress interrupts, hogging the machine, for the few seconds it takes to present a visual stimulus. In the Psychophysics Toolbox (see above), this facility is called Rush and allows a bit of MATLAB code to run at high priority without interruption.

SUMMARY

Today's graphics cards and CRT monitors offer a cheap, stable, and well-understood technology suited to accurate rendering after characterization in terms of a simple standard model. This article describes the standard model and how to use it to characterize CRT displays. Certain stimuli will strain the assumption of pixel independence, but it is easy to test for such failures and often possible to avoid them. New technologies, particularly LCD, DLP, and plasma displays, are emerging as alternatives to CRT displays. Because these technologies are less mature, there are not yet standard models available for characterizing them. Development of such models can employ the same rendering intent and color vision theory introduced here, but the specifics of the appropriate device model are likely to differ. Digital-input LCD displays promise excellent pixel independence but may be difficult to use for accurate rendering because of the high dependence of the emitted light on the viewing angle.
Acknowledgments

We thank J. M. Foley, G. Horwitz, A. W. Ingling, T. Newman, and two anonymous reviewers for helpful comments on the manuscript. More generally, we learned much about monitors and their characterization from W. Cowan, J. G. Robson, and B. A. Wandell. This work was supported by grant EY10016 to David Brainard and grant EY04432 to Denis Pelli.
BIBLIOGRAPHY

1. D. L. Post, in H. Widdel and D. L. Post, eds., Color in Electronic Displays, Plenum, NY, 1992, pp. 299-312.
2. W. B. Cowan, in M. Bass, ed., Handbook of Optics, vol. 1, Fundamentals, Techniques, and Design, McGraw-Hill, NY, 1995, pp. 27.1-27.44.
3. R. Adams and J. Weisberg, GATF Practical Guide to Color Management, Graphic Arts Technical Foundation, Sewickley, PA, 1998.
4. ICC, Specification ICC.1:1998-09, file format for color profiles, 1998, http://www.color.org/profiles.html.
5. G. Gill, ICC File I/O README, http://web.access.net.au/argyll/color.html.
6. D. Wallner, Building ICC Profiles - The Mechanics and Engineering, 2000, http://www.color.org/iccprofiles.html.
7. G. Gill, What's wrong with the ICC profile format anyway?, 1999, http://web.access.net.au/argyll/icc_problems.html.
8. D. E. Pearson, Transmission and Display of Pictorial Information, Pentech Press, Wiley, New York, 1975.
9. P. A. Keller, The Cathode-Ray Tube: Technology, History, and Applications, Palisades Press, NY, 1991.
10. T. R. H. Wheeler and M. G. Clark, in H. Widdel and D. L. Post, eds., Color in Electronic Displays, Plenum Press, NY, 1992, pp. 221-256.
11. P. A. Keller, Electronic Display Measurement - Concepts, Techniques and Instrumentation, J. Wiley, NY, 1997.
12. D. H. Brainard and D. G. Pelli, Raster graphics psychophysics bibliography, http://psychtoolbox.org/tips/rasterbib.html.
13. G. Wyszecki and W. S. Stiles, Color Science - Concepts and Methods, Quantitative Data and Formulae, 2nd ed., J. Wiley, NY, 1982.
14. D. H. Brainard, in M. Bass, ed., Handbook of Optics, vol. 1, Fundamentals, Techniques, and Design, McGraw-Hill, NY, 1995, pp. 26.1-26.54.
15. P. K. Kaiser and R. M. Boynton, Human Color Vision, 2nd ed., Optical Society of America, Washington, DC, 1996.
16. S. Daly, Soc. Inf. Display 2001 Dig. XXXII, 1,200-1,203 (2001).
17. R. A. Tyrell, T. B. Pasquale, T. Aten, and E. L. Francis, Soc. Inf. Display 2001 Dig. XXXII, 1,205-1,207 (2001).
18. W. B. Cowan and N. Rowell, Color Res. Appl. 11, Suppl. S33-S38 (1986).
19. D. H. Brainard, Color Res. Appl. 14, 23-34 (1989).
20. N. P. Lyons and J. E. Farrell, Soc. Inf. Display Int. Sym. Tech. Dig. 20, 220-223 (1989).
21. D. G. Pelli, Spatial Vision 10, 443-446 (1997); http://vision.nyu.edu/VideoToolbox/PixelIndependence.html.
22. A. B. Watson et al., Behav. Res. Methods Instrum. Comput. 18, 587-594 (1986).
23. B. A. Wandell, Foundations of Vision, Sinauer, Sunderland, MA, 1995.
24. V. C. Smith and J. Pokorny, Vision Res. 15, 161-171 (1975).
25. P. DeMarco, J. Pokorny, and V. C. Smith, J. Opt. Soc. Am. A 9, 1,465-1,476 (1992).
26. CIE, Colorimetry, 2nd ed., Bureau Central de la CIE, Vienna, 1986, publication 15.2.
27. A. Stockman and L. T. Sharpe, Color and Vision database, http://cvision.ucsd.edu/index.htm.
28. D. L. Post and C. S. Calhoun, Color Res. Appl. 14, 172-186 (1989).
29. C. Poynton, Frequently-asked questions about gamma, http://www.inforamp.net/~poynton/GammaFAQ.html.
30. R. S. Berns, R. J. Motta, and M. E. Gorzynski, Color Res. Appl. 18, 299-314 (1993).
31. A. C. Naiman and W. Makous, SPIE Conf. Human Vision, Visual Processing, and Digital Display III, 1666, 1992, pp. 41-56.
32. Q. J. Hu and S. A. Klein, Soc. Inf. Display 94 Dig. 25, 19-22 (1994).
33. D. H. Brainard, W. A. Brunt, and J. M. Speigle, J. Opt. Soc. Am. A 14, 2,091-2,110 (1997).
34. E. S. Olds, W. B. Cowan, and P. Jolicoeur, J. Opt. Soc. Am. A 16, 1,501-1,505 (1999).
35. P. C. Hung, J. Electron. Imaging 2, 53-61 (1993).
36. L. T. Maloney and K. Koh, Behav. Res. Methods Instrum. Comput. 20, 372-389 (1988).
37. C. C. Chen, J. M. Foley, and D. H. Brainard, Vision Res. 40, 773-788 (2000).
38. A. B. Poirson and B. A. Wandell, Vision Res. 36, 515-526 (1996).
39. D. G. Pelli and L. Zhang, Vision Res. 31, 1,337-1,350 (1991).
40. C. W. Tyler, Spatial Vision 10, 369-377 (1997).
41. J. B. Mulligan and L. S. Stone, J. Opt. Soc. Am. A 6, 1,217-1,227 (1989).
42. W. S. Stiles and J. M. Burch, Optica Acta 6, 1-26 (1958).
43. M. A. Webster and D. I. A. MacLeod, J. Opt. Soc. Am. A 5, 1,722-1,735 (1988).
44. J. Pokorny, V. C. Smith, G. Verriest, and A. J. L. G. Pinckers, Congenital and Acquired Color Vision Defects, Grune and Stratton, NY, 1979.
45. J. Nathans et al., Science 232, 203-210 (1986).
46. J. Neitz and G. H. Jacobs, Vision Res. 30, 621-636 (1990).
47. J. Winderickx et al., Nature 356, 431-433 (1992).
48. J. Neitz, M. Neitz, and G. H. Jacobs, Vision Res. 33, 117-122 (1993).
49. M. Neitz and J. Neitz, Science 267, 1,013-1,016 (1995).
50. J. Carroll, C. McMahon, M. Neitz, and J. Neitz, J. Opt. Soc. Am. A 17, 499-509 (2000).
51. A. G. Shapiro, J. Pokorny, and V. C. Smith, J. Opt. Soc. Am. A 13, 2,319-2,328 (1996).
52. L. J. Fleishman et al., Anim. Behav. 56, 1,035-1,040 (1998).
53. L. J. Hornbeck, Digital light processing for high-brightness, high-resolution applications, Texas Instruments, 1997.
54. R. W. G. Hunt, The Reproduction of Colour, 4th ed., Fountain Press, Tolworth, England, 1987.
55. B. A. Wandell and L. D. Silverstein, in S. K. Shevell, ed., The Science of Color, 2nd ed., Optical Society of America, Washington, DC, 2001.
56. M. D. Fairchild and D. R. Wyble, Colorimetric characterization of the Apple Studio Display (Flat Panel LCD), Munsell Color Science Laboratory Report, Rochester Institute of Technology, Rochester, NY, 1998; http://www.cis.rit.edu/mcsl/research/reports.shtml.
57. J. E. Gibson and M. D. Fairchild, Colorimetric characterization of three computer displays (LCD and CRT), Munsell Color Science Laboratory Report, Rochester Institute of Technology, Rochester, NY, 2001; http://www.cis.rit.edu/mcsl/research/reports.shtml.
58. O. Packer et al., Vision Res. 41, 427-439 (2001).
59. E. F. Zalewski, in M. Bass, ed., Handbook of Optics, vol. 2, McGraw-Hill, NY, 1995, pp. 24.3-24.51.
60. R. S. Berns, M. E. Gorzynski, and R. J. Motta, Color Res. Appl. 18, 315-325 (1993).
61. D. H. Brainard and D. G. Pelli, Light measurement instrumentation, http://psychtoolbox.org/tips/lightmeasure.html.
62. J. J. Vos, Color Res. Appl. 3, 125-128 (1978).
63. A. Stockman, D. I. A. MacLeod, and N. E. Johnson, J. Opt. Soc. Am. A 10(12), 2,491-2,521 (1993).
64. O. Estevez and H. Spekreijse, Vision Res. 22, 681-691 (1982).
65. D. H. Brainard, in P. K. Kaiser and R. M. Boynton, eds., Human Color Vision, Optical Society of America, Washington, DC, 1996, pp. 563-579.
66. LTSN Psychology, http://www.psychology.ltsn.ac.uk/search-resources.html.
67. F. Florer, Powermac software for experimental psychology, http://vision.nyu.edu/Tips/FaithsSoftwareReview.html.
68. D. H. Brainard, Spatial Vision 10, 433-436 (1997).
69. D. G. Pelli, Spatial Vision 10, 437-442 (1997).
70. A. B. Watson and J. A. Solomon, Spatial Vision 10, 447-466 (1997).

DYE TRANSFER PRINTING TECHNOLOGY

NOBUHITO MATSUSHIRO
Okidata Corporation
Gunma, Japan

INTRODUCTION

This article describes the structure and operation of dye-sublimation thermal-transfer printing technology. In addition to the structure and operation, the capabilities and limitations of the technology, such as resolution, speed, intensity levels, color accuracy, and size, are described. Though the subject of this article is the dye-sublimation printer, the details of other thermal-transfer printers are included to assist in understanding and comparing the capabilities and limitations of the dye-sublimation printer.

HISTORY OF PRINTERS USING DYE TRANSFER
Dye-transfer printers form images on sheets using different types of energy, such as mechanical pressure, heat, and light. Impact printers use mechanical pressure, whereas nonimpact printers use heat and light. In this article, the histories of both impact and nonimpact printers are reviewed; this is important for understanding the position of the dye-sublimation printer in the history of dye-transfer printers. The history that follows covers both impact printers and nonimpact printers, the latter of which include the dye-sublimation printer.
Impact Printers Using Dye Transfer
1960 Development of the horizontal chain line printer, the original line printer.
1961 The golf ball printer established the position of the serial impact printer.
1970 Development of the daisy wheel printer, a typeface-font serial impact printer, the successor to the golf ball printer.
1970 The dot matrix printer, the mainstay of current serial impact printers.
Nonimpact Printers Using Dye Transfer
1930s Development of wax-type thermal-transfer recording paper; development of the early sublimation dye-transfer method.
1940s Development of heat-sensitive paper using almost colorless metallic salts.
1950s Commercialization of colorant melting by exposure to infrared radiation; disclosure of the patent for the thermal printer; development of a prototype electrosensitive transfer printer.
1960s Development of the leuco-dye-based heat-sensitive recording sheet, a current mainstay in the market.
1969 Development of the heat-mode laser-based transfer printer.
1970s The dye-sublimation printer, electrosensitive transfer printer, and wax-melt printer were developed in succession.
1971 The dye-sublimation printer was developed.
1973 The electrosensitive transfer printer was developed.
1975 The wax-melt printer was developed.
THERMAL-TRANSFER PRINTERS

In principle, images are formed thermally via physical or chemical reactions of solid inks by using a heating element. The principle of image formation by the thermal-transfer printer is depicted in Fig. 1, which shows the dye-sublimation printer, the most representative of the thermal-transfer printers. The heat-generating elements are heated by applying an input voltage. The heated area of the ink that coats the base material sublimes onto the recording sheet, thereby forming images. The advantage of the thermal-transfer printer is that it does not require an ink supply, an ink recovery mechanism, or a clog recovery system, as ink-jet printers do. It requires only an ink-sheet or coloring-sheet supply mechanism, which makes the recording and printing mechanisms simple. Due to these advantages, small printers are possible. By controlling the amount of heat applied, gradation control by dot density control is also possible; high-quality images that have high resolution and gradation can be produced. The disadvantage of the thermal-transfer printer is that it requires special paper; accordingly, the operating cost is higher than that of printers that use plain paper. The dye-sublimation printer and the wax-melt printer are representative types of thermal-transfer printers.

DYE-SUBLIMATION PRINTER

This is one of the most promising printers in recent years. Sublimation is a process by which a solid is transformed directly into a vapor without passing through a liquid phase. This cycle forms the basis of dye-sublimation printing.

Figure 1. Basic structure of the dye-sublimation process.

Basic Structure

In the dye-sublimation printer, as shown in Fig. 1 and described in the outline, ink that contains a dispersed dye is applied to ink sheets. Upon heating, the dye is transferred to the sheet and then resolidifies at room temperature. Therefore, dry-process printing is possible.

Sublimation Dye, Ink Sheet, and Recording Sheet
The very fact that sublimation dyes sublime is in itself proof that they are essentially unstable. At the same time, however, recorded materials must be stable, which is contradictory to the nature of sublimable dyes. Accordingly, the selection of dyes is the most important factor in determining the success or failure of ink-sheet manufacturing. In the early 1980s, dispersed dyes and materials related to the dispersed dye group were used as ink dyes. Thereafter, dyes that have the characteristics required for hard copies, such as light resistance and color reproducibility, were developed. In addition to special regularly dispersed dyes and basic dyes, new dyes that have the same molecular structure as the pigments used for color photosensitive materials have recently been developed. To reduce the cost of printing systems, multiple-transfer ink sheets have been developed. In these sheets, large quantities of sublimable dyes are included in a thermoplastic resin layer; this makes up to 10 printing cycles possible from the same sheet at identical density. The issue of sublimable dyes is most important for sublimation thermal-transfer recording. Progress in the area of materials, development of dyes that have appropriate subliming characteristics, characteristics for dispersing by diffusion, high absorption coefficients, severe-weather tolerance, and good saturation, as well as progress in the area of recording sheets, are the most important issues in the development of sublimation printers. To reproduce the full color spectrum, ink sheets are used onto which dyes of the three primary colors, cyan, magenta, and yellow, are applied. In addition to these three primary colors, black is used. The reproduction of colors is realized by subtractive color mixture. In addition, the dyes applied onto the recording sheet must have high linearity with respect to the applied heat and many levels of gradation. The energy required for transfer from this type of ink sheet and recording sheet is greater than that in the wax-melt process. The requirements for the ink sheets with respect to the thermal head are that (1) the ink sheets should not degrade the thermal head, (2) the ink sheets should not adhere to the thermal head, and (3) no ink should remain on the thermal head. Furthermore, the heat-resistant layer itself should not affect the properties of the dye layer. The requirement for the recording sheets onto which the sublimable dye is transferred, with respect to the thermal head, is that the recording sheets should not degrade the thermal head with hard chemical compounds included in the recording sheets.
Sublimation Recording
The density gradation method, in which the dots themselves possess gradations, can be realized when using sublimation thermal-transfer sheets (ink sheets and recording sheets). The essence of sublimation thermal transfer is that density modulation is possible for each dot. The density of each dot can be controlled by the pulse width of the input signals; thus, by increasing or decreasing the joule heat, the amount of dye contained in the ink layer that is sublimed and transferred to the recording sheet is modulated. This control itself is an analog process. The temperature of the thermal head reaches 200-300 °C momentarily; thus it is essential that the ink sheets be made of heat-resistant materials. Only polyester films are available as low-cost sheets that have heat-resistant properties and strength. However, polyester films have short useful lives under these conditions. A method to incorporate layers that have improved longevity is being studied and should soon be used.
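Because the printed density is an analog, monotonic function of the heating pulse width, a controller can linearize tone reproduction by inverting a measured density-versus-pulse-width curve. The MATLAB sketch below does this by table lookup and interpolation; the measured data points and the 8-bit input convention are illustrative assumptions, not values from the article.

```matlab
% Sketch: choose a heating pulse width that produces a requested dot density.
pulseWidthMs = [2 4 6 8 10 12 14 16 18 20];                           % tested pulse widths (ms), assumed
measuredD    = [0.05 0.15 0.35 0.60 0.90 1.20 1.45 1.65 1.80 1.90];   % measured optical densities, assumed

pixel   = 180;                                % 8-bit input value
targetD = (pixel/255) * max(measuredD);       % map the input linearly onto the achievable density range

% Invert the measured curve: density in, pulse width out.
pw = interp1(measuredD, pulseWidthMs, targetD, 'linear');
fprintf('Drive the heating element for %.2f ms\n', pw);
```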
Features

Because the recording system can reproduce 64 or more gradations, it is used in pictorial color printers. However, due to changes in environmental temperature and heat accumulation in the thermal head, the thermal balance can be misdistributed, and errors in recording density can result. These effects have a serious impact on sublimation systems that use analog gradation recording: they can cause nonuniform density in the images and also degradation of image quality, including color reproducibility, resolution, and MTF. The recording sensitivity is one-third that of thermal-sensitivity systems, and the operating cost of these systems is high due to the requirements of the image-processing features; furthermore, they are not suitable for high-speed recording. Although their operating cost is high, because of their excellent color reproducibility, high density level (optical density value: 2.0), and high tone image (gradation: 64 or higher), they have been incorporated into products that require high image quality, such as digital color proofing systems, video printers, overhead projectors, and card-transferring machines.

Advantages and Disadvantages

The advantages of sublimation thermal-transfer systems include the fact that they can reproduce images that have many gradations. The highest possible range of gradations is 256 levels. They also offer high-density images and produce excellent gradations in low-density areas. Resolution can be improved by reducing the dot diameter. Furthermore, due to the spreading of the transferred ink, the grainy nature of dots disappears, rendering images smooth. In addition, due to the simple printing mechanism, the printer can be fabricated compactly and inexpensively. The dye-sublimation printer creates a continuous tone and can produce color pictures whose image quality is comparable to that obtained by photographic systems. Thus, this system is suitable when printing quality close to that of photographic images is required, such as in color proofing and production of pictorial color copies. In addition, this system can be used for applications ranging from personal use, such as video printing, to professional outputs that involve medical imaging and measurement devices. However, prints have some disadvantages: poor durability, retransfer of transferred dyes, and early degradation of sections touched by fingers. Various methods to alleviate these problems have been developed for practical use. Some representative examples, such as a method to cover the entire recording sheet with a protective layer that contains UV absorbents and a method to stabilize the characteristics of the staining agent itself, have been effective. Another disadvantage is the high price of sheets; however, sheets for multiple transfers, which can be used several to ten times, have been developed.

WAX-MELT PRINTERS
In the wax-melt printer, when a sheet coated with a heat-meltable ink is heated by signals applied to the thermal head, the ink is transferred onto a recording sheet, and images are formed. Because the density characteristics of this recording system are binary, the dither method or another similar method is used to achieve gradation. Thus, the resolution of this method tends to be low. The melt-type thermal-transfer system produces solid and clear images; thus, the system is used to prepare bar codes.
Basic Structure
The mechanisms of the two thermal-transfer printers are compared in Fig. 2. The major difference between the thermal-transfer mechanisms of the two systems is that, compared with the wax-melt printer, the dye-sublimation printer requires higher thermal energy.
Wax Melt, Ink Sheet, and Recording Sheet
The ink used in the wax-melt printer is a wax containing pigments whereas the dye-sublimation printer uses a resin that contains a sublimable dye. Ink which coats sheets is produced by dispersing coloring materials such as pigments in a wax medium such as paraffin wax. Pigments are used as coloring materials to provide color tones to recorded materials; pigments whose characteristics are very close to those of printing ink can be used. Vehicles function as carriers for pigments during heating, melting, and transferring by the thermal head. At the same time, they cool and bind the pigments onto the recording sheets. Paraffin wax, carnauba wax, and polyethylene wax are used as vehicles. The wax-melt printer employs more stable dyes and pigments as coloring materials than the dye-sublimation printer for high-quality images. Plain paper can be used as the recording sheet for the wax-melt printer, whereas the dye-sublimation printer requires special recording sheets.
Figure 2. Wax-melt thermal transfer and dye-sublimation thermal transfer.
Recording Features
The image produced by melt transfer has excellent sharpness; the edges of images are the sharpest among the images obtained by all types of recording systems.

THERMAL HEAD
Requirements

The following basic characteristics are required for the thermal head:

1. For high-speed printing, the thermal head must have properties that allow rapid heating and cooling. To meet these requirements, the thermal head must have low heat capacity.
2. For high-resolution printing, high-density resistor patterns must be realized.
3. For extended life, the thermal head must be durable throughout continuous rapid heating and cooling cycles.
4. For reduced power consumption, the heat energy supply must be highly efficient.

Structures and Features

Some types of thermal heads have been developed that meet these requirements. Thermal heads can be classified as (1) semiconductor heads, (2) thick-film heads, and (3) thin-film heads. A semiconductor head has not yet been practically applied because of its slow speed resulting from poor thermal response. A thick-film head is the most suitable for large printers. The heating element is formed using screen-printing and sintering technology. The advantage of this type of head is that the fabrication process is simple, and thus it is suitable for mass production. The thick-film head is used practically for large (A0 paper size) sheet printers and is frequently used for heat-sensitive recording by facsimile machines. The thin-film head has an excellent thermal response and thus is suitable for high-speed printing. It is frequently used in sublimation thermal transfer and melt transfer. However, the production of large-sized thin-film heads is difficult, and the manufacturing process is complex. Figure 3 shows the structure of thick-film heads and thin-film heads. In sublimation transfer, the conduction time of the current to the thermal head is 2-20 ms, and that in melt transfer is approximately 1 ms. The difference in duration arises from the difference in recording processes, which is reflected in the recording speed.
Temperature Control
For ideal temperature control, when an input voltage signal is applied, the thermal head temperature is controlled at a preset temperature, and when the input signal is set to zero, the thermal head temperature returns to its original value immediately. This kind of control is not easily realized because of heat hysteresis (Fig. 4); however, several methods that use special arrangements have been proposed.

Concentrated Thermal-Transfer Head
Figure 5 shows the pattern of the heating elements of the thermal head used in the concentrated thermal-transfer system. As shown in Fig. 6, the thermal head in this system has many narrow regions. When using a heating element of this type, high-temperature sections are created in the high-resistance section, and heat transfer starts at areas centering around these high-temperature sections. As the applied energy is increased, the transferred areas
Figure 3. Thick-film and thin-film heads.

Figure 4. Transition of the surface temperature of a thermal head.
expand around these centering points. By incorporating a sharp heat-generating distribution in the narrow regions, area-based gradation, in which the transfer recording area within one pixel of the heating element is altered, has been realized. This system can also be used with melt ink sheets. The thermal head is the major feature of this system.

Figure 5. Heating elements of conventional and concentrated thermal heads.

Figure 6. Concentrated thermal-transfer head.

Ultrasonic Thermal-Transfer Head

In this system, heat is generated by the impact of ultrasonic vibrations, and the ink is melted and transferred. This system produces high-quality images at high speed.

Thermal Issues

Thermal issues related to the thermal head are as follows:
1. Issues arising from the temperature increase of the thermal head. These issues include the operating temperature limit of the thermal head and the deterioration of the heat-resistant layer on the thermal head side of the ink sheet. To cope with these problems, various arrangements related to the heat radiation fin have been incorporated.
2. Issues arising from the thermal conditions of the thermal head. These issues are related to the problems arising from the constant changes in the thermal head temperature during printing. In particular, measures that counteract transient temperature changes are required. These measures include optimum control by temperature detection using thermistors and temperature prediction using hysteresis data from gradation printing.

Wear Issues
The thermal head maintains contact with the heat-sensitive recording sheet, and the thermal head wears due to frictional contact with the heat-sensitive recording sheet. In particular, if the heat-sensitive recording sheet contains hard chemical compounds, wear occurs rapidly. In addition to this type of mechanical wear, chemical wear also occurs. Because the surface of the thermal head is glassy, it is corroded by alkali ions and other substances contained in the heat-sensitive recording sheets. Utmost attention must be paid to the causes of wear in developing thermal heads.
Resolution Progress
Resolution at 150 dpi (1980), 200 dpi (1987), 400 dpi (1996), and 600 dpi (1998) has been introduced for thermal heads. A 1200-dpi thermal head may be introduced in the 2000s.
DRIVING METHOD

Line Sequential Method
As shown in Fig. 7, two driving methods are available. In the serial line sequential method, the thermal head is shifted for each color, and the recording sheet is gradually moved forward. In the parallel line sequential method, each color is aligned in parallel, the thermal head is shifted in parallel, and the recording sheet is moved forward. Area
Sequential
Figure
8. Area
sequential
method.
Method
As shown in Fig. 8, transfer sheets that are the same size as the recording sheets are continuously aligned, and transfer is carried out onto the entire area of the sheet. This method has high processing speed; and currently, most printers use this method. Typical
Typical Configuration
Figure 9 depicts a typical configuration of a thermal-transfer printer. The system shown in Fig. 9 is the swing type; as the recording sheet completes recording
Figure 9. Typical configuration of a thermal-transfer printer.
in one color, the sheet returns to its initial position. This type of printer is inexpensive and compact. A drum-winding system and a three-head system also exist. In the former, registration is relatively simple, and return of the recording sheet is not required; this contributes to a reduction in recording time. The three-head system is suitable for high-speed recording.
ITEMS RELATED TO IMAGE PROCESSING
Reproduction of Density Gradation
The reproduction of color density by various printers can be classified into three methods, as illustrated in Fig. 10. In the first method, the density of each pixel changes. In this system, each pixel can be given a continuous change of color density, and nearly-perfect gradation can be reproduced over the entire gradation range. In the second method, an area is covered by a fixed number of dots whose color density is constant and whose dot size changes in relation to the density. In this method, the area of ink-covered dots changes; thus, when an image produced by this method is observed from a distance at which each dot in the image is not observable due to the resolution
limit of the human eye, a coherent density change can be observed. This method is called the dot-size variable multivalue area pseudodensity reproduction method. In the third method, the number of dots whose dot density and size are constant is changed in an area; this is called the bilevel area method.
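The bilevel area method lends itself to a compact illustration. The sketch below is not from the article; the Bayer threshold matrix and the function name are illustrative assumptions showing how the number of printed dots per cell tracks the requested density:

    import numpy as np

    # 4x4 ordered-dither thresholds, normalized to [0, 1).
    BAYER_4X4 = np.array([[ 0,  8,  2, 10],
                          [12,  4, 14,  6],
                          [ 3, 11,  1,  9],
                          [15,  7, 13,  5]]) / 16.0

    def bilevel_halftone(gray):
        # gray: 2-D array of values in [0, 1]; returns a binary dot pattern
        # in which 1 means "print a dot" and 0 means "leave blank".
        h, w = gray.shape
        thresholds = np.tile(BAYER_4X4, (h // 4 + 1, w // 4 + 1))[:h, :w]
        return (gray > thresholds).astype(np.uint8)

    # A constant 50% gray patch becomes a pattern in which half of the
    # fixed-size dots are on, so the average coverage approximates the
    # requested density when viewed from a distance.
    print(bilevel_halftone(np.full((8, 8), 0.5)))

Viewed from far enough away that individual dots are not resolved, the fraction of printed dots per cell is perceived as the corresponding gray level.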
Color Reproduction Range
Figure 11 shows the color reproduction range on the CIE x, y chromaticity diagram. The three primary colors of sublimable dyes, Y (yellow), M (magenta), and C (cyan), have almost the same color reproduction range as that realized by offset color printing; they are suitable for use in full-color printer recording systems.
VARIOUS IMPROVED SYSTEMS
Improvement in Processing Speed
Generally, thermal-transfer color printers are inferior to electrophotographic printers in processing speed. In this context, a one-pass full-color system has been developed. Figure 12 shows the construction of such a system. The one-pass system has four independent recording sections, Y, M, C, and K. The recording sheet is forwarded to the four sections in order. The disadvantage of this system is that the sheet-forwarding section is complex and, therefore, color deviations tend to develop easily. Even if the rotational speed of the drive roller is constant, the speed of the recording sheet varies due to the load and back-tension during forwarding, which results in color deviations. The correction of back-tension is insufficient to eliminate color deviations. Color deviations are prevented by detecting the speed of the recording sheet directly using a detection roller and controlling the drive roller to maintain a constant recording sheet speed. This is one of the countermeasures against color deviation.
Improvement of Durability of Dye-Sublimation Prints
The durability of images is poor in dye thermal transfer, due to the inherent nature of the dyes themselves. For example, discoloration due to light irradiation is a problem. To correct this problem, the following measures are adopted:
Figure 11. Color reproduction range (CIE x, y).
• A compound that reacts with a transferred dye and stabilizes the resulting substance is mixed in the recording layer.
• The recording layer is hardened after image recording.
• An ultraviolet-absorbing material or similar substance is added to the recording layer.
• A protective layer is applied as the surface layer.
Figure 12. One-pass full-color system.
Improvement of Wax-Melt Printer
Sheet-Recycling Printer
Used sheets must be discarded in thermal-transfer recording; this is problematic. In the sheet-recycling system, ink for transfer is pumped from a heated ink tank using a heated roller and is applied uniformly to an ink sheet. The ink on the used sheet is scraped off by the heated roller, returned to the ink tank, and reused.
Thermal Rheography
By introducing the idea of ink-jet recording, the disadvantage of the wax-melt printer, that is, the rate at which ink sheets are consumed, has been reduced. In this system, a small hole is made at the center of the resistor in the thermal head, and solid ink, which is melted by heating, is transferred onto the recording sheets through the hole.
OTHER PRINTERS BASED ON THERMAL TRANSFER
Electrosensitive Transfer Printer
In the electrosensitive transfer printer, colored materials are transferred using Joule heat from electric conduction through a coated layer of high electric resistance, which is placed under the sublimable dye or thermal-transfer ink layer (Fig. 13). The amount of ink transferred in this system can be changed in accordance with the duration of electric conduction, and expressions of gradation are possible in each pixel. Currently, this system is not popular because the production cost of recording materials for this system is high compared with that of other systems.
Light-Sensitive Microcapsule Printer
The capsules for this system are hardened by UV exposure, unexposed capsules are crushed by pressure, the pigments in the capsules are exposed and transferred onto the sheet,
Figure 13. Structure of electrosensitive transfer.
and images are produced. The wall material of the capsules is a polymer of urea and formaldehyde. Acrylic monomers, a polymerization initiator, spectral sensitizers, and leuco dyes are contained in the capsules. Each capsule is 5–8 µm in diameter. The recording sheet contains a developer. After exposing capsule-containing sheets to UV light, the sheets are superimposed on recording sheets and put through a pressure roller. Although exposed and hardened capsules are not crushed, the unexposed capsules are crushed, and the leuco dyes are transferred onto the recording layer. At this point, the developer and dyes in the capsules make contact with each other, and colors are produced.
Micro Dry-Process Printer
Basically, the process is the same as that used in the conventional melt thermal-transfer system; however, in micro dry processing, to realize both an increase in contact pressure between the sheet surface and the ink and retention of ink on the sheet surface, ink penetration is suppressed by using a high-viscosity ink. The high-viscosity ink transfer process is explained here. Micro dry-process printing consists of the following four processes.
1. Ink pressurization. Conventionally, ink in which the main component is an ethylene vinyl acetate copolymer has been used as a thermoplastic material that is resistant to high pressurization.
2. Ink heating. To realize high-quality printing, the heat responsiveness of the thermal head and ink has been improved. The thermal head and ink are prepared using thin films, thus decreasing the heat capacity as well as the heat conduction delay.
3. Ink pressurization and cooling. Using a resin-type thermoplastic ink that has a high viscosity to control ink characteristics in accordance with temperature changes during pressurization and cooling is important; excellent characteristics are achieved.
4. Base peel-off. A design in which importance is placed on the conditions under which pressurized ink sections are transferred and nonpressurized ink sections are not transferred is adopted.
Dye Transfer Printer Using a Semiconductor Laser as a Heat Source
Image resolution in a system that uses a thermal head as the heat source depends on the level of integration of heating elements in the thermal head. Even if higher resolution is desired, there is a limitation in the fabrication of the thermal head. A dye-transfer printer using a laser as a heat source is being studied to resolve this problem. The dye sheet used in the printer is basically the same as that used in conventional dye-transfer printers; it consists of a base film and a dye layer, and the recording sheet consists of a base sheet and a dye-receiving layer. In addition, it is necessary to incorporate a layer that
Figure 14. (a) Surface-absorbing type. (b) Dye-donor-layer-absorbing type.
converts laser light to heat in the dye sheet. Figure 14 shows two types of printing media used for this system. In contrast to the layout in Fig. 14a, in the construction of Fig. 14b, a layer that contains an infrared-ray absorbent and selectively absorbs semiconductor laser waves is included, and part of the laser beam is absorbed in the dye layer. By appropriately setting the parameters related to absorption, the amount of dye transferred can be increased. The thermal-transfer recording system which uses a laser as its heat source is mainly used for color proofing in printing processes. Some systems use melt and sublimation thermal transfer. The characteristics of the melt thermal-transfer system provide sharp dot matrix images. For sublimable dyes, a problem of color reproducibility arises from the difference between the dye colors and the colors used in the recording ink. When using a laser, a relatively high-power laser is required because this is a heat-mode system. Previous research used a gas laser. However, in recent years, a high-power semiconductor laser has been developed that contributed to the development of compact optical disc recording systems. There are two types of image creation systems in laser drawing: with respect to the recording surface on a drum, one is a system in which the laser from a fixed source is scanned using a mirror, and the other is a system in which the laser itself moves and emits laser beams. The latter system is popular for thermal-transfer recording with low recording sensitivity. A resolution of 2540 dpi or higher has now been realized. Resolution is the most important factor in laser recording systems. Multihead systems in which multiple lasers are used are being studied as a countermeasure for the shortcoming of the semiconductor laser system, namely, the time required for laser recording.
DIRECT THERMAL PRINTER
Coloring agents in the direct thermal printer are applied in advance to the recording sheet itself. When heat is applied locally onto the sheet using a thermal head, physical and chemical coloring forms images. Currently, this system is widely used in simple printers such as facsimiles, word processors, and personal computers. In addition, this system is used in printers of measuring instruments, POS labeling, and medical equipment output systems.
This system has advantages over other systems for these reasons. It uses special sheets on which a coloring agent is applied; thus excellent coloring and image quality can be obtained. The system is so simple that the product is compact, lightweight, and inexpensive. Reliability is excellent, and maintenance-free products can be realized. The only supplies required are recording sheets; handling and storage are easy. By altering the temperature of the heating element, the color density of each dot can be changed; thus, high-resolution images with gradation can be obtained. The color density difference is great, and backgrounds that have a high level of whiteness can be obtained; thus it is possible to realize images that have good contrast. High-speed printing is possible.
The following issues have been raised regarding the system. The system uses sheets onto which special coloring agents are applied; thus, compared with a system using plain paper, the cost for sheets is high. The fixing characteristics of the ink on the heat-sensitive recording sheets are poor. Color change and discoloration are easily caused by heat, light, pressure, temperature, and chemical agents. The storage of paper after printing must be managed carefully; the output from this system is not suitable for long-term storage.
CONCLUSIONS
Dye-sublimation printers using thermal heads were among the most commercially important printers in the first half of the 1990s, and the system has since been improved in performance. The resolution of the thermal head has been improved from 400 to 600 dpi. Further effort is being put forth, aimed at high precision and low cost. The problem of image storage properties, which was the crucial issue for dye thermal transfer, has been solved substantially by stabilizing the dye itself by chemical reactions. Research in the area of recording materials continues in the search for better materials. In the area of heat sources, a high-power laser has made it possible to produce ultrahigh-precision images, such as those with resolutions from 3600 dpi to 4000 dpi, thus contributing to the improvement in printing-plate fabrication by replacing thermal heads.
E
ELECTROENCEPHALOGRAM (EEG) TOPOGRAPHY
WOLFGANG SKRANDIES
Institute of Physiology, Justus-Liebig University, Giessen, Germany
THE FUNDAMENTAL THEORY OF ELECTROENCEPHALOGRAPHY (EEG)
Physiological and Functional Considerations of Spontaneous EEG
Neurons, the nerve cells in the brain, have a membrane potential that may change in response to stimulation and incoming information. Excitation is generated locally as intracellular depolarization, and it can be propagated over longer distances to nuclear structures and neurons in other brain areas via the nerve cells’ axons by so-called “action potentials”. The human brain consists of billions of neurons that are active spontaneously and also respond to internal and external sensory stimuli. Due to synaptic interconnections, brain cells form a vast complex anatomical and functional physiological structure that constitutes the central nervous system where information is processed in parallel distributed networks. At the synapses, chemical transmitters induce membrane potential changes at connected, postsynaptic neurons. In this way, information in the central nervous system is coded and transmitted either in the form of frequency-modulated action potentials of constant amplitude or as analog signals where local membrane potential changes occur gradually as postsynaptic depolarization or hyperpolarization. The cellular basics of neuronal membrane potentials and conductance have been studied by physiologists since the 1950s (1), and meanwhile much is also known about the molecular mechanisms in ionic channels and inside neurons [see overview by (2)]. Such studies employed invasive recordings using microelectrodes in animals or in isolated cell cultures where activity can be picked up in the vicinity of neurons or by intracellular recordings. Human studies have to rely on noninvasive measurements of mass activity that originates from many neurons simultaneously (the only exception is intracranial recordings in patients before tumor removal or surgery for epilepsy). The basis for scalp recordings is the spread of field potentials that originate in large neuronal populations through volume conduction. As described before, nerve cells communicate by changes in their electric membrane potential and dendritic activation (excitation, inhibition) due to local chemical changes and ionic flow across cell membranes. It is now well established that postsynaptic activation [excitatory postsynaptic potentials (EPSP) or inhibitory postsynaptic potentials (IPSP)], not action potentials, constitutes the physiological origin
of scalp-recorded electroencephalographic activity (3,4). Scalp electrodes typically have a diameter of 10 mm that is very large compared to the size of individual neurons (about 20 µm). Thus, the area of one electrode covers about 250,000 neurons, and due to spatial summation and the propagation of activity from more distant neural generators by volume conduction (5), many more nerve cells will contribute to mass activity recorded from the scalp. To detect brain activity at some distance using large electrodes, many neurons must be activated synchronously, and only activity that originates from “open” intracranial electrical fields can be assessed by scalp recordings. In contrast to this, “closed” electrical fields are formed by arrays of neurons arranged so that electric activity cancels. This constellation is found in many subcortical, network-like structures that are inaccessible to distant mass recordings. The major neuronal sources for electrical activity on the scalp are the pyramidal cells in the cortical layers that are arranged in parallel perpendicular to the cortical surface which, however, is folded in intricate ways so that it cannot be assumed that the generators are perpendicular to the outer surface of the brain. Brain activity recorded from the intact scalp has amplitudes only in the range up to about 100–150 µV. Thus, these signals often are contaminated by noise that originates from other body systems such as muscle activity (reflected by electromyogram, EMG), eye movements (reflected by electrooculogram, EOG), or the heart (reflected by electrocardiogram, ECG) which often is of large amplitude. Line power spreading from electric appliances in the environment (50 or 60 Hz) is another source of artifact. For this reason, it is not astonishing that extensive measurements of brain activity from human subjects in ordinary electric surround conditions became possible only after the advent of electronic amplifiers that have high common-mode rejection. In 1929, the German psychiatrist Hans Berger published a paper on the first successful recording of human electrical brain activity. Similar to the earlier known electrocardiogram (ECG), the brain’s mass activity was called an electroencephalogram (EEG). In this very first description, Berger (6) already stressed the fact that activity patterns changed according to the functional status of the brain: electric brain activity is altered in sleep, anesthesia, hypoxia and in certain nervous diseases such as epilepsy. These clinical fields became the prime domain for the application of EEG measurement in patients [for details see (7)]. Two broad classes of activation can be distinguished in human brain electrophysiology: (1) spontaneous neural activity reflected by the electroencephalogram (EEG) which constitutes the continuous brain activation; and (2) potential fields elicited by physical, sensory stimuli, or by internal processing demands. Such activity is named evoked potential (EP) when elicited by external sensory stimuli, and event-related potential (ERP) when internal, psychological events are studied. Conventionally, EEG
measures are used to quantify the global state of the brain system and to study ongoing brain activity during long time epochs (from 30 minutes to several hours) such as during sleep or long-term epilepsy monitoring. The study of evoked activity aims at elucidating different brain mechanisms, and it allows assessing sensory or cognitive processing while the subject is involved in perceptual or cognitive tasks. Spontaneous EEG can be described by so-called ‘‘graphoelements’’ that identify pathological activity [e.g., ‘‘spike and wave’’ patterns (7)], but it is quantified mainly in the frequency domain. Activity defined by waveforms occurs in different frequency bands between 0.5 and about 50 Hz. These frequency bands were established in EEG research during the past 70 years. A brief, not exhaustive summary of the individual frequency bands follows. It should be stressed that even though it was shown that the bands behave independently in different recording conditions, the exact functional definition of bands by factorial analyses depends to some extent on the conditions that are studied. In addition, large interindividual variation precludes generalizations of the functional interpretation of a given frequency band. The lowest frequencies are seen in the delta band that ranges from 0.5–4 Hz, followed by theta (4–7 Hz). In the healthy adult, these low frequencies dominate the EEG mainly during sleep although theta activity is also observed during processing such as mental arithmetic. In awake subjects, one finds alpha activity in a state of relaxation (8–13 Hz), and beta occurs during active wakefulness (14–30 Hz). Note, however, that alpha activity is also observed during the execution of welllearned behavior. In addition, there are high frequency gamma oscillations (30–50 Hz) that have been recorded in the olfactory bulb of animals and are related to attentive processing of sensory stimuli and to learning processes in cortical areas. A common, general observation is that large synchronized amplitudes occur at low frequencies, whereas high EEG frequencies yield only small signals because the underlying neuronal activity is desynchronized. The basic rhythms of electric brain activity are driven by pacemakers that have been identified in animal experiments in various brain structures such as the cortex, thalamus, hippocampus, and the limbic system (4). From the study of the electroresponsive properties of single neurons in the mammalian brain, it was proposed that oscillations have functional roles such as determining the functional states of the central nervous system (8). A description of the detailed clinical implications of the occurrence of pathological EEG rhythms is beyond the scope of this text, and the reader is referred to the overview by Niedermeyer and Lopes da Silva (7). As a general rule of thumb, in addition to epileptic signs like spikes or ‘‘spike and wave’’ patterns, two situations are generally considered pathophysiological indicators: (1) the dominant local or general occurrence of theta and delta activity in the active, awake adult; and (2) pronounced asymmetries of electric activity between the left and right hemispheres of the brain.
Physiological and Functional Considerations of Evoked Brain Activity
Evoked brain activity is an electrophysiological response to external or to endogenous stimuli. Evoked potentials are much smaller in amplitude (between about 0.1 and 15 µV) than the ongoing EEG (whose amplitudes are up to 150 µV), so averaging procedures must be employed to extract the signal from the background activity (9). A fundamental characteristic of evoked activity is that it constitutes time-locked activation that can be extracted from the spontaneous EEG when stimuli are presented repeatedly. For many years, the clinical use of qualitative analysis of EEG had dominated the field, and experienced neurological experts visually examined EEG traces on paper. Only in the 1960s did computer technology and modern analytic techniques allow quantitative analyses of EEG data that finally led to the possibility of extracting evoked brain activity and of topographic mapping. Measures of high temporal resolution of the order of milliseconds are needed to study brain processes. This is reflected by the fact that brain mechanisms are fast and that individual steps in information processing occur at high temporal frequency that correlates with rapid changes of the spatiotemporal characteristics of spontaneous and evoked electric brain activity. Similarly, it has been proposed that the binding of stimulus features and cooperation of distinct neural assemblies are mediated through high-frequency oscillations and coherence of neuronal activation in different parts of the brain (10). Due to its high sensitivity to the functional state of the brain, scalp-recorded electrical brain activity is a useful tool for studying human brain processes that occur in a subsecond range. The very high temporal resolution is an important prerequisite when rapidly changing electric phenomena of the human brain are investigated noninvasively.
TOPOGRAPHIC IMAGING TECHNOLOGY AND BASIC IDEAS OF ELECTROENCEPHALOGRAPHIC MAPPING
As is obvious from the description of other imaging techniques in this volume (CT, structural or functional MRI, PET), imaging of anatomical brain structures or of hemodynamic responses to different processing demands is available at high spatial resolution but typically has to rely on relatively long integration times to derive significant signals that reflect changes in metabolic responses. Different from brain imaging methods such as PET and functional MRI, noninvasive electrophysiological measurements of EEG and evoked potential fields (or measurements of the accompanying magnetic fields, MEG; see corresponding article) possess high temporal resolution, and techniques to quantify electric brain topography are unsurpassed when functional validity is required to characterize human brain processes. In addition, electric measurements are relatively easy and inexpensive to perform and offer the possibility of assessing brain function directly in actual situations without referring to indirect comparisons between experimental states and a neutral baseline condition or between different tasks.
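The time-locked averaging described above can be sketched in a few lines. The following is an illustrative example only (the array names, shapes, and sampling assumptions are not from the article):

    import numpy as np

    def average_evoked_potential(eeg, stimulus_onsets, epoch_len):
        # eeg: (n_channels, n_samples) continuous recording.
        # stimulus_onsets: sample indices at which the stimulus was presented.
        # Epochs locked to the stimulus are averaged; background EEG that is
        # not time-locked to the stimulus tends to cancel out, leaving the
        # much smaller evoked response.
        epochs = [eeg[:, t:t + epoch_len]
                  for t in stimulus_onsets if t + epoch_len <= eeg.shape[1]]
        return np.mean(epochs, axis=0)   # (n_channels, epoch_len)

Because the evoked signal is deterministic with respect to stimulus onset while the background activity is not, the signal-to-noise ratio of the average improves with the number of repetitions.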
Electrophysiological data are recorded from discrete points on the scalp against some reference point and conventionally have been analyzed as time series of potential differences between pairs of recording points. Multichannel recordings allow assessing the topographical distribution of brain electrical activity. For imaging, waveform patterns are transformed to displays that reflect the electric landscape of brain activity at discrete times or different frequency content of the recorded EEG signals. The results of a topographical transformation are maps that show the scalp distribution of brain activity at discrete times or for selected frequencies of the spontaneous EEG. Such functional imaging (1) possesses high time resolution needed to study brain processes, (2) allows characterizing sequences of activation patterns, and (3) is very sensitive to state changes and processing demands of the organism. Conventionally, electrophysiological data were displayed and analyzed as many univariate time series. Only the technical possibility of simultaneous acquisition of data in multichannel recordings allowed treating of EEG data as potential distributions (11–13).The strength of mapping of brain activity lies not only in the display of brain activation in a topographical form, and mapping is also a prerequisite for adequate analysis of brain activity patterns. Because EEG and evoked potentials are recorded as potential differences between recording sites, the location of a reference point drastically influences the shape of activity recorded over time. The basic properties of scalp potential maps and adequate topographical analysis avoid the fruitless discussion about an electrically neutral reference point (14,15). In contrast to potential difference waveshapes, the spatial structure or landscape of a momentary map does not change when a different recording reference is employed. The topographical features remain identical when the electric landscape is viewed from different points, similar to the constant relief of a geographical map where sea level is arbitrarily defined as zero level: The location of maxima and minima, as well as the potential gradients in a given map, are independent of the chosen reference that defines zero. To analyze the topographical aspects of electroencephalographic activity, it is important to remember that we are dealing with electric fields that originate in brain structures whose physical characteristics vary with recording time and space: the positions of the electrodes on the scalp determine the pattern of activity recorded, and multichannel EEG and evoked potential data enables topographical analysis of the electric fields that are reconstructed from many spatial sampling points. Today, from 20 to 64 channels are typically used. From a neurophysiological point of view, components (or subsets) of brain activity are generated by the activation of neural assemblies located in circumscribed brain regions that have certain geometric configurations and spatiotemporal activity patterns. Landscapes of electric brain potentials may give much more information than conventional time series analysis that stresses only restricted aspects of the available electric data (12,13,16). Examining brain electric activity as a series of maps of the momentary potential distributions shows that these
‘‘landscapes’’ change noncontinuously over time. Brief epochs of quasi-stable landscapes are concatenated by rapid changes. Different potential distributions on the scalp must have been generated by different neural sources in the brain. It is reasonable to assume that different active neural assemblies incorporate different brain functions, so a physiologically meaningful data reduction is parsing the map series into epochs that have quasi-stable potential landscapes (‘‘microstates’’) whose functional significance can be determined experimentally. In spontaneous EEG, polarity is disregarded; in EP and ERP work, polarity is taken into account (12,14,17).
As in most neurophysiological experiments on healthy subjects or on neurological or psychiatric patients, topographical analysis of EEG and evoked potentials is used to detect covariations between experimental conditions manipulated by the investigator and features of the recorded scalp potential fields. Evoked potential (or event-related potential) studies aim to identify subsets or so-called components of electric brain activity that are defined in terms of latency with respect to some external or internal event and in terms of topographical scalp distribution patterns. Measures derived from such data are used as unambiguous descriptors of electric brain activity, and they have been employed successfully to study visual information processing in humans (13,15). Regardless of whether the intracranial generator populations can be localized exactly, the interpretation of scalp potential data combined with knowledge of the anatomy and physiology of the human central nervous system allows drawing useful physiological interpretations (see also later section). Scalp topography is a means of characterizing electric brain activity objectively in terms of frequency content, neural response strength, processing times, and scalp location. Comparison of scalp potential fields obtained in different experimental conditions (e.g., using different physical stimulus parameters, in different subjective or psychological states, or for normal vs. pathological neurophysiological traits) may be used to test hypotheses on the identity or nonidentity of the neuronal populations activated in these conditions. Identical scalp potential fields may or may not be generated by identical neuronal populations, whereas nonidentical potential fields must be caused by different intracranial generator mechanisms. Thus, we can study systematic variations of the electrical brain activity noninvasively, and we are interested in variations of scalp potential fields caused by manipulating independent experimental parameters.
TOPOGRAPHIC MAPPING OF SPONTANEOUS ELECTROENCEPHALOGRAPHIC BRAIN ACTIVITY — MAPS OF FREQUENCY BAND POWER
The spontaneous EEG is commonly analyzed in terms of the frequency content of the recorded signals that can be quantified by frequency analysis (Fast Fourier Transform, FFT). This allows numerically determining the amount of activity in each of the frequency bands described earlier. In addition, the spectra of electric brain activity summarize the frequency content of brain signals during longer epochs. Conventionally, artifact-free EEG is used with epoch lengths of 2–15 seconds, and the results
are grouped in the classical frequency bands from low (delta, theta) to middle (alpha) and high frequencies (beta, gamma) between 0.5 and about 50 Hz. Intracerebral recordings may reveal even higher frequencies, but due to the filtering characteristics of the cerebrospinal fluid, the skull, and the scalp, the recorded signals contain only little very high frequency activity. To map the distributions of the classical EEG frequency bands, they are displayed topographically as maps of spectral power or amplitude. This allows detecting regional differences of activation patterns and also analyzing the influence of different brain states and processing demands on the spatial scalp activity distribution. Clinically relevant, pathological asymmetries are evident from topographic maps. An example of frequency distribution maps is given in Fig. 1 which illustrates the topography of the spontaneous EEG of a healthy volunteer. The time series of 90 epochs of 1024 ms duration were Fourier-transformed at each of 30 electrodes, and the spectra were averaged. Then, data were summed in the conventional frequency bands, and the results were plotted as scalp maps. For recording, 30 electrodes were distributed across the head in a regular array (see inset in Fig. 1), and the resulting potential distribution was reconstructed by spline interpolation. Note that this interpolation does not change the potential values measured at the recording electrodes and the area between electrodes attains a smooth appearance. Potential maps are used in electrophysiology to visualize recorded data. All quantitative steps and statistical data analysis rely on using the original signals measured at each of the recording electrodes. Various parameters can be extracted for characterization and topographical analysis of EEG and evoked potential data, some of which are described in more detail here. It is obvious that the topographical scalp distributions illustrated in Fig. 1 are dissimilar for the different basic EEG frequencies. Although we see symmetrical activation patterns in all frequency bands across both hemispheres, the detailed topography is different. Note that alpha activity is pronounced across the occipital areas (i.e., the back of the head) and lower frequencies such as EEG activity in the theta and delta bands show a more frontal distribution.
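A minimal sketch of this band-amplitude step is given below. It is not from the article; the band limits follow the text, while the array layout and function name are illustrative assumptions. The result is one value per electrode and band, which can then be interpolated into a scalp map:

    import numpy as np

    BANDS = {"delta": (0.5, 4.0), "theta": (4.0, 7.0),
             "alpha": (8.0, 13.0), "beta": (14.0, 30.0)}

    def band_amplitudes(eeg, fs):
        # eeg: (n_electrodes, n_samples) artifact-free epoch; fs: sampling rate in Hz.
        freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
        spectra = np.abs(np.fft.rfft(eeg, axis=1))   # amplitude spectrum per electrode
        return {name: spectra[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                for name, (lo, hi) in BANDS.items()}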
Figure 1. Spatial distribution of frequency spectra of spontaneous EEG. Time series of 1.024 seconds length at each of the 30 electrodes were frequency analyzed by FFT, and 90 amplitude spectra were averaged. The results were grouped in the conventional frequency bands (delta: 0.5–4 Hz; theta: 4–7 Hz; alpha: 8–13 Hz; for beta activity the so-called ‘‘lower β’’ spectral values between 14 and 20 Hz were used) and are plotted as potential maps. Subject was a healthy awake volunteer in a quiet room. Head as seen from above, nose up, left ear on the left, right ear on the right side. For recording, 30 electrodes were distributed across the head in a regular array (see inset). Lines are in steps of 0.25 µV, blue corresponds to low activity, and red corresponds to high amplitudes. See color insert.
TOPOGRAPHIC MAPPING OF EVOKED BRAIN ACTIVITY — TIME SEQUENCE MAPS
Scalp Distribution of Evoked Potential Fields
Following sensory stimulation, brain activity occurs across the primary and secondary sensory areas of the cortex. The brain has specialized regions for analyzing and processing different stimulus modalities that are represented in distinct cortical areas. In addition, sensory information is routed to the central nervous system in parallel pathways that analyze distinct sensory qualities (2). Knowledge of the anatomy and the pathways of various sensory systems allows studying information processing in different brain systems. Evoked potentials can be elicited in all sensory modalities (9), and as a response to visual stimulation, the brain generates an electric field that originates in the visual cortex which is located in the posterior, occipital parts of the brain. Because of their small amplitude, evoked potentials are averaged time-locked to the occurrence of the external sensory stimulus. This enhances stimulus-related activation by averaging and thereby removes the contribution of the spontaneous background EEG. Distinct components occur at latencies that directly reflect processing times and steps in information processing. The evoked field changes in strength and topography over time. A series of potential distributions is given in Fig. 2 that depicts the spatiotemporal activation pattern within the recording array at various times after stimulation. The stimulus was a grating pattern flashed in the right visual field at time zero. The maps in Fig. 2 are potential maps that were reconstructed from data recorded from the electrode array shown in the inset. The 30 electrodes are distributed as a regular grid across the brain regions under study. Only potential differences within the scalp field are of interest, so all data are referred to the computed average reference. This procedure results in a spatial high-pass filter that eliminates the dc-offset potential introduced by the necessary choice of some point as a recording reference. In Fig. 2, the evoked brain activation is displayed between 60 and 290 ms after the stimulus occurred.
Figure 2. Sequence of averaged visual evoked potential maps recorded from the scalp of a healthy young adult who had normal vision. Activity was evoked by flashing a vertical grating pattern in the right visual half field. The stimulus subtended a square of 6° × 6°; spatial frequency was 4.5 cycles/degree. Maps are illustrated between 60 and 290 ms at steps of 10 ms after stimulus occurred with respect to the average reference; 1-µV steps between equipotential lines. The electrode array covers the scalp from the inion (most posterior) to an electrode at 5% anterior to Fz [according to the International 10/20 Electrode System (27)]; see also head schema.
Figure 3. Sequence of averaged visual evoked potential maps elicited by a grating pattern in the left visual half field. Maps are illustrated between 60 and 290 ms at steps of 10 ms after stimulus occurred. Same stimulation and recording parameters as before; for details refer to the legend of Fig. 2.
It is obvious that the topography and the strength of the electric field change as a function of time. The field relief is pronounced between 80 and 90 ms, 120 and 140 ms, and 200 and 220 ms: there are high voltage peaks and troughs, associated with steep potential gradients. Obviously, these instances indicate strong synchronous activation of neurons in the visual cortex. Due to the regular retinotopic projection of the visual field onto the mammalian visual cortex, stimuli presented in lateral half fields are followed by lateralized brain activation. The left visual field is represented in the right visual cortex, and the right visual field in the left hemisphere. An asymmetrical potential distribution is evident from the maps illustrated in Fig. 2. Brain activity elicited by internal or external stimuli and events may be decomposed into so-called components that index steps of information processing that occur at different latencies after the stimulus. When the visual stimulus is in the right visual field, there is a positive peak over the left occipital cortex at around 80 ms, whereas a similar peak occurs at 140 ms across the right brain area. When the same visual target is presented in the left half field, the topographical pattern of activity shows a reversed scalp distribution pattern (Fig. 3). The basic features of the fields are similar to those evoked by right visual stimuli, but the lateralization is changed toward
the opposite hemisphere. This is evident when the map series in Figs. 2 and 3 are compared. More detailed analysis is needed to interpret the topographic patterns. It is important to remember that the absolute locations of the potential maxima or potential minima in the field do not necessarily reflect the location of the underlying generators (this fact has led to confusion in the EEG literature, and for visual evoked activity, this phenomenon became known as paradoxical lateralization). Rather, the location of the steepest potential gradients is a more adequate parameter that indicates the intracranial source locations. This point will be discussed in more detail later.
Quantitative Analysis of Brain Activity Images and Derived Measures
Mapping electric brain activity in itself does not constitute data analysis; it is used mainly to visualize complex multichannel time series data. To compare activity patterns by conventional statistical methods, data must be reduced to quantitative descriptors of potential fields. In the following, derived measures used for topographical EEG and evoked potential analysis are presented. Scalp-recorded fields reflect the synchronous activation of many intracranial neurons, and it has been proposed that steps in information processing are reflected by the occurrence of strong and pronounced potential fields (13,14). During these epochs, the electric landscape of spontaneous EEG and of evoked activity is topographically stable [see Topographic Imaging and (12,14,17)]. In the analysis of evoked brain activity, one of the main goals is identifying so-called components. As evident from the maps shown in Figs. 2 and 3, there are potential field distributions that
have very little activity (few field lines, shallow gradients at 60 ms, around 100 ms, or after 260 ms), whereas at other latency times, maps display high peaks and deep troughs that have large potential gradients in the potential field distributions (around 80, 140, or 200 ms in Figs. 2 and 3). It appears reasonable to define component latency as the time of maximal activity in the electric field that reflects synchronous activation of a maximal number of intracranial neuronal elements. To quantify the amount of activity in a given scalp potential field, we have proposed a measure of ‘‘global field power’’ (GFP) that is computed as the mean of all possible potential differences in the field that corresponds to the standard deviation of all recording electrodes with respect to the average reference (14). Scalp potential fields that have steep gradients and pronounced peaks and troughs result in high global field power, and global field power is low in electric fields that have only shallow gradients and a flat appearance. Thus, the occurrence of a maximum in a plot of global field power over time determines component latency. In a second step, the features of the scalp potential field are analyzed at these component latencies. Derived measures such as location of potential maxima and minima and steepness and orientation of gradients in the field are, by definition, independent of the reference electrode, and they describe the electric brain activity adequately. Global field power is computed as the mean potential deviation of all electrodes in the recording array at each time. Using equidistant electrodes on the scalp, the potentials e_i, i = 1, ..., n, yield the measured voltages U_i = e_i − e_(common reference). From this potential distribution, a reference-independent measure of GFP is computed as the mean of all potential differences within the field:

GFP = \sqrt{ \frac{1}{2n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} (U_i - U_j)^2 }    (1)

Thus, GFP corresponds to the root-mean-square amplitude deviations among all electrodes in a given potential field. Note that this measure is not influenced by the reference electrode, and it allows for a reference-independent treatment of EEG and evoked potential data. Similarly, GFP may be computed from average reference data:

GFP = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( U_i - \frac{1}{n} \sum_{j=1}^{n} U_j \right)^2 }    (2)

This measure is mathematically equivalent to Eq. (1). In summary, GFP reflects the spatial standard deviation within each map. Potential fields that have high peaks and troughs and steep gradients are associated with high GFP values, and when fields are flat, GFP is small. The potential field strength determined by GFP results in a single number at each latency point, and its value is commonly plotted as a function of time. This descriptor shows how field strength varies over time, and its maximum value in a predetermined time window can be used to determine component latency. Note that GFP computation considers all recording electrodes equally, and that at each time, it results in one extracted value that is independent of the reference electrode. High global field power also coincides with periods of stable potential field configurations where the major spatial characteristics of the fields remain unchanged.
Figure 4. Global field power (GFP) between stimulus occurrence and 750 ms after stimulation as a function of time, computed for the topographical data shown in Figs. 2 and 3. Most pronounced activity is seen at latencies below 250 ms. (a) Stimulus in the right; (b) stimulus in the left visual half field. In both conditions, three components occur with latencies of 84, 142, and 210 ms for right, and 78, 144 and 206 ms for left stimuli. The corresponding scalp field distribution of the components is indicated by the maps at component latency. Blue relative negative, red positive polarity with respect to the average reference; 1-µV steps between equipotential lines. See color insert.
Fig. 4 illustrates the GFP of the topographical data of Figs. 2 and 3 as a function of time. In both conditions (stimulation of the right or the left visual field), three components occur between about 80 and 210 ms. The brain’s response to the grating pattern resulted in an electric field that is maximally accentuated at these times. The quantification of field strength defines component latency by considering all topographically recorded data. Thus, this procedure is independent of the choice of a reference electrode. It is evident that visual stimuli in the right and the left visual field yield brain activation that has comparable temporal characteristics.
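Because the whole computation reduces to Eq. (2), GFP is straightforward to implement. The following minimal sketch is not from the article; it assumes NumPy and a data array of shape (electrodes, time samples) holding scalp potentials in microvolts:

    import numpy as np

    def global_field_power(data):
        # data: (n_electrodes, n_samples) scalp potentials, any recording reference.
        # Re-referencing to the average reference does not change GFP, since the
        # measure is reference independent; it simply makes Eq. (2) explicit.
        avg_ref = data - data.mean(axis=0, keepdims=True)
        # Spatial standard deviation across electrodes at every time sample (Eq. 2).
        return np.sqrt((avg_ref ** 2).mean(axis=0))

    # Hypothetical usage: 30 electrodes, 750 samples at 1-ms resolution.
    rng = np.random.default_rng(0)
    data = rng.normal(scale=5.0, size=(30, 750))
    gfp = global_field_power(data)
    component_latency_ms = int(np.argmax(gfp))   # time of maximal field strength

The index of the GFP maximum within a predetermined time window then serves as the component latency, exactly as described above.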
The small latency differences seen are not statistically significant. On the other hand, there are large, highly significant differences in the topography of the components. The maps illustrated in Fig. 4a and b that are elicited around 80 ms and around 140 ms after stimulation show similar characteristics; however, the pattern of lateralization is mirror-image symmetrical: for right visual half field stimuli, a positive extremum occurs at 80 ms across the left occipital areas; for stimuli in the left visual field, the positive component is seen across the right occipital cortex. Similarly, the 140-ms component is systematically differently lateralized in processing of left and right visual stimuli (compare Fig. 4a and b), whereas at 206 and 210 ms latency, the asymmetries are much less pronounced. There is a large literature for the visual system illustrating how evoked potential measures have been used to elucidate processing mechanisms (9). The time course of information processing as well as the structures in the visual cortex activated depends on the location of the stimulus in the visual field (13), and it has also been demonstrated that the temporal characteristics of processing of contrast or stereoscopic depth perception are comparable. Such data on neural processing time in healthy adult subjects or in patients are accessible only with topographic analysis of electric brain activity. Using further topographical analysis, latency of the components, field strength, or the complete potential fields at component latency may be compared directly between experimental conditions or between subject and patient populations, resulting in significance probability maps. All derived measures can be submitted to statistical analysis. Component latency may be equated with neural processing time, and field strength is an index of the amount of synchronous activation or the spatial extent of a neuronal population engaged in stimulus processing. Topographical descriptors such as the location of potential maxima or minima, the centers of gravity, or the location and orientation of potential gradients give information on the underlying neuronal assemblies (see later). Such measures can be used to quantify the topography of potential fields, and they constitute a physiologically meaningful data reduction. As evident from Fig. 4, the visual stimulus that occurred in the right or in the left visual half field elicits cortical activity around 80 and 140 ms that has different strength. As the topographical distribution of the three components suggests, different neuronal assemblies in the human visual cortex are involved in different subsequent processing steps. Inspection of the maps reveals, however, only a very loose relationship between the location of the potential maxima and the intracerebral neuronal assemblies that are activated. From the anatomical wiring of the mammalian visual pathways, we know that information that originates in the left half field is routed to the right occipital cortex, and stimuli in the right visual field are processed by the left hemisphere. The scalp distribution of components around 140 ms is paradoxical. This will be discussed in more detail later. Of course, nonidentical scalp potential fields reflect the activation of nonidentical intracranial neurons. Due to the ‘‘inverse’’ problem, already stated by von Helmholtz (18),
there is no unique solution to the question of source localization if there is more than one active generator. This holds true in general for EEG and also for other electrophysiological signals such as ECG. The following section will discuss the basic ideas of source localization and will introduce an approach that may result in physiologically meaningful solutions.
RELATIONSHIP OF SCALP-RECORDED ACTIVITY TO INTRACRANIAL BRAIN PROCESSES
Analysis of Intracranial Processes — Model Sources and Neuroanatomical Structures
One aim of electrophysiological recordings of brain activity on the scalp is the characterization of underlying sources in the central nervous system. Information is processed in circumscribed brain areas, and spontaneous activation patterns originate from specific structures in the central nervous system. Thus, it appears consequential to try to explain the topography of scalp distribution patterns in terms of anatomical localization of neuronal generators. To arrive at valid interpretations of scalp-recorded data is no trivial task: The so-called ‘‘inverse’’ problem that cannot be uniquely solved constitutes a fundamental and severe complication. Any given surface distribution of electrical activity can be explained by an endless variety of intracranial neural source distributions that produce an identical surface map. Thus, it has long been known that there is no unique numerical solution when model sources are determined (18). Over the years, a number of different approaches to determine physiologically and anatomically meaningful source locations mathematically have been proposed; among them is the fit of model dipoles located within a spherical model head or a realistic head shape model [review by (19)]. To solve the inverse problem mathematically, researchers have to make assumptions that are based on anatomical and physiological knowledge of the human brain. This can also be used to limit the number of possible intracranial source distributions for multiple dipole solutions (20,21). Information is processed in parallel distributed networks, and long-range cooperation between different structures constitutes a basic feature of brain mechanisms (10). Due to this fact, neuronal activation is distributed over large areas of the central nervous system, and its extension is unknown. Thus the approach of single dipole solutions or the fit of a small number of dipoles to the data is not appropriate when unknown source distributions should be detected. More realistic views of underlying mechanisms are applied by methods that aim to determine the most likely intracranial sources that may be distributed through an extended three-dimensional tissue volume. Low-resolution electromagnetic tomography (LORETA) rests on neurophysiological assumptions such as the smooth intracortical distribution of activity and the activation of dendrites and synapses located in the gray matter of the cortex. The details of the method, as illustrated here, can be found in Pascual-Marqui et al. (22). Here, we give only a brief summary of the logic of the analysis: Low-resolution brain electromagnetic tomography determines
the three-dimensional distribution of generator activity in the cortex. Different from dipole models, it is not assumed that there are a limited number of dipolar point sources or a distribution on a known surface. LORETA directly computes the current distribution throughout the entire brain volume. To arrive at a unique solution for the three-dimensional distribution among the infinite set of different possible solutions, it is assumed that adjacent neurons are synchronously activated and neighboring neurons display only gradually changing orientation. Such an assumption is consistent with known electrophysiological data on neuronal activation (8). This basic constraint yields the smoothest of all possible three-dimensional current distributions and results in a true tomography that, however, has relatively low spatial resolution; even a point source will appear blurred. The computation yields a strength value for each of a great many voxels in the cortex, and no constraints are placed on the number of model sources. The method solves the inverse problem without a priori knowledge of the number of sources but by applying the restriction of maximal smoothness of the solution based on maximally similar activity in neighboring voxels. The result is the current density at each voxel as the linear, weighted sum of the scalp electric potentials. The relation to brain anatomy is established by using a three-shell spherical head model matched to the atlas of the human brain (23) that is available as a digitized MRI from the Brain Imaging Centre, Montreal Neurologic Institute. The outlines of brain structures are based on digitized structural probability MR images derived from normal brains. Registration between spherical and realistic head geometry uses EEG electrode coordinates reported by Towle et al. (24), and in a current implementation, the solution space is restricted to the gray matter of the cortical and hippocampal regions. A total of 2394 voxels at 7-mm spatial resolution is produced under these neuroanatomical constraints (21).
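Only the final step of this procedure, taking the current density at every voxel as a weighted sum of the scalp potentials, is shown in the minimal sketch below. The inverse operator T is a random placeholder, not an actual LORETA solution, and the numbers are illustrative only:

    import numpy as np

    rng = np.random.default_rng(1)
    n_voxels, n_electrodes = 2394, 30
    T = rng.normal(size=(n_voxels, n_electrodes))   # hypothetical linear inverse operator
    potentials = rng.normal(size=n_electrodes)      # average-referenced scalp field (µV)
    current_density = T @ potentials                # one strength value per voxel

Whatever constraint is used to derive T, applying it to a measured potential map is a single matrix-vector product, which is why such distributed solutions can be computed for every time sample of an EEG recording.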
The final results are not single locations, as obtained from model dipole sources, but are reconstructed tomographical slices through the living human brain. In this respect, there is some similarity to brain imaging methods such as CT, PET, and MRI; however, here we are dealing with the intracranial distribution of extended potential sources that best explain the scalp-recorded EEG or evoked potential data.

Intracranial Correlates of Spontaneous EEG

Spontaneous EEG quantified by frequency analysis can be displayed as topographical maps that illustrate the scalp distribution of conventional frequency bands, as in Fig. 1. The model sources of alpha activity reconstructed by low-resolution electromagnetic tomography are shown in Fig. 5. The source location data were computed for the cortical areas of the Talairach atlas and are displayed as horizontal (axial), sagittal, and frontal (coronal) views. Activity is shown as the estimated current density strength in the different regions of the central nervous system. Dark red areas indicate stronger neural activation; light colors correspond to low activity. It is obvious that major neuronal sources are found mainly in the posterior, occipital areas of the cortex. This is to be expected based on the vast literature on spontaneous EEG in healthy adults (7). The images of Fig. 5 are selected cuts (tomograms) made through the voxel of maximal cortical activity. Other views of the brain anatomy may be used to visualize further the three-dimensional distribution of activity. This is presented in Fig. 6, which shows a series of horizontal cuts through the brain. There are 17 tomograms at different levels of the head that illustrate where alpha activity is located in the cortex. Similar to Fig. 5, major neuronal activation can be seen in occipital regions, and slight asymmetries become evident at different depths. These data further support the interpretation that large areas of occipital cortex are involved in producing the resting EEG rhythm of a healthy adult subject.
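As a minimal sketch of the frequency-band quantification that underlies such maps, the following fragment computes mean alpha-band power per electrode from simulated multichannel EEG. The sampling rate, channel count, and 8–12 Hz band limits are assumptions made for the example, not parameters taken from the recordings discussed here.

```python
# Minimal sketch: per-electrode alpha-band power, the quantity that a
# topographical map interpolates across the scalp.  The sampling rate,
# channel count, and 8-12 Hz band edges are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
fs, n_chan, n_samp = 250.0, 19, 2500           # 10 s of simulated 19-channel EEG
t = np.arange(n_samp) / fs

# Fake data: some channels get a stronger 10-Hz rhythm plus noise.
alpha_gain = np.linspace(0.2, 2.0, n_chan)[:, None]
eeg = alpha_gain * np.sin(2 * np.pi * 10.0 * t) + rng.standard_normal((n_chan, n_samp))

# FFT-based power spectrum for each channel.
freqs = np.fft.rfftfreq(n_samp, d=1.0 / fs)
psd = np.abs(np.fft.rfft(eeg, axis=1)) ** 2 / n_samp

# Mean power in the alpha band (8-12 Hz); one value per electrode.
band = (freqs >= 8.0) & (freqs <= 12.0)
alpha_power = psd[:, band].mean(axis=1)
print(np.round(alpha_power, 1))
```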
Figure 5. Low-resolution electromagnetic tomography (LORETA) images of alpha-band activity of the data illustrated in Fig. 1. The results are local current densities that were computed for the cortical areas of the Talairach probability atlas (digitized structural probability MR images). From left to right, the three images show horizontal, sagittal, and frontal views of the brain. Red indicates strong activity; light colors indicate low activity. The cuts illustrate intracerebral locations of maximal activation. See color insert.
Figure 6. Low-resolution electromagnetic tomography (LORETA) images of alpha-band activity shown as a series of horizontal (axial) cuts through the brain. There are 17 tomograms computed for different levels. Same data as in Fig. 5. See color insert.
When lower frequencies of spontaneous EEG are analyzed, the pattern of intracerebral activation changes. Fig. 7 illustrates the source distribution of theta activity recorded from the same subject. A redistribution of activity to different brain areas is evident. Mainly regions in the frontal lobe and in the cingulate cortex are active when the theta rhythm is mapped. This is what is suggested by the surface distribution illustrated in Fig. 1. In addition, it has been shown that scalp-recorded frontal midline theta activity (FMT) during mental calculation, exercise, and sleep occurs in healthy human subjects as well as in patients suffering from brain tumors or chemical intoxication. Such findings have been related to brain functions located in the frontal lobe of humans (25). The strong activation in frontal areas that is also seen in the series of horizontal tomograms is in line with such reports. The source
localization data in Fig. 8 further illustrate the occurrence of strong electrical activity in frontal brain areas in most of the horizontal (axial) slices through the brain.

Intracranial Correlates of Evoked Brain Activity

As reviewed earlier, some evoked components occur at unexpected locations in the scalp field distributions. Fig. 4 illustrated the paradoxical lateralization of visual evoked potentials. When intracerebral sources are computed from the recorded data, this observation may be validated or corrected. Fig. 9 shows the scalp field distribution of a component following stimulation of the left visual field at a latency of 144 ms. In the surface map, the activation is lateralized across the left hemisphere, where maximal positivity can be identified over occipital areas. This does not agree with what is expected from the known anatomy of the human visual pathways, where
Figure 7. Low-resolution electromagnetic tomography (LORETA) images of theta-band activity of the data illustrated in Fig. 1. The results are local current densities that were computed for the cortical areas of the Talairach probability atlas. From left to right, the three images show horizontal, sagittal, and frontal views. Red indicates strong activity; light colors indicate low activity. The cuts illustrate intracerebral locations of the voxel at maximal activation. Details as in the legend of Fig. 5. See color insert.
Figure 8. Low-resolution electromagnetic tomography (LORETA) images of theta-band activity shown as a series of horizontal slices through the brain. There are 17 tomograms computed for different levels. Same data as in Fig. 7. See color insert.
Figure 9. Scalp distribution map, electrode scheme, and low-resolution electromagnetic tomography (LORETA) images of a component elicited at 144 ms by a stimulus in the left visual field. In the potential map, blue indicates relative negative and red positive polarity with respect to the average reference; there are 1-µV steps between equipotential lines. The three LORETA images illustrate horizontal, sagittal, and frontal (coronal) views. Red indicates strong activity; light colors indicate low activity. Note that the intracerebral location of activity corresponds to what is expected from the anatomy of the visual pathway. See color insert.
information is routed to cortical areas opposite the visual half field that is stimulated. When the most likely source configuration is computed by LORETA, a more clear-cut result arises: major intracranial activation is seen in the occipital cortex of the right hemisphere of the brain. This is evident from the tomograms through the areas of maximal intracranial activity illustrated in Fig. 9, where activation in the sagittal plane is found in the posterior, occipital brain areas. From the frontal (coronal) and the horizontal (axial) sections, it is clear that mainly neurons in the right hemisphere respond to the stimulus. The results shown in Fig. 10 confirm this interpretation: the sequences of horizontal slices through the brain display lateralized activity restricted mainly to the right hemisphere. Thus, the three-dimensional source localization confirms that neurons in the cortical areas opposite the stimulated visual half field respond to the grating pattern presented. Such data clearly demonstrate that a direct topographical interpretation of scalp field distributions must not rely on the location of extreme values, which may be misleading. More meaningful information is reflected by the location and orientation of the potential gradients of the scalp field patterns.

CLINICAL AND PRACTICAL IMAGING APPLICATIONS

As evident from the data and results presented in the preceding sections, mapping brain electrical activity constitutes a means for (1) visualizing brain electric field
distributions on the scalp, (2) adequate quantitative and statistical analysis of multichannel electrophysiological data, and (3) computational determination of possible underlying neuronal populations that are spontaneously active or are activated by sensory stimulation or psychological events. Thus, electric brain activity can be characterized in terms of latency (i.e., processing times), synchronous involvement and extent of neuronal populations (i.e., field strength), and topographical distribution of frequency content or potential components. Practical applications are twofold: studies of functional states of the human brain, information processing, and motor planning in healthy subjects, and clinical questions on the intactness and functionality of the central nervous system of patients suspected of having neurological or psychiatric disease. Such experimental investigations address contemporary neurophysiological questions about the way global states affect brain functions such as the processing of sensory or psychological information, motor planning and execution, and internal states related to emotion and cognition. In normal subjects as well as in patients, such processes can be studied and characterized by electric brain activity patterns. For clinical purposes, the question of deviation from normal is relevant. Such deviations may often not be obvious on visual inspection of EEG or evoked potential traces but are detectable by quantitative, numerical evaluation. The mapping of the electric scalp distribution patterns of healthy volunteers allows
Figure 10. The source location results of Fig. 9 displayed as a series of 17 horizontal tomograms. Note that activity is restricted to the right occipital areas throughout the brain. See color insert.
establishing normative data. It has been shown that neurological impairment or psychiatric disorders are characterized by distinct profiles of abnormal brain electric fields. Such significant deviations can be statistically validated and used for clinical diagnosis and for evaluating treatment efficacy (26). Brain topography also allows drawing conclusions about the possible localization of pathological activity, such as spikes or ''spike and wave'' patterns in epilepsy, or the location of abnormal activity caused by structural damage such as lesions or tumors. In general, sensory evoked brain activity is recorded in clinical settings to test the intactness of afferent pathways and central processing areas of the various sensory modalities in neurological patients. Event-related brain activity elicited during cognitive tasks has its main application in the fields of psychology and psychiatry
where cognition, attention, learning, and emotion are under study. These fields profit from the application of topographic mapping and analysis of brain electric activity in real time. Future applications of topographic mapping of electrophysiological activity will include coregistration of high-time-resolution EEG recordings with brain imaging methods such as functional MRI (see corresponding articles in this volume). It is to be expected that the collaboration of these fields will lead to functional imaging of brain activity that has both high temporal and high spatial resolution.

Acknowledgments

I thank Dr. Roberto Pascual-Marqui for assistance in computing the source localization results illustrated in the section on the relationship of electric brain activity to intracranial brain processes.
ABBREVIATIONS AND ACRONYMS

CT       computer tomography
ECG      electrocardiogram
EEG      electroencephalogram
EMG      electromyogram
EOG      electrooculogram
ERP      event-related potential
EP       evoked potential
EPSP     excitatory postsynaptic potentials
FFT      Fast Fourier Transform
FMRI     functional magnetic resonance imaging
GFP      global field power
IPSP     inhibitory postsynaptic potentials
LORETA   Low Resolution Electromagnetic Tomography
MEG      magnetoelectroencephalogram
FMT      frontal midline theta activity
PET      positron emission tomography

BIBLIOGRAPHY

1. A. L. Hodgkin and A. F. Huxley, J. Physiol. (Lond.) 117, 500–544 (1952).
2. E. R. Kandel, J. H. Schwartz, and T. M. Jessel, Principles of Neural Science, 4th ed., McGraw-Hill, NY, 2000.
3. O. Creutzfeldt and J. Houchin, in O. Creutzfeldt, ed., Handbook of Electroencephalography and Clinical Neurophysiology, vol. 2C, Elsevier, Amsterdam, 1974, pp. 5–55.
4. F. H. Lopes da Silva, in R. Greger and U. Windhorst, eds., Comprehensive Human Physiology, Springer, NY, 1996, pp. 509–531.
5. W. Skrandies et al., Exp. Neurology 60, 509–521 (1978).
6. H. Berger, Arch. Psychiatr. Nervenkr. 87, 527–570 (1929).
7. E. Niedermeyer and F. Lopes da Silva, Electroencephalography: Basic Principles, Clinical Applications, and Related Fields, 3rd ed., Williams and Wilkins, Baltimore, 1998.
8. R. R. Llinas, Science 242, 1654–1664 (1988).
9. D. Regan, Human Brain Electrophysiology, Elsevier Science, NY, 1989.
10. W. Singer, Neuron 24, 49–65 (1999).
11. D. Lehmann, Electroencephalogr. Clin. Neurophysiol. 31, 439–449 (1971).
12. D. Lehmann, in A. Gevins and A. Remond, eds., Handbook of Electroencephalography and Clinical Neurophysiology, Rev. Series, vol. 1, Elsevier, Amsterdam, 1987, pp. 309–354.
13. W. Skrandies, Prog. Sens. Physiol. 8, 1–93 (1987).
14. D. Lehmann and W. Skrandies, Electroencephalogr. Clin. Neurophysiol. 48, 609–621 (1980).
15. W. Skrandies, Biol. Psychol. 40, 1–15 (1995).
16. W. Skrandies, in F. H. Duffy, ed., Topographic Mapping of Brain Electrical Activity, Butterworths, Boston, 1986, pp. 728.
17. D. Lehmann et al., Int. J. Psychophysiol. 29, 1–11 (1998).
18. H. von Helmholtz, Ann. Phys. Chem. 29, 211–233, 353–377 (1853).
19. P. L. Nunez, in P. L. Nunez, ed., Neocortical Dynamics and Human EEG Rhythms, Oxford University Press, NY, 1995, pp. 3–67.
20. Z. J. Koles and A. C. K. Soong, Electroencephalogr. Clin. Neurophysiol. 107, 343–352 (1998).
21. R. D. Pascual-Marqui, J. Bioelectromagnetism 1, 75–86 (1999).
22. R. D. Pascual-Marqui et al., Int. J. Psychophysiol. 18, 49–65 (1994).
23. J. Talairach and P. Tournoux, Co-Planar Stereotaxic Atlas of the Human Brain, Thieme, Stuttgart, 1988.
24. V. L. Towle et al., Electroencephalogr. Clin. Neurophysiol. 86, 1–6 (1993).
25. S. Matsuoka, Brain Topogr. 3, 203–208 (1990).
26. E. R. John et al., Science 239, 162–169 (1988).
27. H. H. Jasper, Electroencephalogr. Clin. Neurophysiol. 20, 371–375 (1958).

ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER

MICHAEL KOTLARCHYK
Rochester Institute of Technology
Rochester, NY
INTRODUCTION

All imaging systems map some spectroscopic property of the imaged object onto a detector. In a large fraction of imaging modalities, the imaged object directly or indirectly emits, reflects, transmits, or scatters electromagnetic (EM) radiation. Such is the case, for example, in radar and remote sensing systems, radio astronomy, optical microscopy, X-ray and positron-emission tomography, fluorescence imaging, magnetic resonance imaging, and telescope-based systems, to name but a few areas. This chapter addresses the following broad topics:

1. The basic nature and associated properties common to all types of EM radiation (1–3)
2. How radiation from different parts of the electromagnetic spectrum is generated (1,2,4)
3. The dispersion, absorption, and scattering of radiation in bulk material media (2,4–7)
4. Reflection and transmission at a material interface (2,4–7)
5. Interference and diffraction by collections of apertures or obstacles (2,4,7,8)
6. The mechanisms responsible for emitting, absorbing, and scattering EM radiation at the atomic, molecular, and nuclear levels (9–13)

This treatment in no way attempts to compete with the many fine comprehensive and detailed references available on each of these subjects. Instead, the aim is just to highlight some of the central facts, formulas, and ideas important for understanding the character of electromagnetic radiation and the way it interacts with matter.

Historical Background — Wave versus Particle Behavior of Light

A fundamental paradox in understanding the nature of electromagnetic radiation is that it exhibits both wave-like
and particle-like properties. Visible light, which we now know is one form of electromagnetic radiation, exhibits this dual behavior. Before the nineteenth century, the scientific community vacillated between one picture of light as a continuous wave that propagates through some all-pervasive, undetectable medium (often referred to as the luminiferous ether) and another picture where light was envisioned as a stream of localized particles (or corpuscles). The wave picture emerged in the forefront shortly after 1800 when the great British scientist Thomas Young, and others such as Augustin Jean Fresnel in France, successfully explained a variety of interference and diffraction phenomena associated with light. In 1865, James Clerk Maxwell put forth a solid theoretical foundation for the wave theory of light when he clearly showed that the fundamental equations of electricity and magnetism predict the existence of electromagnetic waves (apparently through the ether medium) that propagate at a speed very close to the experimentally measured speed of light. In addition to providing the basis for electromagnetic waves in the visible spectrum, Maxwell's equations also accounted for EM waves at lower and higher frequencies, namely, previously detected infrared and ultraviolet radiation. In 1888, when Heinrich Hertz published experimental verification of long wavelength (radio-frequency) EM waves, it became apparent that electromagnetic waves span a wide range of frequencies. The truly unique character of light and other electromagnetic waves was not understood until around the turn of the century, when scientists and mathematicians began to ponder the controversial experimental results published in 1881 and 1887 by the two American physicists Albert Michelson and Edward Morley. In attempts to detect the motion of the earth through the luminiferous ether, they concluded, based on the high precision of their measurements, that the result was null — that is, that the presence of the ether was undetectable. Around 1900, the hypothesis of the existence of an ether medium began to come into question. The ether hypothesis was completely rejected by Albert Einstein in the introduction of his Theory of Special Relativity in 1905. Around this time, it became clear that an electromagnetic wave is unlike any other wave encountered in nature in that it is the self-sustaining propagation of an oscillating electromagnetic field that can travel through vacuum without the need for a supporting physical medium. A wealth of phenomena that involve the interaction of light and matter are well explained by the wave character of EM radiation. These include interference and diffraction phenomena, the reflection and refraction of light at an interface, trends in the optical dispersion of materials, and certain scattering and polarization effects. However, many important and even commonplace phenomena exist that are left unexplained by treating light and other electromagnetic radiation as a wave. In those cases, any paradoxes that arise can be resolved by treating light as a stream of discrete particles, or quanta. Historically, this concept, which apparently contradicts the continuous wave picture, is an outgrowth of ideas put forth by the German physicist Max Planck
in the year 1900. Until then, the observed frequency spectrum of thermal radiation from a hot, glowing solid, also known as blackbody radiation, was unexplained. Previous calculations of the spectral distribution led to the so-called ultraviolet catastrophe, in which the theory predicted a grossly excessive amount of radiation emitted at high frequencies. Planck successfully resolved this dilemma by postulating that the walls of a blackbody radiator are composed of oscillators (i.e., atoms) that emit and absorb radiation only in discrete, or quantized, energy units. Only a few years later, Einstein explained the puzzling observations associated with the emission of electrons from metal surfaces exposed to light, a phenomenon known as the photoelectric effect, by similarly quantizing the electromagnetic radiation field itself. Eventually, the name photon was coined to denote a single unit, or quantum, of the radiation field. The discrete, particle-like behavior of light carries over to all regions of the electromagnetic spectrum. A striking illustration of this is the scattering of X rays by free electrons. In 1922, experiments by Arthur Compton, an American physicist, demonstrated that the wavelength shift exhibited by X rays upon scattering is solely a function of the scattering angle. A wave theory fails to predict this result. The only way to account for the experimental outcome correctly is to assume that a particle-like collision takes place between each incoming X-ray photon and a free electron (see section on Compton scattering by a free electron). Even though the wave and particle pictures (also sometimes referred to as the classical and quantum pictures, respectively) of light may appear contradictory, they are not. It is just that different types of experimental conditions elicit either the wave character or the particle character of electromagnetic radiation. For example, interference and diffraction experiments bring wave properties to the forefront, whereas counting experiments focus on particle behavior. Neither picture by itself is adequate to describe all features of radiation and the way it interacts with atoms, molecules, and bulk media. Rather, the various phenomena are understood by treating the wave and particle properties as complementary. So before addressing the various specific interactions between radiation and matter, the next two sections present a quantitative, detailed description of the wave and particle models.

Classical Wave Description of Electromagnetic Waves in Vacuum

Maxwell's Prediction of EM Waves

By the 1830s, it was believed that the behavior of electric and magnetic fields and their relationships to charges and electric currents were well understood. Experimental work performed by Coulomb, Biot and Savart, Ampere, and Faraday culminated in a set of four equations for the fields. Embodied in these equations is the observation that electric fields arise because of the presence of electric charge (Gauss's law) or because of the existence of time-varying magnetic fields (Faraday's law). Magnetic fields, on the other hand, arise because of the presence of moving charge, or electric currents (Ampere's law). Unlike electric field lines that originate and terminate on charges,
magnetic field lines form loops; they have no starting or ending points (Gauss's law for magnetism). Maxwell's contribution was recognizing an inherent inconsistency in the field equations, which he resolved by modifying Ampere's law. He deduced that magnetic fields arise because of the presence of currents and also because of time-varying electric fields. In other words, a changing electric field basically acts like an effective current — it is referred to as a displacement current. The four field equations, where Ampere's law is modified to include the addition of a displacement-current term, are the famous Maxwell's equations. Without Maxwell's contribution, electric and magnetic fields cannot propagate as waves. The effect of introducing Maxwell's displacement current is essentially this: According to Faraday's law, an oscillating magnetic field gives rise to an oscillating electric field. Because the electric field is time-varying, there is an associated displacement current which, in turn, gives rise to an oscillating magnetic field. Then, the sequence repeats over and over, and the coupled electric and magnetic oscillations propagate through space as a self-sustaining electromagnetic wave. Some of the basic mathematics and resulting wave characteristics are presented here.

The Wave Equation and Properties of Plane Waves in Vacuum

In vacuum, where no charges or currents exist, Maxwell's equations can be reduced to a single vector equation for the electric field E, which is a function of position and time t:

∇²E = ε0µ0 ∂²E/∂t².   (1)

Similarly, the magnetic-field vector also follows this same equation. Here, ∇² is the Laplacian operator which, in rectangular coordinates, is given by (∂²/∂x²) + (∂²/∂y²) + (∂²/∂z²). The constants ε0 and µ0 are the electric permittivity and magnetic permeability of free (i.e., empty) space, respectively. They are given by ε0 = 8.854 × 10⁻¹² C²/N·m² and µ0 = 4π × 10⁻⁷ N·s²/C². Equation (1) is the standard, classical wave equation, and its basic solutions are waves that travel in three dimensions. The speed of these waves is given by

c = 1/√(ε0µ0) = 3.00 × 10⁸ m/s.   (2)

This is precisely the measured speed of light in vacuum, which confirms that light is a propagating electromagnetic wave. The most basic form of a traveling wave that satisfies the wave equation is that of a harmonic plane wave. A plane wave is often used to approximate a collimated, monochromatic light beam. In addition, the fact that the wave equation is linear allows one to add, or superimpose, various plane waves to construct more complicated solutions. Figure 1 depicts a snapshot of a plane electromagnetic wave traveling in the +z direction. The particular wave shown here is one where the electric field vector oscillates parallel to the x axis. In this case, one says that the wave is linearly polarized along the x direction (other types of polarization will be discussed in the section on Polarization). The magnetic field vector B propagates in tandem and in phase with the electric field. For an EM plane wave in vacuum, Maxwell's equations demand that, at any moment, the ratio of the field magnitudes is given by the speed of the wave:

E/B = c.   (3)

Furthermore, the vectors E and B are mutually orthogonal, as well as orthogonal to the direction of wave propagation. The fact that the oscillation of the field vectors is perpendicular to the direction in which the wave travels means that electromagnetic radiation propagates as a transverse wave. Although the diagram in Fig. 1 is useful for illustrating some of the features of the wave, it fails to convey why it is labeled a ''plane wave.'' From the picture, one gets the false impression that the field vectors are restricted to the z axis. The correct way to visualize the situation is to imagine that the direction of propagation is normal to a series of parallel, infinite, planar surfaces, or wave fronts, at various values of z. At any moment, the electric and magnetic field vectors are the same at every point on a given plane, that is, for the wave in Fig. 1, the fields are independent of the x and y coordinates. The electric field of a linearly polarized plane wave that propagates in the +z direction can be represented by the form

E(z, t) = E0 cos(kz − ωt + φ).   (4)

The direction of E0 specifies the direction of the EM wave's polarization, whereas the magnitude gives the amplitude of the wave. The cosine function gives the wave its simple oscillatory, or so-called harmonic, character. k and ω are parameters related to the wave's characteristic wavelength λ and period T. A wavelength represents the repeat distance associated with the wave, in other words, the distance between two adjacent peaks (or troughs) at a fixed instant (see Fig. 2). A period, on the other hand, is the time it takes for one complete wavelength of the disturbance to pass a given point in space. Its inverse ν is
Figure 1. A linearly polarized electromagnetic plane wave propagating in the +z-direction. The electric and magnetic field vectors, E and B, are perpendicular to each other and to the direction of wave propagation.
Figure 2. The wavelength λ and the period T of a harmonic wave of amplitude E0 that travels at speed c.

Figure 3. One of the wave fronts of a plane wave that propagates with wave vector k. Notice that k is normal to the planar wave front and that the product k · r is fixed for all points on the given front.
the frequency of the wave,

ν = 1/T,   (5)
which is measured in units of cycles/s, or hertz (Hz). The wavelength and frequency are related simply through the speed c of the wave by

λν = c.   (6)
The parameter k in Eq. (4) is called the wave number and is inversely proportional to the wavelength:

k = 2π/λ.   (7)
ω represents the angular frequency of the wave, given by

ω = 2πν = 2π/T.   (8)
From the previous definitions of k and ω, Eq. (6) can also be written as

ω/k = c.   (9)

The entire argument of the cosine function in Eq. (4) is called the phase of the wave. At any fixed time, all points in space that have the same value of the phase correspond to a wave front — in the present discussion, they are the previously mentioned planar surfaces. Because the rate at which the wave propagates is equivalent to the speed of passing wave fronts (having constant phase), the speed of the wave is often referred to as the phase velocity. The parameter φ in Eq. (4) is the wave's initial phase, and it simply gives the value of the field when both z and t vanish. Clearly, our assumption that the plane wave travels along the z axis is completely arbitrary and unnecessarily
restrictive. Mathematically, the way to specify a general plane wave is first to define a wave vector k. This is a vector that points in the direction in which the wave propagates and has a magnitude equal to the wave number. Then, the wave vector would be normal to any chosen planar wave front, and referring to Fig. 3, one sees that all points on such a wave front satisfy the condition k · r = const (r denotes the position vector from the origin to any point in space). Then, the appropriate form for the plane wave becomes

E(r, t) = E0 cos(k · r − ωt + φ).   (10)

Energy and Momentum Transport by EM Waves

Electric and magnetic fields store energy. The energy per unit volume, or the energy density (u), contained in any E and/or B fields at a point in space is given by

u = (1/2)ε0E² + (1/2µ0)B².   (11)
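As a numerical illustration of Eqs. (4), (10), and (11), the sketch below evaluates a linearly polarized plane wave at one point and instant and computes the corresponding instantaneous energy density. The 633-nm wavelength and 100 V/m amplitude are arbitrary assumed values.

```python
# Sketch: evaluate a linearly polarized plane wave, Eq. (4), and its
# instantaneous energy density, Eq. (11).  The 633-nm wavelength and the
# 100 V/m amplitude are arbitrary, illustrative choices.
import numpy as np

eps0 = 8.854e-12               # C^2 / (N m^2)
mu0 = 4e-7 * np.pi             # N s^2 / C^2
c = 1.0 / np.sqrt(eps0 * mu0)  # Eq. (2): ~3.00e8 m/s

lam = 633e-9                   # wavelength (m), assumed
E0 = 100.0                     # field amplitude (V/m), assumed
k = 2 * np.pi / lam            # Eq. (7)
omega = c * k                  # Eq. (9)

def E_field(z, t, phi=0.0):
    """Scalar (x-component) electric field of the plane wave, Eq. (4)."""
    return E0 * np.cos(k * z - omega * t + phi)

z, t = 0.25e-6, 1.0e-15        # an arbitrary point and instant
E = E_field(z, t)
B = E / c                      # Eq. (3): field magnitudes are locked together

u = 0.5 * eps0 * E**2 + B**2 / (2 * mu0)    # Eq. (11)
print(f"E = {E:.2f} V/m, B = {B:.3e} T, u = {u:.3e} J/m^3")
# Note: the electric and magnetic terms contribute equally here, since B = E/c.
```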
A by-product of Maxwell’s classical theory is the result that this energy can flow only from one location to another if, simultaneously, there is an electric and magnetic field in the same region of space. Specifically, the flow of energy is determined by the cross-product between the two fields. In vacuum, the flow is given by S=
1 E × B, µ0
(12)
where S is called the Poynting vector, and its direction gives the direction of energy flow. The magnitude of the Poynting vector corresponds to the instantaneous rate of energy flow per unit area, or equivalently, the power per unit area. Its corresponding SI units are watts per square meter (W/m²). For an electromagnetic plane wave, one can see that using a right-hand rule for the direction of the cross-product in Eq. (12) does, in fact, give the expected direction for the energy flow, namely, in the direction of the wave propagation (see Fig. 1). From Eqs. (2), (3), and (10), the
magnitude of this flow becomes

S = (1/µ0)EB = (1/µ0c)E² = cε0E0² cos²(k · r − ωt + φ).   (13)
The cosine function oscillates very rapidly (for example, optical frequencies are on the order of 10¹⁵ Hz), and the squared cosine oscillates at twice that frequency. In most cases, the rapid variations cannot be resolved by any radiation detector one might use; hence the Poynting vector is usually averaged over many oscillations. Because the squared cosine function oscillates symmetrically between the values of zero and one, it is simply replaced by the value 1/2. A number of names are used for the magnitude of the averaged Poynting vector ⟨S⟩ — they include irradiance, intensity, and flux density. We will use either irradiance or intensity and denote the quantity by I. So, for a plane EM wave,

I ≡ ⟨S⟩ = (1/2)cε0E0².   (14)
The most important result is that the irradiance is proportional to the square of the field amplitude. Classical electromagnetic waves also carry momentum. One way to see this is to consider what occurs when a plane wave strikes a charged particle, say, an electron. At any given instant, the negatively charged electron will be pushed opposite to the direction of the E field. Then, the acquired velocity of the charge is directed perpendicular to the wave's B field. However, because any moving charge in a magnetic field experiences a lateral force perpendicular to both the particle's velocity and the B field, the particle is pushed in the direction of the wave propagation. Consequently, the wave imparts momentum to the particle along the direction of the wave vector. The momentum per unit volume, or the momentum density (g), carried by the original wave is proportional to the Poynting vector and is given by

g = S/c².   (15)
The transfer of momentum from an EM wave to a material surface is responsible for the phenomenon of radiation pressure. For example, a plane wave that strikes a perfectly absorbing surface at normal incidence imparts a radiation pressure of

P = S/c.   (16)
Hence, from Eq. (14), the average pressure is

P = (1/2)ε0E0².   (17)
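The following short sketch turns Eqs. (14), (16), and (17) into numbers for an assumed irradiance of 1000 W/m² (a convenient round figure, roughly that of bright sunlight at the ground), recovering the corresponding field amplitude and the average pressure on a perfectly absorbing surface.

```python
# Sketch: field amplitude and average radiation pressure for a plane wave of
# given irradiance, using Eqs. (14) and (16)/(17).  The 1000 W/m^2 figure is an
# assumed, round-number irradiance (roughly bright sunlight at the surface).
import numpy as np

eps0 = 8.854e-12
c = 3.00e8

I = 1000.0                               # irradiance <S>, W/m^2 (assumed)
E0 = np.sqrt(2 * I / (c * eps0))         # invert Eq. (14): I = (1/2) c eps0 E0^2
P_avg = I / c                            # Eqs. (16)/(17): average pressure for full absorption

print(f"E0  = {E0:.1f} V/m")             # ~870 V/m
print(f"<P> = {P_avg:.2e} Pa")           # ~3.3e-6 Pa: radiation pressure is tiny
```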
Figure 4. Spherical wave fronts that emanate from a point source.
Spherical Waves

When a small, point-like source emits radiation isotropically in all directions, the wave fronts (i.e., surfaces of constant phase) are spheres centered on the source. As time goes on, these spheres expand outward from the source, as shown in Fig. 4. Assuming that the power that emanates from the source is constant in time, the energy flow (per unit time) carried by the various wave fronts must be the same. Because the area of a wave front is proportional to the square of its radius r, the irradiance, which is the rate of energy flow per unit area, must be proportional to 1/r². This is known as the inverse square law. Because irradiance is proportional to E², it follows that the field amplitude falls off proportionally to 1/r.

Quantum Nature of Radiation and Matter

Fundamental Properties of Photons

The framework for treating light and other electromagnetic radiation as discrete particles was proposed by Einstein in 1905. In this picture, the transport of radiant energy occurs via photons. A photon is a quantum, or excitation, of the radiation field. It has zero rest-mass and propagates at the speed of light. Each photon carries energy E that is determined solely by the frequency or wavelength of the radiation according to

E = hν = hc/λ,   (18)

where h is Planck's constant, which has the experimentally determined value 6.6261 × 10⁻³⁴ J·s. Equivalently, from Eqs. (8) and (9), one can express the photon energy in terms of the radiation field's angular frequency or wave number:

E = ℏω = ℏck,   (19)

where ℏ = h/2π = 1.0546 × 10⁻³⁴ J·s = 6.583 × 10⁻¹⁶ eV·s (note: ℏ is called h-bar). A photon also carries a linear momentum p whose magnitude is

p = E/c = hν/c = h/λ = ℏk.   (20)

Then, the vector momentum is directly related to the wave vector k via

p = ℏk.   (21)

At high frequency or wave number, the momentum of a photon is large, and the radiation exhibits behavior that is
the most particle-like. X rays and gamma rays fall in this category. In the photon picture, intensity or irradiance is determined by the number of photons carried by the illumination. More specifically, it is the product of the photon flux, defined as the number of photons per unit time per unit area and the energy of a single photon. This means that at lower frequencies, where the photon energy is smaller, there are more photons in a beam of a given intensity. According to a basic tenet of quantum mechanics known as the correspondence principle, the larger the number of quanta, the more classical-like the behavior. Hence, low-frequency radio and microwave radiation has a general tendency to be more wave-like in its behavior. However, under certain conditions, radiation at any frequency whatsoever can display wave or particle characteristics. Energy Levels and Transitions in Atomic, Molecular, and Nuclear Systems The idea of photon energy is closely related to the scale of energies encountered in atomic, molecular, and nuclear systems. The rules that govern the properties of these submicroscopic systems are dictated by the laws of quantum mechanics (1,12,15). The framework of quantum theory, most of which was developed during the 1920s, predicts that the basic building blocks of matter, namely, electrons, protons, and neutrons, bind together so that the total energy of an atom, molecule, or atomic nucleus may only take on certain well-defined, discrete values. A transition from one energy level to another is accompanied by the absorption or emission of a photon.
Atomic Transitions and Line Spectra. Figure 5 illustrates an electron transition in an atom. To raise an atom from a low-energy level to a higher level, a photon must be absorbed, and its energy must match the energy difference between the initial and final atomic states. When an atom deexcites to a lower energy level, it must emit a sequence of one or more photons of the proper energy. Atomic transitions are characterized by photon energies that range from a few eV to the order of keV, depending on the type of atom and the particular transition involved. The frequency of the absorbed or emitted radiation is determined solely by photon energy (i.e., E = hν), which in turn is determined by the allowed transitions of atomic electrons. As a result, only certain select frequencies appear in atomic absorption and emission spectra; this gives rise to the observed characteristic line spectra of atoms. Nuclear Transitions. Protons and neutrons bind within atomic nuclei at certain discrete, quantized values of energy. Nuclear levels and photon energies are orders of magnitude larger than those of atomic transitions. A rough estimate of the energies involved can be calculated from the basic Heisenberg uncertainty principle which states that the product of the uncertainties in a particle’s position and its linear momentum must be at least as large as h ¯ /2: xp ≥ h (22) ¯ /2.
215
(a) Allowed electron orbitals
e +
hn
e
(b)
hn +
∆Eatom = hn (c)
hn1 hn2
e +
hn1 + hn2 = hn2 = −∆Eatom Figure 5. Electron transitions in an atom. (a) A photon of energy hν incident on an atom in a low-energy state. (b) If the photon energy matches the energy difference Eatom between two electronic states of the atom, the photon is absorbed, and the atom is raised to an excited state. (c) An atom can deexcite to a lower energy state by emitting a photon. The atom shown here deexcites by undergoing two transitions in rapid sequence.
A given nucleon (i.e., proton or neutron) is essentially localized within the confines of the nucleus, or x ∼ 10−6 nm. The uncertainty in the nucleon’s momentum √ can be calculated from p = p2 = 2mE, where m = 1.67 × 10−27 kg is the mass of the proton or neutron and E is its (kinetic) energy. Then, the uncertainty relationship predicts that
E≥ ≥
2 h ¯ 8m(x)2
(hc)2 , 32π 2 (mc2 )(x)2
(23)
where the second form is particularly convenient for calculation. This gives a rough estimate of the lowest possible quantum energy, or the ground-state energy, of a typical nucleus. Using hc = 1240 eV·nm and mc2 ≈ 940 MeV, we see that the ground-state energy is on the order of 5 MeV. The spacings between nuclear levels are also typically in the MeV range or greater, and so are the corresponding photon energies. These energies are huge compared to those of atomic electrons.
216
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
Vibrational and Rotational States in Molecules. Electron transitions in molecules exhibit energy spacings that are similar to those in individual atoms; they are on the order of eV to keV. However, molecules are more complicated than atoms due to vibrational and rotational motions of the constituent atomic nuclei (14). As an example, consider a diatomic molecule that consists of two atomic nuclei (each of mass M) and the various orbiting electrons. Even though this is the simplest type of molecule, it represents a complex many-body system that has many degrees of freedom. Specifically, each of the two atomic nuclei has three degrees of freedom (corresponding to the three coordinate directions), and each electron in the molecule has the same. At face value, the problems connected with finding the energy states in even a light diatomic molecule appear almost insurmountable. Fortunately, however, a clever method exists that enables one to determine molecular energy states; it is known as the Born–Oppenheimer approximation (16,17). Here, one takes advantage of the fact that nuclei are much more massive than electrons — thus, they move much more slowly than the electrons in a molecule. As a first approximation, therefore, the nuclei can be regarded as having a fixed separation R in space. For each such separation, one then determines the possible values of the total electronic energy ε of the molecule, that is, there is an ε versus R curve associated with each possible electronic energy level. A central assertion of the Born–Oppenheimer approximation is that, for each electronic level, these same ε versus R curves also play the role of the internuclear potential energy function V(R). Typically, each V(R) curve has a rather sharp minimum, as illustrated in Fig. 6. Consequently, the two nuclei have a highly preferred separation at some R0 . The constant V(R0 ) corresponds to the binding energy of the molecule when it is in the electronic state under consideration. The nuclear motions may be neglected as far as the basic structure of the molecule is concerned. However, to analyze photon spectra, it is important to consider the nuclear degrees of freedom; they give rise to closely spaced vibrational and rotational states that accompany each electronic level.
For a given electronic state, the characteristics of the associated vibrational levels are obtained by approximating the V(R) curve in the vicinity of its minimum by a simple quadratic function. This is the characteristic potential function for a simple harmonic oscillator, and has the form V(R) ≈ V(R0 ) + 12 Mω2 (R − R0 )2 .
Put another way, the two nuclear masses have equilibrium separation R0 and vibrate relative to one another at some characteristic frequency ω. This vibrational frequency is determined by the curvature, or sharpness, of the V(R) curve. Given the frequency, the vibrational amplitude is determined by the vibrational energy of the molecule. A quantum-mechanical treatment of the harmonic oscillator shows that the vibrational energy levels are discrete and are equally spaced by separation h ¯ ω (see Fig. 7). The spacing of vibrational levels is on the order of 100 times smaller than that of the electronic levels. This gives a corresponding vibrational amplitude on the order of only one-tenth of the equilibrium separation R0 . This means that diatomic molecules are rather stiff and one is justified in treating the rotations of the molecule independently. At a first approximation, the rotational energy levels of a molecule correspond to that of a rigid rotator that rotates about its center of mass (see Fig. 8a). According to quantum theory, the angular momentum vector L of this (or any other) system is quantized. This leads to the fact that the molecule’s rotational kinetic energy is quantized as well. The allowed energy levels are given by E =
2 ( + 1)h ¯ MR20
V(R)
(25)
and are indexed by the orbital angular momentum quantum number . The possible values of are = 0, 1, 2, 3, . . . .
(26)
In addition, there are 2 + 1 rotational states for each value of (or value of the energy). Each of these states has the same magnitude of orbital angular momentum but a different value for its z component. The possible values for the z component of the angular momentum vector are Lz = mh ¯,
R0
(24)
(27)
E2 = 5hw / 2 R
hw E1 = 3hw / 2 hw E0 = hw / 2 E=0
Figure 6. Internuclear potential energy function V(R) for a diatomic molecule. R0 is the equilibrium separation between the two nuclei of the molecule.
Figure 7. The vibrational energy levels of a diatomic molecule are equally spaced. The spacing is simply h ¯ ω, where ω is the characteristic vibrational frequency of the molecule.
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
Let S represent the spin of a particle. Because it represents a type of angular momentum, quantum mechanically intrinsic spin can be treated in basically the same way as orbital angular momentum L. One can define a spin angular momentum quantum number s. For the spin angular momentum, it turns out that the possible values of s are (29) s = 0, 12 , 1, 32 , 2, . . . .
(a)
R0/2
M
R0/2
M
Fixed midpoint (b)
l
El
m
3
6h2/I
+3 +2 +1 0 −1 −2 −3
2
3h2/I
+2 +1 0 −1 −2
1
h2/I
+1 0 −1
0
0
0
Figure 8. Rotations of a diatomic molecule. (a) Model of the molecule as a rigid rotator that rotates about its center of mass. (b) Rotational energy levels, E . Each energy level has (2 + 1) distinct quantum states. Here, I = MR20 /2 is the moment of inertia of the molecule about its center.
where m is called the magnetic quantum number. Its (2 + 1) possible values are m = −, −( − 1), · · · , 0, · · · , +( − 1), +.
217
(28)
Because a number of different quantum states are associated with a given energy level E , one says that each energy level is (2 + 1)-fold degenerate. Figure 8b shows the energy-level diagram for the rigid rotator. The 2 spacing of energy levels, which is on the order of h ¯ /MR20 , is typically about 100 times smaller than the spacing between the vibrational levels of a molecule (or some 10,000 times smaller than the spacing between electronic levels).
Magnetic Splitting of Spin States. Electrons and atomic nuclei possess a magnetic dipole moment m. When these particles are placed in an external magnetic field, the vector m tends to align with the field lines. In effect, electrons and nuclei behave like submicroscopic bar magnets, where m points along the axis of the magnet (from the south to the north pole) and |m| is a measure of the magnet’s strength. The magnitude of the magnetic dipole moment (or magnetic moment, for short) is an unalterable fundamental property of the electron, proton, and neutron and is proportional to the quantum spin, or intrinsic angular momentum, of each type of particle. In an atomic nucleus, the individual spins and orbital angular momenta of the various nucleons combine to produce a net angular momentum. Nevertheless, one still speaks of a net nuclear spin and its corresponding magnetic moment.
For a given value of s, there are 2s + 1 spin states; each has the same magnitude of spin angular momentum, but each has a different value for its z component. The possible values for the z component of the spin angular momentum vector are (30) Sz = mh ¯, where, as before, m is the magnetic quantum number, but now its possible values are m = −s, −(s − 1), · · · , 0, · · · , +(s − 1), +s.
(31)
In the absence of an external magnetic field, there is no difference between the energies of spin states that have different values of m and the same value of s. Hence, each value of s is (2s + 1)-fold degenerate. By turning on a magnetic field, the different m states split into distinct energy levels; they become nondegenerate states. This is called the Zeeman effect and comes about because there is an interaction between the field B and the particle’s magnetic moment m. If the magnetic field is uniform and points in the +z direction so ˆ the classical interactive energy is given by that m = B0 k, E = −µz B0 .
(32)
This basically says that as the moment becomes more and more aligned with the field, the energy becomes lower and lower. As mentioned previously, the magnetic moment is directly proportional to the particle’s spin: m = ±γ S.
(33)
The proportionality constant γ is called the gyromagnetic ratio of the particle, and its value depends on the type of particle. The upper sign is used for nuclei and the lower sign is used for electrons. Combining Eq. (33) with Eq. (32) and then substituting in Eq. (30) gives E = ∓γ B0 Sz = ∓mh ¯ γ B0 .
(34)
This result shows that for a particle whose spin quantum number is s, the degeneracy is split into 2s + 1 energy levels (or Zeeman states) that are equally spaced by an amount (35) E = h ¯ γ B0 . The spin quantum number of an electron, is always s = 1/2, so the magnetic quantum number m is either +1/2 or −1/2. As a result, when an electron is placed in a magnetic field, there are only two possible energy
218
ELECTROMAGNETIC RADIATION AND INTERACTIONS WITH MATTER
states. The lower energy level is the m = −1/2 state, and the upper level corresponds to m = +1/2. The spacing between the two levels is obtained from Eq. (35). Given that the gyromagnetic ratio of an electron is γ = 1.76 × 1011 T−1 ·s−1 , each tesla of applied field produces a splitting of 1.16 × 10−4 eV. Atomic nuclei have spins restricted to half or whole integers and can take on a variety of values. So, when a nucleus is placed in an external field, a number of possible energy states can appear. For nuclei, the lower energy levels correspond to the higher values of m, whereas the upper energy levels correspond to lower m values. The value of γ depends on the specific nuclear species in question. However, to get a sense of the spacing between levels, consider the simplest nucleus, namely a proton (or hydrogen nucleus), which has a spin of 1/2. This produces two energy levels. The gyromagnetic ratio is 2.67 × 108 T−1 ·s−1 , and each tesla of field splits the levels by only 1.76 × 10−7 eV, or about three orders of magnitude less than the splitting of electron spin states. Regions of the Electromagnetic Spectrum The energy levels and transitions just described give rise to an enormous range of photon energies that span many orders of magnitude. This, in turn, represents a correspondingly wide range of wavelengths (or frequencies). Figure 9 displays the entire gamut of the EM spectrum. Even though it forms a continuum, the spectrum is partitioned (somewhat artificially) into the following regions: radio waves, microwaves, infrared radiation, visible light, ultraviolet radiation, X rays, and gamma rays. The divisions between adjacent regions of the spectrum are not abrupt — rather, they are fuzzy and allow some degree of overlap between one region and the next. The characteristics of the different spectral regimes are presented here:
10−12
Gamma rays
1020
X rays
103
10−8
1016
Ultraviolet Visible 10−4
106
1
Infrared 1012
10−3
Microwaves 1 Radio waves
108
10−6
10−9 104 Wavelength (m)
Frequency (Hz)
Photon(eV) energy
Figure 9. The electromagnetic spectrum.
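As a rough numerical companion to Fig. 9, the sketch below converts the approximate wavelength boundaries quoted in the subsections that follow into frequencies and photon energies using Eqs. (6) and (18). The boundary values are the fuzzy dividing lines of the text, not sharp physical limits.

```python
# Sketch: frequency and photon energy at the approximate wavelength boundaries
# between the spectral regions described in this section.  The boundary values
# are the rough, "fuzzy" dividing lines quoted in the text.
h = 6.6261e-34    # J s
c = 3.00e8        # m/s
eV = 1.602e-19    # J per eV

boundaries = [                      # (description, wavelength in meters)
    ("radio / microwave",     0.30),
    ("microwave / infrared",  1.0e-3),
    ("infrared / visible",    780e-9),
    ("visible / ultraviolet", 380e-9),
    ("ultraviolet / X ray",   10e-9),
    ("X ray / gamma ray",     0.1e-10),   # 0.1 angstrom
]

for name, lam in boundaries:
    nu = c / lam                     # Eq. (6)
    E = h * nu / eV                  # Eq. (18), converted to eV
    print(f"{name:24s} lambda = {lam:8.2e} m   nu = {nu:8.2e} Hz   E = {E:10.3e} eV")
```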
Radio Waves

Radio waves have wavelengths (in vacuum) that exceed about 10 cm and have frequencies typically in the MHz region. The corresponding photon energies are about 10⁻⁵ eV or less. These are too small to be associated with electronic, molecular, or nuclear transitions. However, the energies are in the right range for transitions between nuclear spin states (nuclear magnetic resonance) and are the basis for magnetic resonance imaging. Radio waves are used in television and radio broadcasting systems and are generated by oscillating charges in antennas and electric circuits (see section on Electric dipole radiation).

Microwaves

The wavelengths of microwaves are in roughly the 30-cm to 1-mm range, and their frequencies are of the order of GHz. Microwaves correspond to photon energies typically between about 10⁻⁶ and 10⁻³ eV. These energies correspond to transitions between rotational states of molecules, as well as transitions between electron spin states (electron spin resonance). Microwaves are easily transmitted through clouds and the atmosphere, making them appropriate for radar systems, communication, and satellite imaging. In radio astronomy, a particularly prevalent source of microwaves is the atomic hydrogen that is found in great abundance throughout various regions of space. A 21-cm spectral line is observed when hydrogen is present in large enough quantities. The origin of this line is that the electron in hydrogen, when considered in its own frame of reference, sees a circulating proton, which acts like a current loop. From Ampere's law, any current loop produces a B field. As a result, the magnetic moment of the spinning electron is immersed in a magnetic field, and its spin states are split (Zeeman effect). The splitting is known as the hyperfine structure of hydrogen, and astronomers find that the corresponding microwave emission provides a great deal of information about the structure of galaxies.

Infrared Radiation

This region of the spectrum is subdivided into the far IR (λ ≈ 1 mm → 50 µm), mid IR (50 → 2.5 µm), and the near IR (2.5 → 0.78 µm). Frequencies range from ∼3 × 10¹¹ Hz in the far-IR region to ∼4 × 10¹⁴ Hz just near the red end of the visible spectrum. The corresponding values of photon energy are on the order of 10⁻³ eV up to a few eV. Infrared (IR) radiation is generally associated with the thermal motion of molecules. The far IR is related mostly to molecular rotations and low-frequency vibrations. Absorption and emission in the mid-IR region is caused primarily by vibrational transitions. The near IR is associated with molecular vibrations and certain electronic transitions. By virtue of their finite temperature, all bodies emit (and absorb) in the infrared. Most heat transfer by radiation occurs in the IR region of the spectrum. In general, the warmer the object, the more IR radiation is emitted from its surface (see section on Thermal sources). Because of this, variations in the surface temperature of objects can easily be imaged by using
infrared detectors. Many areas of imaging technology, including thermal imaging, remote sensing of the earth's surface, and certain medical imaging techniques, are based on IR emission and its detection.

Visible Light

The human visual system responds to the narrow range of frequencies between about 3.8 × 10¹⁴ Hz and 7.9 × 10¹⁴ Hz that is referred to as visible light. The wavelengths of light in vacuum range from about 780 nm at the extreme reddish end of the visible spectrum to about 380 nm at the violet extreme. The acronym ROY-G-BIV is useful for remembering the sequence of colors perceived as frequency increases: red, orange, yellow, green, blue, indigo, and violet. Table 1 lists the range of frequencies and vacuum wavelengths that correspond to each perceived color. The table is only a general guide because individual perceptions vary somewhat. Light that is approximately an equal mixture of the various frequencies in the visible spectrum appears as white light. The color response of the human eye and brain to light is rather complex, especially when presented with a mixture of frequencies. For example, light that contains a 50–50 mixture of red light and green light appears yellow, even though no yellow frequency component is present. In addition, certain pairs of frequencies evoke the same response in the visual system as that produced by broadband white light. The perception of whether an object is white or not is also context sensitive. For example, the processing by the human visual system is such that an object may be interpreted as white even when observed under totally different illumination conditions, say, when viewed under the light of a fluorescent lamp or under bright sunlight. Photon energies in the visible region are between 1.6 eV (red) and 3.2 eV (violet). These energies are generally associated with the excitation and deexcitation of outer electrons in atoms and molecules.

Ultraviolet Radiation

The ultraviolet (UV) region of the spectrum is subdivided into the near UV (λ ≈ 380 nm → 200 nm) and the far UV (200 nm → 10 nm). Frequencies range from ∼8 × 10¹⁴ Hz just above the visible spectrum to ∼3 × 10¹⁶ Hz at the extreme end of the far UV. Photon energies, which range between about 3 and 100 eV, are of the order of magnitude of ionization energies and molecular dissociation energies
of many chemical reactions. This accounts for various chemical effects triggered by ultraviolet radiation.

Table 1. Frequencies and Vacuum Wavelengths for Colors in the Visible Spectrum

Color     Frequencies (×10^14 Hz)   Wavelengths (nm)
Red       3.85–4.84                 780–620
Orange    4.84–5.04                 620–595
Yellow    5.04–5.22                 595–575
Green     5.22–6.06                 575–495
Blue      6.06–6.59                 495–455
Violet    6.59–7.89                 455–380

The sun is a powerful source of ultraviolet radiation and is responsible for the high degree of ionization of the upper atmosphere; hence the name ionosphere. Solar radiation in the ultraviolet region is often subdivided into UV-A (320–380 nm), UV-B (280–320 nm), and UV-C (below 280 nm). The reason for the different labels is that each region produces different biological effects. UV-A radiation is just outside the visible spectrum and is generally not harmful to tissue. UV-B radiation, on the other hand, can lead to significant cell damage. The effects of UV-C can be worse still, even potentially lethal. However, gases in the earth's upper atmosphere absorb radiation in this region. The gas most effective for absorbing radiation is ozone (O3), which has led to the recent concern over depletion of the earth's ozone layer.

X Rays
This region of the EM spectrum extends from wavelengths of about 10 nm down to wavelengths of about 0.1 Å, or frequencies in the range 3 × 10^16–3 × 10^19 Hz. Photon energies run between approximately 100 eV to the order of a few hundred keV. The lower end of the energy range is often referred to as the soft X-ray region, and the high-energy extreme is called the hard X-ray region. X rays are associated with transitions that involve the inner, most tightly bound, atomic electrons. If an atom is ionized by removing a core electron, an outer electron will cascade down to fill the vacancy that was left behind. This transition is accompanied by the emission of an X-ray photon (so-called X-ray fluorescence). The energy spectrum of X rays emitted by this process is specific to the particular atomic species involved. In effect, this provides a fingerprint of the different elements. X rays produced in this fashion are called characteristic X rays. A common method for producing X rays is to accelerate a beam of electrons to high speed and then have the beam strike a metallic target. This is the method for producing X rays in commercial X-ray tubes. When the energetic electrons rapidly decelerate in the target, a broad, continuous spectrum of radiation known as bremsstrahlung, which consists primarily of X rays, is generated (see section on Bremsstrahlung). Superimposed on this continuous spectrum are the discrete lines of the characteristic X rays of the target atoms. Medical X-ray imaging and computerized tomography (CT) use the difference in the X-ray attenuation of different anatomical structures, especially between bone and soft tissue. X-ray astronomy is concerned with detecting and imaging X rays emitted by distant stars, quasars, and supernovas, as well as other sources, including our sun. Often, the origin of these X rays is the high temperature of the emitting object, which can be millions of degrees. At such extreme temperatures, the corresponding blackbody radiation consists primarily of X rays.

Gamma Rays
Gamma radiation is nuclear in origin. The wavelengths run from about 0.1 Å to less than 10^−5 Å, and frequencies
range from ∼10^19 Hz to more than 10^23 Hz. Because their photon energies are so large, of the order of keV to GeV, gamma rays are highly penetrating radiation. Gamma-ray (γ-ray) emission accompanies many of the nuclear decay processes of radioactive nuclides. Photons in this regime readily produce ionization and, in some cases, can initiate photonuclear reactions. The most common type of radioactivity is nuclear beta decay (β decay), of which there are two types: β− decay and β+ decay. In β− decay, the nucleus emits an electron, which causes the atomic number of the nuclide to increase by one. In β+ decay, the nucleus emits a positron (i.e., a positively charged electron), which causes the atomic number to decrease by one. These processes are important for γ-ray imaging in nuclear medicine because gamma rays are almost always a secondary by-product of the β decays that occur in radiopharmaceuticals administered to the patient. One reason for this is that when the original (or parent) nucleus decays by either β− or β+ decay, it produces a new (or daughter) nucleus in an excited state. The subsequent nuclear deexcitation is accompanied by one or more gamma-ray transitions. The other reason is that the positron emitted during β+ decay always combines with a nearby electron, and the two particles mutually annihilate one another, a process known as electron–positron annihilation. The result is that the charged-particle pair vanishes and two back-to-back 511-keV photons are created. In positron-emission tomography (PET), these two photons are simultaneously detected by two oppositely facing detectors.

Generation of EM Radiation
This section describes the basic principles behind producing electromagnetic radiation and identifies some of the radiative sources commonly found in practice; many are encountered in imaging systems.

Electric Dipole Radiation
According to classical theory, a charged particle that moves at constant velocity does not radiate any electromagnetic energy. However, should the particle experience an acceleration a, the surrounding field lines undergo rearrangement, and energy is carried away as electromagnetic radiation. As long as the accelerated charge is always moving at speeds much less than the speed of light (i.e., v ≪ c), the total power P radiated by such a particle (charge q) is given by

P = \frac{q^2 a^2}{6\pi\varepsilon_0 c^3}. \qquad (36)

This is known as Larmor's formula. Probably the most fundamental way to generate electromagnetic waves is to oscillate a charged particle harmonically back and forth along a line, say, the z axis, according to z(t) = z0 cos ωt. This is often referred to as an oscillating electric dipole and is characterized by an instantaneous electric dipole moment of magnitude p, defined as

p(t) \equiv q \cdot z(t). \qquad (37)
The oscillating particle undergoes an acceleration a(t) ≡ d²z/dt² = −z0ω² cos ωt; hence it radiates and produces so-called electric dipole radiation. According to Eq. (36), the time-averaged power radiated by the dipole is

\langle P \rangle = \frac{p_0^2 \omega^4}{12\pi\varepsilon_0 c^3}, \qquad (38)
where p0 ≡ qz0 is the size of the dipole moment (note: the average value of cos²ωt is simply 1/2). An important result is that the power radiated is proportional to the oscillatory frequency to the fourth power. Sufficiently far from the dipole (in the so-called far-field or radiation zone), the energy is carried away as an outgoing electromagnetic wave that has the same frequency as the oscillating charge. For a small dipole (i.e., z0 much less than the emission wavelength), the angular distribution of the radiation follows a sin²θ form, as depicted in Fig. 10, where θ represents the angle from the oscillation axis of the dipole (the z axis). This means that an oscillating electric dipole does not radiate along its axis. On the other hand, the radiated wave is strongest in directions along the dipole's midplane. Another important fact is that the radiated waves are linearly polarized, and the electric field oscillates in a plane that contains the dipole. The electric-field lines in such a plane are illustrated in Fig. 11. Probably the most direct practical application of these ideas is the radio-frequency (RF) transmitting antenna. In this case, a long, thin conductor is driven at its midpoint by a sinusoidally varying voltage or current and pushes free electrons up and down at the frequency of the desired radiation. The performance is optimized by using a half-wave antenna, where the length of the antenna is set at one-half the wavelength of the emitted radio waves. For example, AM radio stations transmit at frequencies near 1 MHz, which gives corresponding wavelengths of a few hundred meters. Therefore, to act as a half-wave antenna, the height of the transmitting tower must be on the order of 100 m. At microwave frequencies (on the order of a GHz or larger), the length of a simple half-wave antenna is in the centimeter to submillimeter range. In many radar and telecommunication applications, this size turns out to be too small for practical purposes and necessitates a number of other designs for antennas and antenna arrays in the
microwave region, including reflector-type antennas, lens antennas, and microstrip antennas (18), the details of which are outside the scope of this article. The EM radiation emitted by individual atoms, molecules, and, quite often, nuclei is predominantly electric dipole radiation. This can be understood by considering, for simplicity, the circular orbit of a single electron in an atom (or proton in a nucleus). First of all, even though the particle maintains constant speed, it accelerates nonetheless because it experiences a centripetal acceleration toward the center of the orbit. Because the charge accelerates, it radiates electromagnetic energy. Secondly, circular motion is actually a superposition of two straight-line harmonic oscillations in orthogonal directions. Consequently, one can basically treat the circular motion of the charged particle as equivalent to an oscillating electric dipole whose angular vibrational frequency ω matches the angular velocity of the circulating charge, and the radiation emitted is electric dipole in character. One might argue that Larmor's formula and the resulting production of electric dipole radiation are based on classical physics, whereas atoms, molecules, and nuclei are inherently quantum objects. However, except for a few modifications to be discussed later on (see section on Absorption and emission of photons), the results of quantum-mechanical calculations agree essentially with classical predictions.

Figure 10. Angular distribution of the far-field radiation emitted by a small electric dipole oscillating along the z axis.

Figure 11. Snapshot of the electric field lines surrounding a small oscillating electric dipole. The dipole, located at the center, is oriented vertically (adapted from Fundamentals of Electromagnetic Phenomena by P. Lorrain, D. R. Corson, and F. Lorrain. Copyright 2000 W. H. Freeman and Company. Used with Permission.)

Synchrotron Radiation
The synchrotron, a source of intense, broadband radiation, is a large facility used to accelerate electrons up to extremely high speeds and steer them around a circular storage ring at approximately constant speed. Bending magnets are used to maintain the circular trajectory, and the resulting centripetal acceleration causes the electrons to emit so-called synchrotron radiation (19). The electron speeds v attained are highly relativistic, which means that they are very close to the speed of light (v/c ≈ 1). Speeds of this magnitude correspond to kinetic energies (K) very large compared to the rest energy of an electron (m0c² = 511 keV), or γ ≡ K/m0c² ≫ 1. Currently, there are approximately 75 operating or planned storage-ring synchrotron radiation sources worldwide (20). Just under half of them are located in Japan, the United States, and Russia. The most energetic sources produce electron energies up to 7 GeV (or γ ∼ 10^4). A highly relativistic electron (charge e) that travels in an orbit of radius R radiates a total power of (21)

P \cong \gamma^4\,\frac{e^2 c}{6\pi\varepsilon_0 R^2}. \qquad (39)
At each instant, the synchrotron radiation is preferentially emitted in the direction of the electron's velocity, tangent to the circular path. The radiation is concentrated within a narrow cone of angular width 1/γ (∼0.1 to 1 mrad). As the electrons circulate around the storage ring, the radiative beam resembles that of a searchlight. To use the radiation, a momentary time slice, or pulse, of this searchlight is sampled through a beam port. Because the duration Δt of the pulse is very short, the radiation received has a large bandwidth, Δω = 2π/Δt. The resulting frequency spectrum is broad and continuous. An important parameter of the spectrum is the critical frequency, given by ωc = 3cγ³/2R, which corresponds to a critical photon energy

E_c = \gamma^3\,\frac{3\hbar c}{2R}. \qquad (40)
Up to the critical value, all frequencies (or photon energies) strongly contribute to the radiated spectrum. However, above the critical value, the radiation spectrum drops off exponentially, and little contribution is made to the spectrum (5). An important property of synchrotron radiation is that, when viewed in the same plane as the circular orbit, it is linearly polarized along a direction parallel to that plane. When viewed above or below the orbit plane, the radiation is elliptically polarized (see section on Types of polarization). Often, it is desired to enhance the X-ray portion of the emission spectrum. In practice, this is accomplished by inserting either a beam undulator or a wiggler magnet. The actions of these devices and their effects on the angular distribution, frequency spectrum, and polarization of synchrotron radiation are discussed in the references. Suffice it to say that the addition of these devices has extended the usable spectrum into the hard X-ray regime (22). Astronomy provides many examples of naturally occurring synchrotron radiation. High-speed charged particles are found in the magnetic fields that permeate various regions of space. When this happens, the particles follow circular or helical trajectories and emit synchrotron radiation. Examples of this include radiation from particles trapped in the earth’s magnetic field, from sunspots, and from certain nebulae.
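As a rough numerical illustration of Eq. (40), and not part of the original article, the short Python sketch below estimates γ and the critical photon energy for an assumed 7-GeV storage ring; the 39-m bending radius is an illustrative assumption, not a value quoted in the text.

```python
# Illustrative sketch (assumed parameters): critical photon energy of
# synchrotron radiation, E_c = 3*hbar*c*gamma^3 / (2R), Eq. (40).
hbar_c_eV_m = 197.327e-9       # hbar*c expressed in eV*m (197.327 eV*nm)
m_e_c2_eV = 511e3              # electron rest energy, eV

E_beam_eV = 7e9                # assumed 7-GeV electron beam
R_m = 39.0                     # assumed bending radius in meters (illustrative)

gamma = E_beam_eV / m_e_c2_eV                  # ~1.4e4, i.e., gamma ~ 10^4
E_c_eV = 3 * hbar_c_eV_m * gamma**3 / (2 * R_m)

print(f"gamma = {gamma:.3g}")
print(f"E_c   = {E_c_eV / 1e3:.1f} keV")       # ~20 keV: hard X-ray regime
print(f"emission cone ~ 1/gamma = {1e3 / gamma:.2f} mrad")
```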
Thermal Sources
The surface of any object at temperature T emits (and absorbs) thermal energy as electromagnetic radiation. The spectrum has the form

I(\lambda, T) = \varepsilon(\lambda)\,\frac{2\pi h c^2}{\lambda^5}\,\frac{1}{e^{hc/\lambda k_B T} - 1}, \qquad (41)
where k_B = 1.381 × 10^−23 J/K is Boltzmann's constant. I(λ, T) represents the so-called spectral radiant flux density (or spectral radiant exitance), which is the power emitted per unit surface area per unit wavelength. The dimensionless quantity ε(λ) represents the emissivity of the radiating surface and, in general, is a function of wavelength. A surface whose emissivity is unity over all wavelengths is an ideal emitter (and absorber) and is called a blackbody radiator. It produces the Planck blackbody radiation spectrum. A graybody radiator is characterized by a constant emissivity whose value is less than unity. Figure 12 shows the shape of the blackbody spectrum for a few different temperatures. As temperature increases, the spectra peak at shorter and shorter wavelengths λmax. This relationship is given quantitatively by the Wien displacement law:

\lambda_{max}\,T = \frac{hc}{5 k_B} = 2.90 \times 10^{-3}\ \mathrm{m \cdot K}. \qquad (42)
For example, at typical ambient temperatures, say ∼300 K, the spectrum peaks in the mid IR at around 10 µm. It is not until a body is heated to 4000–5000 K that the thermal spectrum begins to peak in the visible. As the temperature is increased further, the peak shifts from red to violet and eventually enters the UV region at temperatures of 6000–7000 K. Notice that when the peak falls, for example, in the blue, there is still plenty of radiation emitted at the red, yellow, and green wavelengths. The mixture of different visible wavelengths
produces the glow of white-hot surfaces. At temperatures that exceed 10^6–10^8 K, like those encountered in the plasmas of evolving stars and some nuclear fusion reactors, blackbody radiation peaks well into the X-ray regime. In all, the spectral characteristics of the emitted radiation provide a reliable means for remotely sensing an object's surface temperature. The effective surface temperatures of the sun (∼5500 K) and of distant stars are determined in this manner. Another clear trend in the spectra is that the flux density integrated over all wavelengths (the total power emitted per unit surface area) increases rapidly with temperature. Specifically, the total blackbody emission is proportional to the absolute temperature to the fourth power:

I(T) = \int_0^{\infty} I(\lambda, T)\,d\lambda = \sigma T^4. \qquad (43)
This is known as the Stefan–Boltzmann law. The Stefan–Boltzmann constant σ has the value 5.67 × 10^−8 W/(m²·K⁴). Optical sources that use radiation emitted by a heated element fall under the heading of incandescent sources. The globar is a source of this type used for the near and mid IR. It consists of a rod of silicon carbide that is joule heated by an electric current. The rod acts like a graybody that has a typical emissivity around 0.9. The temperature can be varied up to about 1000 K and produces usable radiation in the range of about 1–25 µm. Tungsten filament lamps are used to generate light in the visible through mid IR. The tungsten filament acts like a graybody and typically operates at about 3000 K. The filament is in a glass bulb that contains nitrogen or argon gas, which retards the evaporation of the filament. Any evaporation degrades the filament and leaves a darkened tungsten coating on the inside of the bulb, which diminishes light output. Halogen lamps mitigate this problem by the addition of iodine or bromine vapor into the gas. The halogen vapor combines with the tungsten on the bulb surface and forms tungsten iodide or tungsten bromide. The molecules dissociate at the heated filament, where the tungsten is redeposited, and the halogen atoms are recycled back into the surrounding gas.
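As a quick numerical check of Eqs. (42) and (43), added here purely for illustration, the following Python sketch evaluates the peak wavelength and total emitted flux for a few of the temperatures mentioned above.

```python
# Wien displacement law (Eq. 42) and Stefan-Boltzmann law (Eq. 43).
SIGMA = 5.67e-8        # Stefan-Boltzmann constant, W/(m^2*K^4)
WIEN_B = 2.90e-3       # Wien constant as quoted in the text, m*K

for T in (300.0, 3000.0, 5500.0):      # ambient, tungsten filament, solar surface
    lam_max_um = WIEN_B / T * 1e6      # peak wavelength, micrometers
    flux = SIGMA * T**4                # total emitted power per unit area, W/m^2
    print(f"T = {T:6.0f} K: lambda_max = {lam_max_um:6.2f} um, "
          f"I(T) = {flux:.3g} W/m^2")
```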
Figure 12. Blackbody radiative spectra. (Spectral flux density, in units of 10^7 W/(m²·µm), versus wavelength from 0 to 2 µm for temperatures of 3000, 4000, 5000, and 6000 K.)
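The curves in Fig. 12 can be reproduced numerically from Eq. (41). The sketch below, which is an added illustration and assumes an ideal blackbody (emissivity of unity), tabulates the spectral exitance at a few wavelengths for the temperatures shown in the figure.

```python
import math

# Spectral radiant exitance of a blackbody, Eq. (41) with emissivity = 1.
H = 6.626e-34        # Planck constant, J*s
C = 2.998e8          # speed of light, m/s
KB = 1.381e-23       # Boltzmann constant, J/K

def planck_exitance(lam_m, T):
    """Power emitted per unit area per unit wavelength, W/(m^2 * m)."""
    x = H * C / (lam_m * KB * T)
    return (2.0 * math.pi * H * C**2 / lam_m**5) / math.expm1(x)

for T in (3000, 4000, 5000, 6000):
    row = [planck_exitance(lam * 1e-6, T) for lam in (0.5, 1.0, 1.5, 2.0)]
    print(T, "K:", ["%.2e" % v for v in row], "W/(m^2*m)")
```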
Electric Discharge Sources. In a discharge lamp, an electric arc is struck between the electrodes at either end of a sealed transparent tube filled with the atoms of a particular gas. The gas atoms become excited and emit radiation whose spectrum is characteristic of the type of gas. Low-pressure tubes operated at low current produce spectral lines that are quite sharp, whereas lamps operated at high pressure and current tend to sacrifice spectral purity in favor of increased output intensity (4). The sodium arc lamp and the mercury discharge tube are two examples of electric discharge sources. Light from a sodium lamp is yellow and arises from the two closely spaced spectral lines (referred to as the sodium doublet) near 589 nm. Mercury vapor produces prominent visible lines in the violet (405 and 436 nm), green (546 nm), and yellow (577 and 579 nm). Fluorescent lamps, commonly found in household and office fixtures, are low-pressure glass discharge tubes that
contain mercury vapor. The inside surface of the tube is covered with a phosphor coating. Ultraviolet emission from the excited mercury vapor induces fluorescence (see section on Fluorescence and phosphorescence) in the phosphor, which produces visible light.

Lasers. In 1917, Einstein first introduced the principle of stimulated emission of radiation from atoms (see section on Stimulated emission). In the process of stimulated emission, an existing radiation field induces the downward transition of an atom in an excited state. More specifically, when a photon is present whose energy hν matches the transition energy of the excited atom, there is an enhanced probability that the atom will deexcite. Upon doing so, the atom emits a photon that has precisely the same energy (or frequency), phase, polarization, and direction of propagation as the stimulating photon. The chance that an emission of this type occurs is proportional to the number of stimulating photons present. Devices that produce radiation by taking full advantage of this process are called lasers (23,24). The word laser stands for light amplification by the stimulated emission of radiation. Lasers emit an intense, highly monochromatic, collimated beam of optical radiation. They serve as light sources in many imaging applications, including holography, Doppler radar, LIDAR (light detection and ranging), and a number of areas in diagnostic medicine. All lasers consist of three basic components. First, the laser must have an amplifying medium in which stimulated emission occurs. Next, there is an optical resonator, or highly reflective cavity, in which the medium is enclosed. The resonator reflects the emitted photons back and forth between two highly reflective ends. This produces further stimulated emission and leads to the intense buildup of a coherent electromagnetic field; a portion of the emission passes through one end of the cavity and is used as the light beam. The third component is the laser pump. To understand its purpose, consider the following: Under conditions of thermal equilibrium (temperature T), the amplifying medium contains many more atoms in low-energy states than in those of high energy; the number in any energy state E is proportional to the Boltzmann factor, exp(−E/k_B T). Given two energy levels E1 and E2, the number of atoms N1 and N2 in these two states is in the ratio

\frac{N_2}{N_1} = e^{-(E_2 - E_1)/k_B T}. \qquad (44)
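As a quick numerical check of Eq. (44), and not part of the original text, the sketch below evaluates the ratio for a representative optical transition at room temperature; the 2.5-eV gap is an illustrative value within the range discussed next.

```python
import math

KB_EV = 8.617e-5          # Boltzmann constant in eV/K

def population_ratio(delta_E_eV, T_K):
    """N2/N1 from Eq. (44) for an energy gap delta_E at temperature T."""
    return math.exp(-delta_E_eV / (KB_EV * T_K))

# For an optical gap of ~2.5 eV at 300 K the upper level is essentially empty,
# which is why a pump is needed to invert the population.
print(population_ratio(2.5, 300.0))   # on the order of 1e-42
```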
For transitions in the optical regime, where E2 − E1 is around 2–3 eV, the number of atoms that reside in the upper energy level is on the order of some 10^−53 to 10^−35 less than the number in the lower state for a system at room temperature (∼300 K). Under these circumstances, it is much more likely that photons will be absorbed than undergo stimulated emission. Hence, for stimulated emission to dominate and produce the amplification of a light field, it is first necessary to devise a method for placing more atoms in higher energy states than in lower ones, a condition known as population inversion. The role of the laser pump is to accomplish this inversion. A number
Table 2. Properties of Some Different Types of Lasers

Type                                  Wavelength          Comments
Gas Lasers
  Helium–neon                         632.8 nm            Cw^a
  Argon ion                           488.0, 514.5 nm     Cw
  Nitrogen                            337.1 nm            Pulsed
  Carbon dioxide                      10.6 µm             Cw, high power
Solid-State Lasers
  Ruby                                694.3 nm            Pulsed, high power
  Nd-YAG                              1.06 µm             Cw
  Nd-glass                            1.06 µm             Pulsed
(Liquid) Dye Lasers
  Rhodamine 6G                        560–650 nm          Cw/pulsed, tunable
  Sodium fluorescein                  520–570 nm          Cw/pulsed, tunable
Semiconductor Laser Diodes (Injection Lasers)
  GaAs                                840 nm              Cw/pulsed
  AlGaAs                              760 nm              Cw/pulsed
  GaInAsP                             1300 nm             Cw/pulsed
Excimer (Molecular) Lasers
  Xenon fluoride                      486 nm              Pulsed, high power
  Argon fluoride                      193 nm              Pulsed, high power
Chemical Lasers
  Hydrogen fluoride                   2.6–3 µm            Cw/pulsed
Free-Electron Lasers
  Wiggler magnet                      100 nm–1 cm         Tunable, high power

^a Continuous wave.
of different types of pumps exist that are based on various types of processes and energy transfer mechanisms. Table 2 lists the different types of lasers and the properties of some that are more commonly used. A description of the various laser types can be found in the references. Note, however, that the last entry in the table, the free-electron laser, is the only type of laser listed that is not based on the process of stimulated emission. Rather, radiation is produced by the oscillation of relativistic, free electrons in a wiggler magnet (25,26). The free-electron laser resembles a synchrotron source and does not function like other types of lasers.

Bremsstrahlung. Figure 13 is a basic drawing of a typical X-ray tube, a common source of X rays used in many industrial and medical settings, as well as a source for X-ray spectroscopy. An electric current heats a filament (the cathode) and generates a cloud of free electrons (a process known as thermionic emission). The electrons are accelerated across the gap in the evacuated tube by a potential difference on the order of 10^3–10^5 V. Then, the energetic electrons strike a metal target (the anode). Upon entering the target, the electrons are deflected by the coulombic field that surrounds the nuclei of the
target atoms and thus experience an acceleration. The electrons radiate electromagnetic energy, causing them to experience a rapid deceleration in the target. The radiation emitted is called bremsstrahlung or braking radiation. Bremsstrahlung is characterized by a continuous spectrum; most of the radiation is emitted in the X-ray region. Figure 14 shows typical bremsstrahlung emission spectra from an X-ray tube that uses different accelerating voltages. An important feature to observe is that each spectrum has a minimum wavelength λmin below which no radiation is emitted. The existence of this wavelength cutoff can be explained only by considering a quantum, that is, photon, picture of the emitted radiation. In general, as an electron decelerates, it can lose energy by emitting any number of photons. However, the sum of the photon energies emitted by the electron in the vicinity of any one target nucleus cannot exceed the kinetic energy K of the electron upon entering the target. This kinetic energy is determined by the product

K = e \cdot V, \qquad (45)

where e is the electron's charge and V is the accelerating voltage of the X-ray tube. It is possible, on occasion, for all of the electron's kinetic energy (except for an extremely small amount due to recoil of the target nucleus) to be radiated as a single photon. Such a photon would correspond to one that has the maximum permissible energy, or equivalently, the shortest allowable wavelength λmin. The minimum wavelength is determined by setting the photon energy hc/λmin equal to the kinetic energy of the electron. The result is simply λmin = hc/eV, or

\lambda_{min}(\mathrm{nm}) = \frac{1240}{V\ (\mathrm{volts})}. \qquad (46)

In addition, the approximate peak of the spectrum is located at

\lambda_{peak} \cong \tfrac{3}{2}\,\lambda_{min}. \qquad (47)

As an example, the spectrum from a 20-kV accelerating potential has a cutoff wavelength of 0.062 nm and peaks at a wavelength of about 0.093 nm. In addition to bremsstrahlung, real X-ray tubes emit characteristic X rays specific to the target atoms. The characteristic X rays show up as pronounced narrow spectral lines that sit on top of the bremsstrahlung continuum. In practice, only a very small fraction of the electron beam energy shows up as X radiation. Most of the energy actually goes to heat the anode. Target materials like tungsten that have high atomic numbers produce a higher yield of X rays, but even then heating effects dominate. For this reason, the anode must be cooled and/or rotated to prevent melting.

Figure 13. A basic X-ray tube. Electrons from a heated filament are accelerated toward a metal target where X rays are produced.

Figure 14. Continuous X-ray, or bremsstrahlung, spectra for different accelerating potentials (intensity in arbitrary units versus wavelength from 0 to 0.20 nm for 10-, 20-, and 30-kV tubes).
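The 20-kV example quoted above can be reproduced directly from Eqs. (46) and (47); the short Python sketch below, added as an illustration, does so for the three accelerating potentials shown in Fig. 14.

```python
# Cutoff and peak wavelengths of a bremsstrahlung spectrum, Eqs. (46)-(47).
def lambda_min_nm(voltage_V):
    return 1240.0 / voltage_V          # Eq. (46)

for kV in (10, 20, 30):
    lam_min = lambda_min_nm(kV * 1e3)
    lam_peak = 1.5 * lam_min           # Eq. (47)
    print(f"{kV} kV: lambda_min = {lam_min:.3f} nm, "
          f"lambda_peak = {lam_peak:.3f} nm")
```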
Electromagnetic Waves in Matter

Speed of Light in a Dielectric Medium
In vacuum, all electromagnetic waves travel at the same speed, c = 1/\sqrt{\varepsilon_0\mu_0} = 3.00 × 10^8 m/s (see Eq. 2). This follows from the wave equation (Eq. 1) developed by Maxwell. Now, if one applies Maxwell's equations to a region of space filled with a nonconducting, or dielectric, medium, the wave equation appears yet again, except that ε0 and µ0, the electric permittivity and magnetic permeability of free space, are replaced by ε and µ, the permittivity and permeability of the medium. The result is that, like vacuum, homogeneous dielectric materials also support the propagation of EM waves, but now the wave speed is given by

v = \frac{1}{\sqrt{\varepsilon\mu}}. \qquad (48)

Basically, ε and µ are measures of a material's response to applied electric and magnetic fields. When an E field is applied to a dielectric, the individual electric dipoles of the atoms or molecules become aligned. Similarly, an applied B field aligns the orientation of the individual electron current loops (or magnetic moments) of the atoms or molecules. These alignments produce internal fields within the dielectric. As a result, the actual electric or magnetic field in the medium is the sum of the externally
applied field and the internal field generated by the atoms or molecules of the material. The overall effect is to change the apparent coupling between the E and B fields in Maxwell's equations, which, in turn, changes the propagative speed of the electromagnetic waves. Ordinarily, the speed of an EM wave in various materials is quoted as an index of refraction (or simply index, for short). The index of refraction n of a medium is just the ratio of the wave speed in vacuum to the wave speed in the medium:

n \equiv \frac{c}{v} = \sqrt{\frac{\varepsilon\mu}{\varepsilon_0\mu_0}}. \qquad (49)

For nonmagnetic media, µ and µ0 are essentially identical, so n = \sqrt{\varepsilon/\varepsilon_0}. The dielectric constant κ of a material is defined as its permittivity relative to that of free space:

\kappa \equiv \frac{\varepsilon}{\varepsilon_0}. \qquad (50)

Therefore,

n = \sqrt{\kappa}. \qquad (51)

The refractive indices of some common substances in yellow light are listed in Table 3. The larger the index of refraction, the slower the speed of the wave. For example, the speed of light in air differs from the speed in vacuum by only about 0.03%, whereas light travels about 33% faster in vacuum than in water. For a given frequency, the wavelength of the radiation depends on the speed of the wave, and hence the refractive index of the supporting medium:

\lambda = \frac{v}{\nu} = \frac{c/n}{\nu} = \frac{c/\nu}{n} = \frac{\lambda_0}{n}, \qquad (52)

where λ0 is the wavelength in vacuum. In other words, compared to vacuum, the wavelength is reduced by the factor n. From Eq. (7), the wave number k is also larger than the wave number k0 in vacuum by the factor n:

k = n k_0. \qquad (53)

Table 3. Index of Refraction for Some Common Substances^a

Substance              Index
Gases (0 °C, 1 atm)
  Air                  1.000293
  Carbon dioxide       1.00045
Liquids (20 °C)
  Benzene              1.501
  Ethyl alcohol        1.361
  Water                1.333
Solids (20 °C)
  Diamond              2.419
  Silica               1.458
  Crown glass          1.52
  Flint glass          1.66
  Polystyrene          1.49

^a Using sodium light, vacuum wavelength of 589 nm.
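The following short Python sketch, added here as an illustration, applies Eqs. (49)-(53) to a few of the Table 3 entries to show how the phase speed and wavelength scale with the index.

```python
# Wave speed and wavelength in a medium, Eqs. (49)-(53), for Table 3 indices.
C = 2.998e8                     # speed of light in vacuum, m/s
LAMBDA_0_NM = 589.0             # sodium light, vacuum wavelength

for name, n in (("air", 1.000293), ("water", 1.333), ("crown glass", 1.52)):
    v = C / n                   # phase speed in the medium
    lam = LAMBDA_0_NM / n       # wavelength in the medium, Eq. (52)
    print(f"{name:12s} n = {n:8.6f}  v = {v:.3e} m/s  lambda = {lam:.1f} nm")
```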
The precise value of the speed, and hence the refractive index, of a substance depends on the frequency of the radiation, a general phenomenon known as dispersion. Notice that the values of n in Table 3 are for yellow light whose vacuum wavelength is 589 nm. Generally, visible light toward the violet end of the spectrum has a somewhat higher index and reddish light has an index that is somewhat lower. Dispersion occurs, because the speed of light is determined by the permittivity of the medium, and the permittivity is a function of frequency. A convincing example of this is that for microwave frequencies and lower, the relative permittivity (i.e., dielectric constant) κ for water is around 80. According to Eq. (51), the fact that yellow light has an index of 1.333 means that the permittivity of water at optical frequencies, must be drastically less. A more complete discussion of dispersion follows in the next section. Dispersion in a Dielectric Medium. Many of the basic features that underlie the frequency dependence of the speed of light in a dielectric are successfully explained by a relatively simple classical model of the interaction between an EM wave and the molecules of a material. The model considers that each molecule, in effect, is composed of two equal and opposite charges, q and −q, bound together. For simplicity, one of the charges is treated as stationary (i.e., infinitely massive), whereas the other charge (mass m) is free to move. In a nonpolar molecule, for example, q and −q correspond to the combined positive charge of the nuclei and the surrounding negatively charged cloud of electrons. When the molecule is unperturbed, the centers of the positive and negative charge distributions coincide. However, in the presence of an external electric field, the center of the electron cloud becomes displaced relative to the nuclei — in other words, the field produces an induced dipole moment in the molecule. Now the charge distribution is asymmetrical, and the electron cloud experiences an electrostatic restoring force that tends to pull it back to equilibrium. In our simple model, this restoring force is represented by a linear spring (force constant K) that connects the two charges. The system has the properties of a simple harmonic oscillator, characterized by a natural,
or resonant, vibration frequency ω0 = \sqrt{K/m}. When visible light or any other electromagnetic wave is present, the effect is to introduce a harmonically oscillating electric field at the location of the molecule. The field continually shakes the charge in the molecule back and forth at a frequency ω that matches that of the wave. The
oscillator (i.e., molecule) experiences both this harmonic driving force and the spring-like restoring force discussed before. There is a third and final force, namely, a damping force, that tends to retard the motion and cause energy losses in the system. It arises because of interactions between the oscillator and other nearby molecules and because energy is carried away by radiation from the oscillating (and hence accelerating) charge. The latter effect is referred to as radiation damping. It can also be thought of as the reactive force of the radiative field on the oscillator. Each of the three forces identified before contributes a term to the oscillator's equation of motion. If it is assumed that the EM wave is linearly polarized along the x direction, the charge will displace along that direction, and the equation of motion is

m\,\frac{d^2 x}{dt^2} = -Kx - b\,\frac{dx}{dt} + qE_0 \cos\omega t. \qquad (54)

The left-hand side is simply the product of the mass and acceleration of the oscillating charge, and the right-hand side is the sum of the three forces acting on it. The first term represents the restoring force, the second term is the damping force, and the third term corresponds to the driving force that arises from the interaction of the charge with the oscillating field of the EM wave. Notice that the damping term (where the damping constant is b) resembles a drag force and is proportional to the speed of the oscillator at a given instant. The negative signs attached to the first two terms guarantee that the restoring force and the damping force are directed opposite to the oscillator's instantaneous displacement and velocity, respectively. Equation (54) can also be written as

\frac{d^2 x}{dt^2} + \frac{1}{\tau}\,\frac{dx}{dt} + \omega_0^2 x = \frac{q}{m}\,E_0 \cos\omega t, \qquad (55)
where τ = m/b represents an effective time constant associated with the damping, or energy dissipation, in the system. There are two important frequencies that appear: the resonant frequency ω0, which is solely a property of the oscillator or type of molecule, and ω, which is the frequency of the driving wave. Now, we seek the steady-state, or particular, solution to Eq. (55), which is an inhomogeneous, linear, second-order differential equation having constant coefficients. In cases like this, where the driving term is oscillatory, the process of solving the equation is simplified by replacing the factor cos ωt by the complex exponential exp(−iωt), where i stands for the square root of −1. According to a mathematical identity known as Euler's formula,

e^{\pm i\omega t} = \cos\omega t \pm i \sin\omega t, \qquad (56)

we see that cos ωt is identical to the real part of exp(−iωt). This means that the solution to Eq. (55) is obtained by first determining the mathematical solution to the equation

\frac{d^2\tilde{x}}{dt^2} + \frac{1}{\tau}\,\frac{d\tilde{x}}{dt} + \omega_0^2 \tilde{x} = \frac{q}{m}\,E_0 e^{-i\omega t}, \qquad (57)

where x̃ is complex, and then taking the real part of this result. To solve Eq. (57), assume a solution of the form

\tilde{x}(t) = A e^{-i\omega t}, \qquad (58)

where A is some complex oscillatory amplitude. Substitution in the equation of motion shows that the amplitude is a function of the driving frequency of the electromagnetic wave:

A(\omega) = \frac{(q/m)E_0}{\omega_0^2 - \omega^2 - i\,\omega/\tau}. \qquad (59)

The meaning of a complex amplitude is as follows: As for any complex quantity, A(ω) can be reexpressed in the polar form

A(\omega) = |A(\omega)|\,e^{i\varphi}, \qquad (60)

the product of a magnitude and a complex phase factor (whose phase angle is ϕ). Substituting in the assumed form for the complex solution, Eq. (58), gives x̃(t) = |A(ω)| exp[−i(ωt − ϕ)]. Then, the actual physical solution is obtained by taking the real part of this expression, which is just x(t) = |A(ω)| cos(ωt − ϕ). In other words, a complex amplitude means that there is some phase difference ϕ between the driving force and the resulting oscillation. The actual amplitude of the oscillation is given by the magnitude of A(ω):

|A(\omega)| = \frac{(q/m)E_0}{\sqrt{(\omega_0^2 - \omega^2)^2 + \omega^2/\tau^2}}. \qquad (61)

Figure 15. Resonance curve for a damped, driven oscillator. The curve shows how the oscillation amplitude |A(ω)| depends on ω, the driving frequency; ω0 is the natural, or resonant, frequency of the oscillator, and the factor 1/τ is proportional to the amount of damping in the system (curve drawn for 1/τω0 = 0.2).

This function, sketched in Fig. 15, represents the so-called resonance curve for a damped, driven oscillator. Naturally, the question at this point is how does finding the solution for the displacement x(t) lead to an understanding of dispersion? The answer has to do with the way different
driving frequencies induce different degrees of charge separation in the molecules and whether they result in different phase shifts between the perturbing wave and the oscillatory motion. The action of the EM field on a molecule is to produce an oscillating electric dipole, whose (complex) dipole moment p(t) = q · x˜ (t). Using Eqs. (58) and (59), one finds that the dipole moment can be reduced to the form
p(t) = \alpha(\omega)\,E(t), \qquad (62)

where

\alpha(\omega) = \frac{q^2/m}{\omega_0^2 - \omega^2 - i\,\omega/\tau} \qquad (63)

and is called the molecular polarizability. This parameter is a measure of how easy it is, at frequency ω, for the oscillating electric field to induce a separation of the bound charges in the molecule. If we consider a region of dielectric that contains N molecules per unit volume, we can speak of the local electric polarization P of the medium, which is the dipole moment per unit volume:

P = Np = N\alpha E. \qquad (64)

It is important not to confuse the polarization of the medium with the polarization associated with the EM wave. In a simple, linear dielectric, another measure of the degree to which a medium is polarizable is the value of the relative permittivity κ in excess of the value in vacuum (namely, unity). More specifically,

P = (\kappa - 1)\,\varepsilon_0 E. \qquad (65)

Combining Eq. (65) with Eq. (64) gives

\kappa = 1 + \frac{N}{\varepsilon_0}\,\alpha. \qquad (66)

Now, we can make the connection between our model and the refractive index, or speed of light in the medium. Recall that n = \sqrt{\kappa}, where the refractive index n is a real quantity. Note that in the present treatment, the molecular polarizability α is, in general, complex, and therefore, so is κ. Now, we introduce a complex index of refraction ñ, given by

\tilde{n} = \sqrt{1 + \frac{N}{\varepsilon_0}\,\alpha}. \qquad (67)

At sufficiently low density, such as in a gas, Nα/ε0 ≪ 1, and the right-hand side of Eq. (67) can be expanded in a Taylor series. Keeping only the two leading terms and plugging in Eq. (63) for α gives

\tilde{n} = 1 + \frac{N\alpha}{2\varepsilon_0} = 1 + \frac{Nq^2}{2m\varepsilon_0}\,\frac{1}{\omega_0^2 - \omega^2 - i\,\omega/\tau}. \qquad (68)

This complex refractive index can be broken up into its real part and imaginary part, ñ = n + i n_I, where

n = 1 + \frac{1}{2}\,\omega_p^2\,\frac{\omega_0^2 - \omega^2}{(\omega_0^2 - \omega^2)^2 + \omega^2/\tau^2} \qquad (69)

and

n_I = \frac{1}{2}\,\omega_p^2\,\frac{\omega/\tau}{(\omega_0^2 - \omega^2)^2 + \omega^2/\tau^2}. \qquad (70)

We have introduced the parameter ωp, where

\omega_p^2 = \frac{Nq^2}{m\varepsilon_0}. \qquad (71)

ωp is known as the plasma frequency of the medium. As a specific example, consider variations in the refractive index of crown glass in the visible region of the spectrum (ν = ω/2π ≈ 5 × 10^14 Hz). If we make the simplifying assumptions that the frequencies of interest are much lower than the natural vibrational frequency of molecules in the glass (i.e., ω ≪ ω0, and the light is far from resonance) and that damping effects are small (i.e., τ is very large), then

n \approx 1 + \frac{1}{2}\,\omega_p^2\,\frac{1}{\omega_0^2 - \omega^2}, \qquad (72)

and the imaginary part n_I is essentially negligible. In other words, ñ ≈ n, and the refractive index is purely real, as one might expect. Figure 16 displays data for the index of crown glass as a function of the wavelength in vacuum. The accompanying curve is the best fit to the data using the very simple model of Eq. (72). The resonant frequency can be extracted from the fit; the result is ν0 = ω0/2π = 2.95 × 10^15 Hz, that is, the resonant frequency is in the ultraviolet. The characteristic shape of the curve shown here is typical of the dispersion curve for many transparent substances at frequencies far below resonance: At short wavelengths, the index of a given material is higher than at long wavelengths. This type of behavior, where dn/dω > 0 (or dn/dλ < 0), is referred to as normal dispersion. Now, consider the behavior of the refractive index in the vicinity of resonance, when ω ≈ ω0. Then ω0² − ω² = (ω0 + ω)(ω0 − ω) ≈ 2ω0(ω0 − ω), and Eqs. (69) and (70) for the real and imaginary parts of the index reduce to

n = 1 + \frac{\omega_p^2}{4\omega_0}\,\frac{\omega_0 - \omega}{(\omega_0 - \omega)^2 + \dfrac{1}{(2\tau)^2}} \qquad (73)

and

n_I = \frac{\omega_p^2/8\tau\omega_0}{(\omega_0 - \omega)^2 + \dfrac{1}{(2\tau)^2}}. \qquad (74)
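The far-from-resonance model of Eq. (72) is easy to evaluate numerically. The Python sketch below is an added illustration: the resonant frequency is the crown-glass value quoted in the text, but the plasma-frequency parameter is an assumption chosen only so that the model reproduces n ≈ 1.52 at 589 nm; it then shows the expected normal dispersion across the visible.

```python
import math

# Far-from-resonance dispersion, Eq. (72): n = 1 + (omega_p^2 / 2) / (omega_0^2 - omega^2).
C = 2.998e8
NU_0 = 2.95e15                       # crown-glass resonant frequency from the text, Hz
OMEGA_0 = 2 * math.pi * NU_0

def n_of_lambda(lam_nm, omega_p):
    omega = 2 * math.pi * C / (lam_nm * 1e-9)
    return 1.0 + 0.5 * omega_p**2 / (OMEGA_0**2 - omega**2)

# Calibrate omega_p so that n(589 nm) = 1.52 (illustrative assumption).
omega_589 = 2 * math.pi * C / 589e-9
omega_p = math.sqrt(2 * 0.52 * (OMEGA_0**2 - omega_589**2))

for lam in (400, 500, 589, 700):     # shorter wavelengths give a larger index
    print(f"{lam} nm: n = {n_of_lambda(lam, omega_p):.4f}")
```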
Figure 16. Refractive index of crown glass as a function of wavelength for visible light (measured data and the simple oscillator theory curve, indices of roughly 1.53–1.56 over 400–700 nm). Fitting the data to simple theory produces a resonant frequency of 2.95 × 10^15 Hz, which is in the ultraviolet. (Data are from Ref. (2), Table 3.3, p. 71.)

Figure 17. Frequency dependence of the real part (n) and imaginary part (n_I) of the refractive index in the vicinity of resonance ω0 for a dielectric medium whose damping is 1/τ.

A general sketch of n and n_I near resonance is shown in Fig. 17. Consider the dispersion curve for n, the real part of the index. At ω = ω0, the value of n is unity, and just below and above resonance, there are a maximum and a minimum, respectively. For frequencies between the maximum and minimum, the slope of the curve is negative (dn/dω < 0), and one says that the medium exhibits anomalous dispersion. On either side of this region, the slope is positive (dn/dω > 0) and, as before, normal dispersion occurs. A puzzling feature of Fig. 17 is that for frequencies above the resonant frequency, the real index n is less than unity. This means that c/n, the speed of the wave, is greater than c, the speed of light in vacuum. However, according to Einstein's Theory of Special Relativity, it is impossible for a signal to propagate at a speed greater than c. To resolve this enigma, one must distinguish between the phase velocity and the group velocity of a wave. As mentioned earlier, v = c/n, the phase velocity, corresponds to the speed of a perfectly harmonic, monochromatic wave, as in Eq. (4). Such a wave can, in fact, travel faster than c. But to carry any information, some sort of modulation, or variation in amplitude, must be impressed on the idealized wave. Even the process of simply switching the source of the wave on or off introduces some degree of modulation because the wave will no longer be infinite in extent and duration. Information, or the signal carried by the wave, is encoded in the modulation envelope. In general, however, the speed of the envelope may be different from the phase velocity c/n. The rate at which the envelope propagates, known as the group velocity of the wave, is given by

v_g = \frac{d\omega}{dk}. \qquad (75)

Compare this to the expression for the phase velocity:

v = \frac{\omega}{k} = \frac{\omega}{n k_0} = \frac{\omega/k_0}{n} = \frac{c}{n}. \qquad (76)

It is not too difficult to show that these relationships lead to yet another expression for group velocity:

v_g = \frac{c}{n + \omega\,\dfrac{dn}{d\omega}}. \qquad (77)
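A numerical sense of Eq. (77) can be had by reusing the single-resonance model above. In the sketch below, which is an added illustration with an assumed plasma-frequency parameter, the positive slope dn/dω in a region of normal dispersion makes the group velocity come out below the phase velocity, as the discussion that follows explains.

```python
import math

# Group velocity from Eq. (77) using the single-resonance model of Eq. (72).
C = 2.998e8
OMEGA_0 = 2 * math.pi * 2.95e15      # crown-glass resonance from the text, rad/s
OMEGA_P2 = 3.5e32                    # assumed omega_p^2 giving n near 1.52 in the visible

def n_and_slope(omega):
    d = OMEGA_0**2 - omega**2
    n = 1.0 + 0.5 * OMEGA_P2 / d
    dn_domega = OMEGA_P2 * omega / d**2     # analytic derivative of Eq. (72)
    return n, dn_domega

omega = 2 * math.pi * C / 589e-9     # yellow light
n, dn = n_and_slope(omega)
v_phase = C / n
v_group = C / (n + omega * dn)       # Eq. (77); smaller than v_phase here
print(f"n = {n:.4f}, v_phase = {v_phase:.3e} m/s, v_group = {v_group:.3e} m/s")
```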
Even though the phase velocity c/n is greater than c in the region of normal dispersion above the resonant frequency, the slope dn/dω is positive, and Eq. (77) guarantees that the group velocity is always less. Hence, the signal (i.e., the envelope of the wave) propagates at a speed less than c. Notice, however, that in the region of anomalous dispersion, where dn/dω is negative, Eq. (77) indicates that vg > v and apparently the group velocity may exceed c. However, the idea of group velocity loses its significance in this region, and again the transfer of information never exceeds c (5).

Absorption Near Resonance
We have yet to discuss the meaning of n_I, the imaginary part of the index of refraction, and its behavior as a function of frequency, as displayed in Fig. 17. To understand the significance of n_I, consider a simple EM plane wave that propagates in the +z direction through a dielectric medium. In complex form, the wave is written as

\tilde{E}(z, t) = E_0\,e^{i(kz - \omega t)}. \qquad (78)

k represents the wave number in the medium, or the product of the index of refraction and k0, the wave number in vacuum [see Eq. (53)]. We know that, in general, the medium has a complex index ñ = n + i n_I. So, we replace k by ñk0, and the field reduces to

\tilde{E}(z, t) = E_0\,e^{-n_I \omega z/c}\,e^{i(n\omega z/c - \omega t)}. \qquad (79)
The actual physical field is obtained by taking the real part of this expression, so

E(z, t) = E_0\,e^{-z/\delta}\,\cos\!\left(\frac{n\omega}{c}\,z - \omega t\right). \qquad (80)

The parameter

\delta = \frac{c}{n_I\,\omega} \qquad (81)

is called the skin depth of the medium at frequency ω. Recall that I, the intensity of the wave, is proportional to the magnitude of the field squared (averaged over time). So,

I = I_0\,e^{-(2/\delta)z}, \qquad (82)

where I0 is the intensity at z = 0. For a purely real refractive index, n_I = 0, which means that the skin depth is infinite and the wave propagates without any attenuation or energy loss. However, if the index is complex, the skin depth is finite, and the wave is attenuated. The energy carried by the wave is absorbed exponentially along the direction of propagation. The quantity 2/δ is called the absorption or attenuation coefficient of the medium. For a substance to be transparent to radiation of frequency ω, the skin depth at that frequency must be much larger than the thickness of the material. Now, return to Fig. 17 and the sketch of n_I as a function of frequency. It shows that the imaginary part of the index is very pronounced in the vicinity of resonance, which signifies that absorption in this region is strong. The shape of the n_I versus ω curve is Lorentzian, and the two frequencies that mark where the curve hits half the maximum height correspond to the edges of the anomalous dispersion region, also called the absorption band. The width of this region, or the curve's full width at half-maximum (FWHM), is simply 1/τ.

Corrections to Dispersion Model
Identified here are two important corrections to the simple molecular dispersion model developed thus far (27):

1. Until now, the model has assumed a low molecular density in the dielectric. However, in general, this assumption holds only for dilute gases. It turns out that the effects of increased density can be properly accounted for by simply replacing the quantity κ − 1 with 3(κ − 1)/(κ + 2) in Eq. (66). The revised equation is known as the Clausius–Mossotti relationship:

\frac{3\varepsilon_0}{N}\,\frac{\kappa - 1}{\kappa + 2} = \alpha. \qquad (83)

The correction to the original equation comes from the fact that, in addition to the externally applied field, each molecule also experiences another field produced by the polarization of the surrounding molecules. The Clausius–Mossotti relationship is particularly useful because it relates macroscopic quantities [left-hand side of Eq. (83)] to α, which is a molecular quantity. Furthermore, because molecular polarizability should depend only on the
type of molecule (and frequency), the left-hand side of Eq. (83) remains constant, independent of density. Hence, if one knows κ for a molecular species at one density, the value can be computed for another density, and likewise for \sqrt{\kappa}, or the refractive index. It can be shown that the only effect of increased density on the dispersive behavior of a dielectric medium is simply to shift the center of the absorption band downward from frequency ω0 to a value of \sqrt{\omega_0^2 - \omega_p^2/3}.

2. The model presented assumes that a molecule has only a single resonance ω0. In general, there are resonances at a number of frequencies: ω01, ω02, ω03, . . ., etc. As before, resonances in the UV portion of the spectrum are ordinarily associated with natural oscillations of electrons bound in the molecule. Resonances in other regions of the spectrum, however, usually originate from other oscillatory modes in the system. For example, resonances in the IR are usually caused by interatomic vibrations in the molecule. Each resonance ω0i has its own characteristic mass mi and damping time τi. The result is that now the molecular polarizability α contains a contribution from each resonant frequency of the molecule, so that

\alpha(\omega) = \sum_i \alpha_i(\omega), \qquad (84)

where

\alpha_i(\omega) = \frac{f_i\,q^2/m_i}{\omega_{0i}^2 - \omega^2 - i\,\omega/\tau_i}. \qquad (85)
Each resonance has a weighting factor, or oscillator strength, denoted by fi. As shown in Fig. 18, the overall effect of multiple resonances is that each gives rise to anomalous dispersion and its own absorption band.

Figure 18. Real part of the index of refraction as a function of frequency for a dielectric medium that has multiple resonances at ω01, ω02, ω03, . . ., etc. (The resonances shown fall in the IR, visible, UV, and X-ray regions.)

Wave Propagation in a Conducting Medium
Any medium that can conduct an electric current contains free charges. Unlike a dielectric, these charges are
unbound, and no restoring force acts on them when they are displaced by an electric field. In a metal, for example, outer (i.e., valence) electrons are released from the grip of individual atoms and, in the process, form a pool of so-called conduction electrons. These electrons are no longer localized near any particular atom. Instead, they are completely free to move about within the solid, much like the motion of particles in an ideal gas. The metal is said to contain a free electron gas. The basic behavior of EM waves in a conductor can be modeled by saying that the existence of an unbound charge is equivalent to setting ω0 → 0. Applying this condition to Eq. (63) for the molecular polarizability α and plugging the expression into Eq. (66) for κ gives

\kappa = 1 - \frac{\omega_p^2}{\omega^2 + i\,\omega/\tau}. \qquad (86)

But recall that κ = ñ² = (n + i n_I)² = (n² − n_I²) + 2i n n_I. Equating the real and imaginary parts of this expression to those from the right-hand side of Eq. (86) gives

n^2 - n_I^2 = 1 - \frac{\omega_p^2}{\omega^2 + \dfrac{1}{\tau^2}} \qquad (87)

and

2 n\,n_I = \frac{1}{\omega\tau}\,\frac{\omega_p^2}{\omega^2 + \dfrac{1}{\tau^2}}. \qquad (88)

The most common situation is one where the damping is small and τ⁻¹ ≪ ωp. This allows a straightforward determination of n and n_I at low, intermediate, and high frequencies, as discussed here. In the limit of low frequency, where ω ≪ τ⁻¹, the preceding equations reduce to

n^2 - n_I^2 \cong -\omega_p^2 \tau^2 \quad \text{and} \quad 2 n\,n_I \cong \frac{\omega_p^2 \tau}{\omega}. \qquad (89)

Then, solving for n and n_I gives

n \cong n_I \cong \frac{\omega_p}{\sqrt{2\omega/\tau}} \gg 1. \qquad (90)

This result is applicable, for example, to radio waves and microwaves in metals. Later on, it will be shown that at these wavelengths, most of the energy carried by the wave is generally reflected from the surface of a metal, but because of the large imaginary part of the index, the small part that enters the medium is strongly absorbed and heats the conductor. From Eq. (81), the skin depth of the medium is given by

\delta = \sqrt{\frac{2c^2}{\omega\,(\omega_p^2 \tau)}}. \qquad (91)

It turns out (6) that the quantity ωp²τ is identical to η/ε0, where η is the dc (i.e., low-frequency) electrical conductivity of the medium. This means that the conductivity is the only material property needed to determine the skin depth:

\delta = \sqrt{\frac{2c^2 \varepsilon_0}{\eta\,\omega}} = \sqrt{\frac{2}{\mu_0\,\eta\,\omega}}. \qquad (92)

The conductivities of most metals are on the order of 10^8 Ω⁻¹·m⁻¹. So, at a typical microwave frequency (say 10 GHz), the skin depth is on the order of a micron or less. Compare this to the skin depth of seawater at the same frequency (seawater conducts because the salt dissociates into ions, which are free charges). Seawater has a conductivity of about 4–5 Ω⁻¹·m⁻¹, which gives a microwave skin depth of a few millimeters. The penetration of the wave can be increased still further by reduction to radio-wave frequencies. At a frequency of 10 kHz, for example, the skin depth of a wave in seawater is a few meters, showing that it can penetrate significantly. At intermediate frequencies, where τ⁻¹ ≪ ω ≪ ωp,

n^2 - n_I^2 \cong -\frac{\omega_p^2}{\omega^2} \quad \text{and} \quad 2 n\,n_I \cong \frac{\omega_p^2}{\omega^3 \tau}. \qquad (93)

Then, the real and imaginary parts of the index become

n \cong \frac{\omega_p}{2\omega^2 \tau} \quad \text{and} \quad n_I \cong \frac{\omega_p}{\omega}. \qquad (94)

In this frequency regime, n_I is much larger than n, and the skin depth is simply

\delta = \frac{c}{\omega_p}. \qquad (95)

These results apply to metals when light is in the infrared. Finally, consider the index at high frequencies, when ω ≫ ωp. Then,

n^2 - n_I^2 \cong 1 \quad \text{and} \quad 2 n\,n_I \cong \frac{\omega_p^2}{\omega^3 \tau}, \qquad (96)

which leads to

n \cong 1 \quad \text{and} \quad n_I \cong \frac{\omega_p^2}{2\omega^3 \tau}. \qquad (97)

Here, n_I ≪ 1, so the skin depth is extremely large. The conductor is essentially transparent to the radiation. This explains, at least from a classical point of view, a number of important phenomena. For example, consider
copper, which has a free electron density of N = 8.48 × 10^28 electrons/m³ (this number is based on the fact that each copper atom contributes a single conduction electron to the solid). From Eq. (71), the plasma frequency of copper is νp = ωp/2π = 2.61 × 10^15 Hz, which is in the near UV. Because the plasma frequency is much lower than the frequency of gamma rays, X rays, and even radiation in the far UV, these radiations are easily transmitted through copper and other metallic conductors. Another example involves transmission of radiation through the ionosphere, which contains ionized gas. Here, N, the density of these ions, is roughly 17 orders of magnitude lower than the value for a typical metal. This gives a plasma frequency of only a few megahertz, which explains why the atmosphere is transparent in the microwave region. Figure 19 is a plot of n and n_I versus ω/ωp for the particularly simple case of τ⁻¹ → 0, or no damping. Using this idealization, the plasma frequency plays the role of a critical frequency. At frequencies less than this value, the index is purely imaginary; this means that the wave is completely reflected at the boundary of the medium. At frequencies higher than ωp, the index is purely real, and the conductor is transparent. Of course, any real conducting medium is characterized by a finite value of τ, and the actual curves will not be quite as simple as those shown here. However, except at very low frequencies (ω ≪ τ⁻¹), where the effects of damping are extremely important, there is a remarkable resemblance between the curves of Fig. 19 and those for real conductors (27).
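The numbers quoted above are easy to verify. The sketch below, added as an illustration, computes the plasma frequency of copper from Eq. (71) and the low-frequency skin depths of Eq. (92); the conductivities used are standard handbook values (copper roughly 5.8 × 10^7 Ω⁻¹·m⁻¹, seawater a few Ω⁻¹·m⁻¹).

```python
import math

# Plasma frequency of copper, Eq. (71), and low-frequency skin depth, Eq. (92).
EPS0 = 8.854e-12
MU0 = 4e-7 * math.pi
E_CHARGE = 1.602e-19
M_E = 9.109e-31

N_CU = 8.48e28                         # conduction electrons per m^3 (one per atom)
omega_p = math.sqrt(N_CU * E_CHARGE**2 / (M_E * EPS0))
print(f"nu_p(copper) = {omega_p / (2 * math.pi):.2e} Hz")   # ~2.6e15 Hz, near UV

def skin_depth(conductivity, freq_hz):
    """delta = sqrt(2 / (mu0 * eta * omega)), Eq. (92)."""
    return math.sqrt(2.0 / (MU0 * conductivity * 2 * math.pi * freq_hz))

print(f"copper   @ 10 GHz: {skin_depth(5.8e7, 10e9) * 1e6:.2f} um")
print(f"seawater @ 10 GHz: {skin_depth(4.5, 10e9) * 1e3:.1f} mm")
print(f"seawater @ 10 kHz: {skin_depth(4.5, 10e3):.1f} m")
```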
Figure 19. Frequency dependence of the real part (n) and imaginary part (κ) of the refractive index for a conductor that has no damping (i.e., 1/τ → 0). The plasma frequency ωp plays the role of a critical frequency.

Polarization

Until now, it has been assumed that the electric-field vector of an EM wave is linearly polarized along a particular coordinate direction. In reality, a state of linear polarization corresponds to only one of a number of possible types of polarization, as discussed in this section.

Types of Polarization

Consider a monochromatic electromagnetic plane wave (frequency ω, wave number k) that propagates in the +z direction. The most general form for the electric-field vector is a superposition (i.e., sum) of two mutually orthogonal, linearly polarized waves, one polarized along the x direction and the other polarized along the y direction:

E(z, t) = îEx(z, t) + ĵEy(z, t).   (98)

Each component is just a harmonic wave that has its own amplitude (E0x, E0y) and its own initial phase:

Ex(z, t) = E0x cos(kz − ωt)
Ey(z, t) = E0y cos(kz − ωt + ε).   (99)

Without any loss of generality, the initial phase of the x component of the wave has been set to zero, and ε represents the phase difference between the y component and the x component. Depending on the value of ε and the relative values of the two amplitudes, E0x and E0y, one obtains different types of polarization for the propagating wave. The different cases are presented here:

Linear Polarization. Consider the case when ε = 0 or π. Then, the field is simply

E(z, t) = (îE0x ± ĵE0y) cos(kz − ωt).   (100)

(The upper sign is for ε = 0, and the lower sign is for ε = π.) This is the situation where the x and y components oscillate precisely in synch with one another; the result is a linearly polarized wave whose amplitude is (E0x² + E0y²)^1/2. Together, the electric-field vector and the direction of propagation define the plane of vibration of the wave. Imagine that the wave is traveling directly toward you, and consider the time variation in the electric-field vector at some fixed position (i.e., z = const). As depicted in Fig. 20, the tip of the vector will oscillate back and forth along a line tilted relative to the +x axis by angle θ, where tan θ = E0y/E0x. When ε = 0, the field vibrates in quadrants I and III in the x, y coordinate system, and when ε = π, it vibrates in quadrants II and IV.

Circular Polarization. Let E0x = E0y = E0 and ε = ±π/2. Then,

E(z, t) = E0[î cos(kz − ωt) ∓ ĵ sin(kz − ωt)].   (101)

In this case, the net field amplitude remains fixed at E0, but its direction rotates about the origin at angular velocity ω. As the wave approaches and passes a fixed position, the tip of the field vector traces out a circle of radius E0, and the wave is said to be circularly polarized. Figure 21 shows the situation for ε = +π/2, where the rotation is
counterclockwise — this corresponds to so-called left-hand circular polarization. When ε = −π/2, rotation becomes clockwise, and the wave has right-hand polarization. It is interesting to note that by combining left- and right-circularly-polarized light (i.e., applying the two different signs in Eq. (101) and adding the fields together), one obtains a wave that is linearly polarized.

Figure 20. Linearly polarized electromagnetic wave. For a plane wave [amplitude E0 = (E0x² + E0y²)^1/2] that travels in the +z direction (i.e., toward the reader), the tip of the electric field vector oscillates along a line tilted at angle θ = tan⁻¹(E0y/E0x) relative to the x axis.

Figure 21. Circularly polarized wave (propagating toward the reader). The magnitude of the electric-field vector remains fixed and rotates at constant angular velocity about the z axis. The sense of rotation shown here corresponds to left-hand circular polarization.

Elliptical Polarization. Finally, consider the general case of arbitrary values for the amplitude components and relative phase. Following some involved algebra, the quantity kz − ωt can be eliminated from Eqs. (99), producing the following equation that relates Ex and Ey:

(Ex/E0x)² + (Ey/E0y)² − 2(Ex/E0x)(Ey/E0y) cos ε = sin²ε.   (102)

This is the equation of an ellipse traced out by the tip of the electric-field vector, as shown in Fig. 22. The resulting wave is elliptically polarized, and Eq. (102) defines a polarization ellipse. As before, the field vector can rotate counterclockwise or clockwise, giving left-elliptically-polarized or right-elliptically-polarized radiation. The ellipse is inscribed in a rectangle whose sides, 2E0x and 2E0y, are centered at the origin. The tilt of one axis of the ellipse is given by θ, where

tan 2θ = 2E0xE0y cos ε/(E0x² − E0y²).   (103)

Linear polarization and circular polarization are both special cases of elliptical polarization.

Figure 22. Elliptically polarized wave (left-handed). The polarization ellipse is inscribed in a rectangle whose sides are 2E0x and 2E0y.
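The three cases can be summarized numerically. The following sketch (an added illustration, not from the original article) takes the amplitudes E0x, E0y and the relative phase ε of Eq. (99) and reports whether the resulting wave is linear, circular, or elliptical, along with the tilt angle of the polarization ellipse from Eq. (103).

    import math

    def classify_polarization(e0x, e0y, eps, tol=1e-9):
        """Classify the polarization state of Eq. (99); return (state, tilt in degrees)."""
        # Tilt of the polarization ellipse, Eq. (103):
        # tan(2*theta) = 2*E0x*E0y*cos(eps) / (E0x**2 - E0y**2)
        theta = 0.5 * math.atan2(2.0 * e0x * e0y * math.cos(eps), e0x**2 - e0y**2)
        if abs(math.sin(eps)) < tol:                      # eps = 0 or pi: a straight line
            return "linear", math.degrees(theta)
        if abs(e0x - e0y) < tol and abs(abs(eps) - math.pi / 2) < tol:
            return "circular", math.degrees(theta)        # tilt is undefined for a circle
        return "elliptical", math.degrees(theta)

    print(classify_polarization(1.0, 1.0, 0.0))           # linear, tilted at 45 degrees
    print(classify_polarization(1.0, 1.0, math.pi / 2))   # circular
    print(classify_polarization(2.0, 1.0, math.pi / 3))   # elliptical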
Unpolarized and Partially Polarized Radiation As can be seen from the preceding discussion, an electromagnetic wave is polarized (linear, circular, or elliptical), as long as some definite, fixed phase difference is maintained between the oscillations of the two orthogonal field components. However, most sources of light and other EM waves produce radiation that is more complex due to the introduction of rapidly varying, random phase shifts in the x and y field components. In an incandescent or discharge lamp, for example, each excited atom emits a brief polarized wave train of light whose duration is on the order of about 10−8 s. Once such a wave train is emitted, the random nature of atomic emission makes it impossible to predict exactly when the next wave train will appear and what its polarization state will be. The result is that the phase shift fluctuates randomly and rapidly in time, and no well-defined polarization exists (at least, not for more than about 10−8 s). Radiation of this type is said to be unpolarized or randomly polarized. Another way to think of unpolarized radiation is as an incoherent superposition of two orthogonal, linearly polarized waves. By incoherent, we mean that there is no well-defined phase relationship between the two waves during any appreciable time. As unpolarized light advances, an observer might imagine seeing an extremely rapid sequence of polarization ellipses that have random orientations and eccentricities. Over time, however, no clear polarization ellipse emerges. The most general state of EM radiation is one of partial polarization. As in the unpolarized case, the phase difference between the orthogonal oscillations is not fixed. On the other hand, the phase difference is not completely random either. In partially polarized light, the phases of the two components are correlated to some degree, and the path of the electric-field vector tends to fluctuate about some preferential polarization ellipse. One way to think of the light is as a superposition of both polarized and unpolarized radiations. One sometimes refers to the degree of polarization of a radiation, which
is a number between 0 and 1. Light that is completely polarized, whether linear, circular, or elliptical, has a degree of polarization of unity, whereas light that is completely unpolarized corresponds to a value of zero. Partially polarized light is characterized by a fractional value between the two extremes. Polarization-Modifying Materials Here, we briefly mention certain classes of materials that can alter the state of polarization of an incident radiation. These materials are the basis for a number of optical elements common in many imaging systems. To begin with, certain materials can selectively absorb one of the two orthogonal polarization components and transmit the other. Materials of this type are said to exhibit dichroism. They include certain natural crystals, as well as commercially produced Polaroid sheet. In the field of optics, dichroic materials function as linear polarizers — they can be used to produce a beam of light that is linearly polarized, so that its plane of vibration is determined by the orientation of the element. These materials can also be used as polarization analyzers, which means that they block or transmit light that has certain polarization characteristics. A large number of crystalline solids exhibit a phenomenon known as birefringence. These are anisotropic materials that have two different indexes of refraction for a given direction of wave propagation; each index corresponds to one of two orthogonal polarization directions. When a general polarized beam travels through a birefringent crystal, each polarization component advances at a different phase velocity. This introduces an additional phase shift between the two components, causing a change in the polarization ellipse. The amount of shift depends on the precise values of the two indexes, the wavelength of the light, and the thickness traversed in the medium. An optical element that produces a phase shift of π/2 is called a quarter-wave plate; it can transform linearly polarized light into circularly polarized light, and vice versa. A halfwave plate introduces a shift of π . One of its main uses is to reverse the handedness of right- or left-circular (or elliptical) light. Because of their effect on relative phase, wave plates, such as those mentioned here, are also referred to as phase retarders. Birefringence can also be induced in a material by subjecting it to an external electric, magnetic, or mechanical (i.e., stress) field. For example, when a constant electric field or voltage is applied, the medium experiences a so-called electro-optic effect. When an optically isotropic medium is placed in the E field, a refractive index difference n develops in the material for light polarized parallel and perpendicular to the applied field. The degree of birefringence (i.e., the value of n) that appears is proportional to the square of the field. This is known as the Kerr effect. A second important electrooptic phenomenon is the Pockels effect, which occurs only for certain crystal structures. In this case, the effect of the field is linear, that is, the induced birefringence is proportional to E, rather than E2 . Kerr cells and Pockels cells are electro-optical devices based on these effects. They
are used as optical switches and modulators, as well as high-speed optical shutters. Finally, certain materials exhibit a property known as optical activity. These are substances that rotate the plane of vibration of linearly polarized light, either clockwise or counterclockwise. Certain liquids, as well as solids, are optically active. Two simple examples are sugar–water and quartz.

Reflection and Refraction of Waves at an Interface

Consider the fate of an electromagnetic plane wave that propagates in a transparent medium (index n) toward a smooth interface formed with a second transparent medium (index n′). In general, upon encountering the interface, the incident wave gives rise to two other waves; one is transmitted beyond the interface into the second medium, and the other is reflected back into the first medium, as illustrated in Fig. 23. The propagation direction of each wave is represented by a ray, which for isotropic media, also represents the direction of energy flow. The direction of a given ray is the same as the direction of the corresponding wave vector and hence tracks a line normal to the advancing wave fronts. The wave vectors for the incident, transmitted, and reflected waves are denoted by k, k′, and k″, respectively. The angles between the line normal to the interface and the incident, transmitted, and reflected rays are given by θ, θ′, and θ″, respectively.

Boundary Conditions for the Fields

By applying Maxwell's equations to a region of space that contains material media, one can derive four conditions that must be satisfied by any electric and magnetic fields, E and B, at an interface. These are referred to as boundary conditions for the fields. Assuming for the present that the media are nonconducting, the boundary conditions can be stated as follows:

1. The normal component of B must be continuous across an interface.
2. The normal component of εE must be continuous across an interface.
Figure 23. Incident, transmitted, and reflected rays at an interface. All three rays (as well as the normal to the interface) lie in the same plane, known as the plane of incidence.
3. The tangential component of E must be continuous across an interface.
4. The tangential component of µ⁻¹B must be continuous across an interface.

For an EM wave striking an interface (see Fig. 23), the importance of these boundary conditions is that the incident and reflected waves that reside on the first side of the interface produce a net E field and B field that can be compared to the fields on the transmitted side. By demanding that the field components satisfy the criteria listed, one finds that certain restrictions are imposed on the directions of wave propagation, as well as on the relative amplitudes and phases of the incident, transmitted, and reflected fields.

Geometry of Reflection and Refraction

The four boundary conditions on the fields lead to some simple relationships that involve the directions of the three rays in Fig. 23 (these results hold independently of polarization). The first important observation is that all three rays lie in the same plane, called the plane of incidence. Next, we have the Law of Reflection, which simply states that the angle of incidence θ matches the angle of reflection θ″:

θ″ = θ.   (104)

The Law of Reflection is the basis for tracing rays through a sequence of reflective surfaces. When the surface is curved, one constructs the plane tangent to the surface, and measures the angles of incidence and reflection relative to the line normal to that plane. The equality between θ and θ″ holds for very smooth, or specular, surfaces. If the surface has any roughness, its rapidly varying curvature will still give rise to specular reflection on a local scale. However, on a larger scale, it will appear that the surface reflects radiation in all directions. This is called diffuse reflection. In general, a surface may produce both a specular and diffuse component of reflected intensity. The final geometric condition involves the incident and transmitted rays and is called Snell's Law. It is simply

n sin θ = n′ sin θ′.
(105)
According to this relationship, whenever a wave travels from a medium of low index n (sometimes referred to as low optical density) into one whose index n′ is higher (or high optical density), the transmitted ray is bent toward the surface normal, that is, θ′ < θ (unless, of course, θ = 0, in which case θ and θ′ both vanish, and the transmitted wave travels straight through and normal to the interface). This is shown in Fig. 23. When a wave travels from a high-index medium to one whose index is lower, the transmitted ray bends away from the normal, and θ′ > θ. This phenomenon, where the wave changes direction after encountering an interface, is called refraction, and Snell's law is sometimes also known as the Law of Refraction. The transmitted ray is often referred to as the refracted ray, and θ′ is also known as the angle of refraction. Among other things, refraction is responsible for the focusing action of simple lenses.
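As a brief numerical illustration of Snell's law (added here, not part of the original text), the snippet below computes the refraction angle for light passing between air and a glass of assumed index 1.50.

    import math

    def refraction_angle_deg(n1, n2, theta_incident_deg):
        """Transmitted angle from Snell's law, n1*sin(theta) = n2*sin(theta'); None if no refracted ray."""
        s = n1 * math.sin(math.radians(theta_incident_deg)) / n2
        if abs(s) > 1.0:
            return None                      # no refracted ray (total internal reflection)
        return math.degrees(math.asin(s))

    print(refraction_angle_deg(1.00, 1.50, 30.0))   # air -> glass: bent toward the normal (~19.5 deg)
    print(refraction_angle_deg(1.50, 1.00, 30.0))   # glass -> air: bent away from the normal (~48.6 deg)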
It is worth mentioning that the Laws of Reflection and Refraction can be derived from considerations other than Maxwell’s equations and the boundary conditions for the fields. One method involves the geometric construction of wave fronts starting with some initial wave front and knowing how the speed of the wave changes when going from one medium to the next. This is the method used in many introductory level optics texts. A far more interesting method is based on an idea called Fermat’s principle of least time. In its simplest form, it states that the path followed by a ray of light (or, for that matter, any other electromagnetic radiation) between any two specified points in space is the one that takes the shortest amount of time. For example, consider the path followed by the incident and refracted rays in Fig. 23, and compare it to a straight-line path that connects the same starting and ending points. Clearly, the straight path covers a shorter distance; however, light would take a longer time to cover that distance because the wave travels at a higher speed in the first medium (lower index) than in the second. Hence, the light’s travel time is minimized by having a majority of its path reside in the low-index medium, causing the wave to bend, or refract, at the interface. By applying some simple ideas of calculus to the problem of minimizing the wave’s travel time, one finds that Fermat’s principle reproduces Snell’s law. Effect of Dispersion on Refraction Recall that, because of dispersion, the refractive index of a material is, in general, a function of wavelength. This has an important effect on refraction, namely, that different wavelengths are refracted by different amounts. For example, consider a beam of white light obliquely incident on an air–glass interface, and recall that white light is a mixture of the various frequency components in the visible spectrum. Because the index of glass increases slightly at the shorter wavelengths (normal dispersion), white light will be spread out into a continuum of colors; violet and blue (i.e., short wavelengths) are refracted the most, and red (or long wavelengths) is refracted the least. This phenomenon explains why a prism can be used to separate white light into its component colors. It also underlies the mechanism for rainbow formation when light enters a region of air that contains a mist of small water droplets. Chromatic aberration is an often undesirable effect that occurs in optical instrumentation. The value of the focal length of any lens in the system is governed by refraction at the two surfaces of the lens. Because the index of the lens material is a function of wavelength, so too is the focal length of the lens, and different colors will focus to different points — this has the effect of smearing images that are formed. A parameter that quantifies the degree to which a given lens material produces chromatic aberrations is called the dispersive power of the medium. Roughly speaking, dispersive power is the ratio of the amount of dispersion produced across the range of the visible spectrum compared to the amount the lens refracts light in the middle of the spectrum (when surrounded by air). The inverse of the dispersive power, known as the Abbe number, or V number, of the material, is given by
V = (nyellow − 1)/(nblue − nred).   (106)
nblue, nyellow, and nred are the indexes of the material at wavelengths of 486.1327 nm, 587.5618 nm, and 656.2816 nm, respectively. These values are chosen because they correspond to precisely known hydrogen and helium absorption lines (or Fraunhofer lines) that appear in solar absorption spectra. Larger Abbe numbers signify that less chromatic aberration is produced by that lens material. One way to correct for the problem of chromatic aberration is to use a so-called achromatic doublet. This is nothing more than a combination of particular converging (positive focal length) and diverging (negative focal length) lens elements that have well-chosen Abbe numbers, V1 and V2, and focal lengths, f1 and f2 (for the yellow wavelength listed before). For the most part, chromatic aberrations are eliminated by choosing the parameters to satisfy (2)

f1V1 + f2V2 = 0.   (107)
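For concreteness, the Abbe number of Eq. (106) and the doublet condition of Eq. (107) can be evaluated numerically. The sketch below is an added illustration; the indexes given for the two glasses are stand-in, crown-like and flint-like values rather than data taken from this article.

    def abbe_number(n_blue, n_yellow, n_red):
        """Eq. (106): V = (n_yellow - 1) / (n_blue - n_red)."""
        return (n_yellow - 1.0) / (n_blue - n_red)

    def achromat_focal_lengths(f_total, v1, v2):
        """Solve 1/f1 + 1/f2 = 1/f_total together with Eq. (107), f1*V1 + f2*V2 = 0."""
        f1 = f_total * (v1 - v2) / v1
        f2 = -f_total * (v1 - v2) / v2
        return f1, f2

    # Illustrative (assumed) indexes at the blue, yellow, and red lines:
    v_crown = abbe_number(1.522, 1.517, 1.514)
    v_flint = abbe_number(1.632, 1.620, 1.613)
    f1, f2 = achromat_focal_lengths(100.0, v_crown, v_flint)   # a 100 mm doublet
    print(v_crown, v_flint, f1, f2)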
The Fresnel Equations

Now, we turn to the problem of relating the field amplitudes of the transmitted and reflected waves to that of the incident wave. This, again, is determined by applying the boundary conditions at the interface. In this case, the results depend on the polarization of the incoming wave. Figure 24 shows the situation for (a) the electric-field vector normal to the plane of incidence and (b) the electric-field vector parallel to the plane of incidence. The first case is sometimes referred to as the transverse electric, or TE, case, and the second case is referred to as the transverse magnetic, or TM, case. In keeping with our previous convention for primed and unprimed symbols, the incident, transmitted, and reflected fields are denoted by the pairs (E, B), (E′, B′), and (E″, B″), respectively (a field vector represented by ⊙ indicates that it is pointing out of the page). Note that for either polarization, the set (E, B, k) defines a right-handed triad, and similarly for the primed vectors. Because some of the boundary conditions depend on the permittivity and permeability of the media, we will use the symbols ε and µ for the incident medium and the symbols ε′ and µ′ for the transmitted medium.

Figure 24. Electric and magnetic-field vectors for the incident, transmitted, and reflected waves at an interface. (a) TE polarization: Electric field vector vibrates normal to the plane of incidence. (b) TM polarization: Magnetic field vector vibrates normal to the plane of incidence.

First consider TE polarization. The tangential component of the electric-field vector has to be continuous across the interface, so

E + E″ = E′.   (108)

In addition, the tangential component of µ⁻¹B must be continuous, so

−(1/µ)B cos θ + (1/µ)B″ cos θ″ = −(1/µ′)B′ cos θ′.   (109)

The magnetic fields can be eliminated in this equation because the magnetic and electric-field amplitudes in a medium of index n are related by B = nE/c [see Eq. (3) where c is replaced by c/n]. In addition, the Law of Reflection allows replacing θ″ by θ. Further simplification occurs if we also assume that the media are nonmagnetic, which means that µ ≅ µ′ ≅ µ0. Then, the reflected field can be eliminated from Eqs. (108) and (109), and one finds an expression for the quantity E′/E. This ratio of the transmitted to the incident field is called the amplitude transmission coefficient t. For TE polarization, it is

tTE ≡ (E′/E)TE = 2n cos θ/(n cos θ + n′ cos θ′) = 2 sin θ′ cos θ/sin(θ + θ′).   (110)

The second expression for t is easily obtained from the first by applying Snell's law. One can also define an amplitude reflection coefficient r, which is the ratio of the reflected to the incident field. By eliminating the transmitted field, one finds that

rTE ≡ (E″/E)TE = (n cos θ − n′ cos θ′)/(n cos θ + n′ cos θ′) = −sin(θ − θ′)/sin(θ + θ′).   (111)

Equations (110) and (111) are known as the Fresnel equations for TE polarization. The Fresnel equations for TM polarization can be derived similarly. Using the same two boundary conditions as before demands that

E cos θ − E″ cos θ″ = E′ cos θ′   (112)

and

(1/µ)(B + B″) = (1/µ′)B′.   (113)
Again assuming nonmagnetic media, the Fresnel equations turn out to be

tTM ≡ (E′/E)TM = 2n cos θ/(n cos θ′ + n′ cos θ)   (114a)
= 2 sin θ′ cos θ/[sin(θ + θ′) cos(θ − θ′)]   (114b)

and

rTM ≡ (E″/E)TM = (n′ cos θ − n cos θ′)/(n′ cos θ + n cos θ′) = tan(θ − θ′)/tan(θ + θ′).   (115)
The amplitude reflection coefficients for both TE and TM polarization are plotted as a function of incident angle in Figs. 25 and 26 for light that strikes an interface between air and a typical glass. There are two graphs because the results depend on whether the incoming light is on the air side (n′/n = 1.50) or the glass side (n/n′ = 1.50) of the interface. The first is referred to as external reflection, and the second as internal reflection. Negative values of r indicate a phase change of π upon reflection.

Reflectance and Transmittance

One is usually interested in the fraction of the wave's energy or power that is reflected and transmitted at an interface. The desired quantities are the reflectance and transmittance of the interface. The reflectance R is defined as the power carried by the reflected wave divided by the power of the incident wave; the transmittance T is the ratio of the transmitted power to the incident power.
Figure 25. Amplitude reflection coefficients for both TE and TM polarization as a function of incident angle for external reflection from a typical air (n = 1)/glass (n′ = 1.50) interface. θB denotes the Brewster angle for external reflection, which is the incident angle where the TM reflection coefficient vanishes.

Figure 26. Amplitude reflection coefficients for both TE and TM polarization as a function of incident angle for internal reflection from a glass/air interface (n/n′ = 1.50). θc is the critical angle for total internal reflection, and θB is the Brewster angle for internal reflection.

To see how the reflectance and transmittance are related to the Fresnel coefficients just derived, we use the fact that the power carried by any one of the three waves is given by the product of the intensity (I, I′, or I″) and the cross-sectional area presented by the beam. If A is the illuminated area of the interface, then the cross-sectional areas of the incident, transmitted, and reflected beams are A cos θ, A cos θ′, and A cos θ″, respectively. Hence, the reflectance is

R = I″(A cos θ″)/[I(A cos θ)] = I″/I.   (116)

Because I = ½vε|E|² and I″ = ½vε|E″|² [see Eq. (14), in which c is replaced by v, the speed in the incident medium, and ε0 is replaced by ε], the expression, for either polarization, simply reduces to

R = |E″/E|² = |r|².   (117)

To get the transmittance, start with

T = I′(A cos θ′)/[I(A cos θ)] = v′ε′|E′|² cos θ′/(vε|E|² cos θ).   (118)

Assuming nonmagnetic media, ε′/ε = (n′/n)². In addition, make the substitution v′/v = n/n′. Then, the transmittance simplifies to

T = (n′ cos θ′/n cos θ)|E′/E|² = (n′ cos θ′/n cos θ)|t|².   (119)
Assuming no absorption at the interface, Eqs. (117) and (119) combined with the Fresnel equations yield R + T = 1, which is a statement of energy conservation.
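As a consistency check on Eqs. (110)–(119), the short sketch below (added here as an illustration, not from the original) evaluates the TE and TM Fresnel coefficients for an assumed air–glass interface and verifies that R + T = 1 for each polarization.

    import math

    def fresnel_RT(n1, n2, theta_deg):
        """Return (R_TE, T_TE, R_TM, T_TM) for non-absorbing, nonmagnetic media."""
        th = math.radians(theta_deg)
        tht = math.asin(n1 * math.sin(th) / n2)                 # Snell's law (assumes no TIR)
        c1, c2 = math.cos(th), math.cos(tht)
        r_te = (n1 * c1 - n2 * c2) / (n1 * c1 + n2 * c2)        # Eq. (111)
        t_te = 2 * n1 * c1 / (n1 * c1 + n2 * c2)                # Eq. (110)
        r_tm = (n2 * c1 - n1 * c2) / (n2 * c1 + n1 * c2)        # Eq. (115)
        t_tm = 2 * n1 * c1 / (n1 * c2 + n2 * c1)                # Eq. (114a)
        geom = (n2 * c2) / (n1 * c1)                            # geometric factor in Eq. (119)
        return r_te**2, geom * t_te**2, r_tm**2, geom * t_tm**2

    R_te, T_te, R_tm, T_tm = fresnel_RT(1.00, 1.50, 45.0)
    print(R_te + T_te, R_tm + T_tm)    # both sums equal 1 to machine precision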
First consider the special case of a wave that strikes an interface at normal incidence, that is, at θ = 0°. Then θ′ = 0° as well, and the reflectance and transmittance formulas become

Rnorm = [(n′ − n)/(n′ + n)]²   (120)

and

Tnorm = 4nn′/(n′ + n)².   (121)
These expressions follow from either the TE or TM Fresnel equations because the plane of incidence cannot be defined when θ = θ′ = θ″ = 0°. For an air–glass interface, we find that Rnorm = 0.04, that is, 4% of the intensity is reflected. This becomes especially important in optical systems that contain, for example, many lenses, where each presents two such interfaces. Hence the need for applying antireflection coatings. Consider also the other extreme, where the incoming wave strikes the interface at a glancing incidence — that is, when θ approaches 90°. The result for both TE and TM polarization is that Rglance → 1, and an air–glass surface will behave in a mirrorlike fashion. Figure 27 shows the reflectance plotted for various incident angles at an air–glass interface, for both external and internal reflection and for both TE and TM polarization. The most important features of these curves are discussed in the two sections that follow.
Figure 27. Reflectance from a typical air/glass interface as a function of incident angle. The TE and TM polarization cases are graphed for external and internal reflection.
Polarization by Reflection For TM polarization, notice that there is an incident angle at which R = 0 for both external and internal reflection. This angle is referred to either as the polarization angle or the Brewster angle θB . It is the angle at which there is no reflection of a wave polarized parallel to the plane of incidence. If the polarization of an incoming wave has components both parallel and normal to the plane of incidence, the reflected wave at the Brewster angle will be completely polarized normal to the plane; it will be TE polarized. The amplitude reflection coefficient rTM must vanish at the Brewster angle, so, from Eq. (115), tan(θB + θ ) must become infinite. This occurs when θB + θ = π/2. This leads to tan θB =
sin θB/cos θB = sin θB/sin(π/2 − θB) = sin θB/sin θ′.
(122)
Now, use Snell's law to replace the expression on the right by n′/n. The result is the formula for the Brewster angle:

tan θB = n′/n.
(123)
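A quick numerical check of Eq. (123): the fragment below is an added illustration that evaluates the external and internal Brewster angles for an air and glass pair with an assumed glass index of 1.50.

    import math

    def brewster_angle_deg(n_incident, n_transmitted):
        """Eq. (123): tan(theta_B) = n'/n."""
        return math.degrees(math.atan2(n_transmitted, n_incident))

    print(brewster_angle_deg(1.00, 1.50))   # external, air -> glass: ~56.3 degrees
    print(brewster_angle_deg(1.50, 1.00))   # internal, glass -> air: ~33.7 degrees (the complement)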
Furthermore, once θB is determined for external reflection, the Brewster angle for internal reflection is easily obtained by interchanging n and n in Eq. (123). One finds that the two Brewster angles are complements of one another. For the air–glass interface, θB = 56.3° for external reflection and θB = 33.7° for internal reflection. Polarized sunglasses take advantage of what happens at the Brewster angle. When driving a car, much of the glare from the road surface is sunlight reflected near the Brewster angle, so the light is TE polarized parallel to the road surface. From Fig. 27, notice that the minimum of the reflectance curve about θB is quite broad. So even reflections on either side of the true minimum at angles substantially removed from θB are primarily TE polarized. Sunglasses can block out most road reflections by using linear polarizers oriented to block out waves polarized parallel to the road surface. Boaters and fishermen also use polarized glasses to block TE waves reflected from the surface of lakes. This improves one’s ability to view things below the water surface. In designing lasers, one places a so-called Brewster window (see Fig. 28) at each end of the cavity that contains the amplifying medium. A Brewster window is constructed simply by tilting the end of the enclosure (for example, glass) at the Brewster angle, making it completely transparent to laser light that is TM polarized. Conveniently, if the window is tilted at the air–glass Brewster angle, it is straightforward to show that refraction in the glass causes the light to strike the exiting surface at the glass–air Brewster angle (assuming that the surfaces are parallel), allowing complete transmission of the TM wave. Total Internal Reflection For internal reflection (n > n ), Fig. 27 shows that beyond some well-defined incident angle, the reflectance of the
Figure 28. Transmission of a TM polarized wave through a Brewster window. θB1 is the external (air–glass) Brewster angle, and θB2 is the internal (glass–air) Brewster angle.
interface has a value of unity. This holds true for either TE or TM polarization, and the angle is the same in both cases. The incident angle at and above which R = 1 is called the critical angle; it marks the onset of the phenomenon known as total internal reflection (TIR). Above the critical angle, the surface behaves as a perfect reflector, and no refracted ray is transmitted through the interface. It is clear from Snell's law why a critical angle must exist. When light is transmitted across an interface from a region of high index to one of low index, the refracted ray is bent away from the normal, that is, θ′ > θ. As the incident angle is increased, so is the refracted angle until, at some point, the refracted angle reaches the value θ′ = π/2. Once this occurs, any further increase in the incident angle θ causes Snell's law to break down, and no refracted ray is produced. The critical angle θc is determined by setting θ′ = π/2 in Snell's law, Eq. (105). The result is
sin θc = n′/n.
(124)
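Similarly, the critical angle of Eq. (124) is easily evaluated; the sketch below (an added illustration) reproduces the glass–air value used in the prism example that follows.

    import math

    def critical_angle_deg(n_dense, n_rare):
        """Eq. (124): sin(theta_c) = n'/n, defined only when n > n'."""
        if n_dense <= n_rare:
            raise ValueError("total internal reflection requires n > n'")
        return math.degrees(math.asin(n_rare / n_dense))

    theta_c = critical_angle_deg(1.50, 1.00)
    print(theta_c)            # ~41.8 degrees
    print(45.0 > theta_c)     # a 45-degree ray inside the prism is totally reflected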
As a simple application, consider light that enters the simple glass prism shown in Fig. 29. The critical angle for glass–air (n/n = 1.50) is θc = 41.8° . After being transmitted through the first interface, the light strikes the next interface at 45° . Because this angle is greater than θc , the light is totally reflected toward the next interface, again at an incident angle of 45° . The light is totally reflected once more, then exits the prism. Basically, the prism is useful as a light reflector, similar to a shiny metal surface, but it is not hampered by losses due to oxidation or corrosion. Total internal reflection is also the basic principle behind guiding light through fiber-optic cables. The extremely astute reader might realize that there is an apparent problem with this simple treatment of TIR — it violates the boundary conditions that must hold at the interface. Specifically, the incident and reflected waves give rise to electric/magnetic fields on the high-index side of the interface, but no fields exist on the low-index side. This is not permitted because certain tangential and normal field components must be continuous across the interface. The resolution of the problem lies in the fact that in addition to having a totally reflected wave, another
Figure 29. A light reflector that uses a simple 45 ° –90 ° –45 ° glass prism (in air). The light strikes each interface at 45 ° , which is larger than the glass–air critical angle of 41.8 ° . Hence, the light undergoes total internal reflection at both interfaces.
wave is produced that travels along the interface. It has the following basic form:
Esurface ∼ e^(−k′αz) cos[k′(1 + α²)^1/2 x − ωt].
(125)
k′ is the wave number in the low-index medium and

α = [(sin θ/sin θc)² − 1]^1/2.
(126)
The z axis has been chosen so it is normal to the interface, and the x axis is chosen along the interface (and in the plane of incidence). The interpretation of Eq. (125) is that there is a disturbance that propagates along the interface and has an amplitude that decreases exponentially with distance beyond the surface. Strange as it may seem, this surface wave, or evanescent wave, as it is commonly known, does not produce any energy flux across a single interface (a calculation of the Poynting vector shows that this is true). So, as originally stated, all of the energy carried by the incident wave is transferred to the reflected wave. The existence of the evanescent wave leads to an interesting phenomenon called frustrated total internal reflection (FTIR). When a second interface is brought into close proximity to the original one, some optical energy is taken from the reflected wave and leaks across the intervening gap. The fraction of incident energy transmitted to the other side can be varied by adjusting the thickness of the gap relative to the skin depth 1/k α of the evanescent wave. This is the basic principle behind the design of many optical beam splitters. FTIR is the optical analog to the phenomenon of quantum-mechanical tunnelling across a potential barrier (15). Yet another important property of total internal reflection is that the reflected wave is phase-shifted relative to the incident wave. The size of the shift depends on the angle of incidence and how far it is above the critical angle, as well as the wave’s state of polarization. This can be seen as follows. The amplitude reflection coefficients given by Eqs. (111) and (115) can, by using Snell’s law, be
rewritten as

rTE = (cos θ − iα sin θc)/(cos θ + iα sin θc)   (127)

and

rTM = [cos θ − i(α/sin θc)]/[cos θ + i(α/sin θc)].   (128)

In other words, for θ > θc, the coefficients are complex because α, as given by Eq. (126), is always real. For either polarization, the reflectance R = |r|² = r*r is unity for any angle greater than or equal to θc, as expected. The phase shifts φ, on the other hand, reduce to the following functions of θ:

tan(φTE/2) = (sin²θ − sin²θc)^1/2/cos θ   (129)

tan(φTM/2) = (sin²θ − sin²θc)^1/2/(sin²θc cos θ).   (130)

φTE and φTM are plotted as a function of incident angle in Fig. 30 for n/n′ = 1.50. By totally reflecting a wave that is partly polarized in the TE and TM directions from one or more interfaces, it is possible to adjust the relative phase shift, Δ = φTM − φTE, of the two orthogonal components and change the polarization state of the wave. This is the basic principle behind a number of achromatic optical polarizing elements. For example, the Fresnel rhomb (2) is an element that transforms linearly polarized light into circularly polarized light (i.e., introduces a relative π/2 phase shift between initially in-phase and equal TE and TM components). Unlike a simple quarter-wave plate, which
can introduce only a π/2 shift at certain wavelengths, the rhomb is relatively insensitive in this regard and can be used throughout the visible spectrum. The idea of total reflection can also be extended to the high-frequency or X-ray region of the spectrum. However, referring back to Fig. 17, observe that in this regime where the frequency is typically above some UV resonance, the index of refraction is less than unity. Now, if one considers an interface between, say, air and glass, X rays see the air as the higher index medium. Because the index of glass is typically only slightly less than unity, the critical angle for the total reflection of X rays in air from a glass surface is just slightly less than 90°. This means that total reflection will occur only for X rays that strike the interface at a glancing angle (i.e., almost parallel to the interface). This principle is used to steer and focus X rays in various instruments such as X-ray telescopes and spectrometers.

Reflection from a Conducting Surface

Consider a wave that strikes the surface of a metal or other conductor at normal incidence. As before, the reflectance of the surface is given by Eq. (120), but now n′ is replaced by the complex refractive index ñ = n + iκ of the conducting medium. Assuming that the wave starts out in air or vacuum, the reflectance is given by

R = |(ñ − 1)/(ñ + 1)|² = [(n − 1)² + κ²]/[(n + 1)² + κ²].
(131)
Figure 31 graphs the real and imaginary indexes n and κ, as a function of wavelength for both silver and copper, along with plots of the corresponding reflectance at normal incidence calculated from Eq. (131). Notice that silver has a very high, essentially constant reflectance across the visible and IR spectrum, and hence has excellent mirrorlike behavior at these wavelengths. Copper, on the other hand, reflects much more in the IR and reddish end of the visible spectrum than in the blue-violet; this gives copper its characteristic surface color.
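Equation (131) is simple to evaluate once the complex index is known; the sketch below is an added illustration, and the (n, κ) pair passed to it is a made-up placeholder rather than measured data for any particular metal or wavelength.

    def normal_incidence_reflectance(n, kappa):
        """Eq. (131): R = [(n - 1)^2 + kappa^2] / [(n + 1)^2 + kappa^2], wave arriving from vacuum."""
        return ((n - 1.0)**2 + kappa**2) / ((n + 1.0)**2 + kappa**2)

    # Placeholder values only; substitute tabulated n and kappa for a real metal and wavelength.
    print(normal_incidence_reflectance(0.1, 4.0))   # large kappa with small n gives R close to 1
    print(normal_incidence_reflectance(1.5, 0.0))   # kappa = 0 recovers the dielectric result, 0.04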
Figure 30. Solid curves: Phase shift φ of the reflected wave relative to the incident wave for total internal reflection from a typical glass–air interface (n/n′ = 1.50) for TE and TM polarization. Dashed curve: Δ = φTM − φTE, the difference between the solid curves.

Interference of Electromagnetic Waves
All waves in nature (light waves, sound waves, water waves, etc.) exhibit the phenomenon known as interference. The fact that waves interfere with one another is one of the most important characteristics that distinguishes wave behavior from the behavior of particles. When one particle in motion encounters another particle, a collision occurs that alters the trajectory of the particles. However, when two waves meet, each wave continues on as if the other wave was not present at all. As a result, the net disturbance caused by the two waves is a sum of the two individual disturbances, and interference effects occur. In a nutshell, interference between waves can be explained by a simple law known as the principle of superposition. Simply stated, it says that when two or more waves are present, the value of the resultant wave, at a given point in space and time, is the sum of the values of the individual waves at that point. For electromagnetic
waves, this principle can be applied to the instantaneous electric-field vector at a given point, that is, the resultant electric field E at position r at time t is obtained by summing the individual electric-field vectors:

E(r, t) = Σj Ej(r, t).   (132)

Notice that it is the fields that add, not the intensities or fluxes.

Figure 31. Reflection from a copper and a silver surface (in air) as a function of wavelength. R, the reflectance at normal incidence, is determined by both the real and imaginary refractive indexes, n and κ, of the metal. (Data are from Ref. (28).)

Two-Beam Interference

Consider the simplest case of two monochromatic plane waves that have the same frequency ω, and their interference at some fixed point (r = const) in space. As shown in Fig. 32, the two waves could represent two beams (assume in vacuum) that have wave vectors k1 and k2, emitted by two different sources, that interfere at some chosen observation point. The electric-field vectors are

E1(r, t) = E01 cos(k1·r − ωt)
E2(r, t) = E02 cos(k2·r − ωt).   (133)

Figure 32. Interference of two waves (wave vectors k1 and k2) at a point in space.

Because of the principle of superposition, the resultant field at the observation point is obtained by adding the instantaneous fields of the two waves, E(r, t) = E1(r, t) + E2(r, t). We are interested in determining the intensity produced by the interference at point r. This is given by the time-averaged value of the Poynting vector, which for a plane wave is

I = cε0⟨E²⟩ = cε0⟨E·E⟩.   (134)

From the superposition, this becomes

I = cε0⟨(E1 + E2)·(E1 + E2)⟩ = cε0⟨E1²⟩ + cε0⟨E2²⟩ + 2cε0⟨E1·E2⟩.   (135)

The last term that involves ⟨E1·E2⟩ comes about because of interference. Clearly, when the polarizations of E1 and E2 are mutually orthogonal, the dot product vanishes, and no interference occurs. So, in general, we see that interference occurs only when the polarizations have parallel components. Let us assume that E1 and E2 are identically polarized. Then, the intensity becomes

I = cε0⟨E1²⟩ + cε0⟨E2²⟩ + 2cε0⟨E1E2⟩   (136)

and the interference term can be reduced to

⟨E1E2⟩ = ⟨E01E02 cos(k1·r − ωt) cos(k2·r − ωt)⟩ = ½E01E02 cos[(k2 − k1)·r].   (137)
The last expression follows after using the standard trigonometric identity for the cosine of a difference and then averaging the resulting cross-terms over time. The argument (k2 − k1)·r represents the phase difference introduced between the two waves that arises from the difference in path lengths traveled by the beams. For the sake of brevity, we introduce the symbol δ to represent this phase difference. In general, δ also incorporates any initial phase difference associated with the oscillations of the source. After averaging the first two terms in Eq. (136) over time, we can rewrite the expression for the intensity as
I = ½cε0E01² + ½cε0E02² + 2[(½cε0E01²)(½cε0E02²)]^1/2 cos δ.   (138)

The terms ½cε0E01² and ½cε0E02² represent just the separate intensities, I1 and I2, of the individual contributing waves. Therefore, the final expression for the intensity takes the simplified form

I = I1 + I2 + 2(I1I2)^1/2 cos δ.   (139)

For the special case where both waves have the same intensity, one sets I1 = I2 = I0, and Eq. (139) becomes

I = 2I0(1 + cos δ) = 4I0 cos²(δ/2).
(140)
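Equations (139) and (140) are easy to explore numerically. The sketch below (an added illustration, not part of the original) evaluates the two-beam intensity at several phase differences and confirms the limits of 4I0 and 0 for constructive and destructive interference.

    import math

    def two_beam_intensity(i1, i2, delta):
        """Eq. (139): I = I1 + I2 + 2*sqrt(I1*I2)*cos(delta)."""
        return i1 + i2 + 2.0 * math.sqrt(i1 * i2) * math.cos(delta)

    I0 = 1.0
    for delta in (0.0, math.pi / 2, math.pi, 2.0 * math.pi):
        print(round(delta, 3), two_beam_intensity(I0, I0, delta))
    # delta = 0 or 2*pi gives 4*I0 (constructive); delta = pi gives 0 (destructive).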
A plot of the intensity as a function of phase difference is displayed in Fig. 33. The important features are as follows: When is any integral multiple of 2π , the waves are exactly in phase at the observation point, and the intensity is maximized — this is known as constructive interference. When is an odd-integral multiple of π , the waves are exactly 180° out of phase, and the intensity vanishes — this is called destructive interference. Observe that in constructive interference, the value of the intensity is 4I0 , four times the intensity of either of the two individual waves. This occurs because interference arises from adding fields, not from adding intensities. When the waves are in phase, the observed intensity is obtained by first adding the individual field amplitudes and then squaring the result. Compare this to what would happen if one were to add intensities — the order of the two operations would be reversed, that is, the individual fields would be squared before adding the results, and one would end up with an observed intensity of only 2I0 . This distinction will be discussed more fully in the section on Coherent versus incoherent sources. A phasor diagram is a useful tool for visualizing interference effects. The basic principle underlying these diagrams is that at any fixed point in space, a harmonic wave can be thought of as the real part of a complex exponential of the form E0 exp[i(ωt + φ)]. This can be represented by a phasor, or vector in the complex plane, as shown in Fig. 34. The length of the phasor is E0 and, at t = 0, it makes an angle φ with the real axis. As time goes
Figure 34. Phasor of magnitude E0 and initial phase φ that rotates about the origin at angular velocity ω.
on, the phasor rotates about the origin at angular velocity ω. At any chosen instant, the actual field is obtained by taking the geometric projection of the rotating phasor on the real axis, and the result is E0 cos(ωt + φ). Now suppose that one is interested in the interference of two waves that have the same polarization, both of frequency ω. Each of these waves can be represented by a complex exponential; one has initial phase φ1 , and the other has initial phase φ2 . If we assume, for simplicity, that the waves have equal amplitude, that is, E01 = E02 = E0 , the exponentials are given by E0 exp[i(ωt + φ1 )] and E0 exp[i(ωt + φ2 )]. According to the superposition principle, the net field is obtained by adding these complex quantities and then taking the real part. Alternatively, one can draw the two phasors in the complex plane and add them together, head to tail, as vectors. This is demonstrated in Fig. 35 for arbitrarily chosen φ1 and φ2 . The amplitude of the net field is the given by the length of the resultant phasor. The fact that the phasors are rotating becomes immaterial when one is combining waves of identical frequency. Because they rotate at the same rate, the relative angle = φ2 − φ1 between the phasors remains fixed and fixes the magnitude of the resultant as well. Some simple geometry applied to the diagram shows that the amplitude of the total field is |Etotal | = 2E0 sin
[(π − δ)/2] = 2E0 cos(δ/2).
(141)
As before, the intensity is

I = ½cε0|Etotal|² = 4(½cε0E0²) cos²(δ/2) = 4I0 cos²(δ/2).
(142)
Figure 33. Intensity pattern produced by the interference of two waves, each of intensity I0, as a function of the phase difference δ at some observation point.
Figure 35. The addition of the phasors for two harmonic waves that have the same amplitude (E0 ) and frequency. φ1 and φ2 are the initial phases of the two waves.
This is the same as the result obtained previously [Eq. (140)]. The reader should be able to deduce that the phasor diagrams for constructive and destructive interference, where is some integral multiple of π , are quite trivial. In constructive interference, the two phasors are aligned, so the total field is just 2E0 , and the intensity is proportional to its square, giving four times the intensity of a single beam. For destructive interference, the phasors are oriented in opposite directions, and they cancel, giving the expected vanishing field and intensity. As is seen shortly, the real advantage of using phasor diagrams becomes more apparent when one considers the problem of interference among many waves. Ideas related to two-beam interference arise in the analysis of many standard problems, including those of two-slit interference, thin-film interference, and the interference of X rays scattered from crystals. Some of these will be discussed in the sections that follow. In addition, the results of this section also carry over as the basis of various interferometric instruments; the most widely known probably is the Michelson interferometer (2,4). Coherent Versus Incoherent Sources It should be apparent from the preceding analysis that interference effects depend critically on the precise phase difference between the waves involved. Unless a well-defined, constant phase difference is maintained over a reasonable amount of time, one will fail to observe interference between the waves. Two sources of light or other electromagnetic waves characterized by a fixed phase difference are said to be mutually coherent, and the same is said of emitted waves. The radiation fields from such sources combine according to the superposition principle, that is, the fields add and lead to readily observable interference effects. On the other hand, if the phase difference between the sources has a fluctuating value, one says that the sources (and waves) are mutually incoherent. In this case, the sources emit waves independently, causing the individual intensities, rather than the fields, to add. Incoherent sources produce no interference effects. In actuality, the concept of coherence is somewhat involved because no two real sources can maintain a fixed phase difference forever. Coherence is usually quantified in terms of a characteristic time, or coherence time, denoted by τc . One can think of τc as the time during which an approximately constant phase difference is maintained. Whether or not sources are coherent and hence can produce interference depends on the value of τc relative to the response time Tr , of the detector used to observe the interference. In general, the condition for mutual coherence is given by τc Tr , whereas the condition for incoherence is τc Tr . When the coherence time and the detector response time have similar values, the situation is no longer clear-cut, and the simple descriptors coherent and incoherent lose their significance. Ordinary extended optical sources such as incandescent filament and gas-discharge lamps are incoherent. They are incoherent in the sense that there are rapidly fluctuating phase differences between waves that originate from
different sources or, for that matter, from different parts of the same source. The fluctuations arise because individual atoms emit independently from one another and randomly, causing the relative phases to become uncorrelated. The coherence time for a typical gas lamp is on the order of a fraction of a nanosecond, and it becomes impossible to observe interference effects from sources of this type even by using the fastest detectors. Electronic oscillators and antennas for generating radio waves and microwaves are generally considered coherent sources. Even so, the phase stability of these sources gives them typical coherence times on the order of a second or less. This means that even for these so-called coherent sources, any interference signal that develops will last only a few seconds, at best, before evolving into a different signal. A method for overcoming such drifts in the signal is to phase-lock the sources by using some type of feedback control. Similar considerations hold for lasers, which serve as coherent sources in the optical regime. Although much more stable than other optical sources, lasers have attainable coherence times only on the order of milliseconds, at most. One might be able to observe interference between beams from two independent lasers by employing a very fast detection scheme (sampling time less than a millisecond), but clearly this presents a challenge. To attain a truly stable interference signal requires long-term coherence between sources. This is usually done by extracting two or more beams from the same set of wave fronts that emanate from a single source. A specific example of this idea is described in the next section. Huygens’ Principle and Two-Slit Interference. The first, and probably most well-known, demonstration of interference between electromagnetic waves is the interference of light passing through two closely spaced slits. The phenomenon was first observed by Thomas Young in 1802. A diagram of a basic setup similar to that used by Young is shown in Fig. 36. Visible light from an incoherent source
Figure 36. Schematic of Young's double-slit setup. Light that passes through a single slit illuminates a pair of slits separated by a small distance d. Light from the two slits produces an interference pattern that consists of bright and dark fringes on a distant screen. If the distance from the slits to the screen is large compared to the slit separation, then the distance marked Δr is the extra path length traversed by light from one slit, compared to the other slit.
(coherent optical sources were unavailable at the time Young did his experiments) illuminates a single narrow slit. The light passing through illuminates two narrow slits separated by a very small distance d (typically on the order of tenths of millimeters or less), which in turn produce an intensity pattern on a screen a substantial distance D away (D d). Our discussion will assume that the light from the source is approximately monochromatic, centered on some wavelength λ. Understanding the role of the single- and double-slit arrangement requires stating a fundamental idea put forth by the Dutch physicist, Christian Huygens, during the latter part of the seventeenth century, that has become known as Huygens’ principle. Given a wave front at one instant, this principle provides a way to determine how that wave front propagates through space at later times. Basically, Huygens’ principle states that each and every point on a given wave front acts as a new source of outgoing spherical waves, called wavelets. Once one wave front is specified, a subsequent wave front can be constructed by tracking the envelope generated by these wavelets. To follow the development of the wave front at still later times, one only need apply Huygens’ principle again every time a newly constructed wave front is determined. Based on this understanding, suppose we let light from some arbitrary source fall on an extremely small, almost pointlike, aperture. When a wave front reaches the aperture, only the wavelet generated at the opening will appear on the other side — in effect, the aperture acts as a small source of spherical wave fronts. If the single aperture is now followed by two more apertures, closely spaced, the succession of spherical wave fronts will strike the pair of openings, causing the apertures to act like two point sources, emitting spherical wavefronts in the region beyond. Not only do they behave as point sources, but they behave as coherent sources as well. This property is essential if one expects to see interference effects. Even though the original light source was incoherent, the presence of the first aperture guarantees that the train of wave fronts that strike the double aperture stimulates the production of two wavelet sources that have a definitive, fixed, phase relationship with each other. As was learned earlier, a fixed phase difference is the defining characteristic of coherent sources. Even though our discussion has assumed point-like apertures, the essential ideas still carry over to a setup that uses long, narrow slits. The main difference is that instead of acting as coherent sources of spherical waves, the slits behave approximately as coherent sources of cylindrical waves. Now, referring to Fig. 36, consider the illumination at some arbitrarily chosen point on the screen that is positioned at a large distance beyond the double slit. The light level that appears at a given point depends on the phase difference appearing between the waves that originate at each of the two slits. This phase difference can be determined by knowing how much further the light from one slit has to travel compared to light from the other slit. Assuming that D, the distance to the screen, is large, this difference in path length is approximately given by the distance r labeled on the diagram. If the
location of the point under consideration is specified by the angle θ measured relative to the symmetry line that bisects the two-slit arrangement, then, from some simple geometry, one sees that r = d sin θ , where, again, d is the separation between the slits. Thus, the corresponding phase difference is determined by the size of the path difference relative to the wavelength of the light: = 2π
Δr/λ = 2π d sin θ/λ.
(143)
At the center of the screen where θ = 0, the waves from the two slits travel the same distance, so there is no phase difference and the waves arrive at the screen exactly in phase. The waves constructively interfere at this point, and high light intensity is observed. Constructive interference also occurs at other points on the screen on either side of θ = 0, in particular, those that satisfy the condition δ = 2πm, where m is an integer. Equivalently, the values of θ for constructive interference meet the condition

d sin θ = mλ,
m = 0, 1, 2, . . . .
(144)
Destructive interference, corresponding to points of zero intensity on the screen, is given by the condition δ = 2π(m + ½). So θ must satisfy

d sin θ = (m + ½)λ,
m = 0, 1, 2, . . . .
(145)
Regions of constructive interference are often called bright fringes, and those of destructive interference are called dark fringes. The integer m specifies the so-called order of a fringe. For example, the bright illumination at the center of the pattern corresponds to the zero-order bright fringe, and the adjacent bright fringes on either side of center are both first-order fringes, etc. Suppose that we call the position at the center of the screen x = 0 and measure the actual distance x between this point and the centers of the various bright and dark fringes. Then, as long as θ is sufficiently small, we can replace sin θ ≈ tan θ = x/D in Eqs. (144) and (145). The result is that successive bright fringes or successive dark fringes are equally spaced by the amount x ≈ λD/d. Knowing the slit spacing and the distance to the screen, one can then measure the spacing between adjacent fringes and calculate the wavelength of the light. Using this technique in his early experiments, Young achieved wavelength determinations for visible light. It is important to notice that there is an inverse relationship between slit spacing d, and fringe spacing x. Hence, for a fixed wavelength of light, increasing the slit spacing decreases the spacing between fringes in the pattern (and vice versa). The functional form of the intensity distribution observed on the screen is given by Eq. (140), and is given by Eq. (143). In other words,
I = 4I0 cos²(πd sin θ/λ) ≈ 4I0 cos²(πdx/λD).   (146)
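As a quick numerical illustration of Eqs. (144)–(146), the short Python sketch below estimates the fringe spacing Δx ≈ λD/d and then inverts the same relation to recover a wavelength from a measured spacing. The slit separation, screen distance, and measured fringe spacing used here are illustrative values, not data from the text.

```python
# Young's double-slit fringe spacing: a minimal numerical sketch.
# All input values below are illustrative, not taken from the article.

wavelength = 550e-9   # m, green light
d = 0.25e-3           # m, slit separation (tenths of a millimeter)
D = 1.5               # m, slit-to-screen distance (D >> d)

# Small-angle fringe spacing, delta_x ~ lambda * D / d
delta_x = wavelength * D / d
print(f"predicted fringe spacing: {delta_x*1e3:.2f} mm")

# Inverting the same relation: a measured fringe spacing gives the wavelength,
# which is how Young determined visible-light wavelengths.
measured_spacing = 3.3e-3  # m, hypothetical measurement
print(f"inferred wavelength: {measured_spacing * d / D * 1e9:.0f} nm")
```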
Bragg Reflection of X-Rays from Crystals. In the X-ray region, a standard two-slit arrangement, such as that just described, would not produce an observable interference pattern. The reason for this can be seen by considering a typical X-ray wavelength, for example, λ = 1 Å. According to Eq. (144), if one sends X rays at this wavelength through even the most closely spaced slits, say on the order of only a few microns, the first-order maxima would appear at an angle no larger than a few thousandths of a degree. The spacing between higher order fringes would be similarly small and, in practice, one could not resolve any variations in intensity. In practice, producing an observable X-ray interference pattern requires two or more coherent sources separated by an extremely small distance d that approaches the wavelength of the radiation, that is, the separation needs to be on the order of angstroms, or at most, nanometers. These are exactly the types of spacings between atoms that form the lattice of a crystalline solid. When X rays strike a crystal, the beam is scattered in various directions by the individual atoms. The various atoms that make up the crystal act as a set of mutually coherent X-ray sources (see the section on Rayleigh scattering by atoms), and an interference pattern is produced. Correctly accounting for all the scattering from the individual atoms is somewhat involved (12), but the condition for constructive interference, or maximum intensity, turns out to be quite simple. Rather than concentrating on scattering from the individual atoms, one only needs to treat the X rays as if they are specularly reflected by various sets of parallel atomic planes within the crystal. A crystal structure, in general, has many sets of parallel planes; each set has its own orientation and plane-to-plane spacing. For now, however, consider only one such set and focus on two of its adjacent planes, as in Fig. 37. Here, d stands for the spacing between successive planes, and φ is the angle of the incoming beam relative to the surface of a reflecting plane. X-ray interference occurs because of the phase difference between the waves that are reflected from the upper and lower planes. From simple geometry, one sees that the ray reflected from the bottom plane must travel an extra path length of 2d sin φ relative to the ray reflected from the top plane. The condition for constructive interference is that this distance must be an
integral multiple of the X-ray wavelength, or

2d sin φ = mλ,   m = 1, 2, . . . .   (147)
This is known as Bragg's law. Actually, one should be considering interference between waves reflected from all of the many parallel planes within a given set. When this is done, however, one finds that this form of Bragg's law is still correct (see the ideas related to multiple-beam interference in the next section). For a particular set of crystal planes, the spacing d is fixed. If, in addition, the wavelength is fixed as well, then varying the incoming angle φ produces alternating maxima and minima that represent constructive and destructive interference, respectively. Alternatively, Bragg's law can be used to measure the plane separation if the wavelength is known, and vice versa. These ideas are the basis for standard crystal spectrometers (or X-ray diffractometers) (29). Keep in mind that, in general, there are many sets of crystal planes in a given crystal structure, and quite often, more than one species of atom is present in the crystal. The result is that real X-ray interference patterns from single crystals (also known as Laue patterns) can be somewhat complex. Nevertheless, the patterns obtained are, in effect, fingerprints of the various crystal structures.

Multiple-Beam Interference and Gratings

Now, we turn to the problem of the interference between waves emitted by N mutually coherent sources of the same frequency ω. The application that we have in mind is multiple-slit (i.e., N-slit) interference, depicted in Fig. 38. It is assumed that each slit is separated from its neighbor by the distance d. A geometric construction essentially identical to that described for two-slit interference can be applied here to the rays from two adjacent slits, showing that a path difference of d sin θ, or a phase difference of Δ = (2πd/λ) sin θ, arises between the waves from neighboring slits. In addition, if the slit arrangement is small compared with the distance to the observation screen, then all of the waves will have approximately the same amplitude E0 upon reaching the interference point.
Figure 37. Bragg reflection from atomic planes in a crystal.
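As a numerical illustration of Bragg's law, Eq. (147), the short sketch below solves 2d sin φ = mλ for the reflection angles of the first few orders. The radiation wavelength and plane spacing are assumed values chosen only for illustration, not numbers taken from the text.

```python
import math

# Bragg's law, Eq. (147): 2 d sin(phi) = m * lambda.
# Illustrative numbers only: Cu K-alpha radiation on a crystal with
# a 2.8-angstrom plane spacing (not values taken from the article).
wavelength = 1.54e-10   # m
d = 2.8e-10             # m

for m in (1, 2, 3):
    s = m * wavelength / (2 * d)
    if s <= 1.0:                      # a reflection exists only if sin(phi) <= 1
        phi = math.degrees(math.asin(s))
        print(f"order m={m}: phi = {phi:.1f} degrees")
    else:
        print(f"order m={m}: no reflection (m*lambda > 2d)")
```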
Figure 38. Interference of light passing through N slits that have separation d. The path difference between rays from two adjacent slits is d sin θ.
As before, we can treat the field produced by any one of the waves as a complex exponential, or equivalently, a phasor in the complex plane. Let us start by using complex exponentials, and let the field produced at the observation point by the wave from the first slit be E0 exp(iωt). Then, the waves from the other slits are just phase-shifted, in succession, by an amount Δ. Hence, the wave from the nth slit is given by E0 exp{i[ωt + (n − 1)Δ]}. Superimposing all N waves gives a total complex field of

Ẽ = E0 exp(iωt) Σ from n=1 to N of exp[i(n − 1)Δ] = E0 exp(iωt) Σ from n=0 to N−1 of exp(inΔ).   (148)

The sum has the form of a standard geometric series Σ αⁿ (n = 0 to N − 1), where α = exp(iΔ). The series converges to the value (1 − α^N)/(1 − α). Therefore,

Ẽ = E0 exp(iωt) [1 − exp(iNΔ)]/[1 − exp(iΔ)].   (149)

Figure 39. N = 4 slit interference pattern as a function of the phase difference Δ between waves from two adjacent slits.
Hence, the observed intensity I is proportional to the squared magnitude of this complex field (or Ẽ times its complex conjugate). The result reduces to the form

I = I0 sin²(NΔ/2)/sin²(Δ/2),   (150)

where I0 is the intensity from an isolated slit. Note that for N = 2, one can show that Eq. (150) reduces to the expression previously obtained for two coherent sources, or a double slit [i.e., Eq. (140)]. Let us investigate the main features of the intensity pattern predicted by Eq. (150). When the phase difference Δ is any integral multiple of 2π, both the numerator and denominator vanish, and the limiting value of the intensity becomes N²I0. These points in the interference pattern are referred to as principal maxima — they are points where constructive interference occurs. The condition for the principal maxima is Δ = (2πd/λ) sin θ = 2πm, or

d sin θ = mλ,   m = 0, 1, 2, . . . .   (151)

This is exactly the same as the condition for maxima in the two-slit interference pattern. Points where the numerator vanishes, but the denominator does not, correspond to zeroes (or minima). These are given by the condition Δ = 2πn/N, where n = 1, 2, . . ., N − 1. There are N − 1 minima between adjacent principal maxima. Between the minima are peaks known as secondary maxima. There are N − 2 secondary maxima between each pair of principal maxima. Figure 39 shows a plot of intensity as a function of Δ for N = 4 slits. The phasor diagrams for the points labeled A, B, C, and D are displayed in Fig. 40.

Figure 40. Phasor diagrams for points labeled A, B, C, and D on the interference pattern of Fig. 39.

When the number of slits is large, the separation between the first minima on either side of a principal
maximum is very small. As a result, the principal maxima are very narrow and very intense (I = N 2 I0 ). When N is large, one refers to the slit arrangement as a grating. The principal maxima in the interference pattern are easily resolvable when a grating is used. This is important when the source contains spectral lines that have closely spaced wavelengths. Suppose that one wants to separate principal maxima of the same order m that correspond to two wavelengths λ1 and λ2 . Ordinarily, the condition that must be satisfied to say that two lines (i.e., wavelengths) are just resolved is that the principal maximum for λ1 occurs at the location of the first zero next to the principal maximum for λ2 (for the same value of m). This condition, called the Rayleigh criterion, is sketched in Fig. 41. It can be shown that the Rayleigh criterion is equivalent to
Figure 41. Rayleigh criterion for resolving two wavelength components (i.e., spectral lines). The wavelengths λ1 and λ2 are considered just resolved if the principal maximum for one component falls at the same location as the first zero of the other component.
requiring that

|λ2 − λ1|/λave = 1/(Nm),   (152)

where λave is the average of the two wavelengths. One defines the resolving power RP for a grating as

RP = λave/|λ2 − λ1| = Nm.   (153)
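To make Eq. (153) concrete, here is a small Python sketch of the chromatic resolving power of a grating. The slit count and wavelength are illustrative (the 50,000-slit, 550-nm case worked in the next paragraph comes out the same way).

```python
# Chromatic resolving power of a grating, Eq. (153): RP = lambda_ave / |dlambda| = N * m.
def resolving_power(num_slits: int, order: int) -> float:
    """Resolving power of an N-slit grating used in a given diffraction order."""
    return num_slits * order

# Illustrative case: 50,000 slits, first order, green light near 550 nm.
N, m = 50_000, 1
rp = resolving_power(N, m)
min_separation_nm = 550.0 / rp   # smallest resolvable wavelength difference
print(f"RP = {rp}, smallest resolvable d(lambda) ~ {min_separation_nm:.3f} nm")
```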
For example, consider a grating where N = 50,000. The resolving power of this grating for the first-order (m = 1) principal maximum is simply RP = 50,000. Suppose that two spectral lines, both green and close to a wavelength of 550 nm, enter the grating. Then, the grating can resolve a wavelength difference of |λ2 − λ1| = (550 nm)/50,000 = 0.011 nm. A number of other optical devices are based on the principles of multiple-beam interference. Fabry–Perot devices, for example, contain two closely spaced, highly reflective surfaces that face each other. Each time light traverses the gap between the two surfaces, most of the light is reflected back to the other surface; however, a small portion is transmitted through the device. If the reflectance of the surfaces is very close to unity, a great many of these reflections will occur; each produces a weakly transmitted beam in the process. These transmitted beams will superimpose and produce an interference pattern whose resolution characteristics are very high. Fabry–Perot interferometers are instruments based on this principle and are used for high-resolution optical spectroscopy. Optical elements that incorporate Fabry–Perot geometry are also useful as narrowband spectral filters (2,4,27).

Diffraction

When light or other EM radiation encounters an aperture or obstacle whose size is either smaller than or on the order of the wavelength λ, the advancing wave produces a rather complex illumination pattern unlike that predicted by basic geometric or ray optics. A crisp geometric shadow is not formed, as one might expect. Instead, the observed irradiance varies smoothly in the vicinity of an edge and is accompanied by alternating regions of brightness and darkness. Furthermore, the wave is bent to some degree around the edges of the aperture or obstacle into the region of the geometric shadow. These effects, especially the latter, fall under the heading of diffraction. Diffraction occurs because of the wave properties of EM radiation and basically encompasses any effects that represent deviations from geometric optics. When the size of the aperture or obstacle is significantly greater than the wavelength of the radiation, the observable effects of diffraction are extremely minimal, and one is justified in using simple ray-tracing techniques and the principles of geometric optics. Diffraction effects are generally classified into one of two regimes. When the radiation source and observation plane are both sufficiently far from the diffracting aperture (or obstacle) so that the curvature of the wave fronts at the aperture and observation plane is extremely small, a plane wave approximation can be used, and one speaks of far-field or Fraunhofer diffraction. When the distances involved are such that the curvature of the wave front at the aperture or observation plane is significant, one is dealing with near-field or Fresnel diffraction. As the observation plane is moved further and further beyond the aperture, the observed pattern continuously evolves from that predicted by geometric optics to that of Fresnel diffraction, to that of Fraunhofer diffraction. The changes in the illumination pattern are quite complex until one enters the Fraunhofer regime; at that point, any further increase in distance introduces only a change in the scale, not the shape, of the pattern observed. Because of its complicated nature, Fresnel diffraction will not be discussed any further here. Instead, the interested reader should consult some of the available references on the subject (2,4). The focus of this section is on the Fraunhofer regime, starting with diffraction from a single slit as described here.

Single-Slit Diffraction

According to Huygens' principle, each point of a wave front acts as a source of outgoing spherical wavelets. This means that when a plane wave encounters a slit, the diffraction pattern is essentially caused by the interference of wavelets that emanate from a huge number of infinitesimally spaced coherent point sources that fill the slit (see Fig. 42). To obtain the far-field (Fraunhofer) irradiance pattern, consider the corresponding phasor diagram shown in Fig. 43. Each phasor represents the electric-field contribution that originates from a single point in the slit and has magnitude dE. The phase difference dΔ for waves from adjacent points separated by a distance dy is

dΔ = (2π/λ)(dy) sin θ.   (154)
The various phasors form a circular arc of some radius R that subtends an angle 2α, and the resultant electric-field amplitude E at a given observation angle θ is given by the length of the chord that connects the end points of the circular arc, or

E = 2R sin α.   (155)
Figure 42. Single-slit diffraction is caused by the interference of wavelets emanating from an infinite number of closely spaced point sources that fill the slit (width a).

Figure 43. Phasor diagram for single-slit diffraction. 2α is the phase difference between waves emanating from the extreme end points of the slit.

Figure 44. Intensity pattern for diffraction from a single slit of width a as a function of α = (πa/λ) sin θ.
The radius R is just the arc length divided by 2α. Let the arc length be denoted by E0 — this corresponds to the field amplitude at θ = 0 (i.e., where all the phasors are parallel). Then, Eq. (155) reduces to

E = E0 (sin α)/α.   (156)

This can be squared to give the irradiance:

I = I0 (sin α/α)²,   (157)

where I0 is the intensity at θ = 0. In addition, notice that 2α is the total phase difference between the waves that originate from the extreme end points of the slit — in other words, 2α = N(dΔ). Using Eq. (154) gives α = [πN(dy)/λ] sin θ. However, N(dy) is just the width of the slit, which we shall denote by a. Therefore,

α = (πa/λ) sin θ.   (158)

A sketch of Eq. (157) for single-slit diffraction is shown in Fig. 44. The zeros of the intensity pattern occur when α is an integral multiple of π (except for α = 0), or using Eq. (158), when

a sin θ = mλ,   m = 1, 2, 3, . . . .   (159)

The central maximum is twice as broad and much more intense than the secondary maxima and essentially represents the slit's image. The angular width Δθ of this image can be defined as the angular separation between the first minima on either side of the central maximum. Using the approximation sin θ ≈ θ and setting m = ±1 in Eq. (159) gives the width as

Δθ = 2λ/a.   (160)
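The angular spread in Eq. (160) is easy to evaluate numerically. The Python sketch below (the slit widths and wavelength are illustrative, not values from the text) computes the single-slit pattern of Eq. (157) at a given angle and the width of the central maximum for several slit widths.

```python
import math

def single_slit_intensity(theta_rad: float, slit_width: float, wavelength: float) -> float:
    """Relative irradiance I/I0 of Eq. (157), with alpha from Eq. (158)."""
    alpha = math.pi * slit_width / wavelength * math.sin(theta_rad)
    if alpha == 0.0:
        return 1.0
    return (math.sin(alpha) / alpha) ** 2

wavelength = 633e-9                      # m, red He-Ne line (illustrative)
for a in (5e-6, 50e-6, 500e-6):          # slit widths in meters
    width = 2 * wavelength / a           # Eq. (160), radians
    print(f"a = {a*1e6:5.0f} um: central-max width = {math.degrees(width):.3f} deg, "
          f"I/I0 at 0.2 deg = {single_slit_intensity(math.radians(0.2), a, wavelength):.3f}")
```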
When a, the size of the slit, is much larger than the wavelength, the spreading of the beam is small, and the effects of diffraction are minimal. As the ratio a/λ increases, one becomes more and more justified in using the approximations inherent in simple geometric optics. However, for slits that are small relative to the wavelength, pronounced beam spreading occurs, and the geometric-optics approach fails completely. For a fixed wavelength, the narrower the slit, the more the beam spreads. Said another way, the more collimated a beam of light or other electromagnetic radiation, the more the beam tends to spread because of diffraction. Even if one just considers a bare, propagating beam that has no aperture present, diffraction will still cause the beam to spread naturally and lose collimation to some extent.

Fraunhofer Diffraction by a General Aperture

The single-slit result just obtained is actually a special case of a more general result of Fraunhofer diffraction. Consider a general aperture lying in the x, y plane. The observation plane is a distance D away and parallel to the plane of the aperture. Points on the observation plane are labeled according to a separate X, Y coordinate system (see Fig. 45). It turns out that, in general, the Fraunhofer diffraction pattern for the electric field in the observation plane is proportional to a two-dimensional Fourier transform of the aperture:

E(ωx, ωy) = const × ∫∫ g(x, y) exp[i(ωx x + ωy y)] dx dy,   (161)

where the integrals over x and y each run from −∞ to +∞. The parameters ωx and ωy are related to the coordinates (X, Y) in the observation plane by

ωx = kX/D and ωy = kY/D,   (162)

where k = 2π/λ, as usual. ωx and ωy have units of inverse length and are usually referred to as spatial frequencies. g(x, y) is known as the aperture transmission function. When the aperture is simply a hole or set of holes in an otherwise opaque screen, the function g(x, y) is zero everywhere, except within the hole(s) where it has the value of unity. The resulting diffraction patterns, that is, the squared magnitude of E(ωx, ωy), from a few standard shapes of open apertures are presented here. In general, however, g(x, y) can represent an aperture that has a transmission that varies in amplitude and/or phase as a function of x and y.

Figure 45. Coordinate systems for the aperture and observation planes used in Fraunhofer diffraction. The light is incident on the aperture plane (from below) and produces the diffraction pattern on a screen located in the observation plane.

Rectangular Aperture. For a rectangular hole, whose dimensions are a and b in the x and y directions, respectively, the aperture transmission function is

g(x, y) = 1 for |x| ≤ a and |y| ≤ b, and 0 otherwise.   (163)

It is straightforward to compute the Fourier transform, and the diffraction pattern has the form

I = I0 (sin α/α)² (sin β/β)²,   (164)

where

α = ωx a/2 and β = ωy b/2.   (165)

The single-slit result [Eq. (157)] found in the previous section can be obtained by just letting b → 0 and observing that for small θ, the parameter α reduces to

α = ωx a/2 = kXa/(2D) = (πa/λ)(X/D) = (πa/λ) tan θ ≈ (πa/λ) sin θ,   (166)

as in Eq. (158).

Circular Aperture. To handle the computation for a circular aperture of radius R, it becomes convenient to transform to plane polar coordinates, (x, y) → (r, φ) and (X, Y) → (ρ, ϕ). Then, the aperture transmission function is simply

g(x, y) = 1 for r ≤ R, and 0 otherwise.   (167)

The observed diffraction pattern is circularly symmetrical and is plotted in Fig. 46. The irradiance has the form

I = I0 [2J1(u)/u]²,   (168)

where J1(u) is the first-order Bessel function whose argument is

u = kRρ/D.   (169)

Figure 46. Fraunhofer diffraction pattern produced by a circular aperture of radius R as a function of u = kRρ/D, where ρ is the distance measured from the center of the observed pattern. D is the distance from the aperture to the observation plane, and k = 2π/λ.
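Because Eq. (161) is a Fourier transform, Fraunhofer patterns are convenient to compute numerically: sample the aperture transmission function g(x, y) on a grid and take a 2D FFT. The numpy sketch below does this for a circular hole; the grid size, aperture radius, and sampling are arbitrary choices for illustration, and the result is only proportional to the physical irradiance.

```python
import numpy as np

# Numerical Fraunhofer pattern via Eq. (161): |FFT of the aperture function|^2.
N = 512                                       # grid points per side (illustrative)
x = np.linspace(-1.0, 1.0, N)                 # aperture-plane coordinates, arbitrary units
X, Y = np.meshgrid(x, x)
R_aperture = 0.1
g = (X**2 + Y**2 <= R_aperture**2).astype(float)   # circular hole: 1 inside, 0 outside

E = np.fft.fftshift(np.fft.fft2(g))           # field in the observation plane (up to constants)
I = np.abs(E) ** 2
I /= I.max()                                  # normalize so the central peak is 1

# A cut through the center shows the bright central disk and the weak surrounding rings.
center = N // 2
profile = I[center, center:center + 8]
print("central row of normalized intensity:", np.round(profile, 4))
```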
The bright, central area is called an Airy disk. The edge of the disk is determined by the first zero of the Bessel function, which occurs at u = 3.832. Then, Eq. (169) gives the angular extent Δθ of the disk (measured from its center) as approximately

Δθ ≈ 1.22 λ/(2R).   (170)

The ability of imaging systems to resolve objects is diffraction-limited. For example, when an optical instrument like a telescope or camera images a distant point source, it basically forms a Fraunhofer diffraction pattern at the focal plane of a lens. In this case, the lens opening itself is the aperture. If an attempt is made to image two or more closely spaced point sources, each produces its own Airy disk, and they overlap. According to the Rayleigh criterion discussed previously (see section on multiple-beam interference and gratings), two images are considered resolved when the center of the Airy disk from one source coincides with the edge (i.e., first minimum) of the Airy disk from the other source. The minimum angular separation between the disks (as well as between the sources) is the same Δθ as given by Eq. (170).
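Equation (170) sets the diffraction-limited angular resolution of any circular aperture. The short sketch below evaluates it for a few illustrative aperture diameters; these numbers are examples, not values from the text.

```python
import math

def airy_angular_radius(wavelength: float, aperture_diameter: float) -> float:
    """Angular radius of the Airy disk, Eq. (170): ~1.22 * lambda / (2R) = 1.22 * lambda / D."""
    return 1.22 * wavelength / aperture_diameter

wavelength = 550e-9  # m, visible light
for diameter in (0.005, 0.1, 2.4):   # m: eye pupil, small telescope, large telescope (illustrative)
    theta = airy_angular_radius(wavelength, diameter)
    arcsec = math.degrees(theta) * 3600
    print(f"aperture {diameter:5.3f} m -> resolution ~ {arcsec:.2f} arcsec")
```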
Multiple Apertures. Fraunhofer diffraction from a collection of identical apertures can be handled by using an important result known as the array theorem. The theorem states that the diffraction pattern from an array of identical apertures is given by the product of the diffraction pattern from a single aperture and the interference pattern of an identically distributed array of point sources. An illustration of this idea is the problem of diffraction from N identical long slits whose widths are a and whose center-to-center separation is d. For large N, this would be the observed irradiance from a diffraction grating. The observed diffraction pattern is the product of the diffraction pattern from a single slit [Eq. (157)] and the interference pattern from N slits that have infinitesimal widths [Eq. (150)]. In other words, the pattern is given by

I = I0 (sin α/α)² [sin²(NΔ/2)/sin²(Δ/2)],   (171)

where, as before, α = (πa/λ) sin θ and Δ = (2πd/λ) sin θ. The pattern for N = 4 and d/a = 8 is shown in Fig. 47. The second factor in Eq. (171) associated with multiple-slit interference determines the location of fringes in the pattern, and the first factor, which is due to slit width, determines the shape of the envelope that modulates the fringe intensities.

Figure 47. Fraunhofer diffraction from N = 4 identical, long slits. The ratio of the slit separation d to the slit width a is d/a = 8. The intensity pattern is displayed as a function of α = (πa/λ) sin θ.

Babinet's Principle

An amazing fact about diffraction in the Fraunhofer regime is that if an aperture (or array of apertures) is replaced by an obstacle (or array of obstacles) that has exactly the same size and shape, the diffraction pattern obtained is the same except for the level of brightness. This result comes from a more general theorem known
as Babinet’s principle. For example, coherent light aimed at a long, thin object, like an opaque hair, will produce the same irradiance pattern as light that impinges on a long, thin slit of the same width. Rather than producing a geometric shadow and acting to block the light, the hair causes the light to bend, resulting, as before, in a diffraction pattern that has a central bright fringe. Classical Scattering of Electromagnetic Waves The term scattering refers to an interaction between incident radiation and a target entity that results in redirection and possibly a change in frequency (or energy) of the radiation. For radiation of the electromagnetic type, the word target almost always refers to the atomic electrons associated with the scattering medium. From a fundamental standpoint, the scattering of electromagnetic radiation from atomic and molecular systems needs to be treated as a quantum-mechanical problem. This approach will be taken later on in the section on the scattering of photons. In the quantum treatment, both elastic and inelastic scattering of photons can occur. An elastic scattering process is one in which no change in photon energy or frequency takes place. In an inelastic process, radiation experiences an upward or downward frequency shift due to the exchange of energy between photon and target. This section presents a classical wave picture of the scattering process. According to the classical approach, the only type of scattering that can occur is elastic scattering; inelastic scattering appears only as a possibility in the quantum treatment. Amazingly, however, the results for elastic scattering of classical waves are completely consistent with the quantum results for elastic scattering of photons. Thomson Scattering of X rays and the Classical Concept of a Scattering Cross Section Begin by considering the elastic scattering of low-energy X rays by an atomic electron, a process known as Thomson
scattering. For X rays, the photon energy ℏω is much greater than the binding energy of an atomic electron, and it is quite reasonable to treat the electron as essentially unbound, or free, and initially at rest. Thomson scattering applies only to X rays whose photon energies are much less than mec² (≈ 511 keV), the electron's rest energy. At higher energies, quantum effects become important, and X rays undergo Compton scattering (to be described later). In a classical treatment of Thomson scattering, one considers the incident radiation as a monochromatic wave (frequency ω, amplitude E0) that is polarized, for example, along the x direction. The wave's electric-field vector exerts an oscillating force Fx = −eE0 cos ωt on a free electron. As a result of this force, the electron (mass me) undergoes a harmonic acceleration,

d²x/dt² = Fx/me = −(eE0/me) cos ωt,   (172)

or equivalently, harmonic displacement

x(t) = (eE0/meω²) cos ωt.   (173)

Therefore, the electron behaves as an oscillating electric dipole,

p(t) = −e · x(t) = −(e²E0/meω²) cos ωt,   (174)

which, in turn, radiates electromagnetic energy as a scattered wave, also at frequency ω. It can be shown that the time-averaged power carried by the scattered wave into a small cone of solid angle dΩ (measured in steradians) at an angle γ relative to the polarization vector of the incident field (i.e., the x axis) is given by

dP = I0 [e²/(4πε0mec²)]² (dΩ) sin²γ,   (175)

where I0 is the intensity of the incident wave. The sin²γ angular dependence of the scattered power should remind the reader of the distribution of radiation emitted from a simple oscillating electric dipole. The power of the scattered wave per unit solid angle divided by the incident intensity is known as the (angular) differential scattering cross section, and is represented by the symbol dσ/dΩ. For Thomson scattering, it has the particularly simple form

dσ/dΩ = (dP/dΩ)/I0 = r0² sin²γ,   (176)

where

r0 = e²/(4πε0mec²)   (177)

is known as the classical electron radius, which has a value of 2.82 × 10⁻¹⁵ m. Note that the dimensions of a differential cross section are that of an area (per steradian). The expression in Eq. (176) was derived for a linearly polarized wave. For an unpolarized beam of X rays, the differential cross section becomes (12)

(dσ/dΩ)unpol = (1/2) r0² (1 + cos²θ).   (178)

Here, θ is the angle between the propagation directions of the scattered and incident waves, usually referred to as the scattering angle. The total scattering cross section σ is obtained by adding up, or integrating, the expression for the differential cross section over all possible solid angles (i.e., over 4π steradians). Curiously, the result turns out to be the same, independent of whether the radiation is polarized or not:

σ = ∫ (dσ/dΩ) dΩ = ∫ from φ=0 to 2π ∫ from θ=0 to π (dσ/dΩ) sin θ dθ dφ   (179)

= (8π/3) r0² = 0.665 × 10⁻²⁸ m².   (180)
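Equations (177)–(180) can be checked directly from the fundamental constants. The following sketch computes the classical electron radius and the Thomson cross section; the constants are approximate SI values typed in by hand.

```python
import math

# Fundamental constants (SI); approximate values entered by hand.
e = 1.602e-19        # C
eps0 = 8.854e-12     # F/m
m_e = 9.109e-31      # kg
c = 2.998e8          # m/s

r0 = e**2 / (4 * math.pi * eps0 * m_e * c**2)   # classical electron radius, Eq. (177)
sigma = (8 * math.pi / 3) * r0**2               # Thomson cross section, Eq. (180)

print(f"r0    = {r0:.3e} m")                                # ~2.82e-15 m
print(f"sigma = {sigma:.3e} m^2 = {sigma/1e-28:.3f} barn")  # ~0.665 barn
```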
Usually, cross sections are stated in units of barns; 1 barn is defined as 10⁻²⁸ m². Hence, the Thomson scattering cross section is 0.665 barns. One can think of the total cross section σ for a particular scattering process as the effective cross-sectional area of the scatterer (in this case, the electron) presented to the incident radiation beam, and it is proportional to the probability that the scattering process in question takes place. It is important to realize that a scattering cross section does not, in general, correspond to the true geometric cross-sectional area of the scatterer. Although not the case here, the differential and total cross sections are commonly functions of the incident energy (or frequency). The effective radius of a circular disk, whose geometric area exactly matches that of the derived Thomson cross section, is √(σ/π), or 4.6 × 10⁻¹⁵ m. In comparison, the radius of an atom is on the order of an angstrom, or 10⁻¹⁰ m. The fact that the cross section for Thomson scattering is so small explains, in part, why X rays penetrate so easily through many materials.

Scattering of Light by Small Particles

Now, we turn to another problem that can be approached by classical means, namely, the elastic scattering of light by small, dielectric particles. The details are rather mathematical and beyond the scope of this article, but the central ideas and important results are outlined here. Consider a homogeneous dielectric particle with index of refraction n immersed in a uniform background medium of refractive index n0. A typical scattering geometry is shown in Fig. 48. A beam of polarized light, whose wavelength is λ0 in vacuum, is incident on the particle. The propagation of the incoming beam is chosen to be in the +z direction, and it is characterized by the wave number k = kvn0, where kv = 2π/λ0 = ω/c is the wave number in vacuum. The incoming light is treated as a polarized plane wave of the (complex) form E0 exp[i(k · r − ωt)], where k is the incident wave vector.
Figure 48. Scattering of light from a homogeneous dielectric particle (refractive index n) immersed in a uniform background medium (refractive index n0). The wave vector k of the incident light is along the z axis and, in the geometry shown, the incident light is taken as polarized along the x axis. k′ is the wave vector of the scattered light in the y, z plane (or scattering plane), and θ is called the scattering angle. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
The setup displayed in the figure assumes that the electric field of the incident wave is polarized along the x axis and that one is concerned only about radiation scattered into the y, z plane, also known as the scattering plane. The wave vector of the scattered wave is k′, and the angle θ between k and k′ is the scattering angle. Because the scattering is elastic, the magnitude of the scattered wave vector is the same as the magnitude of the incident wave vector. In general, the polarization vector of the scattered field could have a component normal to or parallel to the scattering plane. These are called the polarized (or vertical) and depolarized (or horizontal) components, respectively, of the scattered wave. An important parameter that arises in the analysis is the scattering vector or wave vector transfer Q. As shown in Fig. 49, Q is the difference between the incident and scattered wave vectors:

Q = k − k′.   (181)
Because |k| = |k′| = k, one can see from simple geometry that the magnitude of the scattering vector is directly related to the scattering angle θ by

Q = 2k sin(θ/2) = (4πn0/λ0) sin(θ/2).   (182)
The general problem to be addressed is to calculate the scattered wave at a position r (often called the field point) far from the scattering particle located at the origin. There are two basic classical approaches for doing so. One approach is based on an integral formulation; the other is based on a differential formulation of the scattering problem. The integral approach will be described first: For field points sufficiently far from the scatterer (far-field approximation), Maxwell's equations can be recast into a form that produces an explicit expression for the scattered field that involves the integral ∫ E(r′) exp(−ik′ · r′) dV′ over all points r′ (often called source points) inside the particle volume (12). The integrand involves the quantity E(r′), which is the electric-field vector at each point within the particle. Because the internal field of the particle is unknown, it might appear that the integral approach to scattering produces an indeterminate expression for the scattered field. However, by making certain simplifying approximations, a number of useful results can be obtained, as discussed here:
Rayleigh–Gans–Debye Scattering. The Rayleigh–Gans–Debye (RGD) approximation is applicable under the following two conditions:

1. The refractive index of the scattering particle is close to that of the surrounding background medium:

|m − 1| ≪ 1,   (183)

where m = n/n0.

2. The particle is small relative to the wavelength of the light:

kd|m − 1| ≪ 1,   (184)

where d is the diameter, or characteristic size, of the particle.

Figure 49. The scattering vector Q is defined as the difference between the incident and scattered wave vectors k and k′.

If the RGD conditions are met, the scattering is sufficiently weak so that once the incident wave undergoes scattering at some point r′ within the particle, a second scattering event becomes highly unlikely. This allows one essentially to replace the internal field E(r′) with the value of the incident field at the same point:

E(r′) → E0 exp[i(k · r′ − ωt)].   (185)

Using this replacement, the previous integral reduces to a tractable form and the complete expression for the scattered field Es becomes

Es = k²(m − 1) [exp{i(kr − ωt)}/(2πr)] E0 V f(Q),   (186)

where V is the volume of the scattering particle. Notice that scattering of light occurs only when the refractive index of the particle is different from that of the background medium, that is, when m ≠ 1. In addition, for RGD scattering, the polarization of the scattered wave is always parallel to the polarization of the incident wave; no depolarized scattering component appears. The function
f(Q) is called the particle form factor (9,10) — it depends on the details of the shape and size of the particle, as well as the particle's orientation relative to the direction of the vector Q. The form factor is given by an integral over the particle volume:

f(Q) = (1/V) ∫ over V of exp(iQ · r′) dV′.   (187)

In physical terms, each point r′ acts as a source of spherical, outgoing waves. The total scattered wave at the field point r is a coherent superposition of these waves; each has an associated phase shift due to the position of the source point within the particle. This is very much like the superposition of Huygens wavelets encountered in standard interference and diffraction problems. Consider scattering from a small dielectric sphere of radius R. In this case the particle form factor is only a function of u = QR, and is given by

f(u) = (3/u³)(sin u − u cos u).   (188)
The scattered intensity is proportional to |Es |2 , which is proportional to |f (u)|2 . The angular distribution of the scattered light is plotted in Fig. 50. An important observation (notice the logarithmic vertical scale) for RGD scattering is that the larger the particle, the more it tends to scatter in the forward (i.e., small θ ) direction.
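The angular pattern in Fig. 50 follows directly from Eqs. (182) and (188). Below is a small numpy sketch that evaluates |f(QR)|² versus scattering angle for an illustrative sphere; the particle radius, refractive index of the medium, and wavelength are assumed values, not taken from the text.

```python
import numpy as np

def sphere_form_factor(u):
    """Particle form factor of a homogeneous sphere, Eq. (188), with u = Q*R."""
    u = np.asarray(u, dtype=float)
    out = np.ones_like(u)
    nz = u != 0                                   # f(0) = 1 in the small-u limit
    out[nz] = 3.0 * (np.sin(u[nz]) - u[nz] * np.cos(u[nz])) / u[nz] ** 3
    return out

# Illustrative parameters (assumed, not from the article).
wavelength0 = 500e-9      # vacuum wavelength, m
n0 = 1.33                 # background medium (water-like)
R = 200e-9                # sphere radius, m

theta = np.radians(np.linspace(0.0, 180.0, 7))
Q = 4 * np.pi * n0 / wavelength0 * np.sin(theta / 2)      # Eq. (182)
relative_intensity = sphere_form_factor(Q * R) ** 2       # proportional to |Es|^2

for t, i_rel in zip(np.degrees(theta), relative_intensity):
    print(f"theta = {t:6.1f} deg   |f|^2 = {i_rel:.3e}")
```

Running the sketch reproduces the qualitative behavior noted above: the intensity falls off by several orders of magnitude away from the forward direction.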
Classical Rayleigh Scattering. Rayleigh scattering occurs when the particle size is very much smaller than the wavelength of the light, irrespective of the relative refractive index m. In other words, it applies when 2kR ≪ 1, or equivalently, when QR ≪ 1. In this limit, the
form factor approaches unity, and there is no θ dependence to the scattering. The RGD expression for the scattered field reduces to

Es = (2/3)(m − 1)k²R³ [exp{i(kr − ωt)}/r] E0.   (189)

However, this expression is not quite right because the RGD approximation puts a restriction on the refractive index that is not applicable here. The correct expression is actually

Es = [(m² − 1)/(m² + 2)] k²R³ [exp{i(kr − ωt)}/r] E0.   (190)

We can introduce the particle polarizability, α, which is analogous to the quantity molecular polarizability previously discussed in connection with Eq. (62). The polarizability of a particle is a measure of the ease with which the oscillating light field induces an electric dipole moment in the particle. It corresponds to

α = R³ (m² − 1)/(m² + 2).   (191)

The particle essentially acts like a small radiating electric dipole; it produces light that has the same polarization as that of the incident wave (true only for light emitted into the previously defined scattering plane). Now, we can calculate the differential cross section for Rayleigh scattering:

dσ/dΩ = r² |Es/E0|² = k⁴α².   (192)

The total scattering cross section is

σ = 4πk⁴α².   (193)

A parameter that is sometimes quoted is the particle scattering efficiency (9,10) η, which is the total scattering cross section divided by the geometric cross section of the particle, η = σ/πR². For a Rayleigh scatterer, the scattering efficiency can be written as

η = 2k²R² [(m² − 1)/(m² + 2)]².   (194)

Figure 50. Rayleigh–Gans–Debye scattering from a dielectric sphere of radius R: relative scattered intensity as a function of QR. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)

Because 2kR ≪ 1, the scattering efficiency is much less than unity. The cross section, and hence likelihood, of Rayleigh scattering is characterized by a k⁴ or 1/λ⁴ dependence. This means that scattering of blue light by small particles is much more pronounced than the scattering of red light from the same particles. Probably the most ubiquitous illustration of this fact is the blue color of the sky. When sunlight enters the atmosphere, it is scattered by small particles in the atmosphere. All wavelengths are Rayleigh scattered to some degree, but the blue component of sunlight is scattered the most, and that is the light our eye picks up.
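The strong 1/λ⁴ wavelength dependence noted above can be made concrete with Eqs. (191) and (193). In the sketch below, the particle size and refractive indices are assumed for illustration only; the blue-to-red ratio is set by the wavelengths alone.

```python
import math

def rayleigh_cross_section(wavelength0, radius, n_particle, n_medium):
    """Total Rayleigh cross section from Eqs. (191) and (193): sigma = 4 pi k^4 alpha^2."""
    k = 2 * math.pi * n_medium / wavelength0       # wave number in the medium
    m = n_particle / n_medium
    alpha = radius**3 * (m**2 - 1) / (m**2 + 2)    # particle polarizability, Eq. (191)
    return 4 * math.pi * k**4 * alpha**2

# Assumed values for illustration: a 10-nm particle of index 1.5 in air.
R, n_p, n_0 = 10e-9, 1.5, 1.0
sigma_blue = rayleigh_cross_section(450e-9, R, n_p, n_0)
sigma_red = rayleigh_cross_section(650e-9, R, n_p, n_0)
print(f"sigma(450 nm) = {sigma_blue:.2e} m^2")
print(f"sigma(650 nm) = {sigma_red:.2e} m^2")
print(f"blue/red ratio = {sigma_blue / sigma_red:.2f}  (~ (650/450)^4 = {(650/450)**4:.2f})")
```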
Another important property of light scattered by the atmosphere involves its polarization characteristics. Before it enters the atmosphere, light from the sun is completely unpolarized. Now consider some observer who views the scattering of this light along a line of sight perpendicular to a ray of sunlight. Recall that unpolarized light is an incoherent superposition of two orthogonal, linearly polarized waves. Choose one of these polarization components so that it is perpendicular to the line of sight and the other so that it is along the line of sight. Because the line of sight lies in the scattering plane for the perpendicular component, the Rayleigh-scattered wave associated with this component will also be linearly polarized perpendicular to the line of sight. The polarization component that lies along the line of sight is also Rayleigh scattered; however, this component is normal to the scattering plane. Consequently, none of the scattering from this component occurs along the line of sight. The result is that, in this geometry, Rayleigh scattering makes the skylight linearly polarized.
Mie Scattering. For particles that are too large and/or whose refractive index is sufficiently different from that of the background medium to use the RGD approximation, the integral approach to scattering calculations becomes unmanageable. Instead, a differential treatment of the problem can be used. This method involves constructing the solution to Maxwell's equations both inside and outside the particle, subject to the boundary conditions for the field components at the surface of the particle. The approach is rigorous, and has the advantage that it leads to an exact solution of the scattering problem for particles of any size and refractive index. The usefulness of the technique is limited, however, by its inherent mathematical complexity, hence the problem has been completely solved only for scattering of light from homogeneous spheres, long cylinders or rods, and prolate/oblate spheroids. The application of the differential formalism to the general problem of scattering by a homogeneous sphere (radius R) is referred to as Mie scattering (30). The solution for the scattered field a distance r far outside the particle is

Es = E0 [exp{i(kr − ωt)}/(−ikr)] Σ from ℓ=1 to ∞ of [(2ℓ + 1)/ℓ(ℓ + 1)] {aℓ [Pℓ(1)(cos θ)/sin θ] + bℓ [dPℓ(1)(cos θ)/dθ]}.   (195)
As before, the expression given is valid only for points in the scattering plane. The Pℓ(1)'s stand for associated Legendre functions — these are tabulated in standard mathematical handbooks. The aℓ's and bℓ's are coefficients given by

aℓ = [ξℓ′(x)ξℓ(y) − m ξℓ(x)ξℓ′(y)] / [ζℓ′(x)ξℓ(y) − m ζℓ(x)ξℓ′(y)]

and

bℓ = [ξℓ(x)ξℓ′(y) − m ξℓ′(x)ξℓ(y)] / [m ζℓ′(x)ξℓ(y) − ζℓ(x)ξℓ′(y)].   (196)
As before, m is the relative refractive index of the particle. ξℓ(z) = z jℓ(z), and ζℓ(z) = z hℓ(2)(z), where the jℓ's and hℓ(2)'s are, respectively, spherical Bessel and spherical Hankel functions of order ℓ. A primed (′) function indicates a derivative. The arguments are x = kR and y = mkR. It is interesting to note that, irrespective of the particle's size and refractive index, there is never a depolarized scattering component when detection is performed in the scattering plane. In general, this is not true for nonspherical scatterers. However, recall that when a particle of any shape whatsoever satisfies the RGD conditions, the depolarized component does indeed vanish. Figure 51 shows how the intensity of the scattered light, which is proportional to |Es|², varies with scattering angle for m = 1.33, 1.44, 1.55, and 2.0, evaluated at two different values of kR.

Absorption and Emission of Photons

Until now, the interaction of electromagnetic radiation with matter has been discussed from a classical perspective, where the radiation was treated as a wave. Now, we turn to a quantum picture of radiation–matter interactions, where the EM radiation is treated as a collection of photons (refer to the previous section on Quantum nature of radiation and matter). Specifically, we consider processes that involve the emission or absorption of a single photon. Processes of this type are called first-order radiative processes.

Excitation and Deexcitation of Atoms by Absorption and Emission

An atom in its ground state or other low-lying energy level Ea can be promoted to a state of higher energy Eb by the absorption of a photon of energy hν = Eb − Ea. As one might expect, the likelihood that an atom absorbs one of these photons is proportional to the intensity, or number of such photons, in the incident radiation field. Once in an excited state, an atom may be able to make a downward transition (subject to certain atomic selection rules) to a state of lower energy by emitting a photon of energy hν, again matching the energy difference Eb − Ea between the two states. When such an event occurs, it falls under one of two headings — either spontaneous emission or stimulated emission.
Spontaneous Emission. This process refers to the emission of a photon by an excited atom when no external EM radiation field is present. That is to say, when an excited atom is left on its own, in the absence of any perturbing outside influence, there is some chance that it will spontaneously emit a photon on its own accord. The photon's polarization and direction of emission are completely random. One defines Einstein's coefficient of spontaneous emission, normally denoted by Ae, which is the probability per unit time that an individual atom will emit a photon via spontaneous emission. The coefficient depends on ν, the frequency of the emitted photon, according to

Ae = 8π²pba²ν³/(3ε0ℏc³).   (197)
pba is a quantum-mechanical quantity known as the dipole-moment matrix element associated with the transition between the upper and lower energy states of an atom. Its value is determined by an integral that involves the wave functions of the two atomic states (15). The power P emitted by the atom during spontaneous emission is obtained by simply multiplying Eq. (197) by the photon energy hν:

P = 16π³pba²ν⁴/(3ε0c³) = pba²ω⁴/(3πε0c³).   (198)

Comparing this result to Eq. (38), we see that our quantum-mechanical expression is nearly identical to that for the power emitted by a classical radiating dipole, i.e., both expressions are proportional to the fourth power of the radiation frequency and the square of the dipole moment. At a certain level, it is natural then to think that the emission of EM radiation from an atom is due to the oscillating dipole moment produced by the circulating atomic electrons. Keep in mind, however, that this is a semiclassical point of view that has limited utility. Equation (197) allows one to estimate the fluorescent or radiative lifetime, τ = 1/Ae, of an excited state before it decays spontaneously. For an order-of-magnitude calculation, we can replace the dipole moment pba by ea, where a is the approximate linear dimension of the atom. Then,

τ⁻¹ = Ae ∼ α (ωa/c)² ω,   (199)

where α = e²/4πε0ℏc ≈ 1/137 is the so-called fine structure constant. Furthermore, ℏω = Eb − Ea ∼ e²/4πε0a, or ωa/c ∼ α, so

τ⁻¹ ∼ α³ω.   (200)

This result shows that for excited atomic states that lead to transitions in the visible region of the spectrum (ω ∼ 10¹⁵ s⁻¹), the lifetimes are on the order of nanoseconds, whereas X-ray transitions (ω ∼ 10¹⁸ s⁻¹) have lifetimes on the order of picoseconds.

Figure 51. Mie scattering from a dielectric sphere (radius R, refractive index n) for kR = 2.0 and 6.0 (k is the wave number in the surrounding medium whose refractive index is n0) and for different values of m = n/n0. (Adapted from Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Stimulated Emission. In the process of stimulated emission, an existing EM field induces the emission of a photon from an excited atom. The emitted photon will be of the same type or mode; it will have the same frequency, polarization, and direction of travel as one of the photons in the external field. It is important to know how the probability of stimulated emission compares to the probabilities for spontaneous emission and absorption. If there are n photons that populate a particular mode in the external field, the occurrence of stimulated emission for that photon mode is n times as likely as that for spontaneous emission into that mode. Furthermore, Einstein first discovered the following amazing fact. The probability that an excited atom undergoes stimulated emission to some lower energy state is precisely the same as the probability that an atom in the lower state will be excited to the upper one by photon absorption. When a large collection of atoms is immersed in a radiation field, the process of stimulated emission competes with that of photon absorption. Stimulated emission tends to increase the number of photons in a particular mode and makes it more and more probable that further emissions will occur. On the other hand, photon absorption removes photons from the field and counters the effects of stimulated emission. Because the probabilities of stimulated emission and absorption are identical, the process that dominates is determined by the fraction of atoms in low-energy and high-energy states. For a system in thermal equilibrium, most of the atoms will be in the ground state, and absorption dominates. However,
in a laser (see previous section on Lasers), a mechanism such as optical pumping is introduced, which effectively inverts the statistical distribution of the number of atoms in excited versus low-energy states. This makes the rate of stimulated emission overwhelm the rate of absorption. The cumulative result is a coherent amplification of a highly collimated, monochromatic, polarized beam. This process is fundamentally responsible for laser operation. In lasers, spontaneous emission also occurs, but it appears as background noise in the field.

Molecules and Luminescence. The term luminescence refers to the spontaneous emission of photons from excited electronic states of a molecule. More specifically, in photoluminescence a molecule absorbs a photon of energy hν, generally decays to a lower energy excited electronic state via some nonradiative relaxation process (see below), and then emits a photon of energy hν′, which is less energetic than the absorbed photon. Chemiluminescence is the emission of radiation from a molecule that has been left in an excited electronic state as a result of some chemical reaction (31). Luminescence is generally divided into two categories, depending on the radiative lifetime of the excited state, namely, fluorescence and phosphorescence; lifetimes of phosphorescence are much longer. As a general rule of thumb, the lifetimes of fluorescence are usually on the order of picoseconds to hundreds of nanoseconds, whereas lifetimes for phosphorescence are on the order of microseconds or longer, sometimes even as long as seconds. Luminescence does not account for all decays from excited molecular states. The reason is that a substantial number of nonradiative processes compete with the spontaneous emission of photons. For example, when a molecule is excited to a high vibrational level (see section on Vibrational and rotational states in molecules), it will usually relax quickly to the lowest vibrational level of the excited electronic state without the emission of a photon. This process is referred to as vibrational relaxation. Other radiationless relaxation processes include internal conversion (nonradiative transition between excited states of identical spin) and intersystem crossing (transition between excited states of different spin). Energy released from nonradiative deexcitation processes appears as thermal energy in the material. One defines a quantum yield for luminescence as the fraction of transitions that are radiative (i.e., photon-producing). In practice, the quantum yield is less than unity for virtually all luminescent materials.

Photoelectric Effect and Its Cross Section

Until now, the discussion has been concerned with the absorption and emission of photons that accompany transitions between discrete, or bound, quantum states of atoms and molecules. In the photoelectric effect, a photon is absorbed by an atom, and one of the electrons is kicked out of the atom. In other words, as a result of the process, the atom is left in an unbound state, and a so-called photoelectron is promoted to the energy continuum. There is a strong tendency for the electron to be emitted at an angle close to 90° relative to the propagation direction
of the incoming photon, especially at low energies. At higher energies, the angular distribution of photoelectrons is shifted somewhat toward the forward direction (i.e., toward angles less than 90°) (12). The kinetic energy Te of the ejected photoelectron is simply

Te = hν − Eb,   (201)

where hν is the energy of the absorbed photon and Eb represents the binding energy of the electron in the atom. Clearly, the photoelectric effect is possible only if the energy of the incoming photon is at least as large as the binding energy of the most weakly bound electron in an outer atomic shell. Emission of an electron from a more tightly bound shell requires the absorption of a higher energy photon, so that hν exceeds the shell's absorption edge. Specifically, the K, L, M, . . . edges refer to the binding energies of atomic electrons in the first, second, third, . . . shells. It should also be emphasized that whenever a photoelectric event occurs, the emitted electron leaves behind a vacancy in one of the atomic shells. Consequently, the photoelectric effect is always followed by downward transitions made by outer shell atomic electrons, which in turn are accompanied by the emission of characteristic X rays from the ionized atom. It is observed, however, that the measured intensities of the resulting X-ray emission lines are often very different from what one would expect, especially when there is an inner shell ionization of an element that has a low Z (atomic number) because, in addition to deexcitation by X-ray emission, there may also be a significant probability for a nonradiative transition within the atom — this is known as the Auger effect. In this process, rather than emitting a photon of some characteristic energy, the ionized atom emits a secondary electron whose kinetic energy is equal to the aforementioned characteristic energy less the electron's binding energy in the already singly ionized atom. Such an electron is readily distinguishable from the primary photoelectron in that the energy of the latter depends on the energy of the incident photon [see Eq. (201)], whereas the energy of an Auger electron does not. For atoms ionized in a given shell, one is usually interested in knowing the fluorescent yield Y, which is the fraction of transitions to the vacant level that produce an X ray (rather than an Auger electron). The fluorescent yield of a given shell increases with atomic number Z according to the approximate form (32),

Y = 1/(1 + βZ⁻⁴).   (202)
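A quick evaluation of Eq. (202), using the K-shell value of β quoted in the next sentence (about 1.12 × 10⁶), shows how rapidly the K fluorescent yield rises with atomic number. The particular elements chosen below are illustrative.

```python
def fluorescent_yield(Z: int, beta: float = 1.12e6) -> float:
    """Approximate K-shell fluorescent yield, Eq. (202): Y = 1 / (1 + beta * Z**-4)."""
    return 1.0 / (1.0 + beta * Z**-4)

for Z in (6, 14, 29, 42, 74):   # C, Si, Cu, Mo, W
    print(f"Z = {Z:2d}: Y_K ~ {fluorescent_yield(Z):.3f}")
```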
For the K-shell fluorescent yield, β has a value of about 1.12 × 10⁶, and for the L-shell, β is about 6.4 × 10⁷. The literature contains very little information on the fluorescent yields for the M shell and above. An important quantity to consider is the quantum-mechanical cross section for the photoelectric effect. This is defined as the rate of occurrence of photoelectric events (number of events per unit time) divided by the incident photon flux (i.e., the number of photons in the incident
beam that crosses a unit area per unit time). In effect, the cross section is a measure of the quantum-mechanical probability that a photon will undergo photoelectric absorption. It is found that this cross section is a strong function of the incident photon energy hν and the atomic number Z of the target atom. When hν is near an absorption edge, the cross section varies greatly and generally changes abruptly at the binding energy of each atomic shell. This effect is most pronounced for high-Z target atoms. For K-shell absorption assuming that hν is far above the absorption edge (i.e., hν ≫ Eb), a quantum-mechanical calculation shows that the photoelectric cross section is proportional to (hν)^(−7/2) and Z⁵. In other words, the probability of photoelectric interaction diminishes rapidly as incident photon energy increases and increases strongly as the atomic number of the target increases. These results are valid only when the photoelectron produced is nonrelativistic, that is, when the kinetic energy of the electron is much less than its rest energy (∼511 keV). In the highly relativistic case, the energy dependence of the cross section varies as (hν)^(−1), and there are some complex, but rather small, modifications to the cross section's Z dependence (12).

Pair Production

An important mechanism for absorbing high-energy photons, or γ rays, by a material is called pair production. In this process, a photon enters a material, spontaneously disappears, and is replaced by a pair of particles, namely, an electron and a positron (i.e., a positively charged electron). Electric charge is conserved in this process because the photon carries no charge, and the net charge of an electron–positron pair is zero as well. To satisfy energy conservation, pair production requires a minimum threshold energy for the incoming photon. In relativistic terms, the photon must be at least energetic enough to be replaced by the rest energy of an electron–positron pair. So it appears that pair production requires a minimum photon energy of twice the rest energy of an electron, or 2mec². However, this statement is not exactly correct because if a photon is replaced by an electron and a positron at rest in vacuum, the relativistic momentum of the system is not conserved. This is easy to see because a photon always carries a momentum hν/c; however, a resting electron–positron pair clearly has no momentum. What makes pair production possible is the presence of another particle, specifically some nearby massive atomic nucleus (mass M). By requiring that the process take place within the field of such a nucleus, it becomes possible to conserve both energy and momentum, and pair production can occur. The correct expression for the threshold energy of the incident photon becomes

(hν)min = 2mec²(1 + me/M).   (203)
Because the electron mass is thousands of times smaller than the nuclear mass, the term me /M is virtually negligible. Hence, as before, one can say that for all practical purposes, the threshold energy is approximately twice the electron rest energy, or 1.02 MeV.
Above the threshold value, the energy dependence of the pair-production cross section (rate of pair production divided by incident photon flux) is quite complex, and values are usually obtained from tables or graphs. However, the general trend is that just above the threshold, the probability of interaction is small, but as energy increases, so does the cross section. For high-energy photons above 5–10 MeV, the process of pair production predominates over other interactive processes. The dependence of the cross section on atomic number is nearly proportional to Z². Hence, if for some photon energy hν the pair-production cross section for a particular element of atomic number Z is known to be σpp(hν, Z), then the cross section for an element of atomic number Z′ is approximately

σpp(hν, Z′) ≈ (Z′/Z)² σpp(hν, Z).    (204)
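Equations (203) and (204) lend themselves to a quick numerical check. The following minimal Python sketch is an added illustration (the sample nuclei and the unit cross-section value are assumptions, not values from the article):

```python
ME_C2_MEV = 0.511  # electron rest energy in MeV

def pair_threshold_mev(nuclear_mass_mev):
    """Threshold photon energy for pair production near a nucleus, Eq. (203)."""
    return 2.0 * ME_C2_MEV * (1.0 + ME_C2_MEV / nuclear_mass_mev)

def scale_cross_section(sigma_z, z_from, z_to):
    """Approximate Z-scaling of the pair-production cross section, Eq. (204)."""
    return sigma_z * (z_to / z_from) ** 2

# Lead nucleus: A ~ 207, so M*c^2 ~ 207 * 931.5 MeV (illustrative).
m_pb_mev = 207 * 931.5
print(f"threshold near a Pb nucleus: {pair_threshold_mev(m_pb_mev):.6f} MeV")  # ~1.022 MeV

# If sigma_pp for Si (Z = 14) were 1.0 (arbitrary units) at some energy,
# the estimate for Pb (Z = 82) follows from the (Z'/Z)^2 rule:
print(f"sigma_pp(Pb)/sigma_pp(Si) ~ {scale_cross_section(1.0, 14, 82):.1f}")
```

The correction term me/M is of the order of 10⁻⁶ for a heavy nucleus, which is why the threshold is quoted simply as 1.02 MeV.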
Shortly following pair production, the positron that is created always combines with an electron in the material. The two particles mutually annihilate one another and emit two back-to-back 511-keV photons. This radiation may escape from the material, or it may undergo either photoelectric absorption or Compton scattering (see next section).
Scattering of Photons
Recall that in the context of quantum mechanics, the absorption and emission of electromagnetic radiation are classified as first-order radiative processes because these processes involve either annihilating (absorbing) or creating (emitting) a single photon. On the other hand, scattering that involves electromagnetic radiation is considered a second-order radiative process because, from a quantum-mechanical perspective, scattering processes involve the participation of two photons, the incident photon and the scattered photon. According to the theory of quantum electrodynamics (QED), scattering is not simply a redirection of one and the same photon. Instead, it involves annihilation of the incident photon, accompanied by the creation of the scattered photon. Analogous to classical scattering (see section on Thomson scattering of X Rays and the concept of a cross section), one can define an angular differential cross section dσ/dΩ for the scattering of photons. It is defined as the rate of scattering per unit solid angle dΩ divided by the incident photon flux. As in the classical case, the total scattering cross section σ is obtained by integrating the differential cross section over all solid angles.
Compton Scattering by a Free Electron. Begin by considering the scattering of X rays and γ rays by an atomic electron. For such high-energy radiation, the incident photon energy hν far exceeds the binding energy of an atomic electron; hence, one is justified in treating the electron as a particle that is virtually free. The quantum theory of scattering from a free electron is essentially an extension of classical Thomson scattering; however, the latter was applicable only to photon energies far less than the electron's rest energy, where relativistic considerations can be neglected. Now, we allow for
Figure 52. Compton scattering of a high-energy photon (energy hν, momentum hν/c) by a free electron. The energy and momentum of the scattered photon are hν′ and hν′/c, respectively, and the electron recoils with momentum pe and total relativistic energy Ee.
high-energy X rays and γ rays and envision that the scattering event is a relativistic particle-like collision between an incoming photon and an initially resting free electron, as illustrated in Fig. 52. This is referred to as Compton scattering. Because the incident photon carries a momentum hν/c, the electron must experience a recoil to conserve momentum in the system. As a result, the scattered photon will have an energy hν′ and momentum hν′/c less than that of the incident photon. Thus, unlike the classical situation, the scattering process shifts the frequency of the radiation downward, that is, ν′ < ν. The precise relationship between the incident and scattered frequencies comes from requiring conservation of both the relativistic energy and momentum of the system. The energy-conservation equation is

hν + me c² = hν′ + Ee,    (205)
where Ee is the total relativistic energy (sum of the rest energy me c² and kinetic energy Te) of the scattered (i.e., recoiling) electron. It is related to the relativistic momentum pe of the scattered electron by

Ee = Te + me c² = √[(me c²)² + (pe c)²].    (206)
The conservation relationships for the momentum components are

hν/c = (hν′/c) cos θ + pe cos φ    (207)

and

0 = (hν′/c) sin θ − pe sin φ,    (208)
where θ and φ are the scattering angles of the photon and electron, respectively, relative to the propagation direction of the incident photon. Manipulation of Eqs. (205)–(208) produces the following result for the energy of the scattered photon:

hν′ = hν / [1 + α(1 − cos θ)].    (209)

Here, α = hν/me c² is the incident photon energy in units of 511 keV. Using the fact that a photon's wavelength is the speed of light c divided by its frequency, it becomes easy to show that the difference between the wavelength λ′ of the scattered and the wavelength λ of the incident photons (also known as the Compton shift) is

λ′ − λ = (h/me c)(1 − cos θ).    (210)
The kinetic energy of the recoil electron is obtained simply from the fact that Te = hν − hν′. Using Eq. (209), the expression reduces to

Te = hν α(1 − cos θ) / [1 + α(1 − cos θ)].    (211)
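The Compton relations in Eqs. (209)–(211) are straightforward to evaluate numerically. The short Python sketch below is an added illustration, not part of the original text; the 100-keV example energy and 90° angle are arbitrary choices:

```python
import math

ME_C2_KEV = 511.0        # electron rest energy (keV)
H_OVER_MEC = 2.426e-12   # Compton wavelength h/(m_e c) in meters (~2.43 pm)

def compton(hv_kev, theta_rad):
    """Scattered photon energy, wavelength shift, and recoil kinetic energy,
    from Eqs. (209)-(211)."""
    alpha = hv_kev / ME_C2_KEV
    hv_scattered = hv_kev / (1.0 + alpha * (1.0 - math.cos(theta_rad)))  # Eq. (209)
    shift_m = H_OVER_MEC * (1.0 - math.cos(theta_rad))                   # Eq. (210)
    t_e = hv_kev - hv_scattered                                          # Eq. (211)
    return hv_scattered, shift_m, t_e

hv_out, d_lambda, t_e = compton(100.0, math.radians(90.0))
print(f"hv' = {hv_out:.1f} keV, shift = {d_lambda * 1e12:.2f} pm, Te = {t_e:.1f} keV")
```

For a 100-keV photon scattered through 90°, this gives a scattered energy of about 84 keV, a shift of one Compton wavelength (2.43 pm), and a 16-keV recoil electron.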
The relationships derived until now have been based solely on considerations of momentum and energy conservation. Deriving the probability, or cross section, of Compton scattering requires undertaking a complicated, fully relativistic, quantum-mechanical calculation. The closed-form expression for the differential scattering cross section dσ/dΩ for Compton scattering has been determined and is known as the Klein–Nishina formula. The expression is valid for a beam of unpolarized (i.e., randomly polarized) X rays or γ rays and is given by

(dσ/dΩ)unpol = ½ r0² (ν′/ν)² (1 + cos²θ) × {1 + α²(1 − cos θ)² / [(1 + cos²θ)(1 + α(1 − cos θ))]},    (212)
where r0 is the previously defined classical electron radius (2.82 × 10−15 m). As shown in Fig. 53, there is a pronounced increase in the fraction of photons scattered in the forward direction as incident photon energy increases. The total cross section for Compton scattering by an electron, σ , is obtained by integrating the Klein–Nishina formula over all solid angles. The resulting energy dependence of σ is plotted in Fig. 54. The important facts are as follows: • For hν much less than me c2 (i.e., α 1), the scattering cross section from a free electron approaches 665 millibarns, which matches the value of the cross section for classical Thomson scattering. This is precisely what is expected. • The cross section is a monotonically decreasing function of energy. • Each electron in an atom presents the same Compton scattering cross section to a beam of incoming photons. Therefore the cross section for a particular atom is simply the number of electrons, that is, the atomic number Z, multiplied by the electronic cross section. Rayleigh Scattering by Atoms. Previously (see section on Classical Rayleigh scattering), Rayleigh scattering was treated as scattering by particles very much smaller than the wavelength of light. In Rayleigh scattering at the quantum level, a photon interacts with the constituent bound electrons of an atom and undergoes elastic scattering; the energy of the scattered photon is
identical to that of the incident photon. As a result, this type of scattering leaves the atom in its original bound state. Unlike Compton scattering, which only involves a single, weakly bound electron, Rayleigh scattering requires the participation of the atom as a whole. In general, it is the primary mechanism for the scattering of EM radiation by atoms in the optical regime, where photon energies are less than or on the same order of magnitude as the spacing of atomic energy levels. The dominance of Rayleigh scattering also extends well into the soft X ray regime, up to photon energies of about 10 keV. In the range of about 10–100 keV, the cross section for the Rayleigh scattering of X rays competes with that for Compton scattering. For hard X rays (∼100 keV and beyond), the probability of Rayleigh scattering becomes virtually negligible. An important property of Rayleigh scattering is that the scattering is coherent. That is to say, the scattered radiation is coherent and allows for interference effects between Rayleigh-scattered waves from atoms at various locations in the target. This fact is what allows the possibility of X ray diffraction and crystallography, as well as many coherent light-scattering and optical effects.
Figure 53. Differential scattering cross section for the Compton scattering of randomly polarized photons as a function of scattering angle θ. The cross section is displayed for four different values of the parameter α = hν/me c², the ratio of the incident photon energy to the rest energy of an electron. (From Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Figure 54. Total cross section for Compton scattering as a function of α = hν/me c². (From Interaction of Photons and Neutrons with Matter by S. H. Chen and M. Kotlarchyk. Copyright 1997 World Scientific Publishing Company. Used with permission.)
Figure 55. Elastic Rayleigh scattering of a photon from an atom occurs via an intermediate atomic state.
Rayleigh scattering arises because of a coupling between the field of the photon and the electric dipole moment of the atom. This type of interaction is the same as that responsible for a transition accompanying a simple single-photon (i.e., first-order) atomic absorption or emission process. Such an electric-dipole interaction, however, can only mediate a transition between distinctly different atomic states. Said another way, the dipole-moment matrix element pba, in Eq. (197), always vanishes when the initial and final atomic states are identical. Because the state of the atom is left unchanged as the result of elastic or Rayleigh scattering, one concludes that some bridging intermediate atomic state is required for the process to take place. The basic idea is illustrated in Fig. 55. Here, one clearly sees that Rayleigh scattering is a second-order, or two-photon, process: The target atom starts with initial energy Ea and absorbs the incident photon of energy hν. The atom forms an excited intermediate state of energy EI and reemits a photon, also of energy hν. Because this photon, in general, propagates in a direction different from that of the incident photon, it is perceived as scattered. The events described, along with Fig. 55, represent a conceptually simplified version of what is really a fuzzy quantum-mechanical process. First of all, the photon energy hν does not, in general, match the energy difference, EI − Ea, between the intermediate and initial atomic states. The transitions that take place are called virtual transitions, and the intermediate states are
often referred to as virtual states. A virtual transition does not need to conserve energy because a virtual state exists only for a fleeting instant. According to the time–energy uncertainty principle of quantum mechanics, violation of energy conservation is permitted for extremely short periods of time (15). In particular, the shorter the lifetime of an excited state, the greater the opportunity not to conserve energy (this principle is responsible for the natural width of spectral lines). Because the lifetime of a virtual state is infinitesimally small, there is no requirement that the excitation energy, EI − Ea, match the energy of the photon. Secondly, an atom presents many choices for the excited intermediate state. According to quantum electrodynamics (QED), the system will actually sample all possible intermediate states! And finally, the transition to and from an intermediate state does not necessarily take place in the order previously described. In addition to having the incident photon vanish before the appearance of the scattered photon, QED also allows the possibility that the scattered photon will appear before the incident photon disappears! As nonsensical as this may seem, such an event is possible in the quantum world. The two possible sequences of events are often illustrated by using constructs called Feynman diagrams. The characteristic behavior of the cross section for Rayleigh scattering depends strongly on the energy of the photon relative to the excitation energy, EI − Ea. The behavior can be divided into the following three regimes:
• hν ≪ EI − Ea. Here, one considers the very long-wavelength limit (relative to the size of an atom) that corresponds to the elastic scattering of optical photons. In this case, the quantum-mechanically derived cross section exhibits the same 1/λ⁴ dependence, as well as the same angular dependence, that characterized the classical formulation of Rayleigh scattering. What is gained from the quantum derivation is an expression for calculating the polarizability of the atom given the available intermediate states and dipole-moment matrix elements of the atom.
• hν ≈ EI − Ea. This corresponds to resonant scattering, or the phenomenon of resonant fluorescence. The resulting cross section is extremely large. Resonant fluorescence occurs, for example, when a well-defined beam of monochromatic yellow light from a sodium lamp enters a transparent cell that contains sodium vapor. When an incoming photon undergoes elastic scattering by one of the gas atoms, the process can be thought of as an absorption of the photon, accompanied by the immediate emission of a photon at precisely the same frequency, but in a different direction. Because the probability of such an event is so large at resonance and because the light frequency is unaffected by the process, the light scattered by any one atom will be rescattered by another. This sequence of events will occur over and over within the gas, causing strong diffusion of the light beam. The result is a pronounced yellow glow from the sodium cell.
• hν ≫ EI − Ea. This condition is met in the soft X ray region of the spectrum (hν ∼ a few keV), where the wavelengths are still large compared to the size of the atom. In this region, the atomic cross section for Rayleigh scattering is proportional to Z². In fact, the cross section is exactly Z² times the Thomson scattering cross section from a single free electron. The fact that the multiplier is Z², rather than simply Z, is a consequence of the fact that the various electrons within the atom all scatter coherently with the same phase.
Raman Scattering. The inelastic scattering of light from an atom is referred to as electronic Raman scattering. The cross section for this type of scattering is much smaller than that for elastic Rayleigh scattering. The basic process is illustrated in Fig. 56. The target atom that has initial energy Ea absorbs the incident photon of energy hν and makes a virtual transition to an intermediate state of energy EI. The scattered photon appears as a result of the downward transition from the intermediate state to an atomic state of energy Eb different from energy Ea. As in Rayleigh scattering, the principle of energy conservation can be briefly violated; hence, the energy of the incident photon need not match the transition energy, EI − Ea. In addition, all possible intermediate states come into play, and the order of photon emission and photon absorption can go either way. As shown in Fig. 56, the frequency of the scattered photon is either less than or greater than the frequency of the incident light, depending on which is larger, the energy of the initial state Ea, or that of the final state Eb. When Eb > Ea, the frequency ν′ of the scattered radiation is downshifted relative to the frequency of the incident light ν, that is, ν′ = ν − (Eb − Ea)/h, and is commonly referred to as the Stokes line. On the other hand, when Eb < Ea, then ν′ = ν + (Ea − Eb)/h, and the
Figure 56. Inelastic Raman scattering from an atom occurs via an intermediate state. (a) The scattered photon energy hν′ is less than the energy hν of the incident photon. (b) The scattered photon energy is greater than the energy of the incident photon.
frequency is upshifted and produces an anti-Stokes line. In a scattered light spectrum, the Stokes and anti-Stokes lines appear as weak sidebands on either side of the elastic peak due to Rayleigh scattering.
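The Stokes and anti-Stokes frequencies follow directly from the relations just given. The small Python sketch below is an added illustration only; the incident wavelength and transition energy are made-up example values, not numbers from the article:

```python
H_EV_S = 4.136e-15   # Planck constant in eV*s
C = 2.998e8          # speed of light, m/s

def raman_lines(nu_incident_hz, transition_energy_ev):
    """Stokes and anti-Stokes frequencies for a transition of energy Eb - Ea."""
    shift_hz = transition_energy_ev / H_EV_S
    stokes = nu_incident_hz - shift_hz        # Eb > Ea: downshifted line
    anti_stokes = nu_incident_hz + shift_hz   # Eb < Ea: upshifted line
    return stokes, anti_stokes

# Example: 532-nm light and an assumed 0.2-eV level spacing.
nu0 = C / 532e-9
stokes, anti = raman_lines(nu0, 0.2)
print(f"incident {C / nu0 * 1e9:.1f} nm, Stokes {C / stokes * 1e9:.1f} nm, "
      f"anti-Stokes {C / anti * 1e9:.1f} nm")
```

The two lines appear symmetrically in frequency about the incident line, which is exactly the sideband structure described above.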
Attenuation of Gamma-Rays in Matter
Recall that in the classical treatment of an electromagnetic wave that passes through a material (see section on Electromagnetic waves in matter), the intensity of a light beam decreases exponentially as it propagates [see Eq. (82)]. There is a continuous absorption of the energy carried by the wave along the direction of travel, and the intensity decays by a factor of 1/e (i.e., ∼0.37) after propagating a distance of one-half the skin depth δ of the medium. The passage of high-energy X rays and γ rays through matter is also characterized by an exponential drop in intensity as a function of the distance z into a medium. The intensity decays as

I = I0 e^−µz,    (213)

where µ is called the linear attenuation coefficient of the medium. µ is analogous to the classical quantity 2/δ, but it has a different physical interpretation. For high-energy radiation, it is necessary to think of the intensity in terms of the number of photons carried by the beam. As these photons travel through a medium, the beam is weakened by the interaction of individual photons within the material. Three major interactive mechanisms are present: photoelectric interaction, Compton scattering, and pair production. Any one of these processes removes a photon from the incident beam. Because each process is characterized by its own cross section, there is a certain probability that each will occur, depending on the nature of the target medium and the energy of the incoming photon. Unlike the wave picture, where the beam intensity decays continuously, high-energy X rays and γ rays are removed from the beam via individual, isolated, quantum events that occur with certain likelihoods. The quantity 1/µ can be interpreted as the mean distance traveled by a photon before it undergoes an interaction. The linear attenuation coefficient of a given medium is the sum of the attenuation coefficients due to each type of interaction:

µ = µpe + µc + µpp,    (214)

where µpe, µc, and µpp are the linear attenuation coefficients for photoelectric interaction, Compton scattering, and pair production, respectively. Each attenuation coefficient is actually the atomic cross section for that particular interaction multiplied by the atomic number density (atoms/m³) for the attenuating medium. Because the latter quantity clearly depends on the physical state of the material (i.e., density ρ of the solid, liquid, or gas), it is often more convenient to tabulate the quantity µ/ρ for each process, as well as for all processes together. µ/ρ, known as the mass-attenuation coefficient of the medium, is usually quoted in units of cm²/g. The quantity ρz then acts as an effective thickness of the material, having units of g/cm², and Eq. (213) is rewritten as

I = I0 e^−(µ/ρ)(ρz).    (215)

Figure 57 shows plots of the mass-attenuation coefficient as a function of energy for the passage of photons through silicon and lead. The abrupt peaks that appear correspond to various K, L, M, . . . photoelectric absorption edges. Figure 58 is a map that indicates which of the three interactive processes dominates for different combinations of incoming photon energy and atomic number of the target.
Figure 57. Mass-attenuation coefficients for lead and silicon as a function of photon energy. [Data obtained from NIST Physics Laboratory Website (33).]
Figure 58. Relative importance of the three major types of γ-ray interactions for different combinations of hν, the incident photon energy, and Z, the atomic number of the target. The lines show the values of Z and hν for which the neighboring effects are just equal. (From The Atomic Nucleus by R. D. Evans. Copyright 1955 McGraw-Hill Book Company. Reproduced with permission.)
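A minimal numerical sketch of Eqs. (213)–(215) is given below. It is added for illustration; the mass-attenuation coefficient used for lead at about 1 MeV is an approximate value of the order reported in the NIST tables cited above, not a number taken from this article:

```python
import math

def transmitted_fraction(mu_over_rho_cm2_g, density_g_cm3, thickness_cm):
    """I/I0 for a photon beam after a thickness z of material, Eq. (215)."""
    return math.exp(-mu_over_rho_cm2_g * density_g_cm3 * thickness_cm)

# Rough example: lead near 1 MeV, mu/rho on the order of 0.07 cm^2/g (approximate).
mu_rho_pb = 0.07
rho_pb = 11.35
for z_cm in (0.5, 1.0, 5.0):
    frac = transmitted_fraction(mu_rho_pb, rho_pb, z_cm)
    print(f"{z_cm:4.1f} cm of Pb: I/I0 = {frac:.3f}")

# The mean free path 1/mu follows from the linear attenuation coefficient:
mu_pb = mu_rho_pb * rho_pb
print(f"mean free path ~ {1.0 / mu_pb:.2f} cm")
```

The mean free path of roughly a centimetre in lead at these energies illustrates why 1/µ, rather than a continuous skin depth, is the natural length scale in the photon picture.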
ABBREVIATIONS AND ACRONYMS
AM          amplitude modulated
CT          computerized tomography
dc          direct current
EM          electromagnetic
FTIR        frustrated total internal reflection
FWHM        full width at half-maximum
IR          infrared
LIDAR       light detection and ranging
PET         positron-emission tomography
QED         quantum electrodynamics
RF          radio frequency
RGD         Rayleigh–Gans–Debye
ROY-G-BIV   red-orange-yellow-green-blue-indigo-violet
SI          International System of Units
TE          transverse electric
TIR         total internal reflection
TM          transverse magnetic
UV          ultraviolet
BIBLIOGRAPHY 1. S. T. Thornton and A. Rex, Modern Physics for Scientists and Engineers, 2nd ed., Saunders College Publishing, Fort Worth, 2000. 2. E. Hecht, Optics, 3rd ed., Addison-Wesley, Reading, MA, 1998. 3. R. Loudon, The Quantum Theory of Light, 2nd ed., Oxford University Press, NY, 1983. 4. F. L. Pedrotti and L. S. Pedrotti, Introduction to Optics, 2nd ed., Prentice-Hall, Upper Saddle River, NJ, 1993. 5. J. D. Jackson, Classical Electrodynamics, 3rd ed., Wiley, NY, 1998. 6. J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic Theory, 4th ed., Addison-Wesley, Reading, MA, 1993. 7. M. Born and E. Wolf, Principles of Optics, 7th ed., Cambridge University Press, Cambridge, UK, 1999. 8. C. Scott, Introduction to Optics and Optical Imaging, IEEE Press, NY, 1998. 9. C. F. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles, Wiley, NY, 1983. 10. M. Kerker, The Scattering of Light and Other Electromagnetic Radiation, Academic Press, NY, 1969. 11. P. W. Barber and S. C. Hill, Light Scattering by Particles: Computational Methods, World Scientific, Singapore, 1990. 12. S. H. Chen and M. Kotlarchyk, Interaction of Photons and Neutrons with Matter, World Scientific, Singapore, 1997. 13. D. Marcuse, Engineering Quantum Electrodynamics, Harcourt, Brace & World, NY, 1970. 14. G. Herzberg, Molecular Spectra and Structure, Van Nostrand Reinhold, NY, 1950. 15. D. J. Griffiths, Introduction to Quantum Mechanics, PrenticeHall, Upper Saddle River, NJ, 1995. 16. M. Born and J. Oppenheimer, Ann. Phys. 84, 457–484 (1927). 17. A. Goswami, Quantum Mechanics, 2nd ed., Wm. C. Brown, Dubuque, IA, 1997, pp. 425–430. 18. K. F. Sander, Microwave Components and Systems, AddisonWesley, Wokingham, UK, 1987, pp. 73–81. 19. R. P. Godwin, in Springer Tracts in Modern Physics, vol. 51, G. Hoehler, ed., Springer-Verlag, Heidelberg, 1969, pp. 2–73.
20. H. Winick, Synchrotron Radiation News 13, 38–39 (2000). 21. M. Alonso and E. J. Finn, Fundamental University Physics: Fields and Waves, vol. 2. Addison-Wesley, Reading, MA, 1967, pp. 740–741. 22. H. Wiedemann, Synchrotron Radiation Primer, Stanford Synchrotron Radiation Laboratory, Stanford, CA, 1998, http://www-ssrl.slac.stanford.edu/welcome.html. 23. P. W. Milonni and J. H. Eberly, Lasers, Wiley, NY, 1988. 24. M. Sargent III, M. O. Scully, and W. E. Lamb, Jr., Laser Physics. Addison-Wesley, Reading, MA, 1977. 25. S. F. Jacobs, M. O. Scully, M. Sargent III, and H. Pilloff, Physics of Quantum Electronics, vol. 5–7. Addison-Wesley, Reading, MA, 1978–1982. 26. C. Brau, Free-Electron Lasers, Academic Press, San Diego, CA, 1990. 27. M. V. Klein and T. E. Furtak, Optics, 2nd ed., Wiley, NY, 1986, pp. 98–121. 28. D. R. Lide, ed., CRC Handbook of Chemistry and Physics, 81st ed., CRC Press, Boca Raton, FL, 2000, pp. 12–136 and 12–150. 29. B. E. Warren, X-Ray Diffraction, Dover, NY, 1990. 30. G. Mie, Ann. Physik 25, 377–445 (1908). 31. M. A. Omary and H. H. Patterson, in J. C. Lindon, G. E. Tranter, and J. L. Holmes, eds., Encyclopedia of Spectroscopy and Spectrometry, Academic Press, London, 1999, pp. 1186–1207. 32. E. H. S. Burhop, The Auger Effect and Other Radiationless Transitions, Cambridge University Press, London, 1952, pp. 44–57. 33.
Physical Reference Data, National Institute of Standards and Technology — Physics Laboratory, Gaithersburg, MD, 2000, http://physics.nist.gov/PhysRefData/XrayMassCoef/ tab3.html.
ELECTRON MICROSCOPES
DIRK VAN DYCK
S. AMELINCKX
University of Antwerp
Antwerp, Belgium
In 1873, Ernst Abbe proved that the resolving power of a light microscope will always be limited by the wavelength of the light, which is of the order of 1 µm, so that there could be no hope of visualizing much smaller objects such as atomic-scale structures. (In the 1980s, near-field optical scanning techniques were developed that can bring the resolution down by two orders of magnitude.) Fifty years later, a new impulse was given to the problem by the hypothesis of Louis de Broglie about the wave nature of particles, so that other particles could also serve as ‘‘light.’’ In 1931, Ernst Ruska developed the first transmission electron microscope (TEM), which uses electrons instead of photons. In 1986, Ernst Ruska was awarded the Nobel Prize for his pioneering work. Electrons are the best candidates since they can easily be generated by a heated filament or extracted from a point by an electric field, and they are easily deflected by electric and magnetic fields.
When accelerated to, say, 100 keV, their wavelength is much smaller (3 pm = 3 × 10⁻¹² m) than that of visible light. They can also be detected on a photoplate, a fluorescent screen, or an electronic camera. On the other hand, they can only propagate in vacuum and they can only penetrate through very thin objects (<10³ nm), so that vacuum and specimen preparation techniques are crucial. In past decades, electron microscopy has matured into an indispensable tool for materials and biomedical research. Numerous Nobel Prizes have been awarded for research work in which electron microscopy revealed crucial evidence. Despite the very small wavelength of the electrons, it has only recently been possible to visualize individual atoms. The reason for this is that magnetic lenses inevitably suffer from aberrations that, contrary to glass lenses in a light microscope, cannot easily be corrected. Recently the problem of compensating the spherical aberration has been solved at the research level. When individual atoms, the building blocks of nature, can be revealed, electron microscopy enters into a new era. Indeed, then it becomes possible to determine the atomic structure of matter quantitatively and accurately, even for objects for which only limited prior knowledge is available. Although the goal is not yet achieved, the progress is very encouraging. In the future, as materials science evolves into materials design, and mesostructures into nanostructures, the role of electron microscopy will become even more important. In recent years we have also seen the development of related techniques such as scanning transmission electron microscopy (STEM), electron holography, ptychography, and add-on techniques that use complementary information such as X-ray analysis (EDX) and energy filtering. There is now a growing tendency to incorporate all these techniques into one versatile instrument under full computer control, where they are all considered as different ways to obtain complementary information, mainly of chemical nature. The development of scanning electron microscopy (SEM) has taken place in parallel with that of the TEM. The first SEM was constructed by Manfred von Ardenne in the 1930s. The SEM is conceptually simpler than the TEM. With the aid of magnetic lenses, the electron beam is focused into a small point that is scanned over the surface of the object. The backscattered electrons, or other secondary signals, can then be detected and synchronously displayed on a cathode-ray-tube (CRT) screen. The SEM is relatively easy to use, does not require extensive specimen preparation, and is also less expensive than the TEM. On the other hand, it can only be used to study the surface and, because of the beam spread in the object, its resolution is rather limited and quantitative interpretation of the images is more difficult.
TRANSMISSION ELECTRON MICROSCOPY
The Instrument
A transmission electron microscope is very similar to a light microscope (Fig. 1) in which the light bulb is replaced
Figure 1. Comparison between an electron microscope (right) and a light microscope (left). All the components of the light microscope have their analog in the electron microscope. The acceleration electrons have a wave character and act as light.
by an electron source and the glass lenses by magnetic (or electric) lenses. The focusing of the lenses can be varied by changing the lens currents so that the setting of the microscope (focusing, magnification, etc.) can easily be altered. The condenser lens shapes the electron flow into a parallel beam. The specimen is a very thin object mounted in a holder that, through an airlock system, can be brought into the electron beam. Holders exist with a variety of degrees of freedom (translation, rotation, tilt, temperature, stress). The objective lens produces an enlarged and rotated image, which in its turn, through a system of intermediate and projector lenses, is further magnified and projected onto a fluorescent screen, a photoplate, or an electronic camera. Another mode of operation is not to image the object but to image the focal plane of the objective lens. In this way one can observe the electron-diffraction pattern directly on the fluorescent screen. The resolving power of the electron microscope is mainly determined by the quality of the objective lens. With modern instruments, a resolution of about 0.15 nm can be reached. Modern electron microscopes are operated under computer control, and the alignment can be done automatically. The usual electron source, which is a thermal filament, is increasingly replaced by a field-emission source, from which the electrons are extracted by an electric field and which yields a much higher brightness.
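The accelerating voltage fixes the electron wavelength that enters the resolution estimates used later in this article (a few picometres at 100–300 keV). The short relativistic de Broglie calculation below is an added sketch using standard physical constants, not part of the original text:

```python
import math

H = 6.626e-34         # Planck constant, J*s
M_E = 9.109e-31       # electron rest mass, kg
E_CHARGE = 1.602e-19  # elementary charge, C
C = 2.998e8           # speed of light, m/s

def electron_wavelength_pm(kv):
    """Relativistic de Broglie wavelength of an electron accelerated through kv kilovolts."""
    e_kin = kv * 1e3 * E_CHARGE                                   # kinetic energy, J
    p = math.sqrt(2.0 * M_E * e_kin * (1.0 + e_kin / (2.0 * M_E * C**2)))
    return H / p * 1e12                                           # picometres

for kv in (100, 300):
    print(f"{kv} kV: lambda = {electron_wavelength_pm(kv):.2f} pm")
```

This gives roughly 3.7 pm at 100 kV and 2 pm at 300 kV, the order of magnitude quoted in this section and used in the resolution formulas below.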
Image Formation
Intuitive Description The imaging process in the electron microscope can be sketched in a very simple way as follows (Fig. 2). The object is illuminated by a plane wave. Due to the interaction of the electrons with the atoms of the object, the electron wave is modulated by the structure of the object. By propagating this object exit wave through the electron microscope to the image plane this electron wave is blurred, that is, each point of the wave is spread into the so-called point-spread function or impulse response function. The wave as well as the point-spread function is a complex valued function with an amplitude and a phase component. Finally, however, in the image plane only the intensity is recorded so that the phase information is lost. Full recording of this phase information can be done using holographic techniques (discussed later).
Wave Optical Description As in the light microscope, the formation of the image must be described by wave optics (Fig. 3). The electron wave is diffracted by the object, and each diffracted beam is focused by the objective lens into one point in the focal plane. The electron wave in the focal plane can be considered as the Fourier transform of the exit wave of the object. The high spatial frequencies (small details) correspond to beams far from the optical axis and the small spatial frequencies (large details) are located close to the optical axis. In case the object is periodical, such as is the case of a perfect crystal, the Fourier transform is discrete, that is, the diffraction pattern in the back focal plane consists of discrete spots. These are the different diffracted beams. In case the object is aperiodic, the image in the focal plane shows diffuse intensity. In the second step of the imaging process, the points in the focal plane act again as sources of waves, which interfere in the image plane. The image wave found in the image plane is the reverse Fourier transform of the wave found in the focal plane; therefore, under ideal imaging conditions with no imaging aberrations or apertures, the image wave should be identical to the object exit wave. In short,
Figure 2. Schematic representation of the imaging process. The electron wave is modulated by the object and then blurred by the electron microscope.
Figure 3. Wave optical representation of the image formation by the objective lens in a transmission electron microscope. The corresponding mathematical operations are indicated [see text with R = (x, y) and G = (u, v)].
the whole imaging process is wave optically considered as a sequence of two Fourier transforms, the first one leading from the object plane to the back focal plane of the objective lens, and the second one from the focal plane to the image plane. In practice, however, the objective lens has intrinsic aberrations. The most important ones are the spherical aberration and the defocus. As a consequence, the Fourier components of the electron wave in Fourier space suffer from hampering phase shifts arising from these aberrations that are more important for the high spatial frequencies and that blur the image. Usually the Fourier components, falsified by more than a tolerable threshold, are masked out to minimize the blurring effect. On the other hand, one has to find a compromise because, for high-resolution imaging, as many Fourier components as possible should be admitted to the image. In a sense, one can then consider the image wave as a truncated Fourier reconstruction of the object exit wave with only a limited number of Fourier components, which may be falsified by the aberrative phase shift. The image formed by interference of the residual waves cannot be interpreted easily. The resolution, in a sense the smallest detail present in the image, is then determined by the highest spatial frequency that is allowed to contribute to the image formation. One distinguishes between the point resolution, which is the highest spatial frequency that contributes without extra phase shift and which gives rise to directly interpretable details, and the information limit, which is the highest spatial frequency in the image formation but with a phase shift that complicates a direct interpretation of the details.
For special investigations of crystalline objects, it is useful to include only one diffracted beam for imaging (Fig. 4). If the central beam is used, one obtains a so-called bright field image (BF); if only a diffracted beam is included, a so-called dark field image results. In these cases, since there is no interference between diffracted beams, the image contrast can more easily be interpreted in terms of the number of electrons scattered in the corresponding particular direction. However, high resolution cannot be obtained with these techniques. In any case, for the interpretation of the images, it is equally important to understand the interaction between the electrons and the object.
Figure 4. Different imaging modes. (a) Bright-field imaging: only the transmitted beam contributes to the image. (b) Dark field: only one diffracted beam contributes to the image. (c) High-resolution imaging: the image is formed by interference of many diffracted beams.
Mathematical Formulation
Transfer Function
By interaction with the object, the object exit wave ψ(R) is modulated both in amplitude and phase. Hence ψ(R) is complex; R is the position vector in the object exit plane. According to Fraunhofer's diffraction theory the diffracted wave in the direction given by the reciprocal vector g (or spatial frequency) is given by the Fourier transform of the object function, that is,

φ(g) = Fg ψ(R).    (1)

The wave in the back focal plane of the objective lens is then the Fourier transform of the object wave. If one directly images the focal plane, one can see the diffraction pattern, given by |φ(g)|². If the object is periodic, such as a crystal, the diffraction pattern will consist of sharp spots. A continuous object will give rise to a continuous diffraction pattern. The second stage of the imaging process is described by an inverse Fourier transform that reconstructs the object function ψ(R) (usually enlarged) in the image plane (Fig. 3). The intensity in the image plane is then given by |ψ(R)|². In practice, by inserting an aperture in the focal plane of the objective lens, it is possible to obtain an image in which only selected beams contribute. On passing through the objective lens, each electron beam g undergoes a phase shift and an amplitude reduction (damping). Hence the transfer function takes the form

T(g) = A(g) exp[−iχ(g)]D(g).    (2)

A(g) describes the effect of the beam-selecting aperture and the damping caused by incoherent effects (such as fluctuations in voltage and lens currents). χ(g) is the phase shift, given by

χ(g) = (π/2)(Cs λ³g⁴ + 2ελg²),    (3)

with Cs the spherical aberration constant of the objective lens, ε the focus distance, and λ the electron wavelength. The wave function at the image plane is now given by the inverse Fourier transform

ψim(R) = FR⁻¹ [T(g)φ(g)],    (4)

and the image intensity by

Iim(R) = |ψim(R)|².    (5)
Equation (4) is called the coherent approximation; it is valid for thin objects. In general, the expressions are more complicated. One then has to consider the image as an incoherent superposition of images that are slightly different, due to fluctuations in the conditions of the electron microscope. For a more general description we refer the reader to Ref. 1. The total image formation process is illustrated in Fig. 2.
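The chain of Eqs. (1)–(5) maps directly onto a discrete Fourier-transform calculation. The NumPy sketch below is an added illustration, not from the article; the microscope parameters and the Gaussian test object are arbitrary assumptions:

```python
import numpy as np

# Assumed microscope parameters (illustrative): 300 kV, Cs = 1 mm, Scherzer-like defocus.
lam = 2.0e-3                       # electron wavelength in nm (~2 pm)
cs = 1.0e6                         # spherical aberration constant in nm (1 mm)
eps = -1.2 * np.sqrt(cs * lam)     # defocus in nm
g_cut = 6.0                        # aperture cutoff in 1/nm (assumed)

n, pixel = 256, 0.02               # grid size and real-space sampling in nm
gx = np.fft.fftfreq(n, d=pixel)
g2 = gx[None, :]**2 + gx[:, None]**2

chi = 0.5 * np.pi * (cs * lam**3 * g2**2 + 2.0 * eps * lam * g2)   # Eq. (3)
aperture = (g2 <= g_cut**2).astype(float)                          # A(g)
transfer = aperture * np.exp(-1j * chi)                            # Eq. (2), with D(g) = 1

# Hypothetical weak-phase exit wave: a Gaussian phase bump standing in for an atom column.
x = (np.arange(n) - n / 2) * pixel
r2 = x[None, :]**2 + x[:, None]**2
exit_wave = np.exp(1j * 0.3 * np.exp(-r2 / (2 * 0.05**2)))

image_wave = np.fft.ifft2(transfer * np.fft.fft2(exit_wave))       # Eqs. (1) and (4)
intensity = np.abs(image_wave)**2                                  # Eq. (5)
print(intensity.min(), intensity.max())
```

The same array `transfer`, inverse-transformed on its own, gives the impulse response function t(R) discussed next.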
Impulse Response Function
As intuitively described earlier, the image transfer can also be described in real space as a blurring effect, as also follows from Eq. (4) using the convolution theorem,

φ(R) = ψ(R) ∗ t(R),    (6)
where ψ(R) is the object wave in real space and t(R) is the Fourier transform of the transfer function. For a hypothetical ideal pointlike object, the object wave would be a δ function or ‘‘impulse’’ [ψ(R) = δ(R)], so that φim(R) = t(R); that is, the microscope would reveal t(R), which is therefore called the impulse response function. If the transfer function were constant (i.e., perfectly flat) in the whole spatial frequency range, the impulse response
function would be a δ function so that ψim (R) = ψ(R), that is, the wave function in the image plane represents exactly the wave function of the object. In a sense the microscope is perfect. However, in practice the transfer function cannot be made constant as is shown in Fig. 5. The impulse response function is still peaked as shown in Fig. 6. Hence, as follows from Eq. (6), the object wave ψ(R) is smeared out (blurred) over the width of the peak. This width can then be considered as a measure for the resolution in the sense as originally defined by Rayleigh. The width of this peak is the inverse of the width of the constant plateau of the transfer function in Fig. 5. From another point of view one can argue that if all the spatial frequencies have the same phase shift, the information is transferred forward and keeps a point to point relation to the object. However, the information beyond this plateau is still contributing to the image but with a wrong phase. It is scattered outside the peak of the impulse response function and it is thus redistributed over a larger area in the image plane.
Phase-Contrast Microscopy
In an ideal microscope, the image wave function would exactly represent the object wave function, and the image of a pure phase object would show no contrast.
Figure 5. Transfer function (imaginary part) of a 300 keV electron microscope at optimum defocus.
Figure 6. Impulse response function (imaginary part) corresponding to the transfer function of Fig. 5.
This can be compared with imaging a glass plate with variable thickness in an ideal light microscope. If the phases of the diffracted beam (Fourier components) are shifted over π/2 with respect to the central beam, the image contrast would directly reveal the phase of the object. In light microscopy, phase contrast can be obtained by inserting an annular quarter-wavelength plate in the focal plane of the objective lens. In electron microscopy, phase-contrast imaging can be achieved by making the transfer function as constant as possible. From Eq. (3) it is clear that phase shifts occur due to spherical aberration and defocus. However, the effect of spherical aberration, which, in a sense, makes the objective lens too strong for the most inclined beams, can be compensated somewhat by slightly underfocussing the lens. The focus setting is called the optimum focus or Scherzer focus and its value can be calculated as

ε = −1.2 CS^1/2 λ^1/2.    (7)
The transfer function for this situation is depicted in Fig. 5. The phase shift χ(g) is nearly equal to −π/2 with respect to the central beam for a large range of spatial frequencies, up to the value

|g| = 1.5 CS^−1/4 λ^−3/4.    (8)

The image then reveals directly the phase of the object. Now a thin material object acts as a phase object in which the phase is proportional to the electrostatic potential of the atoms projected along the viewing direction. Hence, if the object were very thin, optimum focus imaging would directly reveal atom columns as dark dots and empty spaces as light areas. However, this argument only holds for spatial frequencies that are within the range given by Eq. (8). Furthermore, the thickness up to which an object can be considered as a weak phase object is very small (e.g., 1 nm) and is rarely met in practice.
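Equations (7) and (8) are easy to evaluate for the instrument parameters quoted in this article (300 keV, λ ≈ 2 pm, CS = 1 mm). The following sketch is an added illustration, not part of the original text:

```python
CS_NM = 1.0e6        # spherical aberration constant: 1 mm expressed in nm
LAMBDA_NM = 2.0e-3   # electron wavelength at 300 keV, ~2 pm

scherzer_defocus = -1.2 * (CS_NM * LAMBDA_NM) ** 0.5       # Eq. (7), in nm
g_max = 1.5 * CS_NM ** -0.25 * LAMBDA_NM ** -0.75          # Eq. (8), in 1/nm
point_resolution = 1.0 / g_max                             # inverse of the maximal frequency

print(f"Scherzer defocus : {scherzer_defocus:.1f} nm")     # about -54 nm
print(f"g_max            : {g_max:.2f} 1/nm")
print(f"point resolution : {point_resolution:.2f} nm")     # ~0.2 nm, as quoted below
```

The inverse of g_max comes out at about 0.2 nm, which is exactly the point resolution discussed in the next section.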
Resolution
One has to distinguish between point resolution (or structural resolution) as the finest detail that can be interpreted in terms of the structure and the information limit, which is the finest detail that can be resolved by the instrument, irrespective of a possible interpretation. The point resolution can be obtained from the inverse of the maximal spatial frequency Eq. (8) as

ρS = 0.65 CS^1/4 λ^3/4.    (9)

The point resolution is also equal to the ‘‘width’’ of the impulse response function (Fig. 6). The information beyond ρS is transferred with a nonconstant phase and, as a consequence, is redistributed over a larger image area. The information limit ρI can be defined as the finest detail that can be resolved by the instrument. This limit is mainly determined by incoherent effects (fluctuations on the energy of the electrons and on the lens currents, divergences of the illuminating beams). The information limit is usually smaller than the point resolution. Typical values are λ = 2 pm (300 keV), CS = 1 mm, ρS = 0.2 nm, ρI = 0.13 nm. The point resolution can be improved by reducing the incoherent effects (e.g., energy and current fluctuations), for instance, by using a field emission source and a stable voltage supply.
The Specimen
Specimens for high-resolution electron microscopy are prepared using the same techniques as for usual transmission electron microscopy, that is, ion beam milling, chemical and electrolytical thinning, cleavage, crushing, and so on. The only requirements are that the specimen should be sufficiently thin, that is, less than about 10 nm. Furthermore, the specimen should be sufficiently clean and free of contamination. Depending on the type of specimen one can use different thinning methods such as ion beam milling, electrochemical etching, crushing, and dimpling. For details of specimen preparation we refer the reader to Ref. 2. Crystalline specimens are oriented with a zone axis parallel to the incident beam so that all the diffracted beams of that zone are excited simultaneously and maximal information is present in the image. This is shown in Fig. 7. In this situation, the electrons propagate parallel to a zone axis, that is, parallel to the atom rows, which makes a possible interpretation of the images in terms of the projected structure meaningful. As an example, Fig. 8 shows a drawing of a model of a dislocation viewed along a zone axis. It is also possible, using an aperture placed in the focal plane of the objective lens, to select a particular set of beams so that the images contain specific information. If the central beam is not included, these images are called dark-field images. After finding a suitably thin part with the proper orientation, one has to adjust the focus. When the specimen is very thin, the zero focus corresponds to minimal contrast. In practice, one takes a series of images at gradually different focus settings, recorded approximately around the optimum defocus. This is called a through-focus series. When dealing with a specimen that is unstable in the electron beam, the specimen can be completely destroyed within a period of seconds. Here a minimal exposure technique has to be used.
Electron Diffraction
Diffraction Mode When the focal length of the intermediate lens is increased, by weakening its excitation, so as to make the back focal plane of the objective lens coincide with the object plane of the projector lens, a magnified image of the diffraction pattern is projected on the screen. Neither specimen orientation nor selected area are hereby changed; the diffraction pattern is thus representative of the selected area. It should be noted that the area selected in the diffraction mode is much larger than the field of view for imaging.
Figure 7. Formation of the diffraction pattern. The simultaneously excited electron beams can be used for image formation.
Geometry of the Diffraction Pattern
The diffraction conditions can be formulated in two different ways: either emphasizing direct space or reciprocal space. In direct space the diffraction conditions are known as Bragg's law (3). Attention is focused on sets of parallel lattice planes with interplanar spacing dH, where H represents the three Miller indices of the considered family of lattice planes. ‘‘Reflection’’ will occur for certain ‘‘glancing’’ angles θH satisfying Bragg's law, 2dH sin θH = nλ, where λ is the wavelength of the radiation used and n is an integer. This relation expresses the condition that the difference in path length between waves scattered by two successive lattice planes is an integer number of wavelengths (Fig. 9). For 100 kV electrons λ ≈ 0.004 nm, and hence the Bragg angles are very small, of the order of a few degrees at most. In terms of reciprocal space the Ewald diffraction condition (4) can be formulated as kH = k0 + BH, where k0 is the wave vector of the incident wave, kH that of the scattered wave, and BH is a reciprocal-lattice vector BH = h1b1 + h2b2 + h3b3. The bj are the base vectors of the reciprocal lattice, defined in terms of the base vectors ai of the direct lattice by the relations ai · bj = δij (Kronecker δ) (i, j = 1, 2, 3). The Ewald condition expresses the conservation of energy and linear momentum on elastic scattering; its physical content is the same as that of Bragg's law. Geometrically it expresses the condition that a sphere with radius |k| = 1/λ and center C in −k0 (Ewald's sphere) must pass through a point H of the reciprocal lattice for diffraction to occur. The scattered wave direction is then obtained by joining the center of Ewald's sphere with the excited reciprocal-lattice point H (Figs. 7 and 10). In electron microscopy the specimen has to be a thin foil since electrons are strongly ‘‘absorbed’’ in solids. This causes an anisotropic relaxation of the Bragg condition along the foil normal. Diffraction now occurs also for conditions deviating somewhat from the exact Bragg angles. This is represented in reciprocal space as a change of the sharp spots into infinitely thin rods, called ‘‘relrods,’’ perpendicular to the foil surfaces. These relrods have an intensity profile represented in Fig. 11 according to the kinematical theory of electron diffraction (see the Appendix). Since the radius of Ewald's sphere (1/λ) is large as compared to the mesh size of the reciprocal lattice (1/dH), the sphere can be approximated by a plane. The diffraction pattern can thus be considered as the projection of an undeformed planar section of the reciprocal lattice.
Figure 8. Structure model of a crystal containing a dislocation viewed along the atomic columns.
Figure 9. ‘‘Reflection’’ of electron waves by successive lattice planes leading to the Bragg condition.
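Because the electron wavelength (about 0.004 nm at 100 kV) is so much smaller than typical interplanar spacings, Bragg's law gives only very small glancing angles. The quick check below is an added sketch; the lattice-plane spacings are illustrative values, not taken from the article:

```python
import math

LAMBDA_NM = 0.0037  # electron wavelength at 100 kV, approximately 0.004 nm

def bragg_angle_deg(d_nm, n=1, wavelength_nm=LAMBDA_NM):
    """Glancing angle from Bragg's law, 2 d sin(theta) = n * lambda."""
    return math.degrees(math.asin(n * wavelength_nm / (2.0 * d_nm)))

for d in (0.4, 0.2, 0.1):   # illustrative lattice-plane spacings in nm
    print(f"d = {d:.2f} nm -> theta_B = {bragg_angle_deg(d):.2f} degrees")
```

The angles come out between a fraction of a degree and roughly one degree, consistent with the statement that electron Bragg angles are of the order of a few degrees at most.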
Figure 10. Ewald construction of diffracted wave. A diffracted wave is formed if Ewald's sphere (≈plane) intersects a reciprocal-lattice node H. k0 is the wave vector of the incident wave, kH is the wave vector of the diffracted wave, and BH is the reciprocal-lattice vector belonging to node H.
Figure 11. Intensity profile along a ‘‘relrod’’ for a foil with thickness t.
Figure 12. Illustration of the diffraction geometry in reciprocal space, defining the excitation error SH. (a) SH = 0. (b) SH < 0.
The distance SH by which Ewald’s sphere misses the reciprocal-lattice node H is called the excitation error. It is a vector parallel to the foil normal that connects the reciprocal-lattice node and the intersection point with Ewald’s sphere. It is by convention called positive when pointing in the sense of the propagating electrons, that is, when the node is inside Ewald’s sphere. In Fig. 12(b), SH is negative.
Convergent Beam Electron Diffraction By converging the incident beam into a cone, the diffraction spots become disks with a fine structure of intensity variations. This way of diffraction is highly sensitive to small changes in the crystal structure. It can be used very efficiently to determine the space group symmetry of a crystal, to measure very accurately the unit cell dimensions, thermal expansion coefficients, local strain, and crystal potentials. For more detail, we refer to Ref. 5. Imaging Diffraction-Contrast Imaging
Principles Diffraction-contrast images are preferably obtained under two-beam conditions since this allows an easier
Figure 13. Ray paths for the (a) bright-field mode and (b) dark-field mode.
interpretation. The specimen is oriented such that, apart from the direct beam, only one scattered beam is strong. The selector aperture allows us to select either the direct beam or the intense scattered beam. The selected diffraction spot is then highly magnified by the lenses. The image obtained in this way is a highly magnified map of the intensity distribution in the corresponding diffraction spot. If the direct beam is magnified, this is called a bright-field image [Fig. 13(a)]; if the diffracted beam is selected,
the image is a dark-field image [Fig. 13(b)]. Since the electrons propagate within a narrow column by diffracting back and forth, such columns can be considered as picture elements forming the image (Fig. 14). This is the so-called column approximation (6,7). The amplitude of either the scattered or the transmitted beam emerging in a point on the exit surface of the foil is the sum of the contributions of all volume elements along a column parallel to the incident beam. This amplitude depends on the variation of the excitation error SH along the column, which itself depends on the local orientation of the diffracting planes. In a perfect foil S is constant and it is the same along all the columns; the image is an area of uniform intensity. If a strain pattern is present in the foil S will vary differently along different columns and as a result an image is formed. Diffraction-contrast images are thus very sensitive to strain patterns but they do not reveal the crystal structure.
Figure 14. Repeated reflection of electron waves in a column of crystal. According to the (a) kinematical theory and (b) dynamical theory.
Figure 15. Displacement fields of planar interfaces: (a) stacking fault and (b) domain boundary.
Examples
Interface Contrast
Interfaces situated in planes inclined with respect to the foil surfaces divide the foil into two wedge-shaped parts. These two parts are related by a translation if the interface is a stacking fault or an antiphase boundary with R = R0; they differ in orientation, that is, in S when the interface is a domain boundary (Fig. 15). In both cases sets of parallel fringes are formed, which are parallel to the closest intersection line of the interface and the surface. Their intensity profiles are different, however. Planar interfaces parallel to the foil surfaces only cause a brightness difference in the faulted area. In Fig. 16 the two types of fringes are compared. The most important feature is the nature, bright (B) or dark (D), of the edge fringes (i.e., the first and the last fringe). Bright-field fringe patterns produced at stacking faults have edge fringes of the same nature, whereas in the dark-field fringe pattern the edge fringes have opposite
Figure 16. Comparison of fringe pattern characteristics due to a stacking fault (left) and to a domain boundary (right). The abbreviations F, L, B, and D denote first fringe, last fringe, bright, and dark, respectively.
nature. The reverse is true for domain boundary fringes; the edge fringes are of opposite nature in the bright-field image, whereas in the dark-field image the nature of the two edge fringes is the same. From the nature of the first and last fringes one can conclude, for instance, whether a stacking fault in a face-centered cubic crystal is either intrinsic (i.e., of the type . . .abcababc. . .) (8) or extrinsic (i.e., of the type . . .abcabacabc. . .). Figure 16, right refers to a domain boundary whereas Fig. 19, left is due to a stacking fault. Dislocation Contrast (9,10) The contrast produced at dislocation lines can be understood by noting that the reflecting lattice planes in the regions on two opposite sides of the dislocation line are tilted in the opposite sense. Hence the Bragg condition for reflection is differently affected on the two sides of the line. On one side, the diffracted intensity may be enhanced because the Bragg condition is better satisfied (s is smaller), whereas it is decreased on the other side because s is larger, leading to a black-white line contrast shown schematically in Fig. 17 for the case of an edge dislocation. In this schematic representation the line thickness is proportional to the local beam intensity. In bright-field images dislocation lines are thus imaged as dark lines, slightly displaced from the actual position of the dislocation line towards the ‘‘image side.’’ This model implies that imaging in reflections associated with families of lattice planes that are not deformed
by the presence of the dislocation will not produce a visible line image; the image is then said to be extinct (Fig. 18). The extinction condition can to a good approximation be formulated as H · b = 0, where b is the Burgers vector of the dislocation. If extinction occurs for two different sets of lattice planes with normals H1 and H2, the direction of the Burgers vector b is parallel to H1 × H2. Images of dislocations can be simulated accurately by numerically solving the equations that describe the dynamical scattering of the electrons in the object (see the Appendix). Fast computer programs (10) have been developed to calculate such images for various strain fields and various diffraction conditions. An example of the agreement between the observed and the computed image that can be achieved is shown in Fig. 19 after Ref. 10.
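The H · b = 0 analysis can be illustrated with a short numerical sketch: given two diffraction vectors for which the dislocation image is extinct, the Burgers vector direction follows from their cross product. The Miller indices used below are illustrative choices, not values taken from the figures.

import numpy as np

# Two diffraction vectors (Miller indices) for which the dislocation image
# is extinct, i.e. H . b = 0 for both; the values are illustrative only.
H1 = np.array([0, 2, 0])
H2 = np.array([1, 1, 0])

# If both extinction conditions hold, b must be perpendicular to H1 and H2,
# so its direction is parallel to the cross product H1 x H2.
b_direction = np.cross(H1, H2)
print("Burgers vector direction ~", b_direction)   # -> [0 0 -2], i.e. along [001]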
Figure 18. Three images of the same area made under two-beam conditions using three different diffraction vectors H1 = 020, H2 = 110, H3 = 210. Note the extinctions.
Weak-Beam Images (11)

The width of the bright peak that images a dislocation in the dark-field imaging mode decreases with increasing s. This effect is exploited systematically in the weak-beam method, which allows one to image the dislocations as very fine bright lines on a dark background, using a reflection that is only weakly excited, that is, for which s is large. Unfortunately the contrast is weak and long exposure times are required (Fig. 20).

High-Resolution Imaging
Figure 17. Image formation at an edge dislocation according to the kinematical approximation. The thickness of the lines is a measure of the intensity of the beams. IT is the intensity of the transmitted beam and IS that of the scattered beam.

Images of Thin Objects
A very thin object acts as a phase object, in which the phase is proportional to the projected potential along the electron path. The reason is that, in an electrostatic potential, the electron changes speed, which results in a phase shift. Then the exit wave of the object can be written as ψ(R) ≈ 1 + iσ Vp (R)
(10)
Vp (R) is the projected potential of the object. In the phase-contrast mode [Appendix A1, Eq. (18)], the phase shift of π/2 changes i into −1 so that the image
intensity is I(R) ≈ 1 − 2σ VP (R)
(11)
The image contrast of a thin object is proportional to its electrostatic potential VP (R) projected along the direction of incidence.
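As a rough numerical illustration of Eqs. (10) and (11), the sketch below evaluates the interaction constant σ = π/(λE) defined in the Appendix [Eq. (A.3)] and the corresponding phase shift and weak-phase contrast for an assumed projected potential; the accelerating voltage, potential, and thickness are placeholder values.

import numpy as np

# Interaction constant sigma = pi / (lambda * E) from Eq. (A.3),
# using the nonrelativistic wavelength lambda ~ 1.226 / sqrt(E) nm (E in volts).
E = 100e3                                # accelerating voltage in V (assumed value)
lam = 1.226 / np.sqrt(E)                 # electron wavelength in nm
sigma = np.pi / (lam * E)                # rad / (V nm)

# Assumed projected potential: ~20 V averaged over a 1 nm thick object.
V_p = 20.0 * 1.0                         # V nm

phase = sigma * V_p                      # phase shift sigma * V_p, cf. Eq. (A.5)
contrast = 2.0 * sigma * V_p             # weak-phase image contrast, Eq. (11)
print(f"lambda = {lam:.4f} nm, sigma = {sigma:.2e} rad/(V nm)")
print(f"phase shift = {phase:.3f} rad, image contrast ~ {contrast:.3f}")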
Building-Block Structures It often happens that a family of crystal structures exists, of which all members consist of a stacking of the simple building blocks but with a different stacking sequence. This is, for instance, the case in mixed-layer compounds, polytypes, and periodical twins, but also periodic interfaces such as antiphase boundaries and crystallographic shear planes can be considered as mixedlayer systems. If the blocks are larger than the resolution of the microscope, each block will show its characteristic contrast. In this way, stacking of the blocks can be directly ‘‘read’’ from the image. The relation between image and structure is called the image code. An example is shown in Fig. 21 for the case of the binary alloy Au4 Mn in which Au or Mn atoms are located on two sublattices of a single fcc lattice. In this image the Mn atoms are visualized as bright dots. The Au atoms are not visible. This kind of image can be interpreted unambiguously.
Interpretation using Image Simulation
Figure 19. Comparison of observed (left) and computer simulated (right) images of dislocations. Note the excellent correspondence (after Ref. 9).
Figure 20. Weak beam image of dislocations in RuSe2 .
In most cases, however, the image cannot easily be decoded in terms of the object structure, making interpretation difficult, especially at very high resolution, where the image contrast can vary drastically with the focus distance. As a typical and historical example, structure images obtained by Iijima for the complex oxide Ti2 Nb10 O29 with a point resolution of approximately 0.35 nm are shown in Fig. 22 (upper parts). The structure as reproduced schematically in Fig. 23 consists of a stacking of corner- or face-sharing NbO6 octahedra
Figure 21. Dark-field superlattice image of Au4 Mn. Orientation and translation variants are revealed. (Courtesy of G. Van Tendeloo.)
Holographic Reconstruction Methods
Figure 23. Schematic representation of the unit cell of Ti2 Nb10 O29 consisting of corner-sharing NbO6 octahedra with Ti atoms in tetrahedral sites.
Due to the complexity of the imaging process, the information about the structure of the object is scrambled in the image. The structural information can be extracted directly from the images using holographic reconstruction methods. These methods aim at undoing the image process, that is, going back from image to object. Such a procedure consists of three steps. First one has to reconstruct the electron wave in the image plane. Then one has to reconstruct the exit wave at the object and from this one has to deduce the projected structure of the object. In a recorded image, which shows only intensities, the phase information is lost. Hence, the reconstruction of the whole image wave is a typical phase problem that can only be solved using holographic methods. Basically two methods are workable. One method is offaxis holography (12). Here the electron beam is split in two waves by means of a biprism, which essentially is an electrostatically charged wire. One wave crosses the object so as to produce an enlarged image. The other wave (reference wave) passes by the object through the vacuum and interferes with the image wave in the image plane. In this way the high-resolution image is modulated by the interference fringes. From the position of the fringes one can then determine the phase of the electron wave. The other method is the focus variation method (1). The image wave is calculated by computer processing a series of images taken at different focus settings. Figure 24 shows an experimentally reconstructed exit wave for YBa2 Cu4 O8 . From this the structure of the object can be deduced.
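The sideband extraction that underlies off-axis holographic reconstruction can be sketched in a few lines: the hologram is Fourier transformed, one sideband centered on the carrier frequency of the biprism fringes is windowed out and recentered, and the inverse transform yields a complex image wave whose argument is the sought phase. The synthetic hologram, carrier frequency, and window radius below are assumptions made for the sketch, not parameters of an actual instrument.

import numpy as np

# --- Synthetic off-axis hologram (assumed test object, not real data) --------
N = 256
y, x = np.mgrid[0:N, 0:N]
phase_obj = 1.5 * np.exp(-((x - N/2)**2 + (y - N/2)**2) / (2 * 20.0**2))  # object phase
qx, qy = 0.25, 0.0                          # carrier (fringe) frequency in cycles/pixel
hologram = 1.0 + np.cos(2*np.pi*(qx*x + qy*y) + phase_obj)  # reference + image wave

# --- Reconstruction: isolate one sideband in Fourier space -------------------
F = np.fft.fftshift(np.fft.fft2(hologram))
cx, cy = N//2 + int(round(qx*N)), N//2 + int(round(qy*N))   # sideband centre
r = 20                                                      # window radius (assumed)
win = np.zeros_like(F)
win[cy-r:cy+r, cx-r:cx+r] = F[cy-r:cy+r, cx-r:cx+r]
sideband = np.roll(win, (N//2 - cy, N//2 - cx), axis=(0, 1))  # re-centre the carrier

image_wave = np.fft.ifft2(np.fft.ifftshift(sideband))
recovered_phase = np.angle(image_wave)
print("max recovered phase:", recovered_phase.max())  # close to the input maximum of 1.5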
Figure 22. Comparison of experimental images (upper parts) and computer simulated images (lower parts) for Ti2 Nb10 O29 as a function of defocus. (Courtesy S. Iijima.)
with the titanium atoms in tetrahedral positions. Highresolution images are taken at different focus values, causing the contrast to change drastically. The best resemblance with the X-ray structure can be obtained near the optimum Scherzer defocus, which is −90 nm in this particular case. However, the interpretation of such high-resolution images never appears to be trivial. The only solution remains in the comparison of the experimental images with those calculated for various trial structures. The results of such a calculation using the model of Fig. 23 are also shown in Fig. 22 (lower parts) and show a close resemblance with the experimental images. Image simulation, however, is a very tedious technique that has to deal with a number of unknown parameters (specimen thickness, exact focus, beam convergence, etc.). Furthermore, the comparison is often done visually. As a consequence, the technique can only be used if the number of plausible models is very limited.
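Although the comparison is often done visually, a simple quantitative score such as a normalized cross-correlation coefficient between experimental and simulated images can support it. The sketch below computes such a coefficient; the random arrays merely stand in for real micrographs.

import numpy as np

def normalized_cross_correlation(img_a, img_b):
    """Zero-mean normalized cross-correlation coefficient between two images."""
    a = img_a - img_a.mean()
    b = img_b - img_b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2)))

# Stand-ins for an experimental image and two simulated trial structures.
rng = np.random.default_rng(1)
experimental = rng.random((64, 64))
simulation_good = experimental + 0.1 * rng.standard_normal((64, 64))   # close model
simulation_bad = rng.random((64, 64))                                  # unrelated model

print("good model :", normalized_cross_correlation(experimental, simulation_good))
print("bad model  :", normalized_cross_correlation(experimental, simulation_bad))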
Figure 24. Experimentally reconstructed exit wave for YBa2 Cu4 O8. Top: reconstructed phase; center: structure model; bottom: experimental image.
Quantitative Structure Determination Ideally quantitative extraction of information should be done as follows. One has a model for the object, for the electron object interaction, for the microscope transfer and for the detection, that is, all the ingredients needed to perform a computer simulation of the experiment. The object model that describes the interaction with electrons consists of the assembly of the electrostatic potentials of the constituting atoms. Also the imaging process is characterized by a number of parameters such as defocus, spherical aberration, and voltage. These parameters can either be known a priori with sufficient accuracy or not, in which case they have to be determined from the experiment. The model parameters can be estimated from the fit between the theoretical images and the experimental images. What one really wants is not only the best estimate for the model parameters but also their standard deviation (error bars), a criterion for the goodness of fit, and a suggestion for the best experimental setting. This requires a correct statistical analysis of the experimental data. The goodness of the fit between model and experiment has to be evaluated using a criterion such as likelihood, mean square difference, or R factor (as with X-ray crystallography). For each set of parameters of the model, one can calculate this goodness of fit, so as to yield a fitness function in parameter space. In principle, the search for the best parameter set is then reduced to the search for optimal fitness in parameter space. This search can only be done in an iterative way as given in the schematic in Fig. 25. First one has a starting model, that is, a starting value for the object and imaging parameters {an }. From these one can calculate the experimental images. This is a classical image simulation. (Note that the experimental data can also be a series of images and/or diffraction patterns.) From the mismatch between experimental and simulated images one can
Figure 25. Scheme for the refinement procedure.
obtain a new estimate for the model parameters (for instance, using a gradient method), which can then be used for the next iteration. This procedure is repeated until the optimal fitness (i.e., optimal match) is reached. The refinement procedure needs a good starting model to guarantee convergence. Such a model can be derived from holographic reconstruction. The refinement can also be done using experimental electron diffraction patterns. An application of such refinement is shown in Figs. 26 and 27. Figure 26 (left) shows an HREM image of a Mg/Si precipitate in an Al matrix (13). Figure 26 (right) shows the phase of the exit wave, which is reconstructed experimentally using the focus variation method. From this an approximate structure model can be deduced. From different precipitates and different zones, electron diffraction patterns could be obtained, which were used simultaneously for a final fitting. For each diffraction pattern the crystal thickness as well as the local orientation was also treated as a fittable parameter. An overview of the results is shown in Table 1.
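The goodness-of-fit step can be made concrete with a small sketch: an R factor between observed and calculated intensities is minimized over a single hypothetical model parameter by a crude grid search, standing in for the gradient-type update mentioned above. The "observed" data and the model function are invented for illustration and are unrelated to the MSLS refinement cited in the text.

import numpy as np

def r_factor(I_obs, I_calc):
    """Crystallographic-style R factor between observed and calculated intensities."""
    return np.sum(np.abs(I_obs - I_calc)) / np.sum(np.abs(I_obs))

# Invented 'model': intensities of a few reflections as a function of one
# parameter a (e.g. an atomic coordinate); purely illustrative.
def model(a, h):
    return (1.0 + np.cos(2 * np.pi * h * a)) ** 2

h = np.arange(1, 8)                      # reflection indices
a_true = 0.23
rng = np.random.default_rng(0)
I_obs = model(a_true, h) * (1 + 0.05 * rng.standard_normal(h.size))  # noisy "data"

# Grid search for the parameter value that minimizes the R factor.
a_grid = np.linspace(0.0, 0.5, 501)
R = np.array([r_factor(I_obs, model(a, h)) for a in a_grid])
a_best = a_grid[np.argmin(R)]
print(f"refined a = {a_best:.3f}, R = {100 * np.min(R):.1f} %")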
Figure 26. HREM image (a) and phase of the experimentally reconstructed exit wave (b) of an Mg/Si precipitate in an Al matrix. [Courtesy H. Zandbergen (14).]
Table 1. Results of Structure Refinement Using Electron Diffraction Data

Zone    Number of Observed    Crystal Misorientation        Thickness    R Value (%)
        Reflections           h       k       l             (nm)         MSLS    Kinematic
[010]   50                    8.3     0      −2.3           6.7(5)       3.0     3.7
[010]   56                    2.6     0      −1.8           15.9(6)      4.1     8.3
[010]   43                   −1.7     0       0.3           16.1(8)      0.7     12.4
[010]   50                   −5.0     0      −1.0           17.2(6)      1.4     21.6
[010]   54                   −5.9     0       2.5           22.2(7)      5.3     37.3
[001]   72                   −3.9     4.5     0             3.7(3)       4.1     4.5
[001]   52                    3.6    −1.9     0             4.9(6)       6.8     9.3

Figure 27. Structure model with MSLS from the fitting procedure described in the text. [Courtesy H. Zandbergen (14).]
The obtained R factors are of the order of 5%, which is well below the R factors using kinematical refinement that do not account for the dynamical electron scattering. Figure 27 shows the structure obtained after refinement. Details of this study have been published by Zandbergen et al. (14). SCANNING ELECTRON MICROSCOPY The SEM is a mapping, rather than an imaging device (Fig. 28) and so is a member of the same class of instruments as the facsimile machine, the scanning probe microscope, and the confocal optical microscope (19). The sample is probed by a beam of electrons scanned across the surface. Radiations from the specimen stimulated by the incident beam are detected, amplified, and used to modulate the brightness of a second beam of electrons scanned, synchronously with the first beam, across a cathode-ray-tube display. If the area scanned on the display tube is A × A and the corresponding area scanned on the sample is B × B, then the linear magnification M = A/B. The magnification is therefore geometric in origin and may be changed by varying the area scanned on the sample. The arrangement makes it possible for a wide range of magnifications to be obtained and allows rapid changes of magnification since no alterations to the electron-optical system are required. There is no rotation between object and image planes, and once the instrument has been focused on a given area the focus
Figure 28. Schematic illustration of the basic mapping principle of the scanning electron microscope. [Courtesy D. Joy (19).]
need not be changed when the magnification is varied. To a first approximation the size of the finest detail visible in the image will be set by the size of the probe scanning the specimen. Multiple detectors can be used to collect several signals simultaneously that can then be displayed individually or combined in perfect register with each other. It is this possibility in particular that makes the SEM so useful a tool since multiple views of a sample, in different imaging modes, can be collected and compared in a single pass of the beam. Figure 29 shows the basic components of a SEM. These can be divided into two main categories: the electron-optical and detector systems and the scanning, processing, and display systems. The electron-optical components are often described as being the "column" of the instrument while the other items are the "console" of the machine. The source of electrons is the gun, which produces electrons either by thermal emission, from tungsten or lanthanum hexaboride cathodes, or from a field-emission source. These electrons are then accelerated to an energy in the range from 500 eV to 30 keV. The beam of electrons leaving the gun is then focused onto the specimen by one or more condenser lenses. Although either electrostatic or electromagnetic lenses could be employed, all modern SEMs use electromagnetic lenses. Typically, the final objective lens has been of the pinhole design with the sample sitting outside the field of the lens since this
Figure 29. Basic components of the scanning electron microscope. [Courtesy D. Joy (19).]
arrangement gives good physical access to the specimen. However, in this arrangement the specimen is 10 mm to 20 mm away from the lens, which must therefore be of long focal length and correspondingly high aberration coefficients. In modern, high-performance instruments it is now common to use an immersion lens (15), in which the sample sits inside the lens at the center of the lens field, or a "snorkel" lens (16) in which the magnetic field extends outside of the lens to envelop the sample. Although the immersion lens gives very good performance and, by making the sample part of the lens structure, ensures mechanical stability, the amount of access to the specimen is limited. The snorkel lens, on the other hand, combines both good electron-optical characteristics with excellent access for detectors and stage mechanisms. The coils that scan the beam are usually incorporated within the objective lens. A double-scan arrangement is often employed in which one set of coils scans the beam through some angle θ from the axis of the microscope while a second set scans the beam through an angle 2θ in the opposite direction. In this way all scanned beams pass through a single point on the optic axis allowing for the placement of a defining aperture without any constriction of the scanned area. The scan pattern, produced on the specimen, is usually square in shape and is made up of 1,000 horizontal lines, each containing 1,000 individual scanned points or pixels. The final image frame thus contains 10^6 pixels, although for special activities such as focusing or alignment frames containing only 256 × 256 pixels may be used. Increasingly the detector output is passed through an analog-to-digital convertor (ADC) and then handled digitally rather than as an analog video signal. This permits images to be stored, enhanced, combined, and analyzed using either an internal or an external computer. While the majority of the images are still recorded onto photographic film, digital images can be stored directly to magnetic or magneto-optic disks, and hard-copy output of the images can then be obtained using laser or
dye sublimation printers. Typically scan repetition rates ranging from 15 or 20 frames/s ("TV rate") to one frame in 30 s to 60 s ("photographic rate") are provided. In addition individual pixels or arrays of pixels within an image field may be accessed if required. In the case of the SEM the attainable resolution is determined by a number of factors, including the diameter of the electron-beam probe that can be generated, the current Ib contained in that probe, the magnification of the image, and the type of imaging mode that is being used. Over most of the operating energy range (5 keV to 30 keV) of the SEM, the probe size and beam current are related by an expression of the form (17)

d = CS^(1/4) λ^(3/4) [1 + Ib/(βλ^2)]^(3/8)   (12)

where λ is the wavelength of the electrons (λ ≈ 1.226 E0^(−1/2) nm, where E0 is the incident electron energy in eV), β is the brightness of the electron gun in A · cm^−2 · sr^−1, and CS is the spherical aberration coefficient of the objective lens. Finally, if the gun brightness is further increased to 10^8 A · cm^−2 · sr^−1 by using a field-emission source (18), then the factor is close to unity for both modes of operation considered. For a modern SEM CS is typically a few millimeters; thus minimum probe sizes of 1 nm or 2 nm are available. At low beam energies (below 5 keV) additional effects including the energy spread of electrons in the beam must also be considered, but the general conclusions discussed previously remain correct.

Modes of Operation

Secondary-Electron Imaging

Secondary electrons (SE) are those electrons emitted by the specimen, under irradiation by the beam, which have energies between 0 eV and 50 eV. Because of their low
energy the SE only travel relatively short distances in the specimen (3 nm to 10 nm) and thus they emerge from a shallow "escape" region beneath the surface. There are two cases in which an SE can be generated and subsequently escape from the specimen: first, when an incident electron passes downward through the escape depth, and second as a backscattered electron leaves the specimen and again passes through the escape region. Secondary electrons produced in the first type of event are designated SE1, and because they are generated at the point where the incident beam enters the specimen, it is these that carry high-resolution information. The other secondary electrons are called SE2, and these come from a region the size of which is of the order of the incident beam range in the sample. Secondary-electron imaging is the most common mode of operation of the SEM. The reason for this is that secondary electrons are easy to collect and they carry information about the surface topography of the specimen. Information about surface chemistry and magnetic and electric fields may also be obtainable on suitable specimens. SE images can usually be interpreted readily without specialized knowledge and they yield a spatial resolution of 1 nm or better. Examples of typical SE images are shown in Figs. 30 and 31. The light and shadow effects together with the very large depth of focus enhance the 3D aspects of the surface structure. Another imaging mode is voltage contrast, which is illustrated in Fig. 32. Here large regions of uniform bright and dark contrast correspond to regions that have a negative and positive voltage with respect to ground.
Figure 31. High-resolution image of magnetic disk media surface recorded at 30 keV in a JEOL JSM 890 field-emission SEM. [Courtesy D. Joy (19).]
Backscattered Electrons Backscattered electrons (BSE) are defined as being those electrons emitted from the specimen with energies between 50 eV and the incident beam energy E0 . Because the yield of BSE varies with the atomic number of the specimen the contrast of the images is related to the atomic number of the object. Other Imaging Modes With a SEM it is possible to measure the current through the object as induced by the imaging electron beam [electron-beam-induced current (EBIC)]. This signal gives
Figure 32. Voltage contrast from an integrated circuit. Recorded at 5 keV in a Hitachi S-800 field-emission SEM. [Courtesy D. Joy (19).]
information about the electron-hole pair carriers in a semiconductor such as those at p–n junctions. In cathodoluminescence, one detects the fluorescence radiation that is due to irradiation by the incident beam. This is a very sensitive technique that gives information about the impurities in semiconductors. For more information we refer the reader to Ref. 19.

SCANNING TRANSMISSION ELECTRON MICROSCOPY
Figure 30. Secondary-electron images of radiolarium. Recorded in Hitachi S-4500 field emission SEM at 5 keV beam energy. Magnification: 800×. [Courtesy D. Joy (19).]
In principle a STEM can be considered as a SEM in which the object is transparent for the high energy electrons and in which the detector is placed behind the object. As in a SEM, a fine electron probe, formed by using a strong objective electron lens to demagnify a small source, is scanned over the specimen in a two-dimensional raster [Fig. 33(a)]. The electron probe is necessarily convergent: the convergence angle is, ideally, inversely proportional to the minimum probe size that determines the microscope resolution. On any plane after the specimen, a convergent beam electron-diffraction pattern is formed. Some part of this diffraction pattern is collected in a detector, creating a signal, which is displayed on a cathode-ray-tube screen to
form the image using a raster scan matched to that which deflects the incident electron beam (20,21). Dark-field images, obtained with an annular detector in a STEM instrument, showed the first clear electron microscopy images of individual heavy atoms (22). From that time, STEM has developed as an important alternative to conventional, fixed-beam transmission electron microscopy (CTEM), with special advantages for many purposes. The use of a field emission gun (FEG) for high-resolution STEM is necessary to provide sufficient signal strength for viewing or recording images in a convenient time period. Because the FEG source has a brightness that is a factor of 10^3 or 10^4 greater than that of a W hairpin filament, the total current in the electron beam is greater when beam diameters of less than about 10 nm are produced. The current in a beam of 1 nm diameter is typically about 1 nA. As suggested by Fig. 33(b), the essential components of a STEM imaging system are the same as those for a CTEM instrument, with the electrons traveling in the opposite direction. In this diagram condenser and projector lenses have been omitted, and only the essential objective lens, which determines the imaging characteristics, is included. The STEM detector replaces the CTEM electron source. The STEM gun is placed in the detector plane of the CTEM, and the scanning system effectively translates the STEM source to cover the CTEM recording plate. When one uses a detector with a hole to eliminate the unscattered electron beam, the imaging is effectively incoherent so that the image contrast can be interpreted directly in terms of the atomic number of the constituting atoms. This imaging mode is therefore called Z contrast imaging (20). Figure 34 shows a STEM image of a tilt boundary in silicon in which the local atomic configuration can be seen directly in the images.
Figure 33. (a) Diagram of the essential components of a STEM instrument. (b) Diagram suggesting the reciprocity relationship between STEM (electrons going from left to right) and CTEM (electrons going from right to left). [Courtesy J. Cowley (20).]
Figure 34. Σ = 9, {221}(100) symmetric tilt boundary in silicon viewed along the [110] direction showing its five- and seven-membered ring structure. ADF = annular dark field; EELS = electron energy loss spectroscopy. [Courtesy S. Pennycook (21).]
The strength of STEM as compared to TEM is that a variety of signals may be obtained in addition to the bright-field or dark-field signals derived from the elastic scattering of electrons in the specimen. STEM instruments are usually fitted with an energy-loss spectrometer. Energy-filtered images reveal compositional information. For more information we refer to Refs. 20 and 24.
APPENDIX A. ELECTRON-DIFFRACTION THEORIES

Phase Object Approximation

We will now follow a classical approach. The nonrelativistic expression for the wavelength of an electron accelerated by an electrostatic potential E is given by

λ = h/√(2meE)   (A.1)

with h the Planck constant, m the electron mass, and e the electron charge. During the motion through an object with local potential V(x, y, z) the wavelength will vary with the position of the electron as

λ1(x, y, z) = h/√(2me[E + V(x, y, z)])   (A.2)

For thin phase objects and large accelerating potentials the assumption can be made that the electron keeps traveling along the z direction, so that by propagation through a slice dz the electron suffers a phase shift

dφ(x, y, z) = 2π dz/λ1 − 2π dz/λ = 2π (dz/λ)[√((E + V(x, y, z))/E) − 1] ≈ σ V(x, y, z) dz,  with σ = π/(λE)   (A.3)

Therefore the total phase shift is given by

φ(x, y) = σ ∫ V(x, y, z) dz = σ VP(x, y)   (A.4)

where VP(x, y) represents the potential of the specimen projected along the z direction. Under this assumption the specimen acts as a pure phase object with transmission function

ψ(x, y) = exp[iσ VP(x, y)]   (A.5)

In case the object is very thin, one has

ψ(x, y) ≈ 1 + iσ VP(x, y)   (A.6)

This is the weak-phase approximation. The effect of all processes prohibiting the electrons from contributing to the image contrast, including the use of a finite aperture, can in a first approximation be represented by a projected absorption function in the exponent of Eq. (A.5) so that

ψ(x, y) = exp[iσ VP(x, y) − µ(x, y)]   (A.7)

or

ψ(R) = exp[iσ VP(R) − µ(R)]   (A.8)

with R = (x, y) the vector in the plane perpendicular to z.

Kinematical Theory

As follows from Eq. (1) a diffraction pattern can be calculated from the Fourier transform of the exit wave ψ(R). However, even for a simple approximation such as Eq. (A.8) the Fourier transform is not expressed in a simple analytical form. In order to derive a simpler, albeit approximated, expression for the diffraction pattern it is more convenient to describe the diffraction process directly in Fourier space. According to the kinematical diffraction theory electrons are scattered only once in the specimen and moreover the incident beam is not depleted by scattering. Each atom (scattering center) thus sees the same incident beam amplitude. This approximation is excellent in neutron diffraction, justified in X-ray diffraction, but it is poor in electron diffraction because the atomic scattering cross sections for electrons are relatively much larger than those for the other forms of radiation. The kinematical approximation is therefore only applicable to very thin crystals (a few nanometers for most materials) or for very large deviations from the exact Bragg condition (large s). It allows one to compute the amplitude of the diffracted beam only since the incident beam remains undepleted. Qualitative conclusions from the kinematical theory are nevertheless usually in agreement with the observations. A crystal is made up of identical unit cells, regularly arranged at the basic lattice nodepoints given by

AL = l1 a1 + l2 a2 + l3 a3   (A.9)

(where lj is an integer). In each unit cell, a number N of atoms is found at the relative positions ρk (k = 1, . . . , N). Mathematically speaking, the whole crystal is made up by convolution of one unit cell with the basic crystal lattice. Atom positions are thus rj = AL + ρk and they depend on four indices l1, l2, l3, and k. Let k0 represent the wave vector of the incident wave and k that of the diffracted wave; then at large distance, i.e., in the Fraunhofer approximation, the phase difference between a wave diffracted by an atom at the origin and an atom at rj is given by 2π(k − k0) · rj and the scattered amplitude A(k) along the direction of k (Fig. 35) is

A(k) = Σj fj exp[2πi(k − k0) · rj]   (A.10)

This amplitude will exhibit maxima if all exponents are integer multiples of 2πi; maxima will thus occur if
(k − k0) · rL = integer, which implies that k − k0 must be a reciprocal-lattice vector

k − k0 = BH ≡ h1 b1 + h2 b2 + h3 b3   (A.11)

(where hj are integers and bi are base vectors). This is Ewald's condition as discussed in the section on electron diffraction. However, A(k) will also be different from zero if the diffraction condition is not exactly satisfied, that is, if Ewald's sphere misses closely a reciprocal-lattice node by a vector s, called the excitation error. This vector is parallel to the foil normal and connects the reciprocal-lattice node with the intersection point with Ewald's sphere; by convention s is positive when the reciprocal-lattice node is inside Ewald's sphere and negative when outside (Fig. 7). One can now set k − k0 = BH + s and

AH = Σj fj exp[2πi(BH + s) · rj]   (A.12)

and with

rj = AL + ρk   (A.13)

AH = ΣL Σk fk exp[2πi(BH + s) · (AL + ρk)]   (A.14)

Neglecting s · ρk as compared to the other terms and noting that BH · AL is always an integer, this can be written as

AH = FH ΣL exp(2πis · AL)   (A.15)

where the structure factor FH is defined as

FH = Σk fk exp(2πiBH · ρk)   (A.16)

Equation (A.15) is in fact a triple sum over the indices L(l1, l2, l3). If s is written as a vector in terms of the reciprocal-space base vectors bj:

s = s1 b1 + s2 b2 + s3 b3   (A.17)

one has

s · AL = l1 s1 + l2 s2 + l3 s3   (A.18)

The triple sum can be expressed as the product of three single sums of geometrical progressions. Calling N1, N2, and N3 the numbers of unit cells along the three lattice directions a1, a2, and a3, one obtains, neglecting an irrelevant phase factor,

AH = FH Σ(l1=0 to N1−1) Σ(l2=0 to N2−1) Σ(l3=0 to N3−1) exp[2πi(l1 s1 + l2 s2 + l3 s3)]
   = FH [sin(πs1 N1)/sin(πs1)] [sin(πs2 N2)/sin(πs2)] [sin(πs3 N3)/sin(πs3)]   (A.19)

This is the well-known von Laue interference function (10) (Fig. 36), which describes the dependence of the scattered amplitude on the deviation parameter s. The sine functions in the denominators can be approximated by their arguments, since these are always small. We further note that for large N one has sin(πNs)/(Nπs) ∼ δ(s) with δ(s) = 0 for s ≠ 0 and δ(s) = 1 for s = 0. We can then write, neglecting irrelevant phase factors,

AH = FH (Ω/Va) δ(s1) δ(s2) δ(s3)   (A.20)

where Ω is the volume of the crystal and Va the volume of the unit cell (Ω = N1 N2 N3 a1 a2 a3; Va = a1 a2 a3). For a parallelepiped-shaped crystal block one often introduces the components of s along the three mutually perpendicular edges of the block with unit vectors ex, ey, and ez:

s = sx ex + sy ey + sz ez   (A.21)

One can then rewrite Eq. (A.19) in terms of sx, sy, and sz as

AH = FH [sin(πsx N1 a1)/sin(πsx a1)] [sin(πsy N2 a2)/sin(πsy a2)] [sin(πsz N3 a3)/sin(πsz a3)]   (A.22)

Figure 35. Illustrating the path difference OC − AB between waves diffracted by an atom at the origin and an atom at rj.

Figure 36. von Laue interference function describing the dependence of the scattered intensity on the excitation error sx.
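The curve of Fig. 36 is easily reproduced numerically. The sketch below evaluates the one-dimensional von Laue factor sin^2(πsNa)/sin^2(πsa) of Eq. (A.19) for N = 20 cells (the value quoted in the figure); the sampling grid is an arbitrary choice.

import numpy as np

def laue_intensity(s, N, a=1.0):
    """One-dimensional von Laue interference term sin^2(pi s N a) / sin^2(pi s a)."""
    num = np.sin(np.pi * s * N * a) ** 2
    den = np.sin(np.pi * s * a) ** 2
    # At s = integer/a both numerator and denominator vanish; the limit is N^2.
    return np.where(den > 1e-12, num / np.maximum(den, 1e-12), float(N) ** 2)

N = 20                               # number of unit cells, as in Fig. 36
s = np.linspace(-0.2, 0.2, 2001)     # excitation error in units of 1/a
I = laue_intensity(s, N)

print("peak value:", I.max(), "(should be N^2 =", N**2, ")")
zeros = s[np.isclose(I, 0.0, atol=1e-6)]
print("zeros nearest the origin:", zeros[np.argsort(np.abs(zeros))][:2],
      "(expected at +/- 1/(N a) = +/- 0.05)")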
Hereby use was made of the relations valid for a parallelepiped

s1 = sx a1,  s2 = sy a2,  s3 = sz a3   (A.23)

Equation (A.22) is only true if N1, N2, and N3 are sufficiently large. However, the foils used in transmission electron microscopy are only large in two dimensions, that is, along x and y, the foil being only a small number N3 of unit cells thick along z. In such a foil one thus obtains

AH = (FH/Va) δ(sx) δ(sy) [sin(πsz N3 a3)/(πsz a3)]   (A.24)

Introducing the specimen thickness N3 a3 = t, assuming sx = sy = 0, and writing sz = s, one finds

AH = sin(πst)/(s tH)   (A.25)

per unit surface area, where tH = π/FH; tH is called the extinction distance. This result is interpreted as meaning that the sharp reciprocal-lattice nodes, characteristic of a large crystal, become rods in the case of a thin plate, as already mentioned before. These rods are perpendicular to the foil plane and have a weight profile given by sin(πst)/(s tH). The corresponding intensity is given by (Fig. 37)

IH = sin^2(πst)/(s tH)^2   (A.26)

it is called the rocking curve. An intensity can be associated with each intersection point of the Ewald sphere with this rod (called relrod), that is, with each s value, the intensity being given by the value of the function at the intersection point. Another way to interpret these results is the following: In the Fraunhofer approximation, the diffracted wave is expressed by the Fourier transform of the electrostatic potential of the crystal. A crystal can be considered as a convolution product of two factors, the lattice function and the potential of one unit cell. The convolution theorem then states that the diffracted wave is given by the product of the respective
Fourier transforms of these two factors [e.g., Eq. (A.15)]. The Fourier transform of the lattice function yields delta functions at the reciprocal nodepoints, which describe the directions of the diffracted beams. The amplitudes of these beams are then given by the Fourier transforms of the potential of one unit cell, i.e., the structure factors [e.g., Eq. (A.20)]. The conservation of energy also requires that the wavevectors of the diffracted beams should all have constant length, or, that the reciprocal nodes should lie on a sphere, the Ewald sphere. In case the object is a thin crystal slab, it can be described as the product of an infinite crystal with a slab function that is equal to 1 inside the slab and 0 elsewhere. In that case, the diffraction pattern is given by the convolution product of the diffraction pattern of the infinite crystal with the Fourier transform of the slab function. Then each reciprocal node is smeared along a line perpendicular to the slab, with a factor given by a sinc function of the form in Eq. (A.26). Note that if the Ewald sphere were flat, the diffracted wave could be derived from the Fourier transform of Eq. (A.8) provided the phase is weak. This means that, apart from the Ewald sphere, the weak phase object theory and the kinematical theory are equivalent. Equation (A.15) can also be understood intuitively in terms of the column approximation along the z direction. The amplitude of the wave diffracted by the volume element Δzn at level zn in the column (measured from the entrance face) is given by

AH = FH Δzn exp(2πiszn)

or in differential form

dAH = FH exp(2πisz) dz   (A.27)

The amplitude at the exit face of the column is then given by the sum

AH = FH Σn exp(2πiszn) Δzn   (A.28)

which, if s = const, can be approximated by the integral

AH = FH ∫0t exp(2πisz) dz   (A.29)

or

AH = FH sin(πst)/(πs)   (A.30)
Figure 37. Rocking curve for a foil with thickness t0 according to the kinematical theory.
which is consistent with Eq. (A.26), though not identical. In the complex plane the sum Eq. (A.28) can be represented by an amplitude-phase diagram (7) (Fig. 38); it consists of the vector sum of elementary vectors, all of the same length, each representing the amplitude diffracted by a unit cell. Successive unit cells along a column in a perfect crystal diffract with constant phase differences, that is, the corresponding vectors enclose constant angles. The diagram is a regular polygon that can be approximated by a circle with radius FH/(2πs). The length of the arc of the circle is equal to the column length. The amplitude diffracted by the column is given by the length of the segment connecting the endpoints P and P′ of the arc. It is clear that the amplitude will be zero if the column length (i.e., the foil thickness t) is an integer number of complete circles long. The maximum amplitude is equal to the diameter of the circle, that is, to FH/(πs). Along deformed columns the amplitude-phase diagrams become curved (spiral shaped) since the angle between successive segments is no longer constant.

Figure 38. Complex plane construction of the amplitude-phase diagram for a perfect foil.

Two-Beam Dynamical Theory for Perfect Crystals

The dynamical theory takes into account that a scattered beam can act in turn as an incident beam and be scattered again in the interior of the crystal. The simplest case that can analytically be discussed and which moreover is relevant for image formation in the diffraction contrast mode is the two-beam case. Next to the incident beam only one beam is strongly excited (has small s). This scattered beam is then again incident under the Bragg angle for the same set of lattice planes and can thus be scattered again. This interplay between incident and scattered beam tends to obliterate the distinction between incident and scattered beam in the interior of the crystal; it limits strongly the lateral spread of an incident Bragg diffracted electron beam and justifies the column approximation used in image calculations of defects according to the diffraction contrast mode. The dynamical theory is applicable to "thick" crystals provided also absorption is taken into account. It allows one to compute the amplitudes of the transmitted beam as well as of the diffracted beam for a single Bragg reflection. We will include all usual approximations in the model already from the onset. These approximations are as follows: (i) ideal two-beam situation, (ii) no absorption, and (iii) column approximation. Within a column along z, perpendicular to the foil surface, we describe the interplay between the transmitted beam represented by the plane wave φ0(z) exp(2πik0 · r) and the scattered beam represented by φH(z) exp(2πik · r) (two-beam approximation). The complex amplitudes φ0 and φH depend on z only (column approximation). Within the slice dz at the level z behind the interface we express that the transmitted beam amplitude results from the interference between the twice transmitted beam with amplitude φ0(z)φ0(dz) and the doubly scattered beam of which the amplitude is φH(z)φ−H(dz) [Fig. 39(a)]. The minus sign in −H means that reflection takes place from the −H side of the set of lattice planes. We thus obtain

φ0(z + dz) = φ0(z)φ0(dz) + φH(z)φ−H(dz)   (A.31)

The slice dz being arbitrarily thin, the kinematical approximation [e.g., Eq. (A.27)] can be applied rigorously to φ0(dz) (≡ 1, no beam depletion) and φ−H(dz) = (πi/tH) exp(−2πisz) dz, where the factor i results from the phase change on scattering and where the structure amplitude FH has been expressed in terms of the extinction distance tH. Note that changing H → −H also changes the sign of s [Fig. 39(b)]. Similarly the scattered beam amplitude results from the interference between (1) the transmitted beam, which is subsequently scattered in dz, and (2) the scattered beam, which is subsequently transmitted through dz [Fig. 39(a)]. This leads to the relation

φH(z + dz) = φ0(z)φH(dz) + φH(z)φ0(dz)   (A.32)

where again φ0(dz) = 1 and φH(dz) = (πi/tH) exp(2πisz) dz. The two Eqs. (A.31) and (A.32) can be transformed into differential equations by noting that quite generally

φ(z + dz) − φ(z) = (dφ/dz) dz   (A.33)

One thus obtains the following set of coupled differential equations:

dφ0/dz = (πi/t−H) exp(2πisz) φH(z)
dφH/dz = (πi/tH) exp(−2πisz) φ0(z)   (A.34)

Figure 39. Schematic representation of the interfering waves during dynamical diffraction in the two-beam case. (a) Right: transmitted beam; left: scattered beam. (b) Changing H → −H changes also s → −s.
In centro-symmetrical crystals t−H = tH. An alternative system is obtained by the substitution

φ0 = T,  φH = S exp(−2πisz)   (A.35)

which only changes the phase of the amplitudes but not the resulting intensities. One obtains

dT/dz = (πi/t−H) S
dS/dz = 2πisS + (πi/tH) T   (A.36)

These are the Darwin-Howie-Whelan equations (10,11) of the two-beam dynamical diffraction theory. The solution for a perfect crystal (i.e., s is constant) is easily obtained by the standard procedure used to solve systems of coupled first order differential equations; one finds

T = [cos(πσH z) − i(sH/σH) sin(πσH z)] exp(πisH z)
S = [i/(σH tH)] sin(πσH z) exp(πisH z)   (A.37)

where

σH^2 = (1 + sH^2 tH^2)/tH^2   (A.38)

The scattered intensity is thus given by the square modulus of S,

IS = SS* = sin^2(πσH z)/(σH tH)^2   (A.39)

where S* denotes the complex conjugate of S and IT = 1 − IS since absorption is neglected. Formula (A.39) is the homolog of formula (A.26), found in the kinematical approximation. Note that the depth period, which is 1/sH in the kinematical case, now becomes 1/σH. There is no longer a divergence for sH → 0. Equations (A.34) describe the periodic transfer of electrons from the transmitted beam into the scattered beam and vice versa; this effect is called the Pendellösung effect because of its similarity with the behavior of two coupled pendulums or two coupled oscillating circuits. Equation (A.37) describes the periodic depth variations of the diffracted and transmitted intensity, as well as the variation as a function of the excitation error s. Equation (A.39) is called the rocking curve. In an undeformed wedge-shaped specimen the depth variation gives rise to thickness extinction contours, which are parallel to the cutting edge of the wedge (Fig. 40). In a bent plane-parallel specimen the lines of constant s give rise to equi-inclination or bent contours (Fig. 41). It can be shown that taking absorption into account the shape of the rocking curve becomes asymmetric in s for the transmitted beam (Fig. 42) whereas it remains symmetric for the scattered beam. [A similar effect occurs in X-ray diffraction: the Borrmann effect (22).] The steep slope of the transmitted intensity in the vicinity of s = 0 is exploited in imaging strain fields due to dislocations and other defects.

Figure 40. Illustration of the formation of thickness extinction contours.

Two-Beam Dynamical Theory for Faulted Crystals

Displacement Fields of Defects

In transmission electron microscopy defects are characterized by their displacement fields R(r). The simplest example is the stacking fault, for which R(r) is a step function, R = 0 for z < z1, and R = R0 for z1 < z < z0, z1 being the level of the stacking fault plane in the foil and z0 being the foil thickness. The exit part of the foil is displaced over a vector R0 with respect to the entrance part (Fig. 15). At the level of the interface the diffracted beam undergoes a relative phase shift given by

α = 2πH · R0   (A.40)

whereas the transmitted beam is unaffected. The amplitude TS of the transmitted beam for the foil containing
a stacking fault parallel to the foil plane can thus be formulated as (Fig. 42, left)

TS = T1 T2 + S1 S2^− exp(iα)   (A.41)

The expressions T1, T2, S1, S2 refer to the amplitudes for perfect foils. The indices 1 and 2 refer to the entrance part and the exit part, respectively; the minus sign indicates that the excitation error is −s in the corresponding expression because the diffraction vector is −H. Similarly the diffracted beam amplitude can be expressed as (Fig. 42, right)

SS = T1 S2 exp(−iα) + S1 T2^−   (A.42)

The meaning of Eq. (A.41) is obvious; it expresses that the transmitted beam results from the interference between the doubly transmitted beam and the doubly scattered beam. The factor exp(iα) takes into account the phase shift over α of the beam scattered by the exit part. Equation (A.42) has a similar meaning. In Eq. (A.42) the phase shift is −α because the phase shifts are opposite in sign for S2 and S2^−. Detailed expressions can be obtained by replacing T1, T2, S1, S2 by their explicit expressions in Eq. (A.37). If the fault plane is inclined with respect to the foil plane, the phase change α takes place at a level z1, which now depends on position x along the foil. For instance, in Fig. 43, z1 becomes a linear function of x. As a result TS and SS become quasiperiodic functions not only of z1, but also of x. For s = 0 the depth period is equal to tH; for s ≠ 0 it becomes 1/σH, where σH is given by Eq. (A.38).

Figure 41. Illustration of the formation of equi-inclination (bent) contours.

Figure 42. Schematic representation of the interfering waves in the case of a foil containing a planar interface. Left: transmitted amplitude; right: scattered amplitude.

Figure 43. Cross section of a foil containing a stacking fault in an inclined plane, illustrating the formation of stacking-fault fringes. (a) According to the kinematical theory (s ≠ 0); (b) according to the dynamical theory (s = 0).
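A short sketch of how Eqs. (A.37), (A.41), and (A.42) produce the fringe patterns discussed above: for an inclined fault the depth z1 of the fault plane varies across the foil, and evaluating TS and SS as a function of z1 gives bright-field and dark-field fringe profiles. The foil thickness, extinction distance, excitation error, and phase angle α below are assumed values.

import numpy as np

def T_S_amplitudes(z, s, tH):
    """Transmitted and scattered amplitudes of a perfect foil of thickness z, Eq. (A.37)."""
    sigma = np.sqrt(1.0 + (s * tH) ** 2) / tH            # Eq. (A.38)
    phase = np.exp(1j * np.pi * s * z)
    T = (np.cos(np.pi * sigma * z) - 1j * (s / sigma) * np.sin(np.pi * sigma * z)) * phase
    S = (1j / (sigma * tH)) * np.sin(np.pi * sigma * z) * phase
    return T, S

# Assumed two-beam parameters (illustrative only).
t, tH, s, alpha = 250.0, 50.0, 0.005, 2 * np.pi / 3      # nm, nm, 1/nm, rad

z1 = np.linspace(0.0, t, 400)              # depth of the inclined fault plane
T1, S1 = T_S_amplitudes(z1, s, tH)         # entrance part (thickness z1)
T2, S2 = T_S_amplitudes(t - z1, s, tH)     # exit part (thickness t - z1)
T2m, S2m = T_S_amplitudes(t - z1, -s, tH)  # exit part for the -H reflection

IT = np.abs(T1 * T2 + S1 * S2m * np.exp(1j * alpha)) ** 2    # Eq. (A.41), bright field
IS = np.abs(T1 * S2 * np.exp(-1j * alpha) + S1 * T2m) ** 2   # Eq. (A.42), dark field

print("bright-field fringe intensity range:", IT.min(), "-", IT.max())
print("dark-field  fringe intensity range:", IS.min(), "-", IS.max())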
Strained Crystals Strain fields and lattice defects are characterized by their displacement fields R(r): the atom that was at r before deformation will be found at r + R(r) after deformation. A twin boundary with a small twinning vector (domain boundary) parallel to the foil plane at the level z1 (Fig. 15) can, for instance, be represented by the displacement field R = 0 for z < z1 and R = kz for z > z1 . A pure screw dislocation can be described by the function R = b(θ/2π ), where θ is the azimuth angle measured in the plane perpendicular to b; all displacements are parallel to b. The Darwin-Howie-Whelan Eqs. (A.36) can be adapted to the case of a deformed crystal by the substitution
s ⇒ seff = s + H · (dR/dz)   (A.43)
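A minimal sketch of a column-approximation contrast calculation: Eqs. (A.36) are integrated down a single column with the depth-dependent effective excitation error of Eq. (A.43), using a fixed-step Runge-Kutta scheme. The displacement-field term is modelled here by an invented smooth bump rather than the strain field of a real defect, and all parameters are placeholder values.

import numpy as np

tH = 50.0                      # extinction distance (nm), assumed
s0 = 0.002                     # nominal excitation error (1/nm), assumed
t = 200.0                      # foil thickness (nm), assumed

def s_eff(z):
    """Effective excitation error, Eq. (A.43), with H . dR/dz modelled as a smooth
    bump centred at mid-depth (an invented stand-in for a real displacement field)."""
    return s0 + 0.01 * np.exp(-((z - t / 2) / 10.0) ** 2)

def derivs(z, y):
    T, S = y
    dT = (1j * np.pi / tH) * S                           # Eq. (A.36), with t_-H = t_H
    dS = 2j * np.pi * s_eff(z) * S + (1j * np.pi / tH) * T
    return np.array([dT, dS])

# Fixed-step fourth-order Runge-Kutta integration down the column.
nz = 2000
dz = t / nz
y = np.array([1.0 + 0j, 0.0 + 0j])                       # T = 1, S = 0 at the entrance face
z = 0.0
for _ in range(nz):
    k1 = derivs(z, y)
    k2 = derivs(z + dz / 2, y + dz / 2 * k1)
    k3 = derivs(z + dz / 2, y + dz / 2 * k2)
    k4 = derivs(z + dz, y + dz * k3)
    y = y + dz / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    z += dz

IT, IS = abs(y[0]) ** 2, abs(y[1]) ** 2
print(f"bright-field intensity {IT:.3f}, dark-field intensity {IS:.3f}, sum {IT + IS:.3f}")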
The Multislice Method

The two-beam dynamical treatment is insufficient for the general situation in which high resolution images are taken with the incident beam along a zone axis where many diffracted beams are involved. Therefore the multislice method was developed as a numerical method to compute the exit wave of an object. Although the multislice formula can be derived from quantum-mechanical principles, we follow a simplified version (23) of the more intuitive original optical approach (24). A more rigorous treatment is given in the next section. Consider a plane wave, incident on a thin specimen foil and nearly perpendicular to the incident beam direction z. If the specimen is sufficiently thin, we can assume the electron to move approximately parallel to z so that the specimen acts as a pure phase object with transmission function [Eq. (A.5)]

ψ(x, y) = exp[iσ Vp(x, y)]   (A.44)

A thick specimen can now be subdivided into thin slices, perpendicular to the incident beam direction. The potential of each slice is projected into a plane that acts as a two-dimensional phase object. Each point (x, y) of the exit plane of the first slice can be considered as a Huyghens source for a secondary spherical wave with amplitude ψ(x, y) (Fig. 44). Now the amplitude ψ(x′, y′) at the point (x′, y′) of the next slice can be found by the superposition of all spherical waves of the first slice, that is, by integration over x and y, yielding

ψ(x′, y′) = ∫∫ exp[iσ Vp(x, y)] [exp(2πikr)/r] dx dy   (A.45)

When |x − x′| ≪ ε and |y − y′| ≪ ε, with ε the slice thickness, the Fresnel approximation can be used, that is,

r = √[(x − x′)^2 + (y − y′)^2 + ε^2] ≈ ε[1 + (x − x′)^2/(2ε^2) + (y − y′)^2/(2ε^2)]   (A.46)

so that

ψ(x′, y′) ≈ [exp(2πikε)/ε] ∫∫ exp[iσ Vp(x, y)] exp{(iπk/ε)[(x − x′)^2 + (y − y′)^2]} dx dy   (A.47)

which, apart from constant factors, can be written as a convolution product:

ψ(x, y) = exp[iσ Vp(x, y)] ∗ exp[iπk(x^2 + y^2)/ε]   (A.48)

where the convolution product of two functions is defined as (in one dimension)

f(x) ∗ g(x) = ∫ f(x′) g(x − x′) dx′   (A.49)

Figure 44. Schematic representation of the propagation effect of electrons between successive slices of thickness ε.
If the wave function at the entrance face is ψ(x, y, 0) instead of a plane wave, one has for the wave function at the exit face

ψ(x, y, ε) = {ψ(x, y, 0) exp[iσ Vp(x, y)]} ∗ exp[iπk(x^2 + y^2)/ε]   (A.50)

This is the Fresnel approximation in which the emerging spherical wavefront is approximated by a paraboloidal wavefront. The propagation through the vacuum gap from one slice to the next is thus described by a convolution product in which each point source of the previous slice contributes to the wave function in each point of the next slice. The motion of an electron through the whole specimen can now be described by an alternation of phase object transmissions (multiplications) and vacuum propagations (convolutions). In the limit of the slice thickness ε tending to zero, this multislice expression converges to the exact solution of the nonrelativistic Schrödinger equation in the forward-scattering approximation. In the original multislice method one used the Fourier transform of Eq. (A.50), where the real space points (x, y) are transformed into diffracted beams g and where convolution and normal products are interchanged, that is,

ψ(g, ε) = [ψ(g, 0) ∗ exp(iσ Vg)] exp(iπg^2 ε/k)   (A.51)
where Vg are the structure factors (Fourier transforms of the unit-cell potential). The wave function at the exit face of the crystal can now be obtained by successive application of Eq. (A.50) or (A.51). This can either be done in real space [Eq. (A.50)] or in reciprocal space [Eq. (A.51)]. The major part of the computing time is required for the calculation of the convolution product, which is proportional to N2 [N is the number of sampling points (real space) or beams (reciprocal space)]. Since the Fourier transform of a convolution product yields a normal product (with calculation time proportional to N) a large gain in speed can be obtained by alternatingly performing the propagation in reciprocal space and the phase object transmission in real space (23). In this way the computing time is devoted to the Fourier transforms and is proportional to N log2 N. Another way of increasing the speed is in the so-called real-space method (24). Here the whole calculation is done in real space using Eq. (A.50) but the forward scattering of the electrons is exploited so as to calculate the convolution effect of the propagation only in a limited number of adjacent sampling points. In this way, the calculation time is proportional to N. This method does not require a periodic crystal and is thus suitable for calculation of crystal defects.
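The alternation of phase-object transmission and Fresnel propagation described by Eqs. (A.50) and (A.51) can be sketched compactly with fast Fourier transforms; one common sign convention for the propagator is used below. The projected potential, sampling, and physical constants are placeholder values, so this is an illustration of the loop structure rather than a validated simulation.

import numpy as np

# Sampling and (assumed) physical parameters.
N, a = 128, 2.0            # grid points and cell size (nm)
lam = 2.5e-3               # electron wavelength (nm), roughly 200 kV
sigma = 7e-3               # interaction constant (rad / V nm), assumed
eps = 0.2                  # slice thickness (nm)
nslices = 50

# Assumed projected potential per slice: a single Gaussian "column" (V nm).
x = (np.arange(N) - N / 2) * a / N
X, Y = np.meshgrid(x, x)
Vp = 30.0 * np.exp(-(X**2 + Y**2) / (2 * 0.1**2))

# Phase grating [Eq. (A.44)/(A.50)] and Fresnel propagator in reciprocal space.
grating = np.exp(1j * sigma * Vp)
q = np.fft.fftfreq(N, d=a / N)             # spatial frequencies (1/nm)
QX, QY = np.meshgrid(q, q)
propagator = np.exp(-1j * np.pi * lam * eps * (QX**2 + QY**2))

psi = np.ones((N, N), dtype=complex)       # plane-wave incidence
for _ in range(nslices):
    # multiply by the phase grating in real space, propagate in reciprocal space
    psi = np.fft.ifft2(np.fft.fft2(psi * grating) * propagator)

print("max |psi|^2 at the exit face:", np.abs(psi).max() ** 2)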
Electron Channeling

The multislice method is an efficient method to compute numerically the exit wave of an object. However, it obscures interesting physical aspects of dynamical electron scattering. The channelling theory is more approximate (1,25) (although improvements are currently being made), but it is simple and it gives much physical insight.

Electron Wave

Consider an isolated column of atoms, parallel to the electron beam. If we assume that the fast electron in the direction of propagation (z axis) behaves as a classical particle with velocity v = hk/m, we can consider the z axis as a time axis with

t = mz/(hk)   (A.52)

Hence we can start from the time-dependent Schrödinger equation

−(h/i) ∂ψ(R, t)/∂t = Hψ(R, t)   (A.53)

with

H = −(h^2/2m) ΔR − eU(R, t)   (A.54)

with U(R, t) the electrostatic crystal potential, m and k the relativistic electron mass and wave vector, and ΔR the Laplacian operator acting in the plane (R) perpendicular to z. Using Eq. (A.52) we then have

∂ψ(R, z)/∂z = (i/4πk)[ΔR + V(R, z)]ψ(R, z)   (A.55)

with

V(R, z) = 2meU(R, z)/h^2   (A.56)

This is the well-known high-energy equation in real space, which can also be derived from the stationary Schrödinger equation in the forward-scattering approximation (22). If we now consider the depth proportional to the time, the dynamical Eq. (A.55) represents the walk of an electron in the two-dimensional projected potential of the columns. The solution can be expanded in eigenfunctions (eigenstates) of the Hamiltonian

ψ(R, z) = Σn Cn φn(R) exp[−iπ(En/E)(z/λ)]   (A.57)

where

Hφn(R) = En φn(R)   (A.58)

with the Hamiltonian

H = −(h^2/2m) ΔR − eU(R)   (A.59)
U(R) is the projected potential of the column, and

E = h^2 k^2/(2m)   (A.60)

E is the incident electron energy, and λ is the electron wavelength. For En < 0 the eigenstates are bound to the column. We now rewrite Eq. (A.57) as

ψ(R, z) = Σn Cn φn(R) + Σn Cn φn(R){exp[−iπ(En/E)(z/λ)] − 1}   (A.61)

The coefficients Cn are determined from the boundary condition

Σn Cn φn(R) = ψ(R, 0)   (A.62)

In case of plane-wave incidence one thus has

Σn Cn φn(R) = 1   (A.63)

so that

ψ(R, z) = 1 + Σn Cn φn(R){exp[−iπ(En/E)(z/λ)] − 1}

Only states will appear in the summation for which

|En| ≥ Eλ/z   (A.64)

These are bound states with deep energy levels that are localized near the column cores. In practice if the column does not consist of heavy atoms and the distance between columns is not too close (e.g., larger than 0.1 nm) only one eigenstate will appear, which can be compared to the 1s state of an atom. We then have

ψ(R, z) = 1 + Cφ(R){exp[−iπ(E/E0)(z/λ)] − 1}   (A.65)

A very interesting consequence of this description is that, since the state φ is very localized at the atom core, the wave function for the total object can be expressed as a superposition of the individual column functions φi so that Eq. (A.65) in that case becomes

ψ(R, z) = 1 + Σi Ci φi(R − Ri){exp[−iπ(E/E0)(z/λ)] − 1}   (A.66)

where the summation runs over all the atomic columns of the object parallel to the electron beam. The interpretation of Eq. (A.66) is simple. Each column i acts as a channel in which the wave function oscillates periodically with depth. The periodicity is related to the "weight" of the column, that is, proportional to the atomic number of the atoms in the column and inversely proportional to their distance along the column. The importance of these results lies in the fact that they describe the dynamical diffraction for larger thicknesses than the usual phase-grating approximation and that they require only the knowledge of one function φi per column (which can be set in tabular form similar to atom scattering factors or potentials). Furthermore, even in the presence of dynamical scattering, the wave function at the exit face still retains a one-to-one relation with the configuration of columns for perfect crystals as well as for defective crystals provided they consist of columns parallel to the electron beam. Hence this description is very useful for interpreting high-resolution images.

Diffraction Pattern

Fourier transforming the wave function Eq. (A.66) at the exit face of the object yields the wave function in the diffraction plane, which can be written as

ψ(g, z) = Σi exp(−2πig · Ri) Fi(g, z)   (A.67)

In a sense the simple kinematical expression for the diffraction amplitude holds, provided the scattering factor for the atoms is replaced by a dynamical scattering factor for the columns, which is defined by

Fi(g, z) = {exp[−iπ(Ei/E)(z/λ)] − 1} Ci fi(g)   (A.68)
with fi (g) the Fourier transform of φi (R). It is clear that the dynamical scattering factor varies periodically with depth. This periodicity may be different for different columns. In case of a monatomic crystal, all Fi are identical. Hence ψ(g, z) varies perfectly periodically with depth. In a sense the electrons are periodically transferred from the central beam to the diffracted beams and back. The periodicity of this dynamical oscillation (which can be compared with the Pendel¨osung effect) is called the dynamical extinction distance. It has, for instance, been observed in Si(111). An important consequence of Eq. (A.67) is the fact that the diffraction pattern can still be described by a kinematical type of expression so that existing results and techniques (e.g., extinction rules) that have been based on the kinematical theory remain valid to some extent for thicker crystals in zone orientation.
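The column picture of Eqs. (A.65) and (A.66) can also be sketched numerically: each column contributes a localized state, modelled below by a Gaussian (an assumption, not the true 1s-like eigenstate), whose contribution oscillates periodically with depth. All numerical values are illustrative.

import numpy as np

# Assumed parameters for the sketch (not tabulated column data).
lam = 2.5e-3        # electron wavelength (nm)
E0 = 200e3          # incident energy (eV)
E1s = -150.0        # bound-state energy of the column state (eV), assumed
C = 1.0             # excitation coefficient, assumed
width = 0.04        # width of the Gaussian column state (nm), assumed

columns = [(-0.2, 0.0), (0.2, 0.0)]       # two column positions (nm), assumed

def phi(X, Y, x0, y0):
    """Localized column state, modelled as a Gaussian of unit peak amplitude."""
    return np.exp(-((X - x0)**2 + (Y - y0)**2) / (2 * width**2))

def exit_wave(X, Y, z):
    """Eq. (A.66): plane wave plus oscillating column contributions."""
    osc = np.exp(-1j * np.pi * (E1s / E0) * z / lam) - 1.0
    psi = np.ones_like(X, dtype=complex)
    for (x0, y0) in columns:
        psi += C * phi(X, Y, x0, y0) * osc
    return psi

x = np.linspace(-0.5, 0.5, 101)
X, Y = np.meshgrid(x, x)
for z in (2.0, 5.0, 10.0):                # depths in nm
    I = np.abs(exit_wave(X, Y, z)) ** 2
    print(f"z = {z:4.1f} nm: intensity on a column = {I[50, np.argmin(np.abs(x + 0.2))]:.2f}")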
ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING
SANDRA S. EATON GARETH R. EATON
University of Denver, Denver, CO
PRINCIPLES OF EPR IMAGING

Electron paramagnetic resonance (EPR) imaging maps the spatial distribution of unpaired electron spins. Information about a sample can be obtained from images that have one, two, or three spatial dimensions. Images also can be obtained that have an additional spectral dimension that reveals the dependence of the EPR spectrum on the position in the sample. One can then seek to understand why certain types of spins are found at particular locations in a sample, why the concentrations vary with position, or why the concentrations vary with time. For the benefit of readers who are not familiar with EPR, a brief introduction to the principles of EPR is given in the next section. The emphasis is on the aspects of EPR that impact the way that imaging experiments are performed. In the section following, we discuss types of species that have unpaired electrons, also called paramagnetic species, that have been studied by EPR imaging. Subsequent sections discuss procedures for EPR imaging and provide examples of applications. More extensive introductions to EPR imaging can be found in Refs. 1–3. Reviews of the EPR imaging literature for the years 1990 to 1995 (4) and 1996 to early 2000 (5) provide many additional references.
Principles of EPR

Resonance Condition. EPR, also known as electron spin resonance (ESR) or electron magnetic resonance (EMR), studies unpaired electrons by measuring the absorption of energy by the spin system in the presence of a magnetic field. Molecules or materials that have unpaired electrons are called paramagnetic. For the purposes of this discussion, we consider only paramagnetic species that have a single unpaired electron, because they are more likely to be amenable to EPR imaging. More comprehensive introductions to EPR are available in Ref. 2 and in standard texts (6). Many of the physical principles are similar to those of nuclear magnetic resonance (NMR). When an electron is placed in a magnetic field, the projection of the electron's magnetic moment on the axis defined by the external field (usually designated as the z axis) can take on only one of two values, +1/2 and −1/2, in units of ħ (Planck's constant divided by 2π). This restriction to only a small number of allowed states is called quantization. The separation between the two energy levels for the unpaired electron is proportional to the magnetic field strength B (Fig. 1).
Figure 1. The splitting of electron spin energy levels increases proportionally to magnetic-field strength. Transitions between the two energy levels are stimulated by electromagnetic radiation when hν = gβe B. The customary display in EPR spectroscopy is the derivative of the absorption line that is shown here.

Unlike many of the other spectroscopies described in this encyclopedia, EPR involves interaction of the unpaired electron with the magnetic component of electromagnetic
radiation, rather than with the electric component. When electromagnetic radiation whose energy is equal to the separation between the spin energy levels is applied to the sample, energy is absorbed by the spins, and transitions occur between the spin states. This resonance condition is defined by hν = gβe B, where h is Planck's constant, ν is the frequency of the electromagnetic radiation, βe is the Bohr magneton, B is the magnetic field strength, and g is a characteristic value for a particular paramagnetic species. EPR experiments are most commonly performed at microwave frequencies of about 9.0–9.5 GHz (in the frequency band that is commonly called X band). g values of organic radicals are typically close to 2.0, so resonance at X band for an organic radical occurs at magnetic fields of about 3200 to 3400 G (1 gauss = 0.1 mT). Samples are placed in a structure that is called a resonator. At X band, the most common resonator is a rectangular box, called a cavity, whose dimensions are about 2.3 × 1.1 × 4.3 cm. The maximum usable dimension of a sample is about 1.0 cm, provided that the sample does not absorb too much microwave energy. Other resonant structures are typically used at the lower frequencies that are used for in vivo imaging. In a continuous wave (CW) experiment, the microwave frequency and power are held constant, and the magnetic field is swept through resonance to record the spectrum. Because magnetic-field modulation and phase-sensitive detection are traditionally used, the first derivative of the absorption spectrum is recorded. The first-derivative display also provides resolution enhancement, which is advantageous because many EPR lines have rather large line widths. Factors related to the choice of the microwave frequency at which to perform imaging experiments are discussed later.

Hyperfine Splitting. In real samples, an unpaired electron is typically surrounded by many nuclear spins
that contribute to the net magnetic field experienced by the electron spin. The energy required for the electron spin transition depends on the quantized spin states of neighboring nuclei, which results in splitting the EPR signal into multiple lines. This is called hyperfine splitting. The number of lines in the EPR signal is equal to 2nI + 1 where n is the number of equivalent nuclei and I is the nuclear spin. Thus, interaction with one 14 N nucleus (I = 1) causes splitting into three lines, and interaction with three equivalent protons (I = 1/2) causes splitting into four lines. Splitting between adjacent lines is called the hyperfine splitting constant. The magnitude of the splitting constant in fluid solution, for a system tumbling rapidly enough to average electron–nuclear dipolar interactions to zero, depends upon the extent to which the unpaired electron spin is delocalized onto that nucleus. Hyperfine splitting constants of 10 to 20 G are common for organic radicals, and splitting constants of 100 G or more are common for transition-metal ions. Thus, a typical EPR spectrum of an organic radical may extend over tens of gauss. The hyperfine splitting constants are independent of magnetic field, so hyperfine splittings become increasingly large fractions of the resonant field, as the microwave frequency (and corresponding resonant magnetic field) is decreased. Rigid Lattice Spectra. The g values for most paramagnetic centers are anisotropic which means that the g value is different for different orientations of the molecule with respect to the magnetic field. When a paramagnetic center tumbles rapidly in solution, the g anisotropy is averaged, and a single g value is observed. However, when a sample that contains an unpaired electron is immobilized in a rigid lattice (a crystalline solid, an amorphous solid, a frozen solution, or a solution that formed a glass when it was cooled), the EPR spectrum is more complicated and extends over a wider range of magnetic fields than when the same material is in solution. The anisotropic dipolar contributions to hyperfine splitting, which were averaged in fluid solution, also contribute to spectral complexity in the solid state. Contributions to Line Widths. As discussed later, the spatial resolution of the image for most approaches to EPR imaging, is inversely proportional to the line width of the EPR signal. Thus, it is important to appreciate the more common factors that contribute to line width. EPR line widths for organic radicals typically are of the order of a gauss (1 gauss at g = 2 is 2.8 MHz). In some cases, line widths are relaxation-time determined and become temperature dependent. Frequently, unresolved nuclear hyperfine splitting and a distribution of chemical environments make a significant contribution to line widths. In fluid solutions, collisions between paramagnetic species (including oxygen) can cause EPR spectral line broadening, so solutions whose radical concentrations are higher than a few mM typically have line widths greater than those observed at lower concentrations. At concentrations higher than about 10 mM, collision broadening is usually severe and can cause loss of
hyperfine structure. At very high concentrations, exchange narrowing can cause collapse of the spectrum to a single line and give a line width that depends strongly on concentration. Solvent viscosity has an impact on line widths in samples that have significant g anisotropy because incomplete motional averaging of the anisotropy results in broadening the EPR signal. In rigid lattice samples, unresolved hyperfine splitting and distributions of g values and nuclear hyperfine splittings are major contributors to line widths. As the concentration of the paramagnetic species increases, there is increasing dipolar interaction between neighboring paramagnetic centers, which broadens the lines. Thus, for both fluid solution and rigid lattice spectra, concentrations of paramagnetic centers greater than a few mM significantly broaden the spectra and result in a loss of spatial resolution in EPR images. Ultimately, the resolution achievable in EPR imaging is limited by signal-to-noise (S/N), so broadening of the signal that occurs at high concentrations of paramagnetic centers (high S/N) places physical limits on achievable resolution.

EPR Imaging versus EPR Spectroscopy. EPR spectroscopy is performed in a magnetic field that is as uniform as possible. Often the spectrum is uniform through the sample, but if that is not the case, the EPR spectrum is a superposition of the signals from all positions in the sample. The goal in an imaging experiment is to distinguish between signals from different portions of the sample. An image typically displays EPR signal intensity as a function of one or more spatial dimensions. For samples in which the EPR line shape varies with position in the sample, the image may include line shape as an imaging dimension. Any EPR observable could potentially be an imaging dimension.

Species whose EPR Signals are Amenable to Imaging

Most EPR imaging experiments are performed at room temperature, so the species to be imaged must have a suitable EPR signal at room temperature. Many organic radicals in fluid solution are so unstable that their EPR signals do not persist long enough for an imaging experiment. There are important exceptions to this generalization, which has led to the widespread use of certain classes of organic radicals as probes in imaging experiments. Nitroxyl radicals have an unpaired electron delocalized in a nitrogen–oxygen π* molecular orbital. An example of this class of radicals is the molecule that is given the acronym Tempone (I). These compounds are also sometimes called nitroxide radicals or aminoxyl radicals. Radicals of this class can also be made with five-membered rings in place of the six-membered ring of Tempone and can have a variety of substituents on the 4-position of the ring. The carbon atoms adjacent to the N–O moiety of these molecules are substituted with methyl groups (as in Tempone) or other bulky side chains to protect the paramagnetic center sterically. Many nitroxyl radicals are commercially available. They are stable in solution, even in the presence of oxygen, for prolonged periods provided that no species are present that are strong enough oxidizing or reducing agents to destroy the paramagnetic center.
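As a small numerical illustration of the resonance condition hν = gβe B and the 2nI + 1 hyperfine rule described earlier, the sketch below computes the X-band resonant field and the expected number of hyperfine lines; the g value and microwave frequency are generic choices, not parameters of any experiment described in this article.

```python
# Constants (CODATA values)
h = 6.62607015e-34         # Planck constant, J s
beta_e = 9.2740100783e-24  # Bohr magneton, J/T

def resonant_field(nu_hz, g=2.0023):
    """Magnetic field (tesla) satisfying h*nu = g*beta_e*B."""
    return h * nu_hz / (g * beta_e)

def n_hyperfine_lines(n_equivalent, nuclear_spin):
    """Number of lines from n equivalent nuclei of spin I: 2nI + 1."""
    return int(2 * n_equivalent * nuclear_spin + 1)

B = resonant_field(9.5e9)                           # X band, ~9.5 GHz
print("X-band resonant field: %.0f G" % (B * 1e4))  # 1 T = 10^4 G
print("lines from one 14N (I = 1):", n_hyperfine_lines(1, 1))
print("lines from three equivalent 1H (I = 1/2):", n_hyperfine_lines(3, 0.5))
```

The computed field of roughly 3400 G is consistent with the 3200–3400 G range quoted above for g ≈ 2 radicals at X band.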
The EPR signals for nitroxyl radicals exhibit hyperfine splitting due to the nitrogen atom, which results in a three-line signal whose splitting is about 15 G in fluid solution. The magnitude of the hyperfine splitting and the relative widths of the three lines of the hyperfine pattern can be useful monitors of the environment of the radical but can also complicate the imaging experiment as discussed later. A second class of compounds that has been exploited more recently in imaging experiments is derivatives of the triarylmethyl radical. These radicals have the advantage that they do not contain nitrogen, which avoids splitting the signal into three hyperfine lines. The triarylmethyl radical has relatively low stability in air. However, highly substituted derivatives, such as II, are substantially more stable than the parent radical and can be made water-soluble by selecting the groups on the periphery (7). These radicals are called trityl radicals or triarylmethyl (TAM) radicals. The EPR signal for these radicals is a single relatively narrow line (that has many unresolved hyperfine components), which is very convenient for spatial imaging. Radicals entrapped in a solid may be more stable than the same species in solution. For example, the EPR signals due to certain radicals in organic chars and irradiated solids are quite stable and suitable for imaging.
[Structures I and II. I is the nitroxyl radical Tempone; II is a highly substituted, water-soluble triarylmethyl (trityl) radical whose drawn structure bears CD3 and S substituents on the aryl rings and a COO−K+ group.]
Many transition-metal ions have stable oxidation states and one or more unpaired electrons. However, the EPR signals of many of these metal ions at room temperature are quite broad, or there are many lines in the spectrum, which makes imaging based on magnetic-field gradients more difficult than for narrow-line signals.

Encoding Spatial Information using Magnetic-Field Gradients

The spatial information in most EPR imaging experiments is encoded using magnetic-field gradients. The basic principles underlying the use of gradients are illustrated in Fig. 2 for a single imaging dimension. Consider two samples that contain the same paramagnetic species for which the EPR signal is a single line. Let Bres designate the magnetic field that satisfies the resonance condition (hν = gβe Bres). In the absence of a magnetic-field gradient, if Bext (the externally applied uniform magnetic field) is swept as in the usual field-swept continuous wave EPR experiment, both samples give a single-line signal centered at Bext = Bres. Now, consider the impact of an applied magnetic-field gradient along the direction of the main magnetic field, which is defined as the z axis. The magnetic-field gradient can be expressed as ∂Bz/∂z, the change in the z component
of the magnetic field as a function of the z spatial coordinate. The magnetic-field gradient is generated so that at the center of the cavity, zo, there is zero gradient, and the total magnetic field Bo (the field at location zo) equals Bext. The applied magnetic-field gradient is such that at the left-hand sample (position zi) the gradient field contributes an amount Bi, and at the right-hand sample (position zj), the gradient contributes Bj. For the present example, let Bi be positive and Bj be negative. In the presence of the applied gradient, the sample at zi experiences a magnetic field that is larger than Bext by the amount Bi. Resonance occurs when Bext + Bi = Bres, so Bext = Bres − Bi. Similarly, the sample at zj experiences a magnetic field that is smaller than Bext by the amount Bj. Hence, the sample at zj will not be at resonance until Bext increases to Bres − Bj. Thus, when the external field is scanned to achieve resonance, the magnetic-field gradient distinguishes between the samples at different spatial locations in the EPR cavity. Therefore, we say that the magnetic-field gradient encodes spatial information into spectral information, which means that samples at different spatial positions appear as if they are different spectral components when the gradient is applied and the external field is varied to record a spectrum. The resulting spectrum in Fig. 2c is shown in the traditional first-derivative presentation.

Figure 2. Encoding of spatial information by using a magnetic-field gradient. (a) Two samples of the same paramagnetic material that has a single-line spectrum are placed in an EPR cavity at positions zi and zj. (b) A magnetic-field gradient is applied, so that the gradient is zero at position zo, positive at position zi, and negative at position zj. (c) During a CW field-swept EPR experiment, the two samples achieve resonance at different magnitudes of the slowly swept external field Bext. (Figure adapted from Ref. 3 and used with permission.)

Magnetic-field gradients are usually generated by passing electric currents through coils of wire. A pair of coils is used to generate a gradient along the direction of the external magnetic field (∂Bz/∂z), and the current flows in opposite directions in the two coils. To create gradients in the two perpendicular directions, pairs of coils shaped like a figure eight are employed. The current in the coils is varied to adjust the magnitude of the gradient. Fixed gradients can be generated by using ferromagnetic wedges (8).

Figure 3. One-dimensional X band spatial imaging of a sample composed of five small tubes that contain coal. Coal exhibits a single-line EPR spectrum in a normal nongradient EPR experiment. The five samples were the same height, as shown in the insert, but the coal in three of the tubes was diluted with NaCl. (a) First-derivative X band CW spectrum of the sample in the presence of a magnetic-field gradient of 103 G/cm. (b) One-dimensional image obtained by mathematical deconvolution of the nongradient spectrum from the spectrum shown in (a). (Figure reproduced from Ref. 23 and used with permission.)

One-dimensional Spatial Imaging. Consider the sample shown in Fig. 3. Coal has a strong, single-line EPR spectrum. Five tubes that contained coal, or coal mixed with NaCl, were put in the EPR cavity and oriented so that the five samples were spaced along the z direction. The normal (nongradient) X-band EPR spectrum exhibited a single line. When a magnetic-field gradient of 103 G/cm was applied and the external magnetic field was scanned, the spectrum in Fig. 3a was obtained. Five peaks were observed. The horizontal axis in the figure is the magnetic field. This is a complicated spectrum, but knowing the conditions of the experiment, one can see that each of the five samples of coal yielded a separate peak in the
CW EPR spectrum. To find the spatial position of the samples, the magnitude of the field gradient (103 G/cm in this case) is used to convert the spectral dimension to a spatial dimension. Because the EPR lines have finite width (importantly, they are all the same width in this case; see later), the line width can be removed mathematically by deconvolution. The result of deconvoluting the line shape from the first integral of the spectrum and converting the horizontal axis from gauss to distance is shown in Fig. 3b. This is a display of the intensity of the EPR signal as a function of position in the cavity. Figure 3b is a one-dimensional spatial image of unpaired spin density in the composite coal sample. The gradient is the scaling factor between the sweep width in gauss and the spatial axis in the image. For example, a 50-G scan of a sample in the presence of a 100-G/cm gradient generates an image whose spatial axis is 0.50 cm long. The intensity of the EPR signal is proportional to the concentration of coal in each tube.

The next step in developing the concepts of EPR imaging is to add hyperfine structure to the problem described in the preceding paragraph. So we substitute solutions of nitroxyl radicals for the coal samples and keep everything else the same. A nitroxyl radical in fluid solution in a uniform magnetic field exhibits a three-line spectrum whose splitting is about 15 G due to the coupling to the I = 1 14N nucleus. The X-band first-derivative CW EPR spectrum for the five-part sample in the presence of a 103-G/cm gradient is shown in Fig. 4a. Six lines are immediately evident, and some of them show further complications. Actually, there are 15 overlapping lines: three lines from each of the five samples. The appearance depends on the magnitude of the gradient (more about this later). Taking advantage of the fact that the nitroxyl EPR line widths and hyperfine splittings in all five samples are identical, the line shape including hyperfine splitting can be deconvoluted in the same way as was done for the coal sample. The result is Fig. 4b, which shows the spatial distribution of the signal intensity from the five samples prepared from different concentrations of the nitroxyl radical.
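The deconvolution and gauss-to-centimeter scaling described above can be sketched as follows. This is a schematic illustration with synthetic Lorentzian lines and made-up tube positions and concentrations, not the actual processing used for Figs. 3 and 4; in practice the first-derivative CW spectrum is integrated once before this step.

```python
import numpy as np

gradient = 103.0                                         # G/cm, as in Fig. 3
field = np.linspace(-25.0, 25.0, 1024, endpoint=False)   # field offset, G

def lorentzian(x, width):
    return (width / np.pi) / (x ** 2 + width ** 2)

# Zero-gradient (absorption) line shape, assumed identical for every tube.
lineshape = lorentzian(field, 0.5)

# Five hypothetical tubes: positions (cm) and relative spin densities.
positions = np.array([-0.4, -0.2, 0.0, 0.2, 0.4])
weights = np.array([1.0, 0.3, 1.0, 0.3, 1.0])

# With a gradient, each tube's line is shifted by gradient * z, so the
# gradient-on spectrum is the spin-density profile convolved with the line.
spectrum = sum(w * lorentzian(field - gradient * z, 0.5)
               for w, z in zip(weights, positions))

# Remove the line shape by Fourier division (a small constant tames noise).
L = np.fft.fft(lineshape)
eps = 1e-3 * np.abs(L).max()
profile = np.fft.fftshift(np.real(np.fft.ifft(np.fft.fft(spectrum) / (L + eps))))

z_axis = field / gradient                                # convert gauss to cm
print("spatial axis spans %.2f cm" % (z_axis[-1] - z_axis[0]))
print("strongest peak recovered near z = %.2f cm" % z_axis[profile.argmax()])
```

The division by the line-shape transform is the deconvolution step, and dividing the field axis by the gradient is the scaling step described in the text.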
Figure 4. One-dimensional X band spatial image of a phantom composed of five tubes that contain solutions of the nitroxyl radical Tempone. The five tubes were the same height, as shown in the insert, but the concentrations of radical in the solutions were different. The radical gives a three-line spectrum that has hyperfine splitting of about 15 G. The magnetic-field gradient was 103 G/cm. (a) First-derivative field-swept spectrum obtained in the presence of the gradient. (b) Image obtained by deconvoluting the nongradient line shape, including hyperfine splitting, from the first integral of the spectrum shown in (a). (Figure reproduced from Ref. 24 and used with permission.)

Addition of a Spectral Dimension. Key to the deconvolution that was used to obtain the spatial information in the one-dimensional images of the coal and nitroxyl phantoms was the assumption that the line shape of the EPR signal was invariant through the sample. This assumption is not valid for some important classes of samples. In samples that contain naturally occurring radicals or radicals induced by irradiation, there may be more than one radical present in the sample, and the relative concentrations of the radicals are likely to vary through the sample. Even when only one species is present, the line shape of the EPR signal may vary through the sample. For example, the line shape for a nitroxyl radical depends on the rate of molecular tumbling, which provides a way to monitor local viscosity. The broadening of a narrow-line signal from a trityl radical (II), a char, or a nitroxyl radical due to collisions with paramagnetic molecular oxygen can also be used to monitor local oxygen concentration, which is called oximetry. An attempt to deconvolute a line shape
from an image in which the line shape is not constant causes distortions and artifacts in the image. For these types of samples, it is important to include a spectral dimension in the image. An image that has one spectral dimension and one spatial dimension is called a spectral-spatial image. Rather than include a spectral dimension, the lines in some spectra can be instrumentally broadened to minimize line-shape differences, but line broadening reduces resolution in the image. If the motional information inherent in the nitroxyl line shape, for example, is not of interest, the sample can be cooled; this slows the motion for all environments and restores a broadened but uniform line shape (9).

An approach to generating a spectral-spatial image is illustrated in Fig. 5. This image was obtained from a phantom (a sample of known composition and geometry that is designed to test an imaging procedure) constructed of two tubes that contained a radical (DPPH) that gives a single-line EPR spectrum and a third tube that contained isotopically enriched 15N-Tempone, which gives a two-line EPR spectrum. The horizontal axis of the image is the spectral axis, the vertical axis is the spatial axis, and the spectral traces represent variations in signal intensity and line shape. A "view" of this image or pseudo-object is the shadow that an observer would see along a perpendicular plane when viewing the object from a particular direction. Three of these views are shown in the figure. Views are often called "projections." Views of a spectral-spatial object are obtained by varying the magnetic-field gradient and the magnetic-field scan in a particular way. Typically, 16 to 128 such views are obtained and combined mathematically to reconstruct the image shown in the figure. The reconstruction algorithms are common to various imaging modalities based on tomographic reconstruction; a generic sketch is given below. Detailed discussions related to EPR imaging can be found in Ref. 10.

Figure 5. X band spectral-spatial image and three of the projections used to create the image. Projections at higher angles are obtained from higher gradients and wider magnetic-field scans. The maximum magnetic-field gradient used to obtain the experimental projections that had the highest angle in the spectral-spatial plane was 400 G/cm. The sample consisted of two specks of solid DPPH, which gives a single-line EPR spectrum, and a tube that contained isotopically enriched 15N-Tempone. (Reproduced from Ref. 3 and used with permission.)
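A generic, unfiltered back-projection toy illustrates how such views can be combined into an image; the filtered algorithms actually used for EPR spectral-spatial reconstruction, and the mapping of gradient and sweep width onto projection angle, are described in Ref. 10. The pseudo-object, angles, and pixel positions below are arbitrary.

```python
import numpy as np
from scipy import ndimage

def views(obj, angles_deg):
    """Parallel 'views' of a 2-D pseudo-object: rotate, then sum along rows."""
    return [ndimage.rotate(obj, a, reshape=False, order=1).sum(axis=0)
            for a in angles_deg]

def back_project(view_list, angles_deg, size):
    """Unfiltered back-projection: smear each view along its direction and
    add.  Practical reconstructions use filtered back-projection."""
    recon = np.zeros((size, size))
    for v, a in zip(view_list, angles_deg):
        recon += ndimage.rotate(np.tile(v, (size, 1)), -a,
                                reshape=False, order=1)
    return recon / len(view_list)

# Toy pseudo-object: rows = spatial axis, columns = spectral axis.
obj = np.zeros((64, 64))
obj[20, 30] = 1.0                  # single-line species at one position
obj[44, 25] = obj[44, 38] = 1.0    # two-line species at another position

angles = np.linspace(0.0, 180.0, 36, endpoint=False)
recon = back_project(views(obj, angles), angles, 64)
print("brightest reconstructed pixel:",
      np.unravel_index(recon.argmax(), recon.shape))
```

The unfiltered sum produces a blurred estimate of the pseudo-object; filtering each view before back-projection sharpens it, which is why filtered algorithms are used in practice.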
Imaging by using Two or Three Spatial Dimensions. Spatial information for these images is encoded by using gradients in two or three directions. In each case, the net magnetic field is along the z axis, but the magnitude of the field varies with position along the x and/or y axes. The gradients are then ∂Bz/∂x and/or ∂Bz/∂y. Gradients along the axes are varied, and the magnetic field is scanned to provide "views" of the object, which are then reconstructed to generate the full image. Images that have a spectral dimension in addition to either two or three spatial dimensions can also be obtained by extending the method just described.

Resolution of the Image

To a first approximation, the resolution of an image is determined by the ratio of the line width of the EPR signal to the magnetic-field gradient. For example, if the line width is 1 G and the gradient is 100 G/cm, then the resolution is 1 G/(100 G/cm), which is 0.01 cm or 100 microns. A detailed discussion of resolution in magnetic resonance imaging can be found in Refs. 1 and 11. The resolution can be improved by deconvoluting the line shape, but practical considerations limit this enhancement to perhaps a factor of 3. In principle, increasing the gradient can increase the resolution, but again there are practical limits. Most EPR imaging experiments to date have used gradients less than 400 G/cm (<4.0 T/m). Higher gradients require higher electrical currents through the pairs of coils that are used to create the gradients, which requires greater heat dissipation. For the same current and coil dimensions, placing the coils closer together can generate a higher gradient. However, this further restricts the size of the object that can be imaged. A more fundamental limitation is experimental signal-to-noise. To estimate the spin concentration validly in a volume element (voxel) of an image requires a detectable number of spins per voxel. For X band EPR, current commercial spectrometers can detect about 10^10 spins in samples that have a single line 1 G wide. Thus, signal-to-noise becomes the fundamental limit on resolution for samples that have low signal intensity. For images of multiple dimensions, resolution also depends on the number of projections.

Size of Object and Localized Detection

To image an object by using magnetic-field gradients and traditional resonators requires that the object fit into the resonator. At X band (ca. 9.5 GHz), the maximum size of the object is a cylinder of radius about 10 mm and height about 25 mm. Such an object would need to have a small dielectric constant and not exhibit much nonresonant absorption of microwaves. For example, if the sample were aqueous, it would have to be much smaller. Even for an object whose dimensions are of the order of 10 mm, it is necessary to correct for nonuniformities in signal detection. One approach to imaging larger objects is to perform the imaging experiments by using lower frequencies of electromagnetic radiation, where the wavelength is longer and the dimensions of the resonators are larger, as discussed in the next section. Several approaches have been used to obtain EPR images of larger objects by localized detection. In these techniques, the EPR signal from selected regions of the sample is recorded sequentially, and the measurements are assembled into an image. If a hole is cut in a traditional resonant cavity, the magnitude of the magnetic field from the electromagnetic radiation is sufficiently high at the hole to permit signal detection from a sample outside the resonator. The EPR signal intensity at the field that is
determined by the external magnet can be detected for the segment of an object that is positioned immediately adjacent to the hole. By moving the object on a stage, it is possible to obtain a raster scan of signal intensity. In this method, the resolution of the image is determined by the size of the hole and the precision of the raster scan. Resolution can be improved by deconvoluting the response function of the hole and then the resolution again ultimately becomes limited by signal-to-noise. The size of the object that can be imaged by this method is limited by the size of the moving stage and the need to place the object between the poles of the magnet. This technique has been called EPR scanning microscopy (2). An example of an image obtained by this technique is described later in Mineralogy and Dating. Another method of localized detection uses a surface coil (12). The object must still fit between the poles of a magnet, but detection uses a coil of wire that can be placed at the surface of the object and does not require that the object fit inside a resonator. Because of constraints on dimensions, surface coils are more practical at frequencies lower than about 9 GHz.
Selection of Electromagnetic Frequency

Most modern EPR experiments have been performed at frequencies near 9.5 GHz, where the wavelength of the electromagnetic radiation is about 3 cm. There are several incentives for performing imaging experiments at lower frequency. The tendency for nonresonant absorption of energy is denoted as the dielectric lossiness of a sample. Dielectric loss is much greater at the microwave frequencies that are typically used in EPR than at the radio frequencies used in NMR, which limits the amount of sample that can be put in a resonator. For example, in a standard X band resonator, a lossy aqueous sample is limited to a cylindrical tube whose internal diameter is about 0.5 mm or to a flat cell whose maximum thickness is about 0.3 mm. Because of the frequency dependence of the dielectric loss, larger quantities of aqueous or other lossy samples can be used at lower microwave frequencies. The depth penetration into a sample is smaller for microwaves than for radio frequencies. It has been estimated that penetration into aqueous samples decreases from tens of centimeters at about 10 MHz to less than a millimeter at 10 GHz. This decrease occurs for samples inside a resonator or monitored with localized detection devices, which is the reason that in vivo EPR imaging is performed at frequencies less than about 1 GHz. These arguments might give the impression that lower frequency would always be preferable. However, there is a price to pay for operating at lower frequency. The EPR signal intensity decreases as the electromagnetic frequency decreases. The detailed dependence of signal intensity on frequency depends on many factors, including the structure of the resonator and whether the volumes of the sample and/or the resonator are scaled with frequency or held constant.
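The frequency dependence of penetration can be illustrated with a crude model of a salty aqueous sample that combines a single Debye relaxation for water with an ionic-conductivity term. The parameter values (static permittivity, relaxation time, conductivity) are generic textbook assumptions rather than numbers from this article, and the absolute depths depend strongly on them; the point is the steep decrease with increasing frequency.

```python
import numpy as np

eps0 = 8.854e-12   # vacuum permittivity, F/m
c = 2.998e8        # speed of light, m/s

def penetration_depth_m(freq_hz, sigma=1.5, eps_s=78.0, eps_inf=5.0, tau=8.3e-12):
    """1/e field penetration depth in a crude model of a salty aqueous
    sample: one Debye relaxation for water plus an ionic-conductivity term.
    All parameter values are generic assumptions, not data from the text."""
    w = 2.0 * np.pi * freq_hz
    eps_c = eps_inf + (eps_s - eps_inf) / (1.0 + 1j * w * tau) \
            - 1j * sigma / (w * eps0)
    k = (w / c) * np.sqrt(eps_c)       # complex propagation constant
    return 1.0 / abs(k.imag)

for f in (1e7, 1e8, 1e9, 1e10):
    print("%8.0f MHz: penetration ~ %6.1f mm" % (f / 1e6, 1e3 * penetration_depth_m(f)))
```

With these assumed parameters the model reproduces the qualitative trend quoted above: penetration of the order of ten centimeters near 10 MHz shrinking to the millimeter scale at 10 GHz.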
Pulsed EPR Imaging

Pulsed EPR imaging is currently limited to samples that have relatively long electron spin relaxation times. This technique has been demonstrated for a trityl radical in a phantom and in a mouse tail at 250 MHz (13), for a single crystal of an organic conductor at 300 MHz (14), and for selected radicals, primarily in solids, at X band (1,4). Even for these samples, the relaxation times are short enough to constrain the pulse sequences that can be employed to encode information. The maximum magnetic-field gradient that could be employed is limited by the decrease in the effective electron spin relaxation time in the presence of the gradient. Development of pulsed EPR imaging requires spectrometers that have shorter dead time between the end of the pulse and the beginning of data acquisition and very fast data acquisition.

EPR Imaging versus NMR Imaging

Readers may be more familiar with nuclear magnetic resonance (NMR) imaging, which is also called magnetic resonance imaging (MRI) (see MAGNETIC RESONANCE IMAGING), than with EPR imaging. Because both NMR and EPR are magnetic resonance techniques, one might assume that NMR and EPR imaging would be performed similarly. Both techniques examine spins in the presence of a magnetic field: NMR looks at nuclear spins, and EPR looks at electron spins. However, there are many differences in the practical aspects of the two techniques. The most striking contrast between typical NMR imaging and most EPR imaging experiments is that the NMR experiments use pulses of electromagnetic radiation, whereas in most EPR experiments the electromagnetic radiation is continuous, that is, steady state, so the experiments are designated as "continuous wave," or CW, experiments. Pulsed NMR is used because more averages of the signal can be achieved per unit time than in a CW experiment. Several factors make pulsed techniques less generally applicable in EPR than in NMR:

1. The time that is available to record the response of the spins after a pulse is determined by the electron spin relaxation time, the time constant for the process that returns the populations of spin energy levels to equilibrium. Electron spin relaxation times are typically much shorter than nuclear spin relaxation times. At room temperature, electron spin relaxation times typically are of the order of microseconds for organic radicals. Relaxation times for transition-metal ions typically are much shorter than for organic radicals. By contrast, nuclear spin relaxation times frequently are of the order of milliseconds or longer. Thus, pulsed EPR experiments must be conducted on timescales about 1000 times faster than NMR experiments, microseconds instead of milliseconds. Pulsed NMR experiments also use pulsed magnetic-field gradients. It is difficult to pulse magnetic-field gradients of the magnitude required for EPR imaging (see below) on a timescale of microseconds.
Furthermore, the effective relaxation time for electron spins becomes even shorter in the presence of the gradient that is used to encode the spatial information than in "normal" nongradient spectra.

2. EPR spectra frequently extend over tens of megahertz (10 G = 28 MHz), whereas fluid solution NMR spectra for many nuclei extend over only tens of kilohertz. To excite such a wide range of frequencies requires short, high-power pulses. Pulsed EPR spectrometers typically require kilowatt microwave amplifiers, which are much more expensive than the lower power RF amplifiers that are adequate for NMR imaging. Even using high-power amplifiers, it is often difficult to excite the full extent of an EPR spectrum, although full excitation is straightforward for most NMR experiments.

The ultimate limit on resolution is the requirement for adequate signal-to-noise for each volume element (voxel) in an image. NMR typically images protons that are present at high concentration in many samples. For example, the concentration of protons in water is 110 M. In most samples of interest, unpaired electrons are present at concentrations lower than millimolar. Furthermore, increasing the concentration of unpaired electrons results in broadening of the EPR lines, which can make the signal harder to detect. The concentration of electron spins is thus more than five orders of magnitude lower than that of nuclear spins; this is partially compensated for by the greater inherent sensitivity of EPR spectroscopy compared to NMR spectroscopy, but sensitivity remains a challenge for EPR imaging. The problem of low signal intensity is even more severe at the lower frequencies used for in vivo imaging. Most EPR spectra consist of multiple lines due to nuclear hyperfine splitting, which divides the signal intensity among several lines and further decreases sensitivity. Hyperfine splitting can be used to characterize the species that is present, but it requires including the spectral dimension as an additional dimension in the image if the line shape is not uniform through the sample. The additional dimension increases the time required to acquire the data for an image and presents additional challenges concerning image visualization.

Although NMR imaging is readily performed on relatively large samples, most current techniques for EPR imaging are limited to smaller samples. In conventional EPR spectrometers, the sample is positioned in a cavity or other structure that is resonant at the microwave frequency of the instrument. The dimensions of the resonator are determined by the wavelength of the microwaves, which limits sample size as discussed in the first section. Objects too large to fit inside a resonator have been examined by some of the localized detection techniques that are described earlier.

EXAMPLES OF APPLICATIONS OF EPR IMAGING

EPR imaging has been applied to a variety of systems. In this article, a few examples are selected to highlight results obtained by a range of methods and to demonstrate the
variety of information that can be obtained. The examples are selected as "case studies" of uses of EPR imaging.

Diffusion Coefficients

A major application of EPR imaging is the study of diffusion coefficients, typically using nitroxyl radicals as probes, in a wide variety of homogeneous and heterogeneous media, including membranes and liquid crystals (1,15). The nitroxyl radical is initially present in a small region of the sample. The radical diffuses into the surrounding media as a function of time. The concentration of the radical can be selected so that there is little spatial variation in the line shape and so that the line shape can be removed from the experimental data by deconvolution, as discussed earlier. The initial spatial distribution of radicals and the geometry for diffusion can be designed so that only a single spatial dimension is required in the image. Examples of spatial distributions obtained by one-dimensional imaging and deconvolution of the nitroxyl line shape are shown in Fig. 6 for the isotropic phase of a liquid crystal (8). If the signal intensity extends across more than a few millimeters, it is necessary to account for the variation of EPR detection sensitivity as a function of position in the resonator in analyzing the signal intensities. The dashed lines in Fig. 6 were calculated by adjusting the diffusion coefficient in a theoretical expression for diffusion from a point source, followed by multiplying the predicted distribution by the cavity sensitivity function. The calculated curves agree well with the experimental data. Successive measurements of the spatial profile give consistent values of the diffusion coefficient. CW EPR imaging has been used to measure diffusion coefficients D in the range of 10^−5 to 10^−7 cm^2/s (15).
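The fitting procedure described above can be sketched schematically: a point-source diffusion profile C(z, t), weighted by an assumed cavity sensitivity S(z), is least-squares fitted to an imaged concentration profile to extract D. The sensitivity function, noise level, and diffusion time used below are illustrative assumptions, not the experimental values of Ref. 8.

```python
import numpy as np
from scipy.optimize import curve_fit

def sensitivity(z_cm):
    """Assumed (not measured) cavity sensitivity profile S(z)."""
    return np.cos(np.pi * z_cm / 1.4) ** 2

def model(z_cm, amplitude, D_e6, t=3.64e4):
    """amplitude * C(z, t) * S(z), with D given in units of 1e-6 cm^2/s."""
    D = D_e6 * 1e-6
    C = np.exp(-z_cm ** 2 / (4 * D * t)) / np.sqrt(4 * np.pi * D * t)
    return amplitude * C * sensitivity(z_cm)

z = np.linspace(-0.6, 0.6, 121)                   # cm
rng = np.random.default_rng(1)
data = model(z, 1.0, 2.4) + rng.normal(0.0, 0.01, z.size)  # synthetic profile

(amp_fit, D_fit), _ = curve_fit(model, z, data, p0=(0.5, 1.0))
print("fitted D = %.2f x 10^-6 cm^2/s" % D_fit)
```

Repeating the fit on profiles recorded at successive times, as in Fig. 6, tests whether a single diffusion coefficient describes the whole experiment.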
Figure 6. One-dimensional X band spatial images of the distribution of the nitroxyl radical Tempone diffusing in the isotropic phase of a liquid crystal at 323 K. The length of the z axis is about 1.2 cm. The images were obtained by deconvoluting the nongradient spectrum from the experimental data. The three curves were obtained from data recorded (a) 3.64 × 10^4 s, (b) 1.44 × 10^5 s, and (c) 12.94 × 10^5 s after diffusion started. The dashed lines are least-squares fits obtained by multiplying the cavity sensitivity profile S(z) by the calculated concentration profiles C(z), assuming diffusion from a point source, where D = 2.4 × 10^−6 cm^2/s. (Figure reproduced from Ref. 8 and used with permission.)

Figure 7. Spectral-spatial X-band image of the E' signal in a sample of amorphous SiO2 that was irradiated by 17-keV X rays. The face of the plate on which the radiation was incident is oriented toward the rear of the image. The projections for the image were obtained by using a maximum gradient of 300 G/cm. (Figure reproduced from Ref. 16 and used with permission.)
Defects in Irradiated Solids

Modern electronic devices use amorphous insulators extensively. During fabrication, or when these devices are used in outer space, they may be exposed to ionizing radiation. This exposure raises the question of how much damage is done to the material by radiation and what impact it has on performance. Experimental measures of defect formation are required to test the validity of mathematical models. One of the most commonly used of these amorphous materials is silicon dioxide (a-SiO2). After a-SiO2 is exposed to radiation, a characteristic EPR signal is observed that is assigned to an E' defect, which is proposed to be due to a broken Si−O bond. Formation of these defects degrades the performance of electronic devices that use a-SiO2. To improve the reliability of electronic and optical components, it is important to understand the processes that form these defects and the characteristics of the samples that minimize defect formation. A spectral-spatial image of the E' defects in a flat plate of a-SiO2 that had been irradiated with 17-keV X rays is shown in Fig. 7 (16). The face of the plate on which the radiation was incident is oriented toward the rear of the image. Spectral-spatial imaging was selected for these samples because more than one type of radical is generated by the irradiation, and spectral-spatial imaging permits selective examination of only the E' signal. A single spatial dimension is adequate because the samples were rectangular plates that were irradiated by a beam perpendicular to a flat surface of the plate. Thus the key spatial dimension is along the direction of the irradiation. The image was obtained by using a magnetic-field gradient of 300 G/cm. Horizontal slices
through the image display the characteristic EPR signal from the E' defects. Slices parallel to the spatial axis show the spatial distribution of the signal intensity of the E' defects. Spatial slices from plates irradiated by X rays of different energies are shown in Fig. 8 (16). For these optically flat plates, the leading edge of the theoretical spin distribution at the spatial resolution of these images (about 50 microns) is a step function. Thus, the shape of the profile at the edge of the sample was used to estimate the experimental response function. The experimental profiles of EPR signal intensity were simulated as the convolution of the response function with a defect formation profile. The basic features of the spatial profiles were consistent with predictions for the dose profile. However, in some cases there was enhanced defect formation near the surface that could result from strains or increased impurity levels near the surface. Comparison of the total EPR signal intensities demonstrated that a-SiO2 samples that have higher concentrations of −OH groups were more susceptible to defect formation than
purer samples of a-SiO2. Understanding these properties may be very important in designing electronic devices with ever thinner layers of an a-SiO2 substrate.

Figure 8. Spatial slices through X-band spectral-spatial images of irradiated amorphous SiO2 at the magnetic field that corresponds to the maximum intensity of the EPR absorption spectrum of the E' center. The direction of irradiation was parallel to the spatial axis. The sample was oriented so that the face on which the radiation was incident appears at the right side of the spatial profile. The solid line shows the experimental data. Analysis of the profiles: (---) response function, (•••) dose profile, (•-•-) profile simulated from the response function and the dose profile. The samples were irradiated by (a) 5-keV, (b) 8-keV, and (c) 17-keV X rays. (Figure modified from Ref. 16 and used with permission.)
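The profile analysis described in this section can be sketched as the convolution of an instrumental response function with an assumed dose profile; the depth scale, attenuation length, and response width below are illustrative placeholders, not the values determined in Ref. 16.

```python
import numpy as np

# Sketch of the profile analysis: an assumed exponential dose
# (defect-formation) profile is convolved with an instrumental response
# function estimated from the sharp edge of the plate.
depth = np.linspace(0.0, 0.5, 501)            # mm into the plate

dose = np.exp(-depth / 0.08)                  # assumed X-ray dose profile

# Gaussian response of roughly 50-micron width, centered so that
# np.convolve with mode="same" smooths without shifting the profile.
resp = np.exp(-0.5 * ((depth - depth.mean()) / 0.021) ** 2)
resp /= resp.sum()

simulated = np.convolve(dose, resp, mode="same")

half = depth[np.argmin(np.abs(simulated - 0.5 * simulated.max()))]
print("assumed dose 1/e depth: %.0f microns" % (1e3 * 0.08))
print("simulated profile falls to half maximum near %.0f microns" % (1e3 * half))
```

In the published analysis the dose profile is adjusted until the simulated (convolved) profile matches the measured slice; systematic excess intensity near the surface then points to the strain or impurity effects mentioned above.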
Polymer Degradation

Degradation of polymeric materials upon exposure to heat, mechanical stress, or UV irradiation in the presence of oxygen is the polymer equivalent of metal corrosion (17). Degradation typically occurs during prolonged exposure to these conditions and eventually results in the loss of the mechanical properties that are required for many applications. It has been found that adding hindered amine stabilizers (HAS) to certain types of polymers slows down the degradative process. Nitroxyl radicals are some of the major reaction products of HAS. The EPR spectra of the resulting nitroxyl radicals can be used to probe the course both of the degradation and of the protective reactions. EPR imaging has been used to study HAS-derived nitroxyl signals in a mixed polymer, poly(acrylonitrile-butadiene-styrene) (17). The samples were exposed on one side to UV irradiation at a controlled temperature of 338 K and humidity corresponding to a dew point of 298 K to simulate the average summer noontime solar intensity in Miami, Florida. The EPR imaging was performed at X band using a maximum gradient strength of about 320 G/cm, which gave a resolution of about 0.3 mm for the relatively broad signals in these samples. The samples were 4-mm diameter × 4-mm long cylinders cut from larger plates. Because the UV radiation was incident on one face of the larger plate, a single spatial dimension (along the direction of radiation) was adequate to describe the samples. A spectral-spatial image for a sample that had been exposed to UV irradiation for 141 h is shown in Fig. 9. The image is represented as a contour plot and as a perspective plot to demonstrate two different ways of displaying the same information. The spectral dimension in the images is shown as absorption spectra, which reflect signal intensity as a function of position in the sample. However, first derivatives of the absorption spectra (first-derivative spectra) reveal details of spectral line shape more clearly than can be seen in the absorption spectra, so spectral slices through the image are presented in Fig. 9 as the first derivatives of the slices. These slices show that the line shape of the nitroxyl spectrum varied dramatically through the sample. This variation in line shape through the sample precludes one-dimensional spatial imaging at room temperature. The line shapes provide important information about the variation of the mobility of the nitroxyl in different regions of the sample. For example, much more of the signal in slice d (Fig. 9) is due to nitroxyl that is tumbling relatively rapidly than is present in slice a (Fig. 9). The nitroxyl radicals are initially formed and then destroyed by other radicals that are generated by the polymer degradation. In the study, it was proposed that the time dependence of the spectral line shape indicated that the HAS-derived radical was preferentially consumed in the butadiene-rich regions of the polymer. The faster loss of nitroxyls from the butadiene-rich regions is consistent with other experiments that find that these regions are most sensitive to degradation (17). In the image in Fig. 9, the radical concentrations are highest near the surfaces of the polymer block and much lower in the interior. Oxygen plays a key role in polymer degradation. It was proposed that the slow rate of diffusion of oxygen into the polymer may limit the rate of degradation in the interior of the sample. By contrast, in polymers undergoing thermal degradation at 333 K, the intensity of the nitroxyl signal and line shape were uniformly distributed through the plate. The key advantage of these studies is the ability of EPR imaging to provide information about the degradation process based on both the intensity and the line shape of the nitroxyl signal.
Figure 9. Two-dimensional X band spectral-spatial contour (top) and perspective (bottom) plots of HAS-derived nitroxyls after 141 h of UV irradiation in a weather chamber. The spectral dimension in the image is displayed as absorption spectra. The spectral slices a, b, c, and d, for the indicated positions in the sample (a, 0.2 mm; b, 1.2 mm; c, 3.2 mm; d, 3.6 mm), are shown in derivative mode. (Figure reproduced from Ref. 17 and used with permission. Copyright (2000) American Chemical Society.)

Mineralogy and Dating

Barite (BaSO4) is a common mineral in medium- and low-temperature hydrothermal veins (18). The X-band
EPR spectrum of a single crystal of barite is shown in Fig. 10a for the orientation in which the magnetic field is along the a axis. The full width of the spectrum, excluding the manganese marker signals, is about 40 G and includes several different signals. To image the full spectrum at even modest resolution would require a very large gradient. The several lines are too close together to permit imaging an individual line by using magnetic-field gradients. Furthermore, the sample was too large to fit inside a standard resonator. The method of choice for imaging this sample was the ESR microscopy technique (2), in which the sample is placed on a platform and moved relative to a pinhole in the resonator. The magnetic field is set to observe a particular signal in the spectrum. The signal at g = 2.019 was observed only at the surface and is attributed to defects formed by β rays after crystal growth had terminated. The electron-center signal at g = 2.000 is assigned to SO3−, and the hole center at g = 2.012 is an O2^3− center stabilized by M3+ ions, including Y3+ (18). The intensities of these signals are increased by artificial gamma irradiation, so it is proposed that these signals might be useful for ESR dating based on estimated doses from surrounding sources of radiation. Images of the intensities of the signals from the hole center and electron center are shown in Fig. 10b and 10c, respectively. The images show that the spatial distributions of the two centers are different, which may reflect differences in the distributions of diamagnetic species that stabilize the paramagnetic centers.
Figure 10. (a) X band EPR spectrum of a single crystal of barite where the magnetic field was along the a axis of the crystal. The two lines labeled Mn(3) and Mn(4) are marker/calibration signals added by the spectrometer. Images of the intensity of (b) the hole centers (g = 2.012) and (c) the electron centers (g = 2.000) in the barite sample that were obtained by moving the crystal relative to a hole in a rectangular resonator. (Figure reproduced from Ref. 18 and used with permission from Elsevier Science.)

Magnetic Susceptibility Effects

When a magnetic material is placed in an external magnetic field, the field distribution is perturbed because the magnetic field is concentrated within the object and the lines of magnetic flux must be continuous, as shown in Fig. 11a. If this perturbation is not taken into account, artifacts are generated in spatial images. On the other hand, if the source of the perturbation is recognized and can be adequately modeled, the perturbation can be used to characterize the object. For example, these perturbations can be used to measure local magnetic susceptibilities (19). Polypyrrole generated electrochemically has a sharp EPR signal that can be used for EPR imaging. A thin film of polypyrrole was deposited on a 1-mm diameter palladium (Pd) wire. A 2-cm-long segment of the wire was oriented in an X-band resonator so that the long dimension of the wire was perpendicular to the direction of the external magnetic field, which is also the orientation of the gradient. In the absence of a gradient or magnetic susceptibility effects, the EPR spectrum of polypyrrole is a single line whose peak-to-peak line width is 0.3 G. A contour plot of an experimental spectral-spatial image for the polypyrrole on the Pd wire is shown in Fig. 11b. If the Pd wire had not perturbed the magnetic field, the signal for the polypyrrole would have been observed at the same setting of the external field, independent of the position along the spatial axis. However, the image shows that the external field setting that is required to achieve resonance increases on moving from region 1 to region 2 and then decreases again on moving back into region 1 (Fig. 11a,b). The net field at the sample that is required to achieve resonance is a constant (for constant microwave frequency), so the
variation in the external field required for resonance is a measure of the perturbation of the field by the magnetic susceptibility of the wire. The observed variation in the external field setting that is required to achieve resonance (Fig. 11b) agrees well with the calculated variation based on the known permeability of Pd (Fig. 11c) (19). This correspondence between calculated and observed effects on the local field strength provides the basis for correcting for susceptibility effects in other samples.
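A first-order magnetostatic estimate indicates that shifts of the observed size are plausible for this geometry. The textbook result for a long cylinder of relative permeability μr transverse to a uniform field B0 gives a fractional field change of roughly ((μr − 1)/(μr + 1))(a/r)^2 outside the wire; the sketch below evaluates it with the permeability quoted in the caption of Fig. 11. This is a simple approximation, not the full calculation of Ref. 19.

```python
import numpy as np

# First-order estimate of the field shift outside a long cylinder (radius a,
# relative permeability mu_r) transverse to a uniform field B0; theta is
# measured from the applied-field direction.
def field_shift(B0, mu_r, a, r, theta):
    k = (mu_r - 1.0) / (mu_r + 1.0)
    return B0 * k * (a / r) ** 2 * np.cos(2.0 * theta)

B0 = 3400.0            # G, roughly the X-band field for g ~ 2 (illustrative)
mu_r = 1.0 + 8.0e-4    # permeability used for the calculated image (Ref. 19)
a = 0.05               # cm (1.0-mm diameter Pd wire)

for r in (0.05, 0.075, 0.10, 0.20):     # cm, along the applied-field direction
    print("r = %.3f cm: shift ~ %+.2f G" % (r, field_shift(B0, mu_r, a, r, 0.0)))
```

At the wire surface this estimate gives a shift of order 1 G at X-band fields, which is the same order as the spectral-axis scale of Fig. 11.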
In vivo Imaging and Oximetry

The physiological importance of radicals in processes that range from aging to signaling pathways and the immune response has motivated substantial work on in vivo EPR spectroscopy and imaging (20). For the reasons discussed earlier, in vivo experiments are performed with electromagnetic radiation frequencies less than 1 GHz. Most radicals that occur in living systems have very short lifetimes, so the steady-state concentrations are too low to detect by EPR, although it may be possible to find evidence for some of these radicals by reacting them with spin traps to form species that have longer lifetimes. Some naturally occurring radicals that have somewhat
298
ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING
longer lifetimes and higher steady-state concentrations include melanin, ascorbate, and NO bound to myoglobin or hemoglobin. In vivo EPR spectroscopy and imaging studies have predominantly used paramagnetic probes that are added to the system, including those discussed earlier (20). For example, by selecting the substituents on the nitroxyl ring (I) one can have a probe that does, or does not, cross the blood–brain barrier (5). Once the probe is in the tissue, one can monitor the rate at which the probe undergoes redox reactions and is converted to diamagnetic species. The rates of these processes then reflect the local metabolic state of the samples, which varies within organs
(c) A 25% B 50 C 75
1 C
0.2 mm B A 2
Spatial
(a)
1 m0
Spectral
1 Cylindrical object 2
µ
2
(m>m0)
1
B0 (b) 1 C
A 25% B 50 C 75
0.2 mm
B A
2
1G
Figure 11. (Continued)
and among organs. Differences in metabolic state may also distinguish healthy tissue from diseased tissue. In narrow-line EPR spectra, the line width depends on local oxygen concentration, which is the basis for EPR oximetry. Images have been obtained of a colloidal glucose char suspension perfused into an excised perfused rat heart. At the resolution of these images, key anatomical features can be recognized, and it is possible to detect the effects of local ischemia (21). Data acquisition can be synchronized with the heart beat to define the time-dependent alterations in radical distribution and oxygenation (22). The signal-to-noise requirements for in vivo imaging present substantial challenges for technique development. However, the potential enlightenment that could be gained by better understanding, of physiology and metabolic states in vivo, and even real-time monitoring of the efficacy of interventional procedures motivate efforts in this area. Summary and Future Directions
Spatial
1
Spectral
1G
Figure 11. (a) Sketch of the perturbation of a static magnetic-field around a cylindrical object whose magnetic permeability >1 (cgs). (b), (c) Contour plots of X band spectral-spatial image of polypyrrole on a 1.0-mm diameter Pd wire. The data were obtained by using a maximum gradient of 300 G/cm. (b) Experimental spectral-spatial image. (c) Image calculated for the effect of a Pd wire whose permeability was 1.0 + 8.0 × 10−4 (cgs). The values on the spectral axis are for the external magnetic-field. Labels 1 and 2 in parts (b), (c) refer to the regions labeled as 1 and 2 in part (a). (Figure adapted from Ref. 19 and used with permission.)
EPR imaging, it has been shown, provides key insights in materials science, including radiation-damaged samples. This information would be difficult to obtain by other methods. In studies of diffusion coefficients, EPR has the advantage that local tumbling rates can be determined from nitroxyl line shapes. Imaging then provides values of the diffusion coefficients, so both microscopic and macroscopic motion can be monitored in the same sample. Key challenges that face applications in vivo are the time required to obtain an image and the limitations placed on resolution by large line widths and limited signal intensity. At least initially, it may be useful to compare low-resolution EPR imaging results for local behavior of radicals with higher resolution images of anatomical features obtained by other imaging modalities. Spatial distributions change less rapidly with time in many materials than in in vivo experiments, so rapid signal averaging is less of an issue in materials science than in vivo.
ELECTROPHOTOGRAPHY
ABBREVIATIONS AND ACRONYMS DPPH EPR ESR HAS NMR TAM Tempone
1,1-diphenyl-2-picryl hydrazyl Electron Paramagnetic Resonance Electron Spin Resonance Hindered Amine Stabilizer Nuclear Magnetic Resonance Triarylmethyl radical, also called trityl radical 4-oxo-2,2,6,6-tetramethyl-piperidin-oxyl
23. G. R. Eaton and S. S. Eaton, in J. A. Weil, eds., Electronic Magnetic Resonance of the Solid State, Canadian Society for Chemistry, Ottawa, 1987, pp. 639–650. 24. S. S. Eaton and G. R. Eaton, in L. Kevan and M. K. Bowman, eds., Modern and Continuous-Wave Electron Spin Resonance, Wiley Interscience, NY, 1990, pp. 405–435.
ELECTROPHOTOGRAPHY O. G. HAUSER
BIBLIOGRAPHY 1. G. R. Eaton, S. S. Eaton, and K. Ohno, EPR Imaging and In Vivo EPR, CRC Press, Boca Raton, FL, 1991. 2. M. Ikeya, New Applications of Electron Spin Resonance: Dating, Dosimetry, and Microscopy, World Scientific, Singapore, 1993. 3. G. R. Eaton and S. S. Eaton, Concepts Magn. Resonance 7, 49–67 (1994). 4. S. S. Eaton and G. R. Eaton, Electron Spin Resonance 15, 169–185 (1996). 5. S. S. Eaton and G. R. Eaton, Electron Spin Resonance 17, 109–129 (2000). 6. J. A. Weil, J. R. Bolton, and J. E. Wertz, Electron Paramagnetic Resonance: Elementary Theory and Practical Applications, Wiley, NY, 1994. 7. J. H. Ardenkjaer-Larsen et al., J. Magn. Resonance 133, 1–12 (1998). 8. J. P. Hornak, J. K. Moscicki, D. J. Schneider, and J. H. Freed, J. Chem. Phys. 84, 3387–3395 (1986). 9. J. Pilar, J. Labsky, A. Marek, and S. Schlick, Macromolecules 32, 8230–8233 (1999). 10. R. K. Woods, W. B. Hyslop, R. B. Marr, and P. C. Lauterbur, in G. R. Eaton, S. S. Eaton, and K. Ohno, eds., EPR Imaging and In Vivo EPR, CRC Press, Boca Raton, FL, 1991, pp. 91–117. ¨ ¨ 11. M. Van Kienlin and R. Pohmann, in P. Blumler, B. Blumich, R. Botto, and E. Fukushima, eds., Spatially Resolved Magnetic Resonance: Methods, Materials, Medicine, Biology, Rheology, Ecology, Hardware, Wiley-VCH, Weinheim, 1998, Chap. 1, pp. 3–20. 12. H. Nishikawa, H. Fujii, and L. J. Berliner, J. Magn. Resonance 62, 79–82 (1985). 13. R. Murugesan et al., Magn. Resonance Med. 38, 409–414 (1997). 14. A. Feintuch et al., J. Magn. Resonance 142, 382–385 (2000). 15. J. H. Freed, Annu. Rev. Biomol. Struct. 23, 1–25 (1994). 16. M. Sueki et al., J. Appl. Phys. 77, 790–794 (1995). 17. K. Kruczala, M. V. Motyakin, and S. Schlick, J. Phys. Chem. B 104, 3387–3392 (2000). 18. M. Furusawa, M. Kasuya, S. Ikeda, and M. Ikeya, Nucl. Tracks Radiat. Meas. 18, 185–188 (1991). 19. M. Sueki, S. S. Eaton, and G. R. Eaton, J. Magn. Resonance A 105, 25–29 (1993). 20. H. M. Swartz and H. Halpern, Biol. Magn. Resonance 14, 367–404 (1998). 21. P. Kuppusamy, P. Wang, and J. L. Zweier, Magn. Resonance Med. 34, 99–105 (1995). 22. P. Kuppusamy et al., Magn. Resonance Med. 35, 323–328 (1996).
299
Rochester, NY
INTRODUCTION Electrophotography, also called xerography, is a process for producing high quality copies or images of a document. Electrophotography is based on forming an electrostatic charge pattern or image of the original document that is made visible by ultrafine electrically charged particles. The process was first commercialized by the Haloid Corporation in 1949 in the manual XeroX Copier Model A and then in 1955 in the XeroX Copyflo Printer. In 1986, it was estimated that the total copier business worldwide was roughly twenty billion dollars (1). Today, many corporations manufacture copying or printing machines based on the electrophotographic process, and the worldwide copying and printing market is estimated at well in excess of twenty billion dollars. This represents the production of more than 5000 billion dollars pages in 1999. The need for convenient, fast, low-cost copying intensified during the 1940s and 1950s as business practice became more complex. Letters and forms were submitted in duplicate and triplicate, and copies were maintained for individual records. Copying at that time was accomplished by Eastman Kodak’s Verifax and Photostat, mimeographing, and the 3M Thermofax processes. The intent was to replace carbon paper in the typewriter, but these processes were cumbersome and had major shortcomings. Chester F. Carlson patented the first electrophotographic copying machine (2) in 1944, but the first automatic commercial office electrophotographic copying machine, the Xerox model 914, was commercialized in 1959. Since then, the electrophotographic process has come to dominate the field of office copying and duplicating. Electrophotography is defined as a process that involves the interaction of electricity and light to form electrostatic latent images. The formation of electrostatic images (without the use of light) can be traced back to Lichtenberg (3) who observed that dust would settle in star-like patterns on a piece of resin which had been subjected to a spark. In 1842, Ronalds fabricated a device called the electrograph. This device created an electrostatic charge pattern by moving a stylus, which was connected to a lightning rod, across an insulating resin surface. This device could be considered a forerunner of electric stylus writing on dielectric-coated, conductive paper substrates. Stylus writing is commonly used today in rather commercially successful Versatec wide format black and white engineering and color electrographic printers.
300
ELECTROPHOTOGRAPHY
Experiments in forming electrostatic charge patterns on insulators during the 1920s and the 1930s were performed by Selenyi (4–8). Carlson realized that these early processes could not lend themselves easily to document copying or reproduction, so he and Otto Kornei began to experiment with the photoconductive insulators sulfur and anthracene. Using a sulfur-coated zinc plate, the successful reduction to practice of what is now called electrophotography occurred on 22 October, 1938. The sulfur was charged in darkness by rubbing it vigorously with a handkerchief. The characters 10-22-38 ASTORIA were printed in India ink on a microscope slide. The charged sulfur was exposed for a few seconds to the microscope slide by a bright incandescent lamp. Sprinkling lycopodium powder on the sulfur made the image visible after the loose powder was carefully blown away. Finally, the powder image was transferred to wax paper which was then heated to retain the image permanently. Carlson called this process electrophotography. A patent for electrophotography was applied for in 1939 and was granted in 1942 (9). Since then, Schaffert (10) redefined electrophotography to include those processes that involve the interaction of electricity with light to form latent electrostatic images. Significant improvements were made during the late 1940s and the 1950s to the original process steps and materials that Carlson had used. One major improvement by Bixby and Ullrich (11) was using amorphous selenium instead of sulfur as the photoconductor. Other significant improvements were using wire- and screencontrolled corona-charging devices (12), two-component contact electrification developer particles (13), and electrostatic transfer methods (14). These improvements all helped to commercialize the electrophotographic process. The advent of the personal computer and desktop publishing created a need for convenient, low-cost, pointof-need printing. At first, typewriters were adapted to perform impact printing, followed by dot matrix printers, but these concepts had many disadvantages. Small (and slow) ink jet printers have come to dominate the low-volume desktop publishing market at this time. The mid-volume and high-volume markets, however, are presently dominated by faster and more reliable laser and light-emitting-diode (LED) electrophotographic printers. The print quality of these modern printers approaches that of offset lithography quite closely, so many electrophotographic machines compete effectively with the offset press. Examples of these machines are the Xerox Docu-Tech, the Xerox 4850, 5775, 5650, the Heidelberg Digimaster 9110, and the Canon Image Runner 9110. There are business advantages for each type of printer, however, and offset print shops are certainly not going to go out of business in the foreseeable future. Because of the popularity of electrophotographic printers, both large and small, color and black only, this article explores the fundamentals involved in producing hard copy by this process. This article first presents an overview of the essential elements in an electrophotographic imaging system. Then, each of the commercialized process steps is covered in some detail to lead the reader from corona sensitization of the
photoconductor to fixing of the final image. An extension of the electrophotographic process to color printing is discussed in the final section. In a form of electrophotography called xeroradiography (10), the X-ray portion of the electromagnetic spectrum is used to form a latent image on a photoconductor. Xeroradiography is very beneficial in the medical profession, but it is not discussed in this article. A Overview of an Electrophotographic System An electrophotographic imaging system consists of a photoconductive belt or drum that travels through cyclic charging, exposure, development, transfer, erasure, and cleaning steps. The direction in which the photoconductor moves is called the process direction because the various process steps occur sequentially in this direction. Charging occurs by uniformly depositing positive or negative ions generated usually by air breakdown, using one or a combination of several devices called chargers. These are usually corotrons, scorotrons, ionographic devices, or biased conductive rollers. It is also possible to charge the photoconductor uniformly by ion transfer from conductive fluids. The charged photoconductor is exposed to light to form a pattern of charges on the surface, called an electrostatic image. In light lens copiers, the exposure system usually consists of a light source that illuminates a platen, a set of mirrors and a lens that focuses the platen surface onto the photoconductor. The illumination can be full frame flash or a scanning set of lights that move synchronously with the photoconductor. The exposure system in laser printers, consists of a laser beam, which is turned on and off by an acousto-optic modulator or by modulating the laser power supply itself. The electric charges of the electrostatic image are not visible to the human eye, but the pattern is made visible by a development process. During development, ultrafine, charged, pigmented toner particles are deposited on the photoconductor in the image areas, and the background areas are left clean. These particles are pigments coated with a thermoplastic and are electrically charged. Commercial dry development methods are open cascade (two-component developer particles are cascaded over the photoconductor surface in an open tray), electroded cascade (two-component developer particles are cascaded between the photoconductor surface and an electrically biased, closely spaced electrode), powder cloud (electrically charged toner particles are raised into a cloud that comes in contact with the photoconductor surface), magnetic toner touchdown (single-component toner particles that contain magnetite contact the photoconductor), magnetic brush (see Magnetic Brush development and Hybrid Scavengeless Development (development). The most widely used development technology to date has been the magnetic brush. A form of single-component dry development (widely used in Canon products) is called jumping development.
ELECTROPHOTOGRAPHY
The details of single-component and two-component jumping development are covered later in the sections on Jumping development. Liquid toner techniques include liquid immersion development and electroded liquid applicators (see Liquid development of electrostatic images) used in the Indigo 9000 machine made by Indigo N.V. of the Netherlands. After development, it is desirable to transfer these particles to some transfer medium such as paper so that the photoconductor can be cleaned and reused. The transfer can take place electrostatically by charging devices similar to those used to charge the photoconductor uniformly. However, a pressure-assisted biased transfer roll (BTR) is very widely used in modern machines. Then a fixing process, usually the application of heat to the toner image on paper, makes the visible images permanent because the pigment particles are embedded in a thermoplastic base. If the developed image-bearing surface can sustain high pressures and high temperatures, for example using ionography (see Ref. 15 for a description of the ionographic process used by Delphax Systems Ltd., Mississauga, Ontario, Canada), the fixing and transfer process can be performed simultaneously. Fusing the toner simultaneously upon transfer is referred to as a transfuse or a transfix process. Photoconductors tend to degrade at high temperatures, so transfuse is carried out when the image-bearing surface is a rugged insulator, such as aluminum oxide (Delphax machines) or Mylar (intermediate belt-transfer methods). Each of these processes will be covered later in some detail. The Photoconductor The heart of an electrophotographic system is the imaging material called a photoconductor. (For a concise description of the physical principles involved in photoconductors and also a review of the electrophotographic process (see Ref. 1). The photoconductor conducts electrical charges when illuminated; in darkness, it is an insulating dielectric. This material is coated as a thin layer on an electrical conductor, usually aluminum. In the first electrophotographic machines produced during the 1950s and 1960s, this material was zinc oxide powder coated on paper covered by aluminum foil, cadmium sulfoselenide or copper phthalocyanines coated on aluminum drums, or polyvinylcarbazole (PVK) coated on metallic foils or drums. Xerox machines, however, used amorphous selenium vacuum deposited on aluminum drums. As electrophotography matured, these substances were replaced by organic photoconductors. See Mort (1) for a concise discussion of the photoconductivity of amorphous selenium and Borsenberger and Weiss (16) for a thorough discussion of the photoconductivity of some organic materials. A photoconductive material commonly used in modern electrophotographic machines is an Active MATrix coating (AMAT) on a conductive substrate (Fig. 1). Patents to Weigl, Smith et al. and Horgan et al. (17–19) describe AMAT. An AMAT photoconductor consists of an electron-hole pair-generating layer, a charge-transport layer, and an adhesive layer coated on aluminized Mylar. Charge generation occurs within the layer that consists of about 0.5- to 0.7-µm particles of finely powdered trigonal
301
Charge-transport layer
Charge-generation layer Conductive substrate Support structure Figure 1. Schematic drawing of AMAT photoconductor.
selenium uniformly embedded in the adhesive that is usually the bottom layer. The electron-transport layer is usually doped polycarbonate. The transport layer can be any practical thickness, but usually it is about 25–30 µm. The change to AMAT and other organic photoconductors was brought about by cost, availability, and environmental issues with selenium. This change had far-reaching, process-related consequences. Selenium accepted positive ions as the surface charge, whereas organic materials required negative charging. Subsequently, problems of charging uniformity (it is more difficult to produce a uniform negative corona than a uniform positive corona) had to be resolved. This is covered in more detail in the section on charging. In addition, a magnetic brush material technology had been developed (for use with selenium photoconductors) that produced developer particles that had negatively charged toner particles and positively charged carrier particles. (These will be covered in detail in the section on development.) However, the reversal of the electrostatic image charge sign required new developer materials that provided a positively charged toner and a carrier that had a negative countercharge. The development of new materials is always a time-consuming and expensive task. As its name implies, a photoconductor is sensitive to light, so that it is an electrical conductor in illuminated areas and a slightly leaky dielectric (insulator) in darkness. Thus, when it is sensitized by uniform surface electrical charges, it retains an electrostatic charge for some time in the dark areas, but the surface charge in illuminated areas quickly leaks through the material to the ground plane. The mechanism of electrical conduction in the illuminated areas is the formation of electron-hole pairs by the light absorbed in the generating layer. The number of hole–electron pairs per photon is called the quantum efficiency. The magnitude of charge density in gray (partially exposed) areas depends on total exposure to light (total number of hole–electron pairs produced by the exposure). A photo-induced discharge curve (PIDC) is usually generated in a scanner (20). The PIDC relates the voltage of the charge density after exposure to the value of the exposure. These curves are generated at various initial charge densities. Local internal electric fields set up by the uniform surface charges before exposure cause movement of either holes or electrons through the transport layer. In some materials, such as amorphous selenium, only holes are allowed to move from the charge-generating layer to the ground plane. Thus, the charge-generating layer, the front surface of selenium, has to be charged positively. If it
302
ELECTROPHOTOGRAPHY
were charged negatively, then the direction of the electric field would transport electrons through the bulk of the material. However, pure amorphous selenium has a very high density of electron traps distributed through the bulk, so that the electrons would be trapped, and the buildup of space charge in the interior of the photoconductor would prevent further discharge. Traps empty comparatively slowly, and the contrast in electrical potential between image and background areas would be low, resulting in a very weak electrostatic image. The actual charge density in the image at the time of development is the result of the original surface charge density and dark decay. The density of charges that decay because of thermally activated sites for electron–hole pair production is called dark decay. (See Refs. 1,16, and 20 for detailed discussions of photoconductors and photoconductivity.) The photoconductor is flat in the direction perpendicular to the process direction and can be a drum or a belt. There are advantages and drawbacks to both belts and drums that depend on the particular machine configurations under consideration. These involve the generally complicated problem of choosing subsystem configurations for the specific machine and the process applications being considered. For example, a small slow personal printer such as the Hewlett Packard Laserjet would use a drum configuration, whereas a high-speed printer such as the Xerox DocuTech would use a belt configuration. The choices depend on the engineering and innovations required to achieve the performance and cost goals set for the machines. UNIFORM CHARGING OF THE PHOTOCONDUCTOR The goal of charging the photoconductor is to deposit a spatially uniform density of charges of single polarity on the surface of the material. Traditionally, this has been accomplished by running the photoconductor at a uniform speed under (or over) a charging corotron or scorotron. The simplest forms of these two general categories of chargers are a very fine wire (about 0.003 in. diameter) suspended by dielectric materials from a conductive shield and spaced about 0.25 in. over (or under) the moving photoconductor. Corotron Charging A traditional corotron is shown schematically in Fig. 2. A very high dc voltage Vw of about 5000 volts is applied to the wire. The conductive backing of the photoconductor and the shield of the corotron are electrically grounded. Thus, the electric field in the region between the wire and the grounded components exceeds the threshold
Coronode Shield
for air breakdown, and a steady flow of ions of single polarity leaves the wire coronode. The threshold for corona discharge, however, is a function of many variables. Some of these are the distance between electrodes, the atmospheric pressure, the polarity of the wire voltage (i.e., positive or negative corona emission), the geometry of the wire, and the nature and composition of the gas that surrounds the coronode. Relative humidity and ozone concentration also influence corona production. The total ionization current i splits between flow to the shield and flow to the photoconductor. The sum of current supplied to the photoconductor and the grounded shield is constant and is controlled by the wire voltage: i = AVw (Vw − Vth ).
The constant A also depends on the geometry of the shield, wire diameter, air pressure, and temperature among other things. This constant and the threshold voltage Vth , for corona discharge, have been traditionally obtained from experiments for specific configurations. However, the current density in the flow of ions that reaches the photoconductor depends on the voltage Vp , of the charges on the photoconductor, and the voltages of the components surrounding the wire, zero for grounded shields. The current flow to the photoconductor is established within milliseconds, and it would remain steady if the charge receiver were a conductor. It is altered, however, by the buildup of electrical potential on the surface of the photoconductor. So, initially the corona current to the photoconductor is a high value but, as charges accumulate, the current is reduced. Assuming a capacitively charging photoconductor that has no charge traps to be emptied by initial deposition of corona charge on its surface and negligible dark discharge during charging time, the current ip , to the photoconductor surface is ip = C
ip = Ap (Vw − Vp )(Vw − Vp − Vth ).
(2)
(3)
A in Equation (1) is associated with the total current leaving the wire, and Ap is associated with the current arriving at the photoconductor. The two constants are not the same, and they have to be measured individually. Thus,
Vp
dVp =
Photoconductor
Figure 2. Schematic drawing of a traditional corotron charger.
dVp . dt
If we assume that the ions arriving at the photoconductor surface are driven simply by the instantaneous electric field between the coronode and the photoconductor surface, then in the simplest form
0
Charged ions
(1)
=
1 C 1 C
t
ip dt
0 t
Ap (Vw − Vp )(Vw − Vp − Vth )dt.
(4)
0
For any set of wire voltages, geometries, and other conditions that govern the generation of a corona, the
ELECTROPHOTOGRAPHY
variables in Eq. (4) are Vp and t, the charging time. So,
Vp 0
dVp 1 = (Vw − Vp )(Vw − Vp − Vth ) C
t
Ap dt = 0
Ap t . C
(5)
Following Schaffert (10, p. 238), we integrate and solve for Vp in terms of Vw , the geometrical factors and charging time t, to obtain the photoconductor voltage: Ap Vth t
1−e C . Vp = Vw Ap Vth t Vw 1− e C Vw − Vth
(6)
The slope of the curve relating current to the photoconductor and the voltage of the accumulated charge called the slope of the charging device is important. One way of increasing this slope is to increase the wire voltage; however, electrical conditions for sparking or arcing (a disruptive surge of very high current density) must be avoided. When sparking occurs, the wire or the photoconductor can be damaged, and spots of high charge density can appear on the photoconductor surface. (See Ref. 21 for further information on gaseous discharges.) Corona spatial and temporal uniformity are very important requirements for good print uniformity. When the wire voltage is positive and a positive corona is generated, uniformity is generally very good. However, using negatively charging photoconductors, the spatial and temporal uniformity of negative corona current emitted by metallic wires is insufficient to achieve good print uniformity. Spots of high current density appear along the corona wires and tend to move along the wire with time, thereby yielding streaky prints. To avoid these streaks, the wire may be coated with a dielectric material such as glass. Instead of applying a steady (dc) voltage to the wire, using a grounded shield, a high frequency alternating (ac) voltage is applied to the wire, and the shield is biased negatively to drive the negative ions to the photoconductor. This arrangement is called a dicorotron and is used in such machines as the Xerox 1075 and the Xerox Docu-Tech. Other methods of providing uniform negative corona charging use a linear array of pins in a scorotron configuration. Scorotron Charging A typical scorotron is shown schematically in Fig. 3. In scorotron charging, a screen is inserted between the highvoltage wire and the charge-receiving photoconductive surface. The screen is insulated from the shield and
the coronode. It is biased independently to a potential which is close to the desired voltage of the charges on the photoconductor. Thus, a charge cloud is generated by air breakdown in the volume enclosed by the screen and the wire. The electric field generated by the voltage difference between the photoconductive surface and screen drives ions of the right polarity to the photoconductor. When this field is quenched by the accumulation of charge on the photoconductive surface, the current in the charging beam diminishes. Finally no further charging occurs, even though the photoconductive surface element may still be under the scorotron’s active area. Possible combinations of wire and screen voltages are positive dc wire and screen voltages negative dc wire and screen biased ac wire that has either a positive or negative screen. Another variation of scorotron construction is replacing the screen wires, that run parallel to the coronode by a conductive mesh. The openings of the mesh are placed at a 45° angle with respect to the coronode. This has been named a potential well (POW) scorotron and is disclosed in Ref. 22. Yet another variation of scorotron construction is replacing the coronode wire by a series of fine coronode pins. This device is called a ‘‘pin scorotron’’ and is disclosed in Refs. 23 and 24. Each of these devices has its respective advantages and drawbacks. Other Methods of Charging Roller charging is used in some electrophotographic machines (see Refs. 25 and 26). A biased conductive elastomeric roll moves synchronously across the uncharged photoconductive surface. The electric field is distributed across the gap defined by the surface of the roller and the photoconductor. If there is no compression of the elastomer as it contacts the photoconductor, the gaps on approach and departure are symmetrical. The threshold for air breakdown corresponds to a critical combination of electric field and gap. Therefore, the instantaneous point along the circumference of the roller, above the photoconductor surface at which air breakdown initiates, is a function of roller nip geometry, conductivity, and applied bias. Consider a point on the surface of the roller during the approach phase of the charging process (Fig. 4).
Conductive elastomer
Coronode Biased core Shield Photoconductor Screen Charged ions Figure 3. Schematic drawing of a traditional scorotron charger.
303
Gap approach
Gap separation Photoconductor
Figure 4. Schematic drawing of a biased roller charger.
304
ELECTROPHOTOGRAPHY
The gap diminishes to zero as the point approaches contact. During this time, charging may occur for a period of time that depends on current flow, the dielectric properties of the photoconductor, and the rate of charge density accumulation on the photoconductor, which moves synchronously with the roller. If the air breakdown is not quenched during the approach, then current continues as the point on the roller surface and photoconductor surface depart from the nip. As the photoconductor and roller surface separate, the gap increases and charging will cease at some value. The total charge density deposited on the photoconductor is the accumulation of charge during this process. Thus, limitations on charging uniformity from this system are related to both the mechanical and electrical properties of the coating. In all of these systems, the final surface voltage on the photoconductor depends on component voltages, device geometry, and process speed. Other methods of charging, not yet mature enough for extensive machine usage are conductive blade charging (27), contact sheet charging (28), and aquatron charging (29,30). IMAGE EXPOSURE Image exposure can occur by full frame flash, scanning illuminators moving synchronously with the photoconductor, an array of light-emitting diodes, or a laser scanner. Let us examine each type of system. Full Frame Flash A platen supports the optical original that will be converted into an electrostatic image which consists of a pattern of electronic charge on the photoconductor surface. Flash exposure requires a flat surface such as a belt or plate. Figure 5 shows an exposure device schematically (for example, as used in the Xerox 1075 family of machines of the late 1970s and 1980s). Let x, y be coordinates on the photoconductor surface. Exposure is a function of light intensity and time of illumination at the photoconductor surface and is given by EX(x, y) =
For a full frame flash exposure, Eq. (7) can be approximated as EX(x, y) = I(x, y)t (8) if flash intensity, rise time, and decay times are insignificant compared to flash duration t and the photoconductor moves only an insignificant distance during t. In Eq. (8) I(x, y) is the intensity of light at the photoconductor surface that is reflected from the platen. Neglecting light lost through the lens and illuminating cavity system and light reflected from the platen surfaces as stray light, (9) I(x, y) = I0 R(x, y), where R(x, y) is the reflectance distribution of the original image on the platen and I0 is the intensity of light that strikes the original image surface. Because the lens focuses the original onto a plane, only belt or plate photoconductors can be used for this exposure system. If a drum is desired as a photoconductor, then a moving light source synchronized with the moving drum surface is possible when using stationary flat platens. An alternative is a moving platen that has a stationary light source. Moving Illuminators with Stationary Platens Moving illuminators combined with a partially rotating mirror enable focusing a flat image on a platen onto the curved surface of a drum. The exposure system has to be designed so that the moving illuminated portion of the platen is always focused at a stationary aperture above the moving drum surface. This is enabled by a set of mirrors, at least one of which partially rotates within the optical path. Figure 6 shows schematically how this could be accomplished.
Stationary platen
n Moving light sources
toff
I(x, y)dt.
(7)
ton
Lens Stationary platen
Rotating mirror
Stationary mirror
Illuminating cavity Flash lamps Aperture
Lens and shutter Moving photoconductor
Rotating photoconductor drum n
Drive roll Figure 5. Schematic drawing of a full frame flash exposure system.
Figure 6. Schematic drawing of a stationary platen with that has a moving light source exposure system.
ELECTROPHOTOGRAPHY
Here, exposure at the moving drum surface is still given by Eq. (7) and in a 100% magnification system, x, y at the platen corresponds to x, y on the drum. The exposing aperture is a slit lengthwise to the drum, so although Eq. (8) still applies, t is now x0 /v, where x0 is the aperture width and v is the drum speed. Because the illuminators move at the same speed as the drum surface, then again neglecting light loss through optics, the intensity distribution at the drum surface is given by Eq. (9).
t p < x0/ n
Case 1: tp < x0 v, Case 2: tp = x0 v, Case 3: tp > x0 v. Figures 7–9 show the exposure distribution schematically for each of the three cases, assuming that there is complete darkness before and after the LED array. In these figures, it is assumed that the relationships shown for the three cases are achieved by varying the pulse time. However, if the relationship of Fig. 9 is achieved by increasing velocity instead of pulse time, then exposure is decreased, as shown in Fig. 10. In these figures, points 1 and 2 are the leading and the trailing edges of the LED array aperture. The photoconductor moves at constant velocity v. Points 1 and 2 are the projections of points 1 and 2 onto the moving photoconductor. At t = 0, the LED is activated, and the photoconductor under point 1 moves away from point 1, so at 1 , EX = 0. However, the photoconductor under point 2 continues to be illuminated either
x0 I =0
1
2
I =I 0
EX
EX=I0t p
2′
1′
x0
n
nt p Photoconductor
Exposure by Light-Emitting Diodes Linear arrays of light-emitting diodes (LEDs) are used in some printers as an exposure system alternative to a scanning laser spot. The array of approximately 3000 diodes is arranged perpendicularly to the photoconductor motion. Each diode is pulsed for a period that depends on factors such as the geometry and light intensity of the diode and the photosensitivity and velocity of the photoconductor. Because the array is stationary and the photoconductor moves, the x, y location of each exposed element (pixel) on the photoconductor is determined by the timing of the pulses. A start of scan detector signals the LED array electronics (sets the clock to zero for the scan line) that the photoconductor has moved one scan line in the process direction. Depending on the desired image, sets of individual LEDs are activated simultaneously during each scan line. The x location is defined by the position of the photoconductor, and the y locations by which sets of LEDs are turned on. However, the illuminated area element on the photoconductor is defined by the size of the output aperture of each diode. So exposure is again defined by Eq. (7), but now t is related to pulse time tp , assuming negligible illumination rise and fall times. The photoconductor moves, and the stationary LED’s aperture defines the illuminated area of the photoconductor. Three cases define the exposure distribution at the photoconductor surface. Letting x0 be the width of the aperture in the process direction and specifying that the photoconductor moves at constant velocity, (x0 v = const), the three possible conditions are
LED
Case 1 :
I =0
305
nt p
x0
Figure 7. Schematic drawing of exposure at the photoconductor surface when pulse time is less than transition time.
LED Case 2: t p = x0/ n
x0
I =0
2
I =0
1
I = I0
EX
EX= I0t p 1′ n
2′
x0
nt p
nt p
x0
Photoconductor
Figure 8. Schematic drawing of exposure at the photoconductor surface when pulse time equals transition time; nominal photoconductor speed.
LED Case 3: t p > x0/ n
I =0
x0 2
1
I =0 Photoconductor
I =I0
EX=I0x 0 /n
EX 2′
x0 nt p
1′
n
nt p
x0
Figure 9. Schematic drawing of exposure at the photoconductor surface when pulse time is greater than transition time; nominal photoconductor speed.
until the LED is turned off or until 2 reaches the leading edge, point 1. Any point on the photoconductor upstream of point 2 is in darkness until it reaches point 2, and then it is exposed until it reaches point 1 or the LED is turned off. The exposure that occurs between points 1 and 1 and 2 and 2 constitutes smear. From Figs. 7–9, to minimize smear, it is desirable to make pulse time as short and light output as
306
ELECTROPHOTOGRAPHY
LED
x0
t p > x0/ n I =0
2
I =0
1
I = I0
Photoconductor
EX 2′
x0
EX= I0x 0/ n 1′ n
nt p
nt p
x0
Figure 10. Schematic drawing of exposure at the photoconductor surface when pulse time is greater than transition time because photoconductor speed is greater than nominal.
LED
x0 / n > t p ; t r = x0/ n I =0
EX
x0 2
I =0
1
Photoconductor
I = I0
d
EX = I0tp /2
2′
d =x 0 2″ nt p
x0 nt p
1′
x0
EX= I0tp 1″ nt p
n
x0
Figure 11. Optimum exposure at the photoconductor surface to eliminate banding when pulse time is less than transition time; nominal photoconductor speed.
intense as possible as shown in Fig. 11. However, the raster line spacing d has to be critically controlled to avoid printing bands perpendicular to the process direction. The raster line spacing is the distance that the photoconductor moves during the time interval between the initiation of one exposure pulse and the next. Thus, the optimum combination of pulse time and raster line spacing would be such that the combination of smear and fully exposed photoconductor provides constant exposure in all background areas. For this to happen, the time between pulse bursts tr has to be exactly x0 v, where v is the photoconductor velocity. It can be appreciated that any fluctuation in v would cause fluctuations in EX(x) and the areas between 1 and 2 would either be overor underexposed because exposure is additive. Banding print defects are objectionable because the human eye is particularly sensitive to periodic fluctuations of optical density or color. When periodic banding perpendicular to the process direction occurs in background regions (using charged area development), it is called structured background to distinguish it from the more random background sometimes called fog. However, when the banding occurs in image regions (using discharged area development), it is called structured image. Schemes to avoid banding are reported in Ref. 31.
In this discussion, it was assumed that the illumination at the surface caused by the LEDs were sharply defined square or rectangular areas and completely uniform in intensity. In practice, the exposure optical system makes the intensity distribution at the photoconductor surface more Gaussian than shown in Figs. 7–11. Gaussian distributions are discussed in the following section. Relative to lasers, individual LEDs are limited in intensity because of losses in the optical system and possible overheating, if run at high power. So, more sensitive photoconductors are required for high-speed LED printers than for laser applications. Exposure by Laser Scanning In laser printers, a succession of spots of light of uniform diameter are turned on and off as the laser beam sweeps perpendicularly across the plane of the moving photoconductor. Because the electrical charges dissipate due to exposure, the desired pattern left on the surface is the electrostatic image. Just as in LED exposure, this electrostatic image has characteristics that correspond to the physical characteristics of the spot of light that exposes the uniformly charged material. The resolution and edge sharpness attainable in line or halftone images depends on, among other things, the exposing light spot diameter and the intensity distribution within the spot. In addition, the darkness of the developed image or the dirtiness of the background areas can depend on the intensity distribution of the spot of light and the ‘‘fast and slow’’ scan velocities with which the spot traverses the photoconductor. Fine lines have their peak charge densities eroded because of the overlap of exposure when the intensity distribution within the spot is broad and Gaussian. The light spot used to expose the photoconductor is generated by a laser that is optically focused on the plane of the generating layer of the photoconductor. The spot is modulated in time by modulating the voltage across the lasing junction in solid-state lasers or by passing the light beam through an acousto-optic modulator and modulating the polarization state of the shutter. The acousto-optic shutter is enabled by the plane polarization of laser light. A rotating polygon that has mirror surface finishes on the facets reflects the beam so that it sweeps across the photoconductor in the fast scan direction, which is perpendicular to the process direction. A start of scan detector signals a clock in the electronic subsystem that the spot is at the start of the scan position. The start of the scan position is usually the left edge of the document image area. A line of discharge spots is generated as the photoconductor moves one spot diameter in the slow-scan, or process, direction. A twodimensional exposure pattern is generated by sequentially turning the spot on or off. Figure 12 shows a laser scanning exposure system schematically. Electrical charges are invisible to the human observer, so the electrostatic image has to be made visible by a development process. Generating an Electrostatic Image by Laser Exposure
Exposure. The fast scan velocity vfast of the exposing laser spot is associated with the rotation of the polygon. The slow scan velocity vslow is the photoconductor velocity.
ELECTROPHOTOGRAPHY
307
at the photoconductor, at anytime, t
Rotating polygon Optics
I(x, y, t) = I0 e−(ax
Laser
Optics
2 +by2 )
.
(12)
For simplicity, assume circularly symmetrical Gaussian, a = b, laser spot intensity distributions.
Photoconductive rotating drum
Figure 12. Schematic drawing of a scanning laser exposure system. See color insert.
A start of the scan detector tells the laser modulator when the spot is at the beginning of a line of information. An end of the scan detector tells the modulator when the spot has reached the end of the line. During the time it takes the spot to complete a line of information of length L, the photoconductor moves one raster line spacing d. The raster spacing is normally one full width half-max (FWHM) of the spot intensity distribution at the surface. Fast and slow scan velocities are related to each other by vslow vfast = . L d
(10)
The acousto-optic modulator either lets light through or shuts it off. The time from the start of the scan determines the location of the image element. Hence, considering charged area development (CAD), the laser will be on until the spot reaches an area element that corresponds to the start of an image (such as a letter, a line or a halftone dot, etc,). Then the modulator shuts the beam off. The modulator turns the beam on again when the end of the image element is reached. While the spot is traversing the photoconductor perpendicularly to the process direction, the photoconductor is moving along the process direction. The next line of information begins when the distance traversed by the photoconductor at the start of the scan detector is just one FWHM of spot intensity distribution away from the previous start of the scan signal. Periodic variations in vslow produce periodic banding in image and background areas, perpendicular to the process direction. This image defect will be discussed later. Exposure is defined as the product of light intensity and illumination time, as given by Eq. (7). However, in that equation, x, y, and t are independent variables. When using scanning lasers, light intensity is distributed along the x, y directions on the photoconductor generally as a Gaussian function. Because the spot and the photoconductor move, x, y, and t are no longer independent. Thus,
Generating Lines Parallel to the Process Direction. Consider exposure in the fast scan direction separately from exposure in the slow scan direction by letting y be zero. On a moving photoconductor, x = vfast t, but to generate lines, the laser is turned off and then on again. Thus, for charged area development, the laser is on from the start of the scan until the first image element is reached. Then, it is turned off (I0 = 0). At the end of the image element, the laser is turned on again (IO = IO ) until the next image element is reached and so on. For discharged area development, the procedure is reversed. Consider charged area development. Exposure at dx located at a point x in background areas is I(x)dt. So, at any point x in the background along the fast scan line, 2
I(x) = I0 e−a(x−vfast t0ff ) ,
where toff is the time from the start of the scan, at which the laser is turned off. Then, dx . (14) dEX(x) = I(x)dt = I(x) vfast But as the spot of light moves past that element of the photoconductor, the total exposure at x is the sum of all of the exposures at x, as the intensity of the spot of light first increases then decreases. We can think of any x as a discrete point along the photoconductor, and intensity is represented by 2
2
(11)
and x, y are coordinates on the moving photoconductor in the fast and slow scan directions, respectively. Generally,
2
+ I3 e−a(x−vfast t3 ) + · · · In e−a(x−vfast tn ) · · · , (15) where vfast t1 , vfast t2 , vfast t3 , vfast tn . . . etc., are the fast scan raster points. Each raster point is associated with a laser intensity and if these are all the same I1 = I2 = I3 = In . . . = I0 then, they factor out, of course, and Eq. (15) is simplified. However, when the laser is pulsed periodically to generate lines, then the corresponding I s = 0 and vfast t s locate the positions along the fast scan direction at which the laser is turned off. Total exposure at any x is the integral from the start of the scan to the end of the scan, and the exposure pattern for each line or location on the photoconductor is the superposition of each of the exposures of all the other patterns. A series of exposures results for every element in the fast scan direction:
I(x, y, t)dt,
2
I(x, t) = I1 e−a(x−vfast t1 ) + I2 e−a(x−vfast t2 )
EX(x, y) =
(13)
EX(x) = I0 (x)
∞
2
e−a(x−vfast tn ) dt.
(16)
−∞
To evaluate the integrals at each element in the background areas, we use Eq. (16), hold tn constant, and
308
ELECTROPHOTOGRAPHY
√ change the variable √ of integration to u = a(x − vfast tn ), using x = vfast t, du = avfast dt, and exposure becomes EX(x) =
I0 √ vfast a
√ π I0 2 e−u du = √ . a vfast −∞ ∞
(17)
Thus, exposure in background regions along a fast scan turns out to be simple, until the laser is turned off. EX =
I0 vfast
π EX = 0 a
See Fig. 13 for an illustration. Exposure along x during the transition from on to off is approximately 2
dEX(x) = I(x)dt = I0 e−a(x−vfast toff )
dx vfast
I0 fast
p a
Structured Background or Structured Image and Motion Quality. Exposure on the photoconductor in the y direction, perpendicular to fast scan velocity, consists of rasters of exposure placed one raster spacing apart. Equation (12) can be expressed as
EX=0 Fast scan
Figure 13. Formation of lines parallel to the slow scan direction for charged area development by periodically turning off the laser.
2
I(x, y, t) = I0 e−ax e−by . 2
(18)
The term vfast toff in the exponent is the position along the photoconductor at which the laser turns off. The transition is usually made within one pixel. In the example see Fig. 14, the laser is on and turns off (I0 = 0) at t = 6/vfast . It turns back on at t = 9/vfast and stays on until t = 15/vfast . Then, it turns off and on again at t = 18/vfast , and so on. The normalized exposure pattern is shown in Fig. 14. The normalization is vfast t = 3n to represent n pixels that are three units wide. It can be appreciated that if the lines are too narrow, then exposure overlap erodes the null exposure between the lines, and therefore charges on the photoconductor are dissipated in the lines as well as in background areas. This is one reason that it becomes difficult, in CAD to print narrow lines that are as dark as wider lines. It might be obvious that if the laser spot intensity
EX= n
distribution had a steeper slope, then the overlaps at the peak of the line would be reduced. Hence, the charge density would increase and make it possible to develop a darker narrow line. This, however, has consequences in the avoidance of structured background in CAD systems. When development occurs in discharged areas of DAD (or write black) systems, the print defect is structured solid areas.
All integrations in the previous section were with respect to x. Exposure in background areas along the y direction for each raster at constant x is √ π I0 −by 2 e . (20) EX(y) = √ a vfast see Eq. (17) for integration in the x direction. However, the combination of raster spacing and exposure spot size b is chosen to overlap exposures to prevent bands from printing. To account for the raster spacing, we introduce into Eq. (20), the slow scan velocity vslow , and tr the time required for the polygon to move from one facet to the next: √ π I0 −b(y−vslow tr )2 e . (21) EX(y) = √ a vfast Here, vslow tr is the raster spacing d controlled by the rotational rate of the polygon. It can be appreciated that a cyclic perturbation in vslow will cause the raster spacing to fluctuate with time, noting that tr is a constant. If the polygon rotational rate fluctuates or wobbles, then tr is not constant, but that is a separate issue not addressed here. Assume that, because of mechanical and electrical imperfections in the photoconductor drive, vslow (t) = vs [1 + A sin(π t)],
Fast scan direction
Exposure at x ′
1.2
√ π I0 −b(y−vs [1+A sin(π t)]tr )2 e . EX(y, t) = √ a vfast
1 0.8 0.6 0.4 0.2 0 10
20 30 40 position on photoconductor (x ′)
(22)
where A is the amplitude of variation, is the frequency of variation, and vs is the perfect slow scan velocity.
1.4
0
(19)
50
Figure 14. Exposure pattern for single-pixel lines spaced two pixels apart; I0 = 0 at n = 2,5,8,11, . . . .
(23)
Equation (23) is a function of t as well as the constant tr . Assume that the first line of consecutive rasters under consideration occurs at y = 0 and t = 0. Because the photoconductor moves away from the line as t increases, it is necessary in Eq. (23) to include only as many tr ’s as influence the exposure of two consecutive rasters. The maximum error in exposure occurs halfway between these two rasters. Depending on spot size b, only a few values
309
1.2
1.2 Normalized exposure
Intensity distribution at photoconductor
ELECTROPHOTOGRAPHY
1 0.8 0.6 0.4
1.0 0.8 0.6 0.4 0.2 0
0.2
0
0.002
0 0
0.002 0.004 0.006 0.008 0.01 0.012 0.014 0.016 Position on photoconductor (y )
Figure 15. Normalized intensity distribution in the slow scan direction for rasters spaced 0.0033 in. apart and b = 2.5 × 105 in−2 .
of tr are significant. For example, notice from Fig. 15, that the intensity (therefore exposure) contribution of the third raster (y = 0.0066 in.) to the point halfway between rasters at y = 0.00165 in. is essentially zero. Consider a laser that has an optimum spot size for a 300 lpi printer. The intensity distribution is shown in Fig. 15. Here the values of b and vslow tr were chosen so that exposure in background regions is optimum. The intensity distribution shown above would give an exposure distribution, as shown in Fig. 16. The waviness in this figure reflects the fact that when exposures add, the middle of the slow scans gets less light than the peak regions. The result is that the photoconductor has more charge density at the halfway points than at the points that correspond to the peaks of exposure. If the voltage of the charges at these points
0.8
0.6
t =1/(2W)
0.4
0.2
0 0
0.001
0.002
0.003
0.004
0.014
is not biased out by the developer subsystem, then some background deposition occurs at these points in CAD (see Development). On the other hand, in DAD, the image has structure if this occurs. This is extremely objectionable because the human eye is more sensitive to periodic than to random variations in darkness. For the plot in Fig. 16, the drive system is assumed perfect, so that A = 0 in Eq. (23). Now let us presume that some 120-Hz vibration gets into the drive system. For example, the power supply for the drive motor might have a 5% ripple, A = 0.05, (a realistic value for some low-cost power supplies). This manifests itself as shown in Fig. 17. Only three consecutive rasters were included for the calculation in Eq. (23). The top curve in the set shown in Fig. 17 corresponds to t = 0, and the bottom curve to t = 1/(2). This variability in exposure is converted to a variability in voltage in the background regions that uses the photo-induced discharge curves discussed in Photoconductor Discharge.
Consecutive raster
1
0.012
Figure 16. Normalized exposure distribution in the slow scan direction for rasters Spaced 0.0033 in. apart, b = 2.5 × 105 in−2 , and has perfect motion quality.
Initial raster
1.2
Normalized exposure
0.004 0.006 0.008 0.01 Position y (inches)
0.005
0.006
0.007
0.008
0.009
0.01
Position y (inches) Figure 17. Normalized exposure distribution at consecutive rasters as a function of time for one-quarter period of motion quality perturbation.
310
ELECTROPHOTOGRAPHY
It can be appreciated that if the laser spot were smaller, then, for the same raster distance of 0.0033 in., the severity of underexposure would increase halfway between rasters. The unit of laser power is expressed in milliwatts and exposure in ergs/square centimeters. So, intensity is in milliwatts/square centimeter, but a watt is a joule/sec and so after conversion, exposure by 1 mw per cm2 for 1 second yields 104 ergs/cm2 . But for an 11 in. wide printing system where the photoconductor moves at 6 in/s, and printing spots are 0.0033 in. in diameter, the exposure time is of the order of 10−4 seconds. Therefore, exposure is of the order of 1 erg/cm2 per milliwatt. Thus for reasonably powered lasers (5–10 mw) in this type of system, using conventional magnetic brush development methods, the photoconductor should discharge from a full charge of about 600 volts to a background in the range of 100–200 volts. PHOTOCONDUCTOR DISCHARGE Photoconductor discharge is covered in detail in the Photoconductor Detector Technology article of this encyclopedia and also by Scharfe, Pai and Gruber (20). Photoconductor discharge is described by a photo-induced discharge curve (PIDC) that can be obtained experimentally by a curve fit of potentials when the exposure It is varied in a scanner (32). A scanner is a device that cyclically charges a photoconductor uniformly and measures the surface voltages at various stations. At the charging station, the corona current to the photoconductor and the surface voltage are measured. The charged photoconductor moves to an exposure station where it is exposed to a known light source. The surface voltages before and after exposure are measured by electrostatic voltmeters. The exposed photoconductor then moves to an erase station where it is discharged to its residual potential. The surface potential between exposure and erase is measured at several stations, so that the discharge rate after exposure can be calculated. After erase, the process starts over again, and exposure is varied. Thus, after a series of experimentally determined points, a curve can be fitted to the data. The measured data are modeled in terms of the transit times of the photogenerated carriers; the quantum efficiency of the charge generation; its dependence on the internal electric field; trapping of charges during transit from the photogeneration layer to the surface; dark decay, if significant; and cyclic stability. In modeling the PIDC, it has been assumed that electrical contacts are blocking. (There is no charge injection from the conductive substrate into the charge generating layer.) Let us follow the derivation of Scharfe et al. (20). For a capacitively charging photoconductor in which light photons are highly absorbed by the generating layer and the polarity of the surface charges of the mobile charge carriers are the same as the surface charges, the voltages of the surface charges before and after exposure can be obtained from dV = ηeI, (24) Cp dt provided that the transit time of the carriers is short compared to the exposure time and that the range of
the carriers is long compared to the photoconductor layer thickness. Here Cp is the photoconductor capacitance, η is quantum efficiency (see Ref. 20 for the distinction between quantum and supply efficiencies), e is the electronic charge, and I is the exposing light intensity. This leads of course to photoconductor discharge as
$$\int_{V_0}^{V} dV = -\frac{eI}{C_p}\int_0^t \eta \, dt, \qquad (25)$$
which leads to a linear discharge curve if the quantum efficiency is a constant independent of electric field and the intensity is not a function of exposure time. Then Vimage is
$$V_{image} = V_0 - \frac{eI}{C_p}\int_0^t \eta\, dt, \qquad (26)$$
where V0 is the voltage at the start of exposure and Vimage is the voltage immediately after exposure. This is a simple linear equation, generally not representative of real photoconductors except for hydrogenated amorphous silicon (20). In terms of surface charge densities,
$$V_0 = \frac{\sigma_0 d_p}{k\varepsilon_0} \quad \text{and} \quad V_{image} = \frac{\sigma_{image} d_p}{k\varepsilon_0}. \qquad (27)$$
Thus, when no charges are in the bulk, σimage is the surface charge density on the photoconductor after exposure, dp is the photoconductor geometrical thickness, k is its dielectric constant, and ε0 is the permittivity of free space. For many photoconductors, η depends on some power of the electric field but not on light intensity (i.e., it obeys reciprocity), given by
$$\eta = \eta_0 \left(\frac{V}{d_p}\right)^{P_e}. \qquad (28)$$
Here, V is the instantaneous voltage of the charge on the surface, η0 is the free carrier density, and Pe is the power. So, combining Eq. (24) and Eq. (28) and integrating as in Eq. (25) yields, for Pe ≠ 1,
$$V_{image} = \left[ V_0^{\,1-P_e} - (1-P_e)\,\frac{e\eta_0 I t}{C_p\, d_p^{\,P_e}} \right]^{\frac{1}{1-P_e}}. \qquad (29)$$
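Where a quick numerical check is helpful, Eq. (29) can be compared against a direct integration of Eq. (24) with the field-dependent quantum efficiency of Eq. (28). The following sketch uses hypothetical parameter values chosen only for illustration (they are not taken from Ref. 20), and it adopts the sign convention of Eqs. (24) and (26), in which the potential decreases under exposure.

```python
# A numerical sketch only: all parameter values are hypothetical.
V0   = 600.0     # initial surface potential, volts
dp   = 25e-4     # photoconductor thickness, cm
Cp   = 1.0       # capacitance per unit area, arbitrary units
eI   = 50.0      # product of electronic charge and exposing intensity, arb. units
eta0 = 0.8       # zero-field coefficient of the quantum efficiency, Eq. (28)
Pe   = 0.6       # field exponent (Pe != 1)

# Euler integration of Cp dV/dt = -e I eta0 (V/dp)**Pe, i.e., Eqs. (24) and (28)
V, t, dt = V0, 0.0, 1e-5
while t < 0.01:
    V -= (eI * eta0 * (V / dp) ** Pe / Cp) * dt
    t += dt

# Closed-form Eq. (29), valid for Pe != 1
V_closed = (V0 ** (1 - Pe)
            - (1 - Pe) * eI * eta0 * t / (Cp * dp ** Pe)) ** (1 / (1 - Pe))
print(round(V, 1), round(V_closed, 1))   # the two values should nearly agree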
The shape of the PIDC depends on the electric field dependence of photogeneration of the carriers when there are no mobility or range limitations. Another relationship for the PIDC is given by Williams (33):
$$V_{image} = V_r + (V_0 - V_r)\, e^{-It/E_a}. \qquad (30)$$
In Eq. (30), Ea is an empirical constant also called the normalized sensitivity. Williams gives the value of this constant as 31.05 ergs/cm2 for organic photoconductors and 4.3 ergs/cm2 for layered photoconductors, when exposure occurs by light illumination at 632.8 nm (33). Melnyk (32) found from experimental measurements in a scanner that PIDC equations of the form
$$V_{image} = PIV + V_r + \sqrt{PIV^2 + V_c^2}, \quad \text{where} \quad PIV = \frac{1}{2}\left(Y - \frac{V_c^2}{Y} - S\,It\right) \ \text{and}\ \ Y = V_0 - V_r, \qquad (31)$$
fit some typical photoconductors. (It can be shown that the equations in Ref. 32 and Eq. (31) are equivalent.) Here V0 is the electrostatic potential in complete darkness, S is the spectral sensitivity of the photoconductor, Vr is a residual potential (under high exposure, the potential never falls below this value), and Vc is an empirically derived constant that connects the initial slope of the PIDC with the residual value. Figure 18 is a fit of Eq. (31) to PIDC #8 shown in Fig. 6 of Ref. 19. A more useful plot is the linear plot shown in Fig. 19. In this figure, PIDC #8 of Ref. 19 is calculated for V0 = 700 volts and V0 = 1350 volts to show how the background voltage increases when the initial corona charge increases. The linear plot is more convenient, especially when the sensitivity and curvature are improved, as for example in Fig. 20. These improved PIDCs are more practical for exposures in the 5–10 erg/cm2 range that are encountered in machine applications. (See Exposure by laser scanning.) So, by using the PIDC curves, along with the exposures discussed in Image Exposure, one obtains the electrostatic image that needs to be developed.
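The Melnyk form is convenient for quick estimates of image and background voltages. The short sketch below evaluates the reconstruction of Eq. (31) given above, using the parameter values quoted in the caption of Fig. 19; it is an illustration, not Melnyk's original fitting code.

```python
import math

def pidc_melnyk(exposure, V0=1350.0, Vr=40.0, Vc=400.0, S=60.0):
    """PIDC of Eq. (31): exposure It in erg/cm^2, potentials in volts,
    S in volts per erg/cm^2 (values from the Figure 19 caption)."""
    Y = V0 - Vr
    piv = (Y - Vc ** 2 / Y - S * exposure) / 2.0
    return piv + Vr + math.sqrt(piv ** 2 + Vc ** 2)

for It in (0, 1, 5, 10, 20, 35):
    print(f"{It:3d} erg/cm^2 -> {pidc_melnyk(It):7.1f} volts")
```

At zero exposure the function returns approximately V0, and at high exposure it approaches Vr, as required of a PIDC.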
Figure 18. Logarithmic plot of photo-induced discharge curve #8 of Ref. 16, fitted by Melnyk parameters S = 60 volts/erg/cm2, Vc = 400 volts, Vr = 40 volts, and Vddp = 1350 volts.
Figure 19. Linear plot of photo-induced discharge curve #8 of Ref. 16, fitted by Melnyk parameters S = 60 volts/erg/cm2, Vc = 400 volts, Vr = 40 volts, and Vddp = 1350 volts.
Figure 20. Linear plot of photo-induced discharge curves that are more practical for exposure systems capable of delivering 1 erg/cm2 per milliwatt; S = 80 volts/erg/cm2, Vc = 120 volts, Vr = 40 volts.
Photoconductor Dark Discharge
The PIDC curves describe the photoconductor voltage in exposed areas. These are background areas when
using charged area development (CAD) and image areas when using discharged area development (DAD). However, a time delay occurs after exposure, during which the electrostatic image traverses from the exposure station to the development station. This time delay depends on process speed and machine architecture. During this delay, the photoconductor voltage of the electrostatic image degenerates because of dark decay. The decrease in voltage has to be considered when the system is designed because, as will be seen later in the section on Development, the development potential is key in determining the toner mass deposited on
the photoconductor image. Development potential is defined as the difference in potential between the image at development time and the bias applied to the developer. However, the applied bias has to be tailored to the background potential to avoid dirty backgrounds. Therefore, it is important to know what the image and background potentials are as the photoconductor enters the development zone. The dark decay rate depends on the photoconductor material, among other things, and is discussed in detail by Borsenberger and Weiss in Ref. 16, pp. 42–52, by Scharfe et al. in Ref. 20, p. 57, and by Scharfe, Ref. 34, pp. 108–114. The basic mechanism of bulk dark decay is thermal generation of charge carriers, which causes time-dependent degeneration of the voltage in the charged areas of the electrostatic image. Obviously, the time between charging and exposure and between exposure and development has to be short enough to prevent significant degeneration. Yet, these times have to be greater than the charge transit times of the carriers. For a discussion of charge transit through the bulk of the photoconductor transport layer, see Williams, Ref. 33, pp. 24–28 and Scharfe, Ref. 34, pp. 127–137. Thus the design of an imaging system architecture needs to take into consideration the concepts of dark decay, photo-induced discharge, charge trapping (16), and carrier transit times.
DEVELOPMENT
Development is the process by which ultrafine pigmented toner particles are made to adhere to the electrostatic image formed on the photoconductive material. Various means are available for developing this image. In the earliest copying machines, such as the Xerox 914 and the Xerox 2400, cascade development was used in open and electroded form. Open cascade development (Xerox 914) deposits toner only at the edges of the electrostatic image. The electric field away from the edges of the image is mainly internal to the photoconductor. Consequently, only the fringe fields at the edges of images attract toner from the developer particles. A need was perceived to produce images that had dark extended areas. To do this, a biased electrode, spaced from the photoconductor, was introduced (Xerox 2400) that caused the electric field in the interior of extended areas to
split between the field internal to the photoconductor and external to it. Consequently, toner deposition in solid areas was somewhat enhanced. The electrically conductive sleeve of a magnetic brush developer serves the same purpose as the electrode in electroded cascade development. Other methods of development are liquid immersion development (LID), powder cloud (used extensively for xeroradiography), fur brush, magnetic toner touchdown, jumping development (used in many machines by Canon), and hybrid scavengeless development (35–39). Two of the most widely used systems now are magnetic brush and jumping development. They will be discussed separately. The form of LID used by the Indigo 9000 printer will also be described.
Magnetic Brush Development
In magnetic brush development, a mixture of carrier particles and much smaller toner particles is dragged across the electrostatic image. The carrier particles are magnetic, but the toner particles are nonmagnetic and pigmented. They are used to make the image visible. Typically, carrier particles are about 100 µm, and toner particles are about 10–20 times smaller, only about 5–10 µm. Carrier particles are usually coated with polymers to enhance contact electrification between the toner and carrier surfaces. Stationary magnets inside a roughened, rotating, nonmagnetic cylindrical sleeve cause a stationary nonuniform magnetic field to draw the magnetic carrier particles toward the sleeve surface. Friction between the developer particles and the sleeve makes the developer move through the gap formed by the photoconductor and the sleeve. Figure 21 shows a typical magnetic brush schematically.
Figure 21. Schematic diagram of a magnetic brush developer.
The combination of toner and carrier, called the developer, moves across the electrostatic image as a blanket. Because of friction between the carrier and sleeve surfaces, it can reasonably be assumed that, for a range of conditions, the developer does not slip on the sleeve surface. Therefore, the developer flow rate per unit roll length through the development zone is proportional to the product of gap width g and the relative velocity vR:
$$\frac{dM_D}{dt} = \rho_D\, g\, v_R. \qquad (32)$$
Here MD is the developer mass per unit length, and ρD is the mass density in the developer blanket. However, Eq. (32) is not valid when the developer blanket slips on the sleeve surface. The condition at which slip starts depends on carrier material, shape, magnetic field intensities and gradients, and sleeve material and roughness, among other things. These conditions are found experimentally for specific configurations of specific materials. Developer mass density is not necessarily constant but depends on toner concentration and on carrier and toner particle size averages and distributions. However, the material mass densities and the amount of compression caused by magnetic fields, as the developer is driven through the development gap, are constants for specific configurations. Thus, for a specific geometry, developer material, magnetic field strength, and configuration, one can reasonably assume that ρD is constant under constant operating conditions. The moving conductive sleeve of the magnetic brush is biased electrically with respect to the (usually grounded) conductive backing of the photoconductor. So, an electric field in image areas is generated by the presence of the electrostatic image. This field makes electrically charged toner particles move, during development time, in the desired direction (toward the photoconductor in the image areas and away from the photoconductor in the background areas) and form a powder image on the surface. Development time is the time that an electrostatic image element spends in the development zone. It is the ratio of development zone length to photoconductor relative velocity. The electric field in the image imparts high velocity to the charged toner particles. They traverse the distance between the developer particles and the imaging surface and become part of the developed image. Because the force on a charged particle is the product of charge multiplied by field, low-charge toner will fail to make the transition. After being separated from close contact with the carrier particles, the toner particles that have low charge become entrained in the air stream and land in unwanted areas. The development roll bias is chosen to minimize the deposition of toner that has the wrong polarity into the background areas.
Toner Concentration and Charge-to-Mass Ratio
Toner concentration (ratio of toner mass to developer mass) is a major factor that determines the quantity of electric charge on the toner particles, among other things. For most developers, increased toner concentration yields a lowered average charge per particle. However, there are many more toner particles in a developer than carrier particles, and the total electric charge in a quantity of developer is usually distributed normally among the toner particles. Let us use the definition that toner concentration, TC, is the ratio of toner mass Mt to developer mass MD. Developer mass is the sum of toner mass and carrier mass, MC:
$$TC = \frac{M_t}{M_C + M_t}. \qquad (33)$$
TC determines the total amount of toner that is delivered into the development zone. However, the amount that is actually available to an electrostatic image is only a fraction of the total amount. So the mass of toner per unit area deposited in the image is
$$\frac{M_t}{A} = g\, v_R\, TC\, \psi_D \int_0^t \rho_D\, dt. \qquad (34)$$
Here, ψD is the fraction of toner delivered into the development zone that becomes available to the image electric field forces during development time t. Toner particles adhere to the carrier particles by electrical and van der Waals attraction. An electric field is generated in the development gap by the charges on the photoconductor when they come close to the electrically biased developer roll. This field overcomes the adhesion of toner particles to the carrier particles and causes deposition of the charged toner into the image areas. Thus, ψD depends on the adhesion of toner to carrier, the mechanical forces in the development zone that tend to dislodge the toner, and the depth into the developer bed to which the imaging electric field penetrates. Toner concentration affects the toner delivery into the development zone and also influences the quantity of charge on toner particles. The average toner charge-to-mass ratio (Q/M) in most developers is related to the toner-to-carrier mass ratio Ct by
$$\frac{Q}{M} = \frac{A_t}{C_t + TC_0}. \qquad (35)$$
In Eq. (35), Ct is the total mass of toner divided by the total mass of carrier in a bed of developer, and Q/M is the average toner charge-to-mass ratio in the developer bed. At and TC0 are measured parameters. The surface state theory of contact electrification (see Ref. 40, p. 83) assumes that the charge on a carrier particle after charge exchange with toner is the product of the carrier area available for charging, the electronic charge, the carrier surface state density per unit area per unit surface energy, and the difference between the carrier work function and surface energy. Similarly, the charge on the toner particles on a carrier is the product of the total toner surface area, the electronic charge, the toner surface state density per unit area per free energy, and the difference between the toner work function and free energy. These assumptions lead (see Ref. 40, p. 83, Eq. 4.14) to the toner mass-to-toner charge ratio on a single carrier particle (the reciprocal of the charge-to-mass ratio) given in Eq. (36):
$$\frac{M_t}{Q_t} = \frac{R\, C_t\, \rho_c}{3\varphi e N_c} + \frac{r\, \rho_t}{3\varphi e N_t}. \qquad (36)$$
In Eq. (36), ϕ is the difference between the carrier "work function" and the toner "work function." The other parameters in Eq. (36) are
Nc and Nt, the surface state densities per unit area per unit energy on the carrier and toner particles, respectively;
R and r, the carrier and toner average particle radii, respectively;
ρc and ρt, the carrier and toner particle mass densities, respectively;
Ct, the ratio of the mass of all the toner particles on one carrier to the mass of the carrier particle;
e, the electronic charge;
Qt, the average charge of all the toners on one carrier particle; and
Mt, the mass of all of the toner particles on one carrier particle.
It can be shown that the average charge-to-mass ratio on one carrier equals the average charge-to-mass ratio in a bed of developer. Let mt and mc be the mass of one toner and one carrier particle, respectively. The number of toner particles in a bed of developer is n0 nc, where n0 is the number of toner particles on one carrier particle and nc is the number of carrier particles in the bed of developer. Thus, the mass of the total number of toner particles in a bed of developer is n0 nc mt. The total toner charge in a bed is n0 nc q0 = Qt, where q0 is the average charge on a toner particle. The total toner mass in the bed is n0 nc mt = Mt. So, the average charge-to-mass ratio in the bed is Qt/Mt. However, the total toner charge on a single carrier particle is n0 q0, and the total toner mass on the carrier particle is n0 mt. So, the average charge-to-mass ratio on a carrier particle is q0/mt, which is just equal to Qt/Mt. So, the average charge-to-mass ratio on a carrier particle is the same as the average charge-to-mass ratio in the bed. The mass of all of the carriers in a bed of developer is nc mc, and the ratio of the two is n0 mt/mc. So, Ct is both the ratio of the mass of all of the toner particles on one carrier particle to the mass of the carrier particle and the ratio of the total toner mass in the bed of developer to the total carrier mass in the bed. Because both toner and carrier particles are distributed in size, these arguments apply only to the average toner size, carrier size, and charge-to-mass ratio. For these averages, Eq. (35) can be obtained from Eq. (36) by letting
$$A_t = \frac{3\varphi e N_c}{R\rho_c} \quad \text{and} \quad TC_0 = \frac{N_c\, r\, \rho_t}{N_t\, R\, \rho_c}. \qquad (37)$$
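To illustrate how Eqs. (35)–(37) tie the lumped developer parameters to the surface-state picture, the following sketch computes At and TC0 from a set of material values and then evaluates Q/M for several toner concentrations. All of the numerical values below are assumptions chosen only so that Q/M falls in a typical range of tens of µC/g; they are not measured data.

```python
# Sketch of Eqs. (35)-(37). All material parameters are hypothetical.
e_charge = 1.602e-19          # electronic charge, C
phi      = 0.5                # carrier-toner work function difference, eV
Nc, Nt   = 6.0e14, 1.2e15     # surface state densities, states/(m^2 eV)
R, r     = 50e-6, 4e-6        # carrier and toner radii, m
rho_c, rho_t = 5000.0, 1100.0 # carrier and toner mass densities, kg/m^3

# Eq. (37): lumped developer parameters
A_t  = 3 * phi * e_charge * Nc / (R * rho_c)          # C/kg
TC_0 = (Nc * r * rho_t) / (Nt * R * rho_c)            # dimensionless

# Eq. (35): average toner charge-to-mass ratio versus toner-to-carrier ratio
for Ct in (0.01, 0.02, 0.05, 0.10):
    QM = A_t / (Ct + TC_0)                            # C/kg
    print(f"Ct = {Ct:4.2f}  ->  Q/M = {QM * 1e3:5.1f} uC/g")

# Eq. (36) is the reciprocal of Eq. (35) for the same parameters:
Ct = 0.02
MQ = (R * Ct * rho_c) / (3 * phi * e_charge * Nc) + (r * rho_t) / (3 * phi * e_charge * Nt)
print(abs(MQ - (Ct + TC_0) / A_t) < 1e-6)             # prints True
```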
The parameters At and TC0 are usually measured in the laboratory, under experimental conditions designed to provide insight into the behavior of developer materials of varying physical and chemical formulations, charge control additives, relative humidity, developer age, running history, and toner concentrations. Ruckdeschel performed experiments on dynamic contact electrification by rolling a bead in the track of a rotating split Faraday cage and measuring the voltage buildup as a function of the number of revolutions (41). The experiments were repeated with beads of various
materials. These experiments showed that contact electrification is a nonequilibrium ion transfer process that depends on the area of contact, the adsorption of water on the surfaces, the speed at which the surfaces separate, and, of course, the materials that comprise the surfaces in contact. The adsorption of water, which influences toner charge when in contact with the carrier, is often seen in the marked influence of relative humidity on the toner charge-to-mass ratio. By performing his experiments in a bell jar, Ruckdeschel found that the nature of the surrounding gas had a small influence, as long as the pressure was close to atmospheric. Close to vacuum, the pressure did indeed have an influence on charging. Nash, in a review article (Ref. 42, pp. 95–107), presents factors that affect toner charge stability. He presents a modified form of Eq. (35) for charge generation of a simple additive-free toner–carrier mixture:
$$\frac{Q}{M} = \frac{A_t}{C_t + TC_0}\left[1 - e^{-\gamma t}\right]. \qquad (38)$$
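A brief sketch of the charging kinetics of Eq. (38) follows; γ (the effective charging rate constant discussed next) and the developer parameters are assumed values, carried over from the sketch of Eq. (35) above.

```python
import math

# Sketch of Eq. (38): charge build-up during mixing (hypothetical values).
A_t, TC_0, Ct = 5.8e-4, 0.009, 0.02   # C/kg, dimensionless, dimensionless
gamma = 0.05                          # effective charging rate constant, 1/s

for t in (0, 10, 30, 60, 120, 300):   # mixing time, seconds
    QM = A_t / (Ct + TC_0) * (1.0 - math.exp(-gamma * t))
    print(f"t = {t:3d} s  ->  Q/M = {QM * 1e3:4.1f} uC/g")
```

With these assumptions, Q/M rises from zero and saturates at At/(Ct + TC0) after a few time constants of mixing.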
The exponential factor γ , the effective rate constant for charging, depends on the frequency of toner to carrier contacts during mixing. The frequency per unit mass depends on geometrical mixing factors and the power put into the mixing device. These important factors make it difficult to compare the results of bench tests with those obtained from machine operating conditions, yet, toner charge per particle is one of the key parameters involved in toner deposition. Mixing time t determines the total number of toner to carrier contacts. From surface state theory, it follows that when all the surface states on carrier and toner particles are filled, Q/M saturates. Toner particles are usually combined with charge-control agents. The sign and quantity of charge per toner particle depend on the carrier core and coating materials, surface coverage, toner polymers, and charge-control agents that are used in fabricating the developer (see Ref. 1, p. 119 and Ref. 40, p. 82 for details). Comparisons between bench tests and the results of running the same materials in a machine, although very difficult, are particularly important. The evaluations of these comparisons are used to guide the formulations of carrier coatings and toner additives that provide stable developers that have long life. For a detailed discussion of toner instability, see Ref. 42, p. 95. Some factors involved in developer instability are the sensitivity of Q/M to ambient temperature and humidity, to toner addition strategies during operation, and to developer aging effects. Some charge-to-mass ratio fluctuations at developer start-up and the accumulation of wrong polarity and low charge toner also contribute to developer instability. Developer aging effects are related to the loss of carrier coating after extended operation; toner ‘‘welding’’ onto the carrier surface; and/or ‘‘poisoning’’ of the carrier surface by toner charge-control agents, flow control agents, or pigments. Both the toner and carrier charge distributions in the developer determine to what extent toners end up in the image or background areas. The fraction of toner that has the wrong polarity (or very low charge of the right polarity) is also determined by toner concentration. A developer can become ‘‘dirty’’ or ‘‘dusty’’
at various high concentrations. This toner becomes part of unwanted print background or machine dirt. Much effort has gone into producing stable developer materials.
Electric Field and Toner Deposition. The optical density of a toner monolayer in an extended area image is proportional to the number of toner particles deposited per unit area. After a monolayer is deposited, the optical density tends to saturate, and further deposition only increases the thickness of the layer without making it darker. Let us define development to completion (neutralization-limited development) of an electrostatic image as the condition reached when the vertical component of the electric field in a horizontal development zone is shut off. This can happen by neutralizing the electrostatic image charge density either by toner charge or by quenching the electric field in the development zone by the space charge left on carriers after the toner is stripped. (There is another limitation on toner deposition called supply-limited development.) Let us consider neutralization-limited development without space charge. (Also see Ref. 40, p. 149 and Ref. 33, p. 165.) When toner flow is sufficient to saturate development, see Eq. (34), field neutralization shuts off further deposition in the image. Before deposition of toner, the initial electric field in the development zone of a capacitively charged photoconductor in image areas is given by
$$E_i = \frac{V_{image} - V_b}{\left(\dfrac{d_z}{k_z} + \dfrac{d_p}{k_p} + d_{air}\right)\Big/\varepsilon_0} \cong \frac{V_{image} - V_b}{Z}. \qquad (39)$$
Assume that the development zone is partitioned into the photoconductor, the developer blanket, and an air gap. Here, Vimage is obtained by the methods described in Photoconductor Discharge; Vb is the bias applied to the development roller; and Z is the sum of the electrical thicknesses of the developer bed, the photoconductor, and the air gap. In any specific experimental setup, the photoconductor material and thickness, the magnetic field and distribution, and the developer material are usually held constant. Thus the denominator of Eq. (39) is determined by the photoconductor thickness dp, its dielectric constant kp, the air gap dair, the effective electrode spacing dz, and the dielectric constant kz in the region between the effective electrode and the photoconductor surface. As toner deposition proceeds, the numerator is diminished by Vt, the voltage of the average charge density of the toner charges. Assuming that these toner charges can be approximated by a thin sheet of charge and that development proceeds until Ei vanishes, then Vimage − Vt − Vb = 0 and
$$V_{image} - V_t = (\sigma_{image} - \sigma_t)\, Z. \qquad (40)$$
Because σt = (Q/M)(M/A), Eq. (40) can be rewritten as
$$\frac{M}{A} = \frac{V_{image} - V_b}{(Q/M)\left(\dfrac{d_z}{k_z} + \dfrac{d_p}{k_p} + d_{air}\right)\Big/\varepsilon_0}. \qquad (41)$$
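As a rough order-of-magnitude example of the neutralization limit of Eq. (41), the sketch below evaluates M/A for assumed geometry, dielectric constants, and Q/M (none of these are measured values; they are chosen to represent a conductive developer with a small effective electrode spacing).

```python
# Sketch of the neutralization limit of Eq. (41); all values are assumptions.
eps0    = 8.854e-12        # permittivity of free space, F/m
V_image = 500.0            # image potential at development, volts
V_bias  = 100.0            # developer bias, volts
QM      = 20e-3            # toner charge-to-mass ratio, C/kg (20 uC/g)
dp, kp  = 25e-6, 3.0       # photoconductor thickness (m) and dielectric constant
dz, kz  = 20e-6, 2.0       # effective electrode spacing (m) and dielectric constant
d_air   = 5e-6             # air gap, m

MA = eps0 * (V_image - V_bias) / (QM * (dz / kz + dp / kp + d_air))  # kg/m^2
print(f"M/A at field neutralization = {MA * 100:.2f} mg/cm^2")       # about 0.76
```

With these assumed numbers, the limit comes out at roughly 0.8 mg/cm2, which is of the order of the solid-area coverage discussed later in connection with Fig. 24.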
So, in the neutralization limit, the deposition of toner depends on the development potential, Vimage − Vb, the photoconductor thickness, and the toner charge-to-mass ratio, when other things are equal. Many simplifying assumptions are made in deriving Eq. (41). However, deposition does not usually go to field neutralization. In practice, 50–80% field neutralizations are common.
Developer Conductivity. The conductivity of a magnetic brush in the development zone is very important. Consider that charged toner particles leave their countercharged carrier particles during development and become part of the developed image on the photoconductor. This constitutes charge flow or an electrical current out of the developer in the development zone. The countercharges on the carrier particles either dissipate to the developer sleeve, when a highly conductive developer is used, or become part of the space charge in the development zone when insulating developers are used. This space charge tends to negate the electric field caused by the electrostatic image near the effective electrode. When an insulating developer is used, this space charge is swept out of the development zone because the development roll moves faster than the photoconductor. By moving faster than the photoconductor, the carrier particles that have lost toner and have become charged (countercharge of toner) are replaced by fresh particles from the developer sump. See Ref. 33, p. 165 for an analysis of space-charge-limited magnetic brush development. Thus, one function of a high conductivity developer is to prevent, at roll speeds slower than for insulating carriers, the premature shutdown of development because of a buildup of space charge in the development zone. When insulating carrier particles are used, the location of the biased effective electrode in the development zone is defined by the electrically conductive sleeve, usually about 1 mm away from the photoconductor surface. However, if the carrier particles are conductive, then the effective electrode is defined by the time constant of the developer blanket and will be closer (25–200 µm) to the photoconductor than the developer sleeve. The time constant, in turn, is a function of the particle shape and conductivity, toner concentration, magnetic field strength, the particle density of the blanket, and its state of agitation in the development zone (see Ref. 43 and Ref. 44, p. 1510). Carrier particle conductivity can be controlled by the nature of the core material; the shape of the particle (spherical vs. granular); the presence and thickness of an oxide layer; and the nature, thickness, and surface coverage of the carrier coating. Grit and oxidized iron coated with low percentages of polymer to enhance contact electrification are popular materials. Toner concentration affects developer conductivity by influencing the number of carrier-to-carrier contacts made by the conductive carrier particles within a magnetic brush chain. Figure 22 shows, schematically, bead chains that have low or high toner concentrations.
Figure 22. Schematic diagram of effective electrode location at high and low toner concentrations.
The random distribution of insulating toner particles within a chain causes statistically fewer electrically conductive contacts between the conductive carrier particles at high than at low concentrations. One can also visualize how carrier
shape and size distribution as well as toner shape and size distribution influence the number of bead-to-bead contacts per chain. It can be appreciated that if the carrier core is conductive, then the fraction of the surface covered by the coating that is added for contact electrification of toner statistically influences the conductivity of the developer bed. The effective electric field during development at locations within extended areas far away from electrostatic image edges is enhanced by conductive developers. When insulating developers are used, the fields at these locations are internal to the photoconductor, and only the fringe fields at the image edges develop. However, if the developer conductivity is high enough to present a closely spaced effective development electrode to the electrostatic image, then the field at the interior of extended areas is no longer internal to the photoconductor. Thus, toner deposition within extended areas is enhanced by using conductive carriers, and fringe field development is reduced. The flow of charge during deposition (toners are charged particles) constitutes an electrical current associated with the flow of toner particles from the developer bed to the photoconductor surface. To approximate the influence of the effective electrode spacing (conductivity) on development, assume that toner current dσt /dt during development is proportional to the electric field in the development zone. Following Ref. 40, p. 125 and Ref. 34, p. 26, assume that the electrode-to-photoconductor backing is divided into three slabs, as shown in Fig. 23. Therefore, because the field in the image area, including toner deposition as a thin sheet of charge, is given by
$$E_i = \frac{V_{image} - V_b - V_t}{\left(\dfrac{d_z}{k_z} + \dfrac{d_p}{k_p} + d_{air}\right)\Big/\varepsilon_0} \cong \frac{V_{image} - V_b - V_t}{Z} = \frac{V_D - V_t}{Z}, \qquad (42)$$
the toner current is
$$\frac{d\sigma_t}{dt} = \alpha E_i = \alpha\,\frac{V_D - V_t}{Z}. \qquad (43)$$
Figure 23. Schematic diagram of the dielectric slabs of a conductive magnetic brush developer.
Because σt = (Q/M)(M/A) and Q/M is constant (the average value) during deposition, but the variable is M/A,
$$\frac{d\sigma_t}{dt} = \left(\frac{Q}{M}\right)\frac{d(M/A)}{dt}. \qquad (44)$$
In terms of M/A, Eq. (43) becomes
$$\frac{d(M/A)}{dt} = \frac{\alpha}{Q/M}\, E_i = \frac{\alpha}{Q/M}\cdot\frac{V_D - V_t}{Z}. \qquad (45)$$
Assume that the toner charge density σt in the developed image is a thin sheet, as before. For ease of notation, let M/A = m, dp/(kp ε0) = c, and Q/M = q. Then,
$$V_t = \frac{\sigma_t\, d_p}{k_p \varepsilon_0} = \left(\frac{Q}{M}\right)\left(\frac{M}{A}\right)\frac{d_p}{k_p \varepsilon_0} \qquad (46)$$
$$= mqc, \qquad (47)$$
and Eq. (45) becomes
$$\frac{dm}{dt} = \frac{\alpha}{qZ}\, V_D - \frac{\alpha c}{Z}\, m. \qquad (48)$$
Putting Eq. (48) into the standard form of a linear first-order differential equation yields
$$\frac{dm}{dt} + C_1 m = C_2, \quad \text{where} \quad C_2 = \frac{\alpha V_D}{qZ} \quad \text{and} \quad C_1 = \frac{\alpha c}{Z}. \qquad (49)$$
C1 and C2 vary slowly, if at all, during the deposition process. Letting the integrating factor be φ(t) = e^{∫C1 dt} yields the solution
$$m = e^{-C_1 t}\left[\int C_2\, e^{C_1 t}\, dt + K\right] = e^{-C_1 t}\left[\frac{C_2}{C_1}\, e^{C_1 t} + K\right], \qquad (50)$$
$$m = \frac{C_2}{C_1} + K\, e^{-C_1 t} = \frac{\alpha V_D/(qZ)}{\alpha c/Z} + K\, e^{-\alpha c t/Z}. \qquad (51)$$
The boundary conditions are m = 0 when t = 0; m = 0 when VD = 0; and m = 0 when α = 0. Solving Eq. (51) for K using these conditions yields K = −αVD/qZ. When the substitutions for the coefficients (Eqs. (46) and (49)) are replaced in Eq. (51), it becomes
$$\frac{M}{A} = \frac{k_p\, \alpha V_D}{(Q/M)\, Z}\left[1 - e^{-\dfrac{d_p\, \alpha\, t}{k_p\left(\dfrac{d_z}{k_z} + \dfrac{d_p}{k_p} + d_{air}\right)}}\right]. \qquad (52)$$
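Rather than relying on the closed form, Eq. (48) can also be integrated numerically to generate a development curve. The sketch below does this with illustrative parameter values only; in particular, α is an assumed rate coefficient chosen so that the time constant Z/(αc) is of the order of a tenth of a second.

```python
# Sketch: development curve by Euler integration of Eq. (48),
#   dm/dt = (alpha/(q Z)) V_D - (alpha c / Z) m,
# with m = M/A, q = Q/M, c = dp/(kp eps0). All parameter values are illustrative.
eps0   = 8.854e-12
dp, kp = 25e-6, 3.0
dz, kz = 20e-6, 2.0
d_air  = 5e-6
Z      = (dz / kz + dp / kp + d_air) / eps0   # effective electrical spacing
c      = dp / (kp * eps0)
q      = 20e-3                                # Q/M, C/kg
alpha  = 20.0                                 # assumed deposition-rate coefficient
V_D    = 400.0                                # development potential, volts
t_dev  = 0.05                                 # development time, s

m, dt = 0.0, 1e-5
for _ in range(round(t_dev / dt)):
    m += (alpha * V_D / (q * Z) - alpha * c * m / Z) * dt

m_inf = V_D / (q * c)                         # steady state of Eq. (48)
print(f"M/A after {t_dev * 1e3:.0f} ms = {m * 100:.2f} mg/cm^2 "
      f"(steady state {m_inf * 100:.2f} mg/cm^2)")
```

The printed steady state corresponds to the neutralization limit of Eq. (48); as noted earlier, real systems typically stop at 50–80% of field neutralization.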
Equation (52) fulfills the experimental requirements that
1. as development potential VD ⇒ 0, toner deposition ⇒ 0;
2. as development time, expressed as development zone length divided by process speed (t = LD/vp), approaches 0, toner deposition ⇒ 0;
3. as the dependence of deposition rate on field ⇒ 0, toner deposition ⇒ 0;
4. as development time increases and other things remain equal, deposition approaches the neutralization limit;
5. as the charge-to-mass ratio of the toner increases, the neutralization limit decreases; and
6. toner deposition is linear with development potential, if configuration and materials properties are constant.
As developer conductivity increases, Z decreases because both dz and dair decrease whereas kz increases (see Eq. (42)); that is, the effective electrode spacing decreases. In addition, the exponent in Eq. (52) increases, so that the neutralization limit is reached more rapidly. Thus, Eq. (52) shows that deposition is enhanced by using conducting developers, if other conditions are kept constant. However, Eq. (52) also shows that adjustments can be made to allow insulating developers to deposit as much toner as conductive developers. For example, development time is increased by extending the development zone or slowing the process speed. If high process speed is essential when insulating developers are used, another way of increasing development time is to use multiple development zones. This would allow some insulating developers to yield the
same toner deposition as more conducting developers at comparable development potentials. There are other ways of accomplishing the same thing, but all at some additional cost. Figure 24 shows some examples of development curves. Curves like these examples can be generated by varying the parameters discussed earlier and allow estimates of the optical density of solid areas. (Optical density is a function of M/A, and M/A = 0.6 to 0.8 mg/cm2 is desirable.) These parameters help define an imaging system. Using conductive developers to make the latent image visible has the advantage that close proximity of the front surface of the photoconductor to the effective electrode minimizes fringe field development. Therefore the geometry of the developed image closely approximates the geometry of the electrostatic charge pattern on the photoconductor. This is useful for estimating the optical density of lines and halftones.
Write Black (DAD) and Write White (CAD)
Two forms of development can be used to make electrostatic images visible: Charged Area Development (CAD, "write white") and Discharged Area Development (DAD, "write black"). In CAD, toner is charged opposite in polarity to the electrostatic image. Because opposite charges attract each other, toner is deposited into the charged area of the image. CAD provides optically positive image areas from optically positive exposure patterns. The bias on the developer roll has the same polarity as the charges in the image. In DAD, however, toner is charged with the same polarity as the electrostatic image, but the developer bias is set close to the voltage of the charges in the unexposed background areas. Exposure is in the form of an optical negative. Image areas have a lower magnitude of charge than unexposed background areas. Thus, optically negative electrostatic images provide optically positive output images. Figure 25 shows an example of an exposure pattern. This exposure pattern is converted into the electrostatic image voltage pattern, shown in Fig. 26, by a PIDC. Figure 26 shows development potentials to be used in one of the development models, illustrating the difference between CAD and DAD.
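A highly simplified sketch of how the development potential is set in the two modes may help before turning to the figures; it works with voltage magnitudes only, and all of the numbers (including the bias offset) are illustrative assumptions rather than machine values.

```python
# Sketch of how the development potential V_D differs between CAD and DAD
# for the same photoconductor voltages (illustrative magnitudes only).
V_unexposed = 600.0   # charged, unexposed areas, volts
V_exposed   = 100.0   # fully exposed areas, volts
offset      = 50.0    # assumed bias offset from the background potential

# CAD ("write white"): image = unexposed areas, background = exposed areas
V_bias_cad = V_exposed + offset
V_D_cad    = V_unexposed - V_bias_cad

# DAD ("write black"): image = exposed areas, background = unexposed areas
V_bias_dad = V_unexposed - offset
V_D_dad    = V_bias_dad - V_exposed

print(f"CAD: bias {V_bias_cad:.0f} V, development potential {V_D_cad:.0f} V")
print(f"DAD: bias {V_bias_dad:.0f} V, development potential {V_D_dad:.0f} V")
```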
Figure 24. Examples of development curves calculated from Eq. (52) for Q/M = 15 µC/g and Q/M = 20 µC/g.
Figure 25. Example of exposure pattern for conversion into the image voltage pattern in Fig. 26.
Figure 26. Electrostatic image pattern illustrating how developer bias is set for CAD or DAD.
Figure 27. Electrostatic image patterns of normal exposure and overexposure that show line growth in DAD and line shrinkage in CAD.
Line Growth and Overexposure. When overexposure occurs in DAD mode, the totally exposed areas between the unexposed regions tend to widen because the photo-induced discharge curve (PIDC) is flat in the very high exposure regions (see Fig. 20). The bias voltage on the development roll then tends to drive toner into the exposed image areas. Because the developer bias is set close to the background potentials, DAD produces line growth and tends to reduce resolution. However, in the CAD mode, the line image occurs in the regions where the exposing spot is turned off between the exposed background areas. Because the light intensity is distributed in space, the exposure at a given point (a fixed fraction of the peak exposure) discharges the photoconductor more as the peak exposure is increased. Thus, overexposure in the CAD mode tends to produce line thinning and low optical density. This is illustrated in Fig. 27. The upper curve represents the electrostatic image when peak exposure is 10 erg/cm2, and the lower curve is for a peak exposure of 20 erg/cm2. If the difference between the background voltage and developer bias is to be held constant to avoid background deposition when overexposure occurs, the bias has to be readjusted. This influences the value of the development potential VD in Fig. 27. Optimizing electrophotographic process parameters is a complicated subject (see Ref. 34 for methods and details).
Figure 28. Schematic drawing of jumping development using magnetic toner.
Jumping Development Using Magnetic Toner
Jumping development is a method also popular in some modern machines. It can be carried out with magnetic or nonmagnetic toner. In the former method, magnetic toner is metered onto a moving roller sleeve by a metering blade that also charges the toner. The thin toner layer is brought close to the electrostatic image but is spaced 50–200 µm away from the photoconductor surface. An oscillating field, provided by an ac bias with a dc offset on the roller sleeve, causes a cloud of toner to be lifted off the roller surface, but out of contact with the photoconductor surface. No contact is made by the airborne toner with the photoconductor surface until an electric field attracts the toner into electrostatic image areas. The force balance that keeps the toner airborne in background areas is the magnetic attraction to the roller and the electrical forces of the ac bias voltage. The role of the electrostatic image is to unbalance these forces and motivate toner to move toward the photoconductor. Figure 28 shows a schematic drawing of a jumping development system. Controlling the toner thickness on the roller as it goes into the development zone is very important. Hosono (45,46) describes some ways of achieving this. Because there is a magnetic attraction as well as the usual forces of adhesion between the magnetic toner and the developing roller, deposition does not begin until a threshold electric field is exceeded. The application of biases is explained by Kanbe (47). As toner on the sleeve moves into the development zone, the ac and dc electric fields get stronger, and toner
oscillation back and forth between the photoconductor and the sleeve becomes more energetic. When the phase of the ac bias attracts the toner toward the electrostatic image, the particles move further and faster than in the reverse polarity phase because the fields caused by the electrostatic image augment the fields caused by the ac component of the developer bias. When the phase is such as to bring toner back to the sleeve, the ac component of field is diminished by the presence of the electrostatic image field. Consequently, the motion of the particles that is augmented by the electrostatic image causes them to reach and adhere to the photoconductor and become part of the developed image. However, if the ac component is too high, toner will be urged to the photoconductor in background areas as well as in image areas during deposition. Once the toner contacts the surface and forms an adhering bond, the reverse phase of the ac component cannot dislodge and remove it. This is desirable in image areas but not in background areas. Thus the factors that affect the developed image quality are the ac magnitude and frequency, the dc offset level, the toner layer thickness and charge density on the developing roller, and the effective air gap between the toner layer and the photoconductor surface. Let us divide the development zone into three regions (Fig. 29 and 30). In region 1, the superposition of electrostatic image field, ac and dc bias fields are less than the threshold field (Ed < Eth ) and attract toner away from the image. In region 2, they are less than the threshold field but attract toner toward the image. In region 3, they are greater than the threshold field (Ed > Eth ) and attract toner toward the image. (see Fig. 29). As a partially developed image area passes through the development zone, it goes through the three regions a number of times, depending on the frequency and amplitudes of the ac bias,
the value of the threshold field for deposition, the dc offset of the bias, and the instantaneous potential of the partially neutralized electrostatic image. The total length of the development zone is defined by the geometry of the development roller and the photoconductor configuration. Total development time is LD /vp , where vp is process speed. Let the total time that an image element spends in regions 1, 2 and 3 be t1 , t2 , and t3 , respectively. For a sinusoidal ac component (Ref. 48 shows some nonsinusoidal voltage pulses) added to a dc bias, the electric field in the development zone is
$$E_D(t) = \frac{V_{image} - V_{ti}(t) - \left[V_b + V_{tc}(t) + \dfrac{V_{p-p}}{2}\sin(2\pi\omega t)\right]}{Z}. \qquad (53)$$
Here Vimage is the voltage of the image on the photoconductor, and Vti is the voltage of the toner deposited on the photoconductor as it moves through the development zone. This quantity depends on the instantaneous mass per unit area (M/A) of the layer. Assuming conservation of charge in the cloud, the voltage of the charge in the toner cloud layer, Vtc, is also a function of instantaneous deposition and time. The dc component of the applied bias Vb is not a function of time. A plot of Eq. (53) is shown in Fig. 30 for a hypothetical case where LD/vp = 0.02 seconds, Vb = 200 volts, Vimage = 200 volts, ω = 400 Hz, Vp−p = 800 volts, Z = 100 µm, and Eth = 1 volt/micron. It was assumed that the voltages Vti and Vtc were zero to indicate a possible initial condition in the development gap before deposition starts to occur. So, in this case t1 = 0.010 seconds, t2 = 0.0015 seconds, and t3 = 0.008 seconds. However, as deposition begins, Vti depends on the amount of toner deposited and its charge-to-mass ratio.
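The region times quoted above can be reproduced from the reconstruction of Eq. (53) used here (amplitude Vp−p/2 and a sinusoid of frequency ω). The sketch below splits the development time among the three regions for the same hypothetical case; small differences from the quoted values reflect rounding.

```python
import math

# Sketch: time spent in regions 1, 2, and 3 of Eq. (53) before any deposition
# (V_ti = V_tc = 0), for the hypothetical case quoted in the text.
V_image, V_b = 200.0, 200.0   # volts
V_pp   = 800.0                # peak-to-peak ac bias, volts
freq   = 400.0                # Hz
Z_um   = 100.0                # effective gap, microns
E_th   = 1.0                  # deposition threshold, volts/micron
t_total = 0.02                # development time L_D / v_p, seconds

dt = 1e-6
t1 = t2 = t3 = 0.0
for i in range(round(t_total / dt)):
    t = i * dt
    E_D = (V_image - V_b - 0.5 * V_pp * math.sin(2 * math.pi * freq * t)) / Z_um
    if E_D <= 0.0:
        t1 += dt      # region 1: field pulls toner away from the image
    elif E_D < E_th:
        t2 += dt      # region 2: toward the image, but below threshold
    else:
        t3 += dt      # region 3: above threshold, deposition possible

print(f"t1 = {t1:.4f} s, t2 = {t2:.4f} s, t3 = {t3:.4f} s")
```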
Figure 29. Schematic drawing of the development zone for jumping development.
Figure 30. Oscillating electric field in the development zone that shows times when toner is attracted toward the electrostatic image.
Therefore, as an element of charged photoconductor moves through the development zone, the value of the development field diminishes. If there is sufficient development time, development again approaches completion. It can be assumed that the instantaneous deposition rate is proportional to ED(t), as in Eq. (45), but only when ED > Eth, that is, in region 3. This assumes that the toner particles respond to the field, make it all the way across the air gap, and adhere to the photoconductor image area. Because the electric force on a toner particle is the product of the field and charge, uncharged particles would not be motivated to move. The details of this development process are covered by Kanbe (49).
Jumping Development Using Nonmagnetic Toner
Magnetic toner contains magnetite, a black magnetic material. Thus, it is not a practical toner for color applications. Tajima (50) discloses a method that allows jumping development using nonmagnetic toner. In this method, a conventional two-component magnetic brush developer deposits a layer of toner onto a donor roll that carries an ac bias with a dc offset voltage (Fig. 31).
Figure 31. Schematic drawing of jumping development using nonmagnetic toner.
As in magnetic toner jumping development, nonmagnetic toner in the development zone is maintained as an oscillating cloud by the ac component of the bias. The electrostatic image augments the fields to attract toner to deposit on the photoconductor. Because full donor coverage by toner is required in a confined space, a conductive carrier is usually used in the magnetic brush developer. In this process, great care must be taken to prevent carrier particles from depositing on the roll surface and being carried into the image development zone along with the toner particles. The advantage of the magnetic toner scheme is that there are
no carrier particles, but of course, the magnetic toner is disadvantageous for color applications.
Hybrid Scavengeless Development (HSD)
Hybrid scavengeless development is very similar to jumping development using nonmagnetic toner, as just described. However, a set of wires is placed into the photoconductor–donor roll nip. These wires are at a high ac voltage, typically 600–800 volts peak to peak, and mechanically and electrically lift off the toner deposited on the donor by the magnetic brush developer (Fig. 32).
Figure 32. Schematic drawing of hybrid scavengeless development using wires in the development zone.
So again, an oscillating cloud of toner is maintained in the electrostatic image development zone, and the electrostatic image attracts toner to make the transition to the photoconductor. Patents by Hays (51), Bares and Edmunds (52), and Edmunds (53) disclose some of the details involved in this development system. A form of hybrid scavengeless development, described by Hays (54), replaces the wires with electrodes that are integral to the donor roll. These electrodes are biased with respect to each other and have an ac voltage. The electric field strips the toner off the donor surface, raising an oscillating cloud of toner that is used to develop the electrostatic image. Figure 33 shows the electroded development system schematically.
Figure 33. Schematic drawing of hybrid scavengeless development using a segmented donor roll.
The electrodes are all commutated so that the ac bias is applied only to triads of electrodes simultaneously. The donor roll is a dielectric material, so that the ac electric fields between the outer electrodes and the central electrode of the triad are mainly tangential to the surface of the roll and tend to break the adhesion between the toner and the roll surface. Once this bond is broken, the toner is available to form a cloud. The dc bias on the central electrode and the electrostatic
image augment the fields perpendicular to the surface and attract toner onto the photoconductor in the image areas. Development continues until either the electrostatic image is neutralized or until the image-bearing element of the photoconductor is beyond the development zone.
Liquid Development of Electrostatic Images
Liquid development of an electrostatic image, liquid ink electrophotography, was a popular method when ZnO-coated papers were used as photoconductors in the 1950s and 1960s. Originally, the electrostatic image on the photoconductor was totally immersed in a mixture of pigment particles and fluid carrier. The development process proceeded by having electric fields attract the charged pigment particles to the latent image. As in open cascade development, only fringe fields were developed in the original version of liquid immersion development (LID). So, Stark and Menchel (55) introduced an electrode close to the imaging surface (aluminized Mylar) and let the developer fluid pass between the member and the electrode. This provides solid area development capability. In this form of LID, the developed images were quite wet, and excess carrier fluid had to be evaporated or removed mechanically. This form of LID became undesirable for commercial application because disposable photoconductors became unpopular. To reuse the photoconductor, the developed toner particles needed to be transferred to a transfer sheet such as paper. The easiest way to do this for the small (submicron-sized) pigment particles was to use a combination of pressure and heat. ZnO-coated paper was not a good candidate for a reusable photoconductor. In addition, due to increasing environmental awareness in the 1980s and 1990s, evaporation of the hydrocarbon liquid carrier became less desirable, so LID became
commercially unpopular. The invention of developer fluid metering (56,57), blotting and cleaning rollers (58), and compaction and rigidization of the developed image to make it transferable to an intermediate member (59–61) made the new method of LID commercially viable for high-speed, high-quality color printing. Indigo machines use this procedure quite successfully. Toner particles for dry development systems consist of pigments dispersed in a thermoplastic coated with charge-control and flow additives. These particles range from 5–12 µm in average diameter. Liquid development systems use pigment particles that range from 0.2–0.4 µm in average diameter and are dispersed in a hydrocarbon insulating carrier fluid. Commonly used carrier fluids include Isopar. Currently there are many patented formulations that have complicated chemistry (62–67). They also have charge director substances in the fluids to control the charging of the pigment particles (which are now coated with thermoplastics) by contact with the carrier fluid. The carrier fluid is nonconducting, so that the electrostatic image is not discharged by intimate contact with the fluid. The average particle sizes of the coated pigments range from 0.4–2.0 µm. A counterrotating, very closely spaced, electrically biased metering surface considerably reduces the amount of fluid carryout and provides excellent solid area development. Image disturbances can appear when the toner image is too thick or if the charge on the topmost layer is too low. However, biasing schemes and geometries can counteract these defects (68). A set of cleaning rollers that moves in the same direction as the photoconductor (69) further reduces the amount of fluid carryout. Figure 34 shows a schematic drawing (not to scale) of the development metering zone in a counterrotating system, such as depicted by Denton (61).
Figure 34. Schematic drawing of development using liquid ink and a counterrotating development/metering roll.
In practice, the closest approach of the photoconductor to the roll is on the order of thousandths of an inch, whereas the roll diameter can be several inches. The photoconductor is about 25 µm thick. The dotted lines in Fig. 34 represent the general pattern of fluid flow as the image element on the photoconductor approaches the minimum gap in the development zone. The toner particles move with the carrier fluid until the perpendicular electric field in the development zone overpowers viscous drag. However, fluid at the surface of the photoconductor moves at the photoconductor velocity, and fluid at the roller moves at the velocity of the roller surface. Consequently, the fluid is sheared, and the locations of the menisci at the entrance and exit zones of the development zone help to define the development zone length and the distribution of the electric field. Thus, the development zone length depends in part on the velocities of the photoconductor and roll surfaces, as well as on the geometry. A relatively thick layer of ink is deposited on the photoconductor either by nozzles that squirt ink on the surface (70) or by allowing it to run through a bath of developer. These ink deposition zones are comparatively far away (on the order of inches) from the development metering roll. The electrostatic image does not have a reference electrode close by during the transition from the
deposition zone to the development zone. So the electric fields generated by the electrostatic image charges are mostly internal to the photoconductor. Some development may start during this transition because of fringe fields, but most development occurs in the development zone. As the ink layer approaches the minimum gap in the development zone, the electric field in the inking gap approaches a maximum. However, as the field increases, toner particles entrained in the carrier fluid are attracted toward the charges on the photoconductor surface, and are deposited when contact is made. The toner deposited tends to neutralize the image (as in dry development). So, the total amount deposited on the photoconductor depends on the flow rate of toner through the development zone, the electrostatic image charge density and the fields set up by these charges, the charge-to-mass ratio of the particles, the total development time, and the fluid characteristics of the ink. Among the ink characteristics are the carrier fluid viscosity, the mobility of the particles and their volume concentrations, the volume charge density of the micelles that carry the countercharge left behind by the deposited toner and their mobility through the fluid, and the development zone geometry. Development of the electrostatic image by liquid inks does not leave a dry image. The concentration of toner particles in the ink before development is about 1% to 4%, but the concentration of the particles in the developed image can be in the range of 15% to 20%. In background areas, of course, the volume concentration of particles has to be zero. So, although there is carrier fluid in both background and image regions, the effective action of the development fields is to concentrate the particles and to rigidize the toned image to resist the shearing stress at the exit meniscus of the development zone. The function of the metering part of the development metering roll is to reduce the fluid layer thickness from many microns at the entrance zone meniscus of Fig. 34. to a thickness of about 4µ to about 8µ at the exit meniscus. Rigidization or
compaction of the toned image is disclosed by Denton (61) and by Domoto (68). Then, as disclosed by Landa (60), the developed photoreceptor is passed through an electrostatic transfer zone where the remaining fluid helps to enable transfer of the particles to an intermediate medium, which then passes the toned image through a blotter or squeegee. The blotter or squeegee is electrically biased to concentrate the toner particles further and remove carrier fluid until particle concentrations of about 40 to 50% are achieved. Another function of the electric field at the blotter is to rigidize the toner image further. This intermediate member is passed into a pressurized and heated zone, and the toner is transferred to paper or any final substrate.
TRANSFER
Now that the electrostatic image is visible on the photoconductor, the toner particles can be transferred to an intermediate member such as a dielectric or to a viewing medium such as paper or a transparency. To do so, the transfer medium is brought into contact with the developed photoconductor so that a sandwich is formed, and the toner layer is placed between the photoconductor and the transfer medium. Transfer is achieved electrically, mechanically, or by a combination of both. If the transfer medium can withstand high temperatures (on the order of 150 °C), then the toner melts simultaneously in the transfer process and is fixed to the viewing medium. This process is called transfusion. In the ionographic process used by Delphax Corporation (see Ref. 15 for a description of the process), the electrostatic image is formed on an aluminum drum that uses a layer of aluminum oxide as the charge-receiving dielectric. Alumina is hard and heat resistant, so after the image is developed with toner, it is brought into contact with paper backed by a heated pressure roller. Thus the toner melts in the pressurized nip and transfuses. However, most photoconductors degrade rapidly when subjected to high
temperatures (degradation accelerates for Se alloys and AMAT layers when temperature exceeds about 110 ° F). Thus, the toner image is transfused after it is first electrostatically transferred to an intermediate member which can be a heat resistant dielectric such as Mylar. Cold pressure transfer, a purely mechanical means, can be used if the photoconductor can withstand high pressures. However, low transfer efficiencies result unless the pressure is so high that the medium is deformed by it. Paper tends to calendar under these conditions. Electrostatic transfer is achieved by corona charging the back of the transfer medium (a purely electrical method) or by bringing the toner sandwich in contact with a conformable, electrically biased roller (a combination of electrical and mechanical means). Corona transfer is disclosed in Ref. 71 and biased roll transfer is disclosed in Refs. 72–78. The biased transfer roller (BTR) consists of a hard, usually metallic, cylindrical core that is coated with a layer of conductive conformable material (72,79). Coating materials such as carbon-loaded elastomers have been used. The electrical conductivity and compressibility have to be tightly controlled and depend on the photoconductor and transfer medium speed and electrical characteristics. Toner particles are charged. Thus, they react to electric fields that are generated between the photoconductor surface and the transfer medium backing. If the field is in the right direction and is strong enough to overcome the adhesive and cohesive forces, the toner particles stick to the paper and provide a heat-fixable image. Because toner particles are charged, the forces that tend to move them toward the surface of the transfer medium are proportional to the product of charge and field. Thereby, one would imagine that simply increasing the value of the electric field would enhance transfer. In fact, air breakdown provides an upper limit to the field and exceeding the threshold for breakdown only inhibits transfer by tending to neutralize the charge on the toner particles. Increasing the charge-to-mass ratio of the toner particles assists transfer, but increased charge on the particles also increases the countercharge on the photoconductor. So there is an effect but it is small. Obviously if the particles are uncharged, they will not react to uniform electric fields. When the photoconductor, toner, paper sandwich leaves the corona transfer region, the paper will stick to the photoconductor because of electrostatic tacking forces. The electrical force on the transfer medium arises from the attraction of the electric charges on the back surface of the medium and the conductive backing of the photoconductor. They are relieved by a detacking device that allows the charges from the paper backing to relax when the paper is separated from the photoconductor. These devices are usually high frequency ac corona generators or ac biased rolls. The frequency has to be chosen so that strobing is not introduced. Using biased transfer rolls, tacking is not as severe as in corona transfer (80). Tacking can be avoided by tailoring the electric field in the toner sandwich, so that it tends to decay at the exit portion of the roller nip. Reference 81 describes exposure of the photoconductor at the exit region to control the field as separation occurs between the toner, paper, and photoconductor sandwich.
Transfer efficiency is the fraction of the total toner deposited on the photoconductor that appears on the transfer medium. It depends on the electric field in the sandwich, the electrical and mechanical adhesion of particles to the photoconductor, the cohesion that holds the toner layer together as a unit, and the electric field distribution at the exit of the transfer zone. If one plots the experimentally obtained mass of toner per unit area on the transfer medium versus the mass of toner per unit area on the photoconductor, a linear curve is obtained for coverage up to about 1.0 mg/cm². However, there is usually a threshold coverage of about 0.02 mg/cm² before transfer starts to occur. Therefore, transfer efficiency is usually stated, at or near 1.0 mg/cm² deposition, as the ratio of toner mass on the paper to toner mass on the photoconductor before transfer. Reference 33 (pp. 220–221) presents an exponential curve fit to some transfer data taken at IBM Corporation. Transfer efficiencies of about 0.6–0.9 are common for conventional corotron transfer devices, depending on the specific paper thickness and absorbed humidity, as well as the photoconductor thickness.

When paper absorbs moisture, it becomes conductive. If the paper is simultaneously in contact with any grounded conductive surface, the charges on the back of the paper bleed to that surface, and the potential of the paper diminishes. Thus, charges deposited on the back surface of the paper tend to bleed through toward the toner, and the total electric field in the sandwich diminishes. In extreme cases of absorbed moisture and corona transfer, efficiencies approach zero, and catastrophic failure is experienced. To minimize this failure, it is necessary to fabricate all paper transport belts from good insulators. It is actually possible to transfer toner to a metallic plate or foil if the potential of the foil is kept constant as separation occurs and its value is such that air breakdown is avoided. An approach to minimizing the effects of humidity on transfer, among other things, by field tailoring is disclosed in Ref. 82. Paper is packaged under dry conditions, and the user is advised to keep the stacks intact after opening a ream. Moisture penetration is a slow process that depends on the exposed area of the sides of the stack, among other things. The design of papers for electrophotographic transfer is a separate technology.

Yang and Hartmann (83) developed a model of charged particle transfer when subjected to an electric field generated by two dielectric-coated parallel electrodes. This model is cited in Ref. 33 (pp. 204–208). In this model, the electric field in the toner pile between the photoconductor and the transfer medium is calculated. Because the particles are charged, the field vanishes at a point inside the toner layer. It is assumed that the toner layer splits at this thickness. Transfer efficiency in this model is defined as the ratio of the toner pile height on paper to the pile height of the original deposition on the photoconductor. As Williams (Ref. 33, pp. 204–208) points out, this model is not very representative of corona transfer, but it is the only one published.
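As a rough numerical illustration of the threshold and efficiency figures quoted above, the short Python sketch below maps photoconductor coverage to transferred coverage with a simple linear-above-threshold rule. The 0.8 efficiency is one assumed value inside the 0.6–0.9 range given in the text, and the linear form is a simplification rather than the exponential fit of Ref. 33.

```python
def transferred_mass(m_pc_mg_cm2, efficiency=0.8, threshold=0.02):
    """Toner mass per unit area reaching the paper (mg/cm^2).

    Simple linear-above-threshold sketch: nothing transfers below the
    ~0.02 mg/cm^2 threshold noted in the text; above it, a constant
    fraction (assumed 0.8 here) of the deposited mass transfers.
    """
    if m_pc_mg_cm2 <= threshold:
        return 0.0
    return efficiency * (m_pc_mg_cm2 - threshold)

for m_pc in (0.01, 0.1, 0.5, 1.0):
    m_paper = transferred_mass(m_pc)
    print(f"photoconductor {m_pc:4.2f} mg/cm^2 -> paper {m_paper:4.2f} mg/cm^2")
```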
When using photoconductor materials coated on flexible belts, vibration at about 60 kHz can assist in breaking the toner-to-photoconductor adhesion when applied simultaneously with the electric field (84). The vibration is supplied by a piezoelectrically driven horn that contacts the back surface of the photoconductor. Best performance is obtained by placing the horn tip at a position that corresponds to the peak of the electric field (84). Transfer efficiencies of 98% have been achieved by this method.

The transferred image has to be treated such that the charged toner will not jump back to the photoconductor upon separation but rather sticks to the paper. If this retransfer does occur, it usually happens randomly and leaves an ugly image on the paper. Good engineering shapes the paper path and controls the corona currents or ac bias so that the electric field is always in a direction that holds the toner to the paper as the paper leaves the transfer region.

Transfer in color electrophotography is complicated by the requirement of transferring three or four separate images in registration. These images are generated either cyclically or in parallel. The transfer system has to be tailored for the specific application. In a cyclic color electrophotographic process, the same transfer sheet is brought into contact with the same developed photoconductor as many times as there are color separations. Thus, in a cut-sheet machine, the transfer sheet is held in place by clips (there are also other methods of keeping the paper on the transfer member) until the last transfer has occurred. In a tandem application, however, the same transfer sheet moves synchronously with the imaged and developed photoconductors and passes through as many separate transfer zones as there are color separations. The sheet has to be in registration with the image on the photoconductor in each of these zones. This presents engineering challenges.

FUSING OR FIXING

The final step in producing an electrophotographically printed page is fusing or fixing the toner image so that it becomes permanent. Toner particles are pigments embedded in a thermoplastic such as polystyrene. These particles soften at their glass transition temperature (about 60–70 °C) and become soft and rubbery. Then, as the temperature rises, the melt viscosity decreases, and the plastic begins to flow. When the image support is paper, the molten toner flows into the pores of the paper by capillary action. Upon cooling, the toner solidifies, and it is trapped by the paper fibers and by intimate contact with the fiber surfaces. If the image support is another plastic, for example, a transparent material, then the heated solid plastic sheet comes into intimate contact with the molten toner and again, upon solidification, the image becomes permanent. As the temperature of the toner and plastic contact drops, the thermal contraction of the toner image must match that of the plastic support; otherwise, a mismatch of deformations at the interface causes the solidified toner to flake off the support. The temperature of the toner support has to match or exceed that of the toner above the glass transition point for a bond to occur.
Contact Heat Transfer

Typically, heat is supplied to both the toner particles and the support by contact with a heated surface, but radiant heat transfer has also been used. A roll fuser is an example of a contact heat transfer system. Typically, the paper that contains unfused toner is passed between heated pressure rolls, which are hard cylinders made of a highly thermally conductive material such as steel coated with a heat-resistant, conformable material such as silicone rubber. The coating has to be highly thermally conductive and highly resistant to cyclic heating and cooling. Inside the cylinder, an infrared-rich source such as a quartz filament lamp heats the steel. The surface temperature of the cylinder is controlled by pulsing the lamp at the proper rate. The surface temperature, about 180 °C, is measured by sensors, and feedback control loops determine the pulse duration and frequency. These depend on the presence of paper and on the temperature and moisture content of the paper. The quantity of toner on the paper contributes slightly to the heat transfer required to maintain the proper roll temperature.

The conformance of the roll coating to the paper surface and toner image has to be high, so that the image is not distorted upon fusing. When the toner melts, pressure from the conformable surface helps confine it to the boundaries of the image. Before conformable coatings were used, the roller contacting the image was hard, and considerable distortion of the image occurred, especially on transparent substrates. This was particularly objectionable when fusing halftones because dot distortions contribute to image optical noise. However, the conformance of the roll surface in the image area depends on the coating and also on the toner melt viscosity and the dwell time in the roller nip. Short dwell times exist when the roll surface speed is high and/or the roller diameter is small and also when the coating has a high elastic modulus. Thus, for short dwell times, the roll temperatures have to be higher than for long dwell times.

Two operational boundaries confine the operating characteristic of a fuser roll. Toner from the image can offset to the roller surface when it is too hot (melt viscosity is too low) or when the roller is too cool (melt viscosity is too high). These boundaries can be widened by metering release agents onto the fuser roll. These help form a layer of low cohesion between the molten toner and the roller surface.
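The article describes the fuser-roll surface temperature as being held near 180 °C by sensors and a feedback loop that gates the internal lamp. The sketch below is a deliberately simplified stand-in for such a loop: a bang-bang (hysteresis) controller with an assumed ±2 °C band and a toy thermal model, not the control strategy of any actual product.

```python
def lamp_command(temp_c, lamp_on, setpoint=180.0, band=2.0):
    """Bang-bang control with hysteresis: lamp on below (setpoint - band),
    off above (setpoint + band), otherwise hold the previous state."""
    if temp_c < setpoint - band:
        return True
    if temp_c > setpoint + band:
        return False
    return lamp_on

# Toy simulation: heating and cooling rates per time step are assumptions.
temp, lamp = 150.0, False
for step in range(60):
    lamp = lamp_command(temp, lamp)
    temp += 1.5 if lamp else -0.8          # assumed heating/cooling per step
    if step % 10 == 0:
        print(f"t={step:2d}  lamp={'on ' if lamp else 'off'}  T={temp:5.1f} C")
```

A real fuser controller would modulate lamp pulse duration and frequency and compensate for paper throughput and moisture, as the text notes; this block only shows the feedback idea.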
References 85 and 86 disclose a contact fuser assembly for use in an internally heated fuser roll structure. It comprises a rigid, thermally conductive core coated with a thin layer of a normally solid, thermally stable material; a liquid release agent, a silicone oil, is subsequently applied to the coated core. Reference 87 discloses a heat and pressure roll fuser. The apparatus includes an internally heated fuser roll and a backup or pressure roll that form a nip through which the copy substrates pass while the images contact the heated roll. The heated fuser roll has an outer layer or surface of silicone rubber or Viton™ to which a low-viscosity polymeric release fluid is applied. The release fluid is dispensed from a sump by a metering roll and a donor roll. The metering roll contacts the release fluid in the sump, and the donor roll contacts the surface of the heated fuser roll. Reference 88 discloses a release agent management (RAM) system for a heat and pressure fuser for black and color toner images. Both the fuser roll and the backup roll are cleaned in each cycle to remove any debris that may have stuck to their surfaces because of contact with the toner and its substrate.

Fusing colored toner images requires attention to the degree of coalescence of the toner particles. To a certain extent, the fused image should not remain particulate because surface scattering of light tends to reduce the color gamut available from the pigments in the toners. Thus, engineering fuser rolls that have operational latitude requires sophisticated heat transfer calculations.

Radiant Heat Transfer

Flash fusing and focused infrared light are examples of radiant heat transfer systems. In machines of the early 1960s such as the Xerox 914, the paper passed through an oven. This created a fire hazard if a jam held paper in the fuser, so considerable effort went into designing sensors and switches that would turn off the heat source when a jam occurred. In addition, the ovens were constructed of high-thermal-capacity materials. Therefore, engineering was focused on providing an "instant on or off" low-thermal-mass fuser (89). Here, radiant and contact fusing occur simultaneously. The back of the paper contacts the floor of the fuser, and a quartz lamp that has a focusing shield concentrates radiation on the front of the paper. This fuser is constructed of low-thermal-mass materials such as sheet aluminum or stainless steel. Reference 90 describes an instant-on fuser that has a relatively thin, fiber-wound cylinder that supports a resistance wire, heating foil, or printed circuit secured on the outside surface of the cylinder or embedded in the surface of the cylinder.

Flash fusing concentrates the energy of a xenon flash tube on the toner particle image. During the flash, these particles absorb the infrared radiation and are heated to their melting point. Particle contact with paper heats the paper locally until the molten toner flows into the fibers. The paper remains cold in background areas. The toner polymer also has to melt to a low viscosity and flow quickly. Reference 91 describes a toner composition for flash fusing that contains a polyester binder whose glass transition temperature is between 50 and 80 °C. Flash fusing is currently not compatible with colored toner because most of the energy of the xenon flash is concentrated in the visible portion of the spectrum. By definition, colored toner, other than black, absorbs only a part of the visible spectrum. Efforts have been made, with limited success, to include visibly transparent, infrared-absorbing molecules in the toners to enable color flash fusing. Reference 92 describes a toner in which an ammonium salt of an infrared light absorber and a positive charge-control agent are used in combination. Image permanence is a rigid requirement in any commercial application.

Fusing Color Images

A black image requires absorbing light over the whole visible spectrum. Therefore, because the colorant in black
toner is usually carbon black, image quality does not depend on the coalescence of the toner particles into a solid mass. (Some color machines use a "process black" that consists of cyan, magenta, and yellow pigments dispersed within a single toner particle.) The main concern is image permanence. When toner particles are permanently attached to paper fibers without a high degree of coalescence, the image is said to be "fixed" but not necessarily "fused." On the other hand, the color of nonblack toner does depend on coalescence of the particles. The surface roughness of the fused toner layer particularly influences the saturation of the output color because of light scattering. Therefore, the color toner has to melt in the fuser, and it also has to flow within the image boundary. This requirement makes designing color toner materials very complicated because cyan, magenta, and yellow pigments influence contact electrification as well as toner color. The dispersion and concentration of colorant in the plastic binder influence color, toner charging characteristics, and the flow properties in the molten state.

The cleanliness of the fuser roll in color applications is a high priority. Any contamination of the image due to residue from a previous image contaminates background areas and also distorts the color. A light-colored image such as yellow is particularly vulnerable.

COLOR ELECTROPHOTOGRAPHY

To print process color electrophotographic images, the basic xerographic process steps described in the preceding sections are repeated three or four times. Each of the latent electrostatic images is developed by using a subtractive-primary pigmented toner. Each time, the exposure step is tailored to meet the individual demands of the cyan (minus red), magenta (minus green), yellow (minus blue), or black separation. The merits of four-color printing versus three-color printing are beyond the scope of this article.

Quantitative control of charging, exposure, development, and transfer is necessary to meet the demanding requirements of a customer who is accustomed to obtaining high-quality lithographic color prints. However, lithographic and electrophotographic printing processes differ. Lithographic printing of colored images requires skilled press operators to perform trial runs while setting up the press. The final adjustments of the relative amounts of cyan, magenta, yellow, and black inks can use up hundreds if not thousands of pieces of printed material. Then, during the run, the operators observe the output of the presses to ensure constant performance. (Short-run lithographic presses are currently on the market, but a discussion of these devices is beyond the scope of this article.) If colors begin to shift during a job on a conventional press, the operators make adjustments to the press to restore color. This is highly impractical for color electrophotographic printers or copiers that are used for jobs that may consist of 1 to 1000 prints. The image quality requirements for high-quality color printing need quantitative understanding of all of the process steps involved in forming the final image.
Optical Density of Color Toner

The darkness (optical density) of black images is directly related to the toner mass per unit area on the paper. However, the color of the developed toner image of color prints depends on the total and also on the relative amounts of the subtractive primary colorant pigments that make up the toner layer. Thus, the electrostatic images of the different primary "separations" of three- or four-color images are different.

When light of intensity I0(λ) strikes an image made up of pigmented areas on a substrate such as paper, part of the spectrum of the illuminating light is partially absorbed. Part of the illuminating light is scattered at the image surface, and part is reflected by the surface. A portion of the light that enters the image is absorbed by the pigment in the toner layer. The partially absorbed light that strikes the paper surface is scattered and partially absorbed by the paper fibers. Then, it is reflected back through the same toner layer. On its way out of the layer, it is again partially absorbed by the pigment. Thus, the light that emerges from the toner layer has a spectrum different from the light that enters it. The total intensity I(λ) of the light gathered by a sensing device such as the human eye or a densitometer is

$$I(\lambda) = I_t(\lambda) + I_r(\lambda) + I_s(\lambda), \qquad (54)$$

where It(λ) is the intensity of light coming through the toner layer after being partially absorbed by the pigment on its way in, partially absorbed by the paper at the toner–paper interface, and again partially absorbed by the pigment on its way out of the layer; Ir(λ) is the intensity of light reflected at the surface; and Is(λ) is the intensity of the light scattered at the surface. The total reflectance of the toner layer on the paper is

$$R(\lambda) = \frac{I(\lambda)}{I_0(\lambda)}. \qquad (55)$$

But the reflectance associated with only the amount of light absorbed by the pigment in the toner layer is

$$R_t(\lambda) = \frac{I_t(\lambda)}{I_0(\lambda) - I_s(\lambda) - I_r(\lambda)}. \qquad (56)$$

However, the reflectance of the toner layer on the paper (which may be colored), neglecting surface reflection and scattering, is the product Rt(λ)Rp(λ). Let Vmass = A·Lt be the volume of 1 cm² of fused toner. It can readily be seen that the layer thickness Lt = M/(Aρt), where M/A is the mass per unit area of the toner that has been transferred and fused on the paper and ρt is the mass density of the layer. It is often assumed that the reflectance of the fused layer can be represented by

$$R_t(\lambda) = e^{-\beta(\lambda) C_p L_t}, \qquad (57)$$

and the optical density is

$$\mathrm{OD}(\lambda) = -\log_{10} R(\lambda), \qquad (58)$$

where Cp is the pigment concentration and β(λ) is the spectral absorption of pigment per unit layer thickness. Equation (57) is a reasonable assumption, but it is not accurate for OD > ∼1.4 because front surface reflection and light scattering by both the surface and the pigment at the surface of the fused layer become significant at high densities. Optical density has an asymptote at a finite value, which is determined by the reflecting and scattering characteristics of the fused toner layer and by viewing geometry, not by the M/A. To obtain the total reflectance as measured by a spectrophotometer or as seen by the human eye, the front surface reflection and scattering have to be included because Is(λ) + Ir(λ) can become greater than It(λ) at high values of mass deposition. So the total reflectance R(λ) of the fused toner on the paper surface is modified, which yields the reflectance in terms of the layer thickness Lt:

$$R(\lambda) = \left[1 - \frac{I_r(\lambda) + I_s(\lambda)}{I_0(\lambda)}\right] e^{-\beta(\lambda) C_p L_t} R_p(\lambda) + \frac{I_s(\lambda) + I_r(\lambda)}{I_0(\lambda)}. \qquad (59)$$

If front surface reflection and scattering are ignored, then optical density is given in terms of mass deposition by

$$\mathrm{OD}(\lambda) = 0.43\,\beta(\lambda)\,C_p\,\frac{M}{A\rho_t}. \qquad (60)$$

If one knows the toner composition, then the pigment concentration is known, and β(λ) can be measured. Then, a more representative optical density is obtained from the mass deposition by substituting Lt = M/(Aρt), which yields

$$R(\lambda) = \left[1 - \frac{I_r(\lambda) + I_s(\lambda)}{I_0(\lambda)}\right] e^{-\beta(\lambda) C_p M/(A\rho_t)} R_p(\lambda) + \frac{I_s(\lambda) + I_r(\lambda)}{I_0(\lambda)} \qquad (61)$$

and

$$\mathrm{OD}(\lambda) = -\log_{10} R(\lambda). \qquad (62)$$

Figure 35. Optical reflection density versus deposition of some typical powder toners (OD plotted against M/A in mg/cm²).

Figure 35 is a typical plot of the optical density of black toner using the reflectance given in Eq. (61). Similar plots
are obtained for cyan, magenta, and yellow toners using red, green, and blue filters, respectively, in a densitometer. The parameters for this plot are βCp/ρt = 6.5 per mg/cm², Rp = 1, and (Ir + Is)/I0 = 0.04; because the toner for the calculation is black, the parameters are uniform over the whole visible spectrum. This calculation agrees well with the curve shown in Ref. 40, p. 39, Fig. 2.11, and p. 38, Eq. (2.4). Also see Ref. 93. The assumption implicit in obtaining Lt from M/(Aρt) is that fusing at low depositions causes the toner particles to flow into a solid layer of low thickness. This does not usually happen. At depositions less than about 0.3 mg/cm², toner particles, although fused, tend to remain as agglomerates of small particles.

For typical spectral distributions of reflectance of cyan, magenta, or yellow printing inks, see Fig. 36 and Ref. 94. The spectral distributions β(λ) of the pigments in cyan, magenta, and yellow toners are very close to those of the printing inks shown in Fig. 36. The reflectance of white paper is a constant close to one over the visible spectrum. Thus, an approximation can be made from Eq. (58) that β(λ) is represented by a fraction, which is not a function of mass per unit area, multiplied by the natural logarithm of the reflectances shown in Fig. 36.
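As a check on the numbers quoted above, the short Python sketch below evaluates Eqs. (61) and (62) for a black toner using the stated parameters (βCp/ρt = 6.5 per mg/cm², Rp = 1, (Ir + Is)/I0 = 0.04). It is a minimal illustration of the calculation behind Figure 35, not a reproduction of the referenced measurements.

```python
import math

BETA_CP_OVER_RHO = 6.5   # per mg/cm^2, from the text
R_PAPER = 1.0            # Rp, from the text
SURFACE_TERM = 0.04      # (Ir + Is)/I0, from the text

def reflectance(m_over_a):
    """Eq. (61): total reflectance of a fused black toner layer."""
    through_layer = (1.0 - SURFACE_TERM) * math.exp(-BETA_CP_OVER_RHO * m_over_a) * R_PAPER
    return through_layer + SURFACE_TERM

def optical_density(m_over_a):
    """Eq. (62): OD = -log10 R."""
    return -math.log10(reflectance(m_over_a))

for m in (0.0, 0.1, 0.25, 0.5, 1.0):
    print(f"M/A = {m:4.2f} mg/cm^2  ->  OD = {optical_density(m):4.2f}")
```

Note how the surface term caps the density near −log10(0.04) ≈ 1.4, the finite asymptote discussed above.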
Figure 36. Spectral distribution of reflectance of some typical cyan, magenta, and yellow printing inks (reflectance versus wavelength, 400–700 nm).

The transformation yields the spectral distribution of β(λ) shown in Fig. 37 when normalized to a maximum value of 6.5 per mg/cm².

Figure 37. Spectral distribution of absorption per pigment layer thickness per pigment concentration fraction (plotted versus wavelength, 400–700 nm).

Using the values of β(λ) in Eq. (60) or (61), the spectral reflectance and optical density of layers of cyan, magenta, or yellow toner can be calculated from the mass depositions obtained from the process parameters discussed in the preceding sections. If light scattering by pigment particles and toner layer interfaces is ignored, then the spectral reflectance of superposed layers of cyan, magenta, yellow, and black can be estimated. Light first passes through the multiple layers and is partially absorbed on its way in. Then, the portion that gets through is reflected at the paper interface and is partially absorbed on its way out. Calculation procedures that account for significant light scattering are used extensively in the paint industry. However, the pigment particles used for paint are opaque. The paint binder is often mixed with titanium or silicon oxides that increase the opacity of the paint to help the newly painted surface color hide the previously painted surface. There are also conditions in color photography where light scattering invalidates Eq. (57). See Ref. 95 for more details. The pigments used in color toners tend to be more transparent than the pigments and fillers such as TiO2 or SiO2 used in paints. So, if one ignores light scattering by the pigment and fusing transforms the toner particles into homogeneous layers, then the spectral reflectance of the composite layer can be estimated from

$$R_{\text{black}}(\lambda) = e^{-\beta_{\text{black}}(\lambda)\,C_{\text{black}}\,\mathrm{DMA}_{\text{black}}/\rho_t},$$
$$R_{\text{cyan}}(\lambda) = e^{-\beta_{\text{cyan}}(\lambda)\,C_{\text{cyan}}\,\mathrm{DMA}_{\text{cyan}}/\rho_t},$$
$$R_{\text{magenta}}(\lambda) = e^{-\beta_{\text{magenta}}(\lambda)\,C_{\text{magenta}}\,\mathrm{DMA}_{\text{magenta}}/\rho_t},$$
and
$$R_{\text{yellow}}(\lambda) = e^{-\beta_{\text{yellow}}(\lambda)\,C_{\text{yellow}}\,\mathrm{DMA}_{\text{yellow}}/\rho_t}, \qquad (63)$$

where DMA is M/A:

DMAcyan = (M/A)cyan, DMAmagenta = (M/A)magenta, DMAyellow = (M/A)yellow,
Ccyan = cyan pigment concentration, Cmagenta = magenta pigment concentration, Cyellow = yellow pigment concentration,

and the reflectance of the four superposed layers, ignoring front surface reflections and scattering, is

$$R_{\text{layer}}(\lambda) = R_{\text{cyan}}(\lambda)\,R_{\text{magenta}}(\lambda)\,R_{\text{yellow}}(\lambda)\,R_{\text{black}}(\lambda). \qquad (64)$$
Including reflection and scattering at the front surface of the composite layer yields

$$R_{\text{composite}}(\lambda) = \left[1 - \frac{I_r(\lambda) + I_s(\lambda)}{I_0(\lambda)}\right] R_{\text{layer}}(\lambda)\,R_p(\lambda) + \frac{I_s(\lambda) + I_r(\lambda)}{I_0(\lambda)}. \qquad (65)$$

So, the spectral reflectance of the composite color as a function of the mass per unit area of the developed, transferred, and fused toner layers can be estimated.
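To show how Eqs. (63)–(65) combine in practice, the sketch below evaluates the composite reflectance of superposed toner layers in three coarse wavelength bands. The per-band absorption values and the depositions are invented for illustration (a real calculation would use measured spectral curves such as those behind Figs. 36 and 37), and the surface term reuses the 0.04 figure quoted for Fig. 35.

```python
import math

SURFACE_TERM = 0.04   # (Ir + Is)/I0, value quoted earlier for Fig. 35
R_PAPER = 1.0         # white paper assumed

# Assumed effective beta*C/rho_t values (per mg/cm^2) in red/green/blue bands.
# These numbers are illustrative only, not measured toner data.
BETA = {
    "cyan":    {"R": 6.0, "G": 1.0, "B": 0.5},
    "magenta": {"R": 0.5, "G": 6.0, "B": 1.5},
    "yellow":  {"R": 0.2, "G": 0.5, "B": 6.0},
    "black":   {"R": 6.5, "G": 6.5, "B": 6.5},
}

def composite_reflectance(depositions_mg_cm2):
    """Eqs. (63)-(65): multiply per-layer transmittances, add the surface term."""
    result = {}
    for band in ("R", "G", "B"):
        r_layer = 1.0
        for toner, dma in depositions_mg_cm2.items():
            r_layer *= math.exp(-BETA[toner][band] * dma)      # Eqs. (63), (64)
        result[band] = (1.0 - SURFACE_TERM) * r_layer * R_PAPER + SURFACE_TERM  # Eq. (65)
    return result

# Example: a red patch built from magenta + yellow layers (assumed depositions).
print(composite_reflectance({"cyan": 0.0, "magenta": 0.5, "yellow": 0.5, "black": 0.0}))
```

Converting each band's reflectance to optical density with Eq. (62) would give the densitometer readings behind the red, green, and blue filter measurements mentioned above.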
Architecture of a Color Electrophotographic Printer

Light lens color copiers did not enjoy very enthusiastic market acceptance, for many reasons. One technical drawback was that color could not be corrected for the unwanted absorptions of magenta and yellow. Another drawback was that the color original to be copied could exhibit metamerism because of the spectral absorptions of the pigments used in the original and the spectral distribution of energy in the illuminating system used to expose the photoconductor. A discussion of metamerism is beyond the scope of this article.

Cyclic Color Copying or Printing. A light lens color copier has to be a cyclic machine. An example of a cyclic color copier is discussed in Ref. 96. The charged photoconductor in a cyclic color machine is exposed through a red filter to a colored original and developed with cyan (minus red) toner. The cyan toner image is transferred to a receiver and held there. Then, the photoconductor is cleaned, recharged, and exposed again to the same original through a green filter. The first developer housing, which contains the cyan toner, is moved away, and a second housing is moved in to develop the second latent electrostatic image with magenta (minus green) toner. The magenta image is transferred in registration on top of the cyan toner. This process is repeated for exposure through a blue filter followed by development with yellow toner until the full color image is on the receiver. Then the receiver that contains the multiply colored toner layers is released and passed through a fuser.

Modern color copiers first scan the original document by using a filtered raster input scanner. They generate a bit map of information that corresponds to the red, green, and blue and (for a four-color printer) black separations. This information is analyzed by an onboard computer and is color corrected for the unwanted absorptions of the primary colorants used in the toners. The color-corrected separations are printed by a raster output scanner (ROS) exposure system. Printing with a ROS opens the possibility of using a tandem or an "Image on Image" (IOI) process (97) instead of the cyclic process described before. The IOI process is sometimes called the "recharge, expose, and develop" (REaD) process. When the bit map of imaging information controls the exposure step, the copier is the same as a printer; however, the source of the information is an original document instead of being generated by a computerized publishing system.
Tandem Color Printing Process

The unique feature of a tandem process is that three (or four) sets of photoconductors, chargers, exposure systems, and transfer systems are used sequentially. Each set constitutes a mini "engine" that forms one of the primary colorant separations and deposits the appropriate toner on a single receiver. An example of a tandem printer is discussed in Ref. 98. Thus, as the receiver moves through the individual transfer zones of each mini "engine," the full color toner image is built up. Primary colorant toners accumulate on the single receiver. In this kind of system, the developer housings are not moved in and out between separations. Although mechanical complexity is increased, the number of prints generated per minute at a given process speed is triple (or quadruple) that of the cyclic process. So, to obtain the same output rate as a tandem engine, the process speed of a cyclic engine needs to be increased substantially. The trade-off between cyclic and tandem processes involves many engineering and performance decisions that are beyond the scope of this article.

Image on Image or REaD Color Printing

In the REaD process, a uniformly charged photoconductor is exposed to the bit map of the first separation, pixel by pixel, to form the first latent electrostatic image. This image is developed by a developer housing that contains the appropriate primary colored toner. This first toner image is not transferred from the photoconductor. The toned image on the photoconductor passes through a charge erase and leveling station. Here, the charges that remain on the toned image and the untoned background areas of the photoconductor are dissipated. Then the toned image on the photoconductor passes through a recharge station and a re-expose station. Here, the electrostatic latent image of the second separation is formed in registration on top of the first developed toner image and the photoconductor upon which it sits. The combination of the first developed toner image and the second latent electrostatic image is developed by a second developer housing that contains the appropriate second primary colorant toner. Again, the first two layers of toner are not transferred; the previous steps are repeated until the final, registered, three- or four-layer, multicolored toner image is formed. Then, the multiple layers of toner are treated by a corona to ensure that all of the particles are charged with the same polarity, and they enter a final transfer zone where they are all transferred simultaneously to the receiver sheet. Then, the toner image is fused to form the colored output.

Several challenges presented to the electrophotographic subsystems used in the IOI process are spelled out in the "Background of the Invention" section of Ref. 97. The light that exposes a previously developed latent image cannot be appreciably absorbed or scattered by the primary color toner that is on the photoconductor. This challenge was met by using infrared (IR) light for exposure because the primary colored toners are mostly transparent in the near IR. The previously toned latent image can be neutralized before subsequent recharging by
various forms of ac corona. Scorotrons tend to level the surface potentials of the previously developed image and the undeveloped background areas of the photoconductor. These can be used to recharge the photoconductor before it moves into a second, third, or fourth exposure station. Then, because the photoconductor has a previously developed toner image on it when it moves into the second, third, and fourth development zones, these previously developed toner images must not be disturbed. Therefore, the development methods have to be non-interactive. One such development method is hybrid scavengeless development (see the previous section). The nonmagnetic form of jumping development is another development method adapted for scavengeless performance (99,100). Finally, the transfer system has to be able to transfer high as well as low depositions faithfully and with high efficiency. One such transfer method involves acoustic transfer assist (see the preceding section and Ref. 84). The final multilayer toner image can be transferred either to the final receiver material or to an intermediate surface. If it is transferred to the final receiver, then the toner image passes on to a fuser where the toner particles coalesce and form the output color print. The image quality of the color print is determined by the quality of the engineering and systems analyses that were performed in designing the color printer.
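The REaD sequence described above is essentially a per-separation loop followed by a single transfer and fuse. The Python sketch below simply encodes that ordering of stations for a four-color job as a reading aid; the station names and the separation order are illustrative, not a model of any particular machine.

```python
SEPARATIONS = ["yellow", "magenta", "cyan", "black"]   # illustrative order

def read_process(separations):
    """Return the station-by-station event sequence of an IOI/REaD print."""
    events = ["charge photoconductor uniformly"]
    for i, color in enumerate(separations):
        if i > 0:
            # Later separations are imaged on top of the earlier toner layers.
            events.append("erase/level charge on toned image and background")
            events.append("recharge photoconductor (e.g., scorotron)")
        events.append(f"expose {color} separation with IR ROS")
        events.append(f"develop {color} toner (non-interactive development)")
    events.append("condition all toner layers to one polarity (corona)")
    events.append("transfer all layers to receiver in one step")
    events.append("fuse")
    return events

for step in read_process(SEPARATIONS):
    print("-", step)
```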
SUMMARY AND CONCLUSION

Electrophotography, also called xerography (a process for producing high-quality images by using the interaction of electricity and light with materials), was described in this article. A number of methods were discussed for executing the process steps: charging, exposing, and developing the photoconductor, and transferring and fusing the developed toner image. The measurable parameter is the optical density, which is a function of the toner mass per unit area and the pigmentation on the transfer sheet, among other things. The following is a list of the equation numbers that can be used (under the assumptions of their derivations) to estimate the influences of various parametric values on the optical density of the output images. The assumptions used in formulating the equations are discussed in the appropriate sections. These equations can be used to produce multivariable system plots to help examine the influences of the many input variables on the visual appeal of electrophotographic images. Sensitivity can be analyzed by partially differentiating the various constitutive equations in the list to obtain an estimate of the influence of fluctuations of the independent variables on output print quality.

Table

| Subsystem | Equation(s) | Output Quantity |
| --- | --- | --- |
| Corotron charging | (6) | Photoconductor voltage before exposure (volts) |
| Flash exposure | (8) | Exposure (erg/cm²) |
| Laser scanning exposure with perfect motion quality | (17), (20) | Exposure distribution in fast scan direction; exposure distribution in slow scan direction |
| Laser scanning exposure with imperfect motion quality | (23) | Exposure distribution in slow scan direction |
| Photoconductor illuminated area discharge | (30) | Electrostatic image potential (volts) |
| Magnetic brush development, neutralization limit | (31) | Toner mass per unit area deposited in electrostatic image (mg/cm²) |
| Magnetic brush development, before neutralization limit | (41) | Toner mass per unit area deposited in electrostatic image (mg/cm²) |
| Transfer | (52) | Percent mass transferred; toner mass per unit area on transfer sheet |
| Optics of fused toner layer | (61), (62) | Toner layer reflectance in visible range; toner layer optical density in visible range |

Acknowledgments
Many models and workers not mentioned in this article have contributed to the present state of understanding of electrophotography, and this article is by no means complete. The science of electrophotography is multidisciplinary, involving chemistry, physics, mathematical distributions, and engineering. This article briefly describes only some of the most common concepts, principles, and phenomena that are active during the production of toner images. The reader interested in further detail is referred to the texts and patents cited. The many articles published primarily in the Proceedings of the IEEE-IAS Industry Applications Society, the Journal of the SPIE, and the journal Applied Physics also provide more detailed and possibly more current information.

ABBREVIATIONS AND ACRONYMS

AMAT  active matrix coating on a conductive substrate
BTR   biased transfer roll
CAD   charged area development
DAD   discharged area development
DMA   deposited mass per unit area
FWHM  full width at half maximum
HSD   hybrid scavengeless development
IOI   image on image
IR    infrared radiation
LED   light emitting diode
LID   liquid immersion development
M/A   toner mass per unit area
OD    optical density
PIDC  photo-induced discharge curve
PVK   polyvinylcarbazole
POW   potential well scorotron
Q/M   average toner charge-to-mass ratio
RAM   release agent management
REaD  recharge and develop
ROS   raster output scanner
TC    toner concentration
BIBLIOGRAPHY

1. J. Mort, The Anatomy of Xerography: Its Invention and Evolution, McFarland, Jefferson, NC, 1989.
2. Electrophotographic Apparatus, US Pat. 2,357,809, (1944), C. F. Carlson.
3. G. C. Lichtenberg, Novi. Comment. Gott. 8, 168 (1777).
4. P. Selenyi, Zeitschr. Phys. 47, 895 (1928).
5. P. Selenyi, Zeitschr. Tech. Phys. 9, 451 (1928).
6. P. Selenyi, Zeitschr. Tech. Phys. 10, 486 (1929).
7. British Pat. 305,168, (1929), P. Selenyi.
8. P. Selenyi, J. Appl. Phys. 9, 637 (1938).
9. Electrophotography, US Pat. 2,297,691, (1942), C. F. Carlson.
10. R. M. Schaffert, Electrophotography, Focal Press, NY, 1975.
11. Method for the Production of a Photographic Plate, US Pat. 2,753,278, (1956), W. E. Bixby and O. A. Ullrich Jr.
12. Corona Discharge Device, US Pat. 2,777,957, (1957), L. E. Walkup.
13. Developer Composition for Developing an Electrostatic Image, US Pat. 2,638,416, (1953), L. E. Walkup and E. N. Wise.
14. Method and Apparatus for Printing Electrically, US Pat. 2,576,047, (1951), R. M. Schaffert.
15. J. R. Rumsey, Electronic Imaging ’87, Int. Electron. Imaging Exposition Conf., Boston, MA, 1987, pp. 33–41.
16. P. M. Borsenberger and D. S. Weiss, Organic Photoreceptors for Imaging Systems, Marcel Dekker, NY, 1993.
17. Method for the Preparation of Electrostatographic Photoreceptors, US Pat. 3,956,524, (1976), J. W. Weigl.
18. Layered Imaging Member and Method, US Pat. 4,282,298, (1981), M. W. Smith, C. F. Hackett, and R. W. Radler.
19. Imaging System, US Pat. 4,232,102, (1980), A. M. Horgan.
20. M. E. Scharfe, D. M. Pai, and R. J. Gruber, in J. Sturge, V. Walworth, and A. Shepp, eds., Imaging Processes and Materials, Neblette’s Eighth Edition, Van Nostrand Reinhold, NY, 1989.
21. J. D. Cobine, Gaseous Conductors: Theory and Engineering Applications, Dover, NY, 1958.
22. Corona Generating Device, US Pat. 3,936,635, (1976), P. F. Clark.
23. Long Life Corona Charging Device, US Pat. 4,837,658, (1989), L. Reale.
24. Corona Generating Device, US Pat. 5,451,754, (1989), L. Reale.
25. Contact-Type Charging Member Which Includes an Insulating Metal Oxide in a Surface Layer Thereof, US Pat. 5,502,548, (1996), Y. Suzuki et al.
26. Contact Charging Member, Contact Charging Making Use of It, and Apparatus Making Use of It, US Pat. 5,140,371, (1992), Y. Ishihara et al.
27. Electrophotographic Charging Device, US Pat. 5,068,762, (1991), T. Yoshihara.
28. Charging Device and Image Forming Apparatus, US Pat. 5,940,660, (1999), H. Saito.
29. Control of Fluid Carrier Resistance and Liquid Concentration in an Aquatron Charging Device, US Pat. 5,819,141, (1998), J. S. Facci et al.
30. Roll Charger with Semi-Permeable Membrane for Liquid Charging, US Pat. 5,895,147, (1999), J. S. Facci.
31. Banding-Free Printing by Linear Array of Photosources, US Pat. 4,475,115, (1984), Garbe et al.
32. A. Melnyk, Third Int. Congr. Adv. Non-Impact Printing Technol., Aug. 24–28, San Francisco, 1986, pp. 104–105.
33. E. M. Williams, The Physics & Technology of Xerographic Processes, Wiley, NY, 1984.
34. M. Scharfe, Electrophotography Principles and Optimization, Research Studies Press, John Wiley & Sons, Inc., New York, NY, 1984.
35. Scavengeless Development Apparatus for Use in Highlight Color Imaging, US Pat. 4,868,600, (1989), D. Hays et al.
36. Electrode Wire Cleaning, US Pat. 4,984,019, (1991), J. Folkins.
37. Dual AC Development System for Controlling the Spacing of a Toner Cloud, US Pat. 5,010,367, (1991), D. Hays.
38. Development Apparatus Having a Transport Roll Rotating at Least Twice the Surface Velocity of a Donor Roll, US Pat. 5,063,875, (1991), J. Folkins et al.
39. Hybrid Development Type Electrostatographic Reproduction Machine Having a Wrong Sign Toner Purging Mode, US Pat. 5,512,981, (1996), M. Hirsch.
40. L. Schein, Electrophotography and Development Physics, Springer-Verlag, NY, 1988.
41. F. R. Ruckdeschel, Dynamic Contact Electrification Between Insulators, PhD Thesis, University of Rochester, 1975; also J. Appl. Phys. 46, 4416 (1975).
42. R. J. Nash, IS&T Tenth Int. Congr. Adv. Non-Impact Printing Technol., New Orleans, 1994, pp. 95–107.
43. Process for Developing Electrophotographic Images by Causing Electrical Breakdown in the Developer, US Pat. 4,076,857, (1978), G. P. Kasper and J. W. May.
44. D. A. Hays, IEEE-IAS Annu. Conf. Proc., Toronto, Canada, 1985, p. 1510.
45. Developing Apparatus for Electrostatic Image, US Pat. 4,386,577, (1983), N. Hosono et al.
46. Developing Apparatus for Electrostatic Image, US Pat. RE 34,724, (1994), Hosono et al.
47. Magnetic Developing Method Under AC Electrical Bias and Apparatus Therefor, US Pat. 4,292,387, (1981), Kanbe et al.
48. Developing Method and Apparatus, US Pat. 4,610,531, (1986), Hayashi et al.
49. Developing Method for Developer Transfer Under A.C. Electrical Bias and Apparatus Therefor, US Pat. 4,395,476, (1983), Kanbe et al.
50. Developing Device, US Pat. 4,383,497, (1983), H. Tajima.
51. Scavengeless Development Apparatus for Use in Highlight Color Imaging, US Pat. 4,868,600, (1989), D. A. Hays.
52. Hybrid Scavengeless Developer Unit Having a Magnetic Transport Roller, US Pat. 5,359,399, (1994), J. Bares and C. Edmunds.
53. Donor Roll with Electrode Spacer for Scavengeless Development in a Xerographic Apparatus, US Pat. 5,338,893, (1994), C. G. Edmunds.
54. Developing Apparatus Including a Coated Developer Roller, US Pat. 5,386,277, (1995), D. A. Hays et al.
55. H. Stark and R. Menchel, J. Appl. Phys. 41, 2905 (1970).
56. Squeegee Roller System for Removing Excess Developer Liquid from Photoconductive Surfaces, US Pat. 3,955,533, (1976), I. E. Smith et al.
57. Developer Wringing and Removing Apparatus, US Pat. 3,957,016, (1976), K. Yamad et al.
58. Apparatus for Cleaning and Moving a Photoreceptor, US Pat. 4,949,133, (1990), B. Landa.
59. Method and Apparatus for Removing Excess Developing Liquid from Photoconductive Surfaces, US Pat. 4,286,039, (1981), B. Landa et al.
60. Imaging System with Rigidizer and Intermediate Transfer Member, US Pat. 5,028,964, (1991), B. Landa.
61. Method and Apparatus for Compaction of a Liquid Ink Developed Image in a Liquid Ink Type Electrostatographic System, US Pat. 5,655,192, (1997), G. A. Denton and H. Till.
62. Liquid Developer, US Pat. 3,729,419, (1973), S. Honjo et al.
63. Charge Control Agents for Liquid Developers, US Pat. 3,841,893, (1974), S. Honjo et al.
64. Milled Liquid Developer, US Pat. 3,968,044, (1976), Y. Tamai et al.
65. Dyed Stabilized Liquid Developer and Method for Making, US Pat. 4,476,210, (1984), M. D. Croucher et al.
66. Metallic Soap as Adjuvant for Electrostatic Liquid Developer, US Pat. 4,707,429, (1987), T. Trout.
67. Liquid Developer, US Pat. 4,762,764, (1988), D. S. Ng et al.
68. Liquid Ink Development Dragout Control, US Pat. 5,974,292, (1999), G. A. Domoto et al.
69. Apparatus for Cleaning and Moving a Photoreceptor, US Pat. 4,949,133, (1990), B. Landa.
70. Liquid Developer Imaging System Using a Spaced Developing Roller and a Toner Background Removal Surface, US Pat. 5,255,058, (1993), H. Pinhas et al.
71. Electrophotographic Printing Machine, US Pat. 2,807,233, (1957), C. J. Fitch.
72. Constant Current Biasing Transfer System, US Pat. 3,781,105, (1973), T. Meagher.
73. Electrostatic Printing, US Pat. 3,043,684, (1962), E. F. Mayer.
74. Powder Image Transfer System, US Pat. 3,267,840, (1966), T. Honma et al.
75. Method of and Means for the Transfer of Images, US Pat. 3,328,193, (1967), K. M. Oliphant et al.
76. Photoelectrostatic Copying Process Employing Organic Photoconductor, US Pat. 3,598,580, (1971), E. S. Baltazzi et al.
77. Impression Roller for Current-Assisted Printing, US Pat. 3,625,146, (1971), J. F. Hutchison.
78. Electrophotographic Receiver Sheet Pickup Method and Apparatus, US Pat. 3,630,591, (1971), D. R. Eastman.
79. Photoelectrostatic Copier, US Pat. 3,520,604, (1971), L. E. Shelffo.
80. Biasable Member and Method of Making, US Pat. 3,959,574, (1976), D. Seanor.
81. Transfer System with Tailored Illumination, US Pat. 4,014,605, (1977), G. Fletcher.
82. Transfer System with Field Tailoring, US Pat. 5,198,864, (1993), G. Fletcher.
83. C. C. Yang and G. C. Hartmann, IEEE Trans. Electron Devices ED-23, 308 (1976).
84. Method and Apparatus for Using Vibratory Energy with Application of Transfer Field for Enhanced Transfer in Electrophotographic Imaging, US Pat. 5,016,055, (1991), K. Pietrowski et al.
85. Renewable CHOW Fuser Coating, US Pat. 3,934,547, (1976), Jelfo et al.
86. Renewable CHOW Fuser Coating, US Pat. 4,065,585, (1977), Jelfo et al.
87. Roll Fuser Apparatus and Release Agent Metering System Therefor, US Pat. 4,214,549, (1980), R. Moser.
88. Web with Tube Oil Applicator, US Pat. 5,500,722, (1996), R. M. Jacobs.
89. Instant-On Radiant Fuser, US Pat. 4,355,225, (1982), D. G. Marsh.
90. Filament Wound Foil Fusing System, US Pat. 4,883,941, (1989), R. G. Martin.
91. Developer Composition for Electrophotography for Flash Fusing, US Pat. 5,330,870, (1994), S. Tanaka et al.
92. Flash Fixing Color Toner and Process for Producing Same, US Pat. 5,432,035, (1995), Y. Katagiri et al.
93. P. E. Castro and W. C. Lu, Photogr. Sci. Eng. 22, 154 (1978).
94. E. Jaffe, E. Brody, F. Preucil, and J. W. White, Color Separation Photography, GATF, Pittsburgh, PA, 1959.
95. R. M. Evans, W. T. Hanson, and W. L. Brewer, Principles of Color Photography, John Wiley, NY, 1953.
96. Multicolor Xerographic Process, US Pat. 4,135,927, (1979), V. Draugelis et al.
97. Single Positive Recharge Method and Apparatus for Color Image Formation, US Pat. 5,579,100, (1996), Z. Yu et al.
98. Tandem Trilevel Process Color Printer, US Pat. 5,337,136, (1994), J. F. Knapp et al.
99. Method and Apparatus for Color Electrophotography, US Pat. 4,949,125, (1990), H. Yamamoto et al.
100. Cleaning Method for Use in Copy Apparatus and Toner Used Therefor, US Pat. 5,066,989, (1991), H. Yamamoto.

ENDOSCOPY

NIMISH VAKIL
ABBOUD AFFI
University of Wisconsin Medical School
Milwaukee, WI

INTRODUCTION

Endoscopy is an imaging method that is used to visualize the interior of human organs. The widest application of endoscopy is in the gastrointestinal tract. Specialized instruments have been developed to image specific gastrointestinal organs. Gastroscopes, which are passed through the mouth, are used to examine the esophagus, the stomach, and the duodenum. Enteroscopes, also through the mouth, are specially designed to examine the small
intestine. Colonoscopes, passed through the anal orifice, are used to examine the lower gastrointestinal tract. There are two major classes of instruments: fiber-optic endoscopes, which consist of coaxial fiber-optic bundles, and electronic endoscopes, which rely on a charge-coupled device. The electronic endoscope contains a charge-coupled device that transmits an electronic signal, resulting in an image that is visualized on a television monitor. Modern endoscopes, whether electronic or fiber-optic, contain three systems: (1) a mechanical system used to deflect the endoscope tip; (2) a system of air/water and biopsy/suction channels with controls (insufflation of air is important to distend the walls of the organ being studied, and suction allows contents of the gastrointestinal tract to be aspirated for examination or fluid that obscures the view to be aspirated away); and (3) the imaging system, which may be fiber-optic or electronic and is used to visualize tissue. Table 1 lists the most frequent reasons for which endoscopy is performed and the organs that are visualized. Figures 1–10 are examples of endoscopic images.
Table 1. Common Endoscopic Procedures: Organs and Indications

| Procedure | Organs Examined | Common Conditions Involving These Organs |
| --- | --- | --- |
| Esophagogastroduodenoscopy (EGD) | Esophagus, stomach, duodenum | Bleeding, gastroesophageal reflux disease, peptic ulcer disease |
| Colonoscopy | Colon, ileum | Polyps, bleeding, ulcerative colitis, Crohn's disease |
| Enteroscopy | Small intestine | Bleeding; tumors, benign and malignant |

Figure 1. Normal junction of the esophagus and the stomach.
Figure 2. Normal appearance of the duodenum.
Figure 3. Esophageal ring. Note the sharp ring that caused obstruction and dysphagia in this patient.
Figure 4. Duodenal ulcer that has a visible vessel. Artery protruding from the base of an ulcer.
Figure 5. Bleeding esophageal varicose vein (varix).
Figure 6. Ulcerative colitis. Multiple ulcers in the colon.
Figure 7. Crohn's disease of the colon. Note the solitary ulcer in the colon.
Figure 8. Diverticulosis and colon polyp. Multiple diverticuli are seen in the colon, and a polyp is also seen projecting from the wall of the colon.
Figure 9. Polypectomy. A wire snare has been applied to the stalk of a polyp.
Figure 10. Polypectomy. The polyp has been resected after applying current through the wire snare.

INSTRUMENTS

Fiber-Optic Instruments
Clinical endoscopy began in the 1960s using fiber-optic instruments (1). The fiber-optic endoscope consists of two fiber-optic bundles, one to transmit light into the gastrointestinal tract and the other to carry light back to the endoscopist’s eye. Using these devices, light from a cold
light source is directed into a small bundle of glass fibers. Light emerges from the fiber bundle end to illuminate the mucosa of the gastrointestinal tract. Light is then reflected from the tissue of the gastrointestinal tract and enters an adjacent bundle of coherent glass fibers that returns it to the endoscope head, where the light forms an image. Each fiber in the endoscope is specially designed for endoscopy. Because the index of refraction decreases from the axis of the fiber to its outer surface, light rays that travel through the fiber bend toward the center of the fiber. The outside of each fiber is covered with a material that minimizes light leakage and adds strength and flexibility to the fiber. The fibers are arranged within a bundle identically at both the imaging end and the viewing end (a coherent bundle). Fibers used in endoscopes may be as small as 8 microns. A satisfactory image requires a large number of fibers; there may be as many as 50,000 fibers in a bundle.

Electronic Endoscopes. In the last decade, electronic endoscopes replaced fiber-optic endoscopes in gastrointestinal endoscopy. They make up for some of the deficiencies of the fiber-optic instruments (2). Fiber-optic endoscopes do not allow binocular vision, and the endoscope needs to be held up to the endoscopist's eye. Assistants cannot see the endoscopic image, and photography is difficult because a camera must be attached to the endoscope and the resulting image is frequently substandard. Electronic endoscopes permit binocular vision, and a large image is seen on a television monitor, which permits assistants to participate actively in the procedure. This allows teaching and interaction. Images from the procedure can be stored on magnetic media and printed immediately without loss of resolution. There are many advantages to electronic endoscopes, but there is a learning curve associated with the device, and calibration of color is important, so that endoscopists view the image as "natural" and so that subtle changes in color and texture that could represent disease are not missed.

All electronic endoscopes consist of two components: a charge-coupled device (CCD) and a video processor. The CCD is mounted at the tip of the endoscope, and it is a two-dimensional array of photocells. Each photocell generates an electrical charge proportional to the intensity of the light that falls on it. This charge is relayed as a voltage to the video processor, which reconverts the information into an electrical signal that is sent to the television monitor. The television monitor is a two-dimensional grid of tiny fluorescent dots; each produces one additive primary color (red, green, or blue). The electrical impulses from the video processor direct a stream of electrons to the phosphors of each dot on the TV monitor. Stimulation of the phosphors produces a tiny point of colored light. The intensity of a given dot depends on the intensity of the signal from the processor. A trio of primary-colored dots on the monitor is a pixel, the smallest picture element. The brightness and the color of a pixel depend on the intensity of the electron beam that strikes each component dot.
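As a toy illustration of the CCD-to-monitor pixel path described above, the sketch below takes three monochrome intensity frames (such as a CCD would record under red, green, and blue illumination in the sequential-illumination scheme discussed later) and stacks them into one RGB image array. The 2×2 frame values are invented, and NumPy is used purely for convenience.

```python
import numpy as np

# Invented 2x2 "CCD" intensity frames, one per primary (0.0 = dark, 1.0 = bright).
red_frame   = np.array([[0.9, 0.2], [0.4, 0.1]])
green_frame = np.array([[0.1, 0.8], [0.4, 0.1]])
blue_frame  = np.array([[0.1, 0.1], [0.4, 0.9]])

# Each monitor pixel is a trio of red, green, and blue dot intensities.
rgb_image = np.stack([red_frame, green_frame, blue_frame], axis=-1)

# Scale to 8-bit drive levels, as a digital video processor might.
drive_levels = (rgb_image * 255).astype(np.uint8)
print(drive_levels.shape)   # (2, 2, 3): rows, columns, (R, G, B)
print(drive_levels)
```

A color-chip CCD would produce the same three planes from a single white-light exposure by placing a fixed filter over each photocell, as described below.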
ANALOG AND DIGITAL TECHNOLOGIES IN ENDOSCOPY

An analog signal is a waveform whose frequency and amplitude vary continuously and take on any values within an allowed range; a digital signal takes on one of several predetermined discrete values (3). When a CCD records an image, the intensity of illumination of each pixel, or brightness, is an analog quantity. In digital imaging, on the other hand, the spatial information, such as the location of each CCD pixel, is represented by a binary number that forms a digital quantity. In an analog system, the varying intensities of the pixels are recorded as continuous variations in a physical quantity such as magnetic orientation (for magnetic disk devices and videotape) or optical polarization (for optical disk drives). As the disk spins, the variations in the appropriate physical quantities are converted into the temporal variation of intensity of the monitor pixels, and hence the images are retrieved. In a digital image storage system, the instantaneous value of the intensity of a pixel is converted into a binary pattern by an analog-to-digital converter. This pattern is copied as binary digits and saved on a magnetic or optical disk. When the images are retrieved, the binary digits are fed at a fixed rate into a digital-to-analog converter, which reproduces the original value of the pixel intensity at discrete time intervals. One of the advantages of digital storage methods is the high quality of the images reproduced: they do not degrade after reproduction, and they can be manipulated by computer image-processing systems. Initial acquisition costs of electronic endoscopes are higher, but their longer life more than compensates for the extra cost.

RESOLUTION

Resolution is defined in the field of endoscopy as the shortest distance between two pictorial elements (4). A pictorial element is the smallest discernible portion of a picture. Therefore, resolution is the ability to distinguish two adjacent points in the target tissue being imaged. For example, the resolution decreases if the distance between the two adjacent points decreases or the distance from the target increases. The number of sensors in an individual instrument determines the resolution of that instrument. Therefore, the resolution of a fiberscope depends on the number of fibers in a given instrument. The number of fibers is limited by the diameter of the instrument; larger instruments cause greater patient discomfort. Due to improvements in fiber manufacture, modern endoscopes are narrower but contain more fibers than their predecessors. In electronic endoscopes, resolution is determined by the number of sensing elements in the CCD chip of the endoscope. Another factor that plays a role in resolution is the angle of view: resolution decreases with wide-angle lenses, which allow a large area of mucosa to be studied. Resolution can be improved by using a narrow angle of view, so that small areas of the mucosa can be examined in detail. However, an endoscopic procedure that uses a narrow angle of view becomes cumbersome because the tip of the endoscope has to be moved repeatedly to scan the entire circumference of the bowel. Routine endoscopy is designed for rapid scanning of the mucosa
of the gastrointestinal tract, so wide-angle lenses are part of all endoscopy systems. Therefore, all endoscopes are compromises between resolution and field of view. A third factor that affects resolution is brightness: detail is difficult to discern in dark images, and resolution decreases.

COLOR

The principal determinant of mucosal color is the ability of the mucosa, or of an abnormality in the mucosa, to reflect some wavelengths of light and not others. In addition to reflecting light, some areas in the gut wall also transmit light through the wall to underlying structures, from which it is reflected. One example of light transmission through the bowel wall is the blue color seen in the region of the liver when it is viewed through the colon. The colonic lining in this situation acts as a filter, and the color visualized depends on the absorption and transmission of selected wavelengths through it.

The eye sees color by using receptors of three different sensitivities according to the wavelength of the light within the electromagnetic spectrum. Therefore, it is convenient to break the visible spectrum of light into three approximately equal portions: the red end (wavelengths longer than 630 nm), the green middle (wavelengths between 480 and 560 nm), and the blue end (wavelengths shorter than 480 nm) of the spectrum. The eye can appreciate color only as a combination of these three colors, and therefore they are called primary colors. Adding red, green, and blue produces the sensation of a particular color; these colors are called additive primaries. Another way to produce the same effect is to use filters that subtract light selectively from the visible spectrum by transmitting light of only a range of wavelengths. The three colors magenta, yellow, and cyan are special because each of them is composed essentially of two-thirds of the visible spectrum; that is, a third of the spectrum is missing. When used as filters, these colors can selectively subtract a third of the visible portion from white light. These colors are subtractive primaries.

The brightness of the colors and of the image as a whole is an important aspect of endoscopic visualization. The visual impression of brightness of the image is also called luminosity. Color has three principal attributes: hue, saturation, and brightness. Hue is the property by which the eye distinguishes different parts of the spectrum; this property distinguishes red from green and blue. Saturation measures how much color is present; colors appear whiter or grayer as saturation is reduced. Brightness or lightness, on the other hand, is a measure of the brilliance of the color, which depends on its ability to reflect or transmit light. For example, two colors may have the same hue, like the yellows of lemon and banana, but one may reflect light more intensely, creating a more brilliant color sensation because of the difference in their brightness.

When designing a video system for special applications such as endoscopy, it is essential to use video instruments and configurations that detect and quantify colors in a way best suited to the human tissues and the body conditions under which they are used. The transmission
characteristics of fiber-optic instruments have been studied using standard illuminants of defined spectral composition (5). Fiber-optic instruments transmit red light much more efficiently than green and transmit blue least efficiently. Fiber-optic and lens assemblies can therefore distort the impression of color by selectively absorbing some wavelengths of light because of the transmission characteristics of the fiber bundles. The perception of a particular color can be produced in three ways: (1) by viewing a source of light that is monochromatic and produces light of a single wavelength, for example, laser light; (2) via the subtractive method of producing color, such as that employed when pigments or dyes are used (when white light shines through stained glass, for example, the glass selectively transmits some wavelengths and absorbs others, giving rise to the sensation of color in the viewer's eye); and (3) via the additive method of producing color. The additive method is used to generate color images on a television monitor: stimulating the red, green, and blue phosphors at different intensities produces a particular color.

The CCD, the key element of the electronic endoscope, is a small microelectronic device that converts an image into a sequence of electronic signals, which are transformed into an image on the monitor screen after appropriate processing. The CCD, which consists of thousands of minute photosensitive elements, converts light energy into an electrical charge proportional to the red, green, or blue component of the image. The CCD per se does not see color; each sensing element responds only to the brightness of the light that falls upon it, much like the rods in the retina. Therefore, special arrangements must be made for CCDs to visualize and reproduce color. The tristimulus system of human color vision is used as the basis for determining the color image. Two principal mechanisms are used to generate color in the image:

1. Sequential illumination. A rotating wheel that has three colored filters, red, green, and blue, is interposed between the light source and the mucosa. The filter wheel rotates rapidly, so that the mucosa is exposed to short, rapid bursts of red, green, and blue light. The mucosa selectively absorbs different wavelengths of each color of light, depending on its own color, and reflects the rest. Each of the thousands of sensing elements in the CCD senses the reflected light that falls upon it and produces an electrical charge that is sent to the video processor. The processor is a computer that records the information from the CCD for the red, green, and blue filters and transforms the electronic impulses into a message that is sent to the television monitor. The color wheel rotates at 30–50 revolutions per second, which is too fast for the eye to appreciate the red, green, and blue components of the image separately.

2. Color chip. In color chip technology, each sensing element on the CCD is covered with a filter that is red, green, or blue, and white light shines continuously on the mucosa. This is achieved by mounting stationary filters on the surface of the CCD. When light is reflected selectively from the
mucosa, it passes through a filter before reaching an individual sensing element. The element then produces a charge proportional to the red, green, or blue component of the image, depending on the color of its filter. The processor reads and integrates the information from the CCD and transforms the electronic impulses into a message that is sent to the television monitor, as described before. In laboratory studies, the performance of the two types of systems was similar (4).

Measurement of Color and Resolution of an Endoscopy System

Colorimetry is the technique used to measure color in endoscopy. It is based on the principle that any color can be defined by an appropriate combination of red, green, and blue. The color receptors (cones) in the human eye are sensitive to these colors, so image display devices such as monitors are designed so that the color of each pixel is determined by the intensities of the red, green, and blue dots that constitute it. Enough diagnostic information is encapsulated in the red, green, and blue values measured by a CCD to make the use of a video system worthwhile. Photoelectric colorimeters consist of three photoelectric cells covered by appropriate filters; they convert light into electrical impulses that are amplified and can be converted into numerical units for comparison with reference colors of known values. Spectrophotometry is a more accurate but much less convenient color measurement method (6,7); it has been used to characterize the transmission characteristics of fiber-optic instruments and to quantify blood flow in gastrointestinal tissues. Spectrophotometers produce a quantitative graph of the intensity of each wavelength of light across the visible range, so the measurement takes the form of a graph. Resolution is measured using standard test charts of alternating black and white lines. It has been shown that the resolution of electronic endoscopes surpasses that of fiberscopes (4).

IMAGE PROCESSING

Both computer and television monitors create the illusion of continuous motion by rapidly displaying a number of images (called frames) each second. The eye cannot distinguish them as separate, and thereby an impression of continuity is created. Each frame is made of thin horizontal lines on the picture tube. Television monitors scan at rates and in sequences different from those of computer monitors, and this creates compatibility problems. A computer monitor, for example, scans lines from the top of the screen to the bottom, whereas television monitors display all odd-numbered lines on the screen and then all even-numbered lines. The time for developing a frame is also different in the two systems. Therefore, the signal from an electronic endoscope needs to be modified to suit the monitor. The image processor converts the images captured by the video devices into numbers in a process called digitization, and the numbers are then used to create an image. Video capture devices that can digitize sufficiently quickly to
approximate real time are called frame grabbers. The digitization process includes color information; this is achieved by matching colors in the image to colors in the computer's memory (via a color look-up table). Computers can display varying numbers of colors, depending on the amount of memory attached to each pixel in the image: 256 different colors can be displayed from eight bits of information per pixel, several thousand colors from 16 bits, and 16 million colors from 24 bits. Because the range of colors in the gastrointestinal tract is limited, it is uncertain whether providing images of higher color resolution is necessary for diagnosis. In one study, lesions imaged in 8-, 16-, and 24-bit color were displayed to endoscopists in random order on a monitor (8). The endoscopists could not distinguish among the images in 41% of the cases; images were correctly identified in 22% of the cases and incorrectly in 37% of the cases.

Image Processing in Endoscopy

Image processing has been introduced to the field of endoscopy but remains an investigational tool without general clinical applicability. The principal applications of image processing in endoscopy have been quantifying the characteristics of lesions or organs to provide diagnostic or prognostic information. Image analysis has been used successfully in other endoscopic disciplines to provide objective information that may aid in diagnosis. In gynecology, for example, preliminary studies have shown that computerized colposcopy may be useful in managing cervical dysplasia, a precursor of cancer (9). There are two major applications for image processing in endoscopy: (1) more accurate measurement of lesion dimensions and (2) characterization of tissue.

Measurement of Dimensions in Endoscopy

In addition to helping determine the prognosis of lesions visualized in endoscopy, endoscopic measurement of lesion size is important in predicting pathological characteristics or the outcome of therapy. The risk of malignancy in colonic polyps, for example, increases when they are larger than 1 cm in diameter (10). Size may also affect treatment decisions for malignant lesions. For example, rectal cancers smaller than 3 cm in diameter are resected locally, and the recurrence rate is low; larger lesions are associated with extrarectal spread, and the results of local resection are poorer. Therefore, the size of a lesion may help direct the surgical approach.

Problems with Lesion Measurement in Endoscopy

It has been shown that endoscopists' visual estimates of ulcer size are quite inaccurate (11–13). The cause of this error is the distortion introduced by the wide-angle lens that is part of the design of all endoscopes. This distortion causes a uniform grid to appear barrel shaped, so that points in the periphery of the field appear compressed relative to those in the center of the image. As a result of this barrel distortion, lesions appear smaller when they are further from the center
of the endoscopic field. The ability of endoscopists to estimate the size of an ulcer was assessed using a model ulcer visualized by an electronic endoscope (13). The study demonstrated that there is no correlation between the true size of the ulcer and its size as estimated by the endoscopists. The error increased as the size of the lesion increased because the effects of barrel distortion were accentuated.

Several attempts have been made to correct the measurement error caused by the wide-angle lens. Early studies (14,15) used grids built into the lens system of the endoscopes but had high error rates (11 ± 6%). Another method is to introduce a device of known size into the endoscopic field and compare the size of the lesion to the size of the measuring device. Measuring devices have varied, but the most popular method is to compare the size of the lesion to the distance between the tips of the jaws of the alligator forceps used to obtain biopsy specimens in endoscopy. The underlying hypothesis is that the lesion and the measuring device will undergo equal distortion and magnification, so comparative measurements were expected to provide a more accurate determination of size. To make a measuring device of this nature work, the device needs to be on or adjacent to the lesion so that the distances of the lesion and the measuring device from the lens are equal. When the measuring device is not in the same plane as the lesion, an error results from unequal magnification, and the magnitude of this error increases with increasing distance of the lesion from the endoscope tip. Another issue is that if the measuring device is smaller or larger than the lesion, unequal barrel distortion will cause incorrect measurements, because barrel distortion is not uniform; compression of the image becomes more pronounced at the periphery of the endoscopic field than in the center. For example, when lesions lie in the periphery of the field, comparative measurements become inaccurate because the measuring device tends to enter the field near the center due to the fixed location of the biopsy channel. Under these conditions, barrel distortion squeezes together the pixels in the image of the lesion at the periphery of the field and creates an apparent reduction in size, whereas the measuring device remains minimally altered, resulting in a systematic undermeasurement of ulcer size. In vitro (6,7) and in vivo (7) studies using open biopsy forceps as a measuring device of fixed size have shown significant underestimation of lesion size. Underestimation becomes more pronounced as the size of the lesion becomes greater than the size of the open biopsy forceps, again because of unequal barrel distortion. Dancygier et al. described a modification of the comparative measurement technique (16) in which a rubber plate of known area is introduced into the endoscopic field through the biopsy channel and placed onto a lesion such as an ulcer. This technique yielded accurate measurements, with a mean error of 4.2 ± 0.5%. Using a modified version of the rubber-plate technique, Okabe et al. found that the method is accurate as long as the distance of the lens from the target is greater than 4 cm (17). When the distance is shorter, techniques to correct barrel distortion are necessary to obtain accurate results.
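The kind of geometric correction described in the next subsection can be sketched briefly in code. The sketch below is illustrative only: it assumes a simple single-coefficient radial model of barrel distortion and hypothetical calibration constants, whereas a real endoscope is calibrated by measuring how it distorts an array of points of known geometry.

    import numpy as np

    def undistort_points(points, center, k1):
        """Map distorted image coordinates to corrected coordinates using a
        single-coefficient radial (barrel) model: r_corrected = r * (1 + k1 * r**2).
        'center' is the center of distortion of the lens; 'k1' is a hypothetical
        constant that would be determined by imaging a grid of known geometry."""
        pts = np.asarray(points, dtype=float) - center
        r = np.hypot(pts[:, 0], pts[:, 1])
        scale = 1.0 + k1 * r**2            # expands the compressed periphery
        return pts * scale[:, None] + center

    def pixels_to_mm(pixel_length, lesion_distance_mm, mm_per_pixel_at_10mm=0.05):
        """Convert a corrected pixel length to millimeters.  The scale factor varies
        with the lesion-to-lens distance, which the endoscopist measures with a
        graduated probe; the constant used here is purely illustrative."""
        return pixel_length * mm_per_pixel_at_10mm * (lesion_distance_mm / 10.0)

    # Example: two points spanning a lesion near the periphery of a 512 x 480 image
    corrected = undistort_points([(400, 240), (460, 240)], center=(256, 240), k1=2e-6)
    length_px = np.linalg.norm(corrected[1] - corrected[0])
    print(round(pixels_to_mm(length_px, lesion_distance_mm=30), 1), 'mm')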
Image Processing Techniques for Lesion Measurement. In one such technique, a new image is created by applying a geometric correction factor to each pixel of the original image, shifting the position of the pixel relative to the center of distortion of the lens (18). The correction factor is determined by measuring how the endoscope distorts an array of points, and an inverse transformation is then applied to the image. Once determined, this correction factor is constant for that endoscope. Once the correction has been applied, the number of pixels in the image that corresponds to a given distance (1 mm) is determined. Because the pixels are converted directly into millimeters, there is no need to introduce measuring devices into the endoscopic field. This process was applied to a model ulcer of known size and also to a target of known size (coated chewing gum) swallowed by a patient before endoscopy (13), and the procedure was then implemented as a computer program that can be run on a personal computer. Measurements using this technique showed an error of 1.8 ± 2.2% for the model ulcer and 2.8 ± 3.2% for the swallowed gum. The technique has the important advantage that, once the instrument is calibrated, the only measurement that needs to be made by the endoscopist is the distance of the lesion from the endoscope, using a graduated probe. The technique is also relatively inexpensive because it uses widely available personal computers.

Other image processing techniques have also been described. Hofstad et al. measured colonic polyps by photographing a polyp with a measuring device next to it (19); the photographic images were then processed, and the size of the polyp was determined. The polyp was subsequently removed, and its size and weight were measured. The weight of the polyp correlated with the calculated area of the polyp. Kim et al. applied an image processing technique to Barrett's esophagus (a precancerous change in the esophagus) to determine the precise area of involvement (20). Photographs of the epithelium were converted into computer-generated maps of the areas involved by the metaplastic changes. The software that was developed corrects first for the distortion of the endoscope lens, then calculates the center of the endoscopic photograph, and finally unrolls the photograph into a planar image. A series of photographs is obtained at short distances from each other, and these are stacked together by the computer to calculate the affected area along the length of the esophagus. The area measured using this technique had an error rate of 5.2%, and there was little interobserver variability. Finally, Yamaguchi et al. described a technique using a diffraction grating and a laser (21). The system consists of an argon ion laser, a side-viewing endoscope that has a diffraction grating made of glass fiber, and a graphic processing system. A fiber grating was fitted at the end of the endoscope, and 1600 light spots produced by the diffracting system were projected onto the gastric mucosa. By analyzing the deviation of the laser spots on the gastric lesion, the diameter, area, and depth of the lesion could be determined. For lesions 30 mm in diameter, the error of area measurement was 2.8 ± 1% and 3.7 ± 3% in vitro
and in vivo, respectively. For 5-mm lesions, the error of area measurement was 7.5 ± 5% and 6.5 ± 3% in vitro and in vivo, respectively. A clear disadvantage of this system is that the error increases significantly when the lesion is smaller than 5 mm in diameter. Unfortunately, none of the image processing techniques is widely available, because they are either cumbersome or expensive. Despite their inaccuracies, visual estimation and comparative measurements (without image processing) are the most frequently used methods for determining lesion size today.

Image Processing to Characterize Tissue. Abnormal tissue seen at endoscopy does not by itself provide a pathological diagnosis. Often, biopsies must be obtained and processed, and a final diagnosis may be delayed by several days. A method that allowed a precise diagnosis at the time of endoscopy would therefore be very useful. A variety of techniques that allow tissue to be characterized at the time of endoscopy has been developed; these are described here.

Laser-Induced Fluorescent Spectroscopy (LIF) and Elastic Scattering Spectroscopy. LIF is a technique in which low-power laser light (<5 mW) is directed at a target tissue. A fraction of the light absorbed by the tissue is reradiated (autofluorescence) at a wavelength longer than that of the incident light. No damage is caused to the tissue being examined. The spectrum of the emitted light is characteristic of the physicochemical composition of the target tissue and can serve as a spectral ‘‘signature’’ of the target. Even though fluorescent spectroscopy is widely used in chemistry (22), it has been used only sparingly in clinical medicine. This is due in part to the sensitivity of fluorescent signals to physical factors, such as pH and the oxidative state, that may vary between individuals. LIF has been used in cancer research to differentiate breast, prostate, and renal malignancies from their benign counterparts (23–25). It has been shown that LIF can diagnose dental caries and atherosclerotic plaque (26,27). In the area of gastrointestinal diseases, LIF has been used to differentiate adenomatous (precancerous) from nonadenomatous (no cancer potential) polyps and to detect foci of dysplasia in Barrett's esophagus. Cothren and colleagues (28) demonstrated that LIF using a laser whose wavelength is 370 nm can differentiate precancerous colon polyps from normal colonic mucosa very accurately: the sensitivity was 100%, and the specificity was 97%, in differentiating adenomas from normal colonic mucosa and hyperplastic polyps. In a more recent study, the same authors reported on LIF as a technique for detecting dysplastic changes in colonic polyps (29). Using the 370-nm laser for excitation, fluorescent spectra were collected from normal mucosa and colonic polyps during colonoscopy. The sensitivity, specificity, and positive predictive value were 90%, 95%, and 90%, respectively, for detecting dysplasia, and the overall accuracy was 88%. More recently, a novel technique, elastic scattering spectroscopy (ESS) or ‘‘optical biopsy,’’ has been studied for detecting dysplasia in Barrett's esophagus. This technique
takes a point measurement from tissue and is sensitive to the size and structure of cellular and subcellular components. In a study by Lovat et al., specimens from 41 sites in 27 patients who had Barrett's metaplasia were analyzed (30). The probe tip was placed in gentle contact with the tissue surface at the site being studied, and a pulse of white light from a xenon arc lamp was used. Data were normalized to the region of 650–700 nm, and spectral analysis was carried out from 330–650 nm. Histological findings were correlated with the corresponding spectra. The technique demonstrated a sensitivity of 90–95% and a specificity of 85–90% for detecting dysplasia or cancer. The overall accuracy for correct tissue diagnosis (neoplastic or normal) was about 90%.

ENDOSCOPIC ULTRASONOGRAPHY

Endoscopic ultrasonography has emerged as the most significant advance in diagnostic gastrointestinal endoscopy in the last several years (32,33). A high-frequency transducer placed at the tip of a dedicated endoscope enables the endoscopist to obtain high-resolution images of the gastrointestinal wall and surrounding structures. This technique has significant advantages compared to traditional ultrasound examinations through the abdominal wall because the probe is placed against the tissue being studied without intervening air, fat, or bone.

Fundamentals of Ultrasonography

Several characteristics of sound explain how ultrasound interacts with human tissue. Sound is a form of energy, and therefore it has intensity and can be attenuated while traveling through a medium. Sound travels as a wave that has three main characteristics: frequency, velocity, and wavelength. Frequency is expressed in hertz (Hz), indicating the number of cycles (variations) that an acoustic variable goes through per second. Sounds at a frequency of 20 kHz or greater are called ultrasound because their frequency is too high to be audible to the human ear. In standard abdominal or pelvic ultrasound, the frequency used ranges from 2.5 to 7.0 MHz. Endoscopic ultrasound (EUS) uses frequencies in the range of 5–25 MHz. The higher frequency limits the depth of penetration of the ultrasound beam into tissue, but it improves resolution and hence the clarity of the image. The velocity of a sound wave is the speed with which the wave moves through a medium, and it is fairly constant in human tissue. The velocity of sound in tissue is determined by the density (concentration of matter, or mass per unit volume) and stiffness (hardness, or resistance of a material to compression) of the medium. Assuming constant density, velocity increases if the stiffness is increased. On the other hand, if the stiffness is constant, velocity decreases as density increases. Sound velocity is therefore low through gases, higher in liquids, and highest through solids, because these materials usually have larger differences in their stiffness than in their density. Although the velocity of sound can vary from 1450 m/s in fat to 4100 m/s in bone, the usual average velocity through
water and most human tissues is 1540 m/s (31,32). For practical purposes, velocity may be considered constant in human tissue. The wavelength of the sound being used affects both the clarity and the accuracy of the image. Ultrasound waves pass through tissue and are reflected back to the receiver, and the echoes are used to create an image. Reflection occurs because of a property of tissue known as acoustic impedance, which is a measure of how well matter propagates sound waves. At interfaces between organs, the amount of sound reflected depends on the difference in acoustic impedance between the tissues. The capacity of tissues and of interfaces between tissues to reflect sound determines how well ultrasound can form an image. If there is a large difference in acoustic impedance, reflection at the interface is high, the transmittance is low, and hence no image can be formed. For example, the impedance difference between the soft chest wall and the air-filled lungs is very high, and therefore ultrasound has limited utility in visualizing the lungs. On the other hand, if there is little difference in acoustic impedance between two tissues, reflection at the interface is low, the transmittance is high, and an image can be formed. Most tissues in the gastrointestinal tract vary slightly in their acoustic impedance, and therefore ultrasound imaging is possible. The angle of incidence and the angle of reflection refer to the angles formed between the incident and reflected ultrasound, respectively, and the interface between two media. Smooth surfaces, or specular reflectors, tend to reflect ultrasound in a single direction that is related to the angle of incidence. Some examples of specular reflectors in the abdomen are the diaphragm, the capsule of the liver, the capsule of the kidney, and the surface of the gallbladder. Diffuse reflectors, on the other hand, are irregular surfaces and reflect ultrasound in multiple directions. The pattern of reflection from diffuse reflectors gives tissue its characteristic ultrasound texture. To improve the transmittance of sound across interfaces between media that are widely divergent in acoustic impedance, acoustic coupling agents can be used to reduce the mismatch. In transabdominal ultrasound, an acoustic gel is used as a coupling agent. In endoscopic ultrasound, a balloon filled with water is placed around the EUS probe to couple it to the tissues of the gastrointestinal tract. In soft tissue, the relationship between frequency and absorption is linear: as the frequency of the sound wave increases, the absorption of sound increases, and the distance of penetration decreases. In EUS, for example, a 5-MHz transducer penetrates approximately 8 cm, but a 10-MHz transducer can penetrate only about 4 cm. Therefore, transducers of different frequencies are used for different imaging applications.
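The trade-off between frequency, resolution, and penetration can be made concrete with a short calculation. The sketch below is illustrative: it uses the average soft-tissue velocity of 1,540 m/s given in the text and scales penetration from the rule of thumb, also given in the text, that a 5-MHz transducer penetrates about 8 cm and a 10-MHz transducer about 4 cm.

    SOUND_VELOCITY_M_S = 1540.0   # average velocity of sound in soft tissue (m/s)

    def wavelength_mm(frequency_mhz):
        """Wavelength = velocity / frequency; shorter wavelengths resolve finer detail."""
        return SOUND_VELOCITY_M_S / (frequency_mhz * 1e6) * 1000.0

    def approximate_penetration_cm(frequency_mhz):
        """Rule-of-thumb penetration, scaled from the 5 MHz -> ~8 cm figure in the text."""
        return 8.0 * (5.0 / frequency_mhz)

    for f in (2.5, 5.0, 7.5, 10.0, 20.0):
        print(f"{f:5.1f} MHz: wavelength {wavelength_mm(f):.3f} mm, "
              f"penetration ~{approximate_penetration_cm(f):.1f} cm")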
Ultrasound Image Production

An ultrasound transducer produces a beam of ultrasound, which travels through the tissue, and the same transducer then receives the echoes. Typically, transducers transmit sound in pulses and listen for echoes between the transmitted pulses. Two types of transducers are used in ultrasound units: mechanical transducers and electronic arrays. Mechanical transducers employ crystals mounted in a casting; the crystals either oscillate back and forth or rotate through a fixed distance. Electronic arrays are made up of multiple small crystals arranged either in a straight line (linear array) or across a curved surface (curved array). These crystals do not move as in the mechanical transducers; instead, they are stimulated independently by electronic means. The crystals in both types are piezoelectric: when stimulated by an electric pulse, they vibrate and produce sound waves, which travel through the tissue. Each piezoelectric crystal resonates across a band of frequencies but resonates best at a specific frequency known as its resonant frequency. This is the frequency at which an ultrasound transducer produces ultrasound waves, for example, 2.5, 5, or 10 MHz. As echoes return to the transducer, the sound wave induces the piezoelectric crystal to oscillate, generating an electrical pulse. During the receiving mode, therefore, the series of electrical pulses produced by the transducer is processed into an ultrasound image. The amplitude, or strength, of the echoes depends on attenuation of the sound beam emitted from and returning to the transducer, as well as on the interaction of the sound at various interfaces. The velocity of sound in tissue is relatively fixed at 1540 m/s, so the greater the distance that sound has to travel, the greater the attenuation of the transmitted sound and the echoes. For this reason, most of the echoes in diagnostic ultrasound are strongly attenuated and need to be amplified to create a visible image. This amplification is achieved electronically by time-gain compensation (TGC), a mechanism that increases the amplitude of echoes in proportion to the time it takes them to return. The information received by the transducer is digitized in a unit known as a digital scan converter. This digitization creates points (pixels) that form an image on a video screen. The whiteness or blackness of each pixel reflects the amplitude of the echo that the pixel represents: echoes of high amplitude produce a bright pixel, whereas echoes of low amplitude produce a darker pixel. To achieve real-time imaging of the organs examined, multiple images or frames (more than 16 frames per second) must be produced for the human eye to perceive a smooth, continuous image.

Applications of Endoscopic Ultrasound in Gastroenterology

Technical Issues. Two dedicated flexible instruments have been developed for endoscopic ultrasound: a radial EUS endoscope that has a 360° sector scanner that scans in a plane perpendicular to the axis of the endoscope, and a linear EUS endoscope that provides a 100° sector scan parallel to the axis of the endoscope. The radial EUS instrument is the standard device used in most EUS studies. Linear EUS instruments are equipped with a working channel through which needles can be passed under endoscopic guidance into tissues to obtain biopsy material (EUS-guided fine-needle aspiration, EUS-FNA).

Clinical Applications of EUS. EUS imaging is achieved by applying the transducer directly to the gastrointestinal wall, using a balloon filled with water that surrounds the tip of the probe and acts as an acoustic coupling agent.
Figure 11. Endoscopic ultrasound image of a tumor (stromal tumor) of the gastrointestinal tract.
Five major layers have been observed on EUS images of the normal gastrointestinal tract. EUS has proved to be an accurate method for local and regional staging of esophageal cancer (34,35). In addition, EUS-FNA significantly improves the accuracy of lymph-node staging by allowing specimens to be obtained from suspicious-looking lymph nodes (36). EUS is also useful in staging gastric and pancreatic malignancies (37,38). Other common applications of EUS include rectal cancer staging, evaluation of subepithelial tumors of the gastrointestinal wall, fine-needle biopsy of lymph nodes (FNA), and celiac plexus neurolysis (EUS-CPN). Fine-needle aspiration under EUS guidance is also used to sample mediastinal lymph nodes in staging lung cancer. Figure 11 shows a stromal tumor in the wall of the gut visualized by endoscopic ultrasound.

OPTICAL COHERENCE TOMOGRAPHY

Optical coherence tomography (OCT) is a new technique for noninvasive cross-sectional imaging at high spatial resolution (10–20 µm). A signal is delivered from an energy source to an organ or tissue, and the reflected signal is captured and analyzed by a detector. Image information is obtained by repeated axial measurements at different transverse positions as the optical beam scans across the tissue. The resulting data constitute a two-dimensional map of the backscattering or reflectance from the internal architectural morphology and cellular structures of the tissue. OCT is similar to B-mode ultrasound, except that the energy delivered is infrared light rather than ultrasound. To date, the majority of studies of the application of OCT have concentrated on transparent structures such as the eye (39,40). OCT may be useful in detecting glaucoma at an early stage and other diseases of the eye (41,42). Recent studies have shown that OCT may detect precancerous changes in the esophagus (43). Das and colleagues studied eight patients who had known Barrett's esophagus, with a mean segment length of 7.7 cm (43). The images were obtained using
a 2.4 OCT probe passed through the working channel of the endoscope. The authors found that the esophageal mucosa was significantly thicker in areas of Barrett's epithelium than in areas of squamous epithelium and that the squamocolumnar transition zone could be easily distinguished in all but one patient. Although three patients had proven areas of high-grade dysplasia, OCT failed to identify them in any of these patients. Poneros et al. showed that OCT is highly sensitive in diagnosing specialized intestinal metaplasia in patients who undergo routine endoscopy (44). OCT results were reviewed in all biopsy-proven cases of specialized intestinal metaplasia, and diagnostic criteria were formulated based on these OCT images. These criteria were found to be 100% sensitive and 88% specific for detecting intestinal metaplasia in the esophagus. Future improvements in OCT, including improvements in resolution and advances in probe technology, could eventually lead to its use for surveilling Barrett's esophagus for dysplastic and premalignant lesions.
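Several of the studies cited in this and the preceding sections summarize their results as sensitivity, specificity, positive predictive value, and overall accuracy. A minimal sketch of how these figures are computed from a two-by-two comparison of an imaging result against histology follows; the counts used are illustrative and are not taken from any of the cited studies.

    def diagnostic_statistics(tp, fp, fn, tn):
        """Standard test statistics from true/false positive and negative counts,
        with histology taken as the gold standard."""
        return {
            "sensitivity": tp / (tp + fn),                 # diseased sites detected
            "specificity": tn / (tn + fp),                 # normal sites correctly passed
            "positive predictive value": tp / (tp + fp),
            "accuracy": (tp + tn) / (tp + fp + fn + tn),
        }

    # Illustrative counts for a hypothetical dysplasia-detection study
    for name, value in diagnostic_statistics(tp=18, fp=2, fn=2, tn=38).items():
        print(f"{name}: {100 * value:.0f}%")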
PHOTODYNAMIC THERAPY

Photodynamic therapy (PDT) depends on an interaction between light and an otherwise inactive drug. The process is carried out by first administering a drug, called a photosensitizer, and then exposing the target tissue to a light source. The photosensitizer absorbs the light energy and transfers it to oxygen, releasing high-energy singlet oxygen; the singlet oxygen interacts with the tissue and causes its necrosis. The amount of photosensitizer administered and the intensity of light delivered to the surface of the target tissue can be varied according to the therapeutic effect desired. Most photosensitizers currently used in clinical practice are derivatives of porphyrin (45). Porphyrin compounds are administered either intravenously or orally. Normal tissue eliminates the porphyrin over several hours, whereas tumor tissue tends to retain it for several days; this allows treatment to be administered selectively to malignant tissue. The most common photosensitizers used in current practice are hematoporphyrin derivatives and porfimer sodium, both of which are administered intravenously, and aminolevulinic acid (ALA), which is administered orally (46). Light energy is delivered through a catheter passed through an endoscope (47). PDT works by the destructive effect of singlet oxygen on intracellular organelles or enzymes; receptors on tumor cells may also play a role (47–49). The degree of cell death caused by PDT depends on the type of cell treated, and the effect can be seen within minutes to hours. The most common application of PDT in the field of gastroenterology is managing Barrett's esophagus and superficial esophageal cancer. Complete elimination of Barrett's esophagus is achieved in approximately one-third of the cases (50,51).
ARGON PLASMA COAGULATION
Argon plasma coagulation (APC) has been used successfully for several years in surgery, and special probes allow its use in endoscopy (52,53). APC is an electrosurgical, monopolar, noncontact technique that applies high-frequency current to tissue through ionized, and thus electrically conductive, argon (54). The argon becomes ionized in an electrical field created between an electrode at the distal end of the APC probe and the tissue. The plasma beam generated in this fashion automatically directs itself to the area of the tissue surface where the electrical resistance is lowest. As soon as the resistance rises at this location due to desiccation, the argon beam automatically turns away to other areas not yet desiccated. This process continues until the entire tissue surface in the area exposed to APC has been desiccated. The end result is uniform desiccation and coagulation of tissue, and injury is limited to the superficial layers of the bowel. Several studies have evaluated the ability of APC to eradicate Barrett's esophagus and restore squamous epithelium. In a recent prospective study by Van Laethem and colleagues (55), 31 patients who had Barrett's esophagus received APC. Complete restoration of normal squamous epithelium was demonstrated in 81% of the patients after a mean of 2.4 APC sessions. Pathological examination, however, demonstrated that complete ablation of the abnormal epithelium was accomplished in only 61% of patients; several patients had residual abnormal epithelium under the newly formed squamous epithelium. Of the patients who had histologically confirmed eradication of the Barrett's epithelium, 47% had a relapse after 1 year. Argon plasma coagulation is a relatively new method of tissue coagulation and destruction whose full potential is yet to be realized.
SUMMARY

Endoscopy has grown from a field of diagnostic imaging into a therapeutic specialty in which diagnostic information is integrated with treatment strategies that allow patients with certain diseases to be treated without surgery. Further developments in imaging and instrumentation will increase the range of therapeutic options that endoscopy can offer patients.
BIBLIOGRAPHY
1. G. B. Berci, ed., Endoscopy, Appleton-Century-Crofts, NY, 1976, p. 27.
2. P. H. Bartels, Cancer Suppl. 69(6), 1636–1638 (1992).
3. S. Hattori, Endoscopy 24(Suppl. 2), 506–508 (1992).
4. K. Knyrim et al., Gastroenterology 96, 776–782 (1989).
5. N. Vakil, K. Knyrim, and E. C. Evebach, Balliere's Clinical Gastroenterol. 5, 183–194 (1991).
6. L. D. Picciano and J. M. Taylor, J. Clin. Eng. 19, 490–496 (1994).
7. S. Kawano et al., Endoscopy 23, 317–320 (1991).
8. N. Vakil and K. Bourgeois, Endoscopy 27, 589–592 (1995).
9. M. S. Mikail, I. R. Merkatz, and S. L. Romney, Obstet. Gynecol. 80, 5–8 (1992).
10. R. L. Koretz, Ann. Int. Med. 118, 63–68 (1993).
11. A. Sonnenberg et al., Br. Med. J. 2, 1322–1324 (1979).
12. C. Margulies, B. Krevsky, and M. Catalano, How accurate are endoscopic estimates of size? Gastrointestinal Endoscopy 40, 174–177 (1994).
13. N. Vakil et al., Gastrointestinal Endoscopy 40, 178–183 (1994).
14. T. Sakita and Y. Oguro, Gastrointestinal Endoscopy 15, 456–464 (1973).
15. M. Murayama et al., Gastrointestinal Endoscopy 13, 49–57 (1971).
16. H. Dancygier, D. Wurbs, and M. Classen, Endoscopy 13, 214–216 (1981).
17. H. Okabe et al., Gastrointestinal Endoscopy 32, 20–24 (1986).
18. W. E. Smith, N. Vakil, and S. A. Maislin, IEEE Trans. Med. Imaging 11, 117–122 (1992).
19. B. Hofstad et al., Endoscopy 26(5), 461–465 (1994).
20. R. Kim et al., Gastroenterology 108, 360–366 (1995).
21. M. Yamaguchi, Y. Okazaki, H. Yanai, and T. Takemoto, Endoscopy 20, 263–266 (1988).
22. J. R. Lakowicz, Principles of Fluorescence Spectroscopy, Plenum Press, NY, 1983.
23. P. S. Anderson et al., Laser Med. Sci. 2, 41–49 (1987).
24. R. R. Alfano et al., IEEE J. Quantum Electron. QE-23, 1908–1911 (1987).
25. Y. Yuanlong et al., Lasers Surg. Med. 7, 528–532 (1987).
26. M. B. Leon et al., J. Am. Coll. Cardiol. 12, 94–102 (1988).
27. R. R. Alfano et al., IEEE J. Quantum Electron. QE-20, 512–516 (1984).
28. R. M. Cothren et al., Gastrointestinal Endoscopy 36, 105–111 (1990).
29. R. M. Cothren et al., Gastrointestinal Endoscopy 44, 168–176 (1996).
30. L. B. Lovat et al., Gastrointestinal Endoscopy 51(4), AB227 (2000) (abstract).
31. J. A. Zagzebski, in Introduction to Vascular Ultrasonography, 3rd ed., W. B. Saunders, Philadelphia, 1992, pp. 19–44.
32. P. Sprawls, in Physical Principles of Medical Imaging, Aspen, Frederick, MD, 1989, pp. 389–406.
33. P. N. Burns, in Ultrasound Imaging and Doppler: Principles and Instrumentation. Ultrasonography of Urinary Tract, 3rd ed., Williams & Wilkins, Baltimore, 1991, pp. 1–33.
34. T. L. Tio et al., Gastroenterology 96, 1478–1486 (1989).
35. J. F. Botet et al., Radiology 181, 419–425 (1991).
36. M. J. Wiersema et al., Gastroenterology 112, 1087–1095 (1997).
37. J. Greenberg, M. Durkin, M. Van Drunen, and G. V. Aranka, Radiology 116, 696–702 (1994).
38. L. Palazzo et al., Endoscopy 25, 143–150 (1993).
39. J. A. Izatt et al., Arch. Ophthalmol. 112, 1584–1589 (1994).
40. M. R. Hee et al., Arch. Ophthalmol. 113, 325–332 (1995).
41. J. S. Schuman et al., Arch. Ophthalmol. 113, 586–596 (1995).
42. C. A. Puliafito et al., Ophthalmology 102, 217–229 (1995).
43. M. V. Das et al., Gastrointestinal Endoscopy 51(4), AB93 (2000) (abstract).
44. J. M. Poneros et al., Gastrointestinal Endoscopy 51(4), AB226 (2000) (abstract).
45. T. J. Dougherty et al., J. Natl. Cancer Inst. 90(12), 889–905 (1998).
46. L. Gossner et al., Gastroenterology 114(3), 448–455 (1998) (abstract).
47. H. I. Pass, J. Natl. Cancer Inst. 85(6), 443–456 (1993).
48. M. R. Hamblin and E. L. Newman, J. Photochem. Photobiol. B 26(2), 147–157 (1994) (abstract).
49. Q. Peng, J. Moan, and J. M. Nesland, Ultrastruct. Pathol. 20(2), 109–129 (1996).
50. K. K. Wang, Gastrointestinal Endoscopy 49(3), S20–S23 (1999).
51. B. F. Overholt and M. Panjehapour, Gastrointestinal Endoscopy Clin. N. Am. 7(2), 207–220 (1997).
52. K. E. Grund, M. B. Naruhn, H. Fischer, and D. Storek, SAGES 1993, Surg. Endoscopy 7, 118 (1993) (abstract).
53. K. E. Grund, D. Storek, and G. Farin, Endoscopy Surg. 2, 42–46 (1994).
54. G. Farin and K. E. Grund, Endoscopy Surg. 2, 71–77 (1994).
55. J.-L. Van Laethem et al., Gut 43, 747–751 (1998).
F

FEATURE MEASUREMENT
JOHN C. RUSS
North Carolina State University, Raleigh, NC

The measurement of images is used in many fields at magnifications from the atomic to the cosmic. Generally, the purpose of measurement is to extract numerical values from the image that will meaningfully represent either the structure or the objects (features) in the field of view. The quantity of data is thus greatly reduced from the original collection of pixels to a comparatively small set of measurements. Measurements fall into two major categories: global measurements on the entire image field that describe the structure, and measurements on each of the individual features. Interpretation of those measurements requires statistical and stereological techniques that lie beyond the scope of this article. Stereology is the science of geometrical probability that relates the parameters of three-dimensional structures and objects to the measurements that are accessible in two-dimensional images. Typical measurements used in stereological calculations are the total area fraction of the structure of interest, the length of the boundary surrounding the selected regions, and the number of objects present (or simply the number of intersections between the structure and some appropriate grid of lines). Measurements of individual objects within a field of view fall naturally into four groups: measures of size, density (or anything else derived from the pixel brightness values), position, and shape. Within each category are several choices of specific parameters, discussed later. In all cases, features that intersect the edge of the field of view cannot be measured accurately because their total extent is not shown. Omitting features that touch the edges of the image would bias the data because large features are more likely to intersect the edges. This bias can be corrected either by restricting the measurements to a region smaller than the entire image, so that features that cross the boundary of the measurement region do not reach the edge of the image and can be measured, or by making a statistical correction to the data based on the size of each measured feature.

Figure 1. Diagram of some of the more common size parameters measured on a feature: area (with or without internal holes), convex hull (area and perimeter), external perimeter, internal perimeter, maximum caliper dimension, and minimum caliper dimension.

Figure 2. Determining the minimum width of an elongated feature is very sensitive to the angle of the measurement (the actual minimum caliper dimension compared with the minimum caliper dimension determined with a 5° error).

SIZE

The size of objects is a concept that is easily understood but imperfectly represented in images. A typical microscopic image consists of a plane section through the object, and therefore the two-dimensional image feature is likely to be smaller than the extent of the three-dimensional object. It is generally not possible to measure an individual object from a single cross section, but given a large number of such intersections through a population of objects, the distribution of object sizes that must be present to generate the measured distribution of intersection feature sizes can be calculated. This is one of the classical techniques of stereology and requires that the sections examined be a statistically representative sample (isotropic, uniform, random) of the specimen. The measurement parameters for the size of the features that appear in the two-dimensional image and can be determined by the computer include the area, the maximum and minimum caliper dimensions (also called Feret's diameter), and the perimeter (exterior and total, if there are internal holes). In addition, the convex area and perimeter may be useful; they are determined from the convex hull (also known as the taut-string or rubber-band boundary). Figure 1 summarizes these parameters. Some systems report a mean diameter for features, which may be determined from the area by using the relationship Diam = sqrt(4 · Area/π) or by measuring the diameter in a number of directions and averaging the values (the two values are not numerically equal). Area is easily determined simply by counting the pixels within the feature and taking into account the image calibration (pixels per mm, etc.). The maximum and minimum caliper dimensions are determined by applying a rotating coordinate system while searching the
points on the periphery for the minimum and maximum coordinates. The maximum caliper dimension is not very sensitive to the number of rotational steps used in the procedure because only a cosine error is present (with 18 steps at 10° intervals, the worst-case error is 5°, and the cosine of 5° is 0.996; this means that a feature whose width is 200 pixels would be measured as 199). The minimum value is much more sensitive because it depends on the maximum dimension and the sine of the angle, as indicated in Fig. 2. The maximum diameter points determined in many directions can be used as the vertices of a polygon to form a robust measurement of the convex hull. A few systems use only the maximum diameter and the dimension at right angles to that direction to construct an ellipse whose area is used. The most troublesome measurement is that of the perimeter. A simple count of the edges of the pixels comprising the feature (the so-called city-block perimeter) greatly overestimates the perimeter length. Connecting the boundary pixels with lines whose lengths are equal either to the edge or to the diagonal of a pixel (the so-called chain-code perimeter) is much better but still overestimates the perimeter length, as shown in Fig. 3, and varies with the orientation of the feature. Fitting smooth curves to the boundary produces better results in most cases but requires significant computation, so few systems use it. Users of image analysis systems should test their systems by digitizing an image of known shape and measuring the perimeter as a function of orientation.

Figure 3. Example of boundary representation for perimeter measurement (gray squares represent enlarged individual pixels). Red: city-block method; green: chain-code method; blue: smoothed boundary line. The length of the red line is greater than that of the green, which is greater than that of the blue. See color insert.
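A minimal sketch of the city-block estimate is given below for a feature stored as a binary mask, together with a comparison against the analytic circumference of a digitized disk; the test image is illustrative. The chain-code estimate, which counts edge steps as 1 and diagonal steps as the square root of 2, comes closer but, as noted above, still overestimates the length of a smooth boundary.

    import numpy as np

    def city_block_perimeter(mask):
        """Count the exposed pixel edges of a binary feature (the city-block perimeter)."""
        m = np.pad(np.asarray(mask, dtype=bool), 1)
        edges = 0
        for axis in (0, 1):
            for shift in (1, -1):
                edges += np.count_nonzero(m & ~np.roll(m, shift, axis=axis))
        return edges

    # Digitize a disk of radius 50 pixels and compare with the true circumference
    y, x = np.mgrid[-64:64, -64:64]
    disk = x**2 + y**2 <= 50**2
    print("city-block perimeter:", city_block_perimeter(disk))    # 404, a large overestimate
    print("analytic circumference:", round(2 * np.pi * 50, 1))    # 314.2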
BRIGHTNESS-DERIVED PARAMETERS

The brightness values of pixels within features are used to determine a number of derived parameters. One of the most common is the optical density of objects illuminated in transmission. This situation applies to the light and electron microscopes and to macroscopic images of gels used in the electrophoretic separation of proteins, for instance. The absorption of light (or other radiation) passing through objects usually obeys Beer's law, I/Io = exp(−µt), where I is the measured intensity, Io is the incident light intensity, µ is the linear absorption coefficient, and t is the thickness. Thus, the optical density (µt) can be measured by taking the logarithm of I/Io, which is unfortunately not as easy as it seems. The incident intensity Io is often taken as the intensity measured adjacent to the sample. Calibration of the camera to measure intensity may present difficulties. Most solid-state cameras are inherently linear (voltage is proportional to light intensity), whereas tube cameras may have various nonlinearities and outputs whose linearity depends on the overall light intensity in the image, the operating temperature of the camera, and so on. But the major source of calibration difficulty lies with the video amplifiers used in cameras. Automatic gain controls and gamma adjustments are often hard to disable, and they produce images in which the intensity values cannot be related meaningfully to density. Good scientific-grade digital cameras usually enable densitometric measurements but still require calibration.
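Assuming a camera whose output is linear in intensity and a measurement of the incident intensity Io adjacent to the sample, the optical density follows directly from Beer's law; with a nonlinear camera, a measured calibration such as the density wedge of Fig. 4 can be interpolated instead. The sketch below illustrates both approaches; the numerical values are illustrative.

    import numpy as np

    def optical_density(pixel_value, incident_value, dark_level=0.0):
        """Optical density from Beer's law, mu*t = -ln(I / Io), after subtracting the
        camera's dark level.  Assumes the camera response is linear in intensity."""
        i = np.asarray(pixel_value, dtype=float) - dark_level
        io = float(incident_value) - dark_level
        return -np.log(i / io)

    def calibrated_density(pixel_value, wedge_pixel_values, wedge_densities):
        """Interpolate a measured density-wedge calibration (pixel value vs. known
        density) when the camera response cannot be assumed to be linear."""
        order = np.argsort(wedge_pixel_values)
        return np.interp(pixel_value,
                         np.asarray(wedge_pixel_values, dtype=float)[order],
                         np.asarray(wedge_densities, dtype=float)[order])

    # Illustrative use: a pixel reading of 64 with incident intensity measured as 250
    print(float(optical_density(64, 250)))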
Figure 4. A photographic density wedge imaged as a camera calibration standard: (a) the original image; (b) calibration curve relating pixel brightness to optical density (density calibration units = exposure factor).
Figure 4 is an image of a photographic density wedge and the resulting calibration curve. Brightness can also be related to other parameters in specific instances, such as dosage or activity measured from autoradiographic images and mean atomic number measured from backscattered electron images. In all cases, calibration requires some known standards. A three-dimensional cluster of objects viewed in a section of finite thickness cannot be measured directly from the projected image because of feature overlap, but the integrated optical density of the cluster measures the total cluster mass, which, when divided by the mass of a single feature, estimates the number of particles in the cluster (Fig. 5 shows an example).

Figure 5. Measuring feature clusters. Each feature in the image (a transmission electron micrograph of carbon black particles) consists of a three-dimensional cluster of particles. The integrated optical density of each cluster, divided by that for a single particle, estimates the number of particles in the cluster.

In many situations, the intensity cannot be measured directly but must be normalized by using the ratio of intensities between two images taken at different wavelengths. For instance, in remote sensing (satellite or aerial photography), the local sun angle and surface topography affect the measured intensity, but the ratio of red (or infrared) to green may be relatively insensitive to these effects and can be calibrated to the surface albedo or other properties. It is tempting to try to use color images from a video camera to measure the color of objects, but in fact this does not work well. The red, green, and blue (RGB) intensities (even neglecting the camera and electronic problems noted before) are produced by filters that cover a range of wavelengths and even overlap to some degree. Many different incident light spectra can produce identical RGB outputs, so it is not possible to extract detailed color information from a measured image. In addition, variations in the color temperature of the incident light alter measured RGB data. For some quality control applications, it may be enough to monitor changes in relative RGB intensity without trying to perform actual quantitative colorimetry.
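The ratio normalization mentioned above for remote sensing is straightforward to compute. The sketch below forms a per-pixel red/green ratio; a multiplicative illumination factor (sun angle, topography) common to both bands cancels in the ratio. The band values and the small epsilon guard against division by zero are illustrative.

    import numpy as np

    def band_ratio(band_a, band_b, eps=1e-6):
        """Per-pixel ratio of two co-registered image bands."""
        a = np.asarray(band_a, dtype=float)
        b = np.asarray(band_b, dtype=float)
        return a / (b + eps)

    # The same surface imaged under full sun and partial shadow (values halved):
    red   = np.array([[200.0, 100.0]])
    green = np.array([[120.0,  60.0]])
    print(band_ratio(red, green))      # both pixels give approximately 1.67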
POSITION

The position of objects is generally based on the geometric centroid of the area, which is a more robust descriptor than the center of the bounding rectangle (formed by the maximum and minimum x, y coordinates) or the average value of the perimeter points. Position of a feature in absolute coordinates requires some external information, such as readings from a microscope stage micrometer or the latitude and longitude of an airplane. Another measurement parameter that can be considered in the position category is the feature orientation. The angle of the longest diameter may be used or, for greater precision, the moment angle (the angle of the line about which the pixels have the smallest sum of squared distances) can be calculated. Statistical tests of orientation angle and position for departure from randomness are often used to detect gradients and anisotropy in structures (Fig. 6). For most purposes, it is more useful to measure the relative position of features within an image. This includes the distance between features and the distance of features from some other structure such as a cell wall, grain boundary, or road. The distances between feature centers provide a tool for detecting feature clustering.
Figure 6. Example of measuring feature orientation: (a) moment angle and (b) position centroid location to detect a gradient, in this case of angle vs. horizontal position; (c) plot of moment angle vs. x-centroid (fitted trend Y = −2.87226 · X + 183.803, R² = 0.90261, 115 points).
The mean nearest neighbor distance for a random distribution is 0.5/sqrt(N_A), where N_A is the number of features per unit area in the image. If the measured nearest neighbor distance is greater than this value, it indicates a self-avoiding distribution (e.g., precipitates in a metal, cacti in the desert). A value less than that for the random case indicates clustering (e.g., magnetic particles, people at a party). Figure 7 is an illustration.
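The nearest-neighbor test described above is simple to implement. The sketch below compares the measured mean nearest-neighbor distance with the expected value 0.5/sqrt(N_A) for a random distribution; the randomly generated coordinates are illustrative input.

    import numpy as np

    def mean_nearest_neighbor_distance(centroids):
        """Mean distance from each feature centroid to its nearest neighbor."""
        pts = np.asarray(centroids, dtype=float)
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)          # ignore each point's distance to itself
        return d.min(axis=1).mean()

    def expected_random_distance(n_features, image_area):
        """Expected mean nearest-neighbor distance, 0.5/sqrt(N_A), for a random
        distribution of N_A features per unit area."""
        return 0.5 / np.sqrt(n_features / image_area)

    rng = np.random.default_rng(0)
    pts = rng.uniform(0, 512, size=(200, 2))            # illustrative centroids
    measured = mean_nearest_neighbor_distance(pts)
    expected = expected_random_distance(len(pts), 512 * 512)
    print(f"measured {measured:.1f} px, expected {expected:.1f} px")
    # measured >> expected suggests self-avoidance; measured << expected suggests clustering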
Figure 7. Example of measuring feature clustering using nearest neighbor distance: (a) self-avoiding features (measured mean nearest neighbor distance 25.1 pixels); (b) random distribution (measured mean nearest neighbor distance 13.1 pixels); (c) clustering (measured mean nearest neighbor distance 8.3 pixels). For an ideally random distribution, the theoretical mean nearest neighbor distance is 12.6 pixels.
The distances of features from a boundary are difficult to calculate using analytical geometry because the boundary is likely to be irregular and not readily fitted to a simple equation. Image processing offers another approach. As shown in Fig. 8, the Euclidean distance map of the region next to the boundary marks each pixel with a gray-scale value (shown for clarity using a pseudocolor table) that is the distance to the nearest point on the boundary. Assigning this gray-scale value to the feature's centroid location provides the desired measurement. The same technique can be used to characterize the distances between feature boundaries, rather than centroid locations. When the Euclidean distance map of the background between features is combined with the skeleton (the lines of pixels that are equidistant from the boundaries), the resulting pixel gray values efficiently measure the surface-to-surface distances in the structure. Of course, for images that show sections through three-dimensional specimens, the results must be interpreted stereologically.
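The distance-map measurement shown in Fig. 8 can be sketched in a few lines, assuming SciPy is available for the Euclidean distance transform; the boundary mask and centroid coordinates below are illustrative.

    import numpy as np
    from scipy import ndimage

    # Binary mask of the boundary line on a 256 x 256 field (illustrative)
    boundary = np.zeros((256, 256), dtype=bool)
    boundary[:, 100] = True

    # Euclidean distance map: distance of every background pixel from the boundary
    edm = ndimage.distance_transform_edt(~boundary)

    # The distance of each feature from the boundary is simply the distance-map
    # value at the feature's centroid (illustrative centroid coordinates)
    centroids = np.array([[40, 60], [128, 180], [200, 105]])
    print(edm[centroids[:, 0], centroids[:, 1]])    # distances of 40, 80, and 5 pixels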
Figure 8. Example of measuring feature distances from an irregular line: (a) how far are the red features from the green line?; (b) Euclidean distance map of the region (pseudocolor used to label the distances); (c) distance map values assigned to the features; (d) histogram of the feature distances measured from the pixel gray values (in this example, mean = 26.9, standard deviation = 17.3, for 50 features). See color insert.
SHAPE

The shapes of features are among the most difficult things to characterize. One reason for this is that there are few common words that really describe shape. We can describe an object as circular, but what does the departure from circularity mean?
Figure 9. Departures from circularity. The form factor and roundness (defined in Table 1) show that the left feature (form factor = 0.861, roundness = 0.677) is elongated, whereas the right feature (form factor = 0.644, roundness = 0.802) has an irregular boundary.
Which of the features in Fig. 9 is less circular, the elongated one or the one that is equiaxed but irregular? The various numerical descriptors of shape that are discussed later all suffer from the problem that they do not correspond to familiar concepts or to the usual meaning of the arbitrary (and inconsistent) words used to label the measurements.
The simplest shape descriptors are formally dimensionless ratios of size parameters, such as length/breadth (aspect ratio). The list shown in Table 1 shows only a few of the more common combinations. The reader should be aware that the names assigned to many of these parameters are not universally used. Some software packages use
(a)
(b)
(c)
(d)
(e)
Figure 10. Measurement of feature shape (from a metallographic image of a dendritic alloy structure) using dimensionless ratios as defined in Table 1 (the numerical value of the measurement is shown as a color value for ease of comparison): (a) form factor; (b) roundness; (c) aspect ratio; (d) convexity; (e) solidity. Notice that the various parameters rank the features in quite different orders. See color insert.
FEATURE MEASUREMENT
349
A second class of shape descriptors is the topography of features. This is most easily extracted from the skeleton produced by a specialized iterated erosion (see the article on IMAGE PROCESSING TECHNIQUES). The end points, branch points, and loops describe the topology of the feature, which is one of the characteristics of objects that people recognize instinctively. Figure 11 shows an example, in which the skeletons of stars are used to count the end pixels and label each feature with the number of points. The fractal dimension of feature boundaries describes another aspect of shape. Many natural objects have surfaces that are self-similar, meaning that new irregularities appear as they are magnified and the overall appearance stays the same (at least in a statistical sense). This dimension can be measured in several ways, the most efficient is to generate the Euclidean distance map of the feature and count the number of pixels as a function of distance from the boundary. The slope of the cumulative plot of number
The descriptors shown in the table capture quite different aspects of shape, including the elongation and boundary irregularity shown in Fig. 9. They are not unique, however, because a wide variety of features whose shapes are visually quite distinct can have the same value of any single parameter. Because the parameters are very quick to calculate, it is worthwhile to look for a shape parameter that is useful in a given circumstance, and one can sometimes be found. Figure 10 shows a variety of feature shapes classified by several different shape descriptors from the list in Table 1.

Table 1. Selected Shape Parameters Defined as Dimensionless Ratios of Size Values

Common Name      Equation
Form factor      4π × Area / Perimeter²
Roundness        4 × Area / (π × Max.diam²)
Aspect ratio     Max.diam / Min.diam
Convexity        Convex perimeter / Perimeter
Solidity         Area / Convex area
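As an illustration of how the ratios in Table 1 can be computed, the following Python sketch uses scikit-image region properties; approximating the maximum and minimum diameters by the fitted-ellipse axis lengths is an assumption made here for brevity, not the only possible definition:

    import numpy as np
    from skimage import measure

    def shape_descriptors(binary_image):
        # Compute the Table 1 ratios for each connected feature in a binary image.
        results = []
        for region in measure.regionprops(measure.label(binary_image)):
            area = region.area
            perim = region.perimeter
            max_d = region.major_axis_length   # stands in for the maximum diameter
            min_d = region.minor_axis_length   # stands in for the minimum diameter
            convex_perim = measure.perimeter(region.convex_image)
            results.append({
                "form factor": 4.0 * np.pi * area / perim ** 2,
                "roundness": 4.0 * area / (np.pi * max_d ** 2),
                "aspect ratio": max_d / min_d,
                "convexity": convex_perim / perim,
                "solidity": area / region.convex_area,
            })
        return results

Very small features (one or two pixels) would need to be filtered out first, since their perimeters and axis lengths are not meaningful.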
Figure 11. Topological measurement of features: (a) image of star-shaped features labeled with the number of points determined by using image b; (b) skeletons of features in image a where the end points are highlighted. See color insert.
Figure 12. Fractal dimension measurement: (a) rough surface (microscopic image of a cross section through oxide on metal); (b) Euclidean distance map shown with pseudocolor; (c) log–log plot of cumulative area (number of pixels) vs. distance from the background; the plot has a slope of 0.521, giving a fractal dimension of 1.479. The surface fractal dimension is 2.0 minus the slope of the plot. See color insert.
Each of the previous shape descriptors deals with only one portion of the overall concept of shape. A method that gives a comprehensive representation of shape using a small set of numbers is harmonic analysis. If the boundary of a feature is ''unrolled'' (e.g., as a plot of radius from the centroid as a function of angle or a plot of slope vs. position along the boundary), it forms a repeating curve. The Fourier transform of such a function reduces it to a small number of complex coefficients that give the magnitude and phase of each sinusoidal component of the curve. The amplitude of these terms decreases very rapidly, and usually the first 10–20 terms are sufficient to capture the shape so completely that it can actually be reconstructed to the resolution of the original pixel image. For statistical purposes, these few numbers are often extremely useful for feature recognition and classification. Few computer programs offer these parameters, partially because the calculations are nontrivial and partially because the terms do not correspond easily to visual ideas about shape and can be difficult to understand. However, they are used for such diverse purposes as identifying sedimentary soil particles, wheat grains, foraminifera, and tree leaves, and they offer an immensely powerful tool when required. There are also related methods that fit boundary lines to features to locate sharp corners or to describe the shape in terms of various physical models.
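A minimal sketch of the radius-versus-angle variant of harmonic analysis is shown below (Python with NumPy; it assumes the boundary radius is single-valued in angle, i.e., a star-shaped feature, and the function name is hypothetical):

    import numpy as np

    def harmonic_descriptors(boundary_xy, n_terms=16):
        # boundary_xy: (N, 2) array of boundary coordinates in order around the feature.
        # Unroll the boundary as radius versus angle about the centroid,
        # resample it uniformly, and keep the magnitudes of the first Fourier terms.
        centroid = boundary_xy.mean(axis=0)
        d = boundary_xy - centroid
        angle = np.arctan2(d[:, 1], d[:, 0])
        radius = np.hypot(d[:, 0], d[:, 1])
        order = np.argsort(angle)
        theta = np.linspace(-np.pi, np.pi, 256, endpoint=False)
        r = np.interp(theta, angle[order], radius[order], period=2 * np.pi)
        coeffs = np.fft.rfft(r) / r.size
        # The zeroth term reflects size; dividing by it makes the rest scale-invariant.
        return np.abs(coeffs[1:n_terms + 1]) / np.abs(coeffs[0])

Keeping only the magnitudes discards the phase information (and hence the orientation), which is often exactly what is wanted for classification.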
BIBLIOGRAPHY
A large number of textbooks and reference books are available on image processing and analysis. Many deal with the underlying mathematics and computer algorithms for implementation; others concentrate on selecting methods for particular applications and on the relationship between the algorithms and the processes of human vision that they often emulate. A few recommendations follow.
Classical Emphasis on Mathematical Procedures
A. K. Jain, Fundamentals of Digital Image Processing. Prentice-Hall, Englewood Cliffs, NJ, 1989.
W. K. Pratt, Digital Image Processing. John Wiley & Sons, Inc., New York, NY, 1991.
R. C. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley, Reading, MA, 1993.
Application-Oriented Illustrations Comparing Techniques
J. C. Russ, The Image Processing Handbook. CRC Press, Boca Raton, FL, 1998.
Feature Measurement and Stereology
J. C. Russ, Computer Assisted Microscopy. Plenum Press, New York, NY, 1990.
J. C. Russ, Practical Stereology. Plenum Press, New York, NY, 2001.
Computer Algorithms and Code
J. R. Parker, Algorithms for Image Processing and Computer Vision. John Wiley & Sons, New York, NY, 1996.
G. X. Ritter and J. N. Wilson, Handbook of Computer Vision Algorithms in Image Algebra. CRC Press, Boca Raton, FL, 1996.
FEATURE RECOGNITION AND OBJECT CLASSIFICATION
SING-TZE BOW
Northern Illinois University
DeKalb, IL
SIGNIFICANCE AND POTENTIAL FUNCTION OF THE PATTERN RECOGNITION SYSTEM
During the twentieth century, automation liberated human beings from most heavy physical labor in production industries. However, many tasks that were thought to be light in physical labor, such as parts inspection, are still at the primitive stage of human operation. Such work consists mainly of acquiring information through the eyes and processing that information and making decisions with the brain. This is the function that the automation of pattern recognition seeks to perform. The range of application of pattern recognition is very wide: it can be applied, in principle, to any situation in which visual information is used in a decision process. Take, as an example, mail sorting. This job does not look heavy in comparison with the work of steel manufacturing plant workers, but although the steel plant has been highly automated, mail sorting remains wearying, monotonous, and boring. If a pattern recognition technique were used to identify the name and address of the addressee on each envelope, the efficiency and effectiveness of mail sorting would increase greatly. In the future, the use of pattern recognition to process documents (including text, graphics, and images) for archiving and retrieval will obviously be urgent and of great significance when dealing with large quantities of documents. Another important area of application is medical imaging and diagnosis. The objective is to automate the laboratory examination of medical images such as (1) chest X rays for pneumoconiosis and (2) cervical smears and mammograms to detect precancer or early stages of cancer and to differentiate abnormal cells from cancerous cells even though they look very much the same. Automation of aerial and satellite photo interpretation is another important application of pattern recognition; applications in this field include (1) crop forecasts and (2) analysis of cloud patterns. In summary, for the rapid development of the natural, social, and technological sciences in the coming century, pattern recognition, including object classification, is without doubt an indispensable and integral part of most machine intelligence systems that will be used for data analysis and decision making.
Images and Image Processing
What is a ''pattern''? Patterns can be categorized as abstract or concrete. Abstract patterns are more or less conceptual. For example, from the style of writing, we can very easily differentiate prose from a poem. From the style of a poem, we can no doubt distinguish Emily Dickinson's work from that of others. When we listen to music, we can certainly differentiate Tchaikovsky's work from that of Mozart. To take another example, in some important political debates there does not seem to be much difficulty in identifying the party from which an idea originates. Recognition of such patterns, termed conceptual recognition, belongs to another branch of artificial intelligence and is beyond the scope of this article. Examples of concrete patterns include characters, symbols, pictures, biomedical images, three-dimensional physical objects, target signatures, speech waveforms, electrocardiograms, electroencephalograms, and seismic waves.
Images are descriptive information about the objects that they represent; they display this information so that the human eye and brain can visualize the objects themselves. Under such a relatively broad definition, images can be classified into several types based on their forms and the method of generation. Figure 1 shows the sets of images that are commonly encountered. Visible images are those that can be seen or perceived by eye, whereas nonvisible physical images are distributions of some measurable physical property. Multispectral physical images are displayed or printed using the spectral response information obtained (sometimes with pseudocolors).
Image processing and pattern recognition are intimately related; very frequently they go together as image processing and pattern recognition (IPPR). They began as a ramification of artificial intelligence (AI), but this subdiscipline is growing so fast that there are now many independent activities. The rapid development is driven by demand from applications. IPPR can be applied to various areas to solve existing problems, and in the course of resolving practical problems using image processing, ever better results are demanded. Often, knowledge of existing
techniques is not enough to meet upcoming requirements. This phenomenon motivates and speeds the development of the image processing discipline.
Modes of Image Processing Systems
Image processing systems can be categorized into the following modes:
1. The system is developed to transform a scene into another that is more suitable for human recognition (or understanding). Various kinds of interference may be introduced while an image is acquired. The interference may come from external sources or from the sensor itself (i.e., the acquiring device). Techniques are used to improve the image and to recover its original appearance. This kind of image processing involves a sequence of operations performed on the numerical representation of pixels to achieve a ''desired'' result. For a picture, the processing changes its form into a modified version so as to make it more suitable for identification. For example, if we want to understand what is in a low-contrast image, as shown in Fig. 2a, we can first improve the image to that shown in Fig. 2b, from which we can then visualize the scene more readily. Another example is detecting and identifying a moving tank on a battlefield, as shown in Fig. 3. We need to acquire all of the relevant information, such as the range, size, and aspects of the target. Aspects of the target include its shape, symbols painted on it, and how fast and in which direction it is moving. Some factors (e.g., a smoky environment, target/background contrast, background radiance, stealth) influence correct target identification and therefore should also be taken into consideration.
2. The system is developed to complete a task at hand. To achieve noiseless transmission, the teeth on a gear and pinion must match precisely. Human operators usually perform this measurement using their hands, eyes, and brains. It would be useful to design a computer inspection system to
Figure 1. Categories of images based on their form and the method of generation: visible images (pictures such as photographs, drawings, and paintings, and spectral images such as tri-spectral RGB and multispectral images) and nonvisible physical images (multispectral physical images such as temperature, pressure, elevation, and population maps).
relieve the human inspectors of this tedious and monotonous work. The process would involve several steps: (1) acquire the images of both gear and pinion, (2) threshold them to bilevel (binary) images, (3) trace the contours, (4) obtain precise measurements of the teeth, and (5) perform shape analysis on the gear teeth. An image processing system can also be designed for industrial parts verification and for bin-picking in industry. ''Bin-picking'' uses an artificial vision system to retrieve parts that have been stored randomly in a bin. Dimensional checking and structural verification of hot steel products at a distance in a steel mill is also a good example.
3. The system is developed to initiate subsequent action to optimize image acquisition or image feature extraction. Examples can be found in the autonomous control of an image sensor for optimal acquisition of ground information for dynamic analysis (1). Because of the fixed orbit of the satellite and the fixed scanning mode of the multispectral scanner (MSS), a satellite acquires ground information in the form of a swath. Two consecutively scanned swaths of information are not geographically contiguous, and two geographically contiguous swaths are scanned at times that differ by several days. This causes difficulty in acquiring ground information accurately when the target of interest falls outside the current swath, to the left or right. Autonomous control of the image sensor enlarges the viewing range of the scanner so that the entire target of interest can be acquired in a single flight. This permits on-time acquisition and on-line processing of the relevant time-varying scene information and also saves a lot of postflight
Figure 2. (a) A scenic image taken on a foggy morning. (b) The same image after processing with an image enhancer.
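For mode 1, a simple global contrast enhancement of the kind that might turn an image like Fig. 2a into something like Fig. 2b can be sketched in a few lines (a minimal NumPy illustration of histogram equalization; it is not the specific enhancer used for Fig. 2):

    import numpy as np

    def equalize(gray):
        # gray: 2-D uint8 image. Map gray levels through the cumulative
        # histogram so that the output levels are spread more uniformly.
        hist = np.bincount(gray.ravel(), minlength=256)
        cdf = hist.cumsum() / gray.size
        return (255 * cdf[gray]).astype(np.uint8)

Many other enhancement operators (contrast stretching, unsharp masking, noise filtering) fit the same mode: a sequence of numerical operations on the pixel values that makes the scene easier to interpret.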
Figure 3. Detecting a target on the ground from the air. The chain runs from the scene and sensor through image generation, segmentation, and feature extraction to classification; the factors involved include sensor speed and depression angle, target range, size, and aspect, the atmosphere, background radiance, and target/background contrast.
Figure 4. An intelligent data acquisition system (the traditional chain comprises data acquisition, digitization, preprocessing, recognition/prediction, and decision stages).
ground processing. The basic idea of autonomous control is to adjust the viewing angle of the sensor to track the target based on processing image-segment data acquired from previous scans (see Fig. 4). A simulated experiment has been conducted to evaluate the realizability of this approach. Thematic mapper data obtained from LANDSAT D on the Susquehanna River in Pennsylvania and on the Yangtze River (the longest river in China) were chosen for the experiment. Figures 5a and 6a show composite images that are at least two times wider than the scanner view. Figures 5b and 6b show the changes in the scanner view for the optimal acquisition of useful information (tracking on the river). Figures 5c and 6c show the stored data for later image restoration. From these figures, we can see that ground information across an area as wide as three swaths could be acquired in a single flight.
The systems mentioned can have many applications and can be designed as open or closed loops. If the processed scene is for human reference only, the system is an open loop. If the processed image is used, for example, to help a robot travel around a room in a hazardous environment, a closed loop is suitable. To summarize, an image processing system can be designed in any one of the three modes mentioned previously to suit different applications. An image processing system, in general, consists of the following processes: image acquisition, image data preprocessing, image segmentation, feature extraction, and object classification. The results may be used for interpretation or for actuation. Image display in the spatial and transform domains at intermediate stages is also an important functional process of a system.
DATA REDUCTION THROUGH FEATURE EXTRACTION
Necessity for Data Reduction
To process an image using a computer, we first digitize the image as an x, y array of pixels. The finer the digitization, the closer the image is to the original; this is what we call the spatial resolution. In addition, the larger the number of gray levels used to quantize the image, the more detail is shown in the display. Say we have an image of size 4 × 4 inches and would like a spatial resolution of 500 dpi (dots per inch) and 256 gray levels for quantizing the image function; we will then have roughly 2048 × 2048 × 8, or 33.55 million, bits to represent a single image. The amount of data is enormous. One of the most commonly used steps in image processing is convolving an image with an n × n array or mask. Let us choose n = 3 as an example. There will be
nine multiplications and additions (or 18 floating-point operations) for each of the 2048 × 2048, or 4.19 million, pixels, totaling 75.5 million operations. Assuming that six such processes are required to complete a specific image processing job, we need to perform 75.5 M × 6, or 453 million, operations, which is a very high computational load. If an average of 20 cycles of processing time is needed for each operation on a 500 MHz computer, then (453 × 10^6 × 20)/(500 × 10^6), or 18, seconds are needed just for the mathematical computation on a single image, without taking into account the time needed for data transfer between the central processing unit (CPU) and memory during processing. That time will, no doubt, be much longer than the CPU time, perhaps 20 times as much. Therefore, to speed up the processing of an image, it is necessary to explore ways to represent an image accurately by using much less data.
Features That Could Best Identify Objects
Research results from experimental psychology show that the human vision system does not visualize an image (or an object) pixel by pixel. Instead, it extracts some key information created by grouping related pixels together to form features. Using properly selected features, we can differentiate a house from trees, differentiate a human being from animals, differentiate the people of the Western Hemisphere from those of the Eastern Hemisphere, and differentiate the fingerprint of one person from those of others. Features take the form of a list of descriptions called a ''feature vector,'' far fewer in number than the pixels but carrying enough discriminating information for identification.
Images Containing Categorized Objects. Processing by using features is actually high-level processing, and proper selection of features is crucial. A set of features may be very effective for one application but of little use for another; the pattern classification problem is more or less problem-oriented. A proper set of features can be found by thorough study of the pattern, preliminary selection of possible and available features, and final selection of the most effective ones by evaluating each for its effectiveness in that particular classification problem (2–4). For images containing objects that have been categorized, an image feature can be taken to be a part of the image that has some special property. Lines, curves, and textured regions are examples. They are called local features to differentiate them from global features such as the average gray level. Each local feature should be a meaningful and detectable part of the image.
Figure 5. Simulation experiment and results on the Susquehanna River.
By meaningful, we mean that the features are associated with an interesting scene element via the image formation process. By detectable, we mean that algorithms must exist that output a collection of feature descriptors, which specify the position and other essential properties of the features in the image. Many examples can be enumerated. In the example given previously, which describes the precise matching of gears and pinions (see Fig. 7), the objects of our concern are the teeth of the gears and pinions, especially the pitches between teeth and the profiles
of the teeth. We must find whether the pitches between teeth and the profiles of the teeth are the same (or at least within a tolerance) in both the gears and the pinions. These two requirements are crucial and must be met to ensure low-noise transmission. Our problem is to locate the teeth, trace their contours, measure the distances between corresponding pixels of adjacent teeth for the pitch measurement, and obtain the profiles of the teeth for matching. Local features describing the teeth will be useful in designing our tooth detection algorithm.
Figure 7. Dimensional checking of manufactured parts (gears and pinions).
Figure 6. Simulation experiment and results on the Yangtze River.

In glass manufacturing, cracks and scratches are the main defects and need to be detected. A crack is a detrimental defect, whereas a scratch, if only superficial, produces a degraded product. In appearance, a crack usually runs from the edge of a glass piece toward its interior as a straight or broken line. A crack also penetrates through the glass from one side all the way to the other side; it then appears on the image as a dark line of a certain width when backlighting is employed for illumination. A scratch is somewhat different. A scratch superficially damages the surface of the glass; it may be deep or shallow, but it definitely does not cut through the glass. Very often, the scratch has the form of a line that may be intermittently dotted and slightly curved. A scratch usually does not occur at the edge of the glass. Images taken from the top and bottom sides of the glass differ in size because of optical refraction. Using such a priori knowledge, we can design an algorithm to detect these features by locating them first and then identifying them. Figure 8 shows pictures of some typical cracks and scratches in a crystal blank.

Figure 8. (a) Tiny crack and (b) scratch on a crystal blank.

Figure 9 shows a microscopic image of a vaginal smear, where (a) shows the shape of normal cells and (b) shows that of abnormal cells. A computer image processing system using a microscope can be developed to automate the screening of abnormal cells from normal ones. There are many other applications in this category, for instance, the recognition of alphanumeric characters and the bin-picking of manufactured parts by robots.
Figure 9. Cytological images of vaginal smears: (a) normal vaginal exfoliated cells; pink and light blue colors represent, respectively, the acidophilic and eosinophilic characteristics of these cells (the original image is a color image); (b) cancerous cells and red blood cells in the background (the original is a color image).

Scenic Images Containing Objects Best Represented by Their Spectral Characteristics. There are objects that cannot be well separated by their shapes alone, for example, objects that grow continuously with time. Agricultural products are good examples. They look similar when they are young, especially when viewed at a distance. When we use remote sensing to predict the annual yields of various agricultural products across a large terrestrial area, features other than those described in the previous section should be extracted from these objects for identification. Research shows that different agricultural objects possess different spectral characteristics. Agricultural products, such as corn, wheat, and beans, reflect differently in the various wave bands of the optical electromagnetic spectrum (see Fig. 10). For this reason, the strengths of the responses in particular wave bands can be selected as features for classification. This technique is widely used in remote sensing to acquire resource information from the earth's surface.
Remote sensing is concerned with gathering data about the earth and its environment by visible and nonvisible electromagnetic radiation. Multispectral data are gathered, and as many as 24 (or more) bands are acquired simultaneously. Ultraviolet, visible, infrared, and thermal wavelengths are collected by passive sensors. Active sensors exploit microwave radiation in the form of synthetic aperture radar (SAR); the primary advantage of SAR is that it is usable in all weather, day or night. Multispectral sensors (satellite or airborne) provide data in the form of multiple images of the same area of the earth's surface taken through different spectral bands. The spectral bands typically exhibit strong interband relationships, so a significant amount of redundancy exists between the spectral images. For a specific application, selecting information from a few spectral bands is usually enough. Effective classification rests on an intelligent choice of the spectral bands, often few in number; what is important is to select them so as to reduce their number while retaining as much of the class discriminatory power as possible. For illustration, consider the previously mentioned crop problem. Assume that three proper features have already been selected. Then a three-dimensional graph can be plotted in which pixels corresponding to different classes of crops (corn, wheat, beans, etc.) cluster together, and the clusters are separated from each other, as indicated in Fig. 11.
Figure 10. The optical spectrum in perspective: ultraviolet (roughly 10–390 nm), visible light (violet, blue, green, yellow, orange, and red, roughly 390–770 nm), and infrared (roughly 770 nm to 1 mm), shown in the context of gamma rays, X rays, millimeter waves, and radio waves.
Figure 11. Predicting the yearly yields of agricultural products via satellite image; the three axes of the plot are selected spectral bands.

The classification problem then becomes finding the clusters and the separating boundaries among all of these classes. The yearly yield of each of these agricultural products can then be estimated. As already noted, beyond agricultural crop estimation, many fields can benefit from remote sensing technology. For example, this technique has been used to survey large areas of inaccessible and even unmapped land to identify new sources of mineral wealth. It has also been used to monitor large or remote tracts of land to determine their existing use or future prospects. Satellite data are very useful for short-term weather forecasting and are important in studying long-term climate changes such as global warming.

Feature Extraction
By feature extraction, we mean identifying the inherent characteristics found within an image. These characteristics (features) are used to describe an object or attributes of the object. Feature extraction operates on a two-dimensional image array and produces a ''feature vector'' (2–4).

Features Extracted Directly from Pixels. Extracting features in this way converts the image data from spatially correlated arrays into textual descriptions of structural and/or semantic information. First, we threshold the image properly to classify each pixel as part of either the foreground or the background, and we then group the pixels together, making measurements if needed, to see whether the grouping provides meaningful information. Many of the features of interest are concerned with the shape of the object. The shape of an object or region within an image can be represented by features gleaned from the boundary properties of the object and/or from its regional properties. For example, visual inspection of a pinion in the teeth-matching problem of gears and pinions (see Fig. 7) can use features such as the diameter of the pinion, the number of teeth in the pinion, the pitch between the teeth, and the contour shape of the teeth. Features selected as shown, respectively, in Fig. 12a,b may be useful for recognizing alphanumeric characters and analyzing chromosomes (5,6).

Figure 12. Features selected for object recognition: (a) for alphanumeric character recognition; (b) for chromosome analysis.

Derived Features. For some applications, it is more convenient and effective to use computed parameters as features for classification. Such features are called derived features. The ''shape factor'' (perimeter²/area) is one of them: it is a dimensionless quantity, invariant under scaling, rotation, and translation, which makes it a useful and effective feature for identifying various shapes such as a circle, square, triangle, and ellipse (see Fig. 13). Moments are also examples of derived features (4). Moments can be used to describe the properties of an object in terms of its area, position, and orientation. Let f(x, y) represent the image function or the brightness of the pixel, either 0 (black) or 1 (white), where x and y are the pixel coordinates relative to the origin. The zero- and first-order moments can be defined as

m00 = Σx Σy f(x, y)        (the zero-order moment; for a binary image it equals the object area),
m10 = Σx Σy x f(x, y)      (the first-order moment with respect to the y axis),
m01 = Σx Σy y f(x, y)      (the first-order moment with respect to the x axis).
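These moments are straightforward to compute directly from a binary image array. The following NumPy sketch is a minimal illustration, not a prescribed implementation; it also evaluates the centroid ratios that are discussed just below:

    import numpy as np

    def moments_and_centroid(f):
        # f: 2-D binary image (0 = black background, 1 = white object).
        y, x = np.indices(f.shape)      # pixel coordinates relative to the origin
        m00 = f.sum()                    # zero-order moment = object area
        m10 = (x * f).sum()              # first-order moment with respect to the y axis
        m01 = (y * f).sum()              # first-order moment with respect to the x axis
        return m00, m10, m01, m10 / m00, m01 / m00   # the last two are the centroid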
Figure 13. Simple derived features for various shapes. Circle of diameter d: P = πd, A = πd²/4, SF = 4π ≈ 12.6. Square of side l: P = 4l, A = l², SF = 16. Equilateral triangle of side s: P = 3s, A = s²√3/4, SF = 36/√3 ≈ 20.8. Ellipse of semiaxes a and b: P ≈ 2π√[(a² + b²)/2], A = πab, SF ≈ 2π(a² + b²)/(ab).
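The shape-factor values quoted in Fig. 13 are easy to verify numerically; the short check below uses unit dimensions, and any other scale gives the same values because P² and A both scale with the square of the size:

    import math
    # Shape factor SF = P**2 / A for the shapes of Fig. 13.
    d = l = s = a = 1.0
    b = 0.5
    print((math.pi * d) ** 2 / (math.pi * d ** 2 / 4))                 # circle: 4*pi ~ 12.6
    print((4 * l) ** 2 / l ** 2)                                       # square: 16
    print((3 * s) ** 2 / (s ** 2 * math.sqrt(3) / 4))                  # triangle: ~ 20.8
    print((2 * math.pi * math.sqrt((a**2 + b**2) / 2)) ** 2 / (math.pi * a * b))  # ellipse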
The centroid (center of area or center of mass), a good parameter for specifying the location of an object, can also be expressed in terms of moments as

x̄ = m10 / m00   and   ȳ = m01 / m00,
where x̄ and ȳ are, respectively, the coordinates of the centroid with respect to the origin.
Features Obtained from Spectral Responses. Most real-world images are not monochromatic but have full color. A body that reflects light relatively balanced in all visible wavelengths appears white to an observer. On the other hand, a body that favors reflectance in a limited range of the visible spectrum exhibits some shade of color. To the human visual system, all colors are seen as variable combinations of the three so-called primary colors red (R), green (G), and blue (B); however, the responses of the ''R,'' ''G,'' and ''B'' sensors in the human eye overlap considerably. For image processing, a composite color image can be decomposed into three component images, one red, one green, and the third blue. These three component images can be processed separately and then recombined to form a new image for various applications. When we scan an image using a 24-channel multispectral scanner (see Fig. 16 later for details of the multispectral scanner), we obtain, for a single picture point, 24 values, each corresponding to a separate spectral response. The pattern xT = (x1 x2 . . . x24) will be a vector of 24 elements in a 24-dimensional space. Twenty-four images will be produced from one scan, each corresponding to a separate spectral band.
APPROACHES FOR PATTERN CLASSIFICATION
Structural Approach for Pattern Classification
Many patterns are well represented by their shapes, as well as by measurements of the major parts of the pattern. The methods of extracting features discussed previously can be applied to these problems. Furthermore, some processing, such as contour tracing, dimensional checking, medial axis transformation, or shape description, may be needed to complete the classification task. Translation-invariant, rotation-invariant, and size-invariant capabilities should be taken into consideration in designing algorithms for classification and/or measurement to make them practical. In general, for patterns that possess a rich content of structural information, the syntactic or structural approach can show its power in achieving effective classification. Examples of applications to such kinds of objects are numerous; because of space limitations, two paradigm applications will be discussed briefly to show the processes involved in the syntactic or structural approach.
Computer recognition of printed alphanumeric characters is one of the earliest projects attacked in this field.
Because of the many different fonts of characters in use, a simple template matching method is not realistic. Instead, the primitive positional features directly extracted from the pattern (or object), as shown in Fig. 12a, together with their structural relationships, can be used for classification. The character ''A'' in different fonts should be recognized as the same character, and three primitive features are sufficient to identify it as ''A.'' The character ''D'' can be identified if its primitive features (including the curve '⊃') are extracted from the object and their positional relationship is maintained. Similar arguments apply to other printed alphanumeric characters. Refer to (3,5) for the details of character recognition.
The image of the human chromosome is another classical example of the use of syntactic description (6) because of its strong structural regularity (see Fig. 14). There might be variations in the lengths of the arms, but the basic shapes will be the same for certain types of chromosomes, such as submedian or telocentric ones. These variations can be taken care of by a set of syntax rules. Figure 14 shows samples of submedian and telocentric chromosomes. Figure 15 shows the structural analysis of a submedian chromosome: Fig. 15a shows bottom-up parsing of a submedian chromosome, Fig. 15b shows its structural representation, and Fig. 15c shows the primitives that are used for shape description and chromosome identification.
Let us trace the contour of the submedian chromosome shown in Fig. 14a in a clockwise direction. The upper arm pair of the submedian chromosome starts with a shape that can be depicted by the primitive ''b,'' followed by a curve segment that can be depicted by primitive ''a'' (see Fig. 15c for the shapes of the primitives). Contiguous to this curve segment ''a'' is a curve segment that can be represented by primitive ''b'' again, followed by curve primitive ''c.'' Continuing to trace the contour of the chromosome in a clockwise direction, we obtain a string such as ''babcbabdbabcbabd,'' where the symbols ''a,'' ''b,'' ''c,'' and ''d'' represent, respectively, the primitives shown in Fig. 15c. In this description string, the first ''bab'' curve segment group represents the upper left arm, and the second ''bab'' group the upper right arm. These two arms are linked together by the shape primitive ''c'' to form the upper arm pair. The third and fourth ''bab'' groups represent, respectively, the lower right and lower left arms of the submedian chromosome. These two arms (lower right and lower left) form the lower arm pair when they are linked together by a primitive ''c.'' After completing the trace around the contour, we find that the upper and lower arm pairs are linked together by two ''d'' primitives, thus giving the description ''babcbabdbabcbabd'' for the shape of a submedian chromosome. If the contour tracing starts at some other point, for example, the upper valley ''c'' of the contour, the string description becomes ''cbabdbabcbabdbab.'' Note that these two string descriptions describe exactly the same shape because the contour of an object is closed. Strings like ''babcbabdbabcbabd,'' ''abcbabdbabcbabdb,'' and ''dbabcbabdbabcbab'' are the same when the curve segment primitives ''a,'' ''b,'' ''c,'' and ''d'' are linked together
Figure 14. Samples of submedian and telocentric chromosomes (a) submedian chromosome; (b) telocentric chromosome.
in the same cyclic sequence. For this reason, such a string is an orientation-invariant description of a submedian chromosome. By the same token, a telocentric chromosome can be represented by ''ebabcbab''; that is, a certain shape will be represented by a certain string of symbols. In the terminology of syntactic pattern recognition, a grammar, or set of rules of syntax, can be established for generating the sentences for a certain type of chromosome. The sentences generated by two different grammars, say G1 and G2, represent two different shapes, but the sentences generated by the same grammar, say G1, represent the same category (e.g., submedian chromosomes), which tolerates minor changes in shape. For details on grammars and the sentences they generate, see (6).
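The cyclic equivalence of such closed-contour strings is easy to test in code; the following short Python sketch (the function name is hypothetical) accepts any rotation of a contour description:

    def same_shape(s1, s2):
        # Two closed-contour descriptions are equivalent if one is a cyclic
        # rotation of the other (the starting point on the contour is arbitrary).
        return len(s1) == len(s2) and s1 in s2 + s2

    print(same_shape("babcbabdbabcbabd", "cbabdbabcbabdbab"))   # True (same submedian shape)
    print(same_shape("ebabcbab", "babcbabdbabcbabd"))           # False (telocentric vs. submedian)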
Nonparametric Decision Theoretic Classifier and Training It by Using Prototypes
Nonparametric Decision Theoretic Classifier. The previous section discussed objects that are well represented by their shapes. However, there are also many other objects (e.g., agricultural objects) whose structural information is not considered important; instead, their spectral characteristics are much more useful. Measurements from several carefully selected spectral bands form a good feature vector for object classification in agricultural applications. In such cases, another method, called the decision theoretic method (7,9–12), is more suitable for classification. To clarify this idea, let us consider a scene as the system input. If we scan the scene using a 24-channel multispectral scanner, then for a single picture point we obtain 24 values, each corresponding to a separate spectral response.
24 values, each corresponding to a separate spectral response. Figure 16 is a schematic diagram of a data analysis system using an aerial multispectral scanner (MSS). The set of data at B, C, and D are, respectively, in the pattern space (n-dimensional), feature space (r-dimensional) and classification space (M-dimensional). Figure 17 shows in more detail how the multispectral scanner (MSS) extracts the spectral responses during its flight on board a plane or satellite. In Fig. 17, B, C, D, E, and F are mirrors. A is also a mirror, but rotates at a very high but constant speed. G is a beam splitter. When it is used to acquire information about objects on the earth’s surface, light reflected from objects
e
Figure 15. Structural analysis of a submedian chromosome: (a) bottom-up parsing; (b) structural representation; (c) primitives used for the analysis.
is reflected by the rotating mirror A to mirrors B and C, then to D and E, then to F, and then to the beam splitter G. A range of the electromagnetic energy spectrum is generated through the beam splitter. Twenty-four sensors are lined up in a linear array to receive the spectral responses at various wavelengths. Because the number of sensors used in the acquisition device is 24, responses are received from only 24 spectral bands, thus reducing the dimensionality to 24. Each spectrum component is assigned to one dimension. A pattern x can then be represented as xT = (x1 x2 . . . x24 ). The process of feature extraction converts the original data to feature vectors for use as input to the decision processor for classification. As
mentioned previously, selection of information from a few spectral bands may be enough for a specific application. Effective classification rests on an intelligent choice of the spectral bands, often few in number, that still retain much of the class discriminating power, thus further reducing the dimensionality. The output of the decision processor is in the classification space; there are M labels if the input patterns are to be classified into M classes. For the simplest two-class problem, M equals 2; for aerial-photo interpretation, M can be 10 or more.

Figure 16. A paradigm application in which the decision theoretic approach is used for classification. Data flow from data acquisition (A) through preprocessing (B, pattern space) and feature extraction (C, feature space) to classification (D, classification space).

The synapses (or weights) used in the decision processor are initially set arbitrarily or on the basis of a priori information about the statistics of the patterns to be classified and then are adjusted during a training phase. During the training phase, a set of patterns (known as prototypes) is presented to the decision processor, one after the other, and the synapses are adjusted whenever the classification of a prototype is incorrect. Prototypes from the same class share some common properties, and thus they cluster in a certain region of the pattern or feature space. The classification problem is simply to find one or more separating surfaces that partition the known prototypes into their correct classes. Thus, separating surfaces (also known as decision surfaces, or hyperplanes) can formally be defined as the surfaces that separate known patterns into their respective categories without error and are then used to classify unknown patterns. Such decision surfaces are (n − 1)-dimensional. When n = 1, the decision surface is a point; when n = 2, it becomes a line; when n = 3, it is a plane. When n = 4 or more, the linear decision surface is a hyperplane represented by

w1 x1 + w2 x2 + w3 x3 + · · · + wn xn + wn+1 = 0,   or, in vector form,   wT x = 0,

where wT = (w1 w2 . . . wn wn+1) and xT = (x1 x2 . . . xn 1). The augmented vector forms used for w and x allow all linear discriminant functions to pass through the origin of the augmented space. A discriminant function for the kth category has the form

dk(x) = wk1 x1 + wk2 x2 + · · · + wkn xn + wk,n+1 xn+1,   or   dk(x) = wkT x.

Then,

dk(x) > dj(x)   for all x ∈ ωk and j = 1, 2, . . . , M, j ≠ k,
where dk(x) and dj(x) are, respectively, the values of the discriminant functions evaluated for patterns x in classes k and j.
Training the Nonparametric Decision Theoretic Classifier. The decision surfaces that partition the space may be linear or nonlinear. Many nonlinear cases can be handled by the method of piecewise linear functions or the φ machine [see details in (8)]. In the last section of this article, we will introduce artificial neural networks for pattern classification; an artificial neural network is a powerful tool for solving nonlinear problems but has much higher computational complexity. Let us go back to the linear case. Training the system means finding the weight vector w from the a priori information contained in the training samples. We can do this either in the pattern space or in the weight space, but it is much more efficient and effective to work in the weight space. As described in the previous section, the weight space is an (n + 1)-dimensional space. For each prototype z_k^m, k = 1, 2, . . . , M, m = 1, 2, . . . , Nk (where M represents the number of categories (classes) and Nk represents the
Figure 17. Schematic diagram of a multi-spectral scanner.
number of prototypes belonging to category k), there is a hyperplane in the weight space on which

wT z_k^m = 0.

Any weight vector w on the positive side of this hyperplane yields wT z_k^m > 0 and correctly classifies z_k^m in class k. Any weight vector w on the negative side of this hyperplane yields wT z_k^m < 0 and incorrectly classifies z_k^m in a class other than class k. Whenever any misclassification occurs, the w vector should be adjusted (or moved) to the positive side of the hyperplane to make wT z_k^m greater than zero again. Consider a simple two-class problem. The decision surface is represented by d(x) = d1(x) − d2(x). When w is on the positive side of the hyperplane and z1, z1 ∈ ω1, is presented to the system, the prototype z1 will be correctly classified because wT z1 > 0. When z2, z2 ∈ ω2, is presented to the system, wT z2 should be less than zero. Suppose that there are N1 prototypes that belong to ω1 and N2 prototypes that belong to ω2. Then there are N = N1 + N2 hyperplanes in the weight space. A vector w can be found in such a region that

wT z_1^m > 0   for all z_1^m ∈ ω1, m = 1, 2, . . . , N1,   and
wT z_2^m < 0   for all z_2^m ∈ ω2, m = 1, 2, . . . , N2.

This is the solution region for class ω1 in W space. That region lies on the positive sides of the N1 hyperplanes for class ω1 and on the negative sides of the N2 hyperplanes for class ω2. Figure 18 shows the solution region for three prototypes z_1^1, z_1^2, and z_1^3 (all of which belong to ω1).

Figure 18. Solution region for three prototypes z_1^1, z_1^2, and z_1^3 of ω1.

We know that the z_1^m are in ω1 and the z_2^m are in ω2. To start the training, w may be chosen arbitrarily. Let us present z_1^m to the classification system. If w is not on the positive side of the hyperplane of z_1^m, then wT z_1^m must be less than 0, and we need to move the w vector to the positive side of the hyperplane for z_1^m. The most direct way of doing this is to move w in a direction perpendicular to the hyperplane (i.e., in the direction of z_1^m or −z_2^m). If w(k) and w(k + 1) are, respectively, the weight vectors at the kth and (k + 1)th correction steps, then the correction of w can be formulated as

w(k + 1) = w(k) + c z_1^m   if wT(k) z_1^m < 0,
w(k + 1) = w(k) − c z_2^m   if wT(k) z_2^m > 0,
w(k + 1) = w(k)             if the prototype is correctly classified.

During this training period, patterns are presented one at a time through all of the prototypes. A complete pass through all of these patterns is called an iteration. After an iteration, all of the patterns are presented again, in the same or another sequence, to carry out another iteration. This is repeated until no corrections are made throughout a complete iteration. Figure 19 shows the step-by-step procedure of weight vector adjustment based on the order of presentation of the prototypes shown in the figure.

Figure 19. Weight vector adjustment procedure for training a two-class classification system; the prototypes z_1^1, z_2^1, z_1^2, and z_2^2 are presented cyclically.
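The correction rule and iteration scheme described above can be sketched as follows (Python with NumPy; a minimal illustration that assumes the two classes are linearly separable and uses a fixed increment c):

    import numpy as np

    def train_two_class(prototypes_w1, prototypes_w2, c=1.0, max_iter=100):
        # Augment each prototype with a trailing 1 so the hyperplane passes
        # through the origin of the augmented space.
        z1 = [np.append(z, 1.0) for z in prototypes_w1]
        z2 = [np.append(z, 1.0) for z in prototypes_w2]
        w = np.zeros(len(z1[0]))              # arbitrary starting weight vector
        for _ in range(max_iter):
            corrected = False
            for z in z1:                      # require wT z > 0 for class omega_1
                if w @ z <= 0:                # boundary points are treated as misclassified
                    w = w + c * z
                    corrected = True
            for z in z2:                      # require wT z < 0 for class omega_2
                if w @ z >= 0:
                    w = w - c * z
                    corrected = True
            if not corrected:                 # a complete iteration with no corrections
                break
        return w

If the prototypes are not linearly separable, the loop simply stops after max_iter passes without converging; nonlinear cases call for the piecewise linear, φ-machine, or neural-network approaches mentioned in the text.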
Principal Component Analysis for Dimensionality Reduction
In previous sections, we have already discussed the problems that may arise in pattern classification in high-dimensional spaces. We have also mentioned that improvements can be achieved by mapping the data in pattern space into a feature space. Note that the feature space has a much lower dimensionality, yet it preserves most of the intrinsic information for classification. On this basis, let us introduce the technique of principal component analysis (8,11,13). The objective of principal component analysis is to derive a linear transformation that emphasizes the differences among pattern samples that belong to different categories; in other words, the objective is to define new coordinate axes in directions of high information content that are useful for classification. Let ml and Cl denote, respectively, the sample mean vector and the covariance matrix for the lth category (l = 1, 2, . . . , M). These two quantities can be computed from the training set. Now, our dimensionality reduction problem is to find a transformation matrix A. Two results are obtained by this transformation. First, an n-dimensional observation vector x = (x1, x2, . . . , xn) is transformed into a new vector yT = [y1, y2, . . . , yp], whose dimensionality p is less than n, or

y = Ax,

where A is a p × n transformation matrix. This transformation is based on the statistical properties of the sample patterns in the vector representation and is commonly referred to as principal component analysis. Principal eigenvectors represent directions in which the signal has maximum energy. Figure 20 shows a two-dimensional pattern space in the Cartesian coordinate system for a two-class classification problem. From the distribution of the pattern samples in Fig. 20, we can see that these pattern points cannot be effectively discriminated according to their distribution along either the x1 axis or the x2 axis alone; there is an overlapped region, resulting in error in classification. If we rotate the x1 and x2 axes to the positions shown as y1 and y2, then the distribution of these pattern points is well represented for discrimination. Component axis y1 is ranked first by its ability to distinguish between classes ω1 and ω2 (i.e., it has the smallest error in the projected space) and is called the first principal component axis. Along the axis y2, the projections of the sample patterns of the two classes have a large overlapping region; this principal component axis is not effective for classification. Figure 21 shows an example of a two-class problem (ω1 and ω2), where distributions of sample patterns are projected onto two vectors y and y′, as shown. A threshold t0 is chosen to discriminate between these two classes. The error probability for each is indicated by the cross-hatched region in the distributions. From the figure, we see that the error from projection onto the vector y is smaller than that onto the vector y′. This component y will be ranked first by its ability to distinguish between classes ω1 and ω2. For the linear classifier design, we will select y and t0 to give the smallest error in the projection space. Data analysis by projecting pattern data onto the principal axes is very effective in dealing with high-dimensional data sets; therefore, the method of principal components is frequently used for the optimum design of a linear classifier. How many principal components are sufficient for a classification problem? That depends on the problem. Sometimes, the number of components is fixed a priori at two or three, as in situations that require visualizing the feature space. In other situations, we may set a threshold value and drop the less important components whose associated eigenvalues (λ) are less than the threshold. The procedure for finding the principal component axes can be found in (8).

Figure 20. A two-dimensional pattern space and principal component axes.

Figure 21. Selection of the first principal component axis for classifier design.

Optimum Classification Using Fisher's Discriminant
Multidimensional space, no doubt, provides us with more information for classification. However, the extremely large amount of data involved makes it more difficult to find the most appropriate hyperplane for classification. The conventional way to handle this problem is to preprocess the data to reduce their dimensionality before applying a classification algorithm, as mentioned in the previous section. Fisher's linear discriminant analysis (8,11,13) uses a linear projection of n-dimensional data onto a one-dimensional space (i.e., a line), in the hope that the projections will be well separated by class. In so doing, the classification problem becomes choosing a line that is oriented to maximize class separation and has the least amount of data crossover. Consider an input vector x projected onto a line, resulting in a scalar value y:
y = wT x,

where w is a vector of adjustable weight parameters specifying the projection. This is actually a transformation of the data point set x into a labeled set in a one-dimensional space y. By adjusting the components of the weight vector w, we can select a projection that maximizes the class separation. Consider a two-class problem consisting of N pattern samples, N1 of which belong to class ω1 and N2 to class ω2. The mean vectors of these two classes are, respectively,

m1 = (1/N1) Σ_{i=1..N1} x_1^i   for class ω1,   and   m2 = (1/N2) Σ_{i=1..N2} x_2^i   for class ω2,

where the subscripts denote the classes and the superscripts denote the patterns in the class. The projection mean m̃1 for class ω1 is a scalar given by

m̃1 = (1/N1) Σ_{i=1..N1} y_1^i = (1/N1) Σ_{i=1..N1} wT x_1^i = wT [(1/N1) Σ_{i=1..N1} x_1^i] = wT m1.

Similarly, the projection mean of class ω2 is a scalar given by

m̃2 = (1/N2) Σ_{i=1..N2} y_2^i = (1/N2) Σ_{i=1..N2} wT x_2^i = wT [(1/N2) Σ_{i=1..N2} x_2^i] = wT m2.
Therefore, the difference between the means of the projected data is

m̃2 − m̃1 = wT (m2 − m1).
Figure 22. Probability of error in a two-class problem and Fisher's linear discriminant. The cross-hatched areas under p(x | ω1)p(ω1) and p(x | ω2)p(ω2) correspond to the error contributions E1 = ∫_{R1} p(x | ω2)p(ω2) dx and E2 = ∫_{R2} p(x | ω1)p(ω1) dx, shown for an arbitrary and for the optimum decision boundary.

When data are projected onto y, the class separation is reflected by the separation of the projected class means, so we might simply choose a w that maximizes (m̃2 − m̃1). However, as shown in Fig. 22, this alone is not sufficient; we also have to take into account the within-class spreads (scatter) of the data points (i.e., the covariance of each class). Let us define the within-class scatter of the projected data as

s1² = Σ_{y∈Y1} (y − m̃1)²   and   s2² = Σ_{y∈Y2} (y − m̃2)²,

where Y1 and Y2 are, respectively, the projections of the pattern points from ω1 and ω2. Then Fisher's criterion J(w) is

J(w) = (squared difference of the projection means) / (total within-class scatter of the projected data) = (m̃2 − m̃1)² / (s1² + s2²),

where s1² and s2² are, respectively, the within-class scatters of the projected data of each class; their sum gives the total within-class scatter of all of the projected data. The numerator of J(w), (m̃2 − m̃1)², can be rewritten as

(m̃2 − m̃1)² = wT (m2 − m1)(m2 − m1)T w = wT SB w,   where SB = (m2 − m1)(m2 − m1)T.

SB is the between-class covariance matrix. Similarly, the denominator can be rewritten as

s1² + s2² = wT Sw w,

where

Sw = Σ_{xi∈ω1} (xi − m1)(xi − m1)T + Σ_{xi∈ω2} (xi − m2)(xi − m2)T.

Sw is the total within-class covariance matrix. Hence,

J(w) = (wT SB w) / (wT Sw w).

J(w) is Fisher's criterion function to be maximized. Differentiating J(w) with respect to w and setting the result equal to zero, we obtain

(wT SB w) Sw w = (wT Sw w) SB w.
We see that SB w = (m2 − m1)(m2 − m1)T w = k(m2 − m1) is always in the direction of (m2 − m1). Furthermore, we do not care about the magnitude of w, only its direction, so we can drop the scalar factors. Then

w ∝ Sw⁻¹ (m2 − m1).

Figure 22 shows the probability of error in a two-class problem using the Fisher discriminant. The probability of error introduced by the schemes discussed previously is of much concern. Let us take the two-class problem for illustration. The classifier divides the space into two regions, R1 and R2. The decision that x ∈ ω1 is made when the pattern x falls into region R1, and x ∈ ω2 when x falls into R2. Under such circumstances, there are two possible types of errors:
1. x falls in region R1, but actually x ∈ ω2. This gives the probability of error E1, which may be denoted by Prob(x ∈ R1, ω2).
2. x falls in region R2, but actually x ∈ ω1. This gives the probability of error E2, or Prob(x ∈ R2, ω1).
Thus, the total probability of error is

Perror = Prob(x ∈ R1 | ω2) p(ω2) + Prob(x ∈ R2 | ω1) p(ω1) = ∫_{R1} p(x | ω2) p(ω2) dx + ∫_{R2} p(x | ω1) p(ω1) dx,

where p(ωi) is the a priori probability of class ωi, and p(x | ωi) is the likelihood function of class ωi, or the class-conditional probability density function of x given that x ∈ ωi; more explicitly, it is the probability density function for x given that the state of nature is ωi. The quantity p(ωi | x) is the probability that x comes from ωi, which is actually the a posteriori probability. Perror is the performance criterion that we try to minimize to achieve a good classification. These two integrands are plotted in Fig. 22, which also shows Fisher's linear discriminant giving the optimum decision boundary. For a more detailed treatment of this subject, refer to (8).
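The closed-form direction w ∝ Sw⁻¹(m2 − m1) derived above can be computed directly; the following NumPy sketch (the function name is hypothetical, and a nonsingular Sw is assumed) returns the projection direction and the projected class means between which a threshold t0 can be placed:

    import numpy as np

    def fisher_direction(X1, X2):
        # X1, X2: arrays of shape (N1, n) and (N2, n) holding the samples of the two classes.
        m1 = X1.mean(axis=0)
        m2 = X2.mean(axis=0)
        # Total within-class covariance (scatter) matrix Sw.
        Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
        # Direction maximizing Fisher's criterion: w proportional to Sw^{-1}(m2 - m1).
        w = np.linalg.solve(Sw, m2 - m1)
        # Projected class means; a threshold t0 can be chosen between them.
        return w, w @ m1, w @ m2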
K-Nearest Neighbor Classification
What we have discussed so far has been supervised learning; that is, a supervisor teaches the system how to classify a known set of patterns, and the system is then set free to classify other patterns. In this section, we discuss unsupervised learning, in which the classification process does not depend on a priori information. As a matter of fact, quite frequently, much a priori knowledge about the patterns does not exist. The only information available is that patterns that belong to the same class share some properties in common. Therefore,
similar objects cluster together by their natural association according to some similarity measures. Euclidean distance, weighted Euclidean distance, Mahalanobis distance, and correlation between pattern vectors have been suggested as similarity measures. Many clustering algorithms have been developed. Interested readers may refer to books dedicated to pattern recognition (7,9,10). One of the algorithms (k nearest neighbor classification) (14–16) will be discussed here. Clusters may come in various forms (see Fig. 23). In previous sections, we have already discussed the classification of two clusters whose shape is shown in Fig. 23a. This is one of the most common forms that we encountered. However, many other forms such as those shown in Fig. 23b–f also appear, whose data sets have different density distributions. In some cases,
there is a neck or valley between subclusters, and in other cases, the data set clusters in a chain. In these cases, we notice that the boundary surface is not clearcut and patterns that belong to different classes may interleave together. In this section and the sections that follow, we will introduce algorithms that address these problems. Hierarchical clustering based on k nearest neighbors will be discussed first. Hierarchical clustering, in general, refers to finding cluster structure of unknown patterns. Classification based on nearest neighbors is a process of classifying a pattern as a member of the class to which its nearest neighbor (or neighbors) belongs. If the membership is decided by a majority vote of the k nearest neighbors, the procedure will be called the k nearest neighbor decision rule (14–16). This is to address the cases whose distributions are not the same throughout the pattern space. The procedure consists of two stages. In the first stage, the data are pregrouped to obtain initial subclusters. In the second stage, the subclusters are merged hierarchically by using a certain similarity measure. Pregrouping of Data to Form Subclusters. First, let us determine k appropriately. For every pattern point, one can always find its k nearest neighbors. At the same time, this pattern point may also be one of the k nearest neighbors of its k nearest neighbor. According to the argument in the previous paragraph, there are two choices: (1) assign this pattern point as a member of the class to which its nearest neighbor belongs, or (2) classify its k nearest neighbor as a member of the class to which this pattern point belongs. The choice between (1) and (2) will depend on whether this pattern point or its k nearest neighbor has the higher potential to be a subcluster. For the Euclidean distance d(xi , xj ) between sample points xi and xj , we can define Pk (xi ) as the average of the distances
d(xi, xj) over k:

Pk(xi) = (1/k) Σ_{j ∈ Ωk(xi)} d(xi, xj),

where Ωk(xi) is a set of k nearest neighbors of the sample point xi based on the Euclidean distance measure, and Pk(xi) is a potential measure for the pattern point xi to be the center of a subcluster. Obviously, the smaller the value of Pk(xi), the higher the potential for the pattern point xi to be the center of a subcluster. In other words, all k nearest neighbors (x1, x2, . . . , xk) of the pattern point xi should be clustered toward the pattern point xi (see Fig. 24); xi is said to be k-adjacent to x1, x2, x3, . . . , and xk; k = 6 in this example.

Figure 24. Definition of Ωk(xi). Ωk(xi) = [x1, x2, . . . , xk], k = 6 in this example; xi is said to be k-adjacent to (x1, x2, . . . , xk), and Pk(xi) is the smallest in value among Pk(xj), j = 1, 2, . . . , 6, and i.

Figure 25 illustrates ξk(xi), which is a set of sample points that are k-adjacent to the sample point xi. From this figure, we see that xi is a k nearest neighbor of xa, or xa is k-adjacent to xi, and that xi is a k nearest neighbor of xb, or xb is k-adjacent to xi. Therefore, ξk(xi) = [xa, xb, . . .], which is a set of sample points that are k-adjacent to xi.

Figure 25. Definition of ξk(xi), which is a set of sample points k-adjacent to xi. Note that xa is k-adjacent to xi; however, xi may not be k-adjacent to xa.

After knowing ξk(xi), xi and its group members can cluster toward xa or xb, depending on which one (xa or xb) has the higher potential to form a larger subcluster. Note that xa, xb, . . . may or may not be the k nearest neighbors of xi. Now follow the described procedure to compute ξk for every sample point x. The next step in the process is to subordinate the sample point xi to a sample point xj, which is k-adjacent to xi and, at the same time, possesses the highest potential among all pattern points that are k-adjacent to xi. Then,

Pk(xj) = min_{xm ∈ ξk(xi)} Pk(xm),

where xm is one of the pattern points that are k-adjacent to xi and Pk(xj) is the potential measure of pattern point xj. This expression means that Pk(xj) of the pattern point xj has the smallest value (i.e., the highest potential) among the Pk's of those points that are k-adjacent to xi. Then, we assign xi to xj, because xj has the highest potential to be the center of a subcluster. If it is already known that xj belongs to another subcluster, then we assign the point xi and all of its members to the same subcluster to which xj belongs. If Pk(xi) = Pk(xj), that means that no points in ξk(xi) have a potential higher than Pk(xi), and xj is the only point whose potential Pk(xj) is equal to Pk(xi) of xi. Then, the pattern point xi subordinates to no point, because the possible center of the subcluster is still xi itself. So far, all of the pattern points xi, i = 1, 2, . . . , N, can be grouped to form subclusters. Next, let us check whether some of these subclusters can further be grouped together. From Fig. 25, ξk(xi) = (xa, xb, . . .). Whether the subcluster whose center is xi should subordinate to xa or to xb will depend on the strength of Pk(xa) and Pk(xb). Subordinate xi and its group members to xa when xa has the highest potential [i.e., when Pk(xa) has the smallest value among the others in ξk(xi)] to be the cluster center of a new, larger subcluster. After the pattern points are grouped in this way, one will find that several subclusters are formed. Let us count the subclusters and assign every pattern point to its nearest subcluster.

Merging of Subclusters. Subclusters obtained after pregrouping may exhibit two undesired forms: (1) the boundary between two neighboring subclusters may be unclear, compared to the regions near the subcluster centers (see Fig. 26a); and (2) sometimes there may be a neck between these two subclusters (see Fig. 26b). What we expect is to merge the two subclusters in (1) and leave the two subclusters as they are in (2). An algorithm to differentiate these cases follows.

Figure 26. Two different forms of pattern distribution.

Let us define Pk^sc(m) and Pk^sc(n) (see Fig. 27) as the potentials, respectively, for the subcluster m and the subcluster n. They are expressed as
Pk^sc(m) = {1/Nm[Ωk(xm) ∩ Wm]} Σ_{i ∈ [Ωk(xm) ∩ Wm]} Pk(xi) = (1/Nm) Σ_{i=1}^{Nm} Pk(xi)

and

Pk^sc(n) = {1/Nn[Ωk(xn) ∩ Wn]} Σ_{j ∈ [Ωk(xn) ∩ Wn]} Pk(xj) = (1/Nn) Σ_{j=1}^{Nn} Pk(xj),
where Ωk(xm) is a set of k nearest neighbor pattern points of the center xm of the subcluster m; Wm is the set of pattern points contained in the subcluster m; the number of these pattern points may be (and usually is) greater
than k. Nm is the number of points that are the k nearest neighbors of xm and are also in the subcluster m. Therefore, Pk^sc(m) is the average of Pk(xi) over all of the points that are in subcluster m and, at the same time, are the k nearest neighbors of the subcluster center xm. When more points in Wm are k nearest neighbor points of the cluster center xm, it means that the subcluster m has a higher density. Then, we can use this as a tightness measure for subcluster m. The tightness measure is a good measure for subcluster merging considerations. Similar definitions can be made for Ωk(xn), xn, Wn, and Pk^sc(n). Next, let us define a few more parameters, Yk^(m,n), Yk^(n,m), BPk^(m,n), and BPk^(n,m), so that we can establish a condition for merging two subclusters. Yk^(m,n) is defined as the set of pattern points x in subcluster m, each of which is a k nearest neighbor of some pattern point in subcluster n. That is to say, Yk^(m,n) represents the set of pattern points x that are in subcluster m and for which some of the points k-adjacent to x lie in subcluster n. Take a couple of sample pattern points xi and xj from Fig. 28 as examples for illustration: xi is a pattern point in subcluster m and, at the same time, is a k nearest neighbor of xa, which is in subcluster n. Similarly, xj is a pattern point in subcluster m and, at the same time, is a k nearest neighbor of xb, which is in subcluster n. Then,
Yk^(m,n) = [x | x ∈ Wm and ξk(x) ∩ Wn ≠ ∅],

where Wm and Wn represent, respectively, the sets of pattern points contained in subclusters m and n, and ξk(x) represents the set of points k-adjacent to the point x, as defined previously. BPk^(m,n) can be defined as

BPk^(m,n) = [1/N(Yk^(m,n))] Σ_{xi ∈ Yk^(m,n)} Pk(xi),

where N(Yk^(m,n)) is the number of pattern points in the set represented by Yk^(m,n) and Pk(xi) is the potential measure of xi, as defined previously. Then, BPk^(m,n) is the average of Pk(xi) over all of the pattern points x that are in Yk^(m,n). Obviously, these x's are only part of the pattern points in subcluster m. Similar definitions are made for Yk^(n,m) and BPk^(n,m):

Yk^(n,m) = [x | x ∈ Wn and ξk(x) ∩ Wm ≠ ∅],

BPk^(n,m) = [1/N(Yk^(n,m))] Σ_{xi ∈ Yk^(n,m)} Pk(xi).

Figure 28. Definitions for Yk^(m,n) and BPk^(m,n).
BPk^(m,n) and BPk^(n,m) provide tightness information on the boundary between these two subclusters m and n. Let us take the ratio of the tightness measure of the boundary points to the tightness measure of the subclusters (i.e., tightness of BPk^(m,n)/tightness of subcluster) as a measure in considering the possible merging of the two subclusters. Naturally, we would like the boundary points to be more densely distributed in order to merge these two subclusters, that is, to have a smaller value of BPk^(m,n) for merging subclusters m and n, because BPk^(m,n) is computed in terms of Euclidean distance in pattern space. The smaller the computed value of BPk^(m,n), the tighter the boundary points. A similar argument applies to the tightness of the subclusters: the smaller the computed value of Pk^sc, the denser the distribution of the subcluster. Then, we choose a similarity measure (SM_1) that is proportional to the ratio of Pk^sc to BPk^(m,n),

SM_1(m, n) ∝ Pk^sc / BPk^(m,n),

or

SM_1(m, n) = min[Pk^sc(m), Pk^sc(n)] / max[BPk^(m,n), BPk^(n,m)].

Figure 27. Definitions for Pk^sc(m) and Pk^sc(n).

Both Pk^sc and BPk^(m,n) are expressed in terms of distance in the pattern space. To play safe, we use the max function in the denominator and the min function in the numerator. SM_1(m, n) represents the difference in the tightness measure between the subclusters and the
boundary and can be used to detect the valley between two subclusters. To detect the neck between two subclusters, we use the similarity measure SM_2, defined as

SM_2(m, n) = [N(Yk^(m,n)) + N(Yk^(n,m))] / {2 min[N(Wm), N(Wn)]},

where N(Yk^(m,n)) and N(Yk^(n,m)) represent, respectively, the number of points in Yk^(m,n) and Yk^(n,m), as defined earlier; N(Wm) denotes the number of points in subcluster m; and N(Wn) represents the number of points in subcluster n. A large value of SM_2(m, n) signifies that the relative size of the boundary is large in comparison with the size of the subclusters. Then, the similarity measure SM_2 can be used to detect the ''neck.'' Combining SM_1(m, n) and SM_2(m, n),
SM(m, n) = SM_1(m, n) · SM_2(m, n).

The similarity measure SM(m, n) can be used to merge the two most similar subclusters. Figure 29 gives some results for merging two subclusters that have a neck (upper) and a valley (lower).

Figure 29. Results obtained in merging two subclusters that have a neck and a valley: (a) two subclusters that have a neck between them; (b) two subclusters that have a valley between them.
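The pregrouping stage described above can be sketched as follows. This is a minimal Python/NumPy illustration under the simplifying assumption that a point subordinates only to a strictly stronger (lower Pk) k-adjacent neighbor; the function names knn_potentials and pregroup are illustrative, not from this article, and the merging stage via SM_1 and SM_2 is not included.

    import numpy as np

    def knn_potentials(X, k):
        """Distances, k-nearest-neighbor sets Omega_k, and potentials P_k for points X (n x d)."""
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        nn = np.argsort(D, axis=1)[:, 1:k + 1]                # Omega_k(x_i), excluding x_i itself
        P = D[np.arange(len(X))[:, None], nn].mean(axis=1)    # P_k(x_i): mean distance to Omega_k(x_i)
        return D, nn, P

    def pregroup(X, k):
        """Pregrouping stage: subordinate each point to the highest-potential point k-adjacent to it."""
        _, nn, P = knn_potentials(X, k)
        n = len(X)
        # xi_k(x_i): indices of the points that count x_i among their k nearest neighbors.
        adjacent_to = [np.where((nn == i).any(axis=1))[0] for i in range(n)]
        parent = np.arange(n)
        for i in range(n):
            if len(adjacent_to[i]) > 0:
                j = adjacent_to[i][np.argmin(P[adjacent_to[i]])]
                if P[j] < P[i]:          # subordinate only to a strictly stronger center
                    parent[i] = j
        labels = parent.copy()           # follow the subordination links to the subcluster centers
        for _ in range(n):
            labels = parent[labels]
        return labels

Calling pregroup(X, k) on an (n x d) array of pattern points returns, for each point, the index of the subcluster center it has been subordinated to; these initial subclusters are the inputs to the merging step.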
Algorithm Based on Regions of Influence

Most of the approaches discussed previously were based on a distance measure, which is effective in many applications. But difficulties would occur if this simple distance measure were employed for clustering certain types of data sets in the studies of galaxies and constellations. These data sets frequently appear with a change in the point density or with chain clusters within a set, as shown, respectively, in Fig. 30a,b, where one may identify the clusters visually.

Figure 30. Examples of special types of data sets that have (a) change in point density; (b) chain clusters.

For data like these, some other method is needed for clustering them. The method discussed in this section is based on the limited neighborhood concept (17), which originated from the visual perceptual model of clusters. Several definitions are given first. Let
S = [S1, S2, . . . , SM]

and

R = [R1, R2, . . . , RM],
where Sl and Rl, l = 1, 2, . . . , M, represent, respectively, the graphs and the regions of influence; (xi, xj) represents a graph edge that joins points xi and xj. To illustrate the region of influence, two graphs are defined: the Gabriel graph and the relative neighborhood graph. The Gabriel graph (GG) is defined in terms of circular regions. The line segment (xi, xj) is included as an edge of the GG if no other point xk lies within or on the boundary of the circle of which (xi, xj) is the diameter, as shown in Fig. 31a. Otherwise, (xi, xj) is not included as an edge segment of the GG. Similarly, the relative neighborhood graph (RNG) is defined in terms of a lune region. The line segment (xi, xj) is included as an edge of the RNG if no other point xk lies
within or on the boundary of the lune, where xi and xj are the two points on the circular arcs of the lune.

Figure 31. The shapes of regions defined by (a) Gabriel graph; (b) relative neighborhood graph.

Then, Sl and Rl can be defined as

(xi, xj) ∈ Sl   iff   xk ∉ Rl(xi, xj), ∀ k = 1, 2, . . . , n; k ≠ i ≠ j,

Rl(xi, xj) = {x : f[d(x, xi), d(x, xj)] < d(xi, xj); i ≠ j},

where f[d(x, xi), d(x, xj)] is an appropriately defined function. From these definitions, it can be seen that Sl defines a limited neighborhood set. When max[d(x, xi), d(x, xj)] is chosen for the function f[d(x, xi), d(x, xj)] in the previous equation (i.e., find the maximum between d(x, xi) and d(x, xj) and use it for the f function), we obtain

RRNG(xi, xj) = {x : max[d(x, xi), d(x, xj)] < d(xi, xj); i ≠ j},

where RRNG(xi, xj) represents the RNG region of influence. It can be seen in Fig. 31b that the upper arc of the lune is drawn so that xj is the center and (xi, xj) is the radius, and the lower arc is drawn so that xi is the center and (xi, xj) is the radius. When d^2(x, xi) + d^2(x, xj) is used for f[d(x, xi), d(x, xj)],

RGG(xi, xj) = {x : d^2(x, xi) + d^2(x, xj) < d^2(xi, xj); i ≠ j},

where RGG(xi, xj) represents the GG region of influence. The definition of Rl will determine the property of Sl. If Rl ⊆ RGG, the edges of Sl will not intersect; but if Rl ⊃ RGG, intersecting edges are allowed. Take an example to illustrate the case when Rl ⊃ RGG. Assume that regions of influence are defined as

R1(xi, xj, β) = RGG(xi, xj) ∪ {x : β min[d(x, xi), d(x, xj)] < d(xi, xj), i ≠ j},

R2(xi, xj, β) = RRNG(xi, xj) ∪ {x : β min[d(x, xi), d(x, xj)] < d(xi, xj), i ≠ j},

where β (0 < β < 1) is a factor called the relative edge consistency. Thus, S1(β) is obtained from the GG by removing edges (xi, xj) for which

β min[d(xi, xa), d(xj, xb)] < d(xi, xj),

where xa (≠ xj) denotes the nearest Gabriel neighbor to xi, and xb (≠ xi) denotes the nearest Gabriel neighbor to xj. Figure 32 illustrates the effect of β on the region of influence. Then, it is clear that varying β would control the fragmentation of the data set and hence would give a sequence of nested clusterings. Increasing β would break the data set into a greater number of smaller clusters.

Figure 32. Drawing illustrating the effect of β on the region R1(xi, xj, β).

The examples of two-dimensional dot patterns shown in Fig. 33 demonstrate the effectiveness of this clustering method. See (17) for supplementary reading.
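The two regions of influence translate directly into edge tests. The following brute-force sketch (Python/NumPy, O(n^3); the function name is an illustrative choice, not from this article) builds both graphs for a small point set:

    import numpy as np

    def influence_graphs(X):
        """Brute-force Gabriel graph and relative neighborhood graph for points X (n x d)."""
        n = len(X)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        gg = np.zeros((n, n), dtype=bool)
        rng = np.zeros((n, n), dtype=bool)
        for i in range(n):
            for j in range(i + 1, n):
                others = [k for k in range(n) if k != i and k != j]
                dij = D[i, j]
                # (xi, xj) is a GG edge iff no xk falls in R_GG: d^2(xk,xi) + d^2(xk,xj) < d^2(xi,xj).
                gg[i, j] = gg[j, i] = not any(D[k, i] ** 2 + D[k, j] ** 2 < dij ** 2 for k in others)
                # (xi, xj) is an RNG edge iff no xk falls in R_RNG: max[d(xk,xi), d(xk,xj)] < d(xi,xj).
                rng[i, j] = rng[j, i] = not any(max(D[k, i], D[k, j]) < dij for k in others)
        return gg, rng

Because the diametral circle of (xi, xj) is contained in the corresponding lune, every RNG edge returned by this sketch is also a Gabriel edge.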
MULTILAYER PERCEPTRON FOR PATTERN RECOGNITION

What we have presented in the previous sections are relatively simple pattern recognition problems. For problems in real-world environments, a much more powerful solution method is desired. This motivates us to seek help from artificial neural networks. About 10 years ago, there were two hardware products for artificial neural networks on the market: one, called ETANN and manufactured by Intel, has 64 neurons; the other, called ''neural logic,'' is manufactured by Neural Logic Inc. Two functions of artificial neural networks are attractive: (1) the associative property, the ability to recall; and (2) self-organization, the ability to learn through organizing and reorganizing in response to external stimuli. Such human-like performance will, no doubt, require an enormous amount of processing. To obtain the required processing capability, an effective approach needs to be developed for the dense interconnection of a large number of simple processing elements, an effective scheme for achieving a high computation rate is required, and some hypotheses would need to be established. A multilayer perceptron is one of the models developed under the name ''neural networks'' that is popular in applications. Many other models have been proposed; due to space limitations, we discuss only the multilayer perceptron. There are many books on artificial neural networks, and readers interested in this subject can find them in a library (12–13,18–23). The powerful capability of the multilayer perceptron comes from the following network arrangements: (1) one or more layers of hidden neurons are used; (2) a smooth nonlinearity, for example, a sigmoidal nonlinearity, is employed at the output end of each artificial neuron; and (3) a high degree of connectivity in the
network. These three distinctive characteristics enable a multilayer perceptron to learn complex tasks by extracting more meaningful features from the input patterns. Naturally, training such a system is much more complicated. Figure 34 shows a three-layer perceptron that has N inputs and M outputs. Between the inputs and outputs are two hidden layers. Let yl, l = 1, 2, . . . , M, be the outputs of the multilayer perceptron, and let x′j and x′′k be the outputs of the nodes in the first and second hidden layers. θj, θk, and θl are the internal thresholds (not shown in the figure). Wji, i = 1, 2, . . . , N, j = 1, 2, . . . , N1, are the connection weights from the input to the first hidden layer. Similarly, Wkj, j = 1, 2, . . . , N1, k = 1, 2, . . . , N2, and Wlk, k = 1, 2, . . . , N2, l = 1, 2, . . . , M, are, respectively, the connection weights between the first and second hidden layers and between the second hidden layer and the output layer. They are to be adjusted during training. The outputs of the first hidden layer are computed according to

x′j = f( Σ_{i=1}^{N} Wji xi − θj ),   j = 1, 2, . . . , N1.

Those of the second hidden layer are computed from

x′′k = f( Σ_{j=1}^{N1} Wkj x′j − θk ),   k = 1, 2, . . . , N2,

and the outputs of the output layer are

yl = f( Σ_{k=1}^{N2} Wlk x′′k − θl ),   l = 1, 2, . . . , M.
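These three equations amount to a short forward pass. A minimal sketch follows (Python/NumPy; the matrix shapes and the sigmoid choice for f are illustrative assumptions, not prescriptions from this article):

    import numpy as np

    def f(a):
        # One admissible choice of f: the sigmoid logistic nonlinearity 1/(1 + e^(-a)).
        return 1.0 / (1.0 + np.exp(-a))

    def forward(x, W_ji, theta_j, W_kj, theta_k, W_lk, theta_l):
        """Forward pass of the three-layer perceptron: x has N components,
        W_ji is N1 x N, W_kj is N2 x N1, and W_lk is M x N2."""
        x1 = f(W_ji @ x - theta_j)    # outputs x'_j of the first hidden layer
        x2 = f(W_kj @ x1 - theta_k)   # outputs x''_k of the second hidden layer
        y = f(W_lk @ x2 - theta_l)    # outputs y_l of the output layer
        return y, int(np.argmax(y))   # outputs and the index of the largest output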
The decision rule is to select the class that corresponds to the output node that has the largest value. The function f can be a hard limiter, a threshold logic, or a sigmoid logistic nonlinearity, 1/[1 + e^−(α−θ)].

Training the Multilayer Perceptron: Backpropagation Algorithm. The backpropagation algorithm is a generalization of the least-mean-square (LMS) algorithm. It uses an iterative gradient technique to minimize the mean-square error between the desired output and the actual output of a multilayer feedforward perceptron. The training procedure is initialized by selecting small random weights and internal thresholds. The training data are repeatedly presented to the net, and the weights are adjusted in order from Wlk to Wji, in the backward direction, after each trial until they stabilize. At the same time, the cost function (the mean-square error mentioned before) is reduced to an acceptable value. Essentially, the training procedure is as follows: the feedforward network calculates the difference between the actual outputs and the desired outputs. Using this error assessment, weights are adjusted in proportion to the local error gradient. To do this for an individual neuron, we need values of its input, output, and desired output. This is straightforward for a single-layer perceptron, but for a hidden neuron in a multilayer perceptron, it is difficult to know how much input is coming into that node and what its desired output is. To solve this problem, we may consider the synapses backward to see how strong each particular synapse is. The local error gradient produced by each hidden neuron is a weighted sum of the local error gradients of the neurons in the successive layer. The whole training process involves two phases, a forward phase and a backward phase. In the forward phase, the error is computed, and in the backward phase, we proceed to
modify the weights to decrease the error. This two-phase sequence runs through for every training pattern until all training patterns are correctly classified. It can be seen that a huge amount of computation is needed for training. For details of the development of the backpropagation training algorithm, see (8,12,13,18–19,23).
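A minimal sketch of one such two-phase training step for the three-layer perceptron above (Python/NumPy; the sigmoid choice of f, the learning rate eta, and the function name backprop_step are illustrative assumptions, not prescriptions from this article):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def backprop_step(x, d, W1, t1, W2, t2, W3, t3, eta=0.1):
        """One forward phase and one backward phase for a single training pattern x
        with desired output d; weights and thresholds are updated in place."""
        # Forward phase: outputs of the two hidden layers and the output layer.
        x1 = sigmoid(W1 @ x - t1)
        x2 = sigmoid(W2 @ x1 - t2)
        y = sigmoid(W3 @ x2 - t3)
        # Backward phase: local error gradients, output layer first; each hidden
        # delta is a weighted sum of the deltas of the successive layer.
        delta3 = (d - y) * y * (1.0 - y)
        delta2 = (W3.T @ delta3) * x2 * (1.0 - x2)
        delta1 = (W2.T @ delta2) * x1 * (1.0 - x1)
        # Adjust weights (and thresholds) in proportion to the local error gradients.
        W3 += eta * np.outer(delta3, x2); t3 -= eta * delta3
        W2 += eta * np.outer(delta2, x1); t2 -= eta * delta2
        W1 += eta * np.outer(delta1, x);  t1 -= eta * delta1
        return 0.5 * np.sum((d - y) ** 2)   # squared-error cost for this pattern

Repeating this step over all training patterns, epoch after epoch, is the iterative gradient procedure described in the text.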
Figure 33. Effect of changing β on the clustering results: (a) data set; (b) when β is small; (c) when β is larger.

Figure 34. Three-layer perceptron.

ABBREVIATIONS AND ACRONYMS

AI      artificial intelligence
CPU     central processing unit
MSS     multi-spectral scanner
SAR     synthetic aperture radar
k-NN    k-nearest neighbor
SM      similarity measure
RNG     relative neighborhood graph
LMS     least mean square
BIBLIOGRAPHY

1. S. T. Bow, R. T. Yu, and B. Zhou, in V. Cappellini and R. Marconi, eds., Advances in Image Processing and Pattern Recognition, North-Holland, Amsterdam, New York, 1986, pp. 21–25.
2. B. Yu and B. Yuan, Pattern Recognition 26(6), 883–889 (1993).
3. O. D. Trier and A. K. Jain, Pattern Recognition 29(4), 641–661 (1996).
4. S. Ghosal and R. Mehrotra, IEEE Trans. Image Process. 6(6), 781–794 (1997).
5. J. A. Ullman, Pattern Recognition Techniques, Crane, Russak, New York, 1973.
6. K. S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1982.
7. S. T. Bow, Pattern Recognition and Image Preprocessing, Marcel Dekker, New York, Basel, 1992.
8. S. T. Bow, Pattern Recognition and Image Preprocessing, revised and expanded ed., Marcel Dekker, New York, Basel, 2001.
9. S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, Boston, 1999.
10. J. T. Tou and R. C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1972.
11. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, Boston, 1990.
12. R. Schalkoff, Pattern Recognition: Statistical, Structural and Neural Approaches, J. Wiley, New York, 1992.
13. C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
14. R. Mizoguchi and O. Kakusho, Proc. Int. Conf. Pattern Recognition, Japan, 1978, pp. 314–319.
15. A. Djouadi and E. Bouktache, IEEE Trans. Pattern Anal. Mach. Intell. 19(3), 277–282 (1997).
16. T. Hastie and R. Tibshirani, IEEE Trans. Pattern Anal. Mach. Intell. 18(6), 607–616 (1996).
17. R. Urquhart, Pattern Recognition 15(3), 173–187 (1982).
18. A. Pandya and R. Macy, Pattern Recognition with Neural Networks in C++, CRC Press, Boca Raton, 1995.
19. L. Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice-Hall, Englewood Cliffs, NJ, 1994.
20. S. I. Gallant, IEEE Trans. Neural Networks 1(2), 179–191 (1990).
21. E. M. Johnson, F. U. Dowla, and D. M. Goodman, Int. J. Neural Syst. 2(4), 291–301 (1992).
22. V. Nedeljkovic, IEEE Trans. Neural Networks 4(4), 650–659 (1993).
23. J. M. Zurada, Introduction to Artificial Neural Systems, West Publishing Co., St. Paul, 1992.
FIELD EMISSION DISPLAY PANELS

ROBERT J. HANSON
Micron Technology, Inc.
Boise, ID

DAVID R. HETZER
Timber Technologies, Inc.
Fremont, CA

DUANE F. RICHTER
Metron Technology, Inc.
Boise, Idaho

SCOTT W. STRAKA
Seattle, WA
INTRODUCTION

The flat panel display (FPD) market grew significantly throughout the 1990s, and the liquid crystal display (LCD)
had the majority of the market share. However, the world monitor market has been, and still is, dominated by the cathode-ray tube (CRT) (1–3). A relatively new kind of FPD called a field emission display (FED) combines the wide viewing angle and brightness of the CRT and the linearity and thinness of the LCD. FEDs exhibit characteristics that allow FEDs to directly compete with both the CRT and LCD. FEDs are smaller, lighter, and consume less power than CRTs. They also can overcome the fundamental performance shortcomings of LCDs such as limited temperature range of operation, limited viewing angle, and slow response to fast-motion video. FEDs may be used in desktop applications, original equipment manufacturer’s (OEM) systems, and medical equipment, where the CRT has dominated for many years. They may also be used in markets such as laptop computers and camcorders that are traditionally dominated by LCDs. It is projected that the world FPD market will reach $35 billion in 2005 (4). As improvements are made by research and development and manufacturing costs are reduced, FEDs have the potential to account for a substantial percentage of the world FPD and desktop market. It is estimated that the high volume manufacturing costs of FEDs will be approximately 30% less than those of active matrix LCDs (AMLCDs) due to fewer process steps and because FEDs do not require a backlight, polarizers, or color filters. The major components that comprise a field emission display are the cathode panel (also referred to as a backplate or baseplate), the emitter tips, the anode panel (also referred to as a faceplate, phosphor screen, or cathodoluminescent screen), the spacers, and the electronic drivers (external circuitry). The emitter tips on the cathode panel provide the electron beams, the anode panel contains the phosphor that gives off light, the spacers prevent the anode and cathode panels from imploding into the vacuum space between them, and the drivers are responsible for controlling the images displayed. Like CRTs, FEDs are cathodoluminescent (CL) devices that involve the emission of electrons from a cathode and the acceleration of the electrons toward an anode where they impinge on phosphors to generate light. In a CRT, one to three electron beams are rastered across the screen and require enough depth to deflect the beam fully. On the other hand, a FED has millions of individual electron sources directly behind each pixel that eliminate the need to raster the beam; therefore, the display can be substantially thin as well as flat. The cathode generally consists of millions of small coneshaped electron emitters, and each picture element (pixel) has a few hundred to several thousand cones. Applying a potential difference between the cathodes and the gate generates cold cathode electrons. Figure 1 shows an array of cathode electron emitters whose tips generate electron beams. The electrons are accelerated to the anode, which is at a much higher potential than the gate. Some FED producers also incorporate a focusing grid between the gate and the anode to enhance the kinetic energy and trajectories of the
electrons toward the phosphors (the concept is the same as the focusing plate used in CRTs). Figure 1 also shows the anode panel, which contains the phosphor screen that gives off light when a phosphor is struck by electrons. The light generated by the phosphors is transmitted through the transparent glass substrate and the end user sees the images that the light produces. FED fabrication involves processing the anode and cathode panels individually and incorporating standoffs, called spacers, on one of the panels. As a rule of thumb, FEDs whose diagonals are larger than approximately 10 cm require spacers to prevent the panels from imploding into the vacuum that separates them. Figure 1 shows how spacers are arranged in the matrix so that they do not block the path of electrons from the cathode to the anode and do not interfere with the optical performance of the display. Once both of the panels have been completely processed and spacers have been incorporated onto a panel, the panels are brought close together and are hermetically sealed together so that a vacuum is in the space between the panels. The vacuum is obtained either by sealing the panels together in a vacuum chamber or by using a tubulated seal. Tubulated seals allow evacuating the display outside a chamber, and the tube is pinched off when the desired pressure has been reached. The final steps to create a fully functional FED include attaching the external driving circuitry to the panels and testing the display. The testing process evaluates the optical and electrical functionality of the display to determine if it meets specifications. After the display has been tested, it is assembled into a package that protects it from moisture and physical damage. BACKGROUND FEDs are based on the phenomena of cathodoluminescence and cold-cathode field emission. Cathodoluminescence is a particular type of luminescence in which a beam of electrons strikes a phosphor and light is given off. Luminescence was first written about thousands of years ago in the Chinese Shih Ching (Book of Odes). In 1603, Cascariolo made the first recorded synthesis and observation of luminescence in solids by heating a mixture of barium sulfate and coal. The study of cathodoluminescence burgeoned in 1897 when Braun made the first oscilloscope, the invention of the CRT. The phenomenon of extracting electrons from cold cathodes due to intense electric fields has been described
Figure 1. Cross sectional view of a FED, including baseplate and cathodes, faceplate and phosphor and transparent conductor, spacers, and external circuitry that has the voltage sign convention.
by an approximate theory developed by Schottky (5). The theoretical exposition and correlation with experimental results was greatly improved by Fowler and Nordheim in their now famous article in 1928 (6). Their derivation has led to what is now called the Fowler–Nordheim (F–N) equation and allows for the theoretical treatment of cold cathodes, thereby facilitating characterization and optimization of cold-cathode electron emission devices. The early development of fabrication strategies for integrated circuit (IC) manufacturing in the 1950s and 1960s made it feasible to produce FED anodes and cathodes by processing techniques used in thinfilm technology and photolithography. This allows for a high density of cathodes per pixel array, process controllability, process cleanliness and purity, and costeffective manufacture. The field emitter array (FEA) based on microfabricated field emission sources was introduced in 1961 in a paper by Shoulders (7). The concept of a full-video rate color FED was first presented by Crost et al. in a U.S. patent applied for in 1967 (8). In 1968, Charles A. ‘‘Capp’’ Spindt at the Stanford Research Institute (SRI) had the idea of fabricating a flat display using microscopic molybdenum (Mo) cones as cold-cathode emitters of electrons to illuminate CRT-like phosphors on an anode/faceplate. His basic cathode idea became the focus of nearly all of the ensuing work on field emission displays (9). Since then, considerable effort has been put into research and development of FEDs at academic and industrial levels. Shortly thereafter, Ivor Brodie and others were funded by several U.S. Government agencies to continue development (10). In 1972, the SRI group had demonstrated to their satisfaction that it was feasible to manufacture FEDs (11). Molybdenum Microtips Versus Silicon Emitters A different approach to FEDs was to use a p–n junction in a semiconductor device as a ‘‘cold’’ electron source. In 1969, Williams and Simon published a paper on generating field emission from forward-biased silicon p–n junctions. This work set off a series of experiments using silicon rather than molybdenum as a cold cathode. In 1981, Henry F. Gray and George J. Campisi of the Naval Research Laboratory (NRL) patented a method for fabricating an array of silicon microtips (12). This process involved thermal oxidation of silicon followed by patterning of the oxide and selective etching to form silicon tips. The silicon tips could act as cold cathodes for a FED.
In December 1986 at the International Electron Devices Meeting, Gray, Campisi, and Richard Greene demonstrated a silicon field emitter array (FEA) that was fabricated on a five-inch diameter silicon wafer (13). At this time, Gray and his colleagues were more interested in demonstrating the benefits of using FEAs as switching devices (like transistors) or amplifiers than as displays, but the possibility of using them for displays was held out as an additional potential benefit. The arguments that Gray made in favor of FEAs was that they might permit faster switching speeds than semiconductors and could be made in either a vertical or a planar structure. They could be constructed using a variety of materials: silicon, molybdenum, nickel, platinum, iridium, gallium arsenide, and conducting carbides. The devices did not require an extreme vacuum to operate, but there were problems of achieving uniformity in FEAs that had to be overcome before they could be used in commercial displays (14). In 1985, when further U.S. research funding was unavailable, the technology development moved to the Laboratoire d’Electronique de Technologie et de l’Informatique (LETI), a research arm of the French Atomic Energy Commission, in Grenoble, France. This is where a group lead by Meyer demonstrated their FPD using Spindt type cathodes, and the field began to sprout (15). This is when the basic technologies were developed and then licensed to a French company called PixTech. They used these technologies to develop their commercialization strategy of building multiple partnerships for manufacturing. This allowed the company to leverage the strengths of its partnerships. Other companies like Raytheon, Futaba, Silicon Video Corporation (later known as Candescent Technologies), Motorola, and Micron Display all developed their own FED programs; all were based on Spindt’s basic idea of an emissive cathode. Of these companies, only Candescent and Micron were not partnered with PixTech in an alliance established between 1993 and 1995. PixTech Jean-Luc Grand Clement put together a short proposal for venture capital (VC) financing of a new display company called PixTech S.F. Advent International became one of his earliest investors. PixTech would be able to produce FEDs using the LETI patents. PixTech S.F. raised capital in incremental amounts, starting at $3 million from Advent in late 1991 and then rising to $10, $22, and $35 million from other investors. More than half of the funds came from U.S. investors. PixTech, Incorporated, was formed as a U.S. corporation in June 1992. Using these funds, Grand Clement purchased exclusive rights to all 16 of the patents that were owned by LETI in early 1992. Then, he convinced Raytheon to become a partner of PixTech so that it would have access to LETI’s patents. Motorola, Texas Instruments (TI), and Futaba also joined the PixTech alliance for the same reason. In March 1996, TI abandoned its FED efforts, and the agreement was terminated. PixTech set up a research and development facility in Montpellier, France, which was leased from IBM
France. They used off-the-shelf equipment in their pilot production plant along with a small amount of FED custom equipment. The plant used a conventional semiconductor photolithography process and a MRS stepper. PixTech started initial production of FEDs in November 1995. These displays were monochrome green, 5.2-inch diagonal, 1/4 VGA (video graphics array) displays that had 70 foot lamberts (fL) of brightness. PixTech opened a small R&D facility in Santa Clara, California, in February 1996. Production yields in the first half of 1996 were erratic. There were residual problems with lithography and sealing equipment. PixTech’s approach involved the use of high-voltage drivers for the cathode, combined with low anode voltages. The high cathode voltages created problems of arcing and expensive drivers, and the low anode voltages meant that PixTech FEDs could not be as bright as CRTs (which relied on high-voltage phosphors) unless better low-voltage phosphors could be developed. The first markets that PixTech targeted were avionics, medical devices, and motor vehicles. These markets were interested in smaller displays in the 4 to 8-inch range (diagonal measurement). Customers in these markets needed very bright displays that were readable from wider angles than were available from LCDs. In November 1996, PixTech announced that it had concluded a memorandum of understanding with Unipac Corporation in Taiwan to use Unipac’s high-volume thinfilm transistor (TFT) LCD manufacturing facility to produce FEDs on a contract basis. PixTech was given the ‘‘Display of the Year’’ award from Information Display magazine for its 5.2-in. monochrome display in 1997. In April 1997, PixTech announced that it would lead a consortium of European companies to develop FED technology under a $3 million grant from the European Union. They would work with SAES Getters and RhonePoulenc to develop getters and low-voltage phosphors, respectively. PixTech also received a zero-interest loan of $2 million from the French government to develop FEDs for multimedia applications. In the second quarter of 1998, PixTech received its first high-volume commercial order from Zoll Medical Corporation. Zoll Medical introduced a portable defibrillator using PixTech’s FEDs. This was the first commercial order for the field emission display. During this same period, the first FEDs came off the production line at Unipac. In May 1999, PixTech closed a deal to purchase Micron Display. As a result, Micron Technology would own 30% of PixTech’s equity and an unspecified amount of liabilities. PixTech said that it would use the Micron Display facility in Boise, Idaho, for the codevelopment of 15-inch displays with a major unnamed Japanese company. Texas Instruments Texas Instruments, along with Raytheon and Futaba, was one of the three initial partners in the PixTech alliance. TI’s initial interest in FEDs stemmed from its desire to have an alternative to TFT LCDs for notebook computers. In 1990, TI was paying around $1,500 per display for TFT LCDs. When PixTech board members suggested that equivalent FEDs would eventually cost
around $500 per unit, TI became interested in FEDs. The FED project at TI was put under the control of the analog IC division of the firm that was responsible for digital signal processors (DSPs). TI built an R&D laboratory, and PixTech transferred its proprietary technology to the lab, but the lab had difficulties using the PixTech technologies to build FED prototypes. Futaba Futaba had experience with manufacturing monochrome vacuum fluorescent displays (VFDs) for small devices, such as watches, and was interested in scaling up its work on VFDs. VFDs use low-voltage phosphors, so the fit with PixTech’s FED technology was good. Futaba was also working on a 5.2-inch diagonal, 1/4 VGA display. Production began in November 1995, and the first sample units were sold in December of that same year. Futaba demonstrated prototypes of color FEDs at a number of international conferences after 1997 but had not introduced any of these displays to the marketplace as of mid-1999. Motorola In 1993, Motorola set up a corporate R&D facility to research FEDs. In 1995, the company set up the Flat Panel Display Division of the Automotive Energy Components Sector (AECS). A 6-inch glass pilot production line was built in Chandler, Arizona, in 1996. Later that year, construction of a second-generation 275,000 square foot facility was begun in Tempe, Arizona. Construction was completed in 1998. The new plant was capable of producing up to 10,000 units per month. It was designed to produce multiple displays on 370 × 470 mm glass substrates, a standard size for second-generation TFT LCD plants. Problems of ramping up production occurred soon after the plant was completed. The main problems had to do with sealing the glass tubes, selecting the correct spacers to obtain uniform distances between the anode and the cathode, and discovering the right getters for dealing with residual gases in the cavity. In May 1999, Motorola said that it would delay the ramp up of its FED production facility to solve some basic technological problems, including the limited lifetime of color displays. They announced a scale-back in staff and now consist of a small research team. Micron Display (Coloray) Commtech International sold the 14 SRI patents to Coloray Display Corporation of Fremont, California, in 1989. Shortly thereafter, Micron Display, A Division of Micron Technology, Inc., was formed and purchased a small stake in Coloray. However, this was not enough to keep Coloray from filing for bankruptcy under Chapter 11 in 1992. Micron Display developed a new microtip manufacturing technology for FEDs involving chemical mechanical polishing (CMP). CMP makes it possible to manufacture cold-cathode arrays precisely and reliably without the need for lithography at certain process modules. The process begins with the formation of the tips, which are then covered by a dielectric layer, which in
turn is covered by the conductive film that will constitute the FEDs gate structure. The CMP process removes the raised material much faster than the surface material. As the dielectric and conductive layers are polished, the surface becomes flat. A wet chemical etch is used to remove the insulator surrounding the tips (16). This process would allow Micron Display to scale up to displays as large as 17 inches uniformly. After several years of struggling, Micron Technology decided to sell its Display Division to PixTech in exchange for an equity stake in the firm. Candescent In 1991, Robert Pressley founded a company called Silicon Video Corporation (SVC), which was renamed Candescent Technologies in 1996. Initial work was started on the basis of a DARPA contract to develop a hot-cathode thin-film display. Shortly after, it was determined that hotcathode devices would not compete with LCDs in power consumption, and work was switched to cold-cathode efforts. Additional fund-raising efforts continued along with gradual technological success. In May 1994, SVC and Advanced Technology Materials, Inc. (ATMI) received a $1.1 million contract award from DARPA focusing on building FEDs using diamond microtips. It was hoped that depositing thin films of diamond materials on cold cathodes would lower power consumption and hence the cost of building and operating FEDs (17). Difficulties in working with diamond materials stalled advancements. Early in 1994, Hewlett-Packard and Compaq Computers decided to take equity positions in SVC. Several other firms followed, including Oppenheimer Management Corporation, Sierra Ventures, Wyse Technology, Capform, and ATMI (18). In September 1994, SVC purchased a 4-inch wafer fabrication facility in San Jose, California, and in October SVC received a $22 million grant from a U.S Technology Reinvestment Program (TRP) to develop its FEDs further. In March 1995, the first prototypes were actually produced using ‘‘sealed tubes’’. In June 1995, the first ‘‘sealed tube’’ prototype came off the line. SVC first showed its 3-inch prototype display publicly at Display Works in San Jose in January 1996. Shortly thereafter, SVC (now called Candescent) upgraded to 320 × 340 mm display tools that can produce 5-inch prototypes, and the first units came off this line in February 1998 (19). In November 1998, ‘‘Candescent and Sony Corporation announced plans to jointly develop high-voltage Field Emission Display (FED) technology’’ (20). Candescent also has strong partnerships with some of the computer industry’s dominant players. They have an alliance with Hewlett-Packard Company as both a major customer partner and a development partner; Schott Corporation, a leading maker of special glass; and Advanced Technology Materials Inc. (ATMI), thin-film technology specialists (21). As of 1 May 1999, Candescent had received more than $410 million in funding from investing strategic partners, venture capital firms, institutional investors, U.S. government-sponsored organizations, and capital
equipment leasing firms. They are committed to deliver full-motion, true color video image quality, low-cost flatpanel displays to the marketplace. Candescent intends to become a major supplier of flat-panel displays for notebook and desktop computers, communications, and consumer products (19). THEORY To understand how a FED functions as an information display device, it is necessary to know the physical, electrical, and optical properties of its major components: the cathode and anode panels, the spacers, and the external driving circuitry. This requires defining each component sufficiently and explaining the theoretical foundations of each component. Once the functionalities of these components are understood, the application of these components to fabricate a display device of the field emission type is easily realized. This section outlines the theoretical treatment of the major components of a FED so that the reader reasonably understands them. This is only meant to be a short summary; the reader is urged to refer to the literature for complete derivations and in-depth theory development. The Cathode A cathode is a negatively charged electrode; hence, the FED panel that houses the electron source of the display is often called the cathode panel. FED cathode panels are usually fabricated so that the cathodes are cone shaped and have substantially sharp tips (called emitter tips because electrons are emitted from the tip surface) in a matrix-addressable array. FED cathode materials must be chosen for their processability, process repeatability and robustness, physical and electrical properties, availability, and cost. There has been significant research and development on a number of cold-cathode materials, including tungsten (22), molybdenum (23), carbon (24,25), silicon (26), nickel, and chromium, to name a few. Continued research is being conducted on small and negative electron affinity (NEA) materials that allow lower gate voltages necessary to induce field emission (27). A potential as low as 50 to 500 volts is typically applied between the electron extraction gate and the cathode cones, resulting in an electric field of the order of more than 106 V cm−1 at the tips and current densities greater than 10 A cm−2 (more than 100 µA per emitter) (23). The emission of electrons from a material under the influence of a strong electric field is called field emission (28). Field emission from materials at low temperatures, including the order of room temperature, is termed cold-cathode field emission. Field emission occurs when the electric field at the surface of the emitter, induced by the voltage difference between the gate and cathode, thins the potential barrier in the tip to the point where electron tunneling occurs (10). Field emission is a tunneling process that is described by theories in quantum mechanics and requires the solution of Schrodinger’s wave equation in one dimension to arrive at the well-known and
generally accepted Fowler–Nordheim equation (6,29):

J = [e^3 E^2 / (8πhφ t^2(y))] exp[−8π(2m)^{1/2} φ^{3/2} ν(y) / (3heE)],   (1)
where J is the current density (A cm^−2), e is the elementary charge in C, E is the electric field strength in V cm^−1, h is Planck's constant in J s, φ is the work function of the electron-emitting material in eV, and m is the electron rest mass in kg. The t^2(y) and ν(y) terms are image potential correction terms (also called Nordheim's elliptic functions) and are approximated as

t^2(y) = 1.1   (2)

and

ν(y) = 0.95 − y^2,   (3)

where y is given by

y = Δφ/φ,   (4)

where Δφ is the barrier lowering of the work function of the cathode material, given in terms of e, E, and the permittivity of vacuum ε0:

Δφ = (eE/4πε0)^{1/2}.   (5)
Substituting for the constants e, ε0, π, h, and m, and Eqs. (2), (3), (4), and (5) in (1) yields

J = 1.42 × 10^−6 (E^2/φ) exp(10.4/φ^{1/2}) exp(−6.44 × 10^7 φ^{3/2}/E).   (6)

The F–N equation shows that the current density is a function only of the electric field strength for a constant work function. In practice, however, the field emission current I and the potential difference V between the cathode and gate are much easier to measure than J and E. This requires relating the F–N equation numerically to measurable quantities so that parameters in real-life cold-cathode devices may be quantified and optimized. Therefore, it is necessary to relate I and V to the F–N parameters J and E, so that the observed phenomenon of cold-cathode field emission may be compared to theoretical predictions. This is facilitated by setting

I = JA   (7)

and

E = βV,   (8)
where A is the emitting surface area (cm2 ) and β is the field emitter geometric factor or field enhancement factor (cm−1 ). Due to the sharpness of the tip, a phenomenon known as field enhancement allows a strong electric field
to form at a low applied voltage. Substituting Eqs. (7) and (8) in (6) yields

I = 1.42 × 10^−6 (β^2 V^2 A/φ) exp(10.4/φ^{1/2}) exp[−6.44 × 10^7 φ^{3/2}/(βV)].   (9)
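As a quick numerical illustration of Eq. (9), the following sketch may help (hedged: the values chosen below for the work function, field-enhancement factor, and emitting area are arbitrary examples, not parameters reported in this article):

    import numpy as np

    def fn_current(V, phi=4.5, beta=5.0e5, area=1.0e-11):
        """Emission current I (A) from Eq. (9).
        V: gate-to-cathode voltage (V); phi: work function (eV);
        beta: field-enhancement factor (cm^-1); area: emitting area (cm^2)."""
        return (1.42e-6 * beta**2 * V**2 * area / phi
                * np.exp(10.4 / np.sqrt(phi))
                * np.exp(-6.44e7 * phi**1.5 / (beta * V)))

    V = np.linspace(40.0, 100.0, 7)
    I = fn_current(V)
    # Fowler-Nordheim plot coordinates: log10(I/V^2) versus 1/V falls on a straight
    # line whose negative slope is proportional to phi^(3/2)/beta.
    fn_y, fn_x = np.log10(I / V**2), 1.0 / V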
By plotting log(I/V^2) on the ordinate and 1/V on the abscissa, a negatively sloping linear plot, known as a Fowler–Nordheim plot, is obtained. A typical F–N plot is shown in Fig. 2. The bent line in Fig. 3 shows how the applied electric field changes the barrier height between a metal and vacuum in the energy band diagram. The difference between the vacuum level Ev and the Fermi level Ef is the barrier height and is called the work function of the metal. The difference between Ev and the top of the field-modified band structure is the barrier lowering given by Eq. (5). The difference between the maximum height of the field-modified band structure and Ef gives an effective work function and is a direct result of the applied electric field. The change in the barrier due to the applied electric field results in electron tunneling.

Figure 2. A Fowler–Nordheim plot.

Figure 3. Metal–vacuum energy band diagram showing the barrier height without an applied electric field and with an applied electric field.

This result tells us that to have a constant current I for a fixed potential difference V between the cathode and the gate, the variables φ, β, and A must all remain constant across time (this may also be achieved if they vary so as to offset one another, but this is not desired). Therefore,
it is necessary to determine which phenomena give rise to change in each of these variables and to prevent the changes. Changes in the work function φ of the emitter are attributed primarily to the adsorption of chemically active gas species such as oxygen, hydrogen, carbon dioxide, and carbon monoxide. Special measures must be taken to ensure that gases like these are not able to contaminate the vacuum area of the display. The anode may also be a source of these contaminants, and various attempts have been made to reduce the amount of contaminants originating from the anode. The geometric factor β generally changes due to variations in the surface roughness of the emitter or when the radius of the tip is changed. The former is influenced by ion bombardment, in which ions impinge on the cathode surface and sputter the cathode, resulting in atomic-scale surface roughening. These ions either are generated in a gas-phase electron ionization process or by electron-stimulated desorption of gases/ions from the anode. It has been shown that temperature changes change the emitter radius. Theories for the temperature-dependent change in the emitter radius of field emission cathodes have been developed and include the effect on emitter radius in the presence of an electric field (30). Variations in emitter radius due to temperature are typically neglected because they are small for the typical operating temperatures of FEDs. The electron emitting area A at the surface of the cathode plays a smaller role in the current than φ and β, as can be seen by inspecting the F–N equation [Eq. (9)]. A is not included in the exponential term of the equation and thus, has an obvious lesser effect on the emitter current. It can also be seen that the slope of the F–N plot remains constant and A cannot change if φ and β are held constant. However, if either φ or β change, A is likely to change. Various techniques may be used to quantify A to an order of magnitude approximation (23). Field emission currents can vary from tip to tip even though the same potential difference is applied between the cathode and their respective extraction grids. These variations are generally attributed to localized and global variations in φ, β, and A across the pixel arrays and the cathode panel, respectively. Variations in emission current can result in optical nonuniformities and reduce image quality. These image variations can be greatly reduced by incorporating extremely large numbers of emitter tips per pixel array (100–5,000) aided by electrical compensation techniques such as the use of a resistive ballast layer under the tips or thin-film transistor (TFT) switches to eliminate current runaway (31). After evaluating the variables that affect field emission current and the phenomena that change these variables, it is obvious that the pressure in the vacuum space of the sealed display must be as low as possible. As a rule of thumb, the lower the vacuum pressure in a sealed FED, the better the stability of the cathode tips, and the longer the life of the display. There will always be some degree of outgassing when gases are desorbed from surfaces within the device during
operation. Outgassing is combated by using a gettering material such as barium (32) or silicon (33), among others, to capture residual gases located inside the vacuum area of the sealed display. Generally, the final pressure in a sealed FED should be less than 10−6 torr (1.3 × 10−7 kPa) to meet the vacuum requirements of stable cathode operation. The Anode Images produced by a FED are formed in exactly the same way as produced by the CRT by a process called cathodoluminescence. Electrons generated by the cathode panel are accelerated toward the anode by applying a higher voltage to the anode (1–10 kV), relative to the cathode; hence, the name anode panel. The electrons bombard the phosphor screen that is located on the anode panel, and the result is light generation (wavelengths in the range of 400–700 nm). Light generation in cathodoluminescence is due to the transfer of electron energy to the phosphors and the subsequent energy transfer and relaxation mechanisms within the phosphors. Certain energy relaxation processes result in energy given off by the phosphor as visible light. Low voltage (1–10 kV) phosphor technology is one of the most important developmental areas in FED research (34). The promise of FEDs to provide a very bright, long lasting, fast responding, and low power consuming display is highly dependent on the phosphor screen. Traditional CL phosphors have been developed so that the CRT is most efficient at high voltages in the range of 10–35 kV. Low-voltage phosphors must be developed that exhibit the desired chromaticity coordinates (give off light in the color spectrum where the human eye is most sensitive), high luminance efficiency, and long operating lifetime (>10,000 hours). In 1931, the Commission Internationale de l’Eclairage (CIE) standard for color characterization was developed, the commonly accepted chromaticity diagram provides a set of normalized values for comparing colors of differing wavelengths. This standard is commonly used to characterize the visible emission spectrum of phosphors. The luminous efficiency ε in units of lumens per watt (lm/W) is defined as the energy of visible light emitted per amount of input power and is given by (34) ε=
ε = πLA/P,   (10)
where L is the luminance in cd/m², A is the area of the incident electron beam spot in m², P is the power of the electron beam in watts, and the factor of π appears because the emission of the phosphor is Lambertian. Equation (10) is used to evaluate the potential of CL phosphors for use in FEDs.
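As a quick numerical illustration of Eq. (10), the sketch below evaluates the luminous efficiency for one assumed operating point; the luminance, spot size, and beam power used here are arbitrary example values, not measurements from the text.

```python
import math

def luminous_efficiency(L_cd_m2, spot_area_m2, beam_power_W):
    """Eq. (10): epsilon = pi * L * A / P, in lm/W, for Lambertian emission."""
    return math.pi * L_cd_m2 * spot_area_m2 / beam_power_W

# Assumed example: 500 cd/m^2 from a 1 mm^2 spot excited by 50 uW of beam power.
eps = luminous_efficiency(L_cd_m2=500, spot_area_m2=1e-6, beam_power_W=50e-6)
print(f"luminous efficiency ~ {eps:.0f} lm/W")
# ~31 lm/W, comparable to the ~30 lm/W quoted below for good green phosphors.
```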
The efficiency and chromaticity coordinates of all phosphors degrade over time due to burning and coulombic aging. At very high loading, phosphors may suffer permanent damage called burning, which may result in a burnt image on the display. Coulombic aging is the gradual degradation in phosphor brightness due to the operation of the display. The reduction in phosphor efficiency can be directly related to the overall integrated coulombic charge through the phosphor. The emission intensity I after aging is given by

I = I0/(1 + CN),   (11)

where I0 is the initial intensity, N is the number of electrons deposited into the phosphor per unit area, and C is an aging parameter of the phosphor/display. Low-voltage CL phosphors usually contain dopant species called activators, which act as the luminescent source. Activator ions are incorporated into the phosphor matrix and are surrounded by the host crystal, which forms the luminescent centers where the process of luminescence takes place. These dopants are usually present in small percentages; otherwise they will lie too close together in the host matrix and possibly quench neighboring dopant species. An activator dopant that is added to a host material that does not normally luminesce is called an originative activator, and an activator that improves or increases the luminescence of the host material is called an intensifier activator. More than one activator (coactivator) species may also be used to induce the phenomena of cascade or sensitized luminescence (35). Cascading occurs when a coactivator introduces new energy absorption bands and nonradiatively transfers energy to the other activator(s). Sensitization, on the other hand, is caused by radiative energy transfer. Phosphors must also be wide band-gap materials (>2.8 eV), must form low-energy surface barriers to allow efficient electron injection, and must be highly pure. Some high luminous efficiency, low-voltage red phosphors include Eu3+-doped oxides such as Y2O2S:Eu, Y2O3:Eu, and YVO4:Eu. For green phosphors, ZnO:Zn has been significantly researched and extensively used; however, Gd2O2S:Tb and ZnS:Cu,Al show high luminous efficiencies of the order of 30 lm/W at 1 kV. ZnS:Ag, ZnS:Ag,Cl,Al, and SrGa2S4:Ce are three promising blue phosphors. These phosphors are chosen for their chromaticity coordinates, luminous efficiency, and resistance to coulombic aging. It has also been found that some sulfur-based phosphors contaminate the cathode tips as the phosphors degrade, and this must also be considered when choosing a phosphor (34). Improvements in phosphor synthesis, screening, particle size uniformity, and characterization of degradation mechanisms are still needed and are ongoing. These areas are the primary focus of low-voltage phosphor research and development for use in FEDs. Anode panels generally have a black matrix pattern incorporated into their film stack structure. A black matrix is comprised of an opaque material that usually contains silicon or chromium and is used to increase color contrast between pixels. Gettering materials may also be incorporated into the black matrix grille (33). Because the anode is the high-voltage panel and must be transparent to allow the end user to view the visible light emitted by the phosphors, it is necessary to have an optically transparent (85–95% transmission of visible light) and electrically conductive (resistivity of 10⁻⁴–10⁻² Ω cm, or sheet resistance on the order of 10 Ω/square) material for the anode electrode.
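The resistivity and sheet-resistance figures just quoted are linked by the film thickness through the standard relation R_sheet = ρ/t. A minimal sketch, assuming an ITO thickness of 200 nm (an illustrative value, not one given in the text):

```python
def sheet_resistance(resistivity_ohm_cm, thickness_nm):
    """Sheet resistance (ohms/square) of a uniform conductive film: R_s = rho / t."""
    rho_ohm_m = resistivity_ohm_cm * 1e-2   # ohm*cm -> ohm*m
    t_m = thickness_nm * 1e-9               # nm -> m
    return rho_ohm_m / t_m

# Assumed example: ITO with a resistivity of 2e-4 ohm*cm deposited 200 nm thick.
print(sheet_resistance(2e-4, 200))   # -> 10 ohms/square, consistent with the target above
```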
Some typical electrode materials consist of silver oxide (AgO), zinc-doped zinc oxide (ZnO:Zn) (36), and indium tin oxide (ITO) (37,38). ITO is the most common transparent conductor material for FPDs. The optical and electrical properties may be optimized by obtaining the ideal combination of free carrier concentration and free carrier mobility (39), aided by thermal annealing processes (40). It is also well known that ITO surface properties may be altered to improve electrical properties by subjecting the film to chemical treatment (41,42). It is common to deposit a layer over the phosphor screen, called a binder or dead layer, to bind the phosphors to the anode; otherwise phosphor grains may become dislodged from their pixels and cross-contaminate adjacent pixels of differing phosphors. The binder increases the voltage required to activate the phosphors because the electrons lose energy while passing through the layer. It is also important for ambient light that strikes the display not to cause any unwanted glare due to reflection. Therefore, it is customary to use antireflective coatings (ARCs) such as silicon nitride (SiNx) or silicon oxynitride (SiOxNy) (43). These films are widely used in the semiconductor industry as antireflectants for photolithographic processing.

The Spacers

It is well known that FEDs must have a high vacuum in the gap between the cathode and anode panels. As a rule of thumb, FEDs whose diagonals are more than 2 inches require special standoffs, called spacers, to prevent the panels from imploding into the vacuum space. Of course, extremely thick glass may be used to fabricate large area FEDs, but this defeats the purpose of creating a small, lightweight display device. Substrate glasses 1–3 mm thick are usually used for fabricating large area FEDs. FEDs whose diagonals are larger than 30 cm use thin glass primarily to reduce the weight of the display. To be effective, spacer structures must possess certain physical and electrical characteristics. The spacers must be sufficiently nonconductive to prevent catastrophic electrical breakdown between the cathode array and the anode. They must exhibit sufficient mechanical strength to prevent the FPD from collapsing under atmospheric pressure and must have the proper density across the display so that there is minimal warping of the substrates. Furthermore, they must exhibit electrical and chemical stability under electron bombardment, withstand sealing and bake-out temperatures higher than 400 °C, and have sufficiently small cross-sectional area to be invisible during display operation. It is also important to be able to control the amount of current that passes from the anode to the cathode (opposite the flow of electrons) via conduction through the spacers because it is an important consideration for power consumption and charge bleed-off. Charge bleed-off is important in preventing charging of the spacers and the subsequent electrostatic electron repulsion and associated image distortion.
The physical forces (due to atmospheric pressure) that act upon a spacer induce normal compressive stresses in the spacer. The average stress σs at any point in the cross-sectional area As of a spacer is given by the normal stress equation for axially loaded members (44):

σs = Fs/As,   (12)
where Fs is the internal resultant normal force in each spacer (for this type of problem, it is the same as the force acting on the spacer). If there are n evenly spaced spacers and Fd is the atmospheric force acting on the display, given by

Fd = Pa Ad,   (13)

where Pa is the atmospheric pressure and Ad is the area of the vacuum space of the display parallel to the cross-sectional area of the spacers, given by the product of the length Ld and width Wd of the display, then

Ad = 2 Ld Wd   (14)

and

Fs = Fd/n.   (15)
The factor of 2 in Eq. (14) is required because there are two panels, one on each end of the spacer. Rearranging and substituting Eqs. (13), (14), and (15) in (12) yields the stress per spacer in terms of known and measurable quantities:

σs = 2 Pa Ld Wd / (n As).   (16)

Because the maximum allowable stress applied to a spacer should be known (it is an intrinsic property of the spacer material), it is possible to calculate the minimum number of spacers required to withstand atmospheric pressure by rearranging Eq. (16) and solving for n. To ensure the proper electron beam cross sections, it is important to be able to control cathode emission currents, the emission area, and the gate opening; it is also important to have a uniform and controllable gap separating the cathode and anode panels. For this reason, spacer dimensions are controlled to a high degree of precision. Consideration must be given to the sealed package as well. The length of a spacer subjected to atmospheric pressure and its associated normal compressive strain contracts by δ (As is assumed to be constant). The average unit axial strain εs in a spacer is given by (45)

εs = δ/Ls,   (17)

where Ls is the original length of the spacer. Hooke's law for elastic materials may be used to relate the stress and strain numerically by the simple linear equation:

σs = E εs,   (18)
where E is Young's modulus, also called the modulus of elasticity. Substituting Eqs. (16) and (17) in (18) and rearranging yields

δ = 2 Pa Ld Wd Ls / (n As E).   (19)
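Equations (16) and (19) lend themselves to a quick design estimate. The sketch below is illustrative only; the display dimensions, spacer cross section, allowable stress, and Young's modulus are assumed values, not specifications from the text.

```python
# Assumed example values (not from the text):
P_a = 101_325.0        # atmospheric pressure, Pa
L_d = 0.20             # display length, m
W_d = 0.15             # display width, m
A_s = 50e-6 * 50e-6    # spacer cross-sectional area, m^2 (50 um x 50 um footprint)
L_s = 1.0e-3           # spacer length (panel gap), m
E = 70e9               # Young's modulus of a glass spacer, Pa
sigma_max = 100e6      # allowable compressive stress with safety margin, Pa

# Eq. (16) rearranged: minimum number of spacers so that sigma_s <= sigma_max.
n_min = 2 * P_a * L_d * W_d / (sigma_max * A_s)

# Eq. (19): axial contraction of each spacer when slightly more than n_min are used.
n = int(n_min) + 1
delta = 2 * P_a * L_d * W_d * L_s / (n * A_s * E)

print(f"minimum spacers: {n_min:.0f}")                       # ~24,000 (tens of thousands)
print(f"contraction per spacer with n = {n}: {delta*1e6:.2f} um")  # ~1.4 um over a 1 mm gap
```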
From Eqs. (16) and (19), it is possible to calculate the optimum number of spacers n to be used, as well as their dimensions As and Ls, so that the sealed display has good vacuum integrity and proper spacing between panels. Note that we have neglected the change in the diameter of the spacers due to the compressive forces of atmospheric pressure. This is based on the assumption that the spacer length is much greater than the width and that Poisson's ratio for the spacer is small (0.23 for soda-lime glass). Additionally, displays should be overdesigned to have an excess of spacers so that the atmospheric pressure and any other outside loads do not result in mechanical failure. The coefficient of thermal expansion (CTE) is another important property of the spacer. CTE mismatches between the substrate and spacer can result in residual stress and possibly the destruction of the spacers during sealing. For this reason, using the same glass as the substrate is often the best choice for spacer fabrication. However, CTE-matched ceramics have also been found useful as spacer structures.

The Drivers

All display technologies require unique electronic circuitry to transform the display assembly from an inert piece of glass into a useful device that can present real-time visual information. In contrast to the now commonplace analog CRT, FEDs use primarily digital techniques to control the display of video information. FED technology is still in development, but the principal approach for driving these displays is well known within the industry. To extend the gate–cathode arrangement to a flat panel display, one possible approach is to connect the emitter tips to the columns of a matrix display and divide the grid into rows. To excite a particular pixel, one applies a sufficiently positive select voltage to a row and a negative, or at least less positive, modulation voltage to a column. Generally, a single row is selected, and all the columns are simultaneously switched. This allows addressing the display a row at a time, instead of a pixel at a time as in the raster scan of a CRT. A gate–cathode voltage difference of 80 V is sufficient to generate full pixel luminosity for a typical FED. A black level is attained at 40 V or less. The 40-V difference is used to modulate the pixel between on and off states (46). As in most FPD technologies, the overall display area of a FED is subdivided into a rectangular matrix of pixels that are individually addressed in a row and column arrangement. From a sufficient distance, the human eye fuses these discrete points of light to form an image. For this discussion, a common SVGA (super VGA) resolution (800 columns × 600 rows) color display will be used as an example. In this display, the matrix formed by the intersections of these rows and columns yields 480,000 pixels. Even larger matrix sizes are possible, such as 1,024 × 768 or even 1,600 × 1,200, with a corresponding increase in complexity. To depict color, each pixel is further divided into red, blue, and green subpixels. Although a myriad of subpixel geometries is possible, the most common arrangement consists of vertical color stripes oriented to form a square pixel. This requires multiplying the number of columns by a factor of 3. Thus, the example color SVGA display would require 800 × RGB = 2,400 discrete columns × 600 rows.
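A minimal sketch of this pixel and column bookkeeping for a few common formats (the resolutions listed are just the standard VGA/SVGA/XGA pixel counts; nothing here is specific to any particular FED):

```python
def matrix_size(columns, rows, subpixels_per_pixel=3):
    """Return (pixel count, driven column lines, row lines) for a color matrix display."""
    return columns * rows, columns * subpixels_per_pixel, rows

for name, (cols, rows) in {"VGA": (640, 480),
                           "SVGA": (800, 600),
                           "XGA": (1024, 768)}.items():
    pixels, col_lines, row_lines = matrix_size(cols, rows)
    print(f"{name}: {pixels:,} pixels, {col_lines:,} column lines x {row_lines} rows")

# SVGA gives 480,000 pixels and 2,400 column lines x 600 rows, as in the example above.
```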
Note that most current approaches to constructing a FED are so-called passive displays, in which the active drive circuitry is positioned at the periphery of the display. The typical FED consists of an active area populated by emitter tips and intersecting row and column metal lines, with high-voltage IC driver chips located around the edge. These exterior driver chips attach to the individual rows and columns via a flexible electronics substrate similar to a printed circuit board. An individual chip may be responsible for driving hundreds of rows or columns simultaneously. One consequence of this passive approach is that the narrow metal address lines that cover the entire display can be resistive as well as mutually capacitive where they overlap. The resulting RC transmission-line behavior can distort and delay the outputs from the external driver chips, which can affect display sharpness and contrast. Maximum rise and fall times of the column and row signals can limit the maximum scan rate and gray-scale capability of the display. Furthermore, the large capacitive load of the lines requires that the drive ICs supply large transient currents. Figure 4 shows a block diagram of the overall electronics of a FPD system used in driving a FED. An analog-to-digital (A/D) portion interfaces with the video signals commonly encountered in the real world, such as analog television (NTSC/PAL) or computer video. These analog signals are converted to digital values that can be applied to the display. Commonly, red, green, and blue are represented by 6- or 8-bit digital values that result in 64 or 256 discrete levels of brightness. It is also becoming increasingly common for digital data to be available directly from a computer source. In this case, no A/D conversion is required, and less noise is introduced into the video data. Once these digital signals are acquired, they may need to be rescaled to match the native resolution of the FPD. For instance, VGA video (640 × 480 pixels) would use only a portion of the SVGA example display. Performing this nonintegral scaling without artifacts is difficult, but high-performance ICs are becoming available to do this task cleanly (47). Then, the appropriately scaled digital data are piped into a video controller IC that directs the data to the specific column and row drivers. The driver chips located at the edge of the display's active area receive the digital data and control signals, allowing them to drive the row and column address lines to the required levels. Another important component is the system power supply, which must generate the row, column, and anode potentials in addition to the logic-level voltages. Finally, a user interface is required to allow the viewer to control display parameters such as contrast and brightness. In operation, a single output of a row driver is selected; it turns on a row by raising its voltage from ground potential to +80 V. Column drivers have been previously loaded with data for that particular row.
Figure 4. FED driving system block diagram with various electronic elements.
After a slight delay that allows the row to charge to its final value, column outputs are either turned on by swinging to ground or remain in their off state at +40 V. In this manner, an entire row of selected pixels is illuminated simultaneously. After a certain amount of time, the active columns are reset to the off state, and the next row is selected. The columns of new data are again selectively switched, and the process repeats. The entire display is scanned row by row at a rapid rate. Ideally, the rows should be refreshed upward of 60 times a second to minimize flicker. A simple calculation shows that the example SVGA display would have a row time of 1/(60 frames/s × 600 rows/frame), or 27.8 µs. Simply switching rows and columns between on and off states is useful in producing black or white pixels to yield a binary display. This is appropriate in many applications, such as text displays, where gray scale is not needed. However, to render gray levels accurately, the amount of light from individual pixels must be modulated. Two primary methods for attaining modulation involve controlling the amplitude of the column drive, called amplitude modulation (AM), or varying the time the column is turned on, called pulse-width modulation (PWM). AM can be accomplished by modulating the overall column voltage swing, but nonuniform pixel emission will result in a grainy appearance in the display. Another approach to modulating the pixel is to control the column current. However, the large transient current used to charge the capacitive column is in the milliamp range, whereas the emitter current is only microamps. Neither of these approaches has proven feasible for passive displays, so the most popular method for attaining gray-scale encoding is pulse-width modulation. Figure 5 shows representative column and row waveforms in a gray-scale FED. As in the binary display, rows are sequentially enabled. The column drivers, loaded with 6- or 8-bit data, are used to turn on column lines for a variable amount of time that is proportional to the encoded value. Six-bit data would require 64 individual time increments for the column driver. For the SVGA FED, these time increments would be in multiples of approximately 400 nanoseconds. Each of the color subpixels is similarly controlled and allows display of a full range of colors and intensities.
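A short check of these timing numbers (a sketch; the 60 Hz refresh, 600 rows, and 6-bit gray scale are simply the example values used above):

```python
refresh_hz = 60          # full-display refresh rate
rows = 600               # SVGA row count
gray_levels = 2 ** 6     # 6-bit data -> 64 time increments

row_time_s = 1.0 / (refresh_hz * rows)   # time available per row
pwm_step_s = row_time_s / gray_levels    # smallest pulse-width increment

print(f"row time      : {row_time_s * 1e6:.1f} us")   # ~27.8 us
print(f"PWM increment : {pwm_step_s * 1e9:.0f} ns")   # ~434 ns, i.e., roughly 400 ns
```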
DISPLAY FABRICATION

A high degree of process control, large-scale manufacturing capability, and the ability to generate small features make semiconductor processing the manufacturing method of choice for field emission displays. Semiconductor processing is the general term for techniques in thin-film processing and photolithography that involve depositing and selectively removing thin films supported on a substrate. FED cathodes and anodes are fabricated by this type of processing, combined with the spacers in an assembly technique, and the finished product is incorporated into a package with the driver system. To understand how the cathode and anode panels are fabricated, it is necessary to know photolithography and thin-film processing. In general, the process flow is to place a material on the panel (deposition), pattern it (photolithography), and remove it selectively (etch). Thin films may be deposited by a number of means. Physical vapor deposition (PVD) and chemical vapor deposition (CVD) are two modes of thin-film deposition. Sputtering and evaporation are the two main forms of PVD. Sputtering offers a wide variety of choices for material deposition but can be high in cost, depending on the material. Evaporation offers high-purity deposition
Figure 5. Waveform diagram for driving system.
(partially due to the high vacuum levels used) but can be difficult and expensive to scale up to large substrate sizes (48). During sputtering, a panel is placed in a vacuum chamber. Also inside the chamber is a target made of the material to be deposited. Positively charged ions (e.g., of argon or other inert gases) are created by a plasma and are accelerated into the target (the negatively charged electrode) due to a potential difference. The result is that the target material is deposited on the anode (a FED panel in this case). Evaporation is a PVD technique that involves a filament made up of the material to be deposited. The material is heated to its boiling point (the point at which it evaporates) by electron-beam or thermal evaporation and subsequently condenses on the substrate, assisted by gas flow. There are many types of CVD processes. Two types are atmospheric pressure chemical vapor deposition (APCVD) and plasma-enhanced chemical vapor deposition (PECVD). APCVD is typically used for thin-film oxide depositions. Some APCVD can be done only on silicon wafers because the deposition temperature is above the melting point of most glass substrates used in FED production (∼600 °C). PECVD uses a plasma to aid in depositing materials by forming highly reactive ions from the source gases. This process can also be done at high temperature; however, temperatures of 200–500 °C are sufficiently low for processing FED substrates. PECVD is also used for many deposition processes within LCD manufacturing. Photolithography is a process by which a pattern is transferred to a substrate. Photolithography uses a light-sensitive material called a photoresist that can copy the image from the master, or mask, by changing its properties in the exposed areas but not in the unexposed areas. The exposed areas of the photoresist are chemically altered so that they can be removed by a solution containing water and a developer, which is commonly a basic liquid such as tetramethyl ammonium hydroxide (TMAH). A typical process would consist of coating the panel with photoresist, baking the resist to remove solvent, aligning the mask to the substrate (panel) to ensure proper placement of the master image, exposing the unmasked areas to light (usually ultraviolet light for semiconductor processing), and finally, removing the exposed resist (for a positive resist) and exposing the thin film underneath that is to be removed by the next process step, etching. Etching in semiconductor processing is a technique for selectively removing a particular material from other surrounding materials. There are two main categories of etching: wet and dry. Wet etching uses acid and base chemistries and generally gives an isotropic etch profile, whereas dry etching uses gases and a glow discharge (plasma) and is normally used to generate an anisotropic (directional), that is, vertical, etch profile. Wet etching is often faster and more selective than dry etching. The term dry etching encompasses the general topic of gas-phase etching of a specific surface by physical reaction, chemical reaction, or some combination of the two (48). Either wet or dry etching is used depending on the materials that are being processed, the desired selectivity, and the sidewall profile results.
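The deposit–pattern–etch loop described above can be summarized schematically. The sketch below is only an illustrative abstraction of that flow; the step names and the example grid-metal layer are placeholders, not an actual process recipe.

```python
def patterned_layer(material, deposition, etch):
    """One pass of the generic thin-film flow: deposit, pattern (photolithography), etch."""
    return [
        f"deposit {material} by {deposition}",
        "coat photoresist and soft-bake to remove solvent",
        "align mask to the panel and expose (UV)",
        "develop resist (e.g., in a TMAH-based developer)",
        f"{etch} etch of {material} where the resist was removed",
        "strip the remaining photoresist",
    ]

# Hypothetical example: patterning an extraction-grid metal layer.
for step in patterned_layer("grid metal", "sputtering", "dry (plasma)"):
    print(step)
```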
Because the FED is an electronic unit, it is necessary to control each of the panels produced so that its films have the desired electrical properties. Doping the films by purposefully incorporating impurities into them is the generally accepted method for altering their electrical properties. The most common methods of doping are ion implantation and in situ doping. Ion implantation is the process of implanting high-energy ions through the surface of the material to be doped. In situ doping is doping the film during deposition. In situ doping is the preferred method for FED fabrication.

The Cathode

Using standard semiconductor processing tools, cathode fabrication is slightly more complicated than production of the anodes. The first step is to choose a material for the substrate. The material must have many specific physical properties, including strength and a coefficient of thermal expansion (CTE) that matches most of the other materials used in the processing steps. CTE is a property of a material that indicates how much the material expands or contracts when it experiences a thermal history. The two main types of materials chosen for the cathode substrate are silicon wafers and glass. The glass types that are used vary from a typical window-type glass to highly specialized glass that is ultrapure. Glass composition may include oxides of silicon, boron, phosphorus, calcium, sodium, aluminum, and magnesium, to name a few (49). Borosilicate and aluminosilicate glass are two types of special glasses that are routinely used in FED manufacturing. Many LCD makers use aluminosilicate, which contains oxides of aluminum within the silicon dioxide. Glass can be provided in a wide variety of sizes, including very large sheets (1,000 × 1,000 mm). Several smaller displays may be fabricated on these large substrates; this is known as multi-up processing. Silicon wafers are single crystals of silicon that are produced in sizes up to 300 mm in diameter. It is impractical to make field emission displays on silicon wafers if the active area of the display is going to be more than about 5 cm. Silicon wafers are easy to process but extremely expensive as a starting material. The semiconductor industry uses them for two main reasons. First, it is necessary for them to use silicon because it is the building block of a semiconductor device. Second, it is possible to fabricate a plethora of devices on each wafer, thus splitting up the total cost of the wafer among thousands of parts. Once the substrate has been selected, it is time to start processing the FED. Probably the most important step in all FED processing is the cleaning step. There are many cleaning steps in semiconductor processing, and FED processing is no different. A particle on a cathode as small as a human hair would be enough to destroy the quality of the emission from the entire device. In addition, any organic materials that get on the cathode or in the package can contribute to a phenomenon known as outgassing. The first process completed on the glass substrates as they arrive at the fabrication facility (fab for short) is cleaning. Normally, once the panels have been
cleaned, no human hands can contact them until they are completely processed. The entire process is automated in the interest of cleanliness. After the panels are washed, it is necessary for them to receive a thin-film deposit of a barrier (protection) layer. The barrier is typically a silicon dioxide film used to protect the subsequent thin-film layers from any surface defects that may be on the glass. Following the barrier deposition, the panels receive a deposit of metal, which is the conductor for the columns used in conjunction with the row metal to address (by matrix addressing) the emitters. After metal deposition, the panels receive the very precise pattern that the columns (and later the rows) require. This is done by using the photolithographic process described earlier. The next step is resistive silicon deposition. This thin layer of silicon is deposited on the cathode so that, when driven, each set of emitter tips is limited in the amount of current it can draw. This is important because if one set of tips draws an unexpectedly large amount of current (e.g., due to a short), the entire set of tips or, worse yet, the driver circuits could be destroyed. Again, this thin film is processed so that all excess material is removed by the etching process, and the final pattern remains. This point in the processing scheme is where the two main methods (the CMP and Spindt methods) for fabricating cathodes diverge. The greatest difference in the way FED cathodes are made is in the emitter tip process module. The CMP method deposits a layer of doped silicon approximately 1 µm thick and then a layer of silicon dioxide that will act as a hard mask to pattern the tips. The hard mask is photopatterned by using circles for the locations of the tips. Before the tips can be formed, the hard mask is etched so that circles, or caps, will be above each future tip location. The tips are formed by plasma etching the silicon to shape it into cones. The oxide hard mask is removed by a wet chemical etch. Next, two more thin films are deposited on the panel. First, a layer of oxide is placed on top of the tips to isolate them electrically from each other. Second, a layer of metal, called the extraction grid, is deposited on top of the oxide layer. This grid is usually a metal or semiconductor, such as niobium or silicon, respectively. This second metal is used as the gate electrode to extract electrons from the emitter tips. After these metals are deposited, bumps remain where the materials have conformed to the shapes of the tips. By flattening these bumps, the extraction grid metal is removed from the area directly above the tips, leaving the oxide exposed. The process by which the bumps are flattened is called chemical mechanical planarization (CMP) (50). This process is shown in detail in Fig. 6. The final step in fabricating tips is etching the oxide layer, chemically and isotropically, to create cavities. These cavities play a crucial role in allowing generation of the electric fields that are necessary to extract electrons. However, if this process is left as the final step, the oxide layer can serve as a physical protection layer for the tips until all further processing has been completed. The last step in fabricating the cathode then becomes wet chemical etching of this oxide layer.
Figure 6. Schematic diagram of tip formation using the CMP process.
The Spindt method of manufacturing tips differs mostly in order of operations. In addition, instead of removing material to form the tips, they are actually formed into their sharp conical shape as they are deposited. The thick oxide layer, on which the grid metal sits, is deposited first. Then, the grid metal is deposited on top and photopatterned by using circles, just as for the hard mask in the previous example. The grid metal is dry etched so that circles expose the underlying oxide layer. The oxide layer is isotropically wet etched, creating a cavity under the grid. Before the tips are formed in the cavities, a layer is deposited so that the tip material that is deposited outside the cavities can be removed later. This sacrificial layer is sometimes called the liftoff layer. When the opening is set, the cathode is placed in an evaporation chamber where a metal, usually molybdenum, is deposited normal to the plane of the holes. As the holes fill up, the conical shape of the emitter tips takes form. When the process is complete, the original holes are filled and closed off. An electrochemical etch is employed to remove residual tip metal from the top of the extraction grid and tips. This process is known as the Spindt technique, and the tips are called Spindt emitters (9). This is the most common method for producing field emitter arrays used in FEDs. Regardless of emitter type, the next step in cathode fabrication is depositing the metal that will be used to form the rows. The row metal is similar if not identical to the column metal. The functionality of the row and column metal gives the display its x, y addressability. The driver chips can then easily turn on any specific row and column, thus illuminating one defined pixel (or subpixel). After the row metal is patterned and etched, a layer of a dielectric material may be deposited. This material is usually silicon dioxide or silicon nitride. Because the anode can be at high voltages, this dielectric may be necessary to suppress rogue emission from unwanted, possibly jagged sites.
The Anode

Anodes are much less complicated in terms of the number of processing steps. A typical anode can be fabricated in approximately four to seven major fabrication steps (a major step is normally defined by a photolithographic mask step), compared to cathodes, which can require upward of 5 to 10 fabrication steps. Because every pixel site on the cathode has a corresponding pixel site on the anode, the layout of the pixel-defining area is normally identical to that which defines the pixel/tip locations on the cathode. Large area FEDs require that the anode and cathode are fabricated on substrates composed of the same material. However, small area FED cathodes may be made on a crystalline silicon substrate, whereas the anode is usually made on glass. The first step in anode fabrication is to choose a material to deposit on the substrate that will be electrically conductive and optically transparent, such as ITO. The next step in making a FED anode is to deposit and pattern the black matrix. The black matrix contributes to the contrast ratio and can act as an antireflectant. The final step(s) in the anode fabrication process is to deposit phosphor. The methods employed here depend on many factors, including whether the display will be monochrome or color, high or low voltage (on the anode), and the resolution of the display (VGA, SVGA, XGA (extended graphics array), etc.). Two main methods for phosphor deposition are electrophoretic (EP) deposition and slurry deposition. Electrophoretic deposition is a process in which ionic phosphor particles are placed into an electrolytic solution. The anode is coated with photoresist everywhere except where phosphor is to be deposited. The panel is then placed into the solution, and a potential difference is applied between it and a reference electrode. The charged particles migrate to the panel and coat the surface. The photoresist is stripped off the panel along with any undesired phosphor that may have been left on it. This leaves a well-defined pattern of phosphor where the resist was not located. Slurry phosphor deposition consists of fully coating an anode with a particular pigmented phosphor (blue, for example) and then patterning and selectively removing that phosphor to form the precise pattern necessary for pixel definition. This process is repeated two more times for the other colors (red and green). The two most common methods for slurry deposition are spin coating and screen printing.

The Spacers

One of the more important concerns for FED spacers is the choice of material. Among other things, the spacer is responsible for preventing the evacuated display from imploding. Therefore, it is of the utmost importance for the spacers to be strong. In addition, it is the daunting task of a spacer to occupy a relatively small (10–400 µm) footprint and to be tall enough to span large gaps (200–3,000 µm). High aspect ratios between 4 and 20 are not easy to achieve, and good-quality spacers are increasingly difficult to fabricate when considering the physical requirements for support and resistivity.
Spacers can be fabricated in a variety of shapes, sizes, and materials, and by a variety of methods. Methods of fabrication have been derived from traditional glass drawing, molding, and micromachining. Spacer structures can incorporate solid or hollow structures (51). Different designs that have been used in spacer technology include cylinders, triangles, squares, rectangles, trapezoids, and cross-shaped structures. Application of these small spacer structures to either substrate (cathode or anode) is a particularly difficult task. The spacers must be placed on the cathode or anode within ±10 µm in some cases. Handling single structures that are 25 × 200 µm is very difficult without specialized equipment. Moreover, depending on size, a typical display may require placing from thousands to tens of thousands of these small structures in precise locations. Clearly, handling these spacers individually becomes infeasible in production. One method for spacer development is to fabricate the spacer directly on the substrate. Some historical methods of directly employing spacers include screen printing, stencil printing, reactive ion etch (RIE), laser ablation (52), and thick-film photopatterning of organic structures (53). Each of these methods has serious technological limitations. Screen printing suffers from an inability to produce the proper aspect ratio, and RIE is an inherently slow process that leads to high-cost production. Many individual spacers need to be attached to the substrate via some procedure that allows fast and accurate placement. Methods of attaching separate spacer structures include anodic bonding (54), metal-to-metal bonding (55), and various methods of gluing. When spacers are manufactured from soda-lime glass, the presence of sodium allows a method of attachment known as anodic bonding. At high potential and high temperature, soda-lime glass spacers can be bonded to silicon substrates (or glass substrates that have silicon sites connected to a conductor) by countermigration of oxygen and sodium ions. Other methods of attaching spacers include using both organic glues and inorganic glass frit (56). Issues of charging and electrical breakdown are understood, and solutions have been derived to eliminate these catastrophic failure mechanisms from an operating display. Although there have been many great accomplishments in spacer technology, it is still in its infancy.

The Assembly

The technology for evacuating a glass envelope through an exhaust tube exists. (CRT manufacturers have been doing it for years.) However, the fundamental shape and design of the glass pieces are different in a CRT than in a FED. The use of an exhaust tube in a FED package removes the advantage of producing a thin display. In addition, exhaust tubes can introduce potential failure mechanisms. The technology exists today for assembling and sealing the glass components in a vacuum, which eliminates the need for an exhaust tubulation process. The fundamental pieces of the sealing process are CTE matching, incorporation of a getter, and alignment of the substrates to each other. Initial work on the assembly of FEDs was done using tubulated seals. Tubulated seals involve a two-step process
whereby the anode and cathode are sealed together in an inert atmosphere that does not oxidize components of the display, leaving a small tubulation. The second step is to pull vacuum through the tubulation and pinch off the tube via various glass manipulations. A high temperature bake-out process can be accomplished using the tubulated sealing approach while actively pumping out the display. Some FED manufacturers have shifted focus to nontubulated sealing, which has the added benefit that it is a single-step process (57). The sealing glass (also called frit) is typically a powder form of glass that is CTE matched to the substrate in the display. The powder is combined with an organic material that allows placing the powder glass on the panel precisely. When the organic material is removed (through an elevated thermal cycle), the glass powder is below its flow point so that organic materials cannot be trapped inside and increase the porosity of the sealing glass (58). High porosity and improper wetting of the sealing components can be direct causes of vacuum leakage. Equally important to assembly and ultimately final operation is alignment of the cathode to the anode. It is imperative to line up the cathode pixels (or subpixels) precisely with their counterpart pixels/subpixels on the anode. Misalignment by more than 5 microns on high-definition displays can destroy color purity as well as image quality. Alignment tools are normally custom designed for FED developers. CRT manufacturers have a rough alignment procedure and achieve exacting alignment using internal (magnetic) methods, making corrections after the CRT is manufactured. The CTE is a material property of utmost concern in assembling anodes and cathodes. If the CTEs of the components (anode, cathode, spacers, and sealing glass) are not properly matched, then one component will expand or contract at a different rate than the others and breakage may occur. Often, there are CTE mismatches that the sealing technology must overcome. One such issue is sealing temperature. Temperature constraints from the high end come from glass and thin-film materials (e.g., no temperatures over 450 °C for soda-lime glass due to decreased glass viscosity). Constraints from the low end come from the getter materials (some getter materials are not activated until they have experienced temperatures over 400 °C). Other types of getters can be activated by a point heat source, which eliminates the need to heat the entire display to activate the gettering material. This narrows the tolerances for some of the assembly materials. Some of the materials needed include a sealing glass solvent vehicle that totally burns off at less than 400 °C and a sealing glass powder that completes its transition to glass at less than 450 °C. Sealing glass powders are available for temperatures lower than 450 °C; however, the getter materials that work much below that temperature are hard to obtain. Recently, significant advances in gettering technology have allowed for the development of lower temperature sealing methods (59). Insufficient vacuum levels in the presence of high electric fields (like those in operating FEDs) can lead to a condition known as glow discharge and/or voltage
breakdown, known as arcing. Both of these events are catastrophic failures in FED devices.

PERFORMANCE AND PARAMETERS

Field emissive display technology is still evolving; a number of different manufacturers are pursuing unique approaches. Unlike more mature display technologies, such as CRTs, it is difficult to describe general parameters for a typical FED. However, the physics defining the manufacture of a FED results in a certain envelope of operation. Key parameters that define this envelope for a FED include luminance, viewing angle, contrast ratio, resolution, size (display diagonal), operating temperature, response time, color purity, power consumption, and lifetime. The luminance of a field emissive display is determined by a combination of the potential difference between the grid and tip, the tip current density, the anode acceleration voltage, the phosphor efficiency, and the duration of excitation for each row. A typical VGA implementation using attainable values for these parameters can range from 100–600 cd/m² (cd is the abbreviation for candela, the unit of luminous intensity). Extreme examples of this technology can yield displays as high as 1,000 cd/m² or more. Two important parameters that are inherent in FEDs are wide viewing angle and quick response time. Unlike LCDs, which use polarizing filters that result in a distinct angular response (typically ±60°), FEDs are an emissive technology that produces light off-axis. For the end user, this results in a display that can be viewed at angles of ±80° or greater in both the horizontal and vertical directions. Response times of FEDs are similar to those of CRTs because both are cathodoluminescent technologies. Typical response times of an FED phosphor are measured in small fractions of a second (the response time of an FED is of the order of 10–30 µs). The response time is determined by the phosphor persistence and the scan rate of the display. This enables both FEDs and CRTs to achieve true video rate image presentation. In comparison, the viscosity of liquid crystal material limits an LCD response time to milliseconds. Operating temperature is another distinct advantage of FEDs. The manufacturing process involves sealing the displays at high temperatures (normally above 300 °C). Although normal operating temperatures are rarely this high, the limiting factor is often the electronics that drive the display. FEDs could actually operate at temperatures above 100 °C, except that the silicon base resistor is temperature-dependent and would require compensation. The benefit of this temperature independence is that FEDs have ''instant on'' capabilities over a wide range of temperatures, which appeals to military and automotive markets. The contrast ratio depends in part on the type of phosphor used. A popular phosphor, ZnO:Zn, pronounced zinc oxide zinc, emits blue/green light and is white when it is not excited. This results in a large amount of reflected ambient light. The ZnO:Zn phosphor has a contrast ratio between 20:1 and 40:1 in a dark room (60). In
comparison, the contrast ratio of a typical home television set is around 30:1 to 50:1 (61). Using color phosphors, the contrast ratio can be increased by using a black matrix between the pixels. Color purity, response time, and lifetime all depend on the choice of phosphor (monochrome vs. color, material composition). Color purity is probably the most difficult parameter to quantify because it is related to the presence (or lack) of impurities in color uniformity. These impurities can be due to both the phosphor and fluctuations in magnetic fields (for CRTs) within the display. Qualitatively, color purity can be reported as good (having no localized distortions) or bad (containing distortions). Preliminary attempts in developing an FED have focused on lower resolution and smaller display size to facilitate research and development. To date, development efforts on FEDs have been in the 1/4 VGA to XGA pixel format. Display sizes in this pixel format range from less than 1 inch to as much as 13 inches in diagonal (62). As of May 2000, a 13.1'' color SVGA FED was demonstrated by Candescent, Inc. at a Society for Information Display exhibition. Power consumption for a field emissive display is dominated by two specific factors. Neglecting the power required for digital control logic, which is minimal, most of the energy is required to charge and discharge capacitive columns as well as drive the anode current across the vacuum gap (from emitter tip to phosphor). Capacitive switching power consumption is calculated by the relationship P = (1/2)fCV², where f is the switching frequency, C is the capacitance, and V is the voltage being switched. Row capacitive switching is quite small because only one row is switched at a time and the row scan rate is relatively slow. Column switching, however, can be very significant due to the large column voltage swing, the large number of individual columns, and the fact that all the columns may be switched every row. For the color VGA display example, there are 640 × RGB = 1,920 columns; assuming a column capacitance of 1 nF, the overall column capacitance is 1.92 µF. At a 60 Hz display refresh rate, the columns could potentially switch up to every row, which is 480 × 60 Hz = 28.8 kHz. For a 25-V column swing, the resulting column power consumption would be 17.3 W. The anode power consumption is calculated simply from the anode current and voltage by P = IV. This depends, in part, on the degree of luminance that the display needs to produce, but values of 10–15 W for anode consumption are acceptable. Without attempting to reduce the column swing or finding more efficient phosphors, total display power consumption could be in the range of 30–45 W. The field emissive display lifetime is determined by phosphor degradation, emitter tip aging, and high-voltage breakdown between the anode and cathode. If a robust, high-voltage vacuum system can be maintained to eliminate the breakdown issue, phosphor and emitter tip stability remain to be solved. Differential aging is used to describe the phenomenon where illuminated pixels degrade differently from dark ones. This same situation manifested itself in early CRTs, where screen-savers were necessary to prevent a latent image from being ''burned'' into the screen. The goal is a display that can operate more than 10,000 hours and age negligibly.
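A quick numerical check of the column-switching power estimate above (a sketch using the example values quoted in the text: 1,920 columns at 1 nF each, a 28.8 kHz worst-case switching rate, and a 25 V swing; the anode current and voltage in the last line are assumed values chosen to land in the quoted 10–15 W range):

```python
def switching_power(f_hz, c_farads, v_volts):
    """Capacitive switching power, P = (1/2) f C V^2."""
    return 0.5 * f_hz * c_farads * v_volts**2

columns = 640 * 3            # color VGA: 1,920 column lines
c_total = columns * 1e-9     # 1 nF per column -> 1.92 uF total
f_worst = 480 * 60           # every row of every frame: 28.8 kHz

p_columns = switching_power(f_worst, c_total, 25.0)
print(f"column switching power ~ {p_columns:.1f} W")   # ~17.3 W, as quoted above

# Anode power is simply P = I*V; an assumed 3 mA at 5 kV gives 15 W.
print(0.003 * 5000)
```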
AUTHORS' FINAL STATEMENTS

Although field emission displays have not yet entered the major consumer markets, the technology is growing toward that end. Within the next 5–10 years, this emerging technology will share in the world display market. The first company that can produce and profit by selling FEDs will most likely emerge as a powerhouse in this new arena of flat panel displays.

ABBREVIATIONS AND ACRONYMS

A/D: analog-to-digital
AECS: Automotive Energy Components Sector
AM: amplitude modulation
AMLCD: active matrix liquid crystal display
APCVD: atmospheric pressure chemical vapor deposition
ARC: antireflective coating
ATMI: Advanced Technology Materials, Inc.
CIE: Commission Internationale de l'Eclairage
CL: cathodoluminescent
CMP: chemical mechanical planarization (chemical mechanical polishing)
CRT: cathode-ray tube
CTE: coefficient of thermal expansion
CVD: chemical vapor deposition
DARPA: Defense Advanced Research Projects Agency
DSP: digital signal processor
Ef: Fermi energy level
Ev: vacuum energy level
FEA: field emitter array
FED: field emission display
FPD: flat panel display
IBM: International Business Machines
IC: integrated circuit
ITO: indium tin oxide
LCD: liquid crystal display
LETI: Laboratoire d'Electronique de Technologie et de l'Informatique
NEA: negative electron affinity
NRL: Naval Research Laboratory
NTSC: National Television Standards Committee
OEM: original equipment manufacturer
PAL: phase alternating line
PECVD: plasma enhanced chemical vapor deposition
PVD: physical vapor deposition
PWM: pulse-width modulation
R&D: research and development
RIE: reactive ion etch
SiNx: silicon nitride
SiOxNy: silicon oxy-nitride
SRI: Stanford Research Institute
SVC: Silicon Video Corporation
SVGA: super video graphics array
TFT: thin film transistor
TI: Texas Instruments
TMAH: tetramethyl ammonium hydroxide
TRP: Technology Reinvestment Program
VC: venture capital
VFD: vacuum fluorescent display
VGA: video graphics array
XGA: extended graphics array
BIBLIOGRAPHY
36. K. Tominaga et al., Thin Solid Films 334, 35–39 (1998).
1. R. Young, Semicond. Int. 97–100 (May 1998).
37. K. Utsimi et al., Thin Solid Films 334, 30–34 (1998).
2. B. Fedrow, Solid State Technol. 60–67 (September 1998).
38. I. Baia, Thin Solid Films 337, 171–175 (1999).
3. M. Abbey, Inf. Display 14, 14–17 (1998).
39. A. K. Kulkarni and S. A. Knickerbocker, J. Vac. Sci. Technol. A 14, 1,709–1,713 (1996).
4. Stanford Resources Incorporated, FID 1999. 5. W. Schottky, Z. F. Physik 14, 80 (1923).
40. D. V. Morgan et al., Renewable Energy 7, 205–208 (1996).
6. R. H. Fowler and L. W. Nordheim, Proc. R. Soc. London A 119, 173–181 (1928).
41. F. Neusch et al., Appl. Phys. Lett. 74, 880–882 (1999).
7. K. R. Shoulders, Adv. Comput. 2, 135–293 (1961).
43. S. Wolf and R. N. Tauber, Silicon Processing for the VLSI Era vol 1: Process Technology, Lattice Press, Sunset Beach, 2000, pp. 524–527.
8. Thin Electron Tube with Electron Emitters at Intersections of Crossed Conductors, US Pat. 3,500,102, March 10, 1970, M. E. Crost et al., (to United States of America). 9. C. A. Spindt, J. Appl. Phys. 39, 3,504 (1968).
42. C. C. Wu et al., Appl. Phys. Lett. 70, 1,348–1,350 (1997).
44. R. C. Hibbeler, Mechanics of Materials, Macmillan, NY, 1991, pp. 114–117.
10. H. Busta, J. Micromechanical Microengineering 2, 43–74 (1992).
45. K. P. Arges and A. E. Palmer, Mechanics of Materials, McGraw-Hill, NY, 1963, pp. 8–10.
11. J. D. Levine, Flat Panel Display Process and Res. Tutorial Symp., San Jose, June 21–22, 1995.
46. R. T. Smith, Inf. Display 14, 12–15 (1998).
12. Process for fabricating Self-Aligned Field Emitter Arrays, US Pat. 4,964,946, October 23, 1990, H. F. Gray and G. J. Campisi, (to United States of America).
48. M. Mandou, Fundamentals of Microfabrication, CRC Press, Boca Raton, 1997, pp. 53–113.
13. D. Pidge, Jpn. Econ. Newswire July 4, 1989. 14. Vacuum Electronics Could Find Use in HDTV Technology, ASP 31(8), 33 (August 1989). 15. R. Meyer et al., Proc. Jp Display 513 (1986). 16. V. Comello, R & D Mag. (December 1996). 17. ATMI Announces Flat Panel Display Contract: $1.1 Million Contract for Field Emission Display Cathode a Step Toward Diamond Semiconductor Commercialization, Bus. Wire, May 3, 1994. 18. SVC-Private Placement Memorandum–March 4, 1996, p. 40.
47. A. Y. Lee, Inf. Display 14, 30–33 (1998).
49. W. H. Dumbaugh and P. L. Bocko, Proc. SID 31, 269–272 (1990). 50. Method to Form Self-Aligned Gate Structures Around Cold Cathode Emitter Tips Using Chemical Mechanical Polishing Technology, US Pat. 5,372,973, April 27, 1993, T. Doan, B. Rolfson, T. Lowery, and D. Cathey (to Micron Technology, Inc.). 51. Method for Manufacturing Hollow Spacers, US Pat. 5,785,569, July 28, 1998, D. M. Stansbury, J. Hofmann, C. M. Watkins, (to Micron Technology, Inc.).
19. Candescent Web page, http://www.candescent.com/Candescent/.
52. Spacers for Field Emission Display Fabricated Via Self-Aligned High Energy Ablation, US Pat. 5,232,549, Aug. 3, 1993, D. A. Cathey et al., (to Micron Technology, Inc.).
20. Candescent and Sony to Jointly Develop High-Voltage Field Emission Display (FED) Technology, Bus. Wire, November 2, 1998.
53. Sacrificial Spacers for Large Area Displays, US Pat. 5,716,251, Feb. 10, 1998, C. M. Watkins, (to Micron Display Technology, Inc.).
21. Candescent and Schott Partner to Establish Flat-Panel Display Manufacturing Infrastructure in the United States, Bus. Wire, May 14, 1997. 22. A. Van Oostrom, J. Appl. Phys. 33, 2,917–2,922 (1962). 23. C. A. Spindt et al., J. Appl. Phys. 42, 5,248–5,263 (1976). 24. J. E. Yater et al., J. Vac. Sci. Technol. A 16, 913–918 (1998). 25. A. Weber et al., J. Vac. Sci. Technol. A 16, 919–921 (1998). 26. R. N. Thomas et al., Solid-State Electron. 17, 155–163 (1974).
54. Anodically Bonded Elements for Flat Panel Displays, US Pat. 5,980,349, November 9, 1999, J. J. Hofmann et al., (to Micron Technology, Inc.). 55. Method for Affixing Spacers within a Flat Panel Display. US Pat. 5,811,927, September 22, 1998, C. L. Anderson and C. D. Moyer, (to Motorola, Inc.). 56. Flat Panel Display Having Spacers, US Pat. 5,789,857, August 4, 1998, T. Yamaura et al., (to Futaba Denshi Kogyo K.K.).
28. R. W. Wood, Phys. Rev. 5, 1 (1897).
57. Field Emission Display Package and Method of Fabrication, US Pat. 5,788,551, August 4, 1998, D. Dynka, D. A. Cathey Jr. and L. D. Kinsman (to Micron Technology Inc.).
29. D. Temple, Mater. Sci. Eng. 5, 185–239 (1999).
58. J. W. Alpha, Electro-Optical Syst. Design 92–97 (1976).
30. J. P. Barbour et al., Phys. Rev. 117, 1,452 (1960).
59. Low Temperature Method for Evacuation and Sealing Field Emission Displays, US Pat. 5,827,102, Oct. 27, 1998, C. M. Watkins and D. Dynka (to Micron Technology, Inc.).
27. K. W. Wong et al., Appl. Surf. Sci. 140, 144–149 (1999).
31. H. Gamo et al., Appl. Phys. Lett. 73, 1,301–1,303 (1998). 32. Method of Making a Field Emission Device Anode Plate Having an Integrated Getter, US Pat. 5,520,563, May 28, 1996, R. M. Wallace et al. (to Texas Instruments, Inc.). 33. Anode Plate for Flat Panel Display Having Silicon Getter, US Pat. 5,614,785, March 25, 1997, R. M. Wallace et al. (to Texas Instruments, Inc.). 34. L. E. Shea, Electrochem. Soc.-Interface 24–27 (1998). 35. H. W. Leverenz, An Introduction to Luminescence of Solids, Dover, New York, 1968, pp. 333–341.
60. R.O. Peterson, FED Phosphors: Low or High Voltage? Inf. Display 13(3), March 1997 pp. 22–24. 61. A. F. Inglis and A. C. Luther, Video Engineering, 2nd ed., McGraw-Hill, 1996, pp. 121–123. 62. Candescent Technologies Corporation, San Jose, CA, 2001. http://www.candescent.com/Candescent/showcase.htm.
FLOW IMAGING
NOEL T. CLEMENS
The University of Texas at Austin
Austin, TX
INTRODUCTION

Imaging has a long history in fluid mechanics and has proven critical to the investigation of nearly every type of flow of interest in science and engineering. A less than exhaustive list of flows where imaging has been successfully applied would include flows that are creeping, laminar, turbulent, reacting, high-temperature, cryogenic, rarefied, supersonic, and hypersonic. The wide range of applications for flow imaging is demonstrated by the recent development of techniques for imaging at micro- and macroscales. For example, (1) and (2) report imaging velocity fields in 100-µm channels, and (3) describes a schlieren technique for imaging density gradient fields around full-scale supersonic aircraft in flight for the study of sonic booms. Impressively, the range of flow length scales spanned by these techniques is more than six orders of magnitude. Traditionally, flow imaging has been synonymous with ''flow visualization,'' which usually connotes that only qualitative information is obtained. Examples of flow visualization techniques include the imaging of smoke that has been introduced into a wind tunnel or vegetable dye introduced into a water flow. Owing to the complex and often unpredictable nature of fluid flows, flow visualization remains one of the most important tools available in fluid mechanics research. Excellent compilations of flow visualization images captured in a number of different flows can be found in (4) and (5). Modern flow imaging, however, goes far beyond qualitative flow visualization. Advances in computer, laser, and digital camera technologies have enabled the development of imaging techniques for obtaining quantitative images of a large number of flow variables such as density, temperature, pressure, species concentration, and velocity. Image data of this type enable the computation of a number of quantities that are important in fluid mechanics research, including vorticity, strain rate, dissipation, and heat flux. As an example of the power of flow imaging, consider Fig. 1, which shows a 3-D volume of the conserved scalar field in the far field of a turbulent water jet (6,7). The jet was seeded with a fluorescent dye, and the image volumes were captured by recording the fluorescence induced by a thin laser beam that was swept through the flow. The beam was swept rapidly in a raster fashion, and the fluorescent images were recorded by using a high-speed 2-D photodiode array. The resulting data volumes resolve the finest scales of mixing in three spatial dimensions and time, and when several such volumes are acquired sequentially, the data enable studying the temporal evolution of the conserved scalar field. These data can
x
z
Figure 1. Three-dimensional rendering of the conserved scalar (ζ ) field measured in a turbulent water jet using laser-induced fluorescence of a fluorescent dye seeded into the jet fluid. The cube is approximately 27 mm on each side, and the data resolve the finest scalar and vorticity scales in the flow. (Reprinted with permission from Quantitative Flow Visualization via Fully-Resolved Four-Dimensional Spatio-Temporal Imaging by W. J. A. Dahm and K. B. Southerland, in Flow Visualization: Techniques and Examples, A. J. Smits and T. T. Lim, eds., Imperial College Press, London, 2000.) See color insert.
yield details of the mixing process and even the complete 3-D, unsteady velocity vector field within the volume (7). This example shows that flow imaging is providing the type of multidimensional, multiparameter data that could be provided only by computational fluid dynamics not too long ago (8). Most imaging in fluid mechanics research involves planar imaging, where the flow properties are measured within a two-dimensional cross section of the flow. This is most often accomplished by illuminating the flow using a thin laser light sheet, as shown in Fig. 2, and then recording the scattered light using a digital camera. The laser light is scattered from either molecules or particles in the flow. The primary emphasis of this article will be on this type of planar laser imaging because it remains the cornerstone of quantitative imaging in fluid mechanics research. Furthermore, planar imaging is often a building block for more complex 3-D imaging techniques, such as that used to produce Fig. 1. Readers interested in
Figure 2. Schematic of a typical planar imaging experiment. The main components are the pulsed laser and timing electronics; the sheet-forming optics (a spherical lens and a cylindrical telescope with lenses of focal lengths f1 and f2); the laser sheet passing through the flow facility; the CCD camera with filter; a glass flat that directs part of the sheet onto a white card viewed by a video camera; and the image acquisition computers.
details of qualitative flow imaging techniques should note that several good references are available in the literature (5,9,10). Quantitative imaging is substantially more challenging than simple visualization because a greater degree of knowledge and effort are required before the researcher can ensure that the spatial distribution of the flow property of interest is faithfully represented in the image. The first part of this article will discuss some of the most important issues that need to be addressed in quantitative flow imaging. The article will end with a brief survey of primarily planar imaging techniques that have been developed. This survey will not be able to discuss all, or even most, of the techniques that have been developed, but hopefully readers will gain an appreciation for the wide range of techniques that can be applied to their flow problems.

BASIC PLANAR LASER IMAGING SYSTEMS

Lasers

Lasers are used almost universally in flow imaging, owing to their high brightness, coherence, excellent focusing properties, and the nearly monochromatic range of wavelengths at which they operate. Lasers can be either pulsed or continuous wave (CW); pulsed lasers are more commonly used because they provide high-energy pulses that are sufficiently short (e.g., 10 ns) to freeze the motion of nearly any flow. Most lasers used in flow imaging operate at visible or UV wavelengths (11). One of the main reasons for this is that until recently, there were few low-noise imaging arrays that operate outside of the UV-visible to near-IR wavelength range. Furthermore, some techniques, such as Rayleigh and spontaneous Raman scattering, increase in scattering efficiency as the frequency of the incident light increases, and therefore UV and visible lasers have a large advantage over IR sources. Furthermore, planar laser-induced fluorescence (PLIF) techniques typically involve the excitation of atomic/molecular electronic transitions, which occur primarily at UV and visible wavelengths for species of interest in fluid mechanics. The predominance of techniques in the visible/UV is by no means absolute, however, as recent advances in laser and camera technology have enabled the development of PLIF techniques that rely on the excitation of vibrational transitions at IR wavelengths (12). The most widely used laser in flow imaging is the flashlamp-pumped neodymium:yttrium–aluminum garnet (Nd:YAG) laser, which emits in the infrared (1.06 µm), but whose output is usually frequency-doubled (532 nm), tripled (355 nm), or quadrupled (266 nm), using nonlinear crystals (13). Frequency-doubled Nd:YAG lasers are primarily used in particle image velocimetry (PIV), Rayleigh and Raman scattering, and for pumping tunable lasers. Nd:YAG lasers are essentially fixed frequency, but when injection seeded (a technique that is used primarily to obtain very narrow line width), they can be tuned across a narrow frequency range. This ability to tune is used extensively in a class of techniques called filtered Rayleigh scattering (described later). Flashlamp-pumped Nd:YAG
lasers operate at repetition rates of a few tens of Hz and pulse energies of hundreds of millijoules at 532 nm. One drawback to flashlamp-pumped Nd:YAG lasers is that their repetition rates are typically much lower than the characteristic flow frequencies in most applications; the images are thus not temporally correlated and are effectively randomly sampled from the flow. Excimer lasers provide high-energy pulses of UV light (e.g., hundreds of millijoules at hundreds of hertz) in a narrow range of frequencies that depend on the particular gas mixture that is used. The most commonly used wavelengths in flow imaging are 193 nm (ArF), 249 nm (KrF), 308 nm (XeCl), and 350 nm (XeF). Because Rayleigh and Raman scattering are more efficient at short wavelengths, excimers are particularly attractive for these techniques. Furthermore, versions are commercially available that have narrow line width and are tunable over a small range. These lasers can be used to excite the fluorescence from O2 and NO (193 nm) and from OH (248 and 308 nm), without using a dye laser. Copper-vapor lasers are pulsed lasers that produce visible light simultaneously at two wavelengths (510 and 578 nm) and operate at high repetition rates (tens of kHz) but have relatively low pulse energies (a few mJ). Because of their high repetition rates, they have been used extensively for high-speed flow visualization (such as smoke scattering), but they are not as widely used as Nd:YAG lasers because of their relatively low pulse energies. Flashlamp-pumped dye lasers provide very high pulse energies (e.g., a few joules per pulse) but at repetition rates of just a few hertz. Because of their high pulse energies, they have been used primarily in imaging techniques where the signals are very weak, such as in spontaneous Raman or Rayleigh scattering imaging. For spectroscopic techniques, where it is necessary to tune the laser wavelength to coincide with an atomic/molecular absorption line, laser-pumped dye lasers and, more recently, optical parametric oscillators (OPOs) are used. Both dye lasers and OPOs are typically pumped by Nd:YAG lasers, although dye lasers are also pumped by excimers. The use of CW lasers is limited to low-speed flows (typically liquids) or to high-speed flows where only time-average measurements are desired. The reason is that they typically provide insufficient energy in times that are short enough to freeze the motion of most gas flows. For example, a 20-W CW laser provides only 0.02 mJ of energy in one microsecond, compared to a frequency-doubled Nd:YAG that can provide up to 1 J per pulse in 10 ns. The argon-ion laser is the most commonly used CW laser in flow imaging. The argon-ion laser has found a niche particularly for laser-induced fluorescence of dyes seeded into liquid flows. Some techniques, such as cinematographic imaging, require high-repetition-rate light sources such as copper-vapor or high-repetition-rate diode-pumped Nd:YAG lasers. The latter achieve repetition rates up to hundreds of kHz by acousto-optic Q-switching of a continuously pumped Nd:YAG rod. The drawback of these high-repetition-rate lasers is that they tend to have low energy per pulse (a few millijoules maximum), despite relatively high average power (e.g., 20–50 W). For
slower flows, electro-optically Q-switched diode-pumped Nd:YAG lasers can produce repetition rates of the order of a kilohertz and pulse energies of the order of tens of millijoules at 532 nm. Recently, a pulse-burst Nd:YAG laser has been developed that produces a train of up to 100 pulses at a rate as high as 1 MHz and individual pulse energies at 532 nm of about 25 mJ (14). In another technique, repeated Q-switching of a ruby laser (694 nm) was used to generate a train of 65 pulses at a rate of 500 kHz, where the energy for each of the 65 pulses was about 350 mJ (15). If this laser could operate continuously, its average power would be an impressive 175 kW. These laser systems are not currently available commercially, but they are particularly well suited for imaging very high-speed flows.
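To make these trade-offs concrete, the short calculation below (a minimal Python sketch; the numerical values are the ones quoted in the examples above, and the helper names are arbitrary) compares the energy a CW laser delivers within a flow-freezing time window to the energy of a single pulse, and evaluates the average power of a pulse burst:

```python
# Energy-budget arithmetic for choosing a flow-imaging laser.
# Values follow the examples discussed in the text.

H = 6.626e-34  # Planck's constant (J s)
C = 2.998e8    # speed of light (m/s)

def energy_in_window(cw_power_w, window_s):
    """Energy (J) a CW laser delivers during a flow-freezing time window."""
    return cw_power_w * window_s

def average_power(pulse_energy_j, rep_rate_hz):
    """Average power (W) of a pulsed laser or pulse burst."""
    return pulse_energy_j * rep_rate_hz

def photons_per_pulse(pulse_energy_j, wavelength_m):
    """Total photons per pulse, E/(h*nu); this sets the scattered-signal budget."""
    return pulse_energy_j * wavelength_m / (H * C)

# 20-W CW laser over 1 microsecond vs. a 1-J frequency-doubled Nd:YAG pulse:
print(energy_in_window(20.0, 1e-6))      # 2e-5 J = 0.02 mJ
print(photons_per_pulse(1.0, 532e-9))    # ~2.7e18 photons in a single pulse

# Repetitively Q-switched ruby laser: 350 mJ per pulse at 500 kHz
print(average_power(0.350, 500e3))       # 175,000 W = 175 kW
```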
Optics

The focusing properties of laser beams are related to the mode structure of the beam, or specifically to the number of transverse electromagnetic modes (TEM) that characterize the energy flux field (16). A single-mode (TEM00) laser beam has a Gaussian intensity distribution and is considered diffraction-limited. Note that in this article, the term "intensity" refers to the power density, or irradiance, of the laser beam (in units of W/m²), whereas the term "fluence" refers to the energy density (in units of J/m²). The focusing properties of diffraction-limited beams are described by Gaussian optics (16). Many laser beams, however, are not diffraction-limited because they contain many transverse modes. Multimode beams have higher divergence and poorer focusing characteristics than single-mode beams. The degree to which a beam is multimode is often specified by the M² value (pronounced "M-squared"), where the more multimode the beam, the higher the M² value, and where M² equals unity for a diffraction-limited beam. Many scientific lasers have M² values of 1 to 2, although many lasers, such as copper-vapor or high-power diode-pumped Nd:YAG lasers, can have M² values ranging from tens to hundreds. To see the effect of nonunity M², define the beam diameter d as twice the radius where the laser beam intensity drops to e⁻² of the maximum. Assume that a laser beam whose initial diameter is d is focused by a spherical lens of focal length f. The resulting focal spot will have a diameter d0 given by the relationship (17)

d0 = 4 f λ M² / (π d).    (1)

The focal spot diameter for a Gaussian (diffraction-limited) beam is 4f λ/(π d); thus Eq. (1) is the same as for a Gaussian beam, except that λ is replaced by λM². Equation (1) shows that the multimode focal spot diameter is M² times the diffraction-limited value for equal beam diameter at the focusing lens. Owing to this, a laser beam is often referred to as being "M² times diffraction-limited," meaning that it will have M² times the spot size. Equation (1) also shows that it is possible to get a smaller focal spot by using a shorter focal length lens or by increasing the initial beam diameter (by using a beam-expanding telescope).
When the beam diameter at the lens is the same for both diffraction-limited and multimode beams, then the far-field full-angle divergence, θ = d/f, is the same for both beams. However, if the focal spot sizes (d0) are made to be the same (because the multimode beam has a larger diameter at the lens), then the divergence will be M² times larger for the multimode beam. This is seen by considering the Rayleigh range, which is an important parameter in imaging because it is a measure of the distance across which the laser beam (or sheet) remains focused. The definition of the Rayleigh range xR is the distance along the beam from the focus to the point where the beam diameter is √2 times the diameter at the focus. The relationship is

xR = π d0² / (4 λ M²),    (2)
which is the same as the Rayleigh range for a Gaussian beam, except that λ has been replaced by λM². Equation (2) shows that for equal spot size, as M² increases, the Rayleigh range decreases because of the greater divergence. It can be concluded from this that aberrated beams can be focused to as small a spot as a diffraction-limited beam (by expanding it before the focusing lens), but the focal spot cannot be maintained over as large a distance. Note that the M² value can usually be obtained from the laser manufacturer, but it can also be measured by passing the beam through a lens of known focal length and then measuring the beam diameter at several locations (17). In planar imaging, the laser beam is formed into a thin sheet, which can be accomplished by several different techniques (11). One of the more common methods is shown in Fig. 2 where a spherical lens, which is typically plano-convex and has a focal length of 500 to 1000 mm, is used to focus the beam near the center of the field of view of the camera. Such long focal length lenses are used to increase the Rayleigh range, or the distance across which the beam remains focused. The larger Rayleigh range obtained from long focal length lenses does not come without a cost, however, because the longer focal length lenses also result in larger focal spots, or thicker sheets, in planar imaging. Figure 2 also shows the use of a cylindrical telescope formed from a plano-convex lens of focal length f1 and a larger plano-convex lens of focal length f2. For high peak power laser beams, it may be best to use a negative (plano-concave) lens as the first lens to avoid a real focus and hence reduce the possibility of air breakdown. The reason for using two plano-convex lenses (where the convex sides are directed toward the collimated beam) is that this configuration minimizes the aberrations for a telescope formed from simple spherical lenses (18). The cylindrical lenses expand the laser beam only in one direction, by a factor of f2/f1. Because the laser sheet height is determined by the height of the second cylindrical lens, producing large sheets (e.g., 100 mm) requires a large lens, which can be very expensive. Often, the second lens is omitted, and the sheet is allowed to diverge. The disadvantage is that the laser intensity varies in the propagation direction, which can make it harder to
correct the image of the scattered light for variations in intensity. Because a laser sheet is formed by expanding the beam in only one direction by using a cylindrical lens, the thickness of the sheet at the focus is approximately equal to the focal spot diameter given by Eq. (1). However, when the sheet thickness must be measured, this can be accomplished by using the scanning knife-edge technique. In this technique a knife-edge (e.g., a razor blade) is placed normal to the laser sheet and is translated across it so that the beam is progressively blocked by more of the knife-edge. The transmitted light is measured by a power meter as the knife-edge is translated. The derivative of the power versus distance curve is the mean sheet intensity profile. For example, if the laser sheet intensity profile is Gaussian, then the knife-edge intensity profile will be an error function. Spreading the laser beam into a sheet results in a large reduction in the intensity (or fluence); thus, when the intensity must be maximized, such as in Raman scattering imaging, the laser sheet can be formed by using a multipass cell (19). In this case, the laser beam is reflected back and forth between two confocal cylindrical mirrors. The main problem in this technique is that the sheet intensity profile is very nonuniform, and the nonuniformity may be difficult to correct for on a single-shot basis. In this case, shot-to-shot fluctuations in the intensity distribution can be left as an artifact in the image. Another technique that can be used in low-velocity flows is the scanning method, where a CW laser beam is swept past the field of view by using a moving mirror (6). If time-resolved data are desired, then the sweep time must be short enough to freeze the motion of the flow. Because of this, the scanning technique is really useful only in liquid flows, which have relatively small characteristic flow timescales.

Cameras

The most commonly used cameras in quantitative imaging are based on charge-coupled device (CCD) arrays or image-intensified CCD arrays. Note that there are a few applications where film may be preferred to a digital camera, such as large field-of-view PIV (20) and high-framing-rate PIV (21,22). Nevertheless, CCD arrays have largely supplanted film and other detectors, including TV tubes, photodiode and charge-injection device (CID) arrays, owing to their low noise, excellent linearity, uniformity, and resistance to blooming. The operation of a CCD is based on the fundamental property that a photon incident on the CCD produces an electron–hole pair in a region of silicon that is biased to some potential. The electrons generated are called "photoelectrons," which migrate to the "potential well" of the CCD pixel where they are stored for later readout. Because the CCD stores charge, it is essentially a capacitor, whose charge is proportional to the number of incident photons. The quantum efficiency η is the ratio between the number of photoelectrons generated and the number of photons incident. Front-illuminated CCDs have quantum efficiencies of 10–50% at visible and near-IR wavelengths (peaking near 700 nm) but are virtually zero at UV and mid-IR wavelengths.
Back-illuminated CCDs, although more expensive, provide quantum efficiencies up to 90% in the visible and can maintain good response (e.g., η = 20%) well into the UV. CCD arrays can be full frame, frame transfer, or interline transfer type (23). Full frame CCD arrays read out the charge by shifting it down through the entire array (like a "bucket brigade") into an output register where it is then read out serially. Because the array is used to shift the charge, the image will be blurred if the CCD is exposed during readout. Because readout can take several seconds, a mechanical shutter must be used. In contrast, frame transfer CCD arrays use a photosensitive array and an identical array that is masked off from any incident light. After an exposure, the charge of each pixel is shifted down through the array into the masked array, and the masked array is then read out in the same manner as a full frame CCD array. Frame transfer CCD arrays offer some level of electronic shuttering, but this is limited to a few milliseconds. The pixel area for both full frame and frame transfer CCD arrays is 100% photosensitive, thus the pixel width is the same as the pixel pitch (spacing). Interline transfer CCD arrays have nonphotosensitive storage registers located adjacent to the photosensors. This enables the rapid transfer of charge (in parallel) from the pixels into the storage registers. This makes it possible to rapidly shutter the array electronically, where exposure times of the order of microseconds or less are possible. The interline transfer arrays also enable "frame straddling," whereby two frames can be captured in rapid succession. For example, standard RS-170 format video cameras based on interline transfer arrays can acquire two video fields with less than 10 µs between frames (24). More expensive scientific grade interline transfer cameras report interframe times as short as 200 ns. Frame-straddling by video cameras is useful for double-pulse imaging in high-speed flows (25), whereas frame-straddling by higher resolution scientific/industrial cameras (e.g., Kodak ES1.0 and ES4.0) is now becoming the norm for PIV because it enables the use of cross-correlation processing algorithms. The main drawback of interline transfer imagers is that they tend to be noisier than either full frame or frame transfer imagers. The main reason for this is that the storage registers are located adjacent to the photosensitive sites; therefore the photosensitive area of the pixel is substantially smaller than the physical area of the pixel. The fraction of the pixel area that is photosensitive is called the "fill factor" and is typically 20–30% for an interline transfer CCD. As will be discussed later, the signal scales with the number of photons collected per pixel; thus low fill factors result in low signals. Some manufacturers mitigate this problem to some extent by using microlenses over each pixel to collect light across a larger area and can increase the fill factor to about 60%. If neither electronic shuttering nor frame straddling is required, then full frame or frame transfer imagers are desired to maximize the signal-to-noise ratio (SNR). Generally, the relatively long shutter times are not a problem when pulsed lasers are used because the laser pulse duration acts as the exposure time. Intensified CCD cameras (ICCD) are used for low light-level imaging and for very short exposure times (e.g., as
low as a few nanoseconds). The most common type of image intensifier consists of a photocathode, a microchannel plate, a phosphor screen, and a mechanism to couple the screen to the CCD (26,27). Photons that are incident on the photocathode eject photoelectrons, which in turn are amplified in the microchannel plate. The amplified electrons contact the phosphor screen causing photon emission, and these photons are collected by the CCD. The phosphor screen is usually coupled to the CCD by a fiber optic bundle, although lens coupling is also used. Image intensifiers are shuttered by switching on and off, or ‘‘gating,’’ the photocathode by a high-voltage pulse. The electron gain is a function of the voltage applied across the microchannel plate. Short duration gating is necessary to reject the background luminosity of very luminous flows, such as sooting flames or plasmas. Because the duration of the laser scattering signal is often of the order of several nanoseconds, short gates greatly reduce the background luminosity but do not affect the signal. One of the main drawbacks of intensifying CCD cameras is that the intensifiers tend to have both lower resolution and lower signal dynamic range than the bare CCD. The dynamic signal range is usually limited by saturation of the microchannel plate, particularly at high electron gain (26), rather than by saturation of the CCD itself. Furthermore, as will be shown later, it is unlikely that an ICCD camera will provide better SNR than a low-noise CCD camera under the constraint that a certain minimum SNR is required for an image to be useful for quantitative analysis. For these reasons, ICCD cameras are preferred to low-noise UV-sensitive CCD cameras only when fast gating is required, which is why they are primarily used for imaging high-temperature gases.
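The practical impact of quantum efficiency, fill factor, and gating can be estimated with a few lines of arithmetic. The sketch below is an illustration only; the specific efficiencies, fill factors, exposure times, and photon count are assumptions drawn from the ranges quoted above, not properties of any particular camera:

```python
def photoelectrons(photons_on_pixel, quantum_efficiency, fill_factor=1.0):
    """Photoelectrons collected by one pixel for a given incident photon count."""
    return photons_on_pixel * quantum_efficiency * fill_factor

def background_suppression(unshuttered_exposure_s, gate_s):
    """Factor by which gating reduces collected flow luminosity; the laser signal is
    unaffected because the scattering pulse is much shorter than the gate."""
    return unshuttered_exposure_s / gate_s

photons = 1000.0  # photons incident on one pixel from a single laser pulse (illustrative)

print(photoelectrons(photons, 0.45))          # full-frame CCD, QE ~ 45%: ~450 e-
print(photoelectrons(photons, 0.45, 0.25))    # interline CCD, ~25% fill factor: ~110 e-
print(photoelectrons(photons, 0.45, 0.60))    # interline CCD with microlenses: ~270 e-

# A 10-ns intensifier gate versus a 10-ms mechanical shutter:
print(background_suppression(10e-3, 10e-9))   # luminosity reduced by a factor of 1e6
```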
SIGNAL AND NOISE

One of the most critical issues in flow imaging is obtaining an adequate SNR. Imaging measurements that use laser light scattering are particularly susceptible to a low SNR because the laser beam must be spread out into a sheet; thus, signals are lower by hundreds to thousands of times, compared to a point measurement with the same laser energy. Figure 3 shows a generic camera system that views a region in the flow that is illuminated by a laser light sheet of height yL and thickness Δz. Assume that the camera uses an array sensor and a lens of known focal length f and limiting aperture diameter D. Each pixel of the camera, of width δx and height δy, transforms to a region in the flow of dimensions Δx = δx/m, Δy = δy/m, where m = yi/yo is the magnification and yi and yo are as defined in Fig. 3. Each pixel also spatially integrates the signal in the z direction across a distance equal to the sheet thickness Δz. Note that usually in flow imaging, the image is inverted, and the magnification is typically less than unity, that is, the object is minified. Now, assuming that a pulsed laser light sheet is used that has a local fluence FL, then the number of photons collected by each pixel Spp will be

Spp = (FL/hν) · (dσ/dΩ) · Ω ΔV n ηt,    (3)

where h is Planck's constant, ν is the laser frequency, ΔV = ΔxΔyΔz is the volume imaged by each pixel, dσ/dΩ is the differential scattering cross section, n is the number density of the scattering medium, Ω is the solid angle subtended by the lens, and ηt is the transmission efficiency of the collection optics (lens and spectral filters). For a CW laser, FL = IL t, where IL is the laser intensity (power flux density) and t is the integration time. The solid angle, Ω = (πD²/4)/zo² (where zo is the distance from the object to the lens), is related to the magnification and f number (f# = f/D) of the lens by

Ω = (π/4) · m² / [(f#)² (m + 1)²].    (4)
Figure 3. Planar laser imaging of a flow field using an array detector. The sketch defines the laser sheet height yL and thickness Δz, the collection lens, the object and image distances yo and yi, the CCD pixel dimensions δx and δy, and the corresponding field-of-view dimensions Δx and Δy in the object plane.
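As a rough illustration of how Eqs. (3) and (4) are used to budget an experiment, the Python sketch below evaluates the collection solid angle and the photons collected per pixel. All of the numerical inputs (cross section, number density, sheet dimensions, pixel size, and transmission) are illustrative assumptions, not values taken from the article:

```python
import math

H = 6.626e-34   # Planck's constant (J s)
C = 2.998e8     # speed of light (m/s)

def solid_angle(f_number, m):
    """Collection solid angle of the lens, Eq. (4)."""
    return math.pi * m**2 / (4.0 * f_number**2 * (m + 1.0)**2)

def photons_per_pixel(fluence, wavelength, dsigma_domega, n, omega, dV, eta_t):
    """Photons collected per pixel, Eq. (3): Spp = (FL/hv)(dsigma/dOmega) Omega dV n eta_t."""
    photon_energy = H * C / wavelength
    return (fluence / photon_energy) * dsigma_domega * omega * dV * n * eta_t

# Illustrative numbers only:
wavelength = 532e-9                 # frequency-doubled Nd:YAG (m)
E_L, y_L, dz = 0.3, 0.05, 300e-6    # pulse energy (J), sheet height (m), sheet thickness (m)
fluence = E_L / (y_L * dz)          # J/m^2, uniform-sheet approximation
dx = dy = 9e-6 / 0.5                # 9-um pixels at m = 0.5 map to 18 um in the flow
dV = dx * dy * dz
omega = solid_angle(f_number=1.2, m=0.5)
S_pp = photons_per_pixel(fluence, wavelength, 6e-32, 2.5e25, omega, dV, 0.8)
print(f"Omega = {omega:.2e} sr, Spp = {S_pp:.0f} photons/pixel")
```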
Assuming that the laser sheet is uniform (i.e., the fluence is constant), then the fluence can be approximated as FL = EL/(yL Δz), where EL is the laser energy. Now combining Eqs. (3) and (4), and substituting Δx = δx/m and Δy = δy/m gives

Spp = (EL δxδy)/(hν yL) · (dσ/dΩ) · (π/4) · 1/[(f#)² (m + 1)²] · n ηt.    (5)

Equation (5) shows that the photons collected per pixel actually increase as m → 0, or as the camera is moved farther from the object plane. This may seem counterintuitive because the solid angle subtended by the lens progressively decreases. The reason for this is that Δx and Δy increase as the magnification decreases, which means that each pixel collects light from a larger region of the flow. This is correct as the problem has been posed, but is not realistic, because it assumes that the laser sheet has the same fluence, regardless of the field of view. However, in practice, as the camera is moved farther away, the laser sheet must be enlarged to accommodate the larger field of view. To see this effect, assume that the condition yL = yo must be maintained as the magnification is changed; in this case yL = Np δy/m, where Np is the number of pixels in one column of the array. Now, Eq. (5) reduces to

Spp = (EL yi)/(hν Np²) · (dσ/dΩ) · (π/4) · m/[(f#)² (m + 1)²] · n ηt.    (6)

Figure 4. Relative variation of photons-per-pixel (Spp) versus magnification for a typical planar imaging experiment.

This form of the equation is probably the most useful for seeing the effect of varying different parameters. For example, Eq. (6) shows that the signal depends only on the laser energy (actually, the term EL/hν represents the total number of incident photons) and is independent of Δz, or on how tightly the sheet is focused. Although tighter focusing increases the fluence, this effect is counteracted by a decrease in the number of molecules that is available to scatter the light. In addition, as the number of pixels is increased (at fixed detector size yi), the signal decreases because the pixels are smaller and thus collect light from a smaller area of the flow. This shows the importance of having large pixels (or small Np at fixed yi) to improve the SNR, albeit possibly at the expense of resolution. The trade-off between SNR and resolution is a fundamental one, whose manifestation in point measurements is the trade-off between SNR and bandwidth (or response time). Equation (6) also shows that Spp ∼ m/(m + 1)², a dependence that is plotted in Fig. 4. Here, it is seen that the signal is maximized at a magnification of unity and that there is an abrupt decrease in signal as m → 0. Equation (6) also shows that the signal is inversely proportional to (f#)², and thus it is essential in many imaging techniques to use lenses that have low f numbers. For several techniques such as PLIF, Rayleigh scattering, and Raman scattering in gas-phase flows it is difficult to obtain adequate SNRs using lenses whose f numbers are higher than f/1.2. Equation (6) gives the number of photons incident on a pixel of a generic detector. The resulting signal then consists of the photoelectrons that are generated, whether by creating an electron–hole pair in a CCD or by ejecting an electron from a photocathode. The signal Se (in units of electrons, designated as e−) is given by

Se = η Spp G,    (7)
where G is the overall electron gain from the photocathode to the CCD. For an unintensified CCD, G = 1. The noise in the signal will have several sources, but the dominant sources in scientific grade CCD and ICCD cameras are shot noise and "read" noise. Shot noise results from statistical fluctuations in the number of photoelectrons generated at each pixel. The statistical fluctuations of photoelectrons and photons exhibit Poisson statistics, for which the variance is equal to the mean (28). Most of the shot noise arises from statistical fluctuations in the photoelectrons generated, although some noise is induced in the amplification process of image intensifiers. The shot noise (in units of e−), which is the square root of the variance, is given by (29)

Nshot = G (ηκSpp)^(1/2),    (8)

where κ is the noise factor. The noise factor quantifies the noise that is induced through the overall gain process between the photocathode and the array; for an ICCD, it is gain dependent and falls within the range of 1.5 < κ < 2.5. In an unintensified CCD, G = κ = 1, and the shot noise is equal to (ηSpp)^(1/2), which is the square root of the number of photoelectrons collected per pixel during the integration period. One way of interpreting the shot noise in a detector array is to consider the case where the array is composed of identical pixels and is illuminated by a spatially uniform light source. If it is assumed that each pixel collects an average of 1000 photons during the integration time and if it is further assumed that η = 0.1, then, on average, each pixel will collect 100 photoelectrons. However, the actual number of photoelectrons collected will vary from pixel to pixel, and compiling a histogram of the pixel values will reveal that the variance of the distribution is equal to the mean number of photoelectrons collected per pixel.
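The shot-noise behavior described above is easy to verify numerically. The following sketch (assuming NumPy is available; the array size and mean signal are arbitrary illustrative choices) simulates an unintensified CCD (G = κ = 1) viewing a spatially uniform source and confirms that the variance of the collected photoelectrons equals the mean, so that the shot-noise-limited SNR is (ηSpp)^(1/2):

```python
import numpy as np

rng = np.random.default_rng(0)

eta = 0.1          # quantum efficiency (illustrative)
photons = 1000     # mean photons per pixel during the integration time
pixels = 512 * 512 # number of pixels in the simulated uniform image

# Each pixel collects a Poisson-distributed number of photoelectrons whose
# mean is eta * photons (here 100 e-, as in the example above).
photoelectrons = rng.poisson(lam=eta * photons, size=pixels)

mean = photoelectrons.mean()
var = photoelectrons.var()
print(f"mean = {mean:.1f} e-, variance = {var:.1f}")          # variance ~ mean
print(f"shot-noise-limited SNR = {mean / np.sqrt(var):.1f}")  # ~ sqrt(100) = 10
```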
The dominant noise source intrinsic to scientific grade CCD cameras is "read noise" (30). Read noise is incurred in the output registers in the process of converting the charge of each pixel into a voltage that can be read by an analog-to-digital converter. A pixel is read by transferring the charge of each pixel to a small capacitor, whose integrated charge is converted to a voltage by an on-chip amplifier. The dominant sources of read noise are dark-current shot noise, "reset noise," and output amplifier noise. Dark current is the current that is generated in the absence of incident light due to thermally induced charge carriers. Cooling a CCD greatly reduces the dark current. For example, an uncooled CCD might generate a dark current of 300 e−/s at 20 °C, but only 1 e−/s at −40 °C. Owing to the relatively short exposure and readout times that are typically used in flow imaging (of the order of 10 seconds or less), shot noise in dark current is not usually a large contributor to the noise in cooled CCD arrays. Reset noise is injected into the small capacitor by a switching transistor, whose job is to reset the capacitor to a reference voltage in preparation for reading the next pixel's charge. This switching transistor contaminates the capacitor charge with both "digital feedthrough" and thermal noise. Digital feedthrough noise is caused by capacitive coupling of the clock signals through the switching transistor. These noise sources can be greatly limited by slow (low-bandwidth) readout rates and correlated double sampling (30,31). Because means have been developed to reduce these noise sources, the intrinsic camera noise is typically limited by the on-chip output amplifier to a few electrons rms per pixel (typically 5–20 e−). When photoelectron shot noise is not the only noise source, then it is assumed that the noise sources are uncorrelated and therefore their variances add. In this case, the SNR is given by (29)

SNR = η Spp G / (ηκSpp G² + Ncam²)^(1/2),    (9)

where Ncam is the intrinsic background noise of the camera (in electrons rms) and includes contributions from amplifier noise, digital feedthrough noise, thermal noise, dark-current shot noise, and quantization noise from the analog-to-digital converter. There are several interesting implications of Eq. (9). The first is seen by considering the limit when the signal is dominated by shot noise, that is, when ηκSpp G² ≫ Ncam². This shot-noise-limited operation of the detection system occurs when either the read noise is small or when the signal is high. Equation (9) also shows that it is possible to obtain shot-noise-limited operation by increasing the gain until the first noise term dominates the other. This is the way an image intensifier works; it provides very high electron gain through the microchannel plate and thus causes the shot noise to overwhelm the intrinsic noise sources in the camera. It may seem odd that the goal is to increase the noise, but the signal is also increased as the gain increases, so the SNR either improves or remains constant. At low gain, the signal will be detector-noise-limited. As the gain is increased to arbitrarily high levels, the SNR continues to improve until it reaches the shot noise limit, beyond which the SNR is constant. This is seen in Eq. (9) by letting G → ∞, in which case the SNR becomes independent of G. Because electron gains of 10³ are possible by using typical single-plate microchannel intensifiers, it is possible to operate in the shot-noise-limited regime, even when the camera that stores the image has relatively high noise, such as a video format CCD camera. The dynamic range of a CCD (defined as the ratio of the maximum to the minimum usable signals) is limited by the well depth, which is the total number of photoelectrons that can be stored in a CCD pixel, and the intrinsic noise of the camera. Specifically, the dynamic range DR is given by (29)

DR = (Se,sat − Sdc) / Ncam,    (10)
where Se,sat is the signal at saturation (full well) and Sdc is the integrated dark charge. For example, for a cooled slow-scan CCD array whose integration time is short (hence low Sdc) and has a well depth of 10⁵ e− and noise of 10 e−, then DR ≈ 10⁴, which is much larger than can usually be obtained in single-shot planar imaging. The dynamic range of an ICCD can be much smaller than this because the electron gain from the photocathode to the CCD effectively reduces the well depth of the CCD (29). For example, if the overall electron gain is 10², then a CCD that has a well depth of 10⁵ e− will saturate when only 10³ photoelectrons are generated at the photocathode. In addition, ICCD cameras may have an even lower dynamic range than that allowed by saturation of the CCD well because of saturation of the microchannel plate (26). Figure 5 shows how the SNR varies as a function of the number of photons per pixel for cameras of high and low read noise, as might be found in video format and slow-scan CCD cameras, respectively. In this figure, it is assumed that η = 0.7 for the low-noise camera and Ncam = 10 e−, and η = 0.7 and Ncam = 200 e− for the high-noise camera. Also shown is the case where the high-noise camera has been intensified. It is assumed that the intensified camera
Figure 5. Variation of the SNR versus signal (Spp, in photons per pixel) for three different camera systems: a low-noise CCD, a high-noise CCD, and an intensified CCD. The camera-noise-limited and shot-noise-limited regimes are indicated.
has a lower quantum efficiency (η = 0.2) and G = 500. Dark charge has been neglected in all cases. The high-noise camera is camera-noise-limited for the entire range of Spp (hence, the slope of unity on the log–log plot), whereas the low-noise camera is camera-noise-limited only for low Spp. As expected, the SNR is substantially higher for the low-noise camera at all Spp. At higher Spp, the low-noise camera becomes shot-noise limited, as seen by the region where the slope is one-half on the log–log plot. By intensification, the high-noise camera reaches the shot-noise limit even at very low Spp, thus resulting in an SNR that is even higher than that of a low-noise camera. However, for Spp greater than about 60, the low-noise camera outperforms the intensified camera, owing to its higher quantum efficiency. Figure 5 also shows that at an overall electron gain of 500, if the well depth is 10⁵ e−, the intensified camera saturates the CCD when 1000 photons are incident per pixel. One point to consider is that for flow imaging, it is usually not necessary or desired to intensify a slow-scan low-noise CCD camera, unless gating is required to reject a luminous background. The main reason is that if the signal is so low that read noise is a significant contributor to the total noise, then it is unlikely that single-shot images will be useful for quantitative purposes. For example, assume that a minimum SNR of 20 is desired for quantitative analysis and that the intensified slow-scan camera has κ = η = 1, is operated at high gain, and the CCD has 10 e− rms of read noise. If 100 e− are collected per pixel, then the high gain overwhelms the read noise, and the signal is shot-noise limited, that is, SNR = (100)^(1/2) = 10, which is well below our minimum value. Now, assuming that 500 e− are collected, then the SNR based only on shot noise is (500)^(1/2) = 22. However, at these signal levels, the signal is nearly shot-noise-limited, even without the intensifier, because including the camera noise gives a SNR ≈ 20; thus there would be very little benefit in intensifying the CCD. The fact that the intensifier is likely to have a smaller dynamic signal range, worse resolution, lower quantum efficiency, and a larger noise factor than the CCD, makes intensification even less desirable. It is also interesting to consider how the high-noise camera would perform with the signal of 500 e−. In video format cameras, the read noise will be about 100–200 e− rms. Using the lower value, the SNR for the video camera would be 500/100 = 5. In this case, adding an image intensifier would be an advantage because high electron gain could be used to obtain shot-noise-limited operation, so that the SNR = (500)^(1/2) = 22 (assuming equal η with and without intensification).

IMAGE CORRECTIONS

Quantitative imaging always requires several correction steps so that the measured signal can be related to the flow property of interest and to ensure that the spatial structure of the object is faithfully represented by the image. First, consider corrections to the signal measured at each pixel of the array. Most planar imaging involves only relative measurements of signal intensity, from which absolute measurements can be obtained by calibrating a single point within the image. To obtain an image
that represents quantitatively accurate relative intensity measurements requires making several corrections to the measured image. For example, let Se (x, y) represent the desired signal level at a given pixel or location on the array (x, y). By ‘‘desired’’ it is meant that Se (x, y) is proportional to the number of photons incident on that pixel originating from the scattering process of interest. The signal Se can be related to the total signal (Stot ) recorded at that pixel by the imaging system through the relationship
Stot(x, y, ti, tro) = w(x, y)[L(x, y)Se(x, y) + Sback(x, y, ti)] + Sdark(x, y, tro),    (11)
where L(x, y) is a function that is proportional to the laser sheet intensity (or fluence) distribution function, Sback is the signal resulting from unwanted background light, Sdark is the fixed pattern signal that occurs with no light incident on the detector, ti is the exposure time, and tro is the array readout time (which includes the exposure time). The function w(x, y) is the "white-field" response function, which accounts for variation in the signal across an image of a uniformly white object. It has been assumed that a pulsed laser is used as the light source, in which case the signal Se is not a function of the exposure time. Furthermore, in general, all of the functions involved in the correction may vary from shot to shot. The desired scattering signal is obtained by solving for Se in Eq. (11):

Se(x, y) = {Stot(x, y, ti) − [w(x, y)Sback(x, y, ti) + Sdark(x, y, tro)]} / [w(x, y)L(x, y)].    (12)
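In practice, Eq. (12) amounts to simple image arithmetic. The sketch below is a minimal NumPy illustration of that arithmetic; the synthetic arrays, array sizes, and the guard against division by zero are assumptions made for the example, not part of the correction procedure itself:

```python
import numpy as np

def correct_image(S_tot, S_back, S_dark, w, L, eps=1e-12):
    """Eq. (12): Se = (Stot - [w*Sback + Sdark]) / (w*L).

    S_tot  : raw single-shot image
    S_back : background image (unwanted scattered light)
    S_dark : dark (fixed-pattern) image for the same exposure/readout times
    w, L   : white-field response and laser-sheet profile (averaged images)
    """
    return (S_tot - (w * S_back + S_dark)) / np.maximum(w * L, eps)

# Illustrative usage with synthetic arrays standing in for measured images:
ny, nx = 1024, 1024
S_dark = np.full((ny, nx), 50.0)
S_back = np.full((ny, nx), 5.0)
w = np.ones((ny, nx))                                        # ideal white-field response
L = np.linspace(0.5, 1.0, ny)[:, None] * np.ones((ny, nx))   # collimated sheet: L = L(y)
S_e_true = 200.0 * np.ones((ny, nx))
S_tot = w * (L * S_e_true + S_back) + S_dark                 # forward model, Eq. (11)
S_e = correct_image(S_tot, S_back, S_dark, w, L)             # recovers ~200 everywhere
```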
Equation (12) gives a means of obtaining the desired scattering signal image by arithmetic processing of the signal and correction images. Sdark(x, y, tro) is not noise because it is an offset that is nominally the same for each image that has the same exposure and readout time. The dark image is obtained by acquiring an image when the shutter is closed (or when the lens cap is on) and using the same integration and readout times as in the experiment. The background signal Sback(x, y) is due to reflections of the laser from walls/windows, natural flow luminosity (as in combustion), fluorescence from windows or species not of interest, and external light sources. For nonluminous flows, a good approximation to the background can be obtained by acquiring an image when the laser beam is present but without the scattering medium (e.g., without the fluorescent species seeded into the flow). This is only an approximation of the actual background because the light itself that is scattered from particles/molecules in the flow can reflect from the walls and windows; therefore, an image obtained when the scattering medium is omitted may not have the same background signal as during an actual experiment. There is usually no simple way around this problem, but fortunately, this effect is often negligible. It is important to note that the background cannot be measured directly because it is the function wSback that is actually measured when a background image is acquired. In fact, the background image is also affected by the dark signal; therefore, if the background image
is acquired by using the same exposure and readout times as the scattering signal image, then this yields the term Scorrection = (wSback + Sdark ) in Eq. (12). In this case, the correction relationship is simply, Se = (Stot − Scorrection )/(wL). Note also that to reduce the effect of noise on the correction procedure, the images Scorrection (x, y), w(x, y), and L(x, y), should be average images, unless the corrections are made on a single-shot basis. If the flow is unsteady and luminous, the luminosity varies from shot to shot, and therefore, it is more difficult to correct for the background signal. In this case, it is useful to consider the signal-to-background ratio (SBR), Se /Sback , which is sometimes confused with the SNR. Background luminosity is usually not random, and thus it is not noise (although it may appear so if one does not have an easy way to correct for it). One option for dealing with background luminosity is to reduce the luminosity incident on the array through gating, by using an intensified camera or by using spectral filters in front of the camera that pass the scattered light but reject the bulk of the luminosity. Another option is to use a second camera to capture an image of the flow luminosity a very short time before (or after) the laser fires. This assumes, of course, that the flow is essentially frozen for each camera image, which is unlikely to be the case for the millisecond shutter times used for full frame CCD cameras, but it is likely to be true when using microsecond gates and an intensified camera. The laser sheet intensity distribution function, L(x, y), is not easy to obtain, but it can be approximated in a few different ways. In general, the sheet intensity varies in both the x and y directions and from shot to shot. Figure 2 shows a technique, described in (32), for measuring L(x, y) on a single-shot basis. For single-shot corrections, it is necessary to collimate the laser sheet, so that L is a function only of y. In this case, part of the laser sheet energy can be extracted, as done using the glass flat in Fig. 2, and directed onto a target. The glass flat reflects several percent of the laser light from each surface, depending on the angle of incidence (33). In Fig. 2 the target is a white card, although a cell containing fluorescent material could also be used (e.g., laser dye in water, or acetone vapor). The scattering (or fluorescence) from the target must obviously be linear in its response to the incident light intensity and must scatter the light uniformly. In Fig. 2, a video camera is used to image the laser sheet intensity profile. Rather than using a target, it is also possible to image the beam directly using a 2D or linear array. The main drawback of this technique is the risk of damage to the array by the focused laser beam. The scattering image and the sheet profile can be registered by blocking the beam, before the optical flat, at two discrete vertical locations using two very thin wires. Both the scattering image and the profile image will include a shadow of the wires, which can be used to index the two images. If the laser sheet is not collimated, but diverging, this makes it much more difficult to correct for the sheet on every shot. In this case, the laser energy and distribution must be sufficiently repeatable so that L(x, y) can be obtained at a time different from that for the scattering image. The correction image is obtained by
placing a uniform, linear scattering medium in the field of view. Sometimes, it is possible to use the Rayleigh scattering from the air itself, although it is more common to have to introduce a more efficient scattering medium, such as smoke or a fluorescent test cell. Care must be taken when using fluorescent materials, such as laser dyes or acetone vapor, because they will cause substantial absorption of the beam if the concentration is too high. Unless the absorption itself is corrected for, the sheet intensity distribution will be incorrect. Therefore, when using fluorescent media, it is best to use very low concentrations to keep the absorption to less than a few percent across the image. The low concentration may necessitate averaging the correction image over many shots to obtain sufficient SNR. The white-field response function, w(x, y), is obtained by imaging a uniformly white field, such as a uniformly illuminated white card. The signal of a white-field image will tend to decrease from the center of the image because the solid angle subtended by the lens is smaller for point sources located near the periphery of the field of view. The variation in intensity across an image formed by a circular aperture will theoretically follow the "cosine-to-the-fourth" law, or I(β)/I(0) = cos⁴ β, where β is the angle between the optical axis and a line connecting the center of the lens aperture and the given point on the object plane (18). The white-field response function will also enable correction for variable response of the pixels in the array. Note that the dark charge contribution to the signal must also be subtracted from the white-field image. In some cases, it will be necessary to correct for geometric distortion. The distortion in an image is typically larger for points farther from the optical axis. For this reason, a square will be imaged as an object whose sides either bulge out (called barrel distortion) or in (called pincushion distortion). When using high quality photographic lenses, the maximum distortion is usually small (often less than a pixel). However, when it must be corrected for, this is usually accomplished by imaging a rectangular grid and then warping (or remapping) the image so that each point of the grid is consistent with its known geometry (34). The warping procedure involves finding a large number of "tie" points across the image such as the "points" where two gridlines cross and using these to solve for a set of polynomial coefficients required for the remapping. Pixels other than the tie points are remapped by interpolating among coefficients for the tie points.

IMAGING SYSTEM RESOLUTION

Even though the proper specification of the resolution of the imaging system is often critically important to a particular application, it is often neglected in flow imaging studies. For example, it is not unusual to find scalar imaging papers that quote the resolution in terms of the area that each pixel images in the flow. In many cases, however, this is not the factor that limits the resolution, particularly when using fast (low f#) optics. A somewhat better approach involves imaging a standard resolution target, such as the USAF or NBS targets (35), available
from major optics companies, which are composed of a periodic sequence of light and dark bars of varying spatial frequency. The user typically reports the resolution limit as the smallest set of bar patterns for which a contrast modulation can be distinguished. In some cases, this may give the user an idea of the limiting resolution of the imaging system, but this technique is subjective, can be misleading because of aliasing (discussed further below), and is inadequate as a measure of the limitations that finite resolution imposes on the data. The resolution is fundamentally related to the point-spread function (PSF), which is the intensity distribution at the image plane, Ii(x, y), produced by imaging an infinitesimally small point source of light. The overall size of the PSF is referred to as the blur spot, whose diameter is denoted as dblur. In the diffraction limit, the PSF will be the Airy function (33), which has a blur spot diameter that can be approximated as the Airy disk diameter, (dblur)dl, given by the relationship (20)

(dblur)dl = 2.44 (m + 1) λ f#.    (13)
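For reference, Eq. (13) gives the following diffraction-limited blur-spot diameters for the unity-magnification, 532-nm conditions discussed below in connection with Fig. 6 (a quick illustrative calculation, not a measurement):

```python
def airy_blur_diameter_um(m, wavelength_um, f_number):
    """Eq. (13): diffraction-limited blur spot (d_blur)_dl = 2.44 (m + 1) lambda f#."""
    return 2.44 * (m + 1.0) * wavelength_um * f_number

for f_number in (22.0, 11.0, 2.8):
    d = airy_blur_diameter_um(m=1.0, wavelength_um=0.532, f_number=f_number)
    print(f"f/{f_number:g}: {d:.1f} um")   # ~57, ~29, and ~7 um; compare the ~50-um
                                           # measured spot at f/2.8 described below
```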
Most flow imaging experiments employ camera lenses designed for 35-mm film cameras. When used at high f# and for magnifications that are not too far off design, these lenses give nearly diffraction-limited performance. The lenses have several lens elements that are necessary to correct for the many types of aberrations, including spherical, chromatic, coma, astigmatism, and distortion. However, chromatic aberrations are not usually a problem in flow imaging, because in most cases the scattered light is effectively monochromatic. In practice, such photographic lenses used at low f# and off-design produce spot sizes that can be several times larger than the Airy disk. For example, Fig. 6 shows digitally sampled images of a point light source (λ = 532 nm) whose diameter is approximately two microns in the object plane, taken at unity magnification by a Nikon 105-mm Micro lens coupled to a Kodak ES1.0 1 k × 1 k CCD camera (9 µm × 9 µm pixels). For comparison, the length of the horizontal white bar below each image of Fig. 6 is equal to the diffraction-limited spot size computed from Eq. (13). Figure 6 shows that the spot size is approximately diffraction-limited at f/22 and f/11, but at f/2.8, the spot size is about 50 µm, which is substantially larger than the diffraction-limited value. The increase in the blur spot, relative to the diffraction limit, results from the greater aberrations of the lower f#. The PSF directly affects the resolution because the image is the result of the convolution of the PSF with the irradiance distribution of the object. Therefore, the smallest objects that can be imaged are related to the size and shape of the PSF; worse resolution is associated with a broader PSF or larger blur spot. In addition to setting the limiting resolution, or the highest spatial-frequency structure that can be resolved, the imaging system also tends to blur increasingly smaller scale structures. Because of this, it is usually not sufficient to simply state the limiting resolution of the system. For example, it will be shown later that measurements of scalar gradients, such as derived from temperature or
Figure 6. Digitally sampled point-spread functions acquired using a Kodak ES1.0 CCD camera (9 × 9 µm pixels) fitted with a Nikon 105-mm lens. The object imaged is a point source approximately 2 µm in diameter, and the magnification is unity. The three images are for three different aperture settings: (a) f /22, (b) f /11, and (c) f /2.8. The white line below each spot is the diameter of the diffraction-limited blur spot.
concentration fields, can exhibit substantial errors due to resolution limitations, even at frequencies substantially lower than the limiting resolution of the system. The blurring incurred by an imaging system that has finite resolution is essentially a result of the system’s inability to transfer contrast variations in the object to the image. The accepted means of quantifying how accurately an imaging system transfers contrast is the optical transfer function (OTF) (18,35). The OTF, which is analogous to a linear filter in time-series analysis, describes the response of the imaging system to a sine wave contrast variation in the object plane. For example,
assume that the intensity distribution of the object is described by the equation

Io(x) = b0 + b1 cos(2π sx),    (14)

where Io is the intensity of the object, b0 and b1 are constants, and s is the spatial frequency (typically in cycles/mm, or equivalently, line-pairs/mm). It can be shown that a linear system will image the object as a sine wave of the form (18)

Ii(x) = b0 + c1 cos(2π sx − φ),    (15)
where Ii is the intensity of the image, c1 is a constant, and φ is a phase shift. Examples of functions are shown in Fig. 7, where the image exhibits both a reduction in the contrast (i.e., c1 < b1) and a phase shift, which corresponds to a shift in the location of the wave. Because the phase shift is associated with a shift in the position of the image, it is generally associated with geometric distortion. The OTF can be described mathematically by the relationship

OTF(s) = MTF(s) e^(iPTF(s)),    (16)
where MTF(s) is the modulation transfer function and PTF(s) is the phase transfer function. The MTF describes the contrast transfer characteristics of the imaging system, and the PTF describes the phase transfer characteristics. Equation (16) shows that the magnitude of the OTF is the MTF, that is, MTF(s) = |OTF|. The MTF is generally considered more important in describing the transfer characteristics of an imaging system because
phase differences typically occur only at high spatial frequency where the MTF is very small (35). The MTF is measured by imaging objects that have a sine wave irradiance variation of known spatial frequency. The maximum and minimum intensities are defined as Imax and Imin, respectively, and the contrast of the object is defined as Co = (Iomax − Iomin)/(Iomax + Iomin). The contrast of the image is defined similarly as Ci = (Iimax − Iimin)/(Iimax + Iimin). The MTF is then defined as

MTF(s) = Ci/Co.    (17)
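A simple way to see how Eq. (17) is applied is to synthesize a sine-wave object of known contrast, blur it as an imaging system would, and recover the MTF at that frequency from the image contrast. The sketch below does this with an assumed Gaussian line-spread function purely for illustration; the blur width, field size, and frequency are arbitrary choices:

```python
import numpy as np

def contrast(signal):
    """C = (Imax - Imin) / (Imax + Imin), as defined above."""
    return (signal.max() - signal.min()) / (signal.max() + signal.min())

# Object: Io(x) = b0 + b1 cos(2*pi*s*x), Eq. (14); lengths in mm, s in cycles/mm.
x = np.linspace(0.0, 1.0, 4000)           # 1-mm field, finely sampled
b0, b1, s = 1.0, 0.5, 20.0                # mean level, modulation, spatial frequency
Io = b0 + b1 * np.cos(2.0 * np.pi * s * x)

# Stand-in imaging system: convolution with a Gaussian line-spread function.
sigma_mm = 0.005                           # assumed 5-um blur (illustrative)
kernel_x = np.arange(-0.05, 0.05, x[1] - x[0])
lsf = np.exp(-0.5 * (kernel_x / sigma_mm) ** 2)
lsf /= lsf.sum()
Ii = np.convolve(Io, lsf, mode="same")     # image, Eq. (15), up to edge effects

# Eq. (17): MTF at this frequency is the ratio of image to object contrast
# (the ends of the arrays are trimmed to avoid convolution edge effects).
print(f"MTF({s:.0f} cycles/mm) ~= {contrast(Ii[500:-500]) / contrast(Io[500:-500]):.2f}")
```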
For an imaging system that reproduces the contrast of an image perfectly, the MTF is equal to unity, but for all real imaging systems, MTF → 0 as s → ∞. For example, Fig. 8 shows the MTF of a diffraction-limited f/8 lens at a magnification of unity. The figure shows that the MTF immediately begins decreasing as spatial frequency increases, and implies that there are no nonzero frequencies that can be imaged without contrast distortion. This is different from ideal transfer functions in time-series analysis, which generally have a flat response over a wide range and then roll off only at high frequency. In imaging, it is virtually impossible to measure without some level of contrast distortion. The limiting resolution is often specified by a cutoff frequency sco, where the MTF goes to zero. Note that all diffraction-limited MTFs have a universal shape and a cutoff frequency (sco)dl that is related to the numerical aperture (NA) on the image side of the lens and the wavelength of light (36). In the literature, it is common to see the cutoff frequency related to the lens f# but assuming an infinite conjugate ratio (i.e., object at infinity). However, for noninfinite conjugate ratios and assuming that the image is formed in a medium whose index of refraction is unity, the cutoff frequency depends on the magnification per the relationship

(sco)dl = [λ f# (m + 1)]⁻¹.
Figure 7. Effect of the imaging system on a sine wave object. (a) the irradiance distribution of the object; (b) the irradiance distribution of the image resulting from the convolution of the object sine wave with the LSF. The resulting image exhibits contrast reduction and a phase shift φ. (Adapted from W. J. Smith, Modern Optical Engineering: The Design of Optical Systems, 2e., McGraw-Hill, NY, 1990, with permission of The McGraw-Hill Companies.)

Figure 8. Diffraction-limited MTF for an f/8 lens operated at a magnification of unity. Also shown is a hypothetical MTF for an aberrated imaging system. The cutoff frequency for the diffraction-limited MTF is (sco)dl = 117 cycles/mm.
The diffraction-limited MTF is given by (18)

MTF(s) = (2/π)[α(s) − cos α(s) sin α(s)],   (18)
where α(s) = cos^(−1)[s/(sco)dl]. The human eye can distinguish contrast differences of a few percent, and so the cutoff frequency, particularly for Gaussian MTFs, is sometimes specified as the frequency at which the MTF is 0.04, or 4% of the peak value. Figure 8 also shows a hypothetical MTF for an aberrated optical system. The aberrated system exhibits reduced contrast transferability across the entire frequency range and a lower cutoff frequency. One of the main advantages of the concept of the MTF is that MTFs for different components of an optical system can be cascaded. In other words, the overall MTF is the product of the MTFs of each component. For example, the overall MTF for an intensified camera system is the product of the MTFs for the photocathode, microchannel plate, phosphor screen, optical fiber bundle, and CCD. Because virtually all MTFs exhibit rapid roll-off, the overall MTF is always worse than the worst MTF in the system. It is enlightening to consider an example of how significantly the MTF can affect a certain type of measurement. Assume that it is desired to measure the irradiance gradient dIo/dx, such as is necessary when computing diffusive fluxes. Consider an object that has a sine wave intensity distribution as given by Eq. (14). It can be shown that the image intensity distribution is then given by (18)

Ii(x) = b0 + b1 MTF(s) cos(2πsx − φ).   (19)
The derivatives of both Io and Ii are sine waves; for simplicity, consider only the maximum derivative, which occurs at 2πsx − φ = π/2. In this case, the relative error in the maximum gradient (derivative) is

Error = (dIo/dx − dIi/dx)/(dIo/dx) = 1 − MTF(s).   (20)
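As a concrete numerical illustration of Eqs. (18) and (20), the short sketch below evaluates the diffraction-limited MTF for an f/8, unity-magnification system and the corresponding gradient error 1 − MTF(s). The 532-nm wavelength is an assumed value, chosen so that the computed cutoff frequency matches the 117 cycles/mm quoted in the caption of Fig. 8; the sketch is illustrative only.

import numpy as np

# Diffraction-limited MTF, Eq. (18), with cutoff (sco)dl = 1/[lambda * f# * (m + 1)]
lam = 532e-9          # wavelength [m] (assumed; see text)
f_num = 8.0           # lens f-number
m = 1.0               # magnification
s_co = 1.0 / (lam * f_num * (m + 1.0))     # cutoff frequency [cycles/m]

def mtf_dl(s):
    """Diffraction-limited MTF of Eq. (18); zero beyond the cutoff."""
    s_norm = np.minimum(np.abs(s) / s_co, 1.0)
    alpha = np.arccos(s_norm)
    return (2.0 / np.pi) * (alpha - np.cos(alpha) * np.sin(alpha))

print("cutoff frequency = %.0f cycles/mm" % (s_co / 1e3))      # ~117 cycles/mm
s = 10e3                                                        # 10 cycles/mm, in cycles/m
print("MTF at 10 cycles/mm = %.2f" % mtf_dl(s))
print("gradient error, Eq. (20) = %.0f%%" % (100 * (1 - mtf_dl(s))))

For this assumed system, the error of Eq. (20) is roughly 10% near 10 cycles/mm, which anticipates the limit quoted in the discussion that follows.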
Equation (20) shows that the error in the gradient is very large (96%) at the 4% MTF point. If an error no larger than 10% is desired, then the MTF at the frequency of interest must be no less than 0.9. This can be a very stringent requirement for some imaging systems. For the diffraction-limited case shown in Fig. 8, the measurements would be limited to frequencies less than 10 cycles/mm or wavelengths greater than 100 µm. As exemplified in Fig. 8, the situation is typically much worse for an actual aberrated imaging system. In practice, the MTF is a very difficult thing to measure directly because it is difficult to achieve a true sine wave contrast modulation in the object plane (35). It is relatively easy, however, to produce black-and-white bar patterns of varying frequency, which is why the MTF is often approximated by this method. The response of the system to a periodic black-and-white bar pattern is sometimes called the contrast transfer function (CTF) (also the square-wave transfer function). The CTF is relatively
easy to measure, and several square-wave targets are available commercially. However, the CTF is not the same as the MTF, although they are related. Because a square wave comprises not a single frequency but a series of odd harmonics, the CTF reflects the imaging system's ability to transfer contrast across a range of frequencies, rather than at just a single frequency as for the MTF. The CTF is related to the MTF by the relationship (28)

MTF(s) = (π/4)[CTF(s) + CTF(3s)/3 − CTF(5s)/5 + CTF(7s)/7 + CTF(11s)/11 − · · ·].   (21)
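As an illustration of Eq. (21), the following sketch (a rough outline rather than a definitive implementation) converts a set of measured square-wave contrast values into an approximate MTF by summing the series term by term. The CTF samples used here are purely hypothetical, and the series is simply truncated after the fifth term.

import numpy as np

# Leading terms of the series in Eq. (21): (harmonic multiple, sign)
harmonics = [(1, +1), (3, +1), (5, -1), (7, +1), (11, +1)]

# Hypothetical measured square-wave (CTF) data: frequency [cycles/mm] vs. contrast
s_meas   = np.array([0.0, 10.0, 20.0, 40.0, 60.0, 80.0, 100.0, 117.0])
ctf_meas = np.array([1.0, 0.92, 0.80, 0.58, 0.38, 0.21, 0.07, 0.0])

def ctf(s):
    # interpolate the measured CTF; assume zero contrast beyond the last point
    return np.interp(s, s_meas, ctf_meas, right=0.0)

def mtf_from_ctf(s):
    """Approximate MTF(s) from the CTF using the truncated series of Eq. (21)."""
    total = sum(sign * ctf(k * s) / k for k, sign in harmonics)
    return (np.pi / 4.0) * total

for s in (10.0, 20.0, 40.0):
    print("s = %5.1f cycles/mm: CTF = %.2f, MTF ~ %.2f" % (s, ctf(s), mtf_from_ctf(s)))

With these illustrative numbers, the computed MTF falls below the measured CTF at every frequency, consistent with the caution below that the CTF tends to overstate resolution.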
The CTF generally has a shape similar to that of the MTF, but it will have higher values of the transfer function at a given spatial frequency; therefore, measuring the CTF tends to give the impression that the resolution is better than it actually is. Despite the ease of measuring the CTF, it is not a recommended means of determining the resolution because it is not very accurate, particularly when using discrete sampling detectors, such as CCD arrays (35,37). An array detector can be thought of as a device that averages, owing to the finite size of the pixels (δx), and samples at a frequency that is the inverse of the pixel pitch (spacing) a. When the image, as projected onto the array detector, is sampled at too low a frequency, then aliasing can occur. Aliasing occurs when high-frequency components of the image are incorrectly sampled as lower frequency components and results in spurious contrast modulation in the sampled image. Aliasing can be avoided by ensuring that the image (before sampling) has no frequency content higher than the Nyquist frequency sN = (2a)−1 . When the spatial frequency content of the image is higher than the Nyquist frequency, then the resulting spurious frequency content can mislead the user into thinking that the resolution is higher than it actually is (38). In flow imaging, the input optics typically have a cutoff frequency that is higher than the Nyquist frequency of the array, and thus aliasing is often a potential problem. Furthermore, the broad range of frequencies in a squarewave target makes it very difficult to avoid aliasing effects. In fact, the avoidance of aliasing when measuring contrast transfer characteristics is imperative because the MTF of a discrete sampling detector is not even defined when aliasing is present (35,37). The reason is that for a device to have an MTF, it must be linear and isoplanatic. Isoplanatic means that the output image is insensitive to movement of the input image. Array detectors are typically sufficiently linear, but they are not necessarily isoplanatic. For example, consider the case where a white/black bar pattern is imaged at a magnification of unity and where the spacing of the bars is equal to the pixel pitch. In this case, the contrast modulation of the image will depend on whether the bars are ‘‘in-phase’’ (aligned with the pixels), or ‘‘out-of-phase’’ (straddling the pixels). Such nonisoplanatic behavior is mainly a problem at spatial frequencies near the Nyquist limit. For this reason, MTFs
for CCDs and other detectors can be considered ‘‘pseudo’’ MTFs only, which have a limited range of applicability. For example, it has been shown that array detectors are approximately isoplanatic for frequencies lower than sN (35). From purely geometric considerations, the array MTF follows a sinc function,

MTF(s) = sin(πδxs)/(πδxs),   (22)
which goes to zero at a frequency of s = 1/δx. In practice, the MTF will be smaller than given by Eq. (22), owing to the diffusion of photon-generated charge carriers, light scatter between detector elements, reflections between the array and the protective window, and nonideal charge-transfer efficiency. For video systems, the processing electronics and frame grabber will also reduce the quality of the MTF. Several studies have shown that a useful means of inferring the MTF is by measuring the line-spread function (LSF). The LSF is the 1-D analog of the PSF because it is the intensity distribution at the image plane resulting from imaging an infinitesimally narrow slit at the object plane. The importance of the LSF is that its FT is the OTF (35). Furthermore, if the LSF is a symmetrical function, then the OTF is real, indicating that there is no phase distortion and the PTF is zero. If the intensity distribution of the PSF is given by p(x, y), then the LSF irradiance distribution is
l(x) = ∫ p(x, y) dy,   (23)

where the integration in y extends from −∞ to ∞.

Consider the sampled PSF represented by the image of Fig. 6c. Because the LSF covers such a small range of pixels, it is not known how the actual LSF is affected by array sampling. For example, if the LSF contains spatial frequency content that is higher than the Nyquist frequency, then aliasing is present, and the sampled LSF may not reflect the true LSF. There is, however, a superior technique for measuring the LSF that does not suffer from aliasing (39). In this technique, the object (whether sine wave or line source) is translated within the object plane (say in the x direction), and the output from a single pixel is monitored as a function of the x location. This technique is free from aliasing errors because the LSF is sampled at only a single point and the pitch of the measurement (i.e., the resolution) can be much finer than the pixel pitch. For example, it is not difficult to obtain 1-µm resolution on standard optical translation stages, which is substantially smaller than the pitch of most CCD arrays. Because good sine wave and line sources may be difficult to generate in practice, a relatively easy technique is to measure the step response function (SRF), which is the intensity distribution at the image plane obtained by scanning a knife-edge across the object plane. In this case, the output of a single pixel is also measured as a function of the knife-edge position. The SRF irradiance distribution k(x) is the convolution of a step function with the LSF. It necessarily follows that the derivative of k(x) is the LSF,

l(x) = dk(x)/dx.   (24)

Figure 9. Schematic of the setup for measuring the step response function (SRF) for a single pixel of a CCD camera. The camera images the back-illuminated knife-edge, and the output of a single pixel is monitored as the knife-edge is translated across the field of view. (Labeled components: tungsten lamp, diffusing screen, narrowband filter, knife-edge on an x–z translation stage, and Kodak ES1.0 CCD camera with a 105-mm lens.)

Figure 10. Measured SRF for an f/2.8 lens operated at unity magnification. The dashed line is the LSF computed from the derivative of the curve fit to the SRF.

Figure 9 shows an example setup for obtaining the LSF by scanning a knife-edge and monitoring the output from a single pixel. Figure 10 shows the SRF obtained using this same setup, for m = 1, f/2.8, where the knife-edge was translated in 2-µm increments. A single 9-µm pixel near the center of the field of view was monitored, and the resulting SRF was very well resolved. Figure 10 also shows an error function curve fit to k(x), where the error function provides a reasonably good fit to the data. Also shown in Fig. 10 is the Gaussian LSF obtained by differentiating the error function curve fit. The LSF is seen to have a 1/e² full width of about 40 µm, which corresponds to about 4.5 pixels. The point source images of Fig. 6c indicate a larger LSF, but the heavy quantization and the potential for aliasing make this difficult to determine from these types of images. The MTF, which is the FT of the LSF (and is also Gaussian), is shown in Fig. 11.
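The data reduction just described (error-function fit to the SRF, differentiation via Eq. (24) to obtain a Gaussian LSF, and Fourier transformation to obtain the MTF) can be sketched in a few lines of code. The SRF samples below are synthetic and the fitted width is an assumed, illustrative value, not the data of Fig. 10; because a Gaussian LSF has a Gaussian FT, the MTF is written analytically rather than computed numerically.

import numpy as np
from scipy.special import erf
from scipy.optimize import curve_fit

# Synthetic SRF data: knife-edge position x [um] vs. normalized single-pixel response k(x)
x = np.arange(0.0, 50.0, 2.0)                      # 2-um steps, as in the text
sigma_true, x0_true = 7.0, 25.0                    # assumed "true" LSF width and edge location
k = 0.5 * (1.0 + erf((x - x0_true) / (np.sqrt(2) * sigma_true)))
k += np.random.normal(0.0, 0.01, x.size)           # a little measurement noise

# Fit an error-function step; Eq. (24) then gives a Gaussian LSF of standard deviation sigma
def srf_model(x, x0, sigma):
    return 0.5 * (1.0 + erf((x - x0) / (np.sqrt(2) * sigma)))

(x0_fit, sigma_fit), _ = curve_fit(srf_model, x, k, p0=(20.0, 5.0))

# For a Gaussian LSF, the MTF (the magnitude of its FT) is also Gaussian
def mtf(s_cycles_per_um):
    return np.exp(-2.0 * (np.pi * sigma_fit * s_cycles_per_um) ** 2)

pixel = 9.0                                        # pixel pitch [um]
s_nyq = 1.0 / (2.0 * pixel)                        # Nyquist frequency [cycles/um]
print("fitted LSF 1/e^2 full width: %.1f um" % (4.0 * sigma_fit))
print("MTF at 0.2 cycles/pixel: %.2f" % mtf(0.2 / pixel))
print("MTF at Nyquist:          %.3f" % mtf(s_nyq))

With these illustrative numbers, the MTF is roughly 0.6 at 0.2 cycles/pixel and only a few percent at the Nyquist frequency, similar in character to the measured curve of Fig. 11 discussed next.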
From the figure, it is seen that the resolution of this system is really not very good because sine wave structures whose frequency is 0.2 cycles/pixel (or a wavelength of 5 pixels) will exhibit a 40% contrast reduction. The Nyquist frequency sN is associated with an MTF of about 5% and emphasizes the danger of specifying the resolution in terms of the projection of a pixel into the field of view. An ‘‘ideal’’ MTF is also shown for comparison. The ideal MTF is the product of the MTFs for a diffraction-limited lens (at f/2.8, λ = 532 nm, and m = 1) and an ideal sampling detector whose pixel size is 9 µm [i.e., the product of Eqs. (18) and (22)]. The figure shows that the measured MTF is substantially worse than the ideal one, owing largely to aberrations in the lens.

Figure 11. Comparison of MTFs for an f/2.8 lens and 9-µm pixel array operated at unity magnification. The Gaussian MTF was inferred from the measured LSF shown in Fig. 10, and the ideal MTF was computed assuming a diffraction-limited lens and a geometric sampling function for the CCD detector.

Note that because taking a derivative is a noise-enhancing process, if the SRF cannot be fit to a relatively simple functional form, such as an error function, the determination of the LSF by this technique becomes much more difficult. In some cases, it may be worth the trouble of measuring the LSF directly by using a narrow slit rather than a knife-edge.

Paul (26) shows, in a planar imaging experiment, that the MTF will be a function of the laser sheet thickness when the sheet thickness is greater than the depth of field of the imaging system. The depth of field δdf is the distance that the object may be shifted in the direction of the lens and still maintain acceptable blur, whereas the depth of focus δ'df is the distance that the detector can be shifted and maintain acceptable blur. Note that the two are related by the magnification, that is, δ'df = m²δdf. If the laser sheet is larger than the depth of field, then the region across which the imaging system collects light will be a ‘‘bow-tie’’ shaped region, rather than the ‘‘box’’ region shown in Fig. 3. Therefore, near the tails of the laser sheet, the blur spot may be substantially larger than at best focus. The depth of field is related to the blur spot of the imaging system per the relationship (18)

δdf = f(m + 1)dblur/[m(D ± dblur)].   (25)

The ± sign in Eq. (25) indicates that the depth of field is smaller in the direction of the lens and larger away from it. The total depth of field δtot is the sum of the depths of field toward and away from the lens. When dblur ≪ D, which is so for most flow imaging cases, the total depth of field simplifies to

δtot ≈ 2dblur f#(m + 1)/m.   (26)
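A minimal numerical sketch of Eq. (26) follows; the 50-µm blur spot, f/2.8, and unity magnification anticipate the example discussed next, so the printed result should reproduce the ~560 µm total depth of field quoted there.

# Total depth of field, Eq. (26): delta_tot ~ 2 * d_blur * f# * (m + 1) / m
def total_depth_of_field(d_blur, f_number, m):
    return 2.0 * d_blur * f_number * (m + 1.0) / m

d_blur = 50e-6      # blur-spot size [m]
f_number = 2.8
m = 1.0
print("delta_tot = %.0f um" % (1e6 * total_depth_of_field(d_blur, f_number, m)))   # ~560 um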
For the diffraction-limited case, the blur spot size is given by Eq. (13). Equation (26) shows that the depth of field increases as blur spot size increases and decreases for increasing magnification. For example, the blur spot of Fig. 6c is about 50 µm, which at f/2.8 and m = 1 amounts to δtot = 560 µm. This is somewhat larger than the typical laser sheet thicknesses that are used in planar imaging of scalars, and therefore, it is unlikely that additional blur at the edge of the sheet would be an issue. In many cases, however, such as when faster optics are used, this effect will not be negligible. One way to account for the collection of light over the ‘‘bow-tie’’ shaped volume is given in (26), where the MTF was evaluated as the weighted sum of the MTFs of thin laminates (infinitesimally thin planes) parallel to the laser sheet but at different z locations. The weighting function used was the laser sheet energy distribution. This technique of weighting the MTFs by the energy distribution accounts for the fact that more energy will be collected from regions that have smaller blur spots. However, this technique requires either the assumption of ideal MTFs or detailed system MTF measurements at a number of z locations. Another approach is to measure the MTF by the knife-edge technique at best focus and at the edge of the laser sheet. To be conservative, the MTF at the edge of the sheet could be used as the primary measure of resolution, although a reasonable compromise might be to take the average of these two MTFs as the representative MTF of the entire system, including the laser sheet. It is also important to note that the MTF is measured at a given point in the image, but it may vary across the field of view. For this reason, it is also advisable to measure the MTF at the center and near the edges of the field of view.

RESOLUTION REQUIREMENTS IN FLUID FLOWS

One of the major difficulties in flow imaging is achieving adequate spatial and temporal resolution. This is particularly the case when flows are turbulent because the resolution requirements are typically very severe if it is desired to resolve the smallest scales at which fluctuations occur. Laminar flows, however, pose substantially less stringent requirements on resolution, compared to turbulent flows. The primary issue when considering the resolution requirements is the gradient of the flow property that is being measured because the gradient determines the amount of averaging that occurs across the
resolution volume. In many laminar shear flows, including boundary layers, pipe flows, wakes, jets, and mixing layers, the maximum gradient is the same order of magnitude as the overall gradient. In other words, the maximum velocity and temperature gradients are approximately (∂U/∂y)max ∼ U/δ and (∂T/∂y)max ∼ T/δ, where U is the characteristic velocity difference, δ is the local width of the shear flow, and T is the characteristic temperature difference across the flow. For example, in a boundary layer formed by the flow of air over a heated flat plate, the maximum velocity gradients scale as (∂U/∂y)max ≈ U∞/δ ∼ (U∞/x)Rex^(1/2), where Rex = U∞x/ν, x is the downstream distance, and ν is the kinematic viscosity. Gradients in scalars, such as temperature or species concentration, will similarly scale with Reynolds number, but will also depend on the relative diffusivities for momentum and the scalar. For example, the maximum scalar gradient in the boundary layer will scale as (∂T/∂y)max ≈ [(T∞ − Tw)/x](Rex Pr)^(1/2), where T∞ and Tw are the free-stream and wall temperatures, respectively, Pr = ν/α is the Prandtl number, and α is the thermal diffusivity. The preceding relationships show that gradients become large at large Reynolds and Prandtl numbers (or Schmidt number, Sc = ν/D, where D is the mass diffusivity for mass transfer), which is the same as saying that shear flows become ‘‘thin’’ at high Re (and Pr). Turbulent flows have substantially more severe resolution requirements than laminar flows, owing to the much larger gradients that occur at the smallest scales of turbulence. In turbulent flows, the spatial fluctuations in flow properties, such as velocity, temperature, or concentration, range in scale from the largest physical dimension of the flow (e.g., the local width of the boundary layer or jet) to the scale at which diffusion acts to remove all gradients. The largest scales are often called the ‘‘outer scales,’’ whereas the smallest scales are the ‘‘inner’’ or dissipation scales because these are the scales at which the energy of fluctuations, whether kinetic or scalar, is dissipated. In classical turbulence theory, the kinetic energy dissipation scale is the Kolmogorov scale (40),

η ≡ (ν³/ε)^(1/4),   (27)
where ε is the kinetic energy dissipation rate. Batchelor (41) argued that the smallest scale of scalar fluctuations λB, called the Batchelor scale, is related to the Kolmogorov scale and the ratio of the kinematic viscosity to the scalar diffusivity. For Sc (or Pr) ≫ 1, he argued that λB = ηSc^(−1/2). There is some disagreement in the literature about the scaling for fluids when Sc ≪ 1 (42), but because most gases and liquids have Schmidt numbers of order unity or larger, this is of little practical concern. Generally, it is assumed that the Sc^(−1/2) scaling applies at near-unity Schmidt numbers, in which case λB ≈ η. For liquids, it is typical that Pr, Sc ≫ 1; thus the Sc^(−1/2) scaling is appropriate, in which case λB ≪ η. Using scaling arguments, the Kolmogorov scale can also be related to outer scale variables through the relationship η ∝ Re^(−3/4), where Re is the Reynolds number based on outer scale variables (such as U, the maximum velocity difference, and δ, the local width of the shear flow).
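To make these scalings concrete, the sketch below evaluates Eq. (27) and the Batchelor scaling for illustrative, assumed values of the dissipation rate and fluid properties; the numbers are not taken from the text and simply indicate the orders of magnitude involved.

import numpy as np

# Kolmogorov scale, Eq. (27), and Batchelor scale lambda_B = eta * Sc^(-1/2)
def kolmogorov(nu, eps):
    return (nu ** 3 / eps) ** 0.25

nu_air   = 1.5e-5    # kinematic viscosity of air [m^2/s] (nominal)
nu_water = 1.0e-6    # kinematic viscosity of water [m^2/s] (nominal)
eps      = 10.0      # assumed kinetic energy dissipation rate [W/kg]

eta_air = kolmogorov(nu_air, eps)
print("air:   eta = %.0f um (Sc ~ 1, so lambda_B ~ eta)" % (1e6 * eta_air))

eta_w = kolmogorov(nu_water, eps)
Sc_dye = 2000.0      # a high-Schmidt-number scalar (e.g., a dye) in water
print("water: eta = %.0f um, lambda_B = eta*Sc^(-1/2) = %.1f um"
      % (1e6 * eta_w, 1e6 * eta_w / np.sqrt(Sc_dye)))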
Buch and Dahm (43) make explicit use of such an outer scaling by defining the strain-limited scalar diffusion scale λD as

λD/δ = Λ Reδ^(−3/4) Sc^(−1/2),   (28)

where Λ is a constant, δ is the 5–95% velocity full width of the shear flow, and Reδ = Uδ/ν. Their planar imaging measurements of the finest mass diffusion scales in round turbulent jets suggest that Λ ≈ 11. Similar measurements in planar jets suggest a value of Λ ≈ 14 (32). The finest velocity gradient scale, analogous to the Kolmogorov scale, is the strain-limited vorticity scale, λν = λD Sc^(1/2). The strain-limited diffusion scales can be related to the Kolmogorov scale by using measurements of the kinetic energy dissipation rate. For example, using the data for the decay of the kinetic energy dissipation rate for gas-phase round jets (44) and taking Λ = 11, it can be shown that λD ≈ 6λB and λν ≈ 6η. If the mean kinetic energy dissipation scales are about 6η, then accurate measurements of the gradients will necessitate better resolution than this. This is consistent with thermal-wire measurements of temperature and velocity fluctuations, which suggest that a resolution of about 3η is sufficient for correct measurements of the smallest scale gradients (45–47). Therefore, it is recommended that the resolution of the imaging system be no worse than λD/2 and λν/2, if the smallest fluctuations in a turbulent flow are to be measured accurately. It cannot be emphasized enough that because of the nature of the MTF of the imaging system, it is too simplistic to speak of ‘‘resolving’’ or ‘‘not resolving’’ particular scales in the flow. Progressively finer scales will be increasingly affected by the imaging system, and any quantitative measurement of gradients must take this into account. Another perspective on Eq. (28) is that it describes the dynamic spatial range that is required for measuring the full range of scales. Here, the dynamic spatial range (DSR) is defined as the ratio of the largest to the smallest spatial structures that can be measured. The largest spatial scale in turbulent flows is generally considered the local width of the shear flow (or in some enclosed flows, a characteristic dimension of the enclosing box). Therefore, for turbulent shear flows, δ/λD, given by Eq. (28), is the range of scales of the flow. This also shows that the Reynolds (and Schmidt) number can be thought of as directly related to the DSR of the turbulent shear flow. Equation (28) also shows that the DSR for scalars is even larger for low scalar diffusivity (high Sc or Pr numbers). For example, fluorescein dye in water has a Schmidt number of about 2000, and thus the finest mass diffusion scale is about 45 times smaller than the smallest vorticity scale (42). The other important point that Eq. (28) reveals is that the DSR is a strong function of the Reynolds number; thus, it is often not possible to resolve the full range of turbulent scales by using currently available camera systems. For example, assume that it is desired to obtain planar images of the jet fluid concentration in a turbulent round jet and to resolve the full range of scales 500 mm downstream of a 5-mm diameter nozzle. The jet 5–95% velocity full width grows at a rate of δ(x) = 0.44x, where x is the distance downstream of the jet exit, and the centerline
velocity decays as Uc/U0 = 6.2/(x/dj), where Uc is the centerline velocity, U0 is the jet exit velocity, and dj is the jet exit diameter (48). In this case, the outer scale Reynolds number Reδ = Ucδ/ν = 1.9Red, where Red is the source Reynolds number (= U0dj/ν). If we desire to study a jet where Red = 20,000, then the range of scales given by Eq. (28) is 150. If a smallest scale of λD/2 must be resolved, then our required DSR is 2δ/λD = 300. In planar imaging using the 1000 × 1000 pixel CCD camera whose measured MTF is shown in Fig. 11, it would not be possible to resolve the entire range of scales because substantial blurring occurs across 4–5 pixels. Turbulent timescales are generally bounded by relatively low-frequency outer scale motions and high-frequency inner scale motions (40). The largest scale motions are independent of viscosity and occur over a characteristic time that is of the order of τos ∼ δ/U. This is also commonly referred to as the ‘‘large-eddy-turnover’’ time. The small-scale motions, however, occur over a substantially shorter timescale, which is of the order of τis ∼ (ν/ε)^(1/2) or τis ∼ Reδ^(−1/2) τos, if based on outer scale variables. This latter relationship shows that, at high Reynolds numbers, the inner scale timescales can be orders of magnitude smaller than outer scale timescales. The turbulent inner scale timescales may not be the shortest timescales that must be resolved, if the flow is convecting past the measurement volume. In this case, the shortest time may be the convective inner scale time (τis)conv = λν/U, where U is the local velocity. For example, consider a mixing layer that forms between two parallel streams of air, where the streams have velocities of 100 m/s and 90 m/s. The range of turbulent spatial scales will depend only on the outer scale Reynolds number, which in turn depends only on the velocity difference of 10 m/s. The absolute velocities are irrelevant, except to the extent that they affect the local mixing layer thickness δ. If the imaging system is in the laboratory frame of reference, then the timescales will depend on both the velocity difference (which drives the turbulence) and the bulk convection of these spatial structures, which depends on the local velocity of the structures with respect to the imaging system. For the mixing layer conditions given before, if the mixing layer at the imaging location is 10 cm thick, then τos ≈ 10 ms, and τis ≈ 40 µs. However, if the small-scale structures convect by the measurement station at the mean velocity of Uconv = (U1 + U2)/2 = 95 m/s, then the timescale that needs to be resolved is (τis)conv = λν/Uconv = 3 µs, which is considerably less than τis. It is clear that the smaller of the convective and turbulence timescales must be resolved.

FLOW IMAGING: SURVEY OF TECHNIQUES

The purpose of this section is to give the reader an idea of the wide range of flow imaging techniques that have been developed and applied in fluid mechanics research. Owing to space limitations, however, this survey must leave out many techniques that are certainly worthy of discussion. Hopefully, in most cases, a sufficient number of general references is provided for readers to learn about these omitted techniques on their own.
Furthermore, excellent reviews of a number of qualitative and quantitative flow visualization techniques, including some that were omitted in this article, can be found in (5). The reader should keep in mind that the physical principles underlying each technique are usually not discussed in this article because they are covered in different sections of this encyclopedia and in the references cited in the bibliography. This section is organized on the basis of the flow variable to be imaged, because in most cases the user starts with a need (say, for temperature imaging in an aqueous flow) and then must find the technique that best addresses that need. Because some techniques can be used to measure several flow variables, their use may be described under more than one category. Therefore, to avoid too much redundancy, a technique is described only the first time it is mentioned; thus, the uninitiated reader may need to read the article all the way through rather than skipping to later sections.

Density Gradients (Schlieren and Shadowgraph)

Two of the most widely used techniques for qualitative flow visualization, particularly in high-speed flows, are the schlieren and shadowgraph techniques. Although the main emphasis of this article is on quantitative planar imaging techniques, the shadowgraph and schlieren techniques will be briefly discussed because of their extensive use in gas dynamics. Furthermore, the mechanism of light ray deflection by index-of-refraction gradients, which is the basis for these techniques, is a potential source of error in quantitative laser imaging. In their most commonly used forms, the schlieren and shadowgraph techniques provide line-of-sight integrated information about gradients in the index-of-refraction field. Because the index of refraction is related to gas density, in fluid flows such as air whose composition is uniform, the schlieren technique is sensitive to variations in the first derivative of density, and the shadowgraph to the second derivative. Interferometry is a quantitative line-of-sight technique that enables imaging the density field, but it will not be discussed here because it is being increasingly supplanted by planar imaging techniques. Because these techniques are spatially integrated along the line of sight, they are limited in the quantitative information that can be inferred in complex, three-dimensional flows. Further details of these techniques can be found in several excellent references (5,9,10,49). The physical basis for the shadowgraph and schlieren techniques is that spatial variations in the index of refraction of a transparent medium cause spatial variations in the phase of plane light waves (33). The index of refraction is defined as n = c0/c, where c0 is the speed of light in vacuum and c is the speed of light in the medium. When traveling through a medium where n > 1, the phase of the transmitted wave undergoes a negative phase shift, owing to a lag in the oscillations of the induced dipoles within the medium. For this reason, an object that causes such a phase shift is termed a ‘‘phase object,’’ and it can be contrasted with an ‘‘amplitude object,’’ such as an opaque disk, which changes the amplitude of the light waves. Because the velocity of light is usually considered the
‘‘phase velocity,’’ which is the velocity of a point of constant phase on the wave, the phase shift can be interpreted as a change in the velocity of the transmitted wave. Both the schlieren and shadowgraph techniques are analyzed by considering how a plane wave is affected by propagation through an index-of-refraction gradient. Consider the propagation of a plane wave in the z direction through a transparent medium that has a gradient of n in the y direction. It can be shown that the angular deflection θy in the y direction is given by (9)

θy = ∫L (1/n)(∂n/∂y) dz,   (29)
where the integration is along the line of sight and over the path length L. Equation (29) shows that the angular deflection increases for increasing gradients and longer path lengths. The equation also shows that the light rays are bent in the direction of the gradient, that is, the rays are bent toward regions of higher index of refraction. In gases, the index of refraction is related to the fluid density ρ by the Gladstone–Dale relationship (9),

n = 1 + Kρ,   (30)
where K is the Gladstone–Dale constant. For example, for 633-nm light and T = 288 K, K = 2.26 × 10^−4, 1.57 × 10^−4, and 1.96 × 10^−4 m³/kg for air, argon, and helium, respectively. In water, which is largely incompressible, the index of refraction varies primarily with temperature. For example, for 632.8-nm light, the index of refraction across the temperature range of 20–34 °C is given by (9)

n(T) = 1.332156 − 8.376 × 10^−5 (T − 20 °C) − 2.644 × 10^−6 (T − 20 °C)² + 4.79 × 10^−8 (T − 20 °C)³.   (31)

An example of a schlieren setup is shown in Fig. 12.

Figure 12. Schematic of a typical laser schlieren setup. The undeflected rays are shown in gray, whereas the deflected rays are shown in black. The knife-edge blocks rays deflected downward by negative gradients, which renders those gradients dark in the image. In contrast, the rays deflected upward by the positive gradients miss the knife-edge and are rendered light in the image. (Labeled components: pulsed laser, microscope objective lens, pinhole on a three-axis translation stage, collimating and refocusing lenses (typically mirrors) of focal lengths f1 and f2, phase object, knife-edge, and CCD camera.)

For instantaneous imaging, the light source is usually a point source of short duration (typically a microsecond or less); common sources are xenon flash lamps and lasers. In most cases, a flash lamp is preferred to a laser source because lamps are cheaper and the coherence and mode structure of most pulsed lasers cause a nonuniform image. In some cases, however, such as in plasmas where the background luminosity is very high, the high brightness of a laser is a necessity. In this case, the beam must be spatially filtered to improve its spatial uniformity (33). This is usually accomplished by tightly focusing the beam through a small pinhole by using a microscope objective lens. Note that it is typically very difficult to focus the laser beam onto such a small pinhole, and an integrated lens/pinhole mount that has three axes of translation is necessary. A further problem when using a pulsed laser is that it is difficult to keep from burning the pinhole material, owing to the very high peak intensity at the focus. This problem can be alleviated by substantially reducing the energy of the beam. Although such low laser energies will result in weaker signals, obtaining a sufficient signal is not usually a problem in the schlieren and shadowgraph techniques because the beam is usually directed into the camera (Fig. 12). When using a flash lamp that has an extended arc, the point source is approximated by imaging the arc onto a small aperture (e.g., submillimeter diameter) with a lens. The sensitivity of the schlieren system will be improved by a smaller point source, but the signals are reduced accordingly. For many flash lamps, the arc is small enough that it may not be necessary to use any spatial filter at all. The point light source is collimated by what is typically a large-diameter spherical mirror, which is at least as large as the object that is being imaged. Lenses can also be used when smaller fields of view are desired, and this is the situation shown in Fig. 12. The mirror/lens is placed one focal length from the source, which collimates the beam. After the beam passes through the test section, it is then directed to a second mirror/lens (called the ‘‘schlieren head’’), which refocuses the beam. In the conventional schlieren setup, a knife-edge (e.g., a razor blade) is placed at the second focal spot, as shown in Fig. 12. The horizontal knife-edge shown produces an optical system that renders upward density gradients light and downward gradients dark. This occurs because the rays deflected up by the upward gradients miss the knife-edge, whereas the knife-edge blocks the rays that are deflected down by the downward gradients. The analysis of the schlieren intensity variations is conceptually simpler for a line light source, which forms a line image at the focus. In this case, the relative intensity variations at the film plane are given by (9)

ΔI/I = (f2/a) ∫L (1/n)(∂n/∂y) dz,   (32)
where I is the intensity of the image when no gradient is present, ΔI = I* − I, I* is the intensity when the gradient is present, f2 is the focal length of the schlieren head, and a is the height of the focal image that is not blocked by the knife-edge. Equation (32) shows that a longer focal length schlieren head and a smaller height of the transmitted portion of the image at the focus increase the sensitivity. Interestingly, increasing the distance between the phase object and the focusing lens/mirror does not affect the sensitivity.
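To give a feel for the magnitudes involved in Eqs. (29)–(32), the following sketch estimates the ray deflection produced by a uniform density gradient in air via the Gladstone–Dale relationship, and the resulting relative intensity change for a schlieren system. The density gradient, path length, focal length, and unblocked source-image height are all assumed, illustrative values.

# Schlieren estimate: deflection angle, Eq. (29), and intensity change, Eq. (32)
K = 2.26e-4          # Gladstone-Dale constant for air [m^3/kg], from the text
rho = 1.2            # nominal air density [kg/m^3]
drho_dy = 5.0        # assumed density gradient [kg/m^3 per m]
L_path = 0.1         # assumed path length through the gradient [m]

n = 1.0 + K * rho                          # Eq. (30)
dn_dy = K * drho_dy                        # the gradient of n follows the density gradient
theta_y = (1.0 / n) * dn_dy * L_path       # Eq. (29) for a uniform gradient

f2 = 1.0             # focal length of the schlieren head [m] (assumed)
a = 1.0e-3           # unblocked height of the source image at the knife-edge [m] (assumed)
rel_intensity = (f2 / a) * theta_y         # Eq. (32): Delta I / I

print("deflection angle = %.0f microradians" % (1e6 * theta_y))
print("Delta I / I      = %.1f%%" % (100 * rel_intensity))

With these assumed values, a deflection of roughly 100 microradians produces an intensity change of order 10%, which illustrates why even modest density gradients are readily visualized.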
The focus is found by traversing the knife-edge along the optical axis. When the knife-edge is upstream of the focus, the image reveals an inverted shadow of the knife-edge, whereas when the knife-edge is downstream of the focus, the image reveals an upright shadow. It is only at the focus that inserting the knife-edge into the focal spot results in a uniform reduction of intensity of the image and no shadow of the knife-edge. An example of a schlieren image of a supersonic helium jet issuing into room air is shown in Fig. 13 (from Ref. 50). In this figure, the knife-edge was horizontal, and therefore the vertical n-gradients are visualized. Because helium has a very low index of refraction, the n gradients resulting from mixing are very distinct. Furthermore, even subtle features such as the Mach waves in the ambient air are visualized as the dark lines to the outside of the jet. A useful extension of the schlieren technique is ‘‘color’’ schlieren, which uses a white light source combined with a transparency of varying color in place of the knife-edge (5,9,10). Because the eye is better able to distinguish colors than shades of gray, color schlieren is superior for visualizing the density gradients in flows. Although most color schlieren is used for flow visualization, it has also been used to obtain quantitative temperature data in flows by relating the color of the image to the angular deflection of the light rays (51). When used with an axisymmetric phase object, this technique enables the tomographic reconstruction of the three-dimensional temperature field (52).
An interesting way of looking at the function of the knife-edge is as a filter, which acts on the spatial frequencies in the phase object. This can be seen by considering that the second focus is called the ‘‘Fourier transform plane,’’ because the intensity distribution at the focus is related to the spatial frequency content of the phase object (33,53). Higher spatial frequencies are associated with increasing radial distance from the center of the focal spot. The dc component is the neutral intensity background present when there is no phase object, and it can be filtered out if an opaque disk is used as the spatial filter. In this case, when the phase object has no high-frequency content, the image will be uniformly dark. When higher spatial frequencies are present in the phase object, the disk will not block them, and they will be visualized as light regions in the image. It can be seen from this that the shape of the spatial filter can be tailored to visualize different frequencies in the phase object. This principle has been used to develop a system that directly measures the power spectrum of the line-of-sight integrated index of refraction fluctuations in turbulent flows (54,55). Note that an alternative technique has been developed, named the ‘‘focusing schlieren’’ technique, which enables visualizing density gradients as in conventional schlieren, but the depth of field along which the signal is integrated can be just a few millimeters (56). The focusing schlieren technique can yield nearly planar images of the density gradient field at substantially lower cost than by planar laser imaging. In some cases, such as a large-scale wind tunnel where optical access is limited, it may be the only means of acquiring spatially resolved image data. The shadowgraph effect can be understood from simple geometrical ray tracing, as shown in Fig. 14. Here a plane wave traverses a medium that has a nonuniform index-of-refraction gradient and is allowed to illuminate a screen. The rays traversing the region that has no gradient are not deflected, whereas the rays traversing the region that has an upward gradient are bent up. The resulting image on the screen consists of regions where the rays converge and diverge; these appear as regions of light and dark, respectively. It is this effect that gives the technique its name because gradients leave a shadow, or dark region, on the viewing screen. It can be shown that the intensity variations on the screen follow the relationship (9)

ΔI/I = L ∫ (∂²/∂x² + ∂²/∂y²)(ln n) dz.   (33)
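As a rough numerical sketch of Eq. (33), the code below evaluates the transverse Laplacian of ln n by finite differences and integrates it along the line of sight for an assumed, weak Gaussian index disturbance in air (for example, a warm plume). The disturbance, path length, and screen distance are all hypothetical, and the factor L is interpreted here as the object-to-screen distance, which is one common convention; see Refs. (9,10) for the full geometry.

import numpy as np

# Assumed index-of-refraction field: a weak Gaussian deficit in air, uniform along z
nx, ny, Lz, L_screen = 201, 201, 0.05, 0.5        # grid points, path length [m], screen distance [m]
x = np.linspace(-0.02, 0.02, nx)                  # transverse coordinates [m]
y = np.linspace(-0.02, 0.02, ny)
X, Y = np.meshgrid(x, y, indexing="ij")
dn = -2e-5 * np.exp(-(X**2 + Y**2) / (2 * 0.005**2))   # index deficit of a warm region (assumed)
n = 1.000277 + dn

# Transverse Laplacian of ln(n) by central differences
ln_n = np.log(n)
dx, dy = x[1] - x[0], y[1] - y[0]
lap = np.zeros_like(ln_n)
lap[1:-1, 1:-1] = ((ln_n[2:, 1:-1] - 2 * ln_n[1:-1, 1:-1] + ln_n[:-2, 1:-1]) / dx**2
                   + (ln_n[1:-1, 2:] - 2 * ln_n[1:-1, 1:-1] + ln_n[1:-1, :-2]) / dy**2)

# Eq. (33): the field is uniform in z here, so the path integral reduces to lap * Lz
dI_over_I = L_screen * lap * Lz
print("peak relative intensity variation: %.3f" % np.max(np.abs(dI_over_I)))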
Figure 13. Sample schlieren image of a Mach 2 jet of helium exhausting into room air. The knife-edge is oriented horizontally; thus the vertical index of refraction gradients are visualized. The image reveals fine structures of the jet turbulence in addition to Mach waves that are generated by structures that travel at supersonic speeds with respect to the ambient. (Reprinted with permission from Mach Waves Radiating from a Supersonic Jet by N. T. Clemens and P. H. Paul, Physics of Fluids A 5, S7, copyright 1993 The American Institute of Physics.)
Figure 14. Illustration of the shadowgraph effect.
For gas flows, incorporating the Gladstone–Dale relationship into Eq. (33) shows that the shadowgraph technique is sensitive to the second derivative of the density along the line of sight of the light beam. A shadowgraph system can be set up almost trivially by using an approximately collimated light source and a screen. For example, a shadowgraph system suitable for classroom demonstrations can be made by expanding the beam from a laser pointer using a short focal length lens and projecting the beam onto a wall a few meters away. This simple system will enable the visualization of the thermal plume rising from a candle flame. Despite the simplicity of this system, more sophisticated setups are typically desired. For example, the schlieren setup shown in Fig. 12 can be used for shadowgraph by simply removing the knife-edge. However, unlike schlieren, where the camera is focused on the phase object, the camera must be slightly defocused to produce sufficient divergence of the deflected rays on the image plane. This feature enables one to ‘‘focus out’’ the shadowgraph effect in a schlieren system. An obvious disadvantage to this technique is that any amplitude objects in the image (e.g., a bullet) will be slightly out of focus. The problem of slight defocus is generally tolerable, compared to the advantages of being able to alternate quickly between the schlieren and shadowgraph techniques.

Concentration/Density

Imaging the concentration of a particular type of fluid or chemical species is primarily of interest in studies of mixing and combustion. Concentration and density are related quantities in that they both quantify the amount of a substance per unit volume. Because most optical diagnostic techniques are sensitive to the number of scatterers per unit volume, rather than to the mass per unit volume, the concentration is the more fundamental quantity. Of course, density can be inferred from the concentration if the fluid composition is known. Concentration imaging is of interest in nonreacting mixing studies and in reacting flows for investigating the relationship between the chemical state of the fluid and the fluid mechanics. Planar laser-induced fluorescence imaging is probably the most widely used technique for quantitative scalar imaging because it can be used in liquids and gases, it is species specific, and its high signals enable measuring even minor species in gas-phase flows (11,27,57). In laser-induced fluorescence (LIF), a laser is used to excite an atom or molecule from a lower energy state into a higher energy state by the absorption of a photon of light. The frequency of light required is related to the energy difference between the states through the relationship E = hν, where E is the energy per photon and ν is the frequency of light. The excited state is a state of nonequilibrium, and thus the atom/molecule will tend to return to equilibrium by transiting to a lower energy state. The return to the lower state can occur by several processes, including spontaneous emission of a photon of light (fluorescence); stimulated emission by the incident laser light; ‘‘quenching,’’ that is, the transfer of energy to other atoms/molecules through molecular collisions; and
by internal energy transfer, or the transfer of energy to other energy modes within the molecule. Because the probability of quenching depends on local thermodynamic conditions, the LIF signal is in general a function of several flow variables, including the concentrations of all species present, temperature, and pressure. Furthermore, the theoretical dependence of the LIF signal on the flow variables depends on the specific model of the energytransfer physics. Because properly modeling the physics is an important part of quantifying PLIF measurements, PLIF can be a particularly challenging technique to use. The dependence of the signal on many variables presents both an opportunity and a disadvantage for making quantitative measurements. The opportunity is that PLIF can be used to measure a range of flow variables for a remarkable number of chemical species. However, it is generally very difficult to relate the LIF signal to a particular variable of interest (e.g., species concentration) because the signal depends on so many other flow variables, which may not be known. For example, in using PLIF for OH, which is commonly used in flames as an approximate marker of the reaction zone, the PLIF signal is a function of the OH mole fraction, the mole fractions of several other species, including N2 , O2 , H2 O, and CO2 ; and the temperature. Because it is virtually impossible to measure all of these variables, the signal can be quantified only by assuming a certain level of knowledge about the thermochemical state of the flow (e.g., equilibrium chemistry). Despite the caveat about the difficulties that can be encountered when using PLIF imaging, there are many cases where PLIF imaging is in fact relatively simple to implement. The first case is using PLIF in liquid flows. PLIF in liquids, particularly water, is achieved by seeding a fluorescent organic dye into the flow. Because many liquids are essentially incompressible and isothermal, the PLIF signal is usually a function only of the dye concentration and therefore is ideal for mixing studies (6,58,59). Fluorescent dyes absorb light across a very broad range of wavelengths, and thus they can be stimulated by using a number of different lasers. Some of the more popular dyes for aqueous flows include fluorescein, rhodamine B, and rhodamine 6G; all of their absorption bands overlap one or more emission lines of the argon-ion, copper-vapor, and doubled Nd •• YAG lasers. Because of this and because liquid-phase PLIF tends to exhibit high signal levels (due to the high density of the fluid), excellent results can usually be achieved without highly specialized equipment. It is important to note that some dyes, suffer from photobleaching effects at high laser intensity (or fluence), which can lead to significant errors in concentration measurements (60–63). Photobleaching is the reduction in the concentration of fluorescent molecules due to laser-induced photochemistry. Both fluorescein and rhodamine 110 are particularly problematic, and (60) even suggests abandoning the use of fluorescein in favor of rhodamine B. Another important issue in using PLIF of organic dyes is that the high signals are often a result of the high absorption coefficient of the dye solution. In this
case, substantial laser beam attenuation is encountered when the optical path lengths are relatively large. Beam attenuation can be alleviated by reducing the dye concentration along the beam path or by reducing the optical path length; however, this is often not possible, owing to SNR considerations or other practical limitations. Alternatively, attenuation along the ray path can be corrected for by using the Beer-Lambert absorption law, provided that the entire path length of a given ray of the laser sheet is imaged (11,64). PLIF is also relatively easy to implement in nonreacting gas-phase flows, where the flow can be seeded with a gasphase tracer species. By far the most popular tracer to date is acetone, although biacetyl, NO, and I2 have also been used to a more limited extent. Acetone (CH3 COCH3 ) is an excellent tracer species in nonreacting flows because it is relatively nontoxic, fairly easy to seed into flows, generally provides good signals, and can be pumped at a range of UV wavelengths (65). A characteristic feature of polyatomic molecules, such as acetone, is that they have broad absorption bands. The absorption band of acetone ranges from about 225 to 320 nm, and thus it is readily pumped by quadrupled Nd •• YAG (266 nm) and XeCl excimer (308 nm) lasers. Furthermore, although its fluorescence efficiency is not very high (about 0.1–0.2%), its high saturation intensity means that high laser energies can be used, which compensates for any limitation in fluorescence efficiency. A small sample of studies where acetone PLIF was used for concentration measurements includes jets (32,65), jets in crossflow (66), supersonic shear layers (67), and internal combustion engines (68). Biacetyl (CH3 (CO)2 CH3 ) is another low toxicity seed species that has been used in nonreacting flows and to a lesser extent in flames. Biacetyl vapor absorbs in the range 240–470 nm and exhibits blue fluorescence over the range 430–520 nm and green phosphorescence over the range 490–700 nm. The quantum yield, that is, the fraction of emitted photons to absorbed photons is 15% for phosphorescence, but is only 0.2% for fluorescence. For this reason, biacetyl phosphorescence has been used to produce very high SNR imaging (29). Several different lasers have been used for biacetyl pumping, including dye lasers, excimers, and frequency-tripled Nd •• YAGs. One drawback to using biacetyl is that O2 quenches the phosphorescence, which leads to a significantly lower SNR when biacetyl is seeded into air. Furthermore, the long lifetime of the phosphorescence (about 1 ms) can be severely limiting if high temporal resolution is required. Finally, biacetyl can be difficult to work with because it has a very strong odor (akin to butterscotch) that can rapidly permeate an entire building if not contained. Other seed species that have been used for species mole fraction measurements, primarily in supersonic mixing flows, include I2 and NO. Both species are difficult to work with because they are highly corrosive and toxic. One of the main advantages of diatomic molecules is that they tend to have many discrete absorption lines, in contrast to more complex polyatomic molecules, such as acetone, whose lines are very broad. Diatomic molecules give the user much greater ability to choose the temperature dependence of the LIF signal. This property has been
used in several supersonic mixing studies where relatively temperature-insensitive transitions were used so that the resulting LIF signal was approximately proportional to the mole fraction of the fluorescent species (69–71). A major issue in mixing studies is that unless the smallest scales of mixing are resolved, it is not possible to differentiate between fluid that is uniformly mixed at the molecular level or simply ‘‘stirred’’ (i.e., intertwined, but without interdiffusion). An interesting application of NO PLIF is in ‘‘cold chemistry’’ techniques, which can differentiate between mixed and stirred fluid on a scale smaller than can be resolved. These techniques use the fact that NO fluorescence is rapidly quenched by O2 but is negligibly quenched by N2 . Cold chemistry has been used to obtain quantitative statistical mixing properties of high Reynolds number shear layers where the smallest mixing scales were not resolved (72,73). This technique has also been extended to enable direct imaging of the degree of mixing/stirring for each pixel by simultaneously imaging the fluorescence from a quenched (NO) and nonquenched (acetone) species (74). PLIF has proven extremely useful in investigating mixing and supersonic flows by seeding a tracer, and it is also important for imaging naturally present species, such as occur in chemically reacting flows. Because PLIF is a highly sensitive technique, it enables the imaging of trace species, such as combustion intermediates. For example, PLIF has been used to image an astounding number of species in flames. A limited list of major and intermediate species that have been imaged in flames by PLIF include CH, OH, NO, NO2 , C2 , CN, NH, O2 , CO, C2 H2 , H2 CO, O, and H (11,57,75). The power of PLIF species imaging in combustion research is exemplified by Fig. 15, which shows a pair of simultaneously acquired images of CH and OH in a turbulent nonpremixed methane–oxygen jet flame (76). The CH was pumped at a wavelength of about 390 nm, and the fluorescence was collected across the range of 420–440 nm; the OH was pumped at about 281 nm, and the fluorescence was collected across the range of 306–320 nm. The laser excitation was achieved by using two Nd •• YAG lasers, two dye lasers, frequency doubling, and mixing crystals; the images were captured
Figure 15. Sample of simultaneously acquired CH/OH PLIF images in a turbulent methane–oxygen jet flame. The CH field is shown at left, the OH field at center, and the superposition of the two at the right. The coordinates x and r refer to axial and radial distances, respectively, and d is the diameter of the jet nozzle. (Reprinted by permission of Elsevier Science from Reaction Zone Structure in Turbulent Nonpremixed Jet Flames — From CH-OH PLIF Images by J. M. Donbar, J. F. Driscoll and C. D. Carter, Combustion and Flame, 122, 1–19, copyright 2000 Combustion Institute.) See color insert.
on two intensified CCD cameras. The CH field is shown at the left, the OH in the middle, and the two images are shown superimposed at the right. Rayleigh scattering has also been used successfully to image the concentration field in a range of flows. Rayleigh scattering is defined as the elastic scattering from particles, including atoms and molecules, which are much smaller than the wavelength of the incident light (77). In molecular Rayleigh scattering, the differential Rayleigh scattering cross section at 90°, (dσRay/dΩ), is given by the relationship (78)

dσRay/dΩ = 4π²(n − 1)²/(Nd²λ⁴),   (34)
where Nd is the number density. Note that in a mixture of fluids, the Rayleigh scattering signal is proportional to the total cross section of the mixture, and thus it is not species specific (11), which greatly limits its utility for measuring concentration in reacting flows. Equation (34) shows that the differential scattering cross section scales as λ^−4, which indicates a much greater scattering efficiency for short wavelengths of light. However, whether it is advantageous to work at UV rather than visible wavelengths is determined from an analysis of the entire electro-optical system. For example, is it better to measure using a frequency-quadrupled (266 nm) or a doubled (532 nm) Nd:YAG laser? To see this, it must first be considered that the signal recorded by a detector is directly proportional to the number of incident photons (EL/hν), as shown in Eq. (5). Because ν = c/λ, the number of photons per pixel for Rayleigh scattering scales as Spp ∝ (ELλ)λ^−4 ∝ ELλ^−3; thus, the dependence of the signal on the wavelength is weaker on a per photon basis. Furthermore, the quantum efficiency of most detectors decreases in the UV, and there are few high-quality fast (low f#) photographic lenses that operate at UV wavelengths. For example, consider scattering measured by a Nd:YAG laser that produces 500 mJ at 532 nm and 125 mJ at 266 nm. In this case, the number of photons scattered at 266 nm will be only twice as large as that at 532 nm. After accounting for the likely reduced quantum efficiency and f# of the collection optics, UV excitation may not be a means of improving the signal. UV excitation is more likely to be beneficial when using excimer lasers, which produce very high energies per pulse well into the UV. This example shows that it is necessary to account for all aspects of the electro-optical system, not just the scattering cross section, when determining the optimal wavelength to use. One of the most common uses of Rayleigh scattering is in nonreacting mixing studies, where it is used as a passive marker of fluid concentration. For example, jet mixing can be studied by imaging the Rayleigh scattering when a jet fluid that has a high Rayleigh cross section issues into an ambient fluid that has a low cross section (32,43,79,80). In this case, the mole fraction of jet fluid can be related to the scattering signal through the relationship χjet = [Se − (Se)∞]/[(Se)0 − (Se)∞], where (Se)0,∞ are the signals obtained at the jet exit and ambient, respectively. An example of a Rayleigh scattering image is shown in
Figure 16. Example of a planar Rayleigh scattering image of a turbulent propane/acetone jet. The jet issued into a slow co-flow of filtered air, the field-of-view was 35 × 35 mm, and the local Reynolds number at the measuring station was 5600. The signal is proportional to the concentration of jet fluid. (Reprinted with permission from Planar Measurements of the Full Three-Dimensional Scalar Dissipation Rate in Gas-Phase Turbulent Flows by L. K. Su and N. T. Clemens, Experiments in Fluids 27, 507–521, copyright 1999 Springer-Verlag.) See color insert.
Fig. 16 (from Ref. (32)), which was acquired in a planar turbulent jet of local Reynolds number 5600 at a distance of 100 slot widths downstream. The jet fluid was propane, which was seeded with about 5% acetone vapor, and the jet issued into a slow co-flow of air. The jet was illuminated by 240 mJ of 532-nm light produced by a Nd •• YAG laser, and the images were captured by a slow-scan CCD camera that had a 58-mm focal length, f /1.2 lens and a laser line filter (50% maximum transmission) at a magnification of 0.28. In Fig. 16, the signal is proportional to the concentration of jet fluid, and the figure demonstrates that Rayleigh scattering can be used to obtain very high quality images of the jet concentration field. One of the main difficulties in such an experiment is the need to reduce all sources of unwanted elastic scattering, such as reflections from test section walls/windows and scattering from dust particles in the flow. The background scattering from windows and walls is particularly problematic because it can easily overwhelm the weak Rayleigh signals. Although theoretically these background signals can be removed as part of a background correction, such as obtained by filling the test cell with helium (81), this can be done only if the shot-to-shot variation in the background is substantially less than the Rayleigh signals of interest. In many cases, however, such as in a relatively small test section, this is not the case, and background interference is unacceptably high. When the background due to laser reflections from walls/windows is high, increasing the laser energy does not improve the signal-to-background ratio because the signal and background increase proportionately. In this case, the only recourse is to increase the signal by using a higher cross section or number density or to lower the background by reducing reflections by painting opaque surfaces flat black and by using antireflection coatings on all windows.
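The normalization χjet = [Se − (Se)∞]/[(Se)0 − (Se)∞] and the background issues just described can be combined into a short processing sketch. The array names, the uniform-background assumption, and the reference values below are entirely hypothetical; the code simply illustrates one way a raw Rayleigh image might be converted to jet-fluid mole fraction.

import numpy as np

def rayleigh_to_mole_fraction(signal, background, s_jet_exit, s_ambient):
    """Convert a raw Rayleigh image to jet-fluid mole fraction.

    signal, background : 2-D arrays (counts); background recorded with no flow
    s_jet_exit, s_ambient : background-corrected reference signals for pure jet
        fluid and pure ambient fluid (scalars), e.g., from calibration images.
    """
    corrected = signal.astype(float) - background
    chi = (corrected - s_ambient) / (s_jet_exit - s_ambient)
    return np.clip(chi, 0.0, 1.0)      # mixing cannot produce values outside [0, 1]

# Hypothetical example with synthetic data
rng = np.random.default_rng(0)
background = 100.0 + rng.normal(0.0, 2.0, (256, 256))        # window/wall scattering + offset
true_chi = np.clip(rng.normal(0.3, 0.1, (256, 256)), 0, 1)   # pretend concentration field
signal = background + 400.0 + 1600.0 * true_chi              # ambient signal 400, jet-exit 2000

chi = rayleigh_to_mole_fraction(signal, background, s_jet_exit=2000.0, s_ambient=400.0)
print("mean jet mole fraction: %.2f" % chi.mean())

Note that this sketch assumes the background is repeatable from shot to shot; as discussed above, when the background fluctuates at a level comparable to the Rayleigh signal, such a correction is no longer adequate.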
In some cases, filtered Rayleigh scattering (FRS) can be used to reduce the background reflections greatly from walls and windows (5,82). In FRS, the Rayleigh scattering from a narrow bandwidth laser is imaged through a narrow line notch filter. For example, in one implementation used in high-speed flows, the Rayleigh scattering is induced by the narrow line light from an injection seeded frequency-doubled Nd:YAG laser, and the scattering is imaged through a molecular iodine absorption filter. If the laser beam and camera are oriented in the appropriate directions, the light scattered by the moving molecules will be Doppler-shifted, whereas the reflections from the stationary objects will not be shifted. Figure 17 illustrates the FRS technique. Because the scattering is imaged through an absorption filter, the unshifted light is absorbed by the filter, whereas the Doppler-shifted scattering is partially or completely transmitted. This same technique also forms the basis of a velocity diagnostic that will be discussed later. Spontaneous Raman scattering has also been employed for quantitative concentration measurements in turbulent flows. It is particularly useful in combustion research because it is linear, species specific (unlike Rayleigh scattering), and enables measuring multiple species using a single laser wavelength (11). Spontaneous Raman scattering is caused by the interaction of the induced-dipole oscillations of a molecule with its rotational and vibrational motions. In other words, the incident laser beam of frequency ν0 is shifted in frequency by the characteristic frequency of rotation/vibration. The frequency of Raman scattering is either shifted to lower frequencies (called Stokes-shifted) or to higher frequencies (called anti-Stokes-shifted). The photon that is Stokes-shifted has lower energy than the incident photon, and the energy difference is transferred to the energy of
Figure 17. Illustration of the filtered Rayleigh scattering technique. The scattering from walls and windows has the same line shape and line center frequency as the laser itself. The scattering from the flow is shown as molecular (Rayleigh–Brillouin) scattering, which may be broader than the laser line, owing to thermal and acoustic motions of the molecules. If the scattering medium is particles rather than molecules, then the Rayleigh scattered light will have the same line shape as the laser. When the scattering is imaged through a notch filter (shown as the dotted line), then the Doppler-shifted light is partially or completely transmitted, whereas the scattering from stationary objects is not transmitted.
vibration/rotation of the molecule. Similarly, anti-Stokes-shifted photons have higher energy, and thus energy is given up by the molecule. In most flow imaging studies, vibrational Raman scattering is used because the lines for different species are fairly well separated. For example, for excitation at 532 nm, the Stokes-shifted vibrational Raman scattering from N2, O2, and H2 occurs at wavelengths of 607, 580, and 683 nm, respectively. In contrast, owing to the smaller energies of rotation, the rotational Raman lines in a multispecies mixture tend to be grouped closely around the excitation frequency, thus making it very difficult to distinguish the scattering from a particular species. The main problem in spontaneous Raman scattering is that the signals tend to be very weak, owing to very small scattering cross sections. Typically, Raman scattering cross sections are two to three orders of magnitude smaller than Rayleigh cross sections (11). For example, for N2 at STP, (dσ/dΩ)Ray = 7 × 10−32 m2/sr, whereas the vibrational Raman cross section (dσ/dΩ)Ram = 4.6 × 10−35 m2/sr, which is more than three orders of magnitude smaller than the Rayleigh cross section. The low signals that are inherent in Raman scattering make it applicable in only a few very specialized cases, such as when only major species are of interest and when very high laser energies can be generated. For example, methane concentration has been imaged in jets and flames; however, this required a high-energy flashlamp-pumped dye laser (λ ≈ 500 nm, >1 J/pulse), combined with a multipass cell (19,83,84). The multipass cell resulted in an increase in laser fluence of about 30 times over that which could be achieved using only a cylindrical lens. A similar setup was used to image the Raman scattering from the C–H stretch vibrational mode in methane-air jet flames (85). Despite the use of very high laser energies and multipass cells, the relatively low SNRs reported in these studies demonstrate the great challenge in the application of Raman scattering imaging in flames.

Temperature/Pressure

Several techniques have been developed to image temperature in both liquid- and gas-phase flows. Most liquid-phase temperature imaging has been accomplished using either liquid crystals or PLIF of seeded organic dyes. For example, suspensions of small liquid crystal particles were used to image the temperature field in aqueous (86) and silicone oil flows (87,88). In these studies, the liquid crystal suspensions were illuminated by a white light sheet, and the reflected light was imaged by using a color CCD camera. The color of the crystals was then related to the local flow temperature using data from independent calibration experiments. The advantage of liquid crystals is that they can measure temperature differences as small as a fraction of a degree, but typically in a range of just a few degrees. Furthermore, they have a rather limited response time, and their spatial resolution is not as good as can be achieved by planar laser imaging. PLIF thermometry offers an improvement in some of these areas, but the minimum resolvable temperature difference tends to be inferior. The simplest PLIF technique is single-line
thermometry, where a temperature-sensitive dye is uniformly seeded into the flow and the signals are related to temperature using data from a calibration experiment. For example, rhodamine B dye has relatively good temperature sensitivity because the LIF signal decreases 2–3% per °C. In (89), temperature fields were acquired by exciting rhodamine B by using a frequency-doubled Nd:YAG laser and imaging the fluorescence through a color filter. The authors report a measurement uncertainty of about 1.7 °C. A potential source of error in flows that have large index of refraction gradients, such as occur in variable temperature liquid- or gas-phase flows, is the variation in the intensity of the laser beam, owing to the shadowgraph effect. This can be a significant problem in liquid-phase flows where the temperature differences are of the order of several degrees or more or where fluids that have different indexes of refraction are mixed. In gas-phase flows, shadowgraph effects are less of a problem, but they may be significant when mixing gases such as propane and air that have very different indexes of refraction, and at high Reynolds numbers where gradients tend to be large. For example, careful viewing of the mixing of propane and air shown in Fig. 16 reveals subtle horizontal striations that are caused by shadowgraph effects. In principle, it is possible to correct for shadowgraph effects (64,89), provided that the smallest gradients are resolved, by correcting the laser beam intensity along a ray path using Eq. (33). In the planar imaging of turbulent flow, however, it is not possible to correct for out-of-plane gradients, and thus the correction procedure will not be completely accurate. As an alternative to correcting for shadowgraph effects, two-line techniques have been developed where a mixture, composed of a temperature-sensitive dye and a temperature-insensitive dye, is seeded into the flow (63,90). If dyes are chosen that fluoresce at different wavelengths (when excited by the same wavelength of light), then the ratio of the two LIF signals is related to the temperature but is independent of the excitation intensity. In some cases, it is desirable to remove shadowgraph effects while maintaining density differences. In this case, it is possible to make a binary system of fluids that have different densities but the same index of refraction (e.g., 91). One of the simplest techniques for measuring temperature in gas-phase, constant-pressure flows is to measure density by schlieren deflectometry, interferometry, or Rayleigh scattering, from which the temperature can be inferred using an equation of state. For example, the rainbow schlieren (or deflectometry) technique discussed previously (51,52) enables imaging the temperature field under certain conditions, such as in constant pressure, steady, two-dimensional, laminar flows. However, because this technique is spatially integrated, it has limited applicability to 3-D, unsteady flows, particularly where the composition and temperature (hence, index of refraction) vary in space and time. Unfiltered Rayleigh scattering techniques typically require a constant pressure flow that has a uniform index of refraction (hence, Rayleigh scattering cross section). In this case, variations in the Rayleigh scattering signal are due only to temperature variations. In general, however, mixing and reacting flows exhibit
variations in fluid composition, which lead to variations in the index of refraction, even at constant temperature. It is for this reason that combustion researchers have used specialized fuel mixtures where the Rayleigh scattering cross section is approximately constant for all states of combustion, and thus the Rayleigh scattering signal is inversely proportional to temperature (92,93). The main drawback of this technique is that it assumes equal molecular diffusivities of heat and species, which is a rather dubious assumption in many cases. FRS can be used for temperature imaging by relating changes in the scattered signal line shape to the temperature. In Rayleigh scattering from molecules, even if the incident light is essentially monochromatic, the scattered light will be spread over a range of frequencies due to thermal and acoustic motions, as illustrated in Fig. 17 (5,82). When the scattering combines thermal and acoustic broadening, it is sometimes called Rayleigh–Brillouin scattering. The resulting scattered light line shape, which is sensitive to the temperature, pressure, and composition of the gas, can be used to measure those quantities. For example, if the Rayleigh scattering is imaged through a notch filter that has a known transmission curve and the theoretical Rayleigh–Brillouin line shape is known, then it is possible to infer the temperature field under certain conditions. Techniques using this procedure have enabled imaging the mean pressure and temperature fields in a Mach 2 free air jet (94) and the instantaneous temperature field in premixed flames (95). In a related technique, Rayleigh–Brillouin scattering is imaged through a Fabry–Perot interferometer, which gives a more direct measure of the frequency and line shape of the scattered light (96). This technique has been used to measure temperature (and velocity) in high-speed free jet flows. PLIF has also been extensively used for temperature imaging in gas-phase flows. The most commonly used technique is two-line PLIF of diatomic species (such as NO, I2, and OH), where the ratio is formed from the fluorescence resulting from the excitation of two different transitions originating from different lower rotational levels (11,57). The advantage of the two-line technique is that the ratio of the signals is directly related to the rotational temperature but is independent of the local collisional environment, because the quenching affects the fluorescence from both lines similarly. The main difficulty in this technique is that if instantaneous temperature fields are desired, then two tunable laser sources and two camera systems are required. If only time-average measurements are required, then it is possible to use only a single laser/camera system. The two-line imaging technique has been used on a wide variety of flows for a range of fluorescent species. For example, measurements have been made in flows seeded by NO (97–99) and I2 (70,100) and by naturally occurring species such as OH (101). An example of a mean temperature image of a Mach 3 turbulent bluff trailing-edge wake is shown in Fig. 18. This image was obtained by seeding 500 ppm of NO into the main air stream and then capturing the fluorescence that results from the excitation of two different absorption lines (99). The figure clearly reveals the structure of the wake flow field, including
the warm recirculation region behind the base, the cool expansion fans, and the jump in temperature across the recompression shocks. Most PLIF thermometry has employed diatomic molecules, but the temperature dependence of acetone fluorescence has been used for single- and two-line temperature imaging in gas-phase flows (102,103). The advantage of using acetone is that it absorbs across a broad range of frequencies and thus tunable lasers are not required. In the single-line technique, which is applicable to flows that have a uniform acetone mole fraction and constant pressure, relative temperature measurements can be made up to temperatures of about 1000 K. For example, pumping by a KrF excimer laser at 248 nm can provide an estimated 1 K measurement uncertainty at 300 K. When the acetone mole fraction is not constant (such as in a mixing or reacting flow), a two-line technique can be used that is based on measuring the ratio of the LIF signals resulting from the excitation by two fixed-frequency lasers. For example, the ratio of PLIF images obtained from pumping by a XeCl excimer (308 nm) and a quadrupled Nd:YAG (266 nm) can be used to achieve a factor of 5 variation in the signal ratio across the range 300–1000 K. Compared to the single-laser technique, the two-laser technique is considerably harder to implement (particularly if both images are acquired simultaneously), and it exhibits substantially lower temperature sensitivity. None of the techniques that have been developed for measuring pressure measures it directly; instead, the pressure is inferred from an equation of state combined with measurements of the fluid density and temperature. For this reason, pressure is very difficult to infer in low-speed flows because the pressure fluctuations result in only very small fluctuations in the density and temperature. PLIF has seen the most extensive use in pressure imaging, although one technique based on FRS (94) has been developed and was described earlier. For example, in (104), PLIF of seeded
iodine and known absorption line shapes were used to infer first-order accurate pressure information for an underexpanded jet. This technique requires an isentropic flow assumption, which makes it inapplicable in many practical situations. In a related iodine PLIF technique, the pressure field was obtained by measuring its effect on the broadening of the absorption line shape (105). A limitation of this technique is that it is not very sensitive to pressure for moderate to high pressures (e.g., near 1 atm and above). In (106), NO PLIF was used to infer the 2-D pressure field in a high-enthalpy shock tunnel flow using the ratio of NO PLIF signals from a pressure-insensitive B-X transition and a pressure-sensitive A-X transition. A correction using the temperature measured in an earlier study then allowed the static pressure to be inferred. This technique may be more practical than I2 PLIF in some cases because NO occurs naturally in high-temperature air flows, but its disadvantages include the low fluorescent yield of the B-X transition and the need for accurate quenching rates. NO PLIF was also used to infer the pressure field of a bluff-body turbulent wake whose temperature field is shown in Fig. 18 (99). In this technique, trace levels of NO were seeded into a free stream of nitrogen. Because N2 is very inefficient in quenching NO fluorescence, the LIF signal is directly proportional to the static pressure and to a nonlinear function of temperature. However, the temperature dependence can be corrected for if the temperature is measured independently, such as by the two-line method. The resulting mean pressure field obtained by this technique is shown in Fig. 19. This figure shows the low-pressure expansion fans originating from the lip of the splitter plate, the pressure increase across the recompression shock, and the nearly constant-pressure turbulent wake.

Velocity

The most widely applied velocity imaging technique in fluid mechanics is particle image velocimetry (PIV). PIV is a very robust and accurate technique, which in its
Figure 18. The mean temperature field of a supersonic bluff-body wake derived from two-line NO PLIF imaging. The field of view is 63 mm wide by 45 mm high. (Reprinted with permission from PLIF Imaging of Mean Temperature and Pressure in a Supersonic Bluff Wake by E. R. Lachney and N. T. Clemens, Experiments in Fluids, 24, 354–363, copyright 1998 Springer-Verlag.) See color insert.
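To make the two-line ratio concrete, the following minimal Python sketch inverts the generic Boltzmann form of the two-line signal ratio for temperature. The lower-state energies and the calibration constant used here are hypothetical illustrative values, not the calibration of Ref. (99).

# Generic two-line thermometry sketch: R = S1/S2 = C * exp(-(E1 - E2)/(k_B*T)),
# where E1, E2 are the lower-state energies of the two transitions and C is a
# calibration constant; inverting gives T = (E1 - E2)/(k_B * ln(C/R)).
import numpy as np

K_B = 1.380649e-23                   # Boltzmann constant, J/K
HC = 6.62607015e-34 * 2.99792458e8   # Planck constant times speed of light, J*m

def temperature_from_ratio(R, E1_cm, E2_cm, C):
    # E1_cm, E2_cm: lower-state energies in cm^-1, converted to joules below.
    dE = (E1_cm - E2_cm) * 1.0e2 * HC
    return dE / (K_B * np.log(C / R))

# Hypothetical numbers: energy separation of 500 cm^-1, calibration C = 1.5
print(temperature_from_ratio(R=0.5, E1_cm=700.0, E2_cm=200.0, C=1.5))  # ~655 K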
Figure 19. The mean pressure field of a supersonic bluff-body wake derived from NO PLIF imaging. The field of view is 63 mm wide by 45 mm high. (Reprinted with permission from PLIF Imaging of Mean Temperature and Pressure in a Supersonic Bluff Wake by E. R. Lachney and N. T. Clemens, Experiments in Fluids, 24, 354–363, copyright 1998 Springer-Verlag.) See color insert.
most common implementation enables the imaging of two components of velocity in a cross-section of the flow. PIV measurements are now commonplace and they have been applied in a wide range of gas- and liquid-phase flows, including microfluidics, large-scale wind tunnels, flames, plasmas, and supersonic and hypersonic flows. Excellent introductions to PIV can be found in several references (5,20,23,107). At its simplest, PIV involves measuring the displacement of particles moving with the fluid for a known time. The presumption, of course, is that the particles, which are usually seeded, have sufficiently low inertia to track the changes in the motion of the flow (108). Even a cursory review of the literature shows that there are myriad variations of PIV, and therefore for brevity, only one of the most commonly used configurations will be discussed here. In a typical PIV experiment, two spatially coincident laser pulses are used where there is a known time between the pulses. The coincident beams are formed into thin sheets and passed through a flow seeded with particles. The lasers used are usually frequency-doubled Nd:YAG lasers, and the two pulses can
originate from two separate lasers, from double-pulsing the Q-switch of a single laser, or from one of several dual-cavity lasers that were designed specifically for PIV applications. In two-component PIV, the scattering from the particles is imaged at 90° to the laser sheets using a high-resolution CCD camera, or less commonly today, a chemical film camera. The particle pairs can be imaged onto either a single frame (i.e., a double exposure) or onto separate frames. A major issue in PIV is that if it is not possible to tell which particle image of the pair came first, then there is an ambiguity in the direction of the velocity vector. This is one of the main advantages of the two-frame method because it does not suffer from directional ambiguity. Several CCD cameras on the market are ideal for two-frame PIV. They are based on interline transfer technology and can "frame-straddle," or allow the capture of two images a short time apart. An example of a two-frame particle field captured in a turbulent jet is shown in Fig. 20 (from Ref. 109). The camera used was a 1k × 1k frame-straddling camera (Kodak ES1.0), the field of view was 33 × 33 mm, and the time between pulses was 8 µs.
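A rough, illustrative check (the flow speed used here is an assumption, not a value from Ref. 109) shows how this pulse separation translates into a particle displacement of a few pixels for this field of view:

# Illustrative estimate of pixel size in the flow and particle displacement.
field_of_view = 33e-3   # m, imaged region
n_pixels = 1024         # 1k x 1k sensor
dt = 8e-6               # s, time between the two laser pulses
u_assumed = 10.0        # m/s, assumed local flow speed (illustrative only)

pixel_size = field_of_view / n_pixels     # about 32 micrometers per pixel
displacement = u_assumed * dt             # about 80 micrometers
print(displacement / pixel_size)          # roughly 2.5 pixels between frames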
Figure 20. Sample PIV images. (a) Two-frame particle field images. The right image was captured 8 µs after the left image. (b) A two-component velocity vector field computed from a cross-correlation analysis of a two-frame particle image pair. (Reprinted with permission from Ref. 109.)
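The window-by-window processing described in the next paragraph can be sketched as follows; this is a minimal, generic FFT-based cross-correlation illustration, not the particular algorithm used in Ref. 109.

# Minimal sketch of PIV interrogation-window processing: the displacement of
# the particle pattern between two windows is taken as the location of the
# peak of their cross-correlation, computed here with FFTs.
import numpy as np

def window_displacement(win_a, win_b):
    a = win_a - win_a.mean()
    b = win_b - win_b.mean()
    corr = np.fft.ifft2(np.fft.fft2(a).conj() * np.fft.fft2(b)).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Map peak indices to signed shifts in the range -N/2 .. N/2.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))

# Synthetic test: a random particle pattern shifted by (3, -2) pixels.
rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))
frame2 = np.roll(frame1, shift=(3, -2), axis=(0, 1))
print(window_displacement(frame1, frame2))   # -> (3, -2)
# Dividing the pixel displacement by the magnification and the pulse
# separation converts it to a velocity vector for that window.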
The particle displacements are obtained by dividing the image into smaller interrogation windows (usually ranging from 16 × 16 to 64 × 64 pixels); a single velocity vector is computed for each window. Examples of interrogation windows are shown as white boxes in Fig. 20a. The displacement is determined by computing the spatial cross-correlation function for the corresponding windows in each image of the pair, as shown in Fig. 20a. The mean displacement and direction of the velocity vector can then be determined from the location of the peaks in the cross-correlation function. This is then repeated for every interrogation window across the frame. A sample turbulent jet velocity field computed from this process is shown in Fig. 20b. For this vector field, the interrogation window size was 32 × 32 pixels, and the window was offset by 16 pixels at a time (50% overlap), which resulted in 62 × 62 vectors across the field. Because the velocity is averaged across the interrogation window, PIV resolution and DSR are important issues. For example, typically cited values of the resolution are about 0.5 to 1 mm. Perhaps a bigger limitation, though, is the DSR, Np/Nw, where Np and Nw are the linear sizes of the array and the interrogation window, respectively. For example, a 1k × 1k array that has a 32 × 32 window gives a DSR of only 32. If the minimum required resolution is 1 mm, then the maximum field of view that can be used is 32 mm. Limited DSR is one of the main reasons for using large format film and megapixel CCD arrays. Several algorithms have been developed that use advanced windowing techniques (110) or a combination of PIV and particle tracking (111–113) to improve both the resolution and DSR of the measurements substantially. The PIV technique described can measure only two components of velocity; however, several techniques have been developed that enable measuring all three components. Probably the most widely used technique to date is stereoscopic PIV, which requires using two cameras that are separated laterally but share a common field of view (20,23). Particle displacement perpendicular to the laser sheet can be computed by using the particle images from the two cameras and simple geometric relationships. Although stereoscopic PIV is somewhat more difficult to implement than two-component PIV, much of the development burden can be avoided because entire systems are available from several different companies. In another class of velocity imaging techniques, the scattered light signal is related to the Doppler shift imparted by the bulk motion of the flow. Both FRS and PLIF techniques have been applied that use this effect and may be preferable to PIV under some circumstances. For example, both FRS and PLIF velocimetry become easier to use in high-speed flows, owing to increasing Doppler shifts, whereas PIV becomes more difficult to use at high speeds because of problems in obtaining sufficient seeding density and ensuring small enough particles to track the fluid motion. Rayleigh scattering velocimetry has seen substantial development in recent years, and different researchers have implemented closely related techniques, which go by the names of global Doppler velocimetry, filtered Rayleigh scattering, and planar Doppler velocimetry. Here, the
less ambiguous term, planar Doppler velocimetry (PDV), will be used. A recent review of these techniques can be found in (114). All of these techniques operate on the basic principle that small changes in the frequency of the scattered light resulting from Doppler shifts can be inferred from the signal when the scattered light is imaged through a narrowband notch filter. Two Doppler shifts affect measurement. When molecules in the flow are illuminated by an incident laser beam, the radiation by the induced dipoles in the gas will be Doppler-shifted if there is a component of the bulk fluid velocity in the direction of the laser beam propagation. Similarly, the detector will perceive a further Doppler shift in the induced radiation if there is a component of the bulk fluid velocity in the direction of the detector. The result is that the perceived Doppler shift fD measured by the detector is given by (82)

fD = (s − o) · V/λ, (35)

where V is the bulk fluid velocity, o is the unit vector in the laser propagation direction, and s is the vector originating from the probe volume and pointing toward the detector. In PDV, the Rayleigh scattering is induced by a tunable narrow line width laser, and the flow is imaged through a notch filter. In the most common implementation, the laser source is an injection seeded, frequency-doubled Nd:YAG laser, which has a line width of about 50–100 MHz and can be tuned over a range of several GHz (114). The notch filter is usually an iodine vapor cell. In one technique, the laser is tuned so the non-Doppler-shifted light is centered on the edge of the absorption line, such as the right edge of the line shown in Fig. 17. Usually the scattering medium is an aerosol, such as a condensation fog, and thus the scattered line width is nearly the same as that of the laser. If the flow has a constant particle density, then the signal will increase as the Doppler shift increases. If the notch filter line shape is known, then the signal can be directly related to the velocity. In most cases, the density is not constant, and therefore a separate non-Doppler-shifted density measurement must be made. This can be accomplished by using another camera or a single-camera split-image configuration (114). Much of the recent work in this area has been in improving the accuracy of the technique and in extending it to enable measuring three components of velocity. PLIF velocimetry is also a Doppler-shift-based technique, which is particularly applicable in high-speed reacting flows where seeding the flow with particles is not practical or where low gas densities preclude the use of Rayleigh scattering. In most PLIF velocimetry studies, the flow is seeded by a tracer, such as iodine or NO, although naturally occurring species, such as OH, have also been used successfully. PLIF velocimetry is accomplished by having the laser sheet propagate as much as possible in the direction of the bulk flow, which maximizes the Doppler shift seen by the absorbing molecules. The camera is usually oriented normally to the laser sheet, and the broadband fluorescence is collected (i.e., it is not spectrally resolved). Thus, unlike PDV, only the Doppler
shift induced by the flow/laser beam is relevant. The difficulty in PLIF is that in addition to velocity, the fluid composition, pressure, and temperature also affect the signal through number density, population, quenching, and line shape effects. Therefore, schemes have to be devised that enable isolating effects due to velocity alone. In an early PLIF velocimetry technique, a tunable CW narrow line laser (argon-ion) was used to scan in frequency across an absorption line of I2 seeded in a high-speed flow, and several images were captured during the scan, which enabled reconstructing the line shape for each pixel (115). The measured Doppler-shifted line shapes were then compared to an unshifted line shape taken in a stationary reference cell. Although this technique worked well, it can provide only time-average measurements because it takes a finite time to scan the laser. In another technique also employing I2 PLIF, two discrete laser frequencies and four laser sheets were used to measure two components of mean velocity and pressure in an underexpanded jet (104). In the techniques mentioned before, the laser line needs to be much narrower than the absorption line. It can be an advantage, however, when the laser line width is much larger than the absorption line width because it reduces the sensitivity of the signal to variations in the absorption line shape. For example, in (116), two counterpropagating laser sheets and two cameras were used to image one component of velocity in NO-seeded supersonic flows. The reason for using counterpropagating sheets is that the ratio of the LIF signals from the two sheets can be related to the velocity component but is independent of the local temperature and pressure. When the laser line is of the same order of magnitude as the absorption line, such two-sheet fixed-frequency techniques require modeling the overlap integral for the absorption and laser line shapes (117).

Future Developments

Although new quantitative imaging techniques will certainly continue to be developed, it is likely that the greatest effort in the future will be directed at simply improving existing techniques by making them easier and cheaper to implement and by improving the accuracy, precision, resolution, and framing rate. A good example of the improvement that can be achieved by better technology is to compare the quality of OH PLIF imaging from one of the first images captured by this technique in 1984 (118) to images that have been captured more recently (76). The difference in quality is dramatic, despite the use of the same technique in both cases. A major trend that started in the past decade, but will no doubt continue, is the application of two or more "established" techniques to obtain simultaneous images of several flow parameters (81). Multiple-parameter techniques include the simultaneous acquisition of multiple flow variables, such as velocity and scalars. Multiple-parameter imaging also includes imaging the same flow variable with a short time delay between images, to obtain the rate of change of a property, and acquiring two images of the same flow variable where the laser sheets are placed a small distance apart to enable the computation of spatial gradients. Because multiple-parameter imaging
usually involves established techniques, its implementation is usually limited by the availability of the required equipment and by optical access for all of the laser beams and cameras. An obvious limitation of most of the techniques that have been discussed is that the framing rates are typically limited to a few hertz. This limitation is imposed by the laser and camera systems that are available now. Although there is no question that the laser power of high-repetition rate commercial lasers will continue to increase with time, limited laser power will remain an obstacle to kilohertz imaging for many of the techniques discussed in this article. For example, the Rayleigh scattering image of Fig. 16 required about 300 mJ of light from a frequency-doubled Nd:YAG operating at 10 Hz, which corresponds to 3 W of average power. If it were desired to acquire images that have the same SNR at 10 kHz, such as is likely to be required in even a moderate Reynolds number gas-phase flow, then this would require a doubled Nd:YAG laser whose average power is 3 kW. This amount of continuous average power might not be large compared to that required for metal cutting or ballistic missile defense, but it is an astounding amount of power by flow diagnostics standards, and handling such a laser would pose many practical problems for the user. High-framing-rate imaging is also currently limited by camera technology; no camera is currently available that operates quasi-continuously at 10 kHz with 10–20 e− rms noise per pixel, as is necessary to obtain images of the quality of Fig. 16. The reason for this is that high framing rates require high readout bandwidths, which in turn lead to more noise. Thus, to keep the noise low, either the framing rate or the number of pixels must be degraded. Despite this caveat, progress toward higher framing rate imaging for all of the techniques discussed here will continue as the necessary technologies improve. Another major trend that will continue is the further development and refinement of three-dimensional techniques. The most commonly used three-dimensional techniques are classified as either tomography or reconstructions from stacks of planar images. Tomography is the reconstruction of a 3-D field of a fluid property from line-of-sight integrated data measured from several different directions through the flow. For example, both absorption (11) and interferometry (119,120) have been used, which enable reconstructing the 3-D concentration and index-of-refraction fields, respectively. A more popular technique is to reconstruct the 3-D field using a set of images that have been acquired by rapidly scanning a laser sheet through the flow and capturing several planar images during the sweep (5). This technique has been used effectively in many aqueous flows using PLIF excited by either continuous or pulsed lasers (6,59,63,121). However, because these techniques rely on sweeping a laser beam or sheet through the flow on a timescale that is shorter than the characteristic fluid timescales, such techniques are significantly more challenging in gas-phase flows. It is remarkable, however, that such experiments have been accomplished by sweeping a flashlamp-pumped dye laser sheet through the flow in only a few microseconds. In one case, the Rayleigh scattered light from a
freon-gas jet was imaged (122) and in another case the Mie scattering from a particle-laden supersonic mixing layer was imaged (123). Both studies used a high-speed framing camera that could acquire only a few frames (e.g., 10–20), and thus the resolution of the reconstructions was obviously quite limited. The future of 3-D flow imaging is probably best exemplified by holographic PIV (HPIV), which provides accurate three-component velocity fields throughout a volume of fluid (124–127). HPIV is an intrinsically 3-D technique, which begins with recording a hologram of the 3-D double-exposure particle field onto high-resolution film. The image is then reconstructed, and the particle field is digitized by sequentially imaging planes of the reconstruction using a digital camera. HPIV enables the acquisition of an astounding amount of data, but because it is a challenging technique to implement and it requires using very high resolution large-format chemical film, the framing rates will remain low for at least the near future. In conclusion, flow imaging is driving a revolution in fluid mechanics research that will continue well into the future. Continued advances in laser and digital camera technologies will make most of the imaging techniques described in this article possible one day at sufficient spatial resolution and framing rates to resolve virtually any flow spatial and temporal scale of interest. This is an exciting proposition as we enter a new century of experimental fluid dynamics research.

Acknowledgments

The author acknowledges the generous support of his research into flow imaging by the National Science Foundation, particularly under grants CTS-9319136 and CTS-9553124. In addition, the author thanks Michael Tsurikov and Yongxi Hou of UT-Austin for help in preparing this article.
ABBREVIATIONS AND ACRONYMS
2-D      two-dimensional
3-D      three-dimensional
CCD      charge-coupled device
CTF      contrast transfer function
CW       continuous wave
DSR      dynamic spatial range
FRS      filtered Rayleigh scattering
FT       Fourier transform
ICCD     intensified charge-coupled device
IR       infrared
LIF      laser-induced fluorescence
LSF      line spread function
MTF      modulation transfer function
NA       numerical aperture
Nd:YAG   neodymium: yttrium-aluminum garnet
OPO      optical parametric oscillator
OTF      optical transfer function
PDV      planar Doppler velocimetry
PIV      particle image velocimetry
PLIF     planar laser-induced fluorescence
PSF      point spread function
PTF      phase transfer function
SBR      signal to background ratio
SNR      signal to noise ratio
SRF      step response function
STP      standard temperature and pressure
TEM      transverse electromagnetic modes
TV       television
UV       ultraviolet
BIBLIOGRAPHY 1. P. H. Paul, M. G. Garguilo, and D. J. Rakestraw, Anal. Chem. 70, 2459–2467 (1998). 2. J. G. Santiago et al., Exp. Fluids 25, 316–319 (1998). 3. L. M. Weinstein, High-Speed Research: 1995 Sonic Boom Workshop, Atmospheric Propagation and Acceptability Studies, NASA CP-3335, October, 1995. 4. M. Van Dyke, An Album of Fluid Motion, The Parabolic Press, Stanford, 1982. 5. A. J. Smits and T. T. Lim, eds., Flow Visualization: Techniques and Examples, Imperial College Press, London, 2000. 6. W. J. A. Dahm and K. B. Southerland, in A. J. Smits and T. T. Lim, eds., Flow Visualization: Techniques and Examples, Imperial College Press, London, 2000, pp. 289– 316. 7. L. K. Su and W. J. A. Dahm, Phys. Fluids 8, 1,883–1,906 (1996). 8. M. Gharib, J. Fluids Eng. 118, 233–242 (1996). 9. W. Merzkirch, Flow Visualization, 2nd ed., Academic Press, Orlando, 1987. 10. G. S. Settles, AIAA J. 24, 1,313–1,323 (1986). 11. A. C. Eckbreth, Laser Diagnostics for Combustion Temperature and Species, Abacus Press, Cambridge, 1988. 12. B. J. Kirby and R. K. Hanson, Appl. Phys. B 69, 505–507 (1999). 13. J. Hecht, The Laser Guidebook, 2nd ed., McGraw-Hill, NY, 1992. 14. P. Wu and R. B. Miles, Opt. Lett. 25, 1,639–1,641 (2000). 15. J. M. Grace et al., Proc. SPIE 3642, 133–141 (1999). 16. A. E. Siegman, Lasers, University Science Books, Mill Valley, CA, 1986. 17. M. W. Sasnett, in D. R. Hall and P. E. Jackson, ed., The Physics and Technology of Laser Resonators, Adam Hilger, Bristol, 1989, pp. 132–142. 18. W. J. Smith, Modern Optical Engineering: The Design of Optical Systems, 2nd ed., McGraw-Hill, NY, 1990. 19. M. B. Long, D. C. Fourguette, M. C. Escoda, and C. B. Layne, Opt. Lett. 8, 244–246 (1983). 20. R. J. Adrian, Ann. Rev. Fluid Mech. 23, 261–304 (1991). 21. A. Vogel and W. Lauterborn, Opt. Lasers Eng. 9, 274–294 (1988). 22. J. C. Lin and D. Rockwell, Exp. Fluids 17, 110–118 (1994). 23. M. Raffel, C. E. Willert, and J. Kompenhans, Particle Image Velocimetry: A Practical Guide, Springer, Berlin, 1998. 24. B. Lecordier et al., Exp. Fluids 17, 205–208 (1994). 25. N. T. Clemens, S. P. Petullo, and D. S. Dolling, AIAA J. 34, 2,062–2,070 (1996). 26. P. H. Paul, AIAA Paper 91–2315, June, 1991. 27. J. M. Seitzman and R. K. Hanson, in A. Taylor, ed., Instrumentation for Flows with Combustion, Academic Press, London, 1993. 28. RCA Staff, Electro-Optics Handbook, RCA, Lancaster, PA, 1974. 29. P. H. Paul, I. van Cruyningen, R. K. Hanson, and G. Kychakoff, Exp. Fluids 9, 241–251 (1990).
30. J. R. Janesick et al., Opt. Eng. 26, 692–714 (1987). 31. I. S. McLean, Electronic Imaging in Astronomy: Detectors and Instrumentation, John Wiley & Sons, NY, 1997. 32. L. K. Su and N. T. Clemens, Exp. Fluids 27, 507–521 (1999). 33. E. Hecht, Optics, 3rd ed., Addison-Wesley, Reading, MA, 1998. 34. W. K. Pratt, Digital Image Processing, 2nd ed., Wiley, NY, 1991. 35. T. L. Williams, The Optical Transfer Function of Imaging Systems, Institute of Physics Publishing, Bristol, 1999. 36. C. S. Williams and O. A. Becklund, Introduction to the Optical Transfer Function, John Wiley & Sons, NY, 1989. 37. W. Wittenstein, J. C. Fontanella, A. R. Newbery, and J. Baars, Optica Acta 29, 41–50 (1982). 38. R. N. Bracewell, The Fourier Transform and its Applications, 2nd ed., McGraw-Hill, NY, 1986. 39. F. Chazallet and J. Glasser, SPIE Proc. 549, 131–144 (1985). 40. H. Tennekes and J. L. Lumley, A First Course in Turbulence, MIT Press, Cambridge, 1972. 41. G. K. Batchelor, J. Fluid Mech. 5, 113–133 (1959). 42. K. A. Buch Jr. and W. J. A. Dahm, J. Fluid Mech. 317, 21–71 (1996). 43. K. A. Buch Jr. and W. J. A. Dahm, J. Fluid Mech. 364, 1–29 (1998). 44. C. A. Friehe, C. W. van Atta, and C. H. Gibson, in AGARD Turbulent Shear Flows CP-93, North Atlantic Treaty Organization, Paris, 1971, pp. 18-1–18-7. 45. J. C. Wyngaard, J. Phys. E: J. Sci. Instru. 1, 1,105–1,108 (1968). 46. J. C. Wyngaard, J. Phys. E: J. Sci. Instru. 2, 983–987 (1969). 47. R. A. Antonia and J. Mi, J. Fluid Mech. 250, 531–551 (1993). 48. C. J. Chen and W. Rodi, Vertical Turbulent Buoyant Jets: A Review of Experimental Data, Pergamon, Oxford, 1980. 49. R. J. Goldstein and T. H. Kuen, in R. J. Goldstein, ed., Fluid Mechanics Measurements, Taylor and Francis, London, 1996. 50. N. T. Clemens and P. H. Paul, Phys. Fluids A 5, S7 (1993). 51. P. S. Greenberg, R. B. Klimek, and D. R. Buchele, Appl. Opt. 34, 3,810–3,822 (1995). 52. A. K. Agrawal, N. K. Butuk, S. R. Gollahalli, and D. Griffin, Appl. Opt. 37, 479–485 (1998). 53. J. W. Goodman, Introduction to Fourier Optics, 2nd ed., McGraw-Hill, Boston, 1996. 54. G. F. Albrecht, H. F. Robey, and T. R. Moore, Appl. Phys. Lett. 57, 864–866 (1990). 55. D. Papamoschou and H. F. Robey, Exp. Fluids 17, 10–15 (1994). 56. L. M. Weinstein, AIAA J. 31, 1,250–1,255 (1993). 57. R. K. Hanson, J. M. Seitzman, and P. H. Paul, Appl. Phys. B 50, 441–454 (1990). 58. M. M. Koochesfahani and P. E. Dimotakis, J. Fluid Mech. 170, 83–112 (1986). 59. R. R. Prasad and K. R. Sreenivasan, J. Fluid Mech. 216, 1–34 (1990). 60. C. Arcoumanis, J. J. McGuirk, and J. M. L. M. Palma, Exp. Fluids 10, 177–180 (1990). 61. J. R. Saylor, Exp. Fluids 18, 445–447 (1995). 62. P. S. Karasso and M. G. Mungal, Exp. Fluids 23, 382–387 (1997). 63. J. Sakakibara and R. J. Adrian, Exp. Fluids 26, 7–15 (1999).
64. J. D. Nash, G. H. Jirka, and D. Chen, Exp. Fluids 19, 297–304 (1995). 65. A. Lozano, B. Yip, and R. K. Hanson, Exp. Fluids 13, 369–376 (1992). 66. S. H. Smith and M. G. Mungal, J. Fluid Mech. 357, 83–122 (1998). 67. D. Papamoschou and A. Bunyajitradulya, Phys. Fluids 9, 756–765 (1997). 68. D. Wolff, H. Schluter, and V. Beushausen, Berichte der Bunsen-Gesellschaft fur physikalisc 97, 1,738–1,741 (1993). 69. R. J. Hartfield Jr., J. D. Abbitt III, and J. C. McDaniel, Opt. Lett. 14, 850–852 (1989). 70. J. M. Donohue and J. C. McDaniel Jr., AIAA J. 34, 455–462 (1996). 71. N. T. Clemens and M. G. Mungal, J. Fluid Mech. 284, 171–216 (1995). 72. N. T. Clemens and P. H. Paul, Phys. Fluids 7, 1,071–1,081 (1995). 73. T. C. Island, W. D. Urban, and M. G. Mungal, Phys. Fluids 10, 1,008–1,021 (1998). 74. G. F. King, J. C. Dutton, and R. P. Lucht, Phys. Fluids 11, 403–416 (1999). 75. T. Parr and D. Hanson-Parr, L. DeLuca, E. W. Price, and M. Summerfield, eds., Nonsteady Burning and Combustion Stability of Solid Propellants, Progress in Aeronautics and Astronautics, American Institute of Aeronautics and Astronautics, vol. 143, Washington, DC, 1992, pp. 261–323. 76. J. M. Donbar, J. F. Driscoll, and C. D. Carter, Combustion and Flame 122, 1–19 (2000). 77. C. F. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles, John Wiley & Sons, NY, 1983. 78. E. J. McCartney, Optics of the Atmosphere: Scattering by Molecules and Particles, Wiley, NY, 1976. 79. B. Yip, R. L. Schmitt, and M. B. Long, Opt. Lett. 13, 96–98 (1988). 80. D. A. Feikema, D. Everest, and J. F. Driscoll, AIAA J. 34, 2,531–2,538 (1996). 81. M. B. Long, in A. M. K. P. Taylor, ed., Instrumentation for Flows with Combustion, Academic, London, 1993, pp. 467–508. 82. R. B. Miles and W. R. Lempert, Ann. Rev. Fluid Mech. 29, 285–326 (1997). 83. M. Namazian, J. T. Kelly, and R. W. Schefer, TwentySecond Symposium (Int.) on Combustion, The Combustion Institute, Pittsburgh, 1988, pp. 627–634. 84. M. Namazian et al., Exp. Fluids 8, 216–228 (1989). 85. J. B. Kelman, A. R. Masri, S. H. Starner, and R. W. Bilger, Twenty-Fifth Symposium (Int.) on Combustion, The Combustion Institute, Pittsburgh, 1994, pp. 1,141–1,147. 86. D. Dabiri and M. Gharib, Exp. Fluids 11, 77–86 (1991). 87. I. Kimura et al., in B. Khalighi, M. J. Braun, and C. J. Freitas, eds., Flow Visualization, vol. 85, ASME FED, 1989, pp. 69–76. ¨ 88. M. Ozawa, U. Muller, I. Kimura, and T. Takamori, Exp. Fluids 12, 213–222 (1992). 89. M. C. J. Coolen, R. N. Kieft, C. C. M. Rindt, and A. A. van Steenhoven, Exp. Fluids 27, 420–426 (1999). 90. J. Coppeta and C. Rogers, Exp. Fluids 25, 1–15 (1998). 91. A. Alahyari and E. K. Longmire, Exp. Fluids 17, 434–440 (1994). 92. D. C. Fourguette, R. M. Zurn, and M. B. Long, Combustion Sci. Technol. 44, 307–317 (1986).
93. D. A. Everest, J. F. Driscoll, W. J. A. Dahm, and D. A. Feikema, Combustion and Flame 101, 58–68 (1995).
94. J. N. Forkey, W. R. Lempert, and R. B. Miles, Exp. Fluids 24, 151–162 (1998).
95. G. S. Elliott, N. Glumac, C. D. Carter, and A. S. Nejad, Combustion Sci. Technol. 125, 351–369 (1997).
96. R. G. Seasholtz, A. E. Buggele, and M. F. Reeder, Opt. Lasers Eng. 27, 543–570 (1997).
97. B. K. McMillan, J. L. Palmer, and R. K. Hanson, Appl. Opt. 32, 7,532–7,545 (1993).
98. J. L. Palmer, B. K. McMillin, and R. K. Hanson, Appl. Phys. B 63, 167–178 (1996).
99. E. R. Lachney and N. T. Clemens, Exp. Fluids 24, 354–363 (1998).
100. T. Ni-Imi, T. Fujimoto, and N. Shimizu, Opt. Lett. 15, 918–920 (1990).
101. J. M. Seitzman and R. K. Hanson, Appl. Phys. B 57, 385–391 (1993).
102. M. C. Thurber, F. Grisch, and R. K. Hanson, Opt. Lett. 22, 251–253 (1997).
103. M. C. Thurber et al., Appl. Opt. 37, 4,963–4,978 (1998).
104. B. Hiller and R. K. Hanson, Appl. Opt. 27, 33–48 (1988).
105. R. J. Hartfield Jr., S. D. Hollo, and J. C. McDaniel, AIAA J. 31, 483–490 (1993).
106. P. Cassady and S. Lieberg, AIAA Paper No. 92-2962, 1992.
107. P. Buchhave, in L. Lading, G. Wigley, and P. Buchhave, eds., Optical Diagnostics for Flow Processes, Plenum, NY, 1994, pp. 247–269.
108. A. Melling, Meas. Sci. Technol. 8, 1,406–1,416 (1997).
109. J. E. Rehm, Ph.D. Dissertation, The University of Texas at Austin, 1999.
110. J. Westerweel, D. Dabiri, and M. Gharib, Exp. Fluids 23, 20–28 (1997).
111. R. D. Keane, R. J. Adrian, and Y. Zhang, Meas. Sci. Technol. 6, 754–768 (1995).
112. E. A. Cowen and S. G. Monismith, Exp. Fluids 22, 199–211 (1997).
113. J. E. Rehm and N. T. Clemens, Exp. Fluids 26, 497–504 (1999).
114. M. Samimy and M. P. Wernet, AIAA J. 38, 553–574 (2000).
115. J. C. McDaniel, B. Hiller, and R. K. Hanson, Opt. Lett. 8, 51–53 (1983).
116. P. H. Paul, M. P. Lee, and R. K. Hanson, Opt. Lett. 14, 417–419 (1989).
117. M. Allen et al., AIAA J. 32, 1,676–1,682 (1994).
118. G. Kychakoff, R. D. Howe, and R. K. Hanson, Appl. Opt. 23, 704–712 (1984).
119. L. Hesselink, in W. -J. Yang, ed., Handbook of Flow Visualization, Hemisphere, NY, 1987.
120. D. W. Watt and C. M. Vest, Exp. Fluids 8, 301–311 (1990).
121. M. Yoda, L. Hesselink, and M. G. Mungal, J. Fluid Mech. 279, 313–350 (1994).
122. B. Yip, R. L. Schmitt, and M. B. Long, Opt. Lett. 13, 96–98 (1988).
123. T. C. Island, B. J. Patrie, and R. K. Hanson, Exp. Fluids 20, 249–256 (1996).
124. H. Meng and F. Hussain, Fluid Dynamics Res. 8, 33–52 (1991).
125. D. H. Barnhart, R. J. Adrian, and G. C. Papen, Appl. Opt. 33, 7,159–7,170 (1994).
126. J. O. Scherer and L. P. Bernal, Appl. Opt. 36, 9,309–9,318 (1997).
127. J. Zhang, B. Tao, and J. Katz, Exp. Fluids 23, 373–381 (1997).

FORCE IMAGING

KIM DE ROY
RSscan International
Belgium

L. PEERAER
University of Leuven Belgium, FLOK
University Hospitals Leuven CERM

INTRODUCTION

The study of human locomotion has generated a substantial number of publications. Starting from evolutionary history, authors have tried to give some insight into the transition from quadrupedal to bipedal locomotion (1,2). It may be assumed that, in evolving to a semiaquatic mode of life, the human foot developed from its original versatile function for swimming (3), support, and gripping to a more specialized instrument that can keep the body in an upright position. This change of function enables all of the movements that are specific to humans, such as walking and running. Data deduced from the literature show that the effects of walking speed on stride length and frequency are similar in bonobos, common chimpanzees, and humans. This suggests that within extant Hominidae, spatiotemporal gait characteristics are highly comparable (4) (Fig. 1). Despite these similarities, the upright position and erect walking are accepted as among the main characteristics that differentiate humans from animals. It is no wonder that, throughout the study of human evolution, the footprints of the first humanoid creatures that were found have been studied and discussed as much as their skulls. Whatever the causes for this evolution to the erect position, the fact remains that static and dynamic equilibrium must be achieved during bipedal activities, thus dramatically reducing the supporting area with respect to the quadrupedal condition, where this total area is formed by more than two feet. The available area of support during bipedal activities is thus restricted to that determined by one or both feet. The anatomical structure of the human foot, as well as its neuromuscular and circulatory control, must have evolved into a multijoint dynamic mechanism that determines the complex interaction between the lower limb and the ground during locomotion (5). Consequently, besides gravitation, the external forces acting on the human body act on the plantar surface of the foot and generate movement according to Newton's laws. Thus, studying this force, called the ground reaction force (GRF), is essential to our understanding of normal and pathological human locomotion. The GRF may, however, vary in point of application, magnitude, and orientation, necessitating the
Figure 1. Plantar pressure measurement of the bonobo apes (a,b) (source: The Bonobo Project, Universities of Antwerp and Ghent (Belgium) and the Royal Zoological Society of Antwerp) and a human being (c,d) (source c,d: footscan). See color insert for (b) and (d).
measurement of its vertical, fore–aft, and mediolateral components for accurate analysis. Different authors have thoroughly investigated these components using different devices (6–8). Force plates enable measuring the three components of the resultant force by means of four three-dimensional force sensors (Fig. 2c), commonly mounted near the corners underneath a stiff rectangular plate (Fig. 2a,b). Measuring force distribution on the foot would, however, necessitate using a multitude of these three-dimensional sensors across the whole of the area of the plate, thus challenging current state-of-the-art technology and considerations of cost efficiency. Commercially available plantar force- or pressure-measurement systems are at present restricted to measuring forces normal to the sensor (9,10). We may expect, however, that accurate and reliable measurement of shear forces in the fore–aft and mediolateral directions will become available in the near future. Although several sensor types, such as piezoelectric crystals and foils, photoelastic materials, and others, have been used in the past, most state-of-the-art systems provide a thin (approximately 1 mm or less) sensor array composed of resistive or capacitive force sensors of known area to calculate the mean pressure over the sensor area (10,11). These may be mounted in pressure-distribution platforms or in pressure-sensitive insoles (12) available in different foot sizes and developed by different manufacturers. This type of measuring equipment is used in biomechanical analysis of the
Figure 2. (a) Image of the 9253 multicomponent Kistler force plate. (b) Image of the 9285 multicomponent glass-top Kistler force plate. (c) Graphic representation of the multicomponent quartz force sensors. See color insert for (c).
normal, pathological, or even artificial foot during activities such as standing, walking, running, jumping, cycling, or any activity where knowledge of the forces acting on the foot is of interest. Therefore, a subject-specific analysis of the dynamics of locomotion is necessary to distinguish between normal and pathological foot function, to differentiate between the various levels of impairment, and to assess quantitatively the restoration of normal foot function after treatment (13). The field of application of these force-distribution measuring systems is very broad and includes medicine, rehabilitation, biomechanics, sports, ergonomics, engineering, and manufacturing of footwear and insoles. Similar measurement techniques are also used for evaluating pressure parameters in seating, residual limb–socket interfaces, and pressure garments. Because
421
Figure 3. Radiograph of the configuration of a normal foot.
occur simultaneously at different foot joints and in various degrees, which makes precise definition of major foot axes extremely difficult. Anatomical or functional aberrations obviously complicate the whole picture. Moreover, foot kinematics influence the movement of lower limb segments. Although several authors have investigated the possible relationships between them (13,20,21) by using three-dimensional kinematic measurement systems capable of tracking several segments simultaneously, often combined with measurement of ground reaction forces and foot-pressure distribution, the question remains whether it is preferable to study these relationships patient-by-patient (22). Yet, foot kinematics are often related to frequently diagnosed lower limb injuries, in particular, in sports. Numerous studies have focused on several typical injuries; some recent examples can be found in (21,23,24). Force- or pressure-distribution measurements are often part of the previously mentioned measurement equipment and can provide the user with specific information about the force normal to the sensor surface or with pressure-distribution data (pressure acting perpendicular to the surface). These can be measured level with the floor by using platforms or normal to the plantar surface of the foot using force- or pressure-sensitive insoles. Measurement platforms are fixed to the floor. Thus, the measurements reflect the distribution of the vertical ground reaction force on the foot or the shoe during the various situations under study. Because the foot is usually shod, at least in industrialized countries, the foot can be studied as a functional entity together with the shoe that is worn. As a consequence, the influence of the shoe on the roll-off pattern of the foot can be studied, as well as the influence of insoles used in the shoe. Depending on the goal of a specific research setup, the sole use of pressure-distribution measurements can provide the researcher with significant information on foot function, shoe–foot interaction, and even foot–insole interaction. Although the measurements do not provide information about foot kinematics, several attempts have been made to calculate approximate corresponding movements or parameters that reflect foot flexibility by using various mathematical models. An example is the calculation of the radius of gyration based on the principal axis theory (25).
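As an illustration of how such parameters can be extracted from a measured pressure array, the following Python sketch computes the total normal force, the center of pressure, and a force-weighted radius of gyration for a small synthetic pressure patch. This is a generic example, not the specific model of Ref. (25), and all numerical values are hypothetical.

# Generic illustration: total force, center of pressure, and force-weighted
# second moments from a plantar pressure array sampled on a regular grid.
import numpy as np

def pressure_summary(p, sensor_area, x, y):
    # p: 2-D pressure array (kPa); sensor_area: m^2 per sensor;
    # x, y: 1-D coordinates of the sensor centers (m).
    X, Y = np.meshgrid(x, y)
    force = p * sensor_area                  # per-sensor normal force
    F_total = force.sum()
    x_cop = (force * X).sum() / F_total      # center of pressure
    y_cop = (force * Y).sum() / F_total
    # Force-weighted second moments about the center of pressure
    Ixx = (force * (Y - y_cop) ** 2).sum() / F_total
    Iyy = (force * (X - x_cop) ** 2).sum() / F_total
    radius_of_gyration = np.sqrt(Ixx + Iyy)
    return F_total, (x_cop, y_cop), radius_of_gyration

# Hypothetical 4 x 3 pressure patch on 5 mm x 5 mm sensors
p = np.array([[80.0, 120.0, 60.0],
              [100.0, 200.0, 90.0],
              [60.0, 150.0, 70.0],
              [20.0, 50.0, 30.0]])
x = np.arange(3) * 5e-3
y = np.arange(4) * 5e-3
print(pressure_summary(p, sensor_area=25e-6, x=x, y=y))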
FIELD OF APPLICATION

Any activity where foot loading is important in understanding movement is a potential field of application for pressure- or force-distribution measurements on the foot. The following application areas can be distinguished:
• Orthopedics: examination of neuropathic foot ulceration, in particular, foot malformations, pre- and postoperative assessment in foot, knee, and hip surgery (e.g., hallux valgus (32))
• Pediatrics: the assessment of orthotics and the effects of drug administration (e.g., botulinum toxin) on children with cerebral palsy
• Sports and sport injuries: running, cycling, skiing
• Orthopedic appliances: functional evaluation of orthopedic insoles, shoe inserts, orthopedic shoes, as well as lower limb orthotics for partial load bearing and prosthetic foot and knee systems, stump socket interfacial pressure
• Biology: roll-off patterns in bipedal and quadrupedal locomotion for different animals (e.g., bonobo apes, horses, cows)
• Rehabilitation: assessment of different rehabilitation programs in stroke, Parkinson's disease, amputation
• Foot and finite element modeling
• Footwear industries: design of shoes, industrial footwear, sports shoes
Considering this diversity, it is clear that the same pressure-sensitive sensor material can be used in any field of application where contact pressure registration is important. Reference can also be made to handling tools and sitting and sleeping facilities.

MAIN FACTORS INFLUENCING MEASUREMENT DATA AND THEIR REPRESENTATION

Several factors influence the force or pressure data obtained. One of the factors mentioned below, in particular those related to the subject, will be the factor of interest to the researcher or clinician who examines foot function. As in all research, every possible attempt should be made to allow only this variable to vary during the research protocol. Therefore, knowledge of the different factors involved is needed. These can be arbitrarily divided into three main groups: the factors related to (1) the subject, (2) the measurement equipment, and (3) the sensor environment (9,10).

Factors Related to the Subject

1. the anatomical structure of the foot, including malformations of the foot;
2. the functional deficiencies of the foot, including all of the possible gait deviations, whatever the cause may be: limitation of range of motion in joints, insufficient muscular force, compensatory and pathological movements in all body segments, aberrations in neuromuscular control, etc.;
3. movement or gait characteristics that influence total forces that act on the foot (such as gait velocity, age); and 4. anthropometrical variables such as body weight and length. Factors Related to the Measurement Equipment These should reflect basic conditions in good measurement practice, instrumentation specifications, and statistical analysis. Therefore, the following list is restricted to the most important ones: validity of the measurement, reliability, accuracy, measurement range, signal resolution, geometric or spatial resolution (26), response frequency, hysteresis, time drift, temperature and humidity drift, sensor durability, and sensor bending radius or effect of curvature (10). Because the orientation of the force vectors on the sensors is not known, the sensitivity with respect to the measurement axis of the sensor must be known, as well as signal crossover between sensors, which is particularly important in sensor arrays often used in this type of measurement equipment. In addition, a relatively simple procedure for calibrating the sensors is needed because most have a nonlinear characteristic between applied force or pressure and signal output. A lookup table that has interpolation or any appropriate curve fitting procedure can be installed in the measurement system software. Factors Related to the Sensor Environment Sensor Embedment. Sensors are embedded in a sensor mat (platform or insole) and therefore should be flush and remain so, and the mat surface should be in the load range specified. This implies that the sensor mat should be homogeneous to prevent sensors from protruding from the loaded sensor mat. The latter would result in higher loads on the sensor due to the irregular mat surface. Therefore, the measurements would not represent the actual loads. The Effective Sensor Area. The total area covered by the sensors will be less than the total area of the measurement platform or insole due to the space separating sensors and the space needed for the leads in some systems. The ratio between the effective sensor area and the aforementioned free space is important because it affects the representativeness of the measurement results, depending on the spatial resolution and the software procedures used to interpolate results between sensor locations. The measurement data from each sensor can be interpreted mostly as the mean force or pressure acting across the sensor that results in a certain degree of smoothing across the area. Most software programs visualize force or pressure distribution on the foot using linear or other interpolative techniques to enhance the image produced. This may give the user the impression that the spatial resolution is higher than actual. Sensor Lead Arrangement. Point to point (two leads for each sensor) or matrix lead arrangements can influence measurement results significantly. In the first, crossover phenomena should be practically excluded, but it is
a challenge for sensor mat design when high spatial resolution is to be obtained, in particular, when the dimensions of the sensor mat are kept to a minimum. In the latter, high spatial resolution can be achieved with relatively small sensor mat dimensions often at the expense of crossover phenomena and consequently of accuracy. Sensor Placement
Platforms. Because forces are measured normal to the sensor surface, the results can be considered the distribution of the vertical ground reaction force acting on the foot. An advantage is that the spatial position of the foot during the supporting phase can be seen with respect to the line of progression. Multistep (large) platforms also allow calculating temporal and spatial parameters because several consecutive steps are measured. Drawbacks are that measurements are restricted to the walkway where the measurement platform is located and that the subject may target the plate, thus altering gait. The latter drawback is reduced by using the large multistep platforms that are now readily available. On-line interpretation of foot roll-off may be difficult because the force or pressure sequence is difficult to relate to foot anatomy. Measurements are also restricted to barefoot walking or walking with shoes that do not damage the plate (e.g., sports shoes such as spikes or soccer shoes).

Insoles. Insole measurements in general enable measuring several consecutive steps made on different surfaces and in different situations. Because the sensor mat placed in the shoe has a fixed position with respect to the foot, force and pressure distribution can be viewed as related to gross foot anatomy. The spatial position of the foot during the activity considered is unknown, however. Importantly, insole measurements are not limited to certain types of shoes or walking surfaces (they can even be used during cycling or skiing), and a large number of consecutive steps or cycles can be measured, depending on the data storage capacity or data transfer rate of the measuring unit. Three possible locations can be considered:
1. between the plantar surface of the foot and the insole or foot bed. Measured forces will be those normal to the plantar surface of the foot, without any reference to the orientation of these force vectors with respect to the vertical. Nevertheless, it will be of particular importance to detect or estimate the force or pressure to which the skin and soft tissues of the foot are subjected. Clinical relevance is in this case obvious for feet liable to develop pressure ulceration, as in the diabetic foot.
2. between insole and shoe bottom. Measurements between the insole and the bottom of the shoe are often considered a possible solution for measuring the effectiveness of the insole and for limiting sensor curvature. It must be clear, however, that these measurements do not necessarily reflect the pressure distribution on the plantar surface of the foot.
Measurements will be largely influenced by the elastic properties and thickness of the insole used. Measurements under a stiff insole will reflect only the incongruence between insole and shoe bottom, whereas very elastic insoles may result in a measurement that is liable to extrapolation. 3. between the midsole of the shoe and the floor. In this case, the sensor mat is attached to the midsole of the shoe. Any extrapolation of results with respect to plantar surface pressure distribution, if at all desirable, must be made with extreme care as a function of insole and shoe midsole elastic properties, as well as shoe construction or correction and sole profile geometry. The first and second type of measurements will be largely influenced by shoe construction, shoe fit, and the firmness of lacing. Firmly lacing the shoe will indeed create forces that are superimposed on those created by supporting the foot. The third type reflects the properties of a particular combination of foot, insole, and shoe on the roll-off characteristics during the gait, and is to a certain extent comparable to plate measurements. AVAILABLE SYSTEMS Throughout the past few decades, the search for objective data representing foot function led to the development of a variety of devices capable of measuring pressure under the foot. The previous section showed how different aspects influence the adequacy of the plantar pressure measurement. Over the years, a variety of different pressure measurement systems have been reviewed (27–30) using different techniques to obtain objective documentation of foot function. The quantification of the multidimensional forces generated by a load applied on the sensor(s) is the result of an indirect measurement. The degree of deformation of the sensor(s) caused by a force applied on a sensor surface can be converted in terms of pressure (kPa). Therefore, the objectiveness of the measurement depends strongly on the specific electromechanical characteristics of the sensors. Most of the recently available systems use capacitive or resistive sensors, two sensor types that have specific characteristics. Capacitive sensors have the advantage that they conform well to a 3-D shaped surface, show a small temperature drift, and have little hysteresis (10). The disadvantages of capacitive sensors are the need for special leads, large-scale electronics, a limited scanning frequency (up to 100 Hz), and the thickness of these sensors (>1 mm). Resistive sensors are extremely thin (<0.5 mm), they can provide high spatial resolution, and cutting can alter the shape of the sensor matrix to a certain extent. The disadvantages of resistive sensors are higher temperature drift, larger hysteresis, instability during long-term loading, and a nonelastic and nonlinear character. Recently improved conductive, piezoelectric polymer sensors that have limited temperature drift and hysteresis are used which allow for higher constant or variable scanning frequencies (up to 500 Hz) due to the
Table 1. Practical Example of Technical Specifications for Plantar Pressure Measurements (source: RSscan INTERNATIONAL)

1. Sensors. Plate: 5 mm × 7.6 mm per sensor; 4,096, 8,192, and 16,384 sensors, respectively, for the 0.5-, 1-, and 2-meter plate. Insoles: 356 sensors for a size 42.
2. Sensor area. Plate: 488 mm × 325 mm, 976 mm × 325 mm, and 1,952 mm × 325 mm, respectively, for the 0.5-, 1-, and 2-meter plate. Insoles: depending on shoe size.
3. Pressure range: 0.686 kPa up to 151.9 kPa or higher.
4. Pressure threshold: 0.686 kPa.
5. Hysteresis: 0.5% after calibration and software rectification.
6. Maximum total force: 246 kN, 492 kN, and 984 kN, respectively, for the 0.5-, 1-, and 2-meter plate (complete surface).
enhanced electromechanical properties of the material used (11). An example of technical specifications is given in Table 1. Current advances in the accuracy and in the spatial and measurement resolution of pressure measurement systems are driven by continuing technological improvement. However, current techniques cannot produce accurate absolute pressure values. Appropriate calibration techniques are essential in this respect; the use of relative data is, nevertheless, important in objective documentation and can be considered indispensable in quantifying foot function for clinical use. At the level of scientific and academically oriented research, the lack of standardized calibration, along with the nonlinear and relative character of the data, can be considered important shortcomings of currently available systems. Recently, attempts have been made to enhance accuracy considerably by simultaneous pressure measurements on a force-distribution platform placed over a calibrated force platform (Kistler, AMTI, BERTEC). Thus, the pressure-distribution data can be optimized for linearity and absolute values by the accurately measured resultant vertical force.

INTEGRATION OF PRESSURE MEASUREMENT SYSTEMS IN MOTION ANALYSIS

Automatic registration and analysis of movement have become an indispensable tool in the biomechanics of sports, rehabilitation, and ergonomics. Simultaneous registration of 3-D kinematics, kinetics including pressure
Figure 4. Picture of the footscan 3-D box. See color insert.
distribution data, electromyography, event markers, and other relevant data are the necessary tools in understanding the complex nature of human movement. Measurement systems are evolving rapidly using upto-date electronics in data sampling, data processing, and imaging techniques. An example is the 3-D box, shown in Fig. 4, which is a versatile data sampling interface that enables simultaneous and synchronized measurement of pressure-distribution data and force-plate measurements, by multiple high-speed infrared cameras (240 Hz), multichannel electromyographic signals, RF applications, and event and trigger ports. Properly used, the system offers a broad range of possibilities. The combination of a plantar pressure measurement device with a calibrated force plate contributes to the accuracy of motion analysis and facilitates data processing of simultaneously measured data (31). VISUALIZATION OF THE PLANTAR PRESSURE MEASUREMENT Visualization of the measured forces or pressures distributed over the entire surface of each foot follows data calibration and processing. The representation of results can be given in any conventional form such as tables, bar graphs, pies, or gray or color scales, but should reflect clearly and concisely the changes in the factors that are studied. The problem that arises is that a very large amount of data has to be represented and interpreted consecutively. The data vary in each measured area of each foot, between both feet, and especially in time when dynamic measurements are made. Therefore, representation of results should clearly illustrate the following parameters simultaneously: the amplitude of the measured force or pressure, the location in two dimensions with respect to the plantar surface of the foot, and the time dependence of the signal. Any system that has a large spatial resolution and/or a high measurement frequency will therefore generate a high amount of data, making interpretation extremely difficult. To provide ultimate options for interpreting foot roll-off, a combination of a cross-sectional view, with reference to a specific moment, and a longitudinal view, with respect to the complete gait cycle, should be possible. The choices to be made will depend largely on
Figure 5. Graphic representation of the front (a) and back (b) connections of the footscan 3-D box. See color insert.
Figure 6. Visualization of the pressures measured under two feet during static loading, with reference to a corresponding legend. See color insert.
the user’s preferences for interpretation. Most plantar pressure measurement systems offer static and dynamic measurements. Because static measurements (made while standing still) show little variation of forces or pressure in time, a two-dimensional representation of mean values during the measurement period will be sufficient. Figure 6 shows the mean pressure distribution in a 2-D view of both feet standing still. To enhance the interpretation, the amplitude of the signal is converted into a color scale that has a corresponding legend. This scale can be expressed in absolute force or pressure values, or as a percentage of body weight. To display the actual values for specific areas under the foot, the user can drag several pointers to the areas of interest. Results can be displayed in numerical values, bar graphs, etc. (Fig. 7). The same image can be converted into a 3-D representation using a wire diagram where the height represents the amplitude of force or pressure. A combination that uses a specified color legend can enhance interpretation (Fig. 8a,b,c). Force or pressure values measured in a dynamic situation, for example, during the gait or running, add another dimension to the representation of results. This
Figure 7. Quantification of the pressures measured in graphs and as numerical data. See color insert.
can be viewed as consecutive cross-sectional views similar to the aforementioned. Generating these consecutive views during a whole gait or running cycle will result in a multitude of images that can be interpreted only by visual inspection. The time dependence of the signal is poorly represented, however. Force or pressure versus
with respect to anatomical landmarks, do not cover the entire surface of the foot, any desired division of the foot into foot zones may be helpful in obtaining an overall view of foot loading. Figure 10(a) shows the division of the total area of the foot into three zones: the fore-foot, the midfoot, and the rear-foot. Based on earlier work on classifying footprints by measurement, different foot types are identified mathematically in the software. Calculation of the arch index (6), shown in Fig. 10c, resulted in suggested criteria for classifying footprints as high-arched, normal, and flat feet. Additionally, these zones are arbitrarily subdivided into three zones for the fore-foot, two for the midfoot, and four for the rear-foot for further analysis. The variation of the total force in each zone is represented with respect to time in Fig. 10b. Representative data for each of the aforementioned foot areas or foot zones can be summarized in a table, as illustrated in Fig. 11. The table represents the measured data for each of the eight foot areas, expressed as a percentage of contact time (% contact), load rate, relative loading on the heel, metatarsal heads, and toes (% comp), maximal pressure (Pmax), and impulse. The uncolored columns show reference values.
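The arch index referred to above divides the toeless footprint into three equal parts along its length and expresses the midfoot contact area as a fraction of the total contact area (6). The sketch below is a minimal illustration, not the method as implemented in any commercial software; the binary footprint, its orientation, and the toe-exclusion step are assumptions made here, and the cut-off values for classifying high-arched, normal, and flat feet should be taken from reference (6).

```python
import numpy as np

def arch_index(contact, toe_rows=0):
    """Arch index of a binary footprint (rows run heel -> toes).

    contact : 2-D boolean array, True where the foot contacts the floor.
    toe_rows : number of distal rows to exclude as the toe region.
    Returns midfoot contact area / total contact area of the toeless footprint.
    """
    foot = contact[: contact.shape[0] - toe_rows] if toe_rows else contact
    rows_in_contact = np.flatnonzero(foot.any(axis=1))
    heel, tip = rows_in_contact[0], rows_in_contact[-1] + 1
    length = tip - heel
    a = heel + length // 3        # end of the rear (heel) third
    b = heel + 2 * length // 3    # end of the middle (midfoot) third
    midfoot_area = foot[a:b].sum()
    total_area = foot[heel:tip].sum()
    return midfoot_area / total_area

# A crude synthetic footprint: heel block, narrow midfoot, wide forefoot
foot = np.zeros((15, 6), dtype=bool)
foot[0:5, 1:5] = True
foot[5:10, 3:5] = True
foot[10:15, 0:5] = True
print(round(arch_index(foot), 3))
```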
PREDICTION OF FOOT MOTION
Figure 8. 3-D visualization of the three foot types (a) flat foot, (b) normal foot, (c) high arched foot based on plantar pressure measurements. See color insert.
time graphs of selected foot areas offer a more precise representation of variations with respect to time. Figure 9a shows the peak pressures and eight userdefined foot areas represented by squares. The pressure–time graph (Fig. 9b) illustrates the variation in pressure in each area with respect to time. Any crosssectional view of the pressure in these areas can be obtained by moving a cursor (the vertical yellow line in b) to the desired time. Because these selected areas, chosen
A gross view of force or pressure distribution during foot roll-off can be obtained by calculating the center of force or pressure. Projecting this calculated center for each time sample on the 2-D representation of the mean or peak pressures on the foot yields a series of points that reflect the gross foot roll-off, as shown in Fig. 12. To obtain a visualization of presumed foot-motion, further mathematical analysis is necessary. Several advanced software features have recently been developed that provide a graphical representation of the results of these calculations. The graphs shown in Fig. 13 visualize specific motion characteristics for the left and the right fore and rear foot. The green line represents rear-foot motion known as pronation or valgus (the inward tilt of the calcaneus or heel bone) and supination or varus (the outward tilt of
Figure 9. (a) Example of pressure area division. (b) Practical example of quantification of the pressures measured under specific anatomical reference points under the foot. See color insert.
Figure 10. (a) Example of force area division. (b) Practical example of the quantification of the forces measured under specific anatomical reference zones under the foot. (c) Transposition of the calculation method for determining foot types presented by Cavanagh and Rodgers (6) in the image of a recently developed plantar pressure measurement device. See color insert.
Figure 11. Statistics table with reference to the plantar pressure measured, in addition to parameters important for further foot function analysis.
the calcaneus or heel bone). The red line represents the same parameters for the fore foot. The higher the values for these two parameters, the higher the degree of medial foot loading that can correlate with pronation (valgus) measured for the specific foot parts. The velocity of the calculated pronation can be derived from the slope of the curve that represents this motion in time. The steeper the slope, the faster the pronation, and the higher the risk of developing injuries. The yellow line represents the balance
line that visualizes foot loading from the medial part of the foot to the lateral during complete roll-off of the foot; the higher the value, the higher the medial load, and the greater the pronation. Finally, the purple line (X comp COF) shows how the center of force is projected on the x axis. Besides these data generated for each foot separately, the load transfer between both feet may be of primary importance to the user. One of the possible solutions in
Figure 12. Practical example of the evolution of the center of pressure projected on the 2-D view of maximal or mean pressure values on the foot. See color insert.
representing load transfer between feet may be obtained by combining multiple steps from insole or large platform data into an image where both feet are positioned arbitrarily next to one another. Now, the center of pressure is calculated on both feet simultaneously. The result will be a figure eight resembling a butterfly, as shown in Fig. 14. Both crossing dotted lines represent the load transfer between left and right foot and vice versa, and the longitudinal dotted lines represent roll-off during a single stance (while only one foot is in contact with the floor). Despite the fact that this figure does not reflect a physical situation, visual inspection of the graph will provide the clinician with an overall view of the timing and roll-off characteristics of each foot, and in particular, of load transfer between feet. PRACTICAL EXAMPLE Improved accuracy and reproducibility of the existing systems greatly contributes to the objectivity of research and clinical analysis. Still, detection and evaluation of possible foot disorders depends greatly on the personal experience of the investigator and on different studies in the field. As described throughout this article, recent software developments support the interpretation of results that provide a large amount of specific information. The following case study is used to emphasize the power of a plantar pressure measurement system in clinical analysis. The analysis of a 26 year-old soccer player who
Figure 13. Graphical presentation of predicted foot motion of the different foot parts. See color insert.
has recurrent Achilles tendinitis is described. The patient's main complaint was bilateral sharp pain in the Achilles tendon during the last two soccer seasons. The method used for clinical analysis consisted of a physical therapy examination including postural inspection, joint mobility tests, muscle testing, and a dynamic plantar pressure measurement. The static inspection showed a pronounced varus position at the knee associated with lateral loading of the foot. Ankle joint mobility proved to be of comparable magnitude on both sides, whereas dorsiflexion of the big toe was limited. Muscle activity of the foot plantar flexor group was rather weak and was painful. Plantar pressure measurement on the right foot during running is illustrated. The evolution of the center of pressure (red line), shown in Fig. 15, reveals a slight medial shift of the pressures on the heel after the initial lateral loading. Interpretation of the forces on two selected zones of the heel shows a clear and relatively high load on the medial heel zone immediately after the initial lateral force peak. In addition, the period of heel loading is relatively long. A significant rear-foot pronation (valgus movement of the calcaneus) can be suspected from these findings. Heel loading is followed by lateral fore-foot loading during early midstance and a wide pressure reading in the midfoot area that indicates a deficient medial arch (Fig. 15). The center of pressure on the right foot shows a significant medial shift during midstance that indicates a pronation movement of the foot, followed by a second lateral orientation of the center of pressure, most probably due to a resupination. Pressure readings under the big toe start well before heel-off and reach a high level, possibly indicating high activity of the flexor of the big toe in an attempt to sustain the medial arch. The limited dorsiflexion of the big toe is further illustrated by the high load under the toe at push-off, which ends relatively late after the pressure maxima on the second and third metatarsal heads.

CONCLUSIONS

A review of the different imaging techniques and factors that influence measurements used to objectify foot pressure distribution highlights certain restrictions in former as well as in present systems. The diversity of the application areas and the factors that influence the relevance of the measurement data emphasize the
Figure 14. A butterfly diagram of five consecutive steps. Dots represent the center of pressure calculated from the pressure data of the two feet placed an arbitrary distance from one another. See color insert.
Figure 15. Visualization of the maximal pressures measured under the right foot and the evolution of the center of pressure (red line) during the complete unroll of the foot. See color insert.
importance of a dedicated hardware setup. Research to improve the accuracy and resolution of the measurement equipment is indispensable to fulfill the high standards aimed for in the different application areas. Using the pressure-distribution measurement system solely as a tool for clinical analysis of foot function or as a part of an integrated 3-D kinematic and kinetic measurement setup for analysis of total body biomechanics creates the same challenge in representing results. Imaging techniques used to represent data of a multidimensional space are of major importance in allowing the researcher and the clinician to obtain concise insight in the movement of the subject under
study. Current systems can provide an enormous amount of measurement data. However, interpretation of the results in a clinical or scientific setting still remains the main goal. The increased demand for dynamic analysis and scientifically oriented research puts a high demand on accuracy of measurement, on the versatility of software processing, and especially on the way results are represented. The review shows how the interpretation of the forces and pressures applied under the foot during stance, walking, or running can enhance the interpretation of normal or pathological foot function. Additionally, the demand for accurate measurement of foot kinematics increases, as well as its influence on lower limb kinematics or even whole body movements. Future developments can be expected with respect to reliable measurement of absolute force or pressure data and of shear forces. The synchronized measurement of pressure-distribution data combined with force-plate measurements and reliable calibration techniques open an interesting perspective in this respect. Integration of pressure-distribution measurements in regular gait laboratory setups (3D kinematics, electromyographical registration) yield differentiated data on foot roll-off and its interaction with whole body movements. In conclusion, the development of pressure distribution measurement systems can be expected to evolve to more reliable and accurate systems that are more versatile in the field of application, thus enabling relevant measurements in the actual situations instead of classic settings restricted to the laboratory environment. Imaging techniques can be expected to play an ever increasing role in representing a multitude of complex and interrelated data concisely and clearly to enhance the insight and overview of clinicians or researchers of the problems they want to study. It may be expected that interpretation of foot pressure measurement results, as well as applications such as the design of orthopedic insoles, and shoes will improve by using appropriate expert systems. Thus, a multitude of areas of application opens for research using pressure-distribution measurements in a laboratory setting or during the activities of daily living, thus, leading to a better understanding of the interaction between
Figure 16. Practical example of the quantification of the pressures measured under specific anatomical reference points under the foot. See color insert.
body and environment, which could lead to significant improvement in the design of products we regularly use and in particular, the design and fine-tuning of orthopedic appliances. BIBLIOGRAPHY 1. Y. Li et al., Folia Primatol 66(1–4), 137–159 (1996). 2. R. H. Crompton et al., J. Hum. Evol. 35(1), 55–74 (1998). 3. R. Bender, M. Verhaegen, and N. Oser, Anthropol. Anz. 55(1), 1–14 (1997). 4. P. Aerts, R. Van Damme, L. Van Elsacker, and V. Duchene, Am. J. Phys. Anthropol. 111(4), 503–517 (2000). 5. M. M. Rodgers, J. Orthop. Sports Phys. Ther. 21, 306–316 (1995). 6. P. R. Cavanagh and M. M. Rodgers, J. Biomechanics 20, 547–551 (1987). 7. I. J. Alexander, E. Y. S. Chao, and K. A. Johnson, Foot & Ankle 11, 152–167 (1990). 8. S. H. Scott and D. A. Winter, J. Biomechanics 26, 1091–1104 (1993). 9. L. Peeraer, Tijdschrift voor geneeskunde 49(17), 1153–1154 (1993). 10. E. M. Hennig and M. A. Lafortune, in P. Allard, A. Cappozo, A. Lundberg, and C. Vaughan, eds., Three-Dimensional Analysis of Human Locomotion, John Wiley & Sons, Chichester, 1997. 11. http://www.morganelectroceramics.com/piezoguide1.html 12. A method and apparatus for determining a flexural property of a load applied to a discretely sampled pressure sensor array, European Pat. application EP00200568.4, 18/02/00 by RSscan INTERNATIONAL. 13. C. Giacomozzi, V. Macellari, A. Leardini, and M. G. Benedetti, Med. Biol. Eng. Comput. 38, 156–163 (2000). 14. L. B. Jeffcott, M. A. Holmes, and H. G. Townsend, Vet. J. 158(2), 113–119 (1999). 15. D. van Dijk, G. Aufdemkampe, and S. Van Langheveld, Spinal Cord 37(2), 123–128 (1999). 16. C. Maltais, J. Dansereau, R. Aissaoui, and M. Lacoste, IEEE Trans. Rehabil. Eng. 7(1), 91–98 (1999). 17. A. A. Polliack et al., Prosthetic Orthotic Int. 24(1), 63–73 (2000). 18. R. Mann, E. K. Yeong, M. L. Moore, and L. H. Engrav, J. Burn Care Rehabil. 18(2), 160–163; discussion 159 (1997). 19. P. R. Cavanagh, J. S. Ulbrecht, and G. M. Caputo, Diabetes Metab. Res. Rev. 16(Suppl 1), S6–S10–(2000). 20. C. Nester, Gait Posture 12(3), 251–256 (2000). 21. B. A. Rothbart and L. Estabrook, J. Manipulative Physiol. 11(5), 373–379 (1988). 22. S. F. Reischl, C. M. Powers, S. Rao, and J. Perry, Foot & Ankle 20(8), 513–520 (1999). 23. M. S. Hefzy, W. T. Jackson, S. R. Saddemi, and Y. F. Hsieh, J. Biomed. Eng. 14(4), 329–343 (1992). 24. C. M. Powers, R. Maffucci, and S. Hampton, J. Orthop. Sports Phys. Ther. 22(4), 155–160 (1995). 25. Electronic scanning device to measure at high speed, physical properties of large multi-dimensional arrays, U.S. Pat. application 09/416,274, 14/10/99 by RSscan INTERNATIONAL. 26. M. Lord, Med. Eng. Phys. 19(2), 140–144 (1997). 27. T. W. Kernozeck, E. E. LaMott, and M. J. Dancisak, Foot Ankle Int. Apr 17(4), 204–209 (1996).
28. J. H. Ahroni, E. J. Boyko, and R. Forsberg, Foot Ankle Int. 19(10), 668–673 (1998). 29. A. L. Randolph et al. 30. S. Barnett, J. L. Cunningham, and S. West, Clin. Biomech. (Bristol, Avon) 15(10), 781–785 (2000). 31. Apparatus and method for measuring the pressure distribution generated by a three-dimensional object, European Pat. application EP 98202334.3, 10/07/98 and U.S. Pat. application 09/185,197, 3/11/98 by RSscan INTERNATIONAL. 32. G. Dereymaeker, Management of Hallux Valgus and Metatarsalgia, Ph.D. Dissertation, Ku Leuven, 1996.
FOUNDATIONS OF MORPHOLOGICAL IMAGE PROCESSING EDWARD R. DOUGHERTY Texas A&M University College Station, TX
INTRODUCTION The basic morphological operations of erosion, dilation, opening, and closing are a part of every image processing toolbox. So too are many basic algorithms such as the watershed and top-hat transforms. This chapter reviews the fundamental algebraic theory of mathematical morphology as applied to Euclidean set theory, the context in which it was originally conceived by Matheron (1) and Serra (2) from mathematical operations dating back to Minkowski (3) and Hadwiger (4,5). Thorough knowledge of basic algebraic theory is required for anyone engaged in serious research in morphological image processing and is very beneficial for anyone interested in advanced applications. Since its original set-theoretic formulation, mathematical morphology has been extended to real-valued functions (6) and to lattice-valued functions (7–10). These extensions take note of the fact that the operations and propositions of the algebraic theory depend on order relations. This can be seen in the direct generalization of many binary definitions and properties to lattices based on the partial order relation among sets. In many cases, the definitions in this article are extended to lattices (including real-valued and integral-valued functions) by changing the subset relation ⊂ to the order relationship ≤. Besides restricting our attention to binary morphology, let us note three areas not addressed in this article: (1) most of the algebraic theory is immediately applicable to discrete set theory, but there are topological issues to be considered in the transition; (2) the relation between mathematical morphology and stochastic geometry plays a key role in the original studies of Matheron, especially in his deliberations on texture; (3) the design of complex morphological operators can be a daunting task, and there is a body of work concerning the statistical design of optimal morphological operators. This article is mainly a reproduction of Chapter 3 of Nonlinear Filters for Image Processing (11), edited
by E. R. Dougherty and J. T. Astola (12). The material on granulometries is taken from Chapter 4 of the same reference (13). For an introductory treatment, see An Introduction to Morphological Image Processing by E. R. Dougherty (14). We refer the reader also to Morphological Image Analysis by P. Soille (15), Image Analysis and Mathematical Morphology by J. Serra (16), and Mathematical Morphology in Image Processing, edited by E. R. Dougherty (17).
TRANSLATION-INVARIANT OPERATORS

Considering a binary (n-dimensional) image as a subset of n-dimensional Euclidean space Rn, image processing operators have the form Ψ(S), where S ⊂ Rn. Translation-invariant operators are especially important. Let P denote the power set (set of all subsets) of Rn. For n = 2, P is the class of continuous binary images. A mapping Ψ : P → P is translation-invariant if

Ψ(Sx) = Ψ(S)x   (1)

for all S ∈ P and x ∈ Rn, where Sx = {x + z : z ∈ S}. Matheron calls Ψ a τ mapping. The kernel of Ψ is defined by

K[Ψ] = {S : 0 ∈ Ψ(S)}.   (2)

Point z ∈ Ψ(S) if and only if S ∈ K[Ψ]z. The dual mapping of Ψ is Ψ* : P → P defined by

Ψ*(S) = Ψ(Sc)c.   (3)

Because Scc = S, Ψ** = Ψ: the dual of the dual is the original mapping. Mapping Ψ is increasing if it preserves ordering within P, namely, S1 ⊂ S2 implies Ψ(S1) ⊂ Ψ(S2). Subset partial ordering in P induces a partial order relation among mappings: Ψ1 ⊂ Ψ2 if and only if Ψ1(S) ⊂ Ψ2(S) for all S ∈ P.

Proposition 1. Let Ψ1 and Ψ2 be translation-invariant mappings. Then K[Ψ1] ⊂ K[Ψ2] if and only if Ψ1 ⊂ Ψ2. In particular, Ψ1 = Ψ2 if and only if K[Ψ1] = K[Ψ2].

Proof. Suppose that K[Ψ1] ⊂ K[Ψ2], S is a set, and z ∈ Ψ1(S). Then 0 ∈ Ψ1(S)−z = Ψ1(S−z) and S−z ∈ K[Ψ1] ⊂ K[Ψ2]. But S−z ∈ K[Ψ2] implies (by similar reasoning) that z ∈ Ψ2(S), and we conclude that Ψ1 ⊂ Ψ2. Conversely, suppose that Ψ1 ⊂ Ψ2. Then S ∈ K[Ψ1] implies 0 ∈ Ψ1(S) ⊂ Ψ2(S), which implies S ∈ K[Ψ2], and we conclude that K[Ψ1] ⊂ K[Ψ2].

Proposition 2. A union of translation-invariant mappings is translation-invariant and a union of increasing mappings is increasing. In translation invariance, the kernel of the union is equal to the union of the kernels.

Proof. Let Ψ = ∪i∈I Ψi. If Ψi is translation-invariant for each i ∈ I, because union commutes with translation by a point,

Ψ(Sz) = ∪i∈I Ψi(Sz) = ∪i∈I Ψi(S)z = [∪i∈I Ψi(S)]z = Ψ(S)z.   (4)

Moreover, if S ⊂ T, then Ψi(S) ⊂ Ψi(T) for all i ∈ I, so that

Ψ(S) = ∪i∈I Ψi(S) ⊂ ∪i∈I Ψi(T) = Ψ(T).   (5)

Finally, 0 ∈ Ψ(S) if and only if 0 ∈ Ψi(S) for some i ∈ I. Hence, S ∈ K[Ψ] if and only if there exists i ∈ I such that S ∈ K[Ψi].

Proposition 3. If Ψ is translation-invariant, then so is its dual Ψ*, and

K[Ψ*] = {S : Sc ∉ K[Ψ]}.   (6)

If Ψ is increasing, so is Ψ*.

Proof. Translation invariance of the dual follows from translation invariance of Ψ and the fact that complementation commutes with translation [(Sz)c = (Sc)z]. Specifically,

Ψ*(Sz) = Ψ[(Sz)c]c = Ψ[(Sc)z]c = [Ψ(Sc)z]c = [Ψ(Sc)c]z = Ψ*(S)z.

The increasingness of Ψ* follows from the increasingness of Ψ because complementation reverses subset ordering [S ⊂ T iff Sc ⊃ Tc]. Specifically, if S ⊂ T, then, by the increasingness of Ψ, Ψ(Tc) ⊂ Ψ(Sc). Taking complements yields Ψ*(S) ⊂ Ψ*(T). As for the kernel,

K[Ψ*] = {S : 0 ∈ Ψ*(S)} = {S : 0 ∉ Ψ(Sc)} = {S : Sc ∉ K[Ψ]}.

Operator Ψ is said to commute with intersection if

Ψ(∩i∈I Si) = ∩i∈I Ψ(Si)   (7)

for any collection of sets {Si} across an arbitrary index set I. It commutes with union if

Ψ(∪i∈I Si) = ∪i∈I Ψ(Si).   (8)

Proposition 4. Ψ commutes with intersection if and only if Ψ* commutes with union.

Proof. We demonstrate the equivalence in one direction. The converse is demonstrated similarly. If Ψ commutes over intersection, then, by De Morgan's law,

Ψ*(∪i∈I Si) = Ψ(∩i∈I Sic)c = [∩i∈I Ψ(Sic)]c = ∪i∈I Ψ*(Si).   (9)
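The duality and ordering notions above can be checked numerically on small discrete images. The following sketch is not part of the original text; the 3 × 3 majority operator, the toy image, and the zero-padded border handling are arbitrary choices made here for illustration. It builds the dual Ψ* as complement, then Ψ, then complement, and confirms that the dual of the dual returns the original operator.

```python
import numpy as np

def translate(S, dz):
    """Translate a binary image S by the vector dz = (row, col), zero-filled."""
    out = np.zeros_like(S)
    r, c = dz
    src = S[max(0, -r): S.shape[0] - max(0, r), max(0, -c): S.shape[1] - max(0, c)]
    out[max(0, r): max(0, r) + src.shape[0], max(0, c): max(0, c) + src.shape[1]] = src
    return out

def majority(S):
    """A simple increasing, translation-invariant operator: 3 x 3 majority vote."""
    acc = sum(translate(S, (r, c)).astype(int) for r in (-1, 0, 1) for c in (-1, 0, 1))
    return acc >= 5

def dual(psi):
    """Dual operator: complement, apply psi, complement again."""
    return lambda S: ~psi(~S)

S = np.zeros((8, 8), dtype=bool)
S[2:6, 2:7] = True
S[3, 4] = False                      # a small hole

psi_star = dual(majority)
print(np.array_equal(dual(psi_star)(S), majority(S)))  # dual of the dual: True
```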
A mapping Ψ : P → P is antiextensive if Ψ(S) ⊂ S for any S ∈ P. Ψ is extensive if Ψ(S) ⊃ S for any S ∈ P. If Ψ is antiextensive, then Ψ(Sc) ⊂ Sc, so that Ψ*(S) = Ψ(Sc)c ⊃ Scc = S. Because the argument can be reversed, Ψ is antiextensive if and only if Ψ* is extensive. For translation-invariant operators, antiextensivity and extensivity can be characterized via the kernel.

Proposition 5. Let P0 be the subset of all sets in P that contain the origin, and let Ψ be translation-invariant. Then (i) Ψ is antiextensive if and only if P0 ⊃ K[Ψ], and (ii) Ψ is extensive if and only if P0 ⊂ K[Ψ].

Proof. We show (i). Suppose that Ψ is antiextensive and S ∈ K[Ψ]. Because Ψ(S) ⊂ S and 0 ∈ Ψ(S), 0 ∈ S, so that S ∈ P0. Conversely, suppose that P0 ⊃ K[Ψ] but Ψ is not antiextensive. Then there exists a set S such that Ψ(S) is not a subset of S, which means that there exists a point z ∈ Ψ(S) − S. Because Ψ is translation-invariant, Ψ(S)−z − S−z = Ψ(S−z) − S−z. Hence, S−z ∈ K[Ψ], but S−z ∉ P0, which contradicts the supposition P0 ⊃ K[Ψ].

The basic building blocks of increasing, translation-invariant mappings are erosions. The erosion operator EB determined by the structuring element B is defined by

EB(S) = {x : Bx ⊂ S}.   (10)

Figure 1 shows (a) a rectangle S and a disk structuring element B centered at the origin, (b) some translates of the disk that are subsets of the rectangle, and (c) the erosion of the rectangle by the disk. Figure 2 shows a square structuring element B whose lower left corner is at the origin and the erosion of the rectangle of Fig. 1 by the square. The next proposition summarizes the most basic properties of erosion.

Figure 1

Figure 2

Proposition 6. Erosion satisfies the following properties:
(i) EB(Sz) = EB(S)z [translation-invariant];
(ii) S1 ⊂ S2 implies EB(S1) ⊂ EB(S2) [increasing];
(iii) EBz(S) = EB(S)−z;
(iv) EB(S1 ∩ S2) = EB(S1) ∩ EB(S2);
(v) EB1∪B2(S) = EB1(S) ∩ EB2(S).

Proof. Part (i) is established by the relation

{x : Bx ⊂ Sz} = {x : Bx−z ⊂ S} = {y + z : By ⊂ S} = {y : By ⊂ S}z.   (11)

Because S1 ⊂ S2, part (ii) follows because Bx ⊂ S1 implies Bx ⊂ S2. Part (iii) is established by

{x : (Bz)x ⊂ S} = {x : Bz+x ⊂ S} = {y − z : By ⊂ S} = {y : By ⊂ S}−z.   (12)

As for part (iv), Bx ⊂ S1 ∩ S2 if and only if Bx ⊂ S1 and Bx ⊂ S2, and part (v) holds because (B1 ∪ B2)x ⊂ S if and only if (B1)x ⊂ S and (B2)x ⊂ S.

Property (iv) holds for an intersection of any collection of sets, countable or uncountable, so that erosion commutes with intersection: for any index set I,

EB(∩i∈I Si) = ∩i∈I EB(Si).   (13)

In the algebraic formulation of mathematical morphology on complete lattices (7), any operator that commutes with infimum (intersection) and for which the maximal element is invariant [for P, meaning that Ψ(Rn) = Rn] is called an erosion. The erosion operation defined in Eq. (10) via the structuring element B is often called a structural erosion (8). Property (v) also holds for any union:

E∪i∈I Bi(S) = ∩i∈I EBi(S).   (14)

Because, for any single point x, E{x}(S) = S−x, erosion can be expressed via intersection,

EB(S) = E∪x∈B {x}(S) = ∩x∈B E{x}(S) = ∩x∈B S−x = ∩x∈−B Sx.   (15)

The intersection on the right-hand side defines the classical Minkowski subtraction, so we see that EB(S) is equal to the Minkowski subtraction of S by −B.
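Equation (10) can be tested directly on a discrete grid by checking, pixel by pixel, whether the translate of the structuring element fits inside the set; the intersection-of-translates form in Eq. (15) gives the same result. The following is a minimal sketch, assuming a bounded grid with background outside the frame; the rectangle and cross used here are arbitrary test data.

```python
import numpy as np

def erosion(S, B_offsets):
    """E_B(S) = {x : B_x subset of S}, computed by testing the fit at every pixel.

    S : 2-D boolean array; B_offsets : list of (row, col) offsets of the
    structuring element relative to its origin.
    """
    H, W = S.shape
    out = np.zeros_like(S)
    for r in range(H):
        for c in range(W):
            out[r, c] = all(
                0 <= r + dr < H and 0 <= c + dc < W and S[r + dr, c + dc]
                for dr, dc in B_offsets
            )
    return out

S = np.zeros((7, 9), dtype=bool)
S[1:6, 1:8] = True                               # a 5 x 7 rectangle
B = [(-1, 0), (0, -1), (0, 0), (0, 1), (1, 0)]   # a 3 x 3 cross centered at the origin
print(erosion(S, B).astype(int))                 # the rectangle shrunk by one pixel on each side
```

The printed result is the block of interior pixels, illustrating the same shrinking behavior shown qualitatively in Figs. 1 and 2.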
The dual mapping for erosion by B is dilation by −B = {−x : x ∈ B}. Dilation of S by (structuring element) B is defined according to duality by

ΔB(S) = E−B(Sc)c.   (16)

Because the dual of the dual is the original operator,

EB(S) = Δ−B(Sc)c.   (17)

It follows from Propositions 3 and 4 that ΔB is translation-invariant and increasing. Because erosion commutes with intersection, its dual dilation commutes with union:

ΔB(∪i∈I Si) = ∪i∈I ΔB(Si).   (18)

In the algebraic formulation of mathematical morphology on complete lattices (7), any operator that commutes with supremum (union) and for which the minimal element is invariant [which for P means that Ψ(∅) = ∅] is called a dilation. Thus, the dilation defined in Eq. (16) is often called a structural dilation (8). Whereas erosion can be directly expressed as an intersection of translates of the set, dilation can be expressed as a union of translates:

ΔB(S) = E−B(Sc)c = [∩x∈−B Sc−x]c = ∪x∈B Sx.   (19)

Figure 3 shows (a) some translates of the disk of Fig. 1 to points in the rectangle of Fig. 1 and (b) the dilation of the rectangle by the disk. Because Sx = {y + x : y ∈ S}, the union of Eq. (19) can be expanded to yield

ΔB(S) = ∪y∈S, x∈B {y + x}.   (20)

The union on the right-hand side of Eq. (20) is symmetrical in S and B. This union is called the Minkowski addition of S and B and is denoted by S ⊕ B. Thus, ΔB(S) = S ⊕ B. The symmetry of the union implies (from the corresponding properties of ordinary addition) that Minkowski addition is both commutative and associative:

S ⊕ T = T ⊕ S,   (21)
(S ⊕ T) ⊕ U = S ⊕ (T ⊕ U).   (22)

Figure 3

According to Eq. (10), erosion is characterized by translates of the structuring element fitting within the set. Dilation, the dual of erosion, can be characterized by hitting.

Proposition 7. Dilation possesses the formulation

ΔB(S) = {x : (−B)x ∩ S ≠ ∅}.   (23)

Proof. The proposition follows from Eq. (19) and commutativity:

{x : (−B)x ∩ S ≠ ∅} = {x : ∃ z ∈ S, z ∈ (−B)x} = {x : ∃ z ∈ S, z − x ∈ −B} = {x : ∃ z ∈ S, x ∈ Bz} = ∪z∈S Bz = B ⊕ S = ΔB(S).   (24)

Because 0 ∈ EB(S) if and only if B = B0 ⊂ S, erosion by set B has the kernel

K[EB] = {S : S ⊃ B}.   (25)

Because dilation by B is the dual of erosion by −B,

K[ΔB] = {S : Sc ∉ K[E−B]} = {S : −B is not a subset of Sc} = {S : S ∩ (−B) ≠ ∅}.   (26)
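Equations (19) and (23) describe the same operation from two viewpoints, as a union of translates and as a hitting test. The following sketch is illustrative only; the grid size, the offsets, and the bounded-frame convention are assumptions made here. It computes both forms on a toy image and verifies that they coincide.

```python
import numpy as np

def dilation_by_translates(S, B_offsets):
    """Delta_B(S) as the union of translates S_x over x in B (Eq. 19)."""
    H, W = S.shape
    out = np.zeros_like(S)
    rows, cols = np.nonzero(S)
    for dr, dc in B_offsets:
        r, c = rows + dr, cols + dc
        keep = (r >= 0) & (r < H) & (c >= 0) & (c < W)
        out[r[keep], c[keep]] = True
    return out

def dilation_by_hitting(S, B_offsets):
    """Delta_B(S) = {x : (-B)_x hits S} (Eq. 23)."""
    H, W = S.shape
    out = np.zeros_like(S)
    for r in range(H):
        for c in range(W):
            out[r, c] = any(
                0 <= r - dr < H and 0 <= c - dc < W and S[r - dr, c - dc]
                for dr, dc in B_offsets
            )
    return out

S = np.zeros((7, 9), dtype=bool)
S[3, 4] = True                               # a single foreground point
B = [(0, 0), (0, 1), (1, 0)]                 # an asymmetric structuring element
print(np.array_equal(dilation_by_translates(S, B),
                     dilation_by_hitting(S, B)))  # True
```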
REPRESENTATION OF INCREASING TRANSLATION-INVARIANT OPERATORS

Many applications of mathematical morphology depend on the representation of operators in terms of elementary operations. The most basic theorem of this kind is due to Matheron and concerns the representation of increasing mappings in terms of erosions.

Theorem 1 (1). If Ψ is increasing and translation-invariant, then it has the representation

Ψ(S) = ∪B∈K[Ψ] EB(S).   (27)

Proof. Because Ψ is increasing, if B ∈ K[Ψ] and A ⊃ B, then 0 ∈ Ψ(B) ⊂ Ψ(A), and A ∈ K[Ψ]. Thus, {A : A ⊃ B} ⊂ K[Ψ]. More generally,

∪B∈K[Ψ] {A : A ⊃ B} ⊂ K[Ψ].   (28)

Because the reverse inclusion is trivial,

K[Ψ] = ∪B∈K[Ψ] {A : A ⊃ B}.   (29)

Hence, by Proposition 2,

K[Ψ] = ∪B∈K[Ψ] K[EB] = K[∪B∈K[Ψ] EB],   (30)

and the representation follows by Proposition 1.

Corollary 1. If Ψ is increasing and translation-invariant, then,

Ψ(S) = ∩B∈K[Ψ*] Δ−B(S).   (31)

Proof. Because the dual of Ψ is also translation-invariant and increasing, by the theorem,

Ψ*(Sc) = ∪B∈K[Ψ*] EB(Sc).   (32)

Taking complements, applying duality, and recognizing that Ψ*(Sc)c = Ψ(S) yields

Ψ(S) = ∩B∈K[Ψ*] EB(Sc)c.   (33)

The result follows because dilation by −B is the dual of erosion by B.

The erosion representation of an increasing, translation-invariant operator across the kernel is highly redundant. Redundancy can be eliminated by deleting from the expansion any erosion whose structuring element contains a structuring element of another erosion. This reduction is a bit problematic owing to the existence of infinite totally ordered families of sets. A set family {Si}i∈I (I countable or uncountable) is totally ordered if for any i, i′ ∈ I, either Si ⊂ Si′ or Si′ ⊂ Si. The hurdle is transfinite induction. In particular, no finite search through the family need reach a lower bound for the family; a lower bound is a set S such that S ⊂ Si for all i. To proceed, we define the concept of a basis. A subset B[Ψ] of K[Ψ] is called an (increasing) basis if (1) no set in B[Ψ] is a proper subset of a distinct set in B[Ψ] and (2) for any T ∈ K[Ψ], there exists B ∈ B[Ψ] such that B ⊂ T. The notation ''B[Ψ]'' is permitted because there can be one basis at most. To see uniqueness, suppose that ℬ1 and ℬ2 are subsets of K[Ψ] that satisfy the two basis conditions and B ∈ ℬ1. Because B ∈ K[Ψ], there exists B′ ∈ ℬ2 such that B′ ⊂ B and, because B′ ∈ K[Ψ], there exists B″ ∈ ℬ1 such that B″ ⊂ B′ ⊂ B. Because B″ ⊂ B and B″, B ∈ ℬ1, it must be that B″ = B. Hence, B = B′ ∈ ℬ2, and ℬ1 ⊂ ℬ2. By symmetry, ℬ2 ⊂ ℬ1, and ℬ1 = ℬ2.

Theorem 2 (18). If Ψ is increasing, translation-invariant, and possesses a basis, then,

Ψ(S) = ∪B∈B[Ψ] EB(S).   (34)

Moreover, if Ψ commutes with intersection, then it possesses a basis.

Proof. First we show the union representation. Because B[Ψ] ⊂ K[Ψ], the union across B[Ψ] is a subset of the union across K[Ψ]. On the other hand, if B ∈ K[Ψ], there exists B′ ∈ B[Ψ] such that B′ ⊂ B. The latter inclusion implies that EB(S) ⊂ EB′(S). Thus, the union across K[Ψ] is a subset of the union across B[Ψ], and Eq. (34) is verified. Next, suppose that Ψ commutes with intersection. Let ℬ denote the collection of all sets in K[Ψ] that do not have a proper subset in K[Ψ]. We show that ℬ is a basis. The first condition is trivially fulfilled. As for the second condition, let S ∈ K[Ψ], and let 𝒮 denote the class of all subsets of S in K[Ψ]. 𝒮 is partially ordered by containment. Suppose that {Ui}i∈I is a totally ordered subset of 𝒮. Let U0 = ∩i∈I Ui. Because Ui ∈ K[Ψ] for all i, 0 ∈ Ψ(Ui) for all i. Because Ψ commutes with intersection, 0 ∈ ∩i∈I Ψ(Ui) = Ψ(U0). Hence, U0 ∈ K[Ψ], and U0 ∈ 𝒮. Moreover, U0 is a lower bound for {Ui}. Consequently, every totally ordered subclass of 𝒮 possesses a lower bound in 𝒮. By Zorn's lemma, 𝒮 possesses a minimal element, say, S0. We claim that S0 ∈ ℬ. If not, then S0 has a proper subset S1 ∈ 𝒮, but this would mean that S0 is not minimal. Hence, S0 ⊂ S and S0 ∈ ℬ, verifying that ℬ is a basis for Ψ.
Proposition 8 (19). If Ψ is increasing, translation-invariant, and every set in K[Ψ] possesses a finite subset in K[Ψ], then Ψ has a basis, and every basis set is finite.

Proof. Again let ℬ denote the collection of all sets in K[Ψ] that do not have a proper subset in K[Ψ]. To show the second condition for a basis, let S ∈ K[Ψ]. There exists a finite set S1 ∈ K[Ψ] such that S1 ⊂ S. If S1 ∈ ℬ, we are done; if not, there exists a proper subset S2 of S1 such that S2 ∈ K[Ψ]. Continue the process. Because S1 is finite, the process must come to an end with some Sm ∈ ℬ.

REPRESENTATION OF NONINCREASING TRANSLATION-INVARIANT OPERATORS

Representation of a nonincreasing, translation-invariant mapping depends on the hit-or-miss transform, which for structuring elements (sets) E and F, is defined by

ΛE,F(S) = EE(S) ∩ EF(Sc).   (35)

For representation, we require the notion of the closed interval [A, B] determined by two sets A and B, which is simply the family of all sets T ∈ P such that A ⊂ T ⊂ B. If C is a collection of sets, then C is equal to the union of the closed intervals contained within it:

C = ∪[A,B]⊂C [A, B].   (36)
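The hit-or-miss transform of Eq. (35) can be implemented directly from its definition: the first structuring element must fit inside the set and the second inside the background. The sketch below is illustrative only; the isolated-pixel detector built from the origin and its four neighbors is just one common choice of structuring pair, and the treatment of pixels outside the frame as background is an assumption made here.

```python
import numpy as np

def hit_or_miss(S, E_offsets, F_offsets):
    """Lambda_{E,F}(S) = E_E(S) intersected with E_F(S complement):
    E must fit inside the set, F must fit inside the background."""
    H, W = S.shape
    out = np.zeros_like(S)
    for r in range(H):
        for c in range(W):
            hit = all(0 <= r + dr < H and 0 <= c + dc < W and S[r + dr, c + dc]
                      for dr, dc in E_offsets)
            miss = all(not (0 <= r + dr < H and 0 <= c + dc < W) or not S[r + dr, c + dc]
                       for dr, dc in F_offsets)
            out[r, c] = hit and miss
    return out

S = np.zeros((6, 8), dtype=bool)
S[1:4, 1:4] = True        # a small block
S[4, 6] = True            # an isolated pixel

E = [(0, 0)]                                    # the pixel itself
F = [(-1, 0), (1, 0), (0, -1), (0, 1)]          # its four neighbors
print(np.argwhere(hit_or_miss(S, E, F)))        # -> [[4 6]]: only the isolated pixel
```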
Theorem 3 (20). If Ψ is translation-invariant, then it possesses the representation

Ψ(S) = ∪[E,Fc]⊂K[Ψ] ΛE,F(S).   (37)

Proof. Because the intervals within the kernel cover the kernel,

K[Ψ] = ∪[E,Fc]⊂K[Ψ] [E, Fc].   (38)
As for minimal representation, the situation is somewhat problematic. A maximal closed interval in K[Ψ] is a closed interval in K[Ψ] not properly contained in any other closed interval in K[Ψ]. In the present setting, the basis, B[Ψ], is defined as the collection of all maximal closed intervals contained in K[Ψ]. According to Theorem 4, if B[Ψ] satisfies the representation condition, then the expansion can be taken across B[Ψ]:

Ψ(S) = ∪[E,Fc]∈B[Ψ] ΛE,F(S).   (43)
It is immediately apparent from the definitions of the kernel and the hit-or-miss transform that the kernel of the hit-or-miss transform that has structuring pair (E, F) is [E, Fc]. Consequently, the theorem follows, as did Theorem 1: the kernel of the mapping defined by the union is equal to the union of the kernels, this latter union is K[Ψ], and two mappings that possess the same kernel are identical. Dual representation is achieved via the dual of the hit-or-miss transform. Application of De Morgan's laws in conjunction with the duality relation between erosion and dilation yields the dual, Λ*E,F, of the hit-or-miss transform ΛE,F:

Λ*E,F(S) = Δ−E(S) ∪ Δ−F(Sc).   (39)

Corollary 2. If Ψ is translation-invariant, then it possesses the representation
Ψ(S) = ∩[E,Fc]⊂K[Ψ*] Λ*E,F(S).   (40)
Proof. Apply the theorem and De Morgan's law to Ψ expressed as Ψ(S) = Ψ*(Sc)c.

As in finite-logic representation (disjunctive normal form), reduction is possible. A set ℬ of closed intervals in K[Ψ] is said to satisfy the representation condition for Ψ if and only if for any closed interval [A, B] ⊂ K[Ψ] there exists a closed interval [A′, B′] ∈ ℬ such that [A, B] ⊂ [A′, B′].

Theorem 4 (20). If Ψ is translation-invariant and ℬ is a set of closed intervals contained in K[Ψ] that satisfies the representation condition for Ψ, then,
E,F (S).
(41)
[E,Fc ]∈B
Proof. Let be the mapping defined by the union. We need to show that = . Because each interval in B is a subset of K [], ⊂ by Theorem 3. In the other direction, for each [E, F c ] ⊂ K [], there exists [A, Bc ] ∈ B such that [E, F c ] ⊂ [A, Bc ]. Hence, (S) =
[E,Fc ]⊂K []
E,F (S) ⊂
A,B (S) ⊂
(S).
[E,Fc ]⊂K []
(42)
Moreover, if B is a collection of closed intervals that satisfies the representation condition for , then B [] ⊂ B (were there a closed interval in B [] − B , there would be a distinct closed interval in B containing it, thereby contradicting the B [] maximality condition). Thus, in the sense that we are considering expansions across collections of closed intervals that satisfy the representation condition, expansion across B [] is minimal. In fact, there may still be redundancy because it is possible for a union of closed intervals in B [] to contain a distinct closed interval in B [] that can be deleted from the union expansion. Besides the general problem of minimization, there is another key difference between the representations of increasing and nonincreasing mappings. For increasing mappings, the basis need not exist; for nonincreasing mappings, the basis exists by definition. But for nonincreasing mappings, existence of the basis by definition does not ensure that it satisfies the representation condition; whereas, for increasing mappings, existence of the basis ipso facto provides representation. Hence, the problem corresponding to the existence of the basis for an increasing mapping is satisfaction of the representation condition for nonincreasing mappings. Although a sufficient condition for existence in the increasing case can be framed in terms of commutation with intersection, a corresponding condition for satisfying the representation condition in the nonincreasing case involves formulation in terms of the hit-or-miss topology, and we shall leave that to the literature. The theory of morphological operator representation is very general and extends to gray-scale operators (21,22), and to operators between complete lattices (23–25). The minimal representation of translation-invariant operators (both increasing and nonincreasing) has theoretical value because it facilitates the study of complex operators via simple primitive operators; however, it also has great practical value because it provides a means of operator design. To design a statistically optimal increasing operator is equivalent to finding an optimal basis. For digital image processing, structuring elements are subsets of the discrete Cartesian grid and, so long as we restrict ourselves to finite windows, they can be represented by matrices of zeros and ones, zero means that a pixel is not in the structuring element, and one means that it is in the structuring element. Once the design problem is placed in a statistical setting, finding an optimal basis becomes a problem of computational learning from sample images. For nonincreasing operators, the problem becomes one of learning the pairs of structuring elements that determine
436
FOUNDATIONS OF MORPHOLOGICAL IMAGE PROCESSING
the operator representation. There is a large literature on this subject to which we defer (26–39).

OPENINGS AND CLOSINGS

We have considered representation relative to translation invariance and increasingness. Now, we consider antiextensivity and idempotence. An operator Ψ is idempotent if ΨΨ = Ψ. The opening of S by B is denoted by γB(S) [or S∘B] and defined by the composition

γB = ΔB EB.    (44)

The next proposition gives a key property of opening for image processing: the opening is equal to the union of all translations of the structuring element that are subsets of the input image. It follows that opening is independent of the position of the structuring element: γB(S) = γBy(S) for any point y. Figure 4 shows the opening of the rectangle of Fig. 1 by the disk of Fig. 1. Note the rounded corners.

Proposition 9. Opening is given by

γB(S) = ∪{By : By ⊂ S}.    (45)

Proof. z ∈ γB(S) if and only if there exists b ∈ B such that z ∈ [EB(S)]b, if and only if there exist b ∈ B and w ∈ EB(S) such that w = −b + z, if and only if there exists w such that w ∈ (−B)z ∩ EB(S), if and only if there exists w such that z ∈ Bw and Bw ⊂ S, if and only if z lies in the union of Eq. (45).

Proposition 10. Opening satisfies the following properties:

(i) γB(Sz) = γB(S)z [translation-invariant].
(ii) S ⊂ T implies γB(S) ⊂ γB(T) [increasing].
(iii) γB(S) ⊂ S [antiextensive].
(iv) γBγB = γB [idempotent].

Proof. (i) As an iteration of translation-invariant mappings, opening is translation-invariant. (ii) By ⊂ S implies By ⊂ T, so that increasingness follows from Proposition 9. (iii) Antiextensivity follows from Proposition 9. (iv) γB[γB(S)] ⊂ γB(S) by antiextensivity. For the reverse inclusion, z ∈ γB(S) implies that there exists y such that z ∈ By ⊂ S. By ⊂ S implies By ⊂ γB(S), so that z ∈ By ⊂ γB(S). By Proposition 9, z ∈ γB[γB(S)].

Consider the signal-union-noise model, in which there is an underlying uncorrupted image S, a noise image N, and a corrupted image S ∪ N. As a filter, opening is translation-invariant, and because it is increasing and antiextensive,

γB(S) ⊂ γB(S ∪ N) ⊂ S ∪ N.    (46)
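The behavior summarized in Eqs. (44)–(46) and Proposition 10 can be checked numerically. The following is a minimal sketch using NumPy and SciPy's ndimage routines (a choice of tooling not prescribed by the article); the test image, disk radius, and noise level are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage as ndi

# Test image: a filled rectangle plus sparse random "clutter" pixels.
S = np.zeros((80, 100), dtype=bool)
S[25:55, 20:80] = True
rng = np.random.default_rng(0)
S_noisy = S | (rng.random(S.shape) < 0.005)

# Disk structuring element of radius 4.
y, x = np.ogrid[-4:5, -4:5]
B = x**2 + y**2 <= 16

# Eq. (44): opening as erosion followed by dilation.
opened = ndi.binary_dilation(ndi.binary_erosion(S_noisy, B), B)
assert np.array_equal(opened, ndi.binary_opening(S_noisy, B))

# Proposition 10: the opening is antiextensive and idempotent.
assert not (opened & ~S_noisy).any()                          # opening(S) is a subset of S
assert np.array_equal(ndi.binary_opening(opened, B), opened)  # idempotent
```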
Opening can be used to filter small background clutter from an image. Small clutter is removed when no translate of the structuring element fits into the clutter components. Figure 5 shows the rectangle of Fig. 1 together with some background clutter. Opening by the disk of Fig. 1 yields the opening of Fig. 4. A set S is open with respect to a set B if γB(S) = S; we say that S is B-open.

Proposition 11. If A is B-open, then γA(S) ⊂ γB(S).

Proof. z ∈ γA(S) implies that there exists y such that z ∈ Ay ⊂ S. Let z = w + y ∈ S, where w ∈ A. Because A is B-open, for each a ∈ A there exists a point b(a) such that a ∈ Bb(a) ⊂ A. Hence, z ∈ (Bb(w))y ⊂ Ay ⊂ S, so that z ∈ γB(S).

Proposition 12. S is B-open if and only if there exists a set A such that S = ΔB(A).

Proof. If S is B-open, then S = ΔB[EB(S)], so the implication holds where A = EB(S). Conversely, suppose that S = ΔB(A) and z ∈ S. Then there exists a ∈ A such that z ∈ Ba. By Eq. (19), Ba ⊂ S. Hence, z ∈ γB(S), and S ⊂ γB(S). By antiextensivity, S = γB(S).

A special case of Proposition 12 occurs when A and B are disks of radii r and s, respectively. Then ΔB(A) is a disk of radius r + s. Hence, a disk is open with respect to any disk that possesses a smaller radius.

Proposition 13. If A is B-open, then

γA γB = γB γA = γA.    (47)

Proof. By antiextensivity and increasingness, γA[γB(S)] ⊂ γA(S). Because A is B-open, γA(S) ⊂ γB(S). Idempotence implies that

γA(S) = γA[γA(S)] ⊂ γA[γB(S)].    (48)

Hence, γA[γB(S)] = γA(S). To prove the other identity, suppose that x ∈ γA(S). Then there exists y such that x ∈ Ay ⊂ S. By B-openness, x ∈ γB(A)y ⊂ S. Thus, there exists a point w such that x ∈ (Bw)y ⊂ Ay ⊂ S. By Proposition 9, x ∈ (Bw)y ⊂ γA(S), so that x ∈ γB[γA(S)]. Thus, γA(S) ⊂ γB[γA(S)]. The reverse inclusion follows by antiextensivity.

As an increasing, translation-invariant mapping, opening must possess an erosion representation. According to Proposition 9, 0 ∈ γB(S) if and only if there exists y such that 0 ∈ By ⊂ S, which means that y ∈ −B and By ⊂ S. The matter can be expressed as: there exists z ∈ B such that B−z ⊂ S. Hence,

K[γB] = ∪z∈B {S : B−z ⊂ S}.    (49)
It follows at once that {B−z}z∈B satisfies the second condition of a basis, but it is not necessarily true that the first condition regarding no proper inclusions among basis sets is satisfied. However, if B is finite, then each translate possesses equal finite cardinality, and there can be no proper subset relation among them, so that for finite structuring elements, B[γB] consists of all translates of B that contain the origin.

The dual of opening is called closing, denoted by φB(S) [or S • B] and defined by

φB(S) = γB(Sc)c.    (50)

Straightforward algebra yields

φB = E−B Δ−B.    (51)
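A small numerical check of Eqs. (50) and (51) under the same assumptions (NumPy/SciPy, an illustrative image and disk); the duality comparison is restricted to the array interior because a finite array cannot represent the complement's infinite extent.

```python
import numpy as np
from scipy import ndimage as ndi

# A rectangle with a few single-pixel holes punched in its interior.
S = np.zeros((80, 100), dtype=bool)
S[25:55, 20:80] = True
S[30, 40] = S[42, 66] = S[50, 33] = False

y, x = np.ogrid[-3:4, -3:4]
B = x**2 + y**2 <= 9          # symmetric disk, so -B = B

# Eq. (51): closing as dilation followed by erosion.
closed = ndi.binary_erosion(ndi.binary_dilation(S, B), B)
assert np.array_equal(closed, ndi.binary_closing(S, B))

# Eq. (50): duality with opening, compared only on the array interior.
dual = ~ndi.binary_opening(~S, B)
assert np.array_equal(closed[10:-10, 10:-10], dual[10:-10, 10:-10])

# Closing is extensive and fills the small holes.
assert closed[30, 40] and not (S & ~closed).any()
```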
By duality and Proposition 10, closing is translation-invariant, increasing, extensive, and idempotent. S is said to be B-closed if φB(S) = S. By duality, S is B-open if and only if Sc is B-closed. By duality in conjunction with Proposition 12, S is B-closed if and only if there exists a set A such that S = E−B(A).

Opening can be used to filter pepper noise; closing can be used to filter salt noise. When there is both union and subtractive noise, one strategy is to open to eliminate union noise in the background and then close to fill subtractive noise in the foreground. The resulting filter is called an open-close. A potential pitfall of the open-close strategy occurs when large noise components need to be eliminated, but a direct attempt to do so would destroy too much of the original image. One way around this problem is to employ an alternating sequential filter (ASF) (6,40). Open-close filters are performed iteratively, beginning with a very small structuring element and then proceeding with ever-increasing structuring elements. For B1 ⊂ B2 ⊂ ··· ⊂ Bm, an ASF takes the form

Ψ = φBm γBm φBm−1 γBm−1 ··· φB1 γB1.    (52)

The strategy is to eliminate small salt and pepper components at the early stages, thereby making it more likely that the larger structuring elements will fit when they are eventually applied in the process.
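A minimal sketch of the alternating sequential filter of Eq. (52), again assuming NumPy/SciPy; the disk radii, test image, and noise probabilities are illustrative values, not ones prescribed by the text.

```python
import numpy as np
from scipy import ndimage as ndi

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

def asf(S, radii):
    """Alternating sequential filter of Eq. (52): iterate open-close with
    structuring elements of increasing size."""
    out = S.copy()
    for r in sorted(radii):
        B = disk(r)
        out = ndi.binary_opening(out, B)   # remove small union (background) noise
        out = ndi.binary_closing(out, B)   # fill small subtractive (foreground) noise
    return out

# Rectangle corrupted by both union and subtractive noise.
rng = np.random.default_rng(1)
S = np.zeros((120, 160), dtype=bool)
S[30:90, 40:120] = True
noisy = (S | (rng.random(S.shape) < 0.02)) & ~(rng.random(S.shape) < 0.02)

restored = asf(noisy, radii=[1, 2, 3])
```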
REPRESENTATION OF OPENINGS AND CLOSINGS

Opening can be generalized in accordance with the basic properties of Proposition 10. A mapping Ψ: P → P is called an algebraic opening if it is increasing, antiextensive, and idempotent. Further, Ψ is called a τ opening if it is translation-invariant. Ψ is called an algebraic closing if it is increasing, extensive, and idempotent. It is called a τ closing if it is also translation-invariant. Ψ is a τ opening if and only if Ψ* is a τ closing. The invariant class of any mapping Ψ, denoted by Inv[Ψ], is the collection of all sets S for which Ψ(S) = S. Owing to idempotence, Ψ(S) ∈ Inv[Ψ] for any algebraic opening Ψ and any set S. If Ψ is a τ opening, then S ∈ Inv[Ψ] if and only if Sy ∈ Inv[Ψ] for all y, which means that Inv[Ψ] is closed under translation. In fact, an algebraic opening is a τ opening if and only if its invariant class is closed under translation. A class B is called a base for Ψ, or a base for Inv[Ψ], if Inv[Ψ] is the class generated by B under translations and unions. This means that S ∈ Inv[Ψ] if and only if there exists a subfamily {Bi}i∈I of B and points yi such that

S = ∪i∈I (Bi)yi.    (53)

Bases for τ openings are not unique. Indeed, Inv[Ψ] is a base for itself. To see this, we need only show that Inv[Ψ] is closed under unions because we already know it is closed under translation. If Si ∈ Inv[Ψ] for all i ∈ I, then

Ψ(∪i∈I Si) ⊃ Ψ(Si) = Si    (54)

for all i, and therefore,

Ψ(∪i∈I Si) ⊃ ∪i∈I Si.    (55)

The reverse inclusion follows from antiextensivity, so that ∪i∈I Si ∈ Inv[Ψ]. The intent is to find small bases that determine filter behavior. A theorem of Matheron shows that openings by structuring elements are primary for the representation of τ openings.

Theorem 5 (1). A mapping Ψ: P → P is a τ opening if and only if there exists a class of sets B such that

Ψ(S) = ∪B∈B γB(S).    (56)

Moreover, B is a base for Ψ.

Proof. First suppose that Ψ is given by the union. As a union of increasing, translation-invariant mappings,
Ψ is increasing and translation-invariant. As a union of antiextensive mappings, Ψ is antiextensive. As for idempotence, because of antiextensivity, we need show only that Ψ(S) ⊂ Ψ[Ψ(S)]. Now, z ∈ Ψ(S) implies that there exists B ∈ B and a point y such that z ∈ By ⊂ γB(S) ⊂ Ψ(S). Because By ⊂ Ψ(S), application of Ψ a second time will leave By ⊂ Ψ[Ψ(S)] because γB[Ψ(S)] ⊂ Ψ[Ψ(S)].

Conversely, suppose that Ψ is a τ opening. Let Ψ′ be defined by the union with B = Inv[Ψ]. Because we already know that Ψ′ is a τ opening, we need show only that Ψ = Ψ′. According to Proposition 9,

Ψ′(S) = ∪{By : B ∈ Inv[Ψ], By ⊂ S}.    (57)

By idempotence, Ψ(S) ∈ Inv[Ψ]. Hence, Ψ(S) ⊂ Ψ′(S). Now, suppose that z ∈ Ψ′(S). Then, there exists B ∈ Inv[Ψ] and a point y such that z ∈ By ⊂ S. Because Ψ is increasing and translation-invariant, Ψ(S) ⊃ Ψ(By) = Ψ(B)y = By. Hence, z ∈ Ψ(S), and Ψ = Ψ′. The claim that the union is across a base holds because Inv[Ψ] is a base for itself.

Corollary 3. If Ψ is a τ opening, then its dual (a τ closing) has the representation

Ψ*(S) = ∩B∈B φB(S),    (58)

where B is a base for Ψ.

The representation of Eq. (56) provides a filter-design paradigm. If an image is composed of a disjoint union of grains (connected components), then unwanted grains can be eliminated according to their sizes relative to the structuring elements in the base B. A key to good filtering is the selection of appropriately sized structuring elements, because we wish to minimize elimination of signal grains and maximize elimination of noise grains. This leads to the theory of optimal openings. Because each opening in the expansion of Eq. (56) can, according to Theorem 1, be represented as a union of erosions, substituting the erosion representation of each opening into Eq. (56) ipso facto produces an erosion representation for Ψ. However, even if a basis expansion is used for each opening, there is redundancy in the resulting expansion. For a finite number of finite digital structuring elements, there exists a procedure to produce a minimal erosion expansion from the opening representation (41).

As applied to a grain image according to Eq. (56), a τ opening passes certain components and eliminates others, but it affects the passing grains. Corresponding to each τ opening Ψ is an induced reconstructive τ opening, defined in the following manner: a connected component is passed in full by the reconstructive opening if it is not eliminated by Ψ; a connected component is eliminated by the reconstructive opening if it is eliminated by Ψ. If the clutter image of Fig. 5 is opened reconstructively by the ball of Fig. 2, then the rectangle is perfectly passed and all of the clutter is eliminated. Reconstructive openings belong to the class of connected operators (42–44). These are operators that either pass or completely eliminate grains in both the set and its complement. Boundaries between the set and its complement cannot be broken or changed; they are either left as is or deleted.
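The filter-design paradigm of Eq. (56) and the induced reconstructive τ opening can be sketched as follows (NumPy/SciPy assumed; the base elements and grain sizes are invented for illustration, and scipy.ndimage.binary_propagation serves as the reconstruction step).

```python
import numpy as np
from scipy import ndimage as ndi

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

# Grain image: one large rectangular signal grain plus two clutter grains.
signal = np.zeros((100, 140), dtype=bool)
signal[20:70, 30:110] = True
S = signal.copy()
S[80:84, 10:14] = True
S[5:8, 120:123] = True

# Eq. (56): a tau-opening built as a union of openings over a base.
base = [disk(5), np.ones((11, 3), bool), np.ones((3, 11), bool)]
psi = np.zeros_like(S)
for B in base:
    psi |= ndi.binary_opening(S, B)

# Induced reconstructive tau-opening: any grain that survives psi is
# restored in full by conditional dilation (morphological reconstruction).
psi_rec = ndi.binary_propagation(psi, mask=S)

# The small clutter grains are eliminated; the signal grain passes intact.
assert np.array_equal(psi_rec, signal)
```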
CONVEXITY

Convex sets play a significant role in constructing Euclidean set operators. A set S is convex if, for any two points x, y ∈ S, the line segment λ(x, y) between x and y is a subset of S. The line segment is defined by

λ(x, y) = {ax + by : a ≥ 0, b ≥ 0, a + b = 1}.    (59)

For any t > 0, tS is convex if S is convex. Because the intersection of convex sets is convex, Eq. (15) yields the convexity of EB(S) whenever S is convex.

Proposition 14. If A and B are convex sets, so are the dilation, erosion, opening, and closing of A by B.

Proof. Suppose that z, w ∈ ΔB(A), r + s = 1, r ≥ 0, and s ≥ 0. According to Eq. (20), there exist a, a′ ∈ A and b, b′ ∈ B such that z = a + b and w = a′ + b′. Owing to convexity,

rz + sw = (ra + sa′) + (rb + sb′) ∈ ΔB(A),    (60)

establishing the convexity of ΔB(A). Convexity of the opening and the closing follows because each is an iteration of a dilation and an erosion.

It is generally true that (r + s)A ⊂ rA ⊕ sA, but the reverse inclusion does not always hold. However, if A is convex, then we have the identity

(r + s)A = rA ⊕ sA.    (61)

Proposition 15. If r ≥ s > 0 and B is convex, then γrB(S) ⊂ γsB(S) for any set S.

Proof. From the property of Eq. (61),

rB = sB ⊕ (r − s)B.    (62)

By Proposition 12, rB is sB-open, and the conclusion follows from Proposition 11.

If t ≥ 1 and we replace r, s, and B in Eq. (62) by t, 1, and A, respectively, then Proposition 12 establishes the following proposition: if A is convex, then tA is A-open for any t ≥ 1. The converse is not generally valid; however, a significant theorem of Matheron states that it is valid under the assumption that A is compact. The proof is quite involved, and we state the theorem without proof.

Theorem 6 (1). Let A be a compact set. Then tA is A-open for any t ≥ 1 if and only if A is convex.
GRANULOMETRIES

Granulometries were introduced by Matheron to model parameterized sieving processes operating on random sets (1). If an opening is applied to a binary image composed of a collection of disjoint grains, then some grains are passed (perhaps with diminution) and some are eliminated. If the structuring element is decreased or increased in size, then grains are more or less likely to pass. A parameterized opening filter can be viewed in terms of its diminishing effect on image volume as the structuring element(s) increase in size. The resulting size distributions are powerful image descriptors, especially for classifying random textures.

For motivation, suppose that S = S1 ∪ S2 ∪ ··· ∪ Sn, where the union is disjoint. Imagine that the components are passed over a sieve of mesh size t > 0 and that a parameterized filter Ψt is defined componentwise according to whether or not a component passes through the sieve: Ψt(Si) = Si if Si does not fall through the sieve; Ψt(Si) = ∅ if Si falls through the sieve. For the overall set,

Ψt(S) = ∪i=1,…,n Ψt(Si).    (63)

Because the components of Ψt(S) form a subcollection of the components of S, Ψt is antiextensive. If T ⊃ S, then each component of T must contain a component of S, so that Ψt(T) ⊃ Ψt(S), and Ψt is increasing. If the components are sieved iteratively through two different mesh sizes, then the output after both iterations depends only on the larger of the mesh sizes. In accordance with these remarks, an algebraic granulometry is defined as a family of operators Ψt: P → P, t > 0, that satisfies three properties: (a) Ψt is antiextensive; (b) Ψt is increasing; (c) ΨrΨs = ΨsΨr = Ψmax{r,s} for r, s > 0 [mesh property].

If {Ψt} is an algebraic granulometry and r ≥ s, then Ψr = ΨsΨr ⊂ Ψs, where the equality follows from the mesh property and the inclusion follows from the antiextensivity of Ψr and the increasingness of Ψs. The granulometric axioms are equivalent to two conditions.

Proposition 16. {Ψt} is an algebraic granulometry if and only if (i) for any t > 0, Ψt is an algebraic opening; (ii) r ≥ s > 0 implies Inv[Ψr] ⊂ Inv[Ψs] [invariance ordering].

Proof. Assuming that {Ψt} is an algebraic granulometry, we need to show idempotence and the invariance ordering of (ii). For idempotence, ΨtΨt = Ψmax{t,t} = Ψt. For invariance ordering, suppose that S ∈ Inv[Ψr]. Then,

Ψs(S) = Ψs[Ψr(S)] = Ψmax{s,r}(S) = Ψr(S) = S.    (64)
To prove the converse, we need show only condition (c), since conditions (a) and (b) hold because Ψt is an algebraic opening. Suppose that r ≥ s > 0. By idempotence and condition (ii), Ψr(S) ∈ Inv[Ψr] ⊂ Inv[Ψs]. Hence, ΨsΨr = Ψr. Consequently,

Ψr = ΨrΨr = ΨrΨsΨr ⊂ ΨrΨs ⊂ Ψr,    (65)

where the two inclusions hold because Ψs is antiextensive and Ψr is increasing. It follows that Ψr ⊂ ΨrΨs ⊂ Ψr, and ΨrΨs = Ψr = Ψmax{s,r}. Similarly to Eq. (65),

Ψr = ΨrΨr = ΨsΨrΨr ⊂ ΨsΨr ⊂ Ψr,    (66)

and it follows that ΨsΨr = Ψr = Ψmax{s,r}, so that condition (c) is satisfied.

If Ψt is translation-invariant for all t > 0, then {Ψt} is called a granulometry. For a granulometry, condition (i) of Proposition 16 is changed to say that Ψt is a τ opening. If Ψt satisfies the Euclidean property, Ψt(S) = tΨ1(S/t) for any t > 0, then {Ψt} is called a Euclidean granulometry. In terms of sieving, translation invariance means that the sieve mesh is uniform throughout the space. The Euclidean condition means that scaling a set by 1/t, sieving by Ψ1, and then rescaling by t is the same as sieving by Ψt. We call Ψ1 the unit of the granulometry. The simplest Euclidean granulometry is an opening by a parameterized structuring element, γtB. For it, the Euclidean property states that

γtB(S) = tγB(S/t).    (67)

Through Proposition 9, x ∈ γtB(S) if and only if there exists y such that x ∈ (tB)y ⊂ S, but (tB)y = tBy/t, so that x ∈ (tB)y ⊂ S if and only if x/t ∈ By/t ⊂ S/t, which means that x/t ∈ γB(S/t).

REPRESENTATION OF EUCLIDEAN GRANULOMETRIES

There is a general representation for Euclidean granulometries; before giving it, we develop some preliminaries. A crucial point to establish is that not just any class of sets can serve as the invariant class of a granulometric unit.

Proposition 17. If {Ψt} is a granulometry, then it satisfies the Euclidean condition if and only if Inv[Ψt] = t Inv[Ψ1], which means that S ∈ Inv[Ψt] if and only if S/t ∈ Inv[Ψ1].

Proof. Suppose that the Euclidean condition is satisfied and S ∈ Inv[Ψt]. Then, Ψ1(S/t) = Ψt(S)/t = S/t, so that S/t ∈ Inv[Ψ1]. Now suppose that S/t ∈ Inv[Ψ1]. Then, Ψt(S) = tΨ1(S/t) = t(S/t) = S, so that S ∈ Inv[Ψt]. To show the converse, let Ψ′t(S) = tΨ1(S/t). We claim that Ψ′t is a τ opening. Antiextensivity, increasingness, and translation invariance follow at once from the corresponding properties of Ψt. For idempotence,

Ψ′t[Ψ′t(S)] = tΨ1[tΨ1(S/t)/t] = tΨ1[Ψ1(S/t)] = tΨ1(S/t) = Ψ′t(S).    (68)
Next, S ∈ Inv[Ψ′t] if and only if S/t ∈ Inv[Ψ1], which, according to the hypothesis, means that S ∈ Inv[Ψt]. Thus, Inv[Ψ′t] = Inv[Ψt], and it follows from Theorem 5 that Ψt = Ψ′t; thus, Ψt satisfies the Euclidean condition.

Theorem 7 (1). Let I be a class of subsets of Rn. There exists a Euclidean granulometry for which I is the invariant class of the unit if and only if I is closed under union, translation, and scalar multiplication by all t ≥ 1. Moreover, for such a class I, the corresponding Euclidean granulometry possesses the representation

Ψt(S) = ∪B∈I γtB(S),    (69)

where Inv[Ψ1] = I.

Proof. First suppose that there exists a Euclidean granulometry {Ψt} for which I = Inv[Ψ1]. To prove closure under unions, consider a collection of sets Si ∈ Inv[Ψ1], and let S be the union of the Si. Because Ψ1 is increasing,

S = ∪i Si = ∪i Ψ1(Si) ⊂ Ψ1(∪i Si) = Ψ1(S).    (70)

Because Ψ1 is antiextensive, the reverse inclusion holds, S ∈ Inv[Ψ1], and there is closure under unions. Because an algebraic opening is a τ opening if and only if its invariant class is invariant under translation, Inv[Ψ1] is closed under translation. Now suppose that S ∈ Inv[Ψ1] and t ≥ 1. By the Euclidean condition, tS ∈ Inv[Ψt], and

Ψ1(tS) = Ψ1[Ψt(tS)] = Ψmax{1,t}(tS) = Ψt(tS) = tS.    (71)

Therefore, tS ∈ Inv[Ψ1], and Inv[Ψ1] is closed under scalar multiplication by t ≥ 1.

For the converse of the proposition, we need to find a granulometry for which I is the invariant class of the unit. According to Theorem 5, Ψt defined by Eq. (69) is a τ opening whose base is tI. Because I is closed under unions and translations, Inv[Ψt] = tI, and I = Inv[Ψ1]. To show that {Ψt} is a Euclidean granulometry, we must demonstrate invariance ordering and the Euclidean condition. Suppose that r ≥ s > 0 and S ∈ Inv[Ψr]. Then, S = rB for some B ∈ I. Because r/s ≥ 1, (r/s)B ∈ I, which implies that S = rB = sC for some C ∈ I, which implies that S ∈ sI = Inv[Ψs]. Finally, according to Proposition 17, the Euclidean condition holds because, by construction, Inv[Ψt] = t Inv[Ψ1].

Taken together with Proposition 17, which states that invariant classes of Euclidean granulometries are determined by the invariant class of the unit, Theorem 7 characterizes the form and invariant classes of Euclidean granulometries. Nevertheless, as it stands, it does not provide a useful framework for filter design because one must construct invariant classes of units, and we need a methodology for construction. Suppose that I is a class of sets closed under union, translation, and scalar multiplication by t ≥ 1. A class G of sets is called a generator of I if the class closed under union, translation, and scalar multiplication by scalars t ≥ 1 that is generated by G is I. If {Ψt} is the Euclidean granulometry with Inv[Ψ1] = I, then G is called a generator of {Ψt}. The next theorem provides a more constructive characterization of Euclidean granulometries.

Theorem 8 (1). An operator family {Ψt}, t > 0, is a Euclidean granulometry if and only if there exists a class of images G such that

Ψt(S) = ∪B∈G ∪r≥t γrB(S).    (72)

Moreover, G is a generator of {Ψt}.

Proof. First, we show that Eq. (72) yields a Euclidean granulometry for any class G. According to Theorem 5, Ψt is a τ opening with base Bt = {rB : B ∈ G, r ≥ t}. If u ≥ t, then Bu ⊂ Bt, which implies that Inv[Ψu] ⊂ Inv[Ψt]. To show that {Ψt} is a Euclidean granulometry, we apply Proposition 17 and Eq. (67). Indeed, S/t ∈ Inv[Ψ1] if and only if

S/t = Ψ1(S/t) = ∪B∈G ∪s≥1 γsB(S/t) = (1/t) ∪B∈G ∪s≥1 γtsB(S) = (1/t) ∪B∈G ∪r≥t γrB(S) = Ψt(S)/t,    (73)

which, upon canceling 1/t, says that S/t ∈ Inv[Ψ1] if and only if S ∈ Inv[Ψt]. Because B1 is a base for Ψ1, the form of B1 shows that G is a generator of {Ψt}.

As for the converse, because {Ψt} is a Euclidean granulometry, Ψt has the representation of Eq. (69) and, because I = Inv[Ψ1] is a generator of itself,

Ψ1(S) = ∪B∈I ∪r≥1 γrB(S).    (74)

By the Euclidean condition and Eq. (67),

Ψt(S) = t ∪B∈I ∪r≥1 γrB(S/t) = ∪B∈I ∪r≥1 γrtB(S) = ∪B∈I ∪u≥t γuB(S),    (75)

so that Ψt possesses a representation of the desired form.

Theorem 8 provides a methodology for constructing Euclidean granulometries: select a generator G and apply Eq. (72); however, such an approach is problematic in practice because it involves a union across all r ≥ t for each t. To see the problem, suppose that we choose a singleton generator G = {B}. Then, Eq. (72) yields the representation

Ψt(S) = ∪r≥t γrB(S),    (76)

which is an uncountable union. According to Theorem 6, if B is compact and convex, then rB is tB-open, so that γrB(S) ⊂ γtB(S), and the union reduces to the single opening Ψt(S) = γtB(S). Because Theorem 6 is an
equivalence, for compact B, we require the convexity of B to obtain the reduction. This reasoning extends to an arbitrary generator: for a generator composed of compact sets, the double union of Eq. (72) reduces to the single outer union over G,

Ψt(S) = ∪B∈G γtB(S),    (77)

if and only if G is composed of convex sets, in which case we say that the granulometry is convex. The single union represents a parameterized τ opening. The generator sets of a convex granulometry are convex and therefore connected. Hence, if S1, S2, … are mutually disjoint compact sets, then

Ψt(∪i≥1 Si) = ∪i≥1 Ψt(Si),    (78)

that is, a convex granulometry is distributive, and it can be viewed componentwise. Although we have restricted our development to binary granulometries, as conceived by Matheron, the theory can be extended to gray-scale images (45,46), and the algebraic theory to the framework of complete lattices (10).
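A convex granulometry of the form of Eq. (77) with a singleton disk generator amounts to repeated openings by scaled disks. The sketch below assumes NumPy/SciPy and an invented grain image and range of scales.

```python
import numpy as np
from scipy import ndimage as ndi

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

# Disjoint grains of different sizes (illustrative).
S = np.zeros((150, 150), dtype=bool)
S[10:30, 10:30] = True        # small grain
S[50:90, 50:90] = True        # medium grain
S[110:145, 20:100] = True     # large, elongated grain

# Convex Euclidean granulometry with singleton generator {disk}:
# Psi_t(S) is simply the opening of S by the disk of radius t (Eq. 77).
area_after_sieving = {t: int(ndi.binary_opening(S, disk(t)).sum())
                      for t in range(1, 25, 2)}

# Distributivity (Eq. 78): the grains sieve independently, so the area
# drops in steps as t exceeds the sizes of successive grains.
print(area_after_sieving)
```

Because the disks are convex, Proposition 15 guarantees that the openings decrease as t increases, so the recorded areas are nonincreasing in t.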
RECONSTRUCTIVE GRANULOMETRIES

The representation of Eq. (77) can be generalized by separately parameterizing each structuring element, rather than simply scaling each by a common parameter. To avoid cumbersome subscripts, we now switch to the infix notation S∘B for the opening of S by B. Assuming a finite number of convex structuring elements, individual structuring-element parameterization yields a family {Ψr} of multiparameter τ openings of the form

Ψr(S) = ∪k=1,…,n S∘Bk[rk],    (79)

where r1, r2, …, rn are parameter vectors governing the convex, compact structuring elements B1[r1], B2[r2], …, Bn[rn] that compose the base of Ψr and r = (r1, r2, …, rn). To keep the notion of sizing, we require (here and subsequently) the sizing condition that rk ≤ sk implies Bk[rk] ⊂ Bk[sk] for k = 1, 2, …, n, where vector order is defined by (t1, t2, …, tm) ≤ (u1, u2, …, um) if and only if tj ≤ uj for j = 1, 2, …, m. Ψr is a τ opening because any union of openings is a τ opening; however, because the parameter is now a vector, the second condition of Proposition 16 does not apply as stated. To generalize the condition, we use componentwise ordering in the vector lattice. Condition (ii) of Proposition 16 becomes (ii′) r ≥ s > 0 ⇒ Inv[Ψr] ⊂ Inv[Ψs]. Condition (ii′) states that the mapping r → Inv[Ψr] is order reversing, and we say that any family {Ψr} for which it holds is invariance ordered. If Ψr is a τ opening for any r and the family {Ψr} is invariance ordered, then we call {Ψr} a granulometry. The family {Ψr} defined by Eq. (79) is not necessarily a granulometry because it need not be invariance ordered. As it stands, {Ψr} is simply a collection of τ openings across a parameter space.

The failure of the family of Eq. (79) and other operator families defined via unions and intersections of parameterized openings to be granulometries is overcome by openingwise reconstruction and leads to the class of logical granulometries (47). Regarding Eq. (79), the induced reconstructive family {Ψ̃r}, defined by

Ψ̃r(S) = ∪k=1,…,n Λ[S∘Bk[rk]] = Λ[∪k=1,…,n S∘Bk[rk]],    (80)

where Λ[·] denotes reconstruction relative to S, is a granulometry (because it is invariance ordered). As shown in Eq. (80), reconstruction can be performed openingwise or on the union. {Ψ̃r} is called a disjunctive granulometry.

Although Eq. (79) does not generally yield a granulometry without reconstruction, a salient special case occurs when each structuring element has the form tiBi. In this case, for any n-vector t = (t1, t2, …, tn), ti > 0 for i = 1, 2, …, n, the filter takes the form

Ψt(S) = ∪i=1,…,n S∘tiBi.    (81)

To avoid useless redundancy, we assume that no set in the base is open with respect to another set in the base, meaning that, for i ≠ j, Bi∘Bj ≠ Bi. For any t = (t1, t2, …, tn) for which there exists ti = 0, we define Ψt(S) = S. {Ψt} is a multivariate granulometry (because it is a granulometry without reconstruction) (48).

If the union of Eq. (79) is changed to an intersection and all conditions that qualify Eq. (79) are maintained, then the result is a family of multiparameter operators of the form

Ψr(S) = ∩k=1,…,n S∘Bk[rk].    (82)

Each operator Ψr is translation-invariant, increasing, and antiextensive but, unless n = 1, Ψr need not be idempotent. Hence, Ψr is not generally a τ opening, and the family {Ψr} is not a granulometry. Each induced reconstruction of Ψr is a τ opening (it is idempotent), but the family of these reconstructions is not a granulometry because it is not invariance ordered. However, if reconstruction is performed openingwise, then the resulting intersection of reconstructions is invariance ordered and a granulometry. The family of operators

Ψ̃r(S) = ∩k=1,…,n Λ[S∘Bk(rk)]    (83)

is called a conjunctive granulometry. In the conjunctive case, the equality of Eq. (80) is softened to an inequality: the reconstruction of the intersection is a subset of the intersection of the reconstructions. Conjunction and disjunction can be combined to form a more general form of reconstructive granulometry:

Ψ̃r(S) = ∪k=1,…,n ∩j=1,…,mk Λ[S∘Bk,j(rk,j)].    (84)
If Si is a component of S and xi,k,j and yi are the logical variables determined by the truth values of the equations Si∘Bk,j[rk,j] ≠ ∅ and Ψ̃r(Si) ≠ ∅ [or, equivalently, Λ[Si∘Bk,j(rk,j)] = Si and Ψ̃r(Si) = Si], respectively, then yi possesses the logical representation

yi = ∨k=1,…,n ∧j=1,…,mk xi,k,j.    (85)
We call {Ψ̃r} a logical granulometry. Component Si is passed if and only if there exists k such that, for j = 1, 2, …, mk, there exists a translate of Bk,j(rk,j) that is a subset of Si. Disjunctive and conjunctive granulometries are special cases of logical granulometries, and the latter compose a class of sieving filters that locate targets among clutter based on the size and shape of the target and clutter structural components. For fixed r, we refer to Ψ̃r as a disjunctive, conjunctive, or logical opening, based on the type of reconstructive granulometry from which it arises. Logical openings form a subclass of a more general class of reconstructive sieving filters called logical structural filters (49). These are not granulometric in the sense of Matheron; they need not be increasing.

SIZE DISTRIBUTIONS

For increasing t, a granulometry causes increasing diminution of a set. The rate of diminution is a powerful image descriptor. Consider a finite-generator convex Euclidean granulometry {Ψt} of the form given in Eq. (77) that has compact generating sets containing more than a single point. Fixing a compact set S of positive measure, letting ν denote area, and treating t as a variable, we define the size distribution as

Ω(t) = ν[S] − ν[Ψt(S)].    (86)

Ω(t) measures the area removed by Ψt. Ω is an increasing function for which Ω(0) = 0, and Ω(t) = ν(S) for sufficiently large t. Ω(t) is a random function whose realizations are characteristics of the corresponding realizations of the random set. Taking the expectation of Ω(t) gives the mean size distribution (MSD), M(t) = E[Ω(t)]. The (generalized) derivative, H(t) = M′(t), of the mean size distribution is called the granulometric size density (GSD). The MSD is not a probability distribution function because M(t) → E[ν(S)] as t → ∞. Hence, the GSD is not a probability density. The MSD and GSD serve as partial descriptors of a random set in much the same way as the power spectral density partially describes a wide-sense-stationary random function, and they play a role analogous to the power spectral density in the design of optimal granulometric filters (12,50–53). The pattern spectrum of S is defined by the normalization Φ(t) = Ω(t)/ν(S). Φ is a probability distribution function, and its derivative Φ′ is a probability density. The expectation E[Φ(t)] is a probability distribution function, and we call its derivative, Π(t) = dE[Φ(t)]/dt, the pattern-spectrum density (PSD). Π is a probability density and, under nonrestrictive regularity conditions, Π(t) = E[Φ′(t)].
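Under the same assumptions as the earlier sketches (NumPy/SciPy, openings by discrete disks standing in for the convex unit), the size distribution of Eq. (86), the pattern spectrum, and its first few moments can be estimated as follows; truncating at t_max makes the moments approximate.

```python
import numpy as np
from scipy import ndimage as ndi

def disk(r):
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x**2 + y**2 <= r**2

def pattern_spectrum(S, t_max):
    """Size distribution Omega(t) = area(S) - area(Psi_t(S)) for openings by
    disks of radius t = 1..t_max, normalized to the pattern spectrum Phi(t)."""
    area = float(S.sum())
    omega = np.array([area - ndi.binary_opening(S, disk(t)).sum()
                      for t in range(1, t_max + 1)])
    phi = omega / area                                  # pattern spectrum
    density = np.diff(np.concatenate(([0.0], phi)))     # discrete derivative
    return phi, density

def granulometric_moments(density, k_max=3):
    """First k_max moments of the (truncated) pattern-spectrum density,
    usable as texture or shape features."""
    t = np.arange(1, len(density) + 1, dtype=float)
    return [float((t**k * density).sum()) for k in range(1, k_max + 1)]

# Example: moments for a single square grain.
S = np.zeros((60, 60), dtype=bool)
S[15:45, 15:45] = True
phi, dens = pattern_spectrum(S, t_max=20)
print(granulometric_moments(dens))
```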
Treating S as a random set, Ω(t) is a random function, and its moments, called granulometric moments, are random variables. The pattern spectrum and the granulometric moments are used to provide image characteristics in various applications and, in particular, are used for texture and shape classification (54–61). Because the moments of Ω(t) are used for classification, three basic problems arise: (1) find expressions for the pattern-spectrum moments; (2) find expressions for the moments of the pattern-spectrum moments; (3) describe the probability distributions of the pattern-spectrum moments. In this vein, various properties of pattern spectra have been studied: asymptotic behavior (relative to grain count) of the pattern-spectrum moments (62–64), effects of noise (65), continuous-to-discrete sampling (66), and estimation (67,68). Granulometric classification has also been applied to gray-scale textures (69–71).

Given a collection of convex, compact sets B1, B2, …, BJ, there exist granulometric moments for each granulometry {S∘tBj}, j = 1, 2, …, J. If we take the first q moments of each granulometry, then m = qJ features, µ(k)(Si; Bj), are generated for each Si, thereby yielding, for each Si, an m-dimensional feature vector. Size distributions are applied locally to classify individual pixels. The granulometries are applied to the whole image, but a size distribution is computed at each pixel by taking pixel counts in a window about each pixel.

BIBLIOGRAPHY

1. G. Matheron, Random Sets and Integral Geometry, Wiley, NY, 1975.
2. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, NY, 1983.
3. H. Minkowski, Math. Ann. 57, 447–495 (1903).
4. H. Hadwiger, Altes und Neues über konvexe Körper, Birkhäuser-Verlag, Basel, 1955.
5. H. Hadwiger, Vorlesungen über Inhalt, Oberfläche und Isoperimetrie, Springer-Verlag, Berlin, 1957.
6. S. Sternberg, Comput. Vision Graphics Image Process. 35(3), 337–355 (1986).
7. J. Serra, in J. Serra, ed., Image Analysis and Mathematical Morphology, vol. 2, Theoretical Advances, Academic Press, NY, 1988.
8. H. J. Heijmans and C. Ronse, Comput. Vision Graphics Image Process. 50, 245–295 (1990).
9. C. Ronse and H. J. Heijmans, Comput. Vision Graphics Image Process. 54, 74–97 (1991).
10. H. J. Heijmans, Morphological Operators, Academic Press, NY, 1995.
11. E. R. Dougherty, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999.
12. E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999.
13. E. R. Dougherty and Y. Chen, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999.
14. E. R. Dougherty, An Introduction to Morphological Image Processing, SPIE Press, Bellingham, 1992.
42. J. Crespo and R. Schafer, Math. Imaging Vision 7(1), 85–102 (1997).
15. P. Soille, Morphological Image Analysis, Springer-Verlag, NY, 1999. 16. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, London, 1982. 17. E. R. Dougherty, ed., Mathematical Morphology in Image Processing, Marcel Dekker, NY, 1993.
43. J. Crespo, J. Serra, and R. Schafer, Signal Process. 47(2), 201–225 (1995). 44. H. Heijmans, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 45. E. R. Dougherty, Math. Imaging Vision 1(1), 7–21 (1992).
18. P. Maragos and R. Schafer, IEEE Trans. Acoust. Speech Signal Process. 35, 1153–1169 (1987). 19. C. R. Giardina and E. R. Dougherty, Morphological Methods in Image and Signal Processing, Prentice-Hall, Englewood Cliffs, NY, 1988. 20. G. J. F. Banon and J. Barrera, SIAM J. Appl. Math. 51(6), 1782–1798 (1991). 21. P. Maragos and R. Schafer, IEEE Trans. Acoust. Speech Signal Process. 35, 1170–1184 (1987). 22. E. R. Dougherty and D. Sinha, Signal Process. 38, 21–29 (1994). 23. G. J. F. Banon and J. Barrera, Signal Process. 30, 299–327 (1993). 24. E. R. Dougherty and D. Sinha, Real-Time Imaging 1(1), 69–85 (1995).
46. E. Kraus, H. J. Heijmans, and E. R. Dougherty, Signal Process. 34, 1–17 (1993). 47. E. R. Dougherty and Y. Chen, in J. Goutsias, R. Mahler, and C. Nguyen, eds., Random Sets: Theory and Applications, Springer-Verlag, NY, 1997. 48. S. Batman and E. R. Dougherty, Opt. Eng. 36(5), 1518–1529 (1997). 49. E. R. Dougherty and Y. Chen, Opt. Eng. 37(6), 1668–1676 (1998). 50. E. R. Dougherty et al., Signal Process. 29, 265–281 (1992). 51. R. M. Haralick, P. L. Katz, and E. R. Dougherty, Comput. Vision Graphics Image Process. Graphical Models Image Process. 57(1), 1–12 (1995). 52. E. R. Dougherty, Math. Imaging Vision 7(2), 175–192 (1997).
25. E. R. Dougherty and D. Sinha, Real-Time Imaging 1(4), 283–295 (1995). 26. E. R. Dougherty and J. Barrera, in E. R. Dougherty and J. T. Astola, eds., Nonlinear Filters for Image Processing, SPIE and IEEE Presses, Bellingham, 1999. 27. J. Barrera, E. R. Dougherty, and N. S. Tomita, Electron. Imaging 6(1), 54–67 (1997). 28. E. J. Coyle and J.-H. Lin, IEEE Trans. Acoust. Speech Signal Process. 36(8), 1244–1254 (1988). 29. E. R. Dougherty, CVGIP: Image Understanding 55(1), 36–54 (1992). 30. E. R. Dougherty and R. P. Loce, Opt. Eng. 32(4), 815–823 (1993). 31. E. R. Dougherty and R. P. Loce, Signal Process. 40(3), 129–154 (1994). 32. E. R. Dougherty and R. P. Loce, Electron. Imaging 5(1), 66–86 (1996). 33. E. R. Dougherty, Y. Zhang, and Y. Chen, Opt. Eng. 35(12), 3495–3507 (1996). 34. M. Gabbouj and E. J. Coyle, IEEE Trans. Acoust. Speech Signal Process. 38(6), 955–968 (1990). 35. P. Kuosmanen and J. Astola, Signal Process. 41(3), 165–211 (1995). 36. R. P. Loce and E. R. Dougherty, Visual Commun. Image Representation 3(4), 412–432 (1992). 37. R. P. Loce and E. R. Dougherty, Opt. Eng. 31(5), 1008–1025 (1992). 38. R. P. Loce and E. R. Dougherty, Enhancement and Restoration of Digital Documents: Statistical Design of Nonlinear Algorithms, SPIE Press, Bellingham, 1997. 39. P. Salembier, Visual Commun. Image Representation 3(2), 115–136 (1992). 40. J. Serra, in J. Serra, ed., Image Analysis and Mathematical Morphology, vol. 2, Theoretical Advances, Academic Press, NY, 1988. 41. E. R. Dougherty, Pattern Recognition Lett. 14(3), 1029–1033 (1994).
53. Y. Chen and E. R. Dougherty, Signal Process. 61, 65–81 (1997). 54. P. Maragos, IEEE Trans. Pattern Analy. Mach. Intelligence 11, 701–716 (1989). 55. E. R. Dougherty and J. Pelz, Opt. Eng. 30(4), 438–445 (1991). 56. E. R. Dougherty, J. T. Newell, and J. B. Pelz, Pattern Recognition 25(10), 1181–1198 (1992). 57. L. Vincent and E. R. Dougherty, in E. Dougherty, ed., Digital Image Processing Methods, Marcel Dekker, NY, 1994. 58. B. Li and E. R. Dougherty, Opt. Eng. 32(8), 1967–1980 (1993). 59. R. Sobourin, G. Genest, and F. Preteux, IEEE Trans. Pattern Anal. Mach. Intelligence 19(9), 989–1003 (1997). 60. G. Ayala, M. E. Diaz, and L. Martinez-Costa, Pattern Recognition 34(6), 1219–1227 (2001). 61. Y. Balagurunathan et al., Image Anal. Stereology 20, 87–99 (2001). 62. E. R. Dougherty and F. Sand, Visual Commun. Image Representation 6(1), 69–79 (1995). 63. F. Sand and E. R. Dougherty, Visual Commun. Image Representation 3(2), 203–214 (1992). 64. F. Sand and E. R. Dougherty, Pattern Recognition 31(1), 53–61 (1998). 65. B. Bettoli and E. R. Dougherty, Math. Imaging Vision 3(3), 299–319 (1993). 66. E. R. Dougherty and C. R. Giardina, SIAM J. Appl. Math. 47(2), 425–440 (1987). 67. K. Sivakumar and J. Goutsias, in J. Serra and P. Soille, eds., Mathematical Morphology and its Applications to Image Processing, Kluwer Academic, Boston, 1994. 68. K. Sivakumar and J. Goutsias, Electron. Imaging 6(1), 31–53 (1997). 69. Y. Chen and E. R. Dougherty, Opt. Eng. 33(8), 2713–2722 (1994). 70. Y. Chen, E. R. Dougherty, S. Totterman, and J. Hornak, Magn. Resonance Med. 29(3), 358–370 (1993). 71. S. Baeg et al., Electron. Imaging 8(1), 65–75 (1999).
G

GRAVITATION IMAGING

G. RANDY KELLER
University of Texas at El Paso
El Paso, TX

FUNDAMENTAL THEORY AND PRACTICE

Studies of the earth's gravity field (and those of other planetary bodies) are a prime example of modern applications of classical Newtonian physics. We use knowledge of the earth's gravity field to study topics such as the details of the earth's shape (geodesy), predicting the orbits of satellites and the trajectories of missiles, and determining the earth's mass and moment of inertia. However, gravitational imaging as defined here refers to geophysical mapping and interpretation of features in the earth's lithosphere (the relatively rigid outer shell that extends to depths of ∼100 km beneath the surface). In fact, the emphasis is on the earth's upper crust that extends to depths of about 20 km, because it is this region where gravity data can best help delineate geologic features related to natural hazards (faults, volcanoes, landslides), natural resources (water, oil, gas, minerals, geothermal energy), and tectonic events such as the formation of mountain belts. Such studies provide elegantly straightforward demonstrations of the applicability of classical physics and digital processing to the solution of a variety of geologic problems. These problems vary in scale from very local investigations of features such as faults and ore bodies to regional investigations of the structure of mountain belts and tectonic plates.

Mass m is a fundamental property of matter, and density ρ is the mass per unit volume v; thus, m = ρv. The variation in density in the lithosphere produces mass variations and thus changes in the gravity field that we seek to image. To produce images, we must first apply corrections to our gravity measurements that remove known variations in gravity with respect to elevation and latitude. Our goal is to derive gravity anomalies that represent departures from what we know about the gravity field and to construct images of these values. The lithosphere constitutes only about 5% of the earth's volume, and because density generally increases with depth in the earth, the lithosphere is a very small portion of the earth's mass. However, below the lithosphere, the earth can be thought of as concentric shells of material whose density is relatively constant. Thus, the vast majority of the earth's gravity field is due to the material below the lithosphere and varies in a subtle and very long-wavelength fashion. However, within the lithosphere, rocks vary in density from less than that of water (pumice, a volcanic rock, can actually float in water) to more than 4000 kg m−3. A contrast in density between adjacent bodies of rock produces a gravity anomaly that is positive across the denser body and likewise negative across the other body. In fact, gravitational imaging is in essence the method we use to detect and map the extent of such density contrasts. However, the lithosphere is composed of many heterogeneous bodies of rock, and thus the appearance of images of gravity anomalies is often complex, which makes the separation of anomalies due to different density contrasts difficult. Another consideration is the fundamental ambiguity in gravity studies that is due to the fact that the variation in mass that causes a particular gravity anomaly can be represented by many geologically reasonable combinations of volume and density. Thus, we can be confident that anomalies in gravitational images locate anomalous masses, but we require independent information, such as data from drill holes, to determine the geometry and density of the body that causes the anomaly.

Even though it is a first-order approximation of reality, Newton's law of gravitation should form the basis for our intuitive understanding of most aspects of gravitational imaging. If the earth were a perfect sphere consisting of concentric shells of constant density and were not rotating, Newton's law of gravitation would predict the gravitational attraction g between the earth (mass = Me) and a mass m1 on its surface as

g = γ Me m1 / Re²,    (1)

where Re is the radius of the earth and γ is the International gravitational constant (γ = 6.67 × 10−11 m3 kg−1 s−2). In actuality, the earth's gross shape departs slightly from spherical, there is topography on the continents that can be thought of as variations in Re, the density within the earth varies (and varies complexly in the lithosphere), and a slow rotation is present. However, all of these complications are second order. In studies of lithospheric structure, the search is for gravity anomalies (differences between observed gravity values and what is expected based on first principles and planetary-scale variations). With respect to the total gravity field of the earth, these anomalies are at most only a few parts per thousand in amplitude. Images (maps) of the values of these anomalies are used to infer the earth's structure and are well suited to be integrated with other data such as satellite images and digital elevation models. For example, a simple overlay of gravity anomalies on a Landsat image provides an easy and effective depiction of the way subsurface mass distributions correlate with surface features. Qualitative interpretation of gravity anomalies is no more complex than calling upon Newton's law to tell us that positive anomalies indicate the presence of a local mass excess, while negative anomalies indicate local mass deficiencies. As discussed later, several different types of anomalies have been defined based on the known variations in the earth's gravity field that are considered before calculating the anomaly value.
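As a quick check of Eq. (1) per unit mass, rounded textbook values for the earth's mass and radius (values not given in the article) reproduce the familiar surface acceleration of roughly 9.8 m/s².

```python
# Gravitational acceleration per unit mass at the earth's surface, Eq. (1),
# using rounded values for the constants (illustrative only).
GAMMA = 6.67e-11        # international gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.97e24       # kg (approximate)
R_EARTH = 6.371e6       # mean radius, m (approximate)

g = GAMMA * M_EARTH / R_EARTH**2
print(f"g = {g:.2f} m/s^2")     # about 9.8 m/s^2, i.e., ~980 cm/s^2 (980 Gal)
```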
Figure 1. An example of an idealized reference spheroid that is used to predict gravity values at sea level: the idealized shape of the earth is an ellipsoid whose major axis is a and minor axis is b, and whose flattening is f = (a − b)/a.
earth’s gravity field that are considered before calculating the anomaly value. However, we start from a basic formula for the gravitational attraction of a rotating ellipsoid (Fig. 1) with flattening f , derived by Clairaut in 1743. This formula predicts the value of gravity (Gt ) at sea level as a function of latitude (φ). In the twentieth century, higher order terms were added so that the formula takes the form (2) Gt = Ge (1 + f2 sin2 φ − f4 sin4 φ), where Ge = global average value of the gravitational acceleration at the equator. = angular velocity of the Earth’s rotation. m = 2 a/Ge , ( 2 a = centrifugal force at the equator, a = equatorial radius of the ellipsoid, b = polar radius of the ellipsoid) and
445
gravitational attraction is 980 cm/s, but this formula shows that the gravitational attraction of the earth at sea level varies from about 978 cm/s2 at the equator to about 983 cm/s2 at the poles. Gravity surveys on land routinely detect anomalies that have amplitudes of a 0.1 mGal and thus have the rather remarkable precision of 1 part in 1 million. Surveys whose precision is 0.01 mGal are common. By merely subtracting Gt from an observed value of the gravitational acceleration (Gobs ), we calculate the most fundamental type of gravity anomaly. However, the effects of elevation are so large that such an anomaly value means little except at sea level. Instead, the Free Air, Bouguer, and residual anomaly values described later are calculated and interpreted. Maps of these anomaly values have been constructed and interpreted for decades, and using modern techniques, it is these values that are imaged. LANGUAGE The intent is to introduce only those terms and concepts necessary to understand the basics of gravitational imaging. The Gravity and Magnetics Committee of the Society of Exploration Geophysicists maintains a web site that includes a dictionary of terms (http://seg.org/comminfo/grav− mag/gm− dict.html), and a link to a glossary (http://www.igcworld.com/gm− glos.html) that is maintained by the Integrated Geophysics Corporation. Gravity
f = (a − b)/a f2 = −f + 5/2 m + 1/2 f 2 − 26 f m + 15/4 m2 f4 = 1/2 f 2 + 5/2 f m The values of Ge , a, b, and f (flattening) are known to a considerable level of precision but are constantly being refined by a variety of methods. Occasionally, international scientific organizations agree on revised values for these quantities. Thus, all calculated values of gravity anomalies need to be adjusted when these revisions are made. As of 2000, practitioners commonly use the following equation that is based on the Geodetic Reference System 67 (1). Gt (mGals) = 978031.846(1 + 0.005278895 sin2 φ + 0.000023462 sin4 φ).
(3)
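Equation (3) is simple to evaluate; the sketch below (Python, an arbitrary choice of language) returns theoretical gravity in mGal and the most basic anomaly, Gobs − Gt.

```python
import math

def theoretical_gravity_mgal(latitude_deg):
    """Theoretical (normal) gravity at sea level from Eq. (3), GRS 67."""
    s2 = math.sin(math.radians(latitude_deg)) ** 2
    return 978031.846 * (1.0 + 0.005278895 * s2 + 0.000023462 * s2 * s2)

def simple_anomaly_mgal(g_obs_mgal, latitude_deg):
    """The most basic anomaly: observed minus theoretical gravity."""
    return g_obs_mgal - theoretical_gravity_mgal(latitude_deg)

# Roughly 978,032 mGal at the equator and 983,218 mGal at the poles.
print(theoretical_gravity_mgal(0.0), theoretical_gravity_mgal(90.0))
```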
LANGUAGE

The intent is to introduce only those terms and concepts necessary to understand the basics of gravitational imaging. The Gravity and Magnetics Committee of the Society of Exploration Geophysicists maintains a web site that includes a dictionary of terms (http://seg.org/comminfo/grav_mag/gm_dict.html) and a link to a glossary (http://www.igcworld.com/gm_glos.html) that is maintained by the Integrated Geophysics Corporation.

Gravity
Technically, gravity g is the gravitational acceleration due to the sum of the attraction of the earth’s mass and the effects of its rotation. However, it is common practice for geophysicists to say that they are measuring gravity and think of it as a force (a vector), that represents the attraction of the earth on a unit mass (F = Me g). In most cases, geophysicists tacitly assume that g is directed toward the center of the earth, which is true to much less than 10 in most places. This practice results in treating gravity, effectively as a scalar quantity. Geoid The theoretical treatment of the earth’s gravity field is based on potential theory (3). The gravitational potential at a point is the work done by gravity as a unit mass is brought from infinity to that point. One differentiates the potential to arrive at the gravitational attraction. Thus, it is important to remember that equipotential surfaces are not equal gravity surfaces. This concept is less abstract if we realize that mean sea level is an equipotential surface that we call the geoid. For the continents, the geoid can be thought of as the surface sea level would assume if canals connected the oceans and the water was allowed to flow freely and reach its equilibrium level. Another important consideration is that a plumb bob (a weight on a string) always hangs in a direction perpendicular to the geoid. This is the very definition of vertical, which is obviously important in surveying topography. The technical definition of elevation is also height above the geoid. However, reference spheroids
that approximate the geoid, at least locally, are employed to create coordinate systems for constructing maps (see http://www.Colorado.EDU/geography/gcraft/notes/notes. html for a good primer on geodetic data). Thus, mapping the geoid is a key element in determining the earth’s shape. Gravimeter The measurement of absolute gravity values is a very involved process that usually requires sophisticated pendulum systems. However, gravimeters that measure differences in gravity are elegantly simple and accurate. These instruments were perfected in the 1950s, and although new designs are being developed, most instruments work on the simple principle of measuring the deflection of a suspended mass as a result of changes in the gravity field. A system of springs suspends this mass, and it is mechanically easier to measure the change in tension on the main spring required to bring the mass back to a position of zero deflection than to measure the minute deflection of the mass. If gravity increases from the previously measured value, the spring is stretched, and the tension must be increased to return it to zero deflection. If gravity decreases, the spring contracts and the tension must be decreased. The gravimeter must be carefully calibrated so that the relative readings it produces can be converted into differences in mGals. Thus, each gravimeter has its own calibration constant, or table of constants, if the springs do not behave linearly over the readable range of the meter. Instruments that measure on land are the most widely used and can easily produce results that are correct to 0.1 mGal. Meters whose precision is almost 0.001 mGal are available. Specially designed meters can be lowered into boreholes or placed in waterproof vessels and lowered to the bottom of lakes or shallow portions of the ocean. Considering the accelerations that are present on moving platforms such as boats and aircraft and the precision required for meaningful measurements, it is surprising that gravity measurements are regularly made in such situations. The gravimeters are placed on platforms that minimize accelerations, and successive measurements must be averaged. But precision of 1–5 mGal is obtained in this fashion.
Figure 2. A diagram that shows how elevation corrections (Free Air and Bouguer) are used in gravity studies. The Free Air correction compensates for variations in the distance from the datum chosen for the procedure, which is usually sea level. The Bouguer correction compensates for the mass between the location of an individual gravity reading and the datum.
Observed Gravity Value

By reading a gravimeter at a base station and then at a particular location (usually called a gravity station; Fig. 2), we can convert the difference into an observed gravity value by first multiplying this difference by the calibration constant of the gravimeter. This converts the difference from instrument readings into mGal. Then, this difference is corrected for meter drift and earth tides (see later) and is added to the established gravity value at the base station to obtain the observed gravity value (Gobs) at the station. This process is to some degree analogous to converting electromagnetic sensor readings from a satellite to radiance values.

Corrections of Gravity Measurements

To arrive at geologically meaningful anomaly values, a series of ‘‘corrections’’ are made to raw observations of differences between gravity measured at a station and a base station. The use of this term is misleading because most of these ‘‘corrections’’ are really adjustments that compensate (at least approximately) for known variations in the gravity field that do not have geologic meaning.

Drift Correction

Gravimeters are simple and relatively stable instruments, but they do drift (i.e., the reading varies slightly with time). Because of the sensitivity of these instruments, they are affected by temperature variations, fatigue in the internal springs, and minor readjustments in their internal workings, and these factors are the primary cause of instrument drift. In addition, earth tides cause periodic variations in gravity that may be as large as ∼0.3 mGal during about 12 hours. In field operations, these factors cause small changes in gravity readings with time. The gravity effects of earth tides can be calculated (4), or they can be considered part of the drift. One deals with drift by making repeated gravity readings at designated stations, at time intervals that become shorter as the desired precision of the measurement increases. It is assumed that the drift is linear between repeated occupations of the designated stations, and during a period of a few hours, this is usually a valid assumption. The repeated values are used to construct a drift curve, which is used to estimate the drift for readings that were made at times between those of the repeated readings. Because one encounters so many different situations in real field operations, it is hard to generalize about the way one proceeds. However, the key concern is that no period of time should occur that is not spanned by a repeated observation. This is a way of saying that the drift curve must be continuous. If the meter is jarred, a tare (an instantaneous variation in reading) may occur. If one suspects that this has occurred, simply return to the last place a reading was made. If there is a significant difference, a tare has occurred, and a simple constant shift (the difference in readings) is made for all subsequent readings.
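The linear drift assumption can be applied as in the sketch below; the data layout (time, station, reading) is hypothetical, and a single pair of base-station occupations is assumed to bracket the loop.

```python
def drift_correct(readings, base_station):
    """Linear drift correction sketch (hypothetical data layout): `readings`
    is a list of (time_hr, station_id, reading_mgal) tuples, and the
    designated base station is read at both the start and end of the loop."""
    base = sorted((t, v) for t, s, v in readings if s == base_station)
    (t0, v0), (t1, v1) = base[0], base[-1]
    rate = (v1 - v0) / (t1 - t0)        # assumed-linear drift, mGal per hour

    # Remove the drift accumulated since the first base-station reading.
    return [(s, v - rate * (t - t0)) for t, s, v in readings]

# Example loop: the base station re-read 2.5 hours later shows +0.05 mGal drift.
loop = [(0.0, "BASE", 1234.10), (0.8, "ST1", 1250.42),
        (1.6, "ST2", 1241.07), (2.5, "BASE", 1234.15)]
corrected = drift_correct(loop, "BASE")
```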
Tidal Correction

As discussed earlier, variations in the gravity field due to the earth's tides can be calculated if one makes assumptions about the rigidity of the lithosphere (4). In fact, the rigidity of the lithosphere can be estimated by studying the earth's tides. If the effects of tides are calculated separately, the correction for this effect is called the tidal correction.

Latitude Correction

The international gravity formula predicts that gravity increases by about 5300 mGal from the equator to the poles. The rate of this increase varies slightly as a function of latitude (φ), but it is about 0.8 mGal/km. The preferable approach to correct for this effect is to tie repeat stations to the IGSN 71 gravity net (1,5). The IGSN 71 base stations are available in digital form (files whose prefix is dmanet93) in Dater et al. (6). Then, one can use the international gravity formula to calculate the expected value of gravity, which will vary with latitude. Thus, the first-level calculation of the gravity anomaly at the station (Ganomaly = Gobs − Gt) will have the adjustment for latitude built into the computation. For a local survey of the gravity field, one can derive the formula for the N–S gradient of the gravity field (1.3049 sin 2φ mGal/mile or 0.8108 sin 2φ mGal/km) by differentiating the international gravity formula with respect to latitude. Then, a base station is chosen, and all gravity readings are corrected for latitude by multiplying the distance by which a gravity station is north or south of this base station by this gradient. Stations located closer to the pole than the base station have higher readings just because of their geographic position; thus, the correction would be negative. The correction is positive for stations nearer the equator than the base station.
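For a local survey, the latitude correction described above might be coded as follows; the sign convention (subtract for stations north of the base in the northern hemisphere) is inferred from the text and should be checked against local practice.

```python
import math

def latitude_correction_mgal(km_north_of_base, base_latitude_deg):
    """Latitude correction for a local survey: the N-S gradient of the
    international gravity formula, about 0.8108*sin(2*phi) mGal/km, times the
    N-S offset from the base station. In the northern hemisphere a station
    north of the base reads higher, so its correction is negative."""
    gradient = 0.8108 * math.sin(math.radians(2.0 * base_latitude_deg))
    return -gradient * km_north_of_base

# A station 2 km north of a base station at latitude 35 degrees N.
print(latitude_correction_mgal(2.0, 35.0))   # roughly -1.5 mGal
```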
to the change of elevation, as if the stations were suspended in free air, not sitting on land. The vertical gradient of gravity is derived by differentiation with respect to Re. Higher order terms are usually dropped, yielding a gradient that is not a function of latitude or elevation (0.3086 mGal/m or 0.09406 mGal/ft). However, this approach does pose some problems (8). Once gravity values have been established and their locations are accurately determined, the Free Air correction can be calculated by choosing an elevation datum and simply applying the following equation:

Free Air Correction = FAC = 0.3086h, (4)
where h = (elevation − datum elevation). Then, the Free Air anomaly (FAA) is defined as

FAA = Gobs − Gt + FAC, (5)
where Gobs is the observed gravity corrected for drift and tides.

Bouguer Correction

The mass of material between the gravity station and the datum also causes a variation of gravity with elevation (Fig. 2). This mass effect makes gravity at higher stations higher than that at stations at lower elevations and thus partly offsets the Free Air effect. To calculate the effect of this mass, a model of the topography must be constructed, and its density must be estimated. The traditional approach is crude but has proven to be effective. In this approach, each station is assumed to sit on a slab of material that extends to infinity laterally and to the elevation datum vertically (Fig. 2). The formula for the gravitational attraction of this infinite slab is derived by employing a volume integral to calculate its mass. The resulting correction is named for the French geodesist Pierre Bouguer:

Bouguer Correction (BC) = 2πγρh, (6)
where γ is the universal gravitational constant (γ = 6.67 × 10−11 m3 kg−1 s−2), ρ is the density, and h = (elevation − datum elevation). As discussed later, the need to estimate density for the calculation of the Bouguer correction is a significant source of uncertainty in gravity studies. Then, the Bouguer anomaly (BA) is defined as

BA = Gobs − Gt + FAC − BC, (7)
where Gobs is the observed gravity corrected for drift and tides. If terrain corrections (see later) are not applied, the term simple Bouguer anomaly is used. If terrain corrections have been applied, the term complete Bouguer anomaly is used. A second-order correction to account for the curvature of the earth is often added to this calculation (9).
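For readers who script their own data reduction, the corrections above translate into a few lines of code. The following Python sketch is only an illustration (the function names are ours, not from any standard package); it uses the constants quoted in this section (the 0.8108 sin 2φ mGal/km latitude gradient, the 0.3086 mGal/m Free Air gradient, and the 2πγρh Bouguer slab) and assumes that the theoretical gravity Gt has already been evaluated from the international gravity formula and that Gobs has been corrected for drift and tides.

import math

MGAL_PER_MS2 = 1.0e5   # 1 m/s^2 = 100,000 mGal
GAMMA = 6.67e-11       # gravitational constant, m^3 kg^-1 s^-2

def latitude_correction(dist_north_km, latitude_deg):
    # Local N-S correction (mGal) for a station dist_north_km north (+) of the base station;
    # negative for stations nearer the pole than the base station, positive nearer the equator.
    gradient = 0.8108 * math.sin(2.0 * math.radians(latitude_deg))  # mGal/km
    return -gradient * dist_north_km

def free_air_correction(elev_m, datum_m=0.0):
    # Eq. (4): FAC in mGal for h = elevation - datum elevation (meters).
    return 0.3086 * (elev_m - datum_m)

def bouguer_correction(elev_m, datum_m=0.0, density=2670.0):
    # Eq. (6): infinite-slab correction in mGal; density in kg m^-3.
    return 2.0 * math.pi * GAMMA * density * (elev_m - datum_m) * MGAL_PER_MS2

def anomalies(g_obs, g_t, elev_m, datum_m=0.0, density=2670.0):
    # Eqs. (5) and (7): return (Free Air anomaly, simple Bouguer anomaly) in mGal.
    fac = free_air_correction(elev_m, datum_m)
    bc = bouguer_correction(elev_m, datum_m, density)
    return g_obs - g_t + fac, g_obs - g_t + fac - bc

With the standard density of 2670 kg m−3, the Bouguer slab term works out to roughly 0.112 mGal per meter of elevation above the datum.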
Terrain Correction

Nearby topography (hills and valleys) attracts the mass in the gravimeter (valleys are considered to have negative density with respect to the surrounding rocks) and reduces the observed value of gravity. The terrain correction is the calculated effect of this topography, and it is always positive (a hill pulls up on the mass in the gravimeter, and a valley is a mass deficiency). In mountainous regions, these corrections can be as large as tens of mGals. The corrections have traditionally been made by using Hammer charts (10) to estimate the topographic relief by dividing it into compartments. There have been a number of refinements to this approach as it has been increasingly computerized (8,11,12), but the basic idea has remained unchanged. The use of digital terrain models to calculate terrain corrections has led to a nomenclature of inner zone corrections (calculated by hand using Hammer charts) and outer zone corrections (calculated using terrain models). Unfortunately, the radius from the gravity reading that constitutes the divide between inner and outer zones has not been standardized, but this distance has typically been 1–5 km. However, the increasing availability of high-resolution digital terrain data is on the verge of revolutionizing the calculation of terrain corrections. Many new approaches are being developed, but the general goal is the same: to construct a detailed terrain model and calculate the gravitational effect of this terrain on individual gravity readings. These approaches can also be considered as having replaced the Bouguer slab approximation by a more exact calculation because the goals of the Bouguer and topographic corrections are to estimate the gravitational effect of the topography above the elevation datum to a large radius from the gravity station. This radius is commonly chosen to be 167 km.

Eötvös Correction

Technological advances have made it possible to measure gravity in moving vehicles such as boats and aircraft. However, the motion of the gravimeter causes variations in the centrifugal acceleration and thus in the measured gravity. This variation depends on the velocity ν of the gravimeter. The correction for this effect, named for the Hungarian geophysicist R. Eötvös, is positive when the gravimeter is moving eastward and negative when it is moving westward. Navigational data from the survey are used to calculate ν, and the equation for the correction is as follows:

Eötvös correction (mGal) (EC) = 7.503ν cos λ sin α + 0.004154ν², (8)

where ν is in knots, α is the heading with respect to true north, and λ is the latitude (3).
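Equation (8) is straightforward to apply once the navigational data have been merged with the gravity record. The short Python sketch below is only an illustration (the function name is ours, not from any published package) and evaluates the correction exactly as written above.

import math

def eotvos_correction(speed_knots, heading_deg, latitude_deg):
    # Eq. (8): EC in mGal; speed in knots, heading measured clockwise from true north,
    # latitude in degrees. Positive for eastward motion, negative for westward motion.
    alpha = math.radians(heading_deg)
    lam = math.radians(latitude_deg)
    return (7.503 * speed_knots * math.cos(lam) * math.sin(alpha)
            + 0.004154 * speed_knots ** 2)

For example, a ship steaming due east at 10 knots at latitude 40° requires a correction of about 7.503 × 10 × cos 40° + 0.42 ≈ 57.9 mGal.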
Isostatic Correction

Isostasy can be thought of as the process in the earth that causes the pressure at some depth (most studies place this depth at 30 to 100 km) to be approximately equal over most regions. If this pressure is equal, isostatic balance has been achieved, and we say that the area is compensated. Thus, we think that the excess mass represented by a mountain range is compensated for by a mass deficiency at depth. The tendency toward isostatic balance makes regional Bouguer gravity anomalies substantially negative over mountains and substantially positive over oceanic areas. These large-scale anomalies mask anomalies due to shallow (upper crustal) geologic features (13). The delineation of upper crustal features is often the goal of gravity studies. Thus, various techniques have been proposed to separate and map the effects of isostatic equilibrium. The isostatic corrections calculated by these techniques attempt to estimate the gravitational effects of the masses that compensate for topography and remove them from the Free Air or Bouguer anomaly values. A popular approach is calculation of the isostatic residual (14).

Wavelength and Fourier Analysis

Gravitational imaging usually involves digital processing based on Fourier analysis (3). Most texts on this subject deal with time series [one-dimensional, f(t)], so it is important to clarify the terminology used in the 2-D spatial domain [f(x, y)] of gravitational imaging. Thus, instead of dealing with period and frequency, we deal with wavelength λ in the spatial domain and wave number (k = 2π/λ) in the spatial frequency domain.

Regional Gravity Anomaly

Gravity anomalies whose wavelengths are long relative to the dimensions of the geologic objectives of a particular investigation are called regional anomalies. Because some shallow geologic features such as broad basins have large lateral dimensions, one has to be careful, but it is thought that regional anomalies usually reflect the effects of relatively deep features.

Local (Residual) Gravity Anomaly

Gravity anomalies whose wavelengths are similar to the dimensions of the geologic objectives of a particular investigation are called local anomalies. In processing gravity data, it is usually preferable to attempt to separate the regional and local anomalies before interpretation. A regional anomaly can be estimated by employing a variety of analytical techniques. The simple difference between the observed gravity anomalies and the interpreted regional anomaly is called the residual anomaly.

IMAGING CAPABILITIES AND LIMITATIONS

Availability of Gravity Data
An advantage of gravitational imaging is that a considerable amount of regional data is freely available from universities and governmental agencies throughout the world. However, the distribution of these data is often less organized than that of satellite imagery. As more detailed data are needed, commercial firms can provide this product in many areas of the world. Finally, relative to most geophysical techniques, the acquisition of
land gravity data is very cost-effective. The determination of the precise location of a gravity station is in fact more complicated than making the actual gravity measurement. Thus, a good approach to producing a database of gravity measurements for a project is to use a public domain data set such as the U.S. data provided by Dater et al. (6; http://www.ngdc.noaa.gov/seg/fliers/se-0703.shtml) and the Australian data set provided by the Australian Geological Survey Organization (http://www.agso.gov.au/geophysics/gravimetry/ngdpage.html). Then, if more detailed data are needed, one should contact a commercial geophysical firm via organizations such as the Society of Exploration Geophysicists and the European Association of Geoscientists and Engineers. Finally, field work to obtain new data in the area of a specific target may be required.

The Role of Density

Knowledge of the density of various rock units is essential in gravity studies for several reasons. In fact, a major limitation in the quantitative interpretation of gravity data is the need to estimate density values and to make simplifying assumptions about the distribution of density in the earth. The earth is complex, and the variations in density in the upper few kilometers of the crust are large. Thus, the use of a single average density in Bouguer and terrain corrections is a major source of uncertainty in calculating values for these corrections. This fact is often overlooked as we worry about making very precise measurements of gravity and then calculate anomaly values whose accuracy is limited by our lack of detailed information on density. A basic step in reducing gravity measurements to interpretable anomaly values is calculating the Bouguer correction, which requires an estimate of density. At any specific gravity station, one can think of the rock mass whose density we seek as a slab extending from the station to the elevation of the lowest gravity reading in the study area (Fig. 2). If the lowest station is above the datum (as is usually the case), each station shares a slab that extends from this lowest elevation down to the datum, so that this portion of the Bouguer correction is a constant shared by all of the stations (Fig. 2). No one density value is truly appropriate, but when using the traditional approach, it is necessary to use one value when calculating Bouguer anomaly values. When in doubt, the standard density value for upper crustal rocks is 2670 kg m−3. To make terrain corrections, a similar density estimate is needed. However, in this case, the value sought is the average density of the topography near a particular station. It is normal to use the same value as used in the Bouguer correction, but this need not necessarily be so for complex topography and geology. As mentioned in the discussion of the Bouguer correction, modern digital elevation data make it possible to construct realistic models of topography that include laterally varying density. Although preferable, this approach still requires estimating the density of the column of rock between the earth's surface and the reduction datum. From a traditional viewpoint, this
approach represents merging the Bouguer and terrain corrections and then applying them to Free Air anomaly values. One can also extend this approach to greater depths, vary the density laterally, and consider it a geologic model of the upper crust that attempts to predict Free Air anomaly values. Then, the Bouguer and terrain corrections become unnecessary because the topography simply becomes part of the geologic model that is being constructed. When one begins to construct computer models based on gravity anomalies, densities must be assigned to all of the geologic bodies that make up the model. Here, one needs to use all of the data at hand to come up with these density estimates. Geologic mapping, drill hole data, and measurements on samples from the field are examples of information one might use to estimate density.

Measurements of Density

Density can be measured (or estimated) in many ways. In general, in situ measurements are better because they produce average values for fairly large bodies of rock that are in place. Using laboratory measurements, one must always worry about the effects of porosity, temperature, saturating fluids, pressure, and small sample size as factors that might make the measured values unrepresentative of rock in place. Many tabulations of typical densities for various rock types have been compiled (15,16) and can be used as guides to estimate density. Thus, one can simply look up the density value for a particular rock type (Table 1). Samples can be collected during field work and brought back to the laboratory for measurement. The density of cores and cuttings available from wells in the region of interest can also be measured. Most wells that have been drilled while exploring for petroleum, minerals, and water are surveyed by down-hole geophysical logging techniques, and these geophysical logs are a good source of density values.
Table 1. Typical Densities of Common Types of Rock(a)

Type of rock                   Density (kg m−3)
Volcanic ash                   1800
Salt                           2000
Unconsolidated sediments       2100
Clastic sedimentary rocks      2500
Limestone                      2600
Dolomite                       2800
Granite                        2650
Rhyolite                       2500
Anorthosite                    2750
Syenite                        2750
Gabbro                         2900
Eclogite                       3400
Crystalline upper crust        2750
Lower crust                    3000
Upper mantle                   3350

(a) The effects of porosity, temperature, saturating fluids, and pressure cause variations in these values of at least ±100 kg m−3.
Density logs are often available and can be used directly to estimate the density of rock units encountered in the subsurface. However, in many areas, sonic logs (seismic velocity) are more common than density logs. In these areas, the Nafe–Drake or a similar relationship between seismic velocity and density (17) can be used to estimate density values. The borehole gravity meter is an excellent (but rare) source of density data. This approach is ideal because it infers density from down-hole measurements of gravity. Thus, these measurements are in situ averages based on a sizable volume of rock, not just a small sample. The Nettleton technique (18) involves picking a place where the geology is simple and measuring gravity across a topographic feature. Then, one calculates the Bouguer gravity anomaly profile using a series of density values. If the geology is truly simple, the gravity profile will be flat when the right density value is used in the Bouguer and terrain corrections. One can also use a group of gravity readings in an area and simply find the density value at which the correlation between topography and Bouguer anomaly values disappears.
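The correlation form of the Nettleton approach described above is easy to prototype. The sketch below is a minimal, hypothetical implementation (the names are ours, not from any published package): for each trial density it converts Free Air anomalies to simple Bouguer anomalies and keeps the density whose anomaly profile is least correlated with the topography.

import numpy as np

SLAB = 2.0 * np.pi * 6.67e-11 * 1.0e5   # Bouguer slab factor, mGal per (m * kg m^-3)

def nettleton_density(free_air_anomaly, elevation, trial_densities):
    # free_air_anomaly in mGal and elevation in meters (relative to the reduction datum)
    # for a group of nearby stations; returns the trial density (kg m^-3) whose simple
    # Bouguer anomaly shows the weakest correlation with topography.
    best_rho, best_corr = None, np.inf
    for rho in trial_densities:
        bouguer = free_air_anomaly - SLAB * rho * elevation
        corr = abs(np.corrcoef(bouguer, elevation)[0, 1])
        if corr < best_corr:
            best_rho, best_corr = rho, corr
    return best_rho

# Example: nettleton_density(faa, elev, np.arange(2000.0, 3050.0, 50.0))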
Construction and Enhancement of Gravitational Images

The techniques used to separate regional and local gravity anomalies take many forms and can all be considered as filtering in a general sense (3). Many of these techniques are the same as those employed in enhancing traditional remote sensing imagery. The process usually begins with a data set consisting of Free Air or, more likely, Bouguer anomaly values, and the first step is to produce an anomaly map such as that shown in Figure 3.

Gridding

The initial step in processing gravity data is creating a regular grid from the irregularly spaced data points. This step is required even in creating a simple contour map, and in general purpose software it may not receive the careful attention it deserves; all subsequent results depend on the fidelity of this grid as a representation of the actual data. On land, gravity data tend to be very irregularly spaced and have areas of dense data and areas of sparse data. This irregularity is often due to topography; mountainous areas are generally more difficult to enter than valleys and plains. It may also be due to difficulty in gaining access to private property and sensitive areas. Measurements of marine data are dense along the tracks that the ships follow but have relatively large gaps between tracks. Airborne and satellite gravity measurements involve complex processing that is beyond the scope of this discussion. However, once these data are processed, the remainder of the analysis is similar to that of land and marine data. A number of software packages have been designed for processing gravity (and magnetic) data, and several gridding techniques are available in these packages. The minimum curvature technique (19) works well and is illustrative of the desire to respect individual data points as much as possible while realizing that gravitational images have an inherent smoothness due to the behavior of the earth's gravity field. In this technique, the data points that surround a particular grid node are selected. A surface that satisfies the criterion of minimum curvature is fitted to these data, and then the value of this surface at the node is determined. One can intuitively conclude that the proper grid interval is approximately the mean spacing between readings in an area. A good gridding routine should respect individual gravity values and not produce spurious values in areas of sparse data. Once the gridding is complete, the grid interval (usually 100s of meters) can be thought of as being analogous to the pixel interval in remote sensing imagery.
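Minimum curvature gridding itself is usually left to specialized geophysical packages, but the basic gridding step can be sketched with general purpose tools. The fragment below is only an illustration and substitutes SciPy's griddata interpolation for a true minimum curvature algorithm; it grids irregular station values at a spacing comparable to the mean station separation.

import numpy as np
from scipy.interpolate import griddata

def grid_gravity(x, y, values, spacing):
    # Interpolate irregularly spaced anomaly values onto a regular grid.
    # spacing should be roughly the mean distance between stations; locations
    # outside the convex hull of the data are returned as NaN.
    xi = np.arange(x.min(), x.max() + spacing, spacing)
    yi = np.arange(y.min(), y.max() + spacing, spacing)
    grid_x, grid_y = np.meshgrid(xi, yi)
    grid = griddata((x, y), values, (grid_x, grid_y), method="cubic")
    return grid_x, grid_y, grid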
Figure 3. Bouguer gravity anomaly image of a portion of central Colorado. The colors produce an image in which the lowest anomalies are violet and the highest ones are red. As in this case, contour lines are often superimposed on the colors to provide precise anomaly values. The red dashed line is the gravity profile modeled in Fig. 6. See color insert.

Filtering
The term filtering can be applied to any of the various techniques (quantitative or qualitative) that attempt to separate anomalies on the basis of their wavelength and/or trend (3) and even on the basis of their geologic origin such as isostatic adjustment [i.e., isostatic residual anomaly; (14)]. The term separate is a good intuitive one because the idea is to construct an image (anomaly map) and then use filtering to separate anomalies of interest to the interpreter from other interfering anomalies (see regional versus local anomalies earlier).
Figure 4. Gravity anomaly image formed by applying a 10–150 km (wavelength) band-pass filter to the values shown in Fig. 3. See color insert.
In fact, fitting a low-order polynomial surface (third order is often used) to a grid to approximate the regional anomaly is a common practice. Subtracting the values that represent this surface from the original grid values creates a residual grid that represents the local anomalies. In gravity studies, the familiar concepts of high-pass, low-pass, and band-pass filters are applied in either the frequency or spatial domain. In Figs. 4 and 5, for example, successively longer wavelengths have been removed from the Bouguer anomaly map shown in Fig. 3. At least to some extent, these maps enhance anomalies due to features in the upper crust at the expense of anomalies due to deep-seated features. Directional filters are also used to select anomalies on the basis of their trends. In addition, a number of specialized techniques developed to enhance images of gravity data based on the physics of gravity fields are discussed later. The various approaches to filtering can be sophisticated mathematically, but the choice of filter parameters or design of the convolution operator always involves a degree of subjectivity. It is useful to remember that the basic steps in enhancing an image of gravity anomalies to emphasize features in the earth's crust are as follows: (1) First, remove a conservative regional trend from the data. The choice of a regional trend is usually not critical but may greatly help in interpretations (14). The goal is to remove long wavelength anomalies, so this step consists of applying a broad high-pass filter.
Figure 5. Gravity anomaly image formed by applying a 10–75 km (wavelength) band-pass filter to the values shown in Fig. 3. See color insert.
Over most continental areas, Bouguer anomaly values are large negative numbers; thus, the usual practice of padding the edges of a grid with zeros before applying a Fourier transform and filtering will create large edge effects. One way to avoid this effect is first to remove the mean of the data and to grid an area larger than the image to be displayed. However, in areas where large regional anomalies are present, it may be best to fit a low-order polynomial surface to the gridded values and then continue the processing using the residual values with respect to this surface. (2) Then, one can apply additional filters, as needed, to remove unwanted wavelengths or trends.

In addition to the usual wavelength filters, potential theory (3) has been used to derive a variety of specialized filters; a short computational sketch of two of them follows this list.

Upward continuation. A process (low-pass filter) through which a map is constructed simulating the result as if the survey had been conducted on a plane at a higher elevation. This process is based on the physical fact that the farther the observation is from the body that causes the anomaly, the broader the anomaly. It is mathematically stable because it involves extracting long-wavelength from short-wavelength anomalies.

Downward continuation. A process (high-pass filter) through which a map is constructed simulating the result as if the survey had been conducted on a plane at a lower elevation (and nearer the sources). In theory, this process enhances anomalies due to relatively shallow sources. However, care should be taken when applying this process to anything but very clean, densely sampled data sets because of the potential for amplifying noise due to mathematical instability.

Vertical derivatives. In this technique, the vertical rate of change of the gravity field is estimated (usually the first or second derivative). This is a specialized high-pass filter, but the units of the resulting image are not milligals, and the image cannot be modeled without special manipulations of the modeling software. As in downward continuation, care should be taken when applying this process to anything but very clean data sets because of the potential for amplifying noise. This process has some similarities to the nondirectional edge-enhancement techniques used to analyze remote sensing images.

Strike filtering. This technique is directly analogous to the directional filters used to analyze remote sensing images. In gravity processing, the goal is to remove the effects of some linear trend along a particular azimuth. For example, in much of the central United States, the ancient processes that formed the earth's crust created a northeast-trending structural fabric that is reflected in gravity maps of the area and can obscure other anomalies. Thus, one might want to apply a strike-reject filter that deletes linear anomalies whose trends (azimuths) range from N30°E to N60°E.

Horizontal gradients. In this technique, faults and other abrupt geologic discontinuities (edges) are detected from the high horizontal gradients that they produce. Simple difference equations are usually employed to calculate the gradients along the rows and columns of the grid. A linear maximum in the gradient is interpreted as a discontinuity such as a fault. These features are easy to extract graphically for use as an overlay on the original gravity image or on products such as Landsat images.
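Several of the filters just described reduce to simple operations on the wavenumber spectrum of the grid. The following sketch is only an illustration (not production code and not any particular commercial package): it removes the grid mean to limit edge effects, performs upward continuation by multiplying the spectrum by exp(−|k|Δz), and estimates a horizontal gradient magnitude for edge detection; tapering and padding, which real processing systems handle more carefully, are omitted.

import numpy as np

def upward_continue(grid, dx, dy, dz):
    # Upward continuation (a low-pass filter) of a regularly gridded anomaly (mGal).
    # dx, dy are the grid spacings and dz the continuation height, all in meters.
    mean = np.nanmean(grid)
    data = np.nan_to_num(grid - mean)
    ny, nx = data.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)      # angular wavenumbers, rad/m
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dy)
    k = np.sqrt(kx[np.newaxis, :] ** 2 + ky[:, np.newaxis] ** 2)
    spectrum = np.fft.fft2(data) * np.exp(-k * dz)   # attenuate short wavelengths
    return np.real(np.fft.ifft2(spectrum)) + mean

def horizontal_gradient_magnitude(grid, dx, dy):
    # Horizontal gradient magnitude (mGal/m); linear maxima suggest edges such as faults.
    dgdy, dgdx = np.gradient(grid, dy, dx)
    return np.sqrt(dgdx ** 2 + dgdy ** 2)

A band-pass such as the 10–150 km filter used for Fig. 4 can be built in the same way by zeroing the spectral components whose wavelength 2π/|k| lies outside the desired band.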
Figure 6. Computer model for the profile of gravity values shown in Fig. 3. The upper panel compares observed and calculated Bouguer anomaly values (mGal) along the profile; the lower panel shows the model beneath the South Park Basin as depth (km) versus distance along the profile: basin fill (2300 kg m−3), crystalline basement (2750 kg m−3), and a deep crustal feature (3050 kg m−3). Density values for the elements of the model are given in kg m−3.

Computer Modeling

Image processing and qualitative interpretation in most applications of gravitational imaging are followed by quantitative interpretation in which a profile (or grid) of anomaly values is modeled by constructing an earth model whose calculated gravitational effect closely approximates the observed profile (or grid). Modeling profiles of gravity anomalies has become commonplace and should be
considered a routine part of any subsurface investigation. For example, a model for a profile across Fig. 3 is shown in Fig. 6. In its simplest form, the process of constructing an earth model is one of trial-and-error iteration in which one's knowledge of the local geology, data from drill holes, and other data such as seismic surveys are valuable constraints on the process. As the modeling proceeds, one must make choices about the density and geometry of the rock bodies that make up the model. In the absence of any constraints (which is rare), the process is subject to considerable ambiguity because many subsurface structural configurations can fit the observed data. Using some constraints, one can usually feel that the process has yielded a very useful interpretation of the subsurface. However, ambiguities will always remain, just as they do in all other geophysical techniques aimed at studying subsurface structure. Countless published articles document a wide variety of mathematical approaches to computer modeling of gravity anomalies (3). However, a very flexible and easy approach is used almost universally for the two-dimensional case (i.e., modeling profiles drawn perpendicular to the structural grain in the area of interest). This technique is based on the work of Hubbert (20), Talwani et al. (21), and Cady (22), although many groups have written their own versions of this software that have increasingly effective graphical interfaces and output. The original computer program was published by Talwani et al. (21), and Cady (22) was among the first to introduce an approximation (called 2 1/2-D) that allows a certain degree of three dimensionality. In the original formulation by Hubbert (20), the earth model was composed of bodies of polygonal cross section that extended to infinity in and out of the plane of the profile of gravity readings. In the 2 1/2-D formulation, the bodies can be assigned finite strike
lengths in both directions. Today, anyone can have a 2 1/2-D model running on his or her PC. The use of three-dimensional approaches is not as common as it should be because of the complexity of constructing and manipulating the earth model. However, there are many 3-D approaches available (3). As discussed earlier, full 3-D calculation of the gravitational attraction of the topography using a modern digital terrain model is the ultimate way to calculate Bouguer and terrain corrections and to construct earth models. This type of approach will be employed more often in the future as terrain data and the computer software needed become more readily available. Gravity modeling is an ideal field in which to apply formal inverse techniques. This is a fairly complex subject mathematically. However, the idea is to let the computer automatically make the changes in a starting earth model that the interpreter constructs. Thus, the interpreter is saved from tedious ‘‘tweaking’’ of the model to make the observed and calculated values match. In addition, the thinking is that the computer will be unbiased compared to a human. The process can also give some formal estimates of the uncertainties in the interpretation. Inverse modeling packages are readily available and can also run on PCs. A free source of programs for modeling gravity anomalies by a variety of PC-based techniques is at http://crustal.usgs.gov/crustal/geophysics/index.html.
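As a toy illustration of trial-and-error forward modeling, and of what an inverse scheme automates, the sketch below fits a single buried sphere to a residual anomaly profile by brute-force search. This is not the Talwani et al. polygon algorithm used by the programs cited above; it simply uses the closed-form anomaly of a sphere, which is often adequate for a first reconnaissance estimate of source depth and size.

import numpy as np

G = 6.67e-11    # gravitational constant, m^3 kg^-1 s^-2
MGAL = 1.0e5    # mGal per m/s^2

def sphere_anomaly(x, depth, radius, density_contrast):
    # Vertical gravity anomaly (mGal) of a buried sphere along a profile x (m),
    # centered at x = 0 and at the given depth (m); density contrast in kg m^-3.
    mass = (4.0 / 3.0) * np.pi * radius ** 3 * density_contrast
    return G * mass * depth / (x ** 2 + depth ** 2) ** 1.5 * MGAL

def fit_sphere(x, observed, depths, radii, density_contrast):
    # Grid search for the (depth, radius) whose sphere anomaly best matches the
    # observed residual profile (mGal) in a least-squares sense.
    best = (None, None, np.inf)
    for z in depths:
        for r in radii:
            misfit = np.sum((observed - sphere_anomaly(x, z, r, density_contrast)) ** 2)
            if misfit < best[2]:
                best = (z, r, misfit)
    return best

In practice, the sphere is replaced by 2 1/2-D polygonal bodies whose geometry the interpreter (or the inversion) adjusts, but the logic of computing a forward response and reducing the misfit to the observations is the same.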
APPLICATIONS

As discussed earlier, gravity data are widely available and relatively straightforward to gather, process, and interpret. A particularly nice aspect of the gravity technique is that the instrumentation and interpretative approaches employed are mostly independent of the scale of the investigation. Thus, gravitational imaging can be employed in a wide variety of applications. Very small changes in gravity are even being studied as indicators of the movement of fluid in the subsurface. In addition, images of gravity anomalies are ideal candidates for layers in a Geographic Information System (GIS), and typical image processing software provides a number of techniques to merge gravity anomaly data with data sets such as Landsat images. For example, Fig. 7 was constructed by merging Landsat and Bouguer gravity anomaly images for a portion of the Sangre de Cristo Mountains region in northern New Mexico. The merged image shows that low gravity anomaly values are found under a portion of the range in which dense rocks are exposed at the surface in a structural high, a situation that should produce a gravity high. This contradiction poses a major geologic and geophysical challenge for efforts to understand the tectonic evolution of this portion of the Rocky Mountains. The regional geophysics section of the Geological Survey of Canada and the western and central regions of the U.S. Geological Survey maintain the following web sites that include case histories demonstrating applications of gravitational imaging, data sets, and free software:

http://gdcinfo.agg.emr.ca/toc.html?/app/bathgrav/introduction.html
http://wrgis.wr.usgs.gov/docs/gump/gump.html
http://crustal.usgs.gov/crustal/geophysics/index.html
Figure 7. Merged Landsat and Bouguer anomaly images of a portion of the Sangre de Cristo Mountains in northern New Mexico. The most negative values are less than −255 mGal and are colored purple. See color insert.
One example of the application of gravitational imaging is the use of gravity anomalies to delineate the geometry and lateral extent of basins that contain groundwater. The sedimentary rocks that fill a basin have low densities and thus produce a negative gravity anomaly that is most intense where the basin is deepest (Fig. 6). Gravity modeling is used to provide quantitative estimates of the depth of the basin and thus the extent of the potential water resource. In the search for hydrocarbons, gravity data are used in conjunction with other geophysical data to map out the extent and geometry of subsurface structures that form traps for migrating fluids. For example, salt is a low-density substance whose tendency to flow often creates traps. Regions where salt is thick thus represent mass deficiencies and are associated with negative gravity anomalies. Gravity data have been used to detect and delineate salt bodies from the very early days of geophysical exploration to the present. A fault is another example of a structure that often acts as a trap for hydrocarbons, and these structures can be located by the gravity gradients that they produce. Gravity data can often delineate ore bodies in the exploration for mineral resources because these bodies often have densities different from the surrounding rock. Images of gravity anomalies also reveal structures and trends that may control the location of ore bodies, even when the bodies themselves produce little or no gravity anomaly. Studies of geologic hazards often rely on images of gravity anomalies to detect faults and evaluate their sizes. Gravity anomalies can also be used to help study the internal plumbing of volcanoes. High precision surveys can even detect the movement of magma in some cases.
ABBREVIATIONS AND ACRONYMS

BA    Bouguer anomaly
BC    Bouguer correction
EC    Eötvös correction
FAA   Free Air anomaly
FAC   Free Air correction
GIS   Geographic Information System
GPS   Global Positioning System
PC    Personal Computer
BIBLIOGRAPHY

1. G. P. Woollard, Geophysics 44, 1352–1366 (1979).
2. National Imagery and Mapping Agency, Department of Defense World Geodetic System 1984: Its Definition and Relationship with Local Geodetic Systems, NIMA TR8350.2, 3rd ed., 4 July 1997, Bethesda, MD. Also available for download from http://164.214.2.59/GandG/pubs.html
3. R. J. Blakely, Potential Theory in Gravity and Magnetic Applications, Cambridge University Press, Cambridge, 1996.
4. I. M. Longman, J. Geophys. Res. 64, 2351–2355 (1959).
5. C. Morelli, ed., The International Gravity Standardization Net, 1971, International Association of Geodesy Special Publication 4, Paris, 1974.
6. D. Dater, D. Metzger, and A. Hittelman, compilers, Land and Marine Gravity CD-ROMs: Gravity 1999 Edition on 2 CD-ROMs, U.S. Department of Commerce, National Oceanic and Atmospheric Administration, National Geophysical Data Center, Boulder, CO, 1999. Web site http://www.ngdc.noaa.gov/seg/fliers/se-0703.shtml
7. W. A. Heiskanen and H. Moritz, Physical Geodesy, W. H. Freeman, New York, 1967.
8. T. R. LaFehr, Geophysics 56, 1170–1178 (1991).
9. T. R. LaFehr, Geophysics 56, 1179–1184 (1991).
10. S. Hammer, Geophysics 4, 184–194 (1939).
11. L. J. Barrows and J. D. Fett, Geophysics 56, 1061–1063 (1991).
12. D. Plouff, Preliminary Documentation for a Fortran Program to Compute Gravity Terrain Corrections Based on Topography Digitized on a Geographic Grid, U.S. Geological Survey Open-File Report 77-535, Menlo Park, CA, 1977.
13. T. D. Bechtel, D. W. Forsyth, and C. J. Swain, Geophys. J. R. Astron. Soc. 90, 445–465 (1987).
14. R. W. Simpson, R. C. Jachens, R. J. Blakely, and R. W. Saltus, J. Geophys. Res. 91, 8348–8372 (1986).
15. W. M. Telford, L. P. Geldart, and R. E. Sheriff, Applied Geophysics, 2nd ed., Cambridge University Press, Cambridge, 1990, pp. 6–61.
16. R. S. Carmichael, ed., CRC Handbook of Physical Properties of Rocks, vol. III, CRC Press, Boca Raton, FL, 1984.
17. P. J. Barton, Geophys. J. R. Astron. Soc. 87, 195–208 (1986).
18. L. L. Nettleton, Geophysics 4, 176–183 (1939).
19. I. C. Briggs, Geophysics 39, 39–48 (1974).
20. M. K. Hubbert, Geophysics 13, 215–225 (1948).
21. M. Talwani, J. L. Worzel, and M. Landisman, J. Geophys. Res. 64, 49–59 (1959).
22. W. J. Cady, Geophysics 45, 1507–1512 (1980).
23. J. Milsom, Field Geophysics: Geological Society of London Handbook, Halsted Press, New York, 1989.
24. E. S. Robinson and C. Coruh, Basic Exploration Geophysics, John Wiley, New York, 1988.
25. A. E. Mussett and M. A. Khan, Looking into the Earth, Cambridge University Press, Cambridge, 2000.

GRAVURE MULTI-COPY PRINTING

BARRY LEE
Rochester Institute of Technology, School of Printing Management and Sciences, Rochester, NY
INTRODUCTION

Gravure, also known as rotogravure, is an intaglio (from the Italian word "intagliare," meaning to engrave) printing process. The intaglio printing processes are characterized by printing plates (image carriers) whose images have been etched or engraved into a hard surface. To print from an intaglio image carrier, the recessed images must be flooded, or filled with ink, and the surface of the image carrier must be cleared of ink, usually by a metal wiping blade known as a "doctor blade." Then, the paper or other substrate is pressed against the intaglio image carrier,
and the resulting contact between substrate and ink-filled image areas causes the ink to transfer from the image carrier to the substrate. Intaglio images are engraved or etched into a hard, flat surface and then filled with ink. The ink used for intaglio printing often has a high viscosity, the same consistency as paste. The nonimage surface of the intaglio image carrier is then cleared of all excess ink by the wiping action of a thick metal doctor blade.

Gravure printing has evolved from early intaglio printing processes and has been adapted to a rotary printing process capable of higher resolution, higher speeds, and greater production capacity than the traditional intaglio processes. The primary differences between gravure and printing from a conventional intaglio plate involve the type of ink used for each process and the variation of the image carrier. The gravure printing process uses a much more fluid, low-viscosity ink. This ink has been adapted to dry quickly on a variety of substrates. To contain and control this fluid ink, all images on a gravure image carrier are etched or engraved as a series of tiny cells (Fig. 1). Typically, the widths of the individual cells that make up a gravure image are between 30 and 300 microns, depending on the type of image being reproduced. On a single gravure image cylinder, there may be type matter, halftone graphics, and solid block area images, each composed of hundreds of thousands of cells. In the gravure press, these cells are filled with ink, and the nonimage surface areas of the gravure cylinder are cleared of ink by a thin metal doctor blade. The low viscosity of the gravure ink also allows for easy cell filling in a rotary process.

HISTORICAL DEVELOPMENT OF THE GRAVURE PROCESS

The history of intaglio printing begins in Germany in the year 1446 A.D., when the first engraved plates were used for printing playing cards and other illustrations. These intaglio images were hand engraved into the surface of wood and later copper. In the mid-fifteenth century, as
Figure 1. The five essential elements are impression roller, web & controls, gravure cylinder, doctor blade, and ink fountain (courtesy of GAA).
Johannes Gutenberg was developing the relief process of letterpress, intaglio was being developed as a method of applying illustrations other than type, or textual matter. The invention of chemical etching in 1505 made intaglio plate imaging much easier and also improved the quality of the images to be engraved. Copper plates were covered with an acid-resistant coating, and the intaglio-imaging artist would simply scrape the intended images into the resist coating, exposing the copper in the image areas. Eventually, for many applications, the high-quality images printed by intaglio were combined with the text pages of books printed by the letterpress process. The most famous example of this technique is the French Encyclopédie, published from 1751 to 1755 by Denis Diderot. The Encyclopédie included several volumes of text and several volumes of intaglio illustrations (1).

In 1783, Thomas Bell, a British textile printer, was granted the first patent for a rotary intaglio press. This press, the first of its kind, marked the beginning of automated intaglio printing and the evolution to the rotogravure process. Before the press patented by Bell, intaglio printing was a much more manual printing process. By contrast, the first rotary intaglio press allowed for a continuous, nonstop sequence of all of the imaging functions necessary for intaglio printing: (1) image areas are filled with ink, (2) a doctor blade cleans nonimage areas, and (3) the image is transferred to the substrate. Interestingly, and as evidence of the simplicity of the gravure process, this early press contained all of the components that are on gravure presses built today.

Once the press had become automated, it remained for cylinder imaging techniques to improve. After all, scraping images by hand into an etch-resistant coating layer and then etching copper with acid does not lend itself to fast turnaround, high-quality imaging. "In 1826 Joseph-Nicéphore Niépce produced the first photo-mechanical etched printing plate. He covered a zinc plate with a light sensitive bitumen and exposed through a copper engraving, which was made translucent by the application of oil" (1). The unexposed coating on the zinc was developed, and then the plate image areas were etched. The invention of photography in the late 1820s led to further improvements in the photomechanical image transfer process.

The photomechanical intaglio imaging technique was revised and improved by William Henry Fox Talbot and the English engraver J. W. Swan. Fox Talbot was responsible for two important discoveries that led directly to improved gravure imaging techniques: the halftone screening process and the light sensitivity of chrome colloids. The halftone screening process allowed printers to reproduce continuous tone images by converting those images to halftone dot patterns. In the 1860s, Swan, using the light sensitivity of chrome colloids, became the first to use carbon tissue as a resist for gravure etching. Carbon tissue ". . . was a gelatin resist coating on a light-sensitive material applied to the surface of the paper. After exposure the paper could be removed and the exposed coating applied to another surface, such as a metal plate — or plate cylinder" (2). After application to the gravure cylinder, the exposed coating became the "stencil" for etching by iron perchloride solutions.
In 1860, French publisher Auguste Godchaux patented the first reel-fed gravure perfector press for printing on a paper web. Although the cylinders for this press were hand engraved, the patent and the 1871 English trade paper, The Lithographer, indicated another departure from traditional intaglio toward a gravure image: "The engraving of these copper cylinders. . . is not lined by the engraver. . . nor etched with acid. . . but is accomplished by a series of minute holes hardly discernable by the naked eye, but forming together the outline of the letters. . . to be printed" (1). The Godchaux press design, in combination with the cellular (minute holes) structure of the images on the gravure cylinder, represents the first application of what is known today as gravure printing.

It is difficult to determine who can be credited with developing that most recognizable characteristic of the gravure image carrier, the gravure crosshatch screen. "Using dichromate gelatin as the sensitive coating on a copper or steel plate, Fox Talbot placed objects such as leaves and pieces of lace on the coating and exposed it to daylight" (3). No doubt the lace pattern would have broken the images into small "cells." Another early application of the gravure screen pattern may have been developed in France. A French patent in 1857 by M. Bechtold describes an opaque glass plate inscribed with fine lines. This plate was exposed to the metal plate and then turned 90° and exposed again. Finally, the cylinder (or plate) was exposed through a diapositive. This technique produces a crosshatch pattern on gravure plates, which would provide a noncell or land area on gravure images.

Karl Klietsch, also called Karel Klic, was among the first to realize and take advantage of gravure technology for production. Klic was a pioneer of the gravure printing process and often served as a high-priced consultant to those interested in his printing techniques. In March of 1886, Klic left his home in Vienna for England, where he met Samuel Fawcett, a gravure cylinder engraver in the textile decorating business. Fawcett had a number of years of experience in engraving gravure cylinders; he had also spent years refining photographic imaging and engraving technology. Klic and Fawcett combined all of the components of the Thomas Bell textile press (which by this time had no doubt been refined) with the latest cylinder imaging techniques that had been patented by Fawcett. Together, they founded the Rembrandt Intaglio Printing Company in 1895.

The Rembrandt Intaglio Printing Company was the most respected and progressive gravure printer of the era. The printing and cylinder imaging techniques were closely guarded secrets. In 1895, beautiful Rembrandt prints were sold in London art shops, where they created quite a stir. Londoners were curious to know more about the process capable of producing high quality prints that sold at such low prices. Klic and the Rembrandt Intaglio Printing Company continued their policy of secrecy. Although the Rembrandt Company was printing from cylinders, representatives of the company always referred to the prints as coming from plates, yet another attempt to keep the true nature of their gravure printing techniques secret.
Inevitably, the closely guarded secrets of the Rembrandt Company became known to the outside printing world. In the early 1900s, an employee of the Rembrandt Company moved to the United States and brought the secrets of the Rembrandt Intaglio Printing Company with him. The techniques used for high quality gravure printing may have first come to the United States with the Rembrandt employee, but the first gravure equipment was manufactured in England and installed at the Van Dyke Company in New York in 1903. Soon thereafter, the Englishmen Hermann Horn and Harry Lythgoe installed another gravure press in Philadelphia. Horn and Lythgoe had been experimenting with gravure printing with fellow Englishman Theodore Reich. Horn sent samples of their gravure prints to art dealers in America. One dealer in particular was so impressed with the prints that he invited Horn and Lythgoe to come to the United States with their gravure printing equipment. In 1904, Hermann Horn and Harry Lythgoe moved to Philadelphia and brought with them from England a small two-color gravure press.

In 1913, the New York Times became the first newspaper to use gravure for printing Sunday's The New York Times Magazine. Today, Sunday supplements like Parade and USA Weekend remain a stronghold of gravure printing. In the 1920s, the gravure printing method was first used to apply graphics on packaging.

As the gravure printing process continued to grow in Europe and the United States, chemical cylinder imaging techniques improved. Tone variation was accomplished on the original gravure cylinders by a diffusion transfer method known as conventional gravure (Fig. 2). This method reproduced various tones by varying only the depths of etch of each cell. Later, the two-positive method of cylinder etching made it possible to vary both the depths of etch and the sizes of the cells. This technique provides an extended tone range because the two-positive cylinders deposit a variable-size dot of variable ink film thickness. The only chemical etching methods still in use are called direct transfer systems. Direct transfer chemical etching is used for only a small percentage of the gravure cylinders produced today. The direct transfer method produces variable width cells.

To a great extent, electromechanical cylinder imaging technology has replaced chemical imaging methods. Early in the 1950s, the German company Hell began developing an electromechanical method of engraving copper cylinders with a diamond stylus. The Hell machine, first called the Scan-a-Graver and later the Helio-Klischograph, was the forerunner of modern graphic arts scanners. On an electromechanical engraver, the gravure cylinder is mounted in a precision lathe equipped with an engraving head, including the engraving stylus. On early model engravers, the lathe also carried a "copy drum" on which the continuous tone black-and-white copy to be reproduced was mounted. The Helio-Klischograph converts light energy reflected from the black-and-white copy to voltage. The resulting voltage is amplified and passed to an electromagnet that in turn vibrates a diamond-tipped stylus. The force of the diamond stylus
is controlled electronically to cut cells of various size and depth, depending on the copy requirements. Modern engravers accept all forms of digital input. Presently, the majority of gravure cylinders are imaged by electromechanical diamond engraving technology; however, many recent developments may gain acceptance in the future. Some of the new developments include laser engraving (used in metals other than copper), electron beam engraving in copper, and the use of polymer materials to replace the copper image cylinder.

Figure 2. Conventional gravure cylinder making using carbon tissue. Image is transferred directly to cylinder surface and etched into it (courtesy of GAA).

Gravure Printing Components

A gravure printing press includes the following components (Fig. 1):

• A gravure cylinder
• An ink fountain and, on some presses, an ink applicator
• A doctor blade, a doctor blade holder, and a doctor blade oscillating system
• An impression roll
• A dryer
• Web handling equipment and tension systems

Gravure Cylinders
Sheet-fed gravure presses print from engraved copper plates, and web-fed gravure printing is done from a cylinder. In most gravure applications, the gravure printing unit can handle cylinders of varying circumference. This feature of gravure printing allows an infinitely variable repeat length in printing. There are two types of gravure cylinders, sleeve (or mandrel) cylinders and integral shaft cylinders. Sleeve cylinders are stored as shaftless hollow tubes and must be mounted on shafts before going to press, engraving, or copper plating. Although cheaper, lighter, and easier to handle than their counterpart, the integral shaft cylinders, the sleeve cylinders are not as accurate. Consequently, sleeve cylinders are used predominantly in packaging gravure and are more common on presses 40 in. or less in web width. Some gravure presses are designed to print from light-weight copper-plated phenolic resin sleeves. Integral cylinders are used on larger presses and when high quality printing is important. The integral cylinder is heavier and more expensive than the sleeve cylinder; however, the integral cylinder is much more accurate. All publication gravure presses use integral cylinders. Both types of cylinders are reusable and continue through the cylinder life cycle: electroplating and surface finishing, imaging, chrome plating, proofing and printing, and stripping of old images.

Copper is by far the most common material used for gravure cylinder imaging and gravure printing. In the manufacture of gravure cylinders, after the steel cylinder bodies are formed, a thin (0.0002–0.0004 in.) base coat of copper is plated onto the steel cylinder body. This coat of copper is permanent and remains on the cylinder. In preparation for cylinder imaging, a face coat of copper is electroplated onto the base coat. The thickness of the face coat of copper varies from 0.004–0.050 in., depending on the requirements of the print job. Copper plating tanks use acid-based electrolytes that consist of copper sulfate, sulfuric acid, deionized water, and additives designed to affect the surface and hardness of the copper. The anode is made of copper that is removed during the plating process; the gravure cylinder is the cathode. The rate of deposition in a copper plating tank is from 0.003–0.005 in. per hour.

After plating, the copper must be polished. Gravure cylinders may require grinding and polishing, or they may require only polishing. The cylinder is ground to correct the diameter, and it is polished to prepare the surface without changing its diameter. Cylinders are polished with polishing paste, polishing paper, lapping film, or polishing stones. Once the cylinder has been plated and
polished, it is ready to be imaged by electromechanical, laser, or chemical means (see Gravure Cylinder Imaging). After imaging, the cylinder is chrome plated to increase its life and to reduce the coefficient of friction of the cylinder surface. The electrolyte for chrome plating consists of chromic acid, sulfuric acid, deionized water, and small amounts of other additives. The rate of deposition for chrome plating is from 0.0003–0.0005 in. per hour, and the typical thickness for chrome plating is from 0.0002–0.0007 in. After chrome plating, the cylinder may be polished again and is then ready for cylinder proofing on the press.

Following imaging and chrome application, a cylinder is sometimes proofed. If the cylinder proof reveals any inaccuracies, some corrections of the images can be made. If the imaged cylinder is accurate or if it is not proofed, it is then ready for the press. On press, the gravure cylinder may last for 10 million or more impressions. Commonly, the chrome cylinder covering will begin to show signs of wear after three to five million impressions, depending on the abrasiveness of the ink. When wear signs are noticed, the cylinder is removed from the press and can be rechromed and reinserted in the press. When the print run is completed, the cylinder is removed from the press and stored for future use, or the cylinder may reenter the life cycle. If the cylinder reenters the cycle, the chrome plating and the engraved copper images have to be removed before replating a fresh copper layer. The chrome layer can be removed by reverse electrolysis and then the copper image layer removed by lathe cutting, or both layers may be lathe cut.

Ink Fountains/Ink Applicators

The gravure ink fountain includes a reservoir of ink, a pumping and circulating system, and, on some presses, temperature control devices, filtration, and/or an ink applicator. The function of the ink fountain is to provide a continuous supply of ink to the gravure cylinder. Gravure ink is a low viscosity combination of dissolved resin (the vehicle), solvent, and dispersed colorants (usually pigments). The fountain at each gravure unit contains a reservoir of ink that is pumped into the printing unit. The ink pumped to the printing unit is held close to the gravure cylinder, and the gravure cylinder is partially submerged in the ink. The ink fountain and the printing unit must be designed to allow complete filling of the gravure cells as the cylinder rotates in the ink bath. Some gravure presses are equipped with ink applicators. On presses using ink applicators, the ink is pumped to the applicator. The applicator is designed to flow ink onto and across the entire width of the cylinder. Excess ink flowing from the cylinder is captured by a "belly pan" and then flows back to the ink reservoir. Gravure ink is continuously circulated and kept under constant agitation to ensure proper blending of the components.

Doctor Blade

There is some disagreement about the origin of the term "doctor blade." According to the textbook Gravure Process and Technology, "The name 'doctor blade' is derived
from wiping blades used on 'ductor' rolls on flatbed letterpress equipment, and in common usage ductor became doctor" (3). An excerpt from a nineteenth-century London publication, The History of Printing, by the Society for Promoting Christian Knowledge, suggests a different origin:

This important appendage to the machine is called the "doctor," a name which has been thus oddly accounted for in Lancashire: when one of the partners in the firm by whom cylinder printing was originally applied was making experiments on it, one of the workmen, who stood by, said, "Ah! This is very well, sir, but how will you remove the superfluous colour from the surface of the cylinder?" The master took up a common knife which was near, and placing it horizontally against the revolving cylinder, at once showed its action in removing the colour, asking the workman, "What do you say to this?" After a little pause, the man said, "Ah, sir, you've doctored it," thus giving birth to a name for the piece of apparatus.

The most common material used for doctor blades is tempered spring steel. Gravure printers who use water-based ink often use stainless steel to counteract the corrosiveness of water. Other materials such as brass, bronze, rubber, and plastic have also been used. Blade thickness ranges from 0.002–0.020 in. The doctor blade is mounted into a doctor blade holder on press. The holder oscillates the blade back and forth across the gravure cylinder. The oscillation stroke is designed to minimize blade wear and to distribute blade wear evenly during the press run. Doctor blade oscillation also helps to remove any foreign particles that may be trapped between the blade and the cylinder. The doctor blade holder (Fig. 3) is adjustable to allow for adjustments to four aspects of blade-to-cylinder contact: (1) blade to impression roll nip distance, (2) angle of contact, (3) parallelism with the cylinder, and (4) running contact pressure. A doctor blade is pressurized by mechanical, pneumatic, or hydraulic means. The rule of thumb when setting a doctor blade is that the pressure should be as light as possible and equal across the face of the cylinder.

Impression Rolls

The impression roll on a gravure print station is a hollow or solid steel roll covered with a seamless sleeve of synthetic rubber. The primary function of the impression roll is to provide the pressure necessary for the substrate to contact the ink in the gravure cell at the printing nip. The impression roll is pressed against the gravure cylinder by either pneumatic or hydraulic force. Because the surface of many substrates is rough or irregular, the impression roll pressure is sometimes as high as 250 pounds per square inch. The pressure on the impression roll is high enough to cause the rubber covering to compress and flatten as it travels through the printing nip. During normal press adjustments, the compressed area of the printing nip, appropriately called the flat, is measured, and the nip pressure is adjusted accordingly. For most applications, the flat area should be one-half inch in width and equal across the width of the impression roll.
Figure 3. Doctor blade setup (courtesy of GAA).
Under ideal conditions, the rubber covering of the gravure impression roll is changed whenever the substrate is changed. For example, ink transfer from the cells to smooth substrates like polyethylene and polypropylene is maximized by using a softer rubber covering, say 65 durometer on a Shore A scale. By comparison, a paperboard substrate with a hard, rough, porous surface is best printed using a harder rubber covering, perhaps 90 durometer on the Shore A scale. A printing defect unique to the gravure process, known as ''cell skipping,'' ''skip dot,'' or ''snowflaking,'' occurs when the substrate is so rough that some of the smaller gravure cells fail to contact some of the ''valley'' areas of the substrate. This problem is partially due to the fact that the ink in a gravure cell immediately forms a meniscus after being metered by the doctor blade. This phenomenon prevents contact between the ink in the gravure cell and the rough areas of the substrate. To address this problem, the Gravure Research Institute (today known as the Gravure Association of America) worked with private industry to develop Electro Static Assist (ESA). ESA was introduced in the 1960s and has become standard equipment on many gravure presses. The concept of ESA is to employ the principle of polarity and the attraction of opposite charges. ESA applies a charge to the impression roll and an opposite charge to the gravure cylinder. Just before the printing nip, the top of the ink's meniscus is pulled slightly above the surface of the cell. As the cell enters the printing nip, the ink contacts the substrate, and capillary action completes the ink transfer. Unlike electrostatic printing, ESA does not cause the fluid ink to ''jump'' across an air gap; it merely aids transfer by providing contact between ink and substrate in the printing nip. Dryers The inks used for gravure printing are fast drying, and multicolor gravure printing is accomplished by dry trapping. Dry trapping means that even at the highest press speeds, one ink film must be applied and dried before the next ink film can be printed over it. To dry
trap while maintaining high press speed, the web travels through a dryer immediately after leaving the printing unit. The ink film must be dry enough so that it does not offset on any rolls with which it comes in contact. A dryer includes some type of heating element and high-velocity air supply and exhaust units. The dryer temperature setting depends on the substrate being printed, the air flow on the web, the type of ink being dried, and the speed of the press. A slight negative pressure is created in the dryer by adjusting the exhaust air volume to exceed the supply volume slightly. By creating negative air pressure, the gravure printer eliminates the possibility that ink volatiles will return to the pressroom. Web Handling Equipment and Tension Systems Most gravure printing is done on web-fed presses; limited production is done on sheet-fed gravure presses. In any web printing method, the substrate to be printed must be drawn from the supply roll and fed into the printing units at a controlled rate and under precise tension. Gravure press web handling is often separated into zones: the unwind, the infeed, the printing units, the outfeed, the finishing units, and the rewind. The unwind zone of a gravure press may be a single reel stand, a fixed-position dual reel unwind, or a rotating turret dual reel stand. To prepare a roll for mounting on a single reel stand, the reel tender inserts a shaft into the roll's core or positions the roll between two conical tapered reel holders. The roll is lifted by hoist and set into the frame of the reel stand. The reel stand includes an unwind brake that is linked to the shaft. The braking mechanism is designed to act against the inertia of the roll while the press is running to provide resistance against web pull and, in turn, tension on the web. Unwind brake systems are usually closed-loop systems that work in combination with a dancer roll. As the dancer roll moves up or down because of tension variations, the dancer's movement controls the amount of force applied by the unwind brake.
When the roll of material on a single position reel stand has run out, the press must be stopped while a new roll of material is positioned and spliced to the tail end of the expired roll. Fixed position dual reel stands and rotating turret reel stands are designed to increase productivity by allowing continuous operation of the press by a ''flying splice'' mechanism. While a roll is running in the active position of these types of dual unwind stands, the idle position is prepared with another roll of substrate. As the active roll of substrate runs out, the roll in the second position is automatically spliced to the moving web, thus providing a continuous press run. The infeed zone of a gravure press isolates the unwind from the printing sections. The infeed section is designed to pull substrate from the reel on the unwind and to provide a consistent flow of web to the printing units. The tension from the unwind is provided by a pull against the brake tension as the web path travels through a nip between a rubber-covered draw roll and a metal driven roll. The speed of the metal driven roll can be varied to provide more or less tension to the printing units. Often the tension on the web is measured by transducers as it leaves the infeed, and the speed of the driven roll in the infeed is electronically tuned to match the printing unit tension requirements. The printing units of a gravure press influence web tension because of the significant amount of pressure in the printing nip. Consequently, the speed of the printing cylinders, the infeed drive roll, and the driven rolls on the outfeed must be adjusted to maintain uniform web tension. It is common practice to adjust the driven rolls at the infeed to turn at a speed that is 1/2% slower than that of the printing cylinders. The driven rolls at the rewind, or finishing zone, are commonly adjusted to a speed that is 1/2–1% faster than that of the printing cylinders. These adjustments help eliminate slack throughout the press despite the natural tendency of the web to elongate under tension. As the web exits the printing section of the press, it may be rewound, or it may go through one or more finishing units. Publication gravure presses include large folders that are designed to slit the web into multiple ribbons. The ribbons are then folded and trimmed to form the individual signatures of magazines or catalogs. A press used for folding carton printing might have a rotary or flatbed diecutting unit and a delivery section for finished materials. SHEET-FED GRAVURE Sheet-fed gravure presses are in limited use for printing high-quality folding cartons, art reproductions, posters, and proofs. The substrate supply for sheet-fed gravure is not in web form but in cut and stacked sheets. Most, but not all, gravure production is done on web-fed presses. A sheet-fed gravure press includes an ink fountain, a gravure cylinder or a gravure plate, and a doctor blade mechanism. The press also includes a delivery section called a ''feed table,'' transfer cylinders with grippers for transporting individual sheets through the press, and a delivery table designed to restack printed sheets.
OFFSET GRAVURE The offset gravure process is used for graphic applications on products such as medicine capsules, metal cans, and candies. An offset gravure printing unit includes all of the components of a typical gravure unit and also includes a rubber blanket image transfer cylinder. In offset gravure printing, images are printed from a gravure plate or cylinder onto a soft rubber blanket or transfer medium; the images are then offset from the blanket to the material to be printed. A variation of this process is called pad transfer. The transfer medium in pad transfer printing is a soft silicone ''ball'' that has been shaped to conform to the item being printed. A pad transfer printing unit includes a gravure plate image carrier, an ink flooding mechanism, and a doctor blade. Once the ink has filled the recesses of the image carrier and the nonimage areas have been cleaned with the doctor blade, the silicone ball is pressed against the image carrier, where it contacts the inked images and picks them up. Then, the silicone ball is pressed against the material being printed, transferring the image.
GRAVURE CYLINDER IMAGING The methods used to etch or engrave images into copper for gravure cylinders have changed little since their inception (see The Historical Development of the Gravure Process). Today, there are two methods for placing cellular images in copper for gravure cylinders: chemical etching and engraving by an electromechanically operated diamond stylus or, less frequently, by a laser. The original chemical imaging process known as conventional gravure, also called diffusion etch, is no longer used. Conventional gravure used a diffusion transfer resist commonly referred to as carbon tissue. The stencil used for chemical etching was formed by exposing the carbon tissue first to a continuous tone film positive (for tone work or vignettes) or to a 100% black positive (for line copy and text). The second exposure was to a gravure screen. For tone copy or vignettes, the carbon tissue material was light hardened to an infinitely variable degree, from little or no exposure effect in the shadow areas of the film positive, where little light is available during exposure, to nearly complete hardness in the highlights, where most of the exposure light reaches the carbon tissue. After exposure to the continuous tone positive, the land areas that define the gravure cell are formed by exposure to a special gravure screen. The gravure screen is a crosshatch that contains between 100 and 200 lines per inch. When the gravure screen is exposed to the carbon tissue, the areas of the stencil that protect the land areas of the image and the cell walls are light hardened. After exposure, the carbon tissue is inverted and applied to the surface of the gravure cylinder and then developed. The highlight areas of the carbon tissue are the hardest because they inhibit diffusion of the etchant; the shadow areas are less hard and provide less of a barrier to the etchant. Consequently, during the etching
process, the etchant reaches the copper in the shadow areas first, the midtones later in the process, and finally the highlights, resulting in cells of varying depth but constant width. All cells were large and square for all percentages of tone, from highlight to shadow; however, they varied in depth to create variable cell volume and, thus, variable printed density. The depths of etch ranged from 5–10 microns in the highlight cells to 40–45 microns in the shadow cells. A second chemical imaging method, the two-positive method, sometimes called lateral hard dot (also no longer in use), was developed in the United States. Imaging for the two-positive method was very similar to the imaging methods for conventional gravure but had one important difference. Rather than a gravure screen, a half-tone positive was used for the second exposure to the carbon tissue. Consequently, the two-positive method yielded cells that varied in both depth (due to the exposure to the continuous tone positive) and width (due to the exposure to the half-tone positive). Highlight cells were shallow, small, and round; shadow cells were large, deep, and square. Compared to the conventional method, the two-positive method provided increased land area between the highlight cells and deeper highlight cells; both factors added to the life of the gravure cylinder. The two-positive method never gained favor in Europe, probably because of the complexity of registering two film positives during exposure of the carbon tissue. The two-positive method was the major method of gravure cylinder imaging in the United States before electromechanical methods were introduced in the early 1970s. The introduction of photographic polymer films in the 1950s marked the beginning of the direct transfer, sometimes called the single positive, method of gravure cylinder etching. In the direct transfer process, a cylinder is completely coated with a liquid photopolymer resist. The photopolymer dries on the cylinder and eventually becomes the stencil for the etching process. After the photopolymer application, the light-sensitive photopolymer coated on the cylinder is exposed to a special gravure screened film positive consisting of halftones where required and screened solids and text. Exposure to ultraviolet light hardens the photopolymer in the exposed nonimage areas, rendering it insoluble. One direct transfer system offered by the Japanese company Think Laboratories is fully automated and can include a laser exposure unit that eliminates film exposure. Following exposure, the photopolymer is developed with water or dyed solvent, and the image areas are cleared of the photopolymer resist. The next step is etching, usually with ferric chloride, in an etching bath for 3 to 5 minutes. The direct transfer method of chemical imaging produces cells of variable width and constant depth (approximately 40–45 microns). The direct transfer method remains the only chemical imaging process presently used for gravure cylinders. The percentage of cylinders imaged chemically is estimated at less than 5% in the United States and slightly higher worldwide.
ELECTROMECHANICAL ENGRAVING Early in the 1950s, the German company Hell began developing an electromechanical method of engraving copper cylinders with a diamond stylus. The Hell machine, first called the Scan-a-Graver and later called the Helio-Klischograph, was the forerunner of modern graphic arts scanners. On an electromechanical engraver, the gravure cylinder is mounted in a precision lathe equipped with an engraving head, including the engraving stylus. On early model engravers, the lathe also carried a ''copy drum'' on which the continuous tone black-and-white copy to be reproduced was mounted. The Helio-Klischograph converts light energy reflected from the black-and-white copy to voltage. The resulting voltage is amplified and passed to an electromagnet that, in turn, vibrates a diamond-tipped stylus. The force of the diamond stylus is controlled electronically to cut cells of various size and depth, depending on the copy requirements. Modern engravers accept all forms of digital input scanned from original copy. The cells, cut by a true diamond, resemble the shape of the diamond and so vary in width and depth (Fig. 4). A highlight cell is cut from the very tip of the diamond and is relatively shallow and narrow. A shadow cell is cut with nearly the entire diamond and is relatively deep and wide. Cells made by an electromechanical engraver vary in width between 20 and 220 microns. The depth of the cell is always a function of the shape of the diamond that cut it. By varying the speed of cylinder rotation during engraving and the amplitude of the pulsating diamond stylus, the engraver can cut cells of a compressed, normal (diamond shape), or elongated shape. By varying the cell shape, the engraver also varies the angle at which the cells are aligned on the cylinder. The approximate cell angles of 30° for a compressed cell, 45° for a normal cell, and 60° for an elongated cell help the engraver and printer avoid moiré patterns in multicolor printing. Because the cells must be consistently precise in size and shape, control of the engraving stylus is critical. This fact has limited the speed of the engraver to approximately 5000 cells per second. The cell cutting speed of the engraver varies with the type of image elements and the screen ruling being cut. Average production speeds for an electromechanical engraver range from 3200–4500 cells per second. Presently, new engraving heads are being developed to double the rate of cell generation. To speed the imaging process when imaging a cylinder for publication gravure printing, an engraver is equipped with several engraving heads. In some applications, as many as 12 individual engraving heads, driven simultaneously by separate digital information, are used to image the various pages on one gravure cylinder. This technique can significantly reduce the time required to image a gravure cylinder; however, it can be used only when the image elements to be printed are separated by some nonimage boundaries, as in magazine printing, where each page is void of images in its margins. When gravure cylinders are imaged for applications
European companies have worked with partners in the United States to develop laser-engraving systems for cutting cells into copper. Theoretically, a laser could generate up to 100,000 cells per second, a great improvement over electromechanical methods. In practice, however, the reflectivity of the copper cylinder surface significantly reduces the efficiency of the laser engraver: the laser light energy intended to ablate copper to engrave a cell is instead reflected from the cylinder. Attempts to replace copper with plastics or hardened epoxy for laser imaging have proven unsuccessful. To avoid the reflectivity problems associated with laser engraving of copper, the Max Daetwyler Corporation (MDC) developed a metal alloy with a ''laser type-specific light absorption'' (4). This system can reportedly engrave 70,000 cells per second.
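To put these engraving rates in perspective, a rough back-of-the-envelope estimate of single-head imaging time can be made from the figures quoted above. The sketch below is only illustrative: the cylinder dimensions, screen ruling, and rates are example values drawn from the ranges mentioned in the text, not specifications for any particular engraver.

# Rough estimate of engraving time for a large publication cylinder.
# All numbers are illustrative assumptions taken from ranges quoted in the text.
circumference_in = 76.0      # maximum publication cylinder circumference (in.)
face_width_in = 142.0        # maximum publication web width (in.)
screen_ruling_lpi = 150.0    # assumed screen ruling (cells per inch in each direction)

cells_total = circumference_in * face_width_in * screen_ruling_lpi ** 2

for label, cells_per_second, heads in [
    ("electromechanical, 1 head", 4000.0, 1),
    ("electromechanical, 12 heads", 4000.0, 12),
    ("laser (reported rate)", 70000.0, 1),
]:
    hours = cells_total / (cells_per_second * heads) / 3600.0
    print(f"{label}: about {hours:.1f} hours")

Even as a crude estimate (roughly 17 hours for one electromechanical head versus about 1.5 hours for 12 heads), this shows why multiple engraving heads, and the much higher reported laser rates, matter for publication-size cylinders.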
OVERVIEW OF TODAY'S GRAVURE INDUSTRY The gravure process is currently the third most common printing process used in the United States and the second most commonly used process in Europe and Asia. Three distinctly different market segments use the gravure process: publication, packaging, and specialty. Publication gravure presses are designed to print web widths up to 142 inches and run at speeds up to three thousand feet per minute. The maximum cylinder circumference used in a publication gravure press is 76 inches. A typical publication gravure press consists of eight printing units, four for each side of the web. Gravure printed publications include magazines, Sunday newspaper supplements, catalogs, and newspaper advertising inserts. Gravure packaging presses are designed to handle the specific substrates used in the packaging industry. Consequently, a packaging press is narrower and runs at a slower speed than a publication press. A packaging gravure press usually includes eight or more printing stations, and runs at a speed between 450 and 2000 feet per minute. The type of substrate printed or the speed of in-line finishing often limits press speeds. The gravure printing process can handle a wider range of substrates than any other printing process (with the possible exception of flexography). Gravure printed packaging products include folding cartons, usually printed on paperboard, flexible packaging, usually printed on polyethylene or polypropylene, and labels and wrappers. The most interesting and diverse segment of gravure printing is known as the product or specialty segment. The press speeds and the web widths used for the various products manufactured by specialty gravure printers are diverse. Depending on the substrate, speeds vary from 30 to 1000 feet per minute, and widths vary from less than 20 inches to 12 feet. The following list includes many of the specialty products printed by gravure:
• Gift wrap
• Wallcoverings
• Swimming pool liners
• Shower curtains
• Countertops
• Vinyl flooring
• Candy and pill trademarking
• Stamps
• Cigarette filter tips
• Lottery tickets
The strength of gravure derives from the simplicity of operation — fewer moving parts. This allows for a more stable and consistent production process. The improvements in gravure prepress have shortened cylinder lead times. Any successful breakthrough in plastic cylinder technology will make gravure competitive in short-run markets. BIBLIOGRAPHY
1. M. O. Lilien, History of Industrial Gravure Printing up to 1920, Lund Humphries, London, 1972, pp. 3–24. 2. J. F. Romano and M. Richard, Encyclopedia of Graphic Communications, Prentice-Hall, Englewood Cliffs, NJ, 1998, pp. 361–368. 3. Gravure Process and Technology, Gravure Education Foundation and Gravure Association of America, 1998, pp. 17, 182–196, 259. 4. www.daetwyler.com LASERSTAR
GROUND PENETRATING RADAR LAWRENCE B. CONYERS University of Denver Denver, CO
INTRODUCTION Ground-penetrating radar (GPR) is a geophysical method that can accurately map the spatial extent of near-surface objects or changes in soil media and produce images of those features. Data are acquired by reflecting radar waves from subsurface features in a way that is similar to radar methods used to detect airplanes in the sky (1). Radar waves are propagated in distinct pulses from a surface antenna; reflected from buried objects, features or bedding contacts in the ground; and detected back at the source by a receiving antenna. As radar pulses are being transmitted through various materials on their way to the buried target feature, their velocity changes, depending on the physical and chemical properties of the material through which they are traveling. When the travel times of the energy pulses are measured and their velocity through the ground is known, distance (or depth in the ground) can be accurately measured, producing a three-dimensional data set. In the GPR method, radar antennas are moved along the ground in transects, and two-dimensional profiles of a large number of periodic reflections are created, producing a profile of subsurface stratigraphy and buried
features along lines (Fig. 1). When data are acquired in a series of transects within a grid and reflections are correlated and processed, an accurate three-dimensional picture of buried features and associated stratigraphy can be constructed. Ground-penetrating radar surveys allow for wide areal coverage in a short period of time and have excellent subsurface resolution of buried materials and geological stratigraphy. Some radar systems can resolve stratigraphy and other features at depths in excess of 40 meters, when soil and sediment conditions are suitable (2). More typically, GPR is used to map buried materials at depths from a few tens of centimeters to 5 meters. Radar surveys can identify buried objects for possible future excavation and also interpolate between excavations and project subsurface knowledge into areas that have not yet been, or may never be, excavated. GPR surveys are most typically used by geologists, archaeologists, hydrologists, soil engineers, and other geoscientists. Ground-penetrating radar (GPR) was initially developed as a geophysical prospecting technique to locate buried objects or cavities such as pipes, tunnels, and mine shafts (3). The GPR method has also been used to define lithologic contacts (4–6), faults (7), and bedding planes and joint systems in rocks (8–11). Ground-penetrating radar technology can also be employed to investigate buried soil units (12–17) and the depth to groundwater (14,18,19). Archaeological applications range from finding and mapping buried villages (20–25) to locating graves, buried artifacts, and house walls (26,27). ENVIRONMENTS WHERE GROUND-PENETRATING RADAR IS SUCCESSFUL The success of GPR surveys depends to a great extent on soil and sediment mineralogy, clay content, ground moisture, depth of burial, surface topography, and vegetation. It is not a geophysical method that can be immediately applied to any geographic or archaeological setting, although with thoughtful modifications in acquisition and data processing methodology, GPR can be adapted to many differing site conditions. In the past, it has been assumed that GPR surveys would be successful only in areas where soils and underlying sediment are extremely dry and nonconductive (28). Although radar wave penetration and the ability to reflect energy back to the surface are enhanced in a dry environment, recent work has demonstrated that dryness is not necessarily a prerequisite for GPR surveys, as good data have been collected in swampy areas, peat bogs, rice paddies, and even freshwater lakes. Modern methods of computer enhancement and processing have also proven that meaningful data can be obtained, sometimes even in these very wet ground conditions.

Figure 1. GPR reflection profile showing a vertical slice in the ground to 2.5 meters depth. (The profile is plotted as depth in meters against distance in meters and shows the ground surface, a buried living surface, and a buried pipe.)
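As a minimal illustration of how gridded two-dimensional profiles can be combined into a three-dimensional data set, the sketch below simply stacks several parallel profiles into one volume. The array sizes are hypothetical stand-ins; in practice each profile would be read from the particular radar system's own file format.

import numpy as np

# Hypothetical survey: 20 parallel profiles, each with 200 traces and
# 512 time samples per trace. Random numbers stand in for reflection amplitudes.
n_profiles, n_traces, n_samples = 20, 200, 512
profiles = [np.random.randn(n_traces, n_samples) for _ in range(n_profiles)]

# Stack the profiles along a new axis: (profile line, trace along line, time sample).
volume = np.stack(profiles, axis=0)
print(volume.shape)   # (20, 200, 512): a three-dimensional block of reflection data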
GROUND-PENETRATING RADAR EQUIPMENT AND DATA ACQUISITION The GPR method involves transmitting high-frequency electromagnetic radio (radar) pulses into the earth and measuring the time elapsed between transmission, reflection from a buried discontinuity, and reception at a surface radar antenna. A pulse of radar energy is generated on a dipole transmitting antenna that is placed on, or near, the ground surface. The resulting wave of electromagnetic energy propagates downward into the ground where portions of it are reflected back to the surface at discontinuities. The discontinuities where reflections occur are usually created by changes in electrical properties of the sediment or soil, variations in water content, lithologic changes, or changes in bulk density at stratigraphic interfaces. Reflection can also occur at interfaces between anomalous archaeological features, buried pipes, and the surrounding soil or sediment. Void spaces in the ground, which may be encountered in burials, tombs, or tunnels, will also generate significant radar reflections due to a significant change in radar wave velocity. The depth to which radar energy can penetrate and the amount of definition that can be expected in the subsurface are partially controlled by the frequency of the radar energy transmitted. The radar energy frequency controls both the wavelength of the propagating wave and the amount of weakening, or attenuation, of the waves in the ground. Standard GPR antennas propagate radar energy that varies in bandwidth from about 10 megahertz (MHz) to 1200 MHz. Antennas usually come in standard frequencies; each antenna has one center frequency but produces radar energy that ranges around that center by about two octaves. An octave is one-half and two times the center frequency. Radar antennas are usually housed in a fiberglass or wooden sled that is placed directly on the ground (Fig. 2) or supported on wheels a few centimeters above the ground. When two antennas are employed, one is used as a transmitting antenna and the other as a receiving antenna. Antennas can also be placed separately on the ground without being housed in a sled. A single antenna can also be used as both a sender and receiver in what is called a monostatic system. In monostatic mode, the same antenna is turned on to transmit a radar pulse and then immediately switched to receiving mode to receive and measure the returning reflected energy.
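As a small numerical illustration of the ''about two octaves'' statement above, the sketch below lists the approximate transmitted band, roughly one-half to twice the center frequency, for a few center frequencies. The specific frequencies chosen are only examples.

# Approximate band spanning about two octaves around the center frequency
# (roughly one-half to twice the center), per the description above.
for center_mhz in (100, 400, 900):
    low, high = 0.5 * center_mhz, 2.0 * center_mhz
    print(f"{center_mhz} MHz antenna: roughly {low:.0f}-{high:.0f} MHz")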
Figure 2. Typical GPR field acquisition setup. A 500 MHz antenna is on the left, connected to the radar control unit and computer by a cable. A screen and keyboard for analysis in the field during collection are on the packing box on the right.
Antennas are usually hand-towed along survey lines within a grid at an average speed of about 2 kilometers per hour, or they can be pulled behind a vehicle at speeds of 10 kilometers per hour or greater. In this fashion, energy is continuously transmitted and received as the antennas move over the ground. They can also be moved in steps along a transect instead of being moved continuously. During step acquisition, the smaller the spacing between steps, the greater the subsurface coverage. In the last few years, radar equipment manufacturers have been building their systems so that data can be collected by either method, depending on the preference of the user or because of site characteristics. The most efficient method of subsurface radar mapping is establishing a grid across a survey area before acquiring the data. Usually rectangular grids are established with a line spacing of 50 centimeters or greater. Rectangular grids produce data that are easier to process and interpret. Other types of grid acquisition patterns may be necessary because of surface topography or other obstructions. Survey lines that radiate outward from one central area have sometimes been used, for instance, to define a moat around a central fort-like structure (27). A rhomboid grid pattern has also been used with success within a sugarcane field on the side of a hill (20), where antennas had to be pulled between planted rows. Data from nonrectangular surveys are just as useful as those acquired in rectangular grids, although more field time may be necessary for surveying, and reflection data must be manipulated differently during computer processing and interpretation. Occasionally, GPR surveys have been carried out on the frozen surface of lakes or rivers (2,6,14,28). Radar waves will easily pass through ice and freshwater into the underlying sediment, revealing features on lake or river bottoms and in the subsurface. A radar sled can also be easily floated across the surface of a lake or
river and onto the shore, all the while collecting data from the subsurface (29). These techniques, however, do not work in salt water because the high electrical conductivity of the saline water quickly dissipates the electromagnetic energy before it can be reflected to the receiving antenna. If the antennas are pulled continuously along a transect line within a presurveyed grid, continuous pulses of radar energy are sent into the ground, reflected from subsurface discontinuities and then received and recorded at the surface. The movable radar antennas are connected to the control unit by cable. Some systems record the reflective data digitally directly at the antenna, and the digital signal is sent back through fiber optic cables to the control module (2). Other systems send an analog signal from the antennas through coaxial copper cables to the control unit where it is then digitized. Older GPR systems, without the capability of digitizing the reflected signals in the field, must record reflective data on magnetic tape or paper records. The two-way travel time and the amplitude and wavelength of the reflected radar waves derived from the pulses are then amplified, processed, and recorded for immediate viewing or later postacquisition processing and display. During field data acquisition, the radar transmission process is repeated many times per second as the antennas are pulled along the ground surface or moved in steps. The distance along each line is also recorded for accurate placement of all reflections within a surveyed grid. When the composite of all reflected wave traces is displayed along the transect, a cross-sectional view of significant subsurface reflective surfaces is generated (Fig. 1). In this fashion, two-dimensional profiles that approximate vertical ‘‘slices’’ through the earth are created along each grid line. Radar reflections are always recorded in ‘‘two-way time’’ because that is the time it takes a radar wave
to travel from the surface antenna into the ground, to reflect from a discontinuity, travel back to the surface, and be recorded. One of the advantages of GPR surveys over other geophysical methods is that the subsurface stratigraphy and archaeological features at a site can be mapped in real depth. This is possible because the two-way travel time of radar pulses can be converted to depth, if the velocity of radar wave travel through the ground is known (1). The propagative velocity of radar waves that are projected through the earth depends on a number of factors; the most important is the electrical properties of the material through which they pass (30). Radar waves in air travel at the speed of light, which is approximately 30 centimeters per nanosecond (one nanosecond is one billionth of a second). When radar energy travels through dry sand, its velocity slows to about 15 centimeters per nanosecond. If the radar energy were then to pass through a water-saturated sand unit, its velocity would slow further to about 5 centimeters per nanosecond or less. Reflections would be generated at each interface where velocity changes. Type of Data Collected The primary goal of most GPR investigations is to differentiate subsurface interfaces. All sedimentary layers in the earth have particular electrical properties that affect the rate of electromagnetic energy propagation, as measured by the relative dielectric permittivity. The reflectivity of radar energy at an interface is primarily a function of the magnitude of the difference in electrical properties between the two materials on either side of that interface. The greater the contrast in electrical properties between the two materials, the stronger the reflected signal (31). The inability to measure the electrical parameters of buried units precisely usually precludes accurate calculations of specific amounts of reflectivity in most contexts, and usually only estimates can be made. The strongest radar reflections in the ground usually occur at the interface of two thick layers whose electrical properties vary greatly. The ability to ''see'' radar reflections on profiles is related to the amplitude of the reflected waves. The higher the amplitude, the more visible the reflections. Lower amplitude reflections usually occur when there are only small differences in the electrical properties between layers. Radar energy becomes both dispersed and attenuated as it radiates into the ground. When portions of the original transmitted signal are reflected toward the surface, they will suffer additional attenuation in the material through which they pass before finally being recorded at the surface. Therefore, to be detected as reflections, important subsurface interfaces must have sufficient electrical contrast at their boundaries and also must be located at depths shallow enough that sufficient radar energy is still available for reflection. As radar energy is propagated to increasing depths and the signal becomes weaker and spreads out over more surface area, less is available for reflection, and it is possible that only very low-amplitude waves will be recorded. The maximum depth of resolution for every site will vary with the geologic
conditions and the equipment being used. Data filtering and other amplification techniques can sometimes be applied to reflective data after acquisition to enhance very low amplitude reflections and make them more visible. Reflections received from deeper in the ground are usually gained, either during data collection in the field or during postacquisition processing. This data processing method exponentially increases the amplitudes of reflections from deeper in the ground and makes them visible in reflective profiles. The gaining process enhances otherwise invisible reflections, which have very low amplitude because the energy has traveled to a greater depth in the ground and become attenuated and spread out as waves radiate away from the transmitting antenna, leaving less energy to be reflected to the surface. Production of Continuous Reflective Images Most radar units used for geologic and archaeological investigation transmit short discrete pulses into the earth and then measure the reflected waves derived from those pulses as the antennas are moved along the ground. A series of reflected waves are then recorded as the antennas are moved along a transect. The amount of spatial resolution in the subsurface depends partially on the density of reflections along each transect. This spatial density can be adjusted within the control unit to record a greater or lesser number of traces along each recorded line, depending on the speed of antenna movement along the ground. If a survey wheel is being used for data acquisition, the number of reflective traces desired per unit distance can also be adjusted for greater or lesser resolution. If the step method of acquisition is used, the distance between steps can be lengthened or shortened, depending on the subsurface resolution desired. As reflections from the subsurface are recorded in distinct traces and plotted together in a profile, a two-dimensional representation of the subsurface can be made (Fig. 1). One ''trace'' is a complete reflected wave that is recorded from the surface to whatever depth is being surveyed. A series of reflections that make up a horizontal or subhorizontal line (either dark or light in standard black-and-white or gray-scale profiles) is usually referred to as ''a reflection.'' A distinct reflection visible in profiles is usually generated from a subsurface boundary such as a stratigraphic layer or some other physical discontinuity such as a water table. Reflections recorded later in time are usually those received from deeper in the ground. There can also be ''point source reflections'' that are generated from one feature in the subsurface. These are visible as hyperbolas on two-dimensional profiles. Due to the wide angle of the transmitted radar beam, the antenna will ''see'' the point source before arriving directly over it and continue to ''see'' it after it has passed. Therefore, the resulting recorded reflection will create a reflective hyperbola (Fig. 3), sometimes incorrectly called a diffraction, on two-dimensional profiles. These are often produced by buried pipes, tunnels, walls, or large rocks.
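As a concrete illustration of the time-to-depth conversion described above, the short sketch below turns a measured two-way travel time into an approximate depth for an assumed wave velocity. Both the travel time and the velocity are hypothetical example values; the velocity is taken from the dry-sand figure quoted in the text.

# Convert a two-way travel time to approximate depth: the wave travels down
# and back, so depth = velocity * (two-way time) / 2.
def depth_from_two_way_time(two_way_time_ns, velocity_cm_per_ns):
    return velocity_cm_per_ns * two_way_time_ns / 2.0   # depth in centimeters

# Example: a reflection recorded at 40 ns two-way time in dry sand,
# using the ~15 cm/ns velocity mentioned in the text.
print(depth_from_two_way_time(40.0, 15.0))   # 300.0 cm, i.e., about 3 meters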
Physical Parameters that Affect Radar Transmission The maximum effective depth of GPR wave penetration is a function of the frequency of the waves that are propagated into the ground and the physical characteristics of the material through which they are traveling. The physical properties that affect the radar waves as they pass through a medium are the relative dielectric permittivity (RDP), the electrical conductivity, and the magnetic permeability (32). Soils, sediment or rocks that are ''dielectric'' will permit the passage of most electromagnetic energy without actually dissipating it. The more electrically conductive a material, the less dielectric it is. For maximum radar energy penetration, a medium should be highly dielectric and have low electrical conductivity. The relative dielectric permittivity of a material is its capacity to store and then allow the passage of electromagnetic energy when a field is imposed upon it (33). It can also be thought of as a measure of a material's ability to become polarized within an electromagnetic field and therefore respond to propagated electromagnetic waves (30). It is calculated as the ratio of a material's electrical permittivity to the electrical permittivity in a vacuum (that is, one). Dielectric permittivities of materials vary with their composition, moisture content, bulk density, porosity, physical structure, and temperature (30). The relative dielectric permittivity in air, which exhibits only negligible electromagnetic polarization, is approximately 1.0003 (34), usually rounded to one. In volcanic or other hard rocks, it can range from 6 to 16, and in wet soils or clay-rich units, it can approach 40 or 50. In unsaturated sediment, where there is little or no clay, relative dielectric permittivities can be 5 or lower. In general, the higher the RDP of a material, the slower the velocity of radar waves passing through it. In general, the higher the RDP of a material, the poorer its ability to transmit radar energy (1). If data are not immediately available about field conditions, the RDP can only be estimated, but if the actual depth of objects or interfaces visible in reflective profiles is known, the RDP can be easily calculated using Eq. 1.

√K = C/V     (1)

where
K = relative dielectric permittivity (RDP) of the material through which the radar energy passes
C = speed of light (0.2998 meters per nanosecond)
V = velocity at which the radar passes through the material (measured in meters per nanosecond)

The relative dielectric permittivity of some common materials is shown in Table 1. These of course can be highly variable due to changes in clay content and type, the amount and type of salts, and especially moisture. The greater the difference between the relative dielectric permittivity of materials in the subsurface, the larger the amplitude of the reflection generated. To generate a significant reflection, the change in dielectric permittivity between two materials must occur over a
Table 1. Relative Dielectric Permittivities of Common Materials

Material                 Relative Dielectric Permittivity
Air                      1
Ice                      3–4
Salt water               81–88
Dry sand                 3–5
Saturated sand           20–30
Volcanic ash/pumice      4–7
Limestone                4–8
Shale                    5–15
Granite                  5–15
Coal                     4–5
Dry silt                 3–30
Saturated silt           10–40
Clay                     5–40
Permafrost               4–5
Asphalt                  3–5
Concrete                 6
short distance. When the RDP changes gradually with depth, only small differences in reflectivity will occur every few centimeters, and therefore only weak or nonexistent reflections will be generated. Magnetic permeability is a measure of the ability of a medium to become magnetized when an electromagnetic field is imposed upon it (35). Most soils and sediments are only very slightly magnetic and therefore have low magnetic permeability. The higher the magnetic permeability, the more the electromagnetic energy will be attenuated during its transmission. Media that contain magnetite minerals, iron oxide cement, or iron-rich soils can have high magnetic permeability and therefore transmit radar energy poorly. Electric conductivity is the ability of a medium to conduct an electric current (35). When a medium through which radar waves pass has high conductivity, radar energy will be highly attenuated. In a highly conductive medium, the electric component of the electromagnetic energy is essentially conducted away into the earth and becomes lost. This occurs because the electric and magnetic fields are constantly ‘‘feeding’’ on each other during transmission. If one is lost, the total field dissipates. Highly conductive media include those that contain salt water and those that have high clay content, especially if the clay is wet. Any soil or sediment that contains soluble salts or electrolytes in the groundwater will also have high electrical conductivity. Agricultural runoff that is partially saturated with soluble nitrogen and potassium can raise the conductivity of a medium, as will wet calcium carbonate impregnated soils in desert regions. Radar energy will not penetrate metal. A metal object will reflect 100% of the radar energy that strikes it and will shadow anything directly underneath it.
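The relationship in Eq. 1 is easy to apply in both directions: a calibration target of known depth yields a velocity and hence an RDP, and an assumed RDP (for example, a value from Table 1) yields an estimated velocity. The sketch below shows both directions; the 1-meter target depth and 25-nanosecond travel time are hypothetical example numbers.

import math

C = 0.2998   # speed of light in meters per nanosecond

# Direction 1: a buried target of known depth (e.g., from a test excavation)
# produces a reflection at a measured two-way travel time.
known_depth_m = 1.0
two_way_time_ns = 25.0
velocity = 2.0 * known_depth_m / two_way_time_ns     # one-way velocity, m/ns
rdp = (C / velocity) ** 2                            # Eq. 1 rearranged: K = (C/V)^2
print(f"velocity = {velocity:.3f} m/ns, estimated RDP = {rdp:.1f}")

# Direction 2: assume an RDP from Table 1 (say, dry sand, K ~ 4) and
# estimate the velocity from Eq. 1: V = C / sqrt(K).
assumed_rdp = 4.0
print(f"estimated velocity = {C / math.sqrt(assumed_rdp):.3f} m/ns")

The second result, about 0.15 meters per nanosecond, agrees with the dry-sand velocity of roughly 15 centimeters per nanosecond quoted earlier in this article.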
RADAR ENERGY PROPAGATION Many ground-penetrating radar novices envision the propagating radar pattern as a narrow pencil-shaped
beam that is focused directly down from the antenna. In fact, GPR waves from standard commercial antennas radiate energy into the ground in an elliptical cone (Fig. 3) whose apex is at the center of the transmitting antenna (36–38). This elliptical cone of transmission occurs because the electric field produced by the antenna is generated parallel to its long axis and therefore usually radiates into the ground perpendicular to the direction of antenna movement along the ground surface. This radiative pattern is generated from a horizontal electric dipole antenna to which elements called shields are sometimes added that effectively reduce upward radiation. Sometimes, the only shielding mechanism is a metal plate that is placed above the antenna to re-reflect upward radiating energy. Because of considerations of cost and portability (size and weight), the use of more complex radar antennas that might be able to focus energy more efficiently into the ground in a narrower beam has been limited to date. When an electric dipole antenna is located in air (or supported within the antenna housing), the radiative pattern is approximately perpendicular to the long axis of the antenna. When this dipole antenna is placed on the ground, a major change in the radiative pattern occurs due to ground coupling (39). Ground coupling is the ability of the electromagnetic field to move from transmission in the air to the ground. During this process, refraction that occurs as the radar energy passes through surface units changes the directionality of the radar beam, and most of the energy is channeled downward in a cone from the propagating antenna (32). The higher the RDP of the surface material, the lower the velocity of the transmitted radar energy, and the more focused (less broad) the conical transmission pattern becomes (24). This focusing effect continues as radar waves travel into the ground and material of higher and higher RDP is encountered. The amount of energy refraction that occurs with depth and therefore the amount of focusing is a function of Snell's law (35). According to Snell's law, the amount of reflection or refraction that occurs at a boundary between two media depends on the angle of incidence and the velocity of the incoming waves.
A = λ/4 + D/√(K + 1)     (3)

where
A = approximate long-dimension radius of the footprint
λ = center frequency wavelength of the radar energy
D = depth from the ground surface to the reflection surface
K = average relative dielectric permittivity (RDP) of the material from the ground surface to depth D

Figure 3. The conical transmission of radar energy from a surface antenna into the ground. The footprint of illumination at any depth can be calculated with Eq. 3.
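To make Eq. 3 concrete, the sketch below estimates the footprint radius at a few depths for a hypothetical 300-MHz survey over material with an assumed RDP of 5. The antenna frequency, RDP, and depths are illustrative values only, not recommendations for any particular site.

import math

C = 0.2998   # speed of light, meters per nanosecond

def footprint_radius_m(freq_mhz, depth_m, rdp):
    """Approximate long-dimension footprint radius from Eq. 3."""
    wavelength_m = C / (freq_mhz / 1000.0)   # m/ns divided by cycles/ns gives meters per cycle
    return wavelength_m / 4.0 + depth_m / math.sqrt(rdp + 1.0)

# Hypothetical 300-MHz antenna over material with an assumed RDP of 5.
for depth in (0.5, 1.0, 2.0):
    print(f"depth {depth} m: footprint radius ~ {footprint_radius_m(300.0, depth, 5.0):.2f} m")

Estimates like this are what the discussion below refers to when it suggests using the expected radiative pattern to choose line spacing within a survey grid.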
In general, the greater the increase in RDP with depth, the more focused the cone of transmission becomes. The opposite can also occur if materials of gradually lower RDP are encountered as radar waves travel into the ground. Then, the cone of transmission would gradually expand outward as refraction occurs at each interface. Radiation fore and aft from the antenna is usually greater than to the sides, making the ''illumination'' pattern on a horizontal subsurface plane approximately elliptical (Fig. 3); the long axis of the ellipse is parallel to the direction of antenna travel (1). In this way, the subsurface radiative pattern on a buried horizontal plane is always ''looking'' directly below the antenna and also in front, behind, and to the sides as it travels across the ground. The radiative pattern in the ground also depends on the orientation of the antenna and the resulting polarization of the electromagnetic energy as it travels into the ground. If a standard GPR antenna is used, where the transmitting and receiving antennas are perpendicular to the direction of transport along the ground, the elliptical pattern of illumination will tend to be elongated somewhat in the direction of transport. A further complexity arises due to polarization of waves as they leave the antenna and pass through the ground. The electric field generated by a dipole antenna is oriented parallel to the long axis of the antenna, which is usually perpendicular to the direction of transport across the ground. A linear object in the ground that is oriented parallel to this polarization would therefore produce a very strong reflection, as much of the energy is reflected. In contrast, a linear object in the ground perpendicular to the polarized electrical field will have little surface area parallel to the field with which to reflect energy; it will therefore reflect little energy and may be almost invisible. To minimize the amount of reflective data derived from the sides of a survey line, the long axes of the antennas are aligned perpendicular to the survey line. This elongates the cone of transmission in an in-line direction. Various other antenna orientations achieve different subsurface search patterns, but most of them are not used in standard GPR surveys (1). Some antennas, especially those in the low-frequency range from 80–300 MHz, are sometimes not well shielded and therefore radiate radar energy in all directions. Using unshielded antennas can generate reflections from a nearby person pulling the radar antenna or from any other objects nearby such as trees or buildings (40). Discrimination of individual targets, especially those of interest in the subsurface, can be difficult if these types of antennas are used. However, if the unwanted reflections generated from unshielded antennas all occur at approximately the same time, for instance from a person pulling the antennas, then they can be easily filtered out later, if the data are recorded digitally. If reflections are recorded from randomly located trees, surface obstructions, or people moving about near the antenna, usually they cannot easily be discriminated from important subsurface reflections, and interpreting the data is much more difficult.
If the transmitting antenna is properly shielded so that energy is propagated in a mostly downward direction, the angle of the conical radiative pattern can be estimated, depending on the center frequency of the antenna used (1). An estimate of this radiative pattern is especially important when designing line spacing within a grid, so that all subsurface features of importance are ‘‘illuminated’’ by the transmitted radar energy and therefore can potentially generate reflections. In general, the angle of the cone is defined by the relative dielectric permittivity of the material through which the waves pass and the frequency of the radar energy emitted from the antenna. An equation that can be used to estimate the width of the transmission beam at varying depths (footprint) is shown in Fig. 3. This equation (Eq. 3) can usually be used only as a rough approximation of real-world conditions because it assumes a consistent dielectric permittivity of the medium through which the radar energy passes. Outside of strictly controlled laboratory conditions, this is never the case. Sedimentary and soil layers within the earth have variable chemical constituents, differences in retained moisture, compaction, and porosity. These and other variables can create a complex layered system that has varying dielectric permittivities and therefore varying energy transmission patterns. Any estimate of the orientation of transmitted energy is also complicated by the knowledge that radar energy propagated from a surface antenna is not of one distinct frequency but can range in many hundreds of megahertz around the center frequency. If one were to make a series of calculations on each layer, assuming that all of the variables could be determined and assuming one distinct antenna frequency, then the ‘‘cone’’ of transmission would widen in some layers, narrow in others, and create a very complex three-dimensional pattern. The best one can usually do for most field applications is to estimate the radar beam configuration based on estimated field conditions (1). Antenna Frequency Constraints One of the most important variables in ground-penetrating radar surveys is selecting antennas that have the correct operating frequency for the depth necessary and the resolution of the features of interest (41). The center frequencies of commercial GPR antennas range from about 10–1200 megahertz (MHz) (15,37). Variations in the dominant frequencies of any antenna are caused by irregularities in the antenna’s surface or other electronic components located within the system. These types of variations are common in all antennas; each has its own irregularities and produces a different pulse signature and different dominant frequencies. This somewhat confusing situation with respect to transmission frequency is further complicated when radar energy is propagated into the ground. When radar waves move through the ground, the center frequency typically ‘‘loads down’’ to a lower dominant frequency (39). The new propagative frequency, which is almost always lower, will vary depending on the electric properties of near-surface soils and sediment that change the velocity of propagation
and the amount of ‘‘coupling’’ of the propagating energy with the ground. At present, there are few hard data that can be used to predict accurately what the ‘‘downloaded’’ frequency of any antenna will be under varying conditions. For most GPR applications, it is only important to be aware that there is a downloading effect that can change the dominant radar frequency and affect calculations of subsurface transmission patterns, penetration depth, and other parameters. In most cases, proper antenna frequency selection can make the difference between success and failure in a GPR survey and must be planned for in advance. In general, the greater the necessary depth of investigation, the lower the antenna frequency that should be used. Lower frequency antennas are much larger, heavier, and more difficult to transport to and within the field than high-frequency antennas. One 80-MHz antenna used for continuous GPR acquisition is larger than a 42-gallon oil drum cut in half lengthwise and weighs between 125 and 150 pounds. It is difficult to transport to and from the field and usually must be moved along transect lines by some form of wheeled vehicle or sled. In contrast, a 500-MHz antenna is smaller than a shoe box, weighs very little, and can easily fit into a suitcase (Fig. 2). Lower frequency antennas used for acquiring data by the step method are not nearly as heavy as those used in continuous data acquisition but are equally unwieldy. Low-frequency antennas (10–120 MHz) generate long-wavelength radar energy that can penetrate up to 50 meters in certain conditions but can resolve only very large subsurface features. In pure ice, antennas of this frequency have been known to transmit radar energy for many kilometers. Dry sand and gravel or unweathered volcanic ash and pumice are media that allow radar transmission to depths approaching 8–10 meters when lower frequency antennas are used. In contrast, the maximum depth of penetration of a 900-MHz antenna is about 1 meter or less in typical soils, but its reflections can resolve features as small as a few centimeters. There is therefore a trade-off between depth of penetration and subsurface resolution. Both are actually highly variable and depend on many site-specific factors such as overburden composition, porosity, and the amount of retained moisture. If large amounts of clay, especially wet clay, are present, the radar energy is attenuated very rapidly with depth, irrespective of its frequency. Rapid attenuation also occurs where sediments or soils are saturated with salty water, especially seawater. SUBSURFACE RESOLUTION The ability to resolve buried features is determined mostly by the frequency, and therefore the wavelength, of the radar energy transmitted into the ground. The wavelength necessary for resolution varies, depending on whether a three-dimensional object or an undulating surface is being investigated. For GPR to resolve a three-dimensional object, reflections from at least two surfaces, usually a top and a bottom interface, need to be distinct. Resolution
of a single buried planar surface, however, requires only one distinct reflection, and therefore wavelength is not as important in resolving it. An 80-MHz antenna generates an electromagnetic wave about 3.75 meters long when transmitted in air. When the wavelength in air is divided by the square root of the RDP of the material through which it passes, the subsurface wavelength can be estimated. For example, when an 80-MHz wave travels through material whose RDP is 5, its wavelength decreases to about 1.6 meters. A 300-MHz antenna generates a radar wave whose wavelength is 1 meter in air and decreases to about 45 centimeters in material whose RDP is 5. To distinguish reflections from two parallel planes (the top and bottom of a buried object, for instance), they must be separated by at least one wavelength of the energy that is passing through the ground (1,2). If the two reflections are not separated by one wavelength, the resulting reflected waves from the top and bottom will either be destroyed or be unrecognizable due to constructive and destructive interference. When two interfaces are separated by more than one wavelength, however, two distinct reflections are generated, and the top and bottom of the feature can be resolved. If only one buried planar surface is being mapped, the first arrival reflected from that interface can be accurately resolved, independent of the wavelength. This is more difficult when the buried surface is highly irregular or undulating. Reflections from buried surfaces that have been generated by longer wavelength radar waves tend to be less sharp when viewed together in a standard GPR profile, and therefore many small irregularities on the buried surface are not visible. This occurs because the conical radiation pattern of an 80-MHz antenna is about three times broader than that of a 300-MHz antenna (1). The reflected data received at the surface from the lower frequency antenna have therefore been reflected from a much greater subsurface area, which averages out the small percentage of reflections that come from the smaller irregular features. A reflective profile produced from 80-MHz reflections is therefore an averaged and less accurate representation of a buried surface. In contrast, a 300-MHz transmission cone is about three times narrower than an 80-MHz radar beam, and its resolution of subsurface features on the same buried surface is much greater. Radar energy that is reflected from a buried subsurface interface that slopes away from a surface transmitting antenna is reflected away from the receiving antenna and will be lost. Such a sloping interface would go unnoticed in reflective profiles. A buried surface of this orientation is visible only if an additional traverse is acquired in an orientation in which the same buried interface slopes toward the surface antennas. This is one reason that it is important always to acquire lines of data within a closely spaced surface grid. The amount of reflection from a buried feature is also determined by the ratio of the object's dimension to the wavelength of the radar wave in the ground. Short wavelength (high-frequency) radar waves can resolve very small features but will not penetrate to a great depth.
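The wavelength arithmetic just described, and the related footprint estimate discussed earlier, can be sketched in a few lines of Python. The subsurface wavelength follows directly from the text (the wavelength in air, c/f, reduced by the square root of the RDP); the footprint expression used here, A = λ/4 + D/√(K − 1), is the approximation given by Conyers and Goodman (1) — Eq. 3 itself appears only in Fig. 3 — so it should be read as an assumption of this sketch rather than as a reproduction of that equation.

import math

def subsurface_wavelength_m(freq_mhz, rdp):
    """Approximate radar wavelength in the ground (metres).

    lambda_air = c / f; in a material of relative dielectric permittivity
    (RDP) K the wavelength shrinks by a factor sqrt(K).
    """
    wavelength_air = 299.79 / freq_mhz      # c ~ 3e8 m/s, f in MHz -> metres
    return wavelength_air / math.sqrt(rdp)

def can_resolve_top_and_bottom(thickness_m, freq_mhz, rdp):
    """One-wavelength rule of thumb: the two interfaces must be separated by
    at least one subsurface wavelength to give two distinct reflections."""
    return thickness_m >= subsurface_wavelength_m(freq_mhz, rdp)

def footprint_radius_m(depth_m, freq_mhz, rdp):
    """Approximate radius of the illuminated 'footprint' at a given depth.

    Assumed form (after Conyers and Goodman): A = lambda/4 + D / sqrt(K - 1).
    """
    lam = subsurface_wavelength_m(freq_mhz, rdp)
    return lam / 4.0 + depth_m / math.sqrt(rdp - 1.0)

# The 80-MHz and 300-MHz examples from the text, for a material of RDP = 5:
for f in (80.0, 300.0):
    print(f, "MHz ->", round(subsurface_wavelength_m(f, 5.0), 2), "m in ground,",
          "footprint radius at 2 m depth:",
          round(footprint_radius_m(2.0, f, 5.0), 2), "m")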
Longer wavelength radar energy will resolve only larger features but will penetrate deeper into the ground. Some features in the subsurface may be described as ‘‘point targets,’’ and others are similar to planar surfaces. Planar surfaces can be stratigraphic and soil horizons or large flat archaeological features such as pit-house floors. Point targets are features such as tunnels, voids, artifacts, or any other nonplanar objects. Depending on a planar surface's thickness, reflectivity, orientation, and depth of burial, it is potentially visible with data of any frequency, constrained only by the conditions discussed before. Point sources, however, often have little surface area with which to reflect radar energy and therefore are usually difficult to identify and map. They are sometimes indistinguishable from the surrounding material, and many times they are visible only as small reflective hyperbolas on one line within a grid (Fig. 1). In most geologic and archaeological settings, the materials through which radar waves pass contain many small discontinuities that reflect energy. These can be described only as clutter (that is, if they are not the target of the survey). What constitutes clutter depends on the wavelength of the radar energy propagated. If both the features to be resolved and the discontinuities that produce the clutter are of the order of one wavelength, the reflective profiles will appear to contain only clutter, and there can be no discrimination between the two. Clutter can also be produced by large discontinuities, such as cobbles and boulders, but only when a lower frequency antenna that produces a long wavelength is used. In all cases, the feature to be resolved, if not a large planar surface, should be much larger than the clutter and greater than one wavelength. Buried features, whether planar or point sources, also must not be too small relative to their depth of burial, or they become undetectable. As a basic guideline, the cross-sectional area of the target to be illuminated should approximate the size of the ‘‘footprint’’ of the beam at the target depth (Eq. 3 in Fig. 3). If the target is much smaller than the footprint, only a fraction of the reflected energy returned to the surface will have been reflected from the buried feature. Any reflections returned from the buried feature in this case may be indistinguishable from background reflections and will be invisible on reflective profiles. Frequency Interference Ground-penetrating radar employs electromagnetic energy at frequencies that are similar to those used in television, FM radio, and other radio communication bands. If there is an active radio transmitter in the vicinity of the survey, there may be some interference with the recorded signal. Most radio transmitters, however, have quite a narrow bandwidth and, if known in advance, an antenna frequency can be selected that is as far away as possible from any frequencies that might generate spurious signals in the reflected data. The wide bandwidth of most GPR systems usually makes it difficult to avoid such external transmitter effects completely, and any major adjustments in antenna frequency may affect the survey objectives. Usually, this
Figure 4. Ground-penetrating radar ray paths reflected from an undulating surface and a deep ditch. Convex upward surfaces scatter radar energy while concave upward focus. Very deep features tend to scatter most energy and are hard to detect using GPR.
becomes a problem only if the site is located near a military base, airport, or radio transmission antennas. Cellular phones and walkie-talkies that are in use nearby during the acquisition of GPR data can also create noise in recorded reflective data and should not be used during data collection. This type of radio ‘‘noise’’ can usually be filtered out during postacquisition data processing. Focusing and Scattering Effects Reflection from a buried surface that contains ridges or troughs can either focus or scatter radar energy, depending on its orientation and the location of the antenna on the ground surface. If a subsurface plane is slanted away from the surface antenna location or is convex upward, most energy will be reflected away from the antenna, and no reflection or a very low amplitude reflection will be recorded (Fig. 4). This is termed radar scatter. The opposite is true when the buried surface is tipping toward the antenna or is concave upward. Reflected energy in this case will be focused, and a very high-amplitude reflection derived from the buried surface would be recorded. Figure 4 is an archaeological example of the focusing and scattering effects when a narrow buried moat is bounded on one side by a trough and on the other side by a mound. Both convex and concave upward surfaces would be ‘‘illuminated’’ by the radar beam as the antenna is pulled along the ground surface. When the radar antenna is located to the left of the deep moat (Fig. 4) some of the reflections are directed to the surface antenna, but there is still some scattering, and a weak reflection will be recorded from the buried surface. When the antenna is located directly over the deep trough, there will be a high degree of scattering, and much of the
radar energy, especially that which is reflected from the sides of the moat, will be directed away from the surface antenna and lost. This scattering effect will make the narrow moat invisible in GPR surveys. When the antenna is located directly over the wider trough to the right of the moat, there will be some focusing of the radar energy that creates a higher amplitude reflection from this portion of the subsurface interface. TWO-DIMENSIONAL GPR IMAGES The standard image for most GPR reflective data is a twodimensional profile that shows the depth on the ordinate and the distance along the ground on the abscissa. These image types are constructed by stacking many reflective traces together that are obtained as the antennas are moved along a transect (Figs. 1 and 5). Profile depths are usually measured in two-way radar travel time, but time can be converted to depth, if the velocity of radar travel in the ground is obtained. Reflective profiles are most often displayed in gray scale, and variations in the reflective amplitudes are measured by the depth of the shade of gray. Color palettes can also be applied to amplitudes in this format. Often, two-dimensional profiles must be corrected to reflect changes in ground elevation. Only after this is done will images correctly represent the real world. This process, which is usually important only when topographic changes are great, necessitates detailed surface mapping of each transect within the data grid and then reprocessing each transect by adjusting all reflective traces for surface elevation.
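As a rough illustration of the two corrections mentioned in this section — converting two-way travel time to depth and adjusting traces for surface elevation — the following Python sketch shifts each trace to a common datum using an assumed average velocity. The array layout, velocity, and sampling interval are illustrative assumptions, not values from any particular system.

import numpy as np

def twt_to_depth_m(twt_ns, velocity_m_per_ns):
    """Convert two-way travel time (ns) to depth (m): d = v * t / 2."""
    return velocity_m_per_ns * twt_ns / 2.0

def apply_elevation_correction(profile, elevations_m, velocity_m_per_ns, dt_ns):
    """Shift each trace down by its elevation deficit relative to a datum.

    profile      : (n_samples, n_traces) array of reflection amplitudes
    elevations_m : surface elevation of each trace (n_traces,)
    velocity_m_per_ns : average radar velocity used to turn an elevation
                        difference into a two-way time shift
    dt_ns        : time sampling interval of the traces
    """
    datum = elevations_m.max()
    shifts = np.round(2.0 * (datum - elevations_m)
                      / velocity_m_per_ns / dt_ns).astype(int)
    n_samples, n_traces = profile.shape
    out = np.zeros((n_samples + shifts.max(), n_traces))
    for j in range(n_traces):
        out[shifts[j]:shifts[j] + n_samples, j] = profile[:, j]
    return out

# Example: 0.1 m/ns average velocity, 0.2 ns sampling, gently rising ground.
profile = np.zeros((100, 5)); profile[50, :] = 1.0      # flat reflector
elev = np.array([10.0, 10.2, 10.4, 10.6, 10.8])
corrected = apply_elevation_correction(profile, elev, 0.1, 0.2)
print(corrected.shape, [int(np.argmax(corrected[:, j])) for j in range(5)])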
Figure 5. A vertical GPR profile perpendicular to a buried tunnel illustrating the hyperbolic reflection generated from a point source.
Standard two-dimensional images can be used for most basic data interpretation, but analysis can be tedious if many profiles are in the database. In addition, the origins of each reflection in each profile must sometimes be defined before accurate subsurface maps can be produced. Accurate image definition comes only with a good deal of interpretive experience. As an aid to reflection interpretation, two-dimensional computer models of expected buried features or stratigraphy can be produced, which creates images of what things should look like in the ground for comparative purposes (1,24). THREE-DIMENSIONAL GPR IMAGING USING AMPLITUDE ANALYSIS The primary goal of most GPR surveys is to identify the size, shape, depth, and location of buried remains and related stratigraphy. The most straightforward way to accomplish this is by identifying and correlating important reflections within two-dimensional reflective profiles. These reflections can often be correlated from profile to profile throughout a grid, which can be very time-consuming. Another, more sophisticated type of GPR data manipulation is amplitude slice-map analysis, which creates maps of reflected wave amplitude differences within a grid. The result can be a series of maps that illustrate the three-dimensional location of reflective anomalies derived from a computer analysis of the two-dimensional profiles.
This method of data processing can be accomplished only with a computer using GPR data that are stored digitally. The raw reflective data collected by GPR are nothing more than a collection of many individual traces along two-dimensional transects within a grid. Each of those reflective traces contains a series of waves that vary in amplitude, depending on the amount and intensity of energy reflection that occurred at buried interfaces. When these traces are plotted sequentially in standard two-dimensional profiles, the specific amplitudes within individual traces that contain important reflective information are usually difficult to visualize and interpret. The standard interpretation of GPR data, which consists of viewing each profile and then mapping important reflections and other anomalies, may be sufficient when the buried features are simple and interpretation is straightforward. In areas where the stratigraphy is complex and buried materials are difficult to discern, different processing and interpretive methods, one of which is amplitude analysis, must be used. In the past when GPR reflective data were often collected that had no discernible reflections or recognizable anomalies of any sort, the survey was usually declared a failure, and little if any interpretation was conducted. Due to the advent of more powerful computers and sophisticated software programs that can manipulate large sets of digital data, important subsurface information in the form of amplitude changes within the reflected waves has been extracted from these types of GPR data. An analysis of the spatial distribution of the amplitudes of reflected waves is important because it is an indicator of subsurface changes in lithology or other physical properties. The higher the contrasting velocity at a buried interface, the greater the amplitude of the reflected wave. If amplitude changes can be related to important buried features and stratigraphy, the location of higher or lower amplitudes at specific depths can be used to reconstruct the subsurface in three dimensions. Areas of low-amplitude waves indicate uniform matrix material or soils, and those of high amplitude denote areas of high subsurface contrast such as buried archaeological features, voids, or important stratigraphic changes. To be correctly interpreted, amplitude differences must be analyzed in ‘‘time slices’’ that examine only changes within specific depths in the ground. Each time slice consists of the spatial distribution of all reflected wave amplitudes, which are indicative of these changes in sediments, soils, and buried materials. Amplitude time slices need not be constructed horizontally or even in equal time intervals. They can vary in thickness and orientation, depending on the questions being asked. Surface topography and the subsurface orientation of features and the stratigraphy of a site may sometimes necessitate constructing slices that are neither uniform in thickness nor horizontal. To compute horizontal time slices, the computer compares amplitude variations within traces that were recorded within a defined time window. When this is done, both positive and negative amplitudes of reflections are compared to the norm of all amplitudes within that window. No differentiation is usually made between
positive or negative amplitudes in these analyses, only the magnitude of amplitude deviation from the norm. Low-amplitude variations within any one slice denote little subsurface reflection and therefore indicate the presence of fairly homogeneous material. High amplitudes indicate significant subsurface discontinuities and in many cases detect the presence of buried features. An abrupt change between an area of low and high amplitude can be very significant and may indicate the presence of a major buried interface between two media. Degrees of amplitude variation in each time slice can be assigned arbitrary colors or shades of gray along a nominal scale. Usually, there are no specific amplitude units assigned to these color or tonal changes.
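A minimal Python sketch of the slice computation described above follows. It assumes the grid of traces is stored as a three-dimensional array and uses the mean absolute amplitude in each time window, so that positive and negative half-cycles contribute alike; the normalization and window thickness are arbitrary choices for illustration.

import numpy as np

def amplitude_time_slice(volume, t_start, t_end, dt_ns):
    """Mean absolute reflection amplitude within a two-way-time window.

    volume : 3D array (n_lines, n_traces_per_line, n_samples) holding the
             digitized traces of a whole survey grid
    t_start, t_end : window limits in nanoseconds
    dt_ns  : sampling interval of the traces
    """
    i0, i1 = int(t_start / dt_ns), int(t_end / dt_ns)
    return np.abs(volume[:, :, i0:i1]).mean(axis=2)

def normalized_slices(volume, slice_ns, dt_ns):
    """Stack of slice maps, each scaled to its own mean amplitude so that
    relative highs and lows can be compared between depths."""
    n_samples = volume.shape[2]
    n_slices = int(n_samples * dt_ns / slice_ns)
    maps = []
    for k in range(n_slices):
        m = amplitude_time_slice(volume, k * slice_ns, (k + 1) * slice_ns, dt_ns)
        maps.append(m / m.mean())
    return np.stack(maps)

# Example: 20 lines x 200 traces x 512 samples at 0.2 ns, 7-ns-thick slices.
rng = np.random.default_rng(1)
grid = 0.1 * rng.standard_normal((20, 200, 512))
grid[8:12, 60:90, 150:170] += 1.0            # a strong buried reflector
slices = normalized_slices(grid, 7.0, 0.2)
print(slices.shape, float(slices[int(150 * 0.2 / 7)].max()) > 2.0)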
EXAMPLES OF THREE-DIMENSIONAL GPR MAPPING USING TIME SLICES Archaeological applications of GPR mapping have been expanding in the last decade, as the prices of data acquisition and processing systems have decreased and the image producing software has expanded. One area of archaeological success with GPR is the high plateau and desert areas of Colorado, Utah, New Mexico and Arizona, an area of abundant buried archaeological remains, including pit houses, kivas (semisubterranean circular pit features used for ceremonial activities), and storage pits. The climate and geological processes active in this area produce an abundance of dry sandy sediments and soil, an excellent medium for GPR energy penetration. Traditional archaeological exploration and mapping methods used for discovering buried sites include visual identification of artifacts in surface surveys, random test pit excavation, and analysis of subtle topographic features; all of them may indicate the presence of buried features. These methods can sometimes be indicative of buried sites, but they are extremely haphazard and random and often lead to misidentification or nonidentification of features. At a site near Bluff, Utah, a local archaeologist used some of these techniques to map what he considered was a large pit-house village. The area is located in the floodplain of the San Juan River, an area that was subjected to repeated floods during prehistoric time that often buried low lying structures in fluvial sediment. In a grid that was roughly 50 × 30 meters in dimension, surface surveys had located four or five topographic depressions that appeared to be subtle expressions of pit houses in what was presumably a small buried village. Lithic debris from stone tool manufacture as well as abundant ceramic sherds were found in and around these depressions and further enhanced this preliminary interpretation. A GPR survey was conducted over this prospective area, using paired 500-MHz antennas that transmitted data to a maximum depth of about 2 meters (42). While data were being acquired, reflection profiles were viewed on a computer monitor and were recorded digitally. A preliminary interpretation of the raw data in the field showed no evidence of pit-house floors in the areas that contained the depressions. Surprisingly, a large distinct floor was located in one corner of the grid, an area
Figure 6. Amplitude slice-map of a layer from 1.2–1.5 meters in the ground. The anomalous high amplitude reflections in the bottom right are from the floor and features on the floor of a buried pit-house covered in river sediment.
not originally considered prospective (Fig. 6). Velocity information, obtained in a nearby pit being dug for a house foundation, was used to convert radar travel time to depth. An amplitude time-slice map was then constructed in a slice from about 1.2–1.5 meters deep, a slice that would encompass the pit-house floor and all subfloor features. A map of the high amplitudes in this slice shows an irregularly shaped floor that has a possible antechamber and an entrance at opposite sides of the pit structure (Fig. 6). To confirm this interpretation derived only from the GPR maps, nine core holes were dug on and around the feature. All holes dug within the mapped feature encountered a hard-packed floor covered with fire-cracked rock, ceramic sherds and even a small bone pendant at exactly the depth predicted from the GPR maps. Those cores, drilled outside the pit house and in the area of the shallow depressions originally considered the location of the houses, encountered only hard, partially cemented fluvial sediment without archaeological remains. This GPR survey demonstrates the advantages of performing GPR surveys in conjunction with typical surface topography and artifact distribution mapping. The standard methods of site exploration indicated the presence of nearby pit houses, but both the artifact distributions and the subtle depressions pointed to the wrong area. If only these indicators were used as a guide to subsurface testing, it is doubtful that any archaeological features would have been discovered. Only when used in conjunction with the GPR data was the pit house discovered. It is not known at this time what may have created the subtle depressions that were originally interpreted as pit houses. It is likely that the artifact
and lithic scatters noticed on the surface were produced by rodent burrowing, which brought these materials from depth and then concentrated them randomly across the site. A cautionary lesson about how changing conditions can affect GPR mapping was learned at this site when a second GPR survey over the known pit house was conducted a few months later after a large rain storm. This survey produced no significant horizontal reflections in the area of the confirmed pit house, but many random nonhorizontal reflections throughout the grid; none of them looked like house floors. These anomalous reflections were probably produced by pockets of rainwater that had been differentially retained in the sediments. At a well known archaeological site, also near Bluff, Utah, a second GPR survey was performed in an area where a distinct surface depression indicated the presence of a Great Kiva, a large semisubterranean structure typical of Pueblo II sites in the American Southwest (42). A 30 × 40 meter GPR survey, using both 300- and 500-MHz antennas, was conducted over this feature for use as a guide to future excavation. Individual GPR profiles of both frequencies showed only a bowl-shaped feature, which appeared to be filled with homogeneous material that had no significant reflection (Fig. 7). There were no discernible features within the depression that would correspond to floor features or possible roof support structures.
Amplitude time-slice maps were then produced for the grid in the hope that subtle changes in amplitude, not visible to the human eye in normal reflection profiles, might be present in the data. When this was completed, the slice from 1.33 to 1.54 meters in depth (Fig. 8) showed a square feature deep within the depression, which, it was later found in two excavation trenches, was the wall of a deeper feature within the depression (42). The origin and function of this feature is not yet known. What can be concluded from this exercise in GPR data processing is that the computer can produce images of
Figure 7. A vertical GPR profile across a buried kiva (a semisubterranean pit structure) in Utah, USA.
Figure 8. Two amplitude slice-maps across the buried kiva shown in Figure 7. The slice from .5–1.0 meters shows the circular wall of the kiva as high amplitude reflections. It is a discontinuous circle because the wall is partially collapsed. The slice from 1.25–1.5 meters shows a more square feature within the kiva that was found to be interior walls during excavations.
subtle features that cannot be readily processed by the human brain. Without this type of GPR processing, this deep feature would most likely not have been discovered or excavated. The most straightforward application of three-dimensional amplitude analysis is the search for buried utilities. Often, near-surface metal water pipes or electrical conduits can be discovered by using metal detectors or magnetometers. But these methods will not work if the buried utilities are made of clay, plastic, or other nonmagnetic material. Because GPR reflections are produced at the contact between any two types of buried materials, reflections from many buried nonmagnetic pipes and conduits will occur. Tunnels and tubes filled with air are especially visible and produce very high-amplitude reflections. In Fig. 9, two amplitude slices are shown in an area where it was thought that a buried electrical conduit existed. Records were available giving the approximate depth of the conduit, which had been buried about 5 years before the GPR data were acquired. The actual location of the buried pipe and its orientation were not known. The conduit was immediately visible as a point-source hyperbola on the computer screen during acquisition when the antennas crossed it. Using the approximate depth from the old records and the radar travel time measured in the field, an average velocity of radar travel through the ground was calculated. Amplitude slices were then constructed from the reflective data, and the lowest slice was most likely to include the buried conduit. The upper slices show only minor changes in amplitude, relating to changes in soil character. The pipe is easily discerned in the slice from 14 to 28 nanoseconds, and each bend is imaged. The image from this depth is somewhat complicated due to reflections from the side of the trench within which the conduit was placed.
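The velocity calculation described for the conduit survey amounts to simple arithmetic, sketched below in Python with illustrative numbers (not the values from the survey itself): a known burial depth and the measured two-way time to the hyperbola apex give an average velocity, which in turn converts a depth interval into a time window for slicing.

def average_velocity_m_per_ns(depth_m, two_way_time_ns):
    """v = 2 d / t for a reflector at known depth d and two-way time t."""
    return 2.0 * depth_m / two_way_time_ns

def depth_to_time_window_ns(top_m, bottom_m, velocity_m_per_ns):
    """Two-way-time window that brackets a depth interval at this velocity."""
    return 2.0 * top_m / velocity_m_per_ns, 2.0 * bottom_m / velocity_m_per_ns

# Illustrative values only: a conduit recorded as ~0.7 m deep whose apex
# reflection arrives at ~20 ns gives v = 0.07 m/ns; a 14-28 ns slice would
# then span roughly 0.5-1.0 m.
v = average_velocity_m_per_ns(0.7, 20.0)
print(round(v, 3), tuple(round(t, 1) for t in depth_to_time_window_ns(0.5, 1.0, v)))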
Figure 9. Two amplitude slice-maps showing a buried electrical cable. The slice from 2–14 nanoseconds shows only soil changes near the surface. From 14–28 nanoseconds the cable is clearly visible as high amplitude reflections.
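Point targets such as the tunnel of Fig. 5 and the conduit of Fig. 9 produce hyperbolic reflections, and, as noted in the concluding section that follows, collapsing such hyperbolas by migration requires a velocity estimate. One common way to obtain an estimate is to fit the hyperbola itself; the Python sketch below does this for synthetic travel-time picks and is an illustration only, with all names and values assumed.

import numpy as np

def hyperbola_twt_ns(x_m, x0_m, depth_m, v_m_per_ns):
    """Two-way travel time over a point source buried at (x0, depth):
    t(x) = (2 / v) * sqrt(depth^2 + (x - x0)^2)."""
    return 2.0 * np.sqrt(depth_m ** 2 + (x_m - x0_m) ** 2) / v_m_per_ns

def fit_velocity(x_m, twt_ns, x0_m):
    """Least-squares estimate of velocity and depth from picked hyperbola
    travel times, using t^2 = (4 / v^2) * (depth^2 + (x - x0)^2)."""
    # t^2 is linear in (x - x0)^2: slope = 4/v^2, intercept = 4 d^2 / v^2.
    slope, intercept = np.polyfit((x_m - x0_m) ** 2, twt_ns ** 2, 1)
    v = 2.0 / np.sqrt(slope)
    depth = 0.5 * v * np.sqrt(intercept)
    return v, depth

# Synthetic picks for a tunnel-like point target at 1.5 m depth, v = 0.1 m/ns.
x = np.linspace(-2.0, 2.0, 21)
t = hyperbola_twt_ns(x, 0.0, 1.5, 0.1)
print(tuple(round(p, 3) for p in fit_velocity(x, t, 0.0)))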
CONCLUSIONS AND PROSPECTS FOR THE FUTURE Ground-penetrating radar imaging of near-surface features is still very much in its infancy. Accurate three-dimensional maps can be made from two-dimensional reflective data, manually and using the amplitude slice-map method. But these maps are really constructed only from a series of two-dimensional data sets. Because only one transmitting and one receiving antenna are used (the acquisition method typical today) and because subsurface energy is abundantly refracted, reflected, and scattered, it can sometimes be difficult to determine the exact source in the ground of many of the reflections recorded. A number of data acquisition and processing methods are being developed that may alleviate some of these problems. One simple data processing method that offers a way to remove the unwanted tails of reflective hyperbolas is data migration. If the velocity of radar travel in the ground can be calculated, the limbs of each hyperbola can be collapsed back to their source before amplitude slice maps are produced. This process is standard procedure in the seismic processing used by petroleum exploration companies. Good velocity analysis and a knowledge of the origin of hyperbolic reflections are necessary for this type of processing. A very sophisticated data acquisition method presently under development will allow acquiring true three-dimensional reflective data. Also a modification of seismic petroleum exploration practice, this procedure would place many receiving antennas on the ground within a grid; each would simultaneously ‘‘listen’’ for reflected waves and record the received signal on its own channel. One transmitting antenna would then be moved around the grid in an orderly fashion, while the receiving antennas record the reflected waves from many different locations. In this way, real three-dimensional data are acquired, in a method called tomography. The processing procedure, which can
manipulate massive amounts of multichannel data, is still being developed. Ever more powerful computers and the advancement of true three-dimensional data acquisition and processing will soon make it possible to produce rendered images of buried features. A number of prototype rendering programs have been developed, all of which show much promise. In the near future, clear images of buried features in the ground will be produced from GPR reflections, soon after data are acquired in the field; this will allow researchers to modify acquisition parameters, recollect data while the equipment is still on location, and produce very precise maps of subsurface materials. BIBLIOGRAPHY 1. L. B. Conyers and D. Goodman, Ground-Penetrating Radar: An Introduction for Archaeologists, AltaMira Press, Walnut Creek, CA, 1997. 2. J. L. Davis and A. P. Annan, in J. S. Pilon, ed., Ground Penetrating Radar, Geological Survey of Canada paper 904:49–56, 1992. 3. P. K. Fullagar and D. Livleybrooks, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 883–894. 4. U. Basson, Y. Enzel, R. Amit, and Z. Ben-Avraham, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, 1994, pp. 777–788. 5. S. van Heteren, D. M. Fitzgerald, and P. S. McKinlay, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 869–881. 6. H. M. Jol and D. G. Smith, Can. J. Earth Sci. 28, 1939–1947 (1992). 7. S. Deng, Z. Zuo, and W. Huilian, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1,115–1,133. 8. L. Bjelm, Geologic Interpretation of SIR Data from a Peat Deposit in Northern Sweden, Lund Institute of Technology, Dept. of Engineering Geology. Lund, Sweden, 1980. 9. J. C. Cook, Geophysics 40, 865–885 (1975). 10. L. T. Dolphin, R. L. Bollen, and G. N. Oetzel, Geophysics 39, 49–55 (1974). 11. D. L. Moffat and R. J. Puskar, Geophysics 41, 506–518 (1976). 12. M. E. Collins, in H. Pauli and S. Autio, eds., Fourth International Conference on Ground-Penetrating Radar, June 8–13, Rovaniemi, Finland. Geological Survey of Finland Special Paper, 16:125–132, 1992. 13. J. A. Doolittle, Soil Surv. Horizons 23, 3–10 (1982). 14. J. A. Doolittle and L. E. Asmussen, in H. Pauli and S. Autio, eds., Fourth International Conference on Ground-Penetrating Radar, June 8–13, Rovaniemi, Finland. Geological Survey of Finland Special Paper, 16:139–147, 1992. 15. C. G. Olson and J. A. Doolittle, Soil Sci. Soc. Am. J. 49, 1,490–1,498 (1985). 16. R. W. Johnson, R. Glaccum, and R. Wotasinski, Soil Crop Sci. Soc. Proc. 39, 68–72 (1980).
17. S. F. Shih and J. A. Doolittle, Soil Sci. Soc. Am. J. 48, 651–656 (1984). 18. L. Beres and H. Haeni, Groundwater 29, 375–386 (1991). 19. R. A. van Overmeeren, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1,057–1,073. 20. L. B. Conyers, Geoarchaeology 10, 275–299 (1995). 21. T. Imai, S. Toshihiko, and T. Kanemori, Geophysics 52, 137–150 (1987). 22. D. Goodman and Y. Nishimura, Antiquity 67, 349–354 (1993). 23. D. Goodman, Y. Nishimura, R. Uno, and T. Yamamoto, Archaeometry 36, 317–326 (1994). 24. D. Goodman, Geophysics 59, 224–232 (1994). 25. D. Goodman, Y. Nishimura, and J. D. Rogers, Archaeological Prospection 2, 85–89 (1995). 26. C. J. Vaughan, Geophysics 51, 595–604 (1986). 27. B. W. Bevan, Ground-Penetrating Radar at Valley Forge, Geophysical Survey Systems Inc., North Salem, NH, 1977. 28. A. P. Annan and J. L. Davis, in Ground Penetrating Radar, J. A. Pilon, ed., Geological Survey of Canada, Paper 904:49–55, 1992. 29. D. C. Wright, G. R. Olhoeft, and R. D. Watts, in Proceedings of the National Water Well Association Conference on Surface and Borehole Geophysical Methods, 1984, pp. 666–680. 30. G. R. Olhoeft, in Physical Properties of Rocks and Minerals, Y. S. Touloukian, W. R. Judd, and R. F. Roy, eds., McGrawHill, New York, 1981, pp. 257–330. 31. P. V. Sellman, S. A. Arcone, and A. J. Delaney, Cold Regions Research and Engineering Laboratory Report 83-11, 1–10 (1983). 32. A. P. Annan, W. M. Waller, D. W. Strangway, J. R. Rossiter, J. D. Redman, and R. D. Watts, Geophysics 40, 285–298 (1975). 33. A. R. von Hippel, Dielectrics and Waves, MIT Press, Cambridge, MA, 1954. 34. M. B. Dobrin, Introduction to Geophysical Prospecting, McGraw-Hill, NY, 1976. 35. R. E. Sheriff, Encyclopedic Dictionary of Exploration Geophysics, Society of Exploration Geophysics, Tulsa, OK, 1984. 36. S. A. Arcone, J. Appl. Geophys. 33, 39–52 (1995). 37. A. P. Annan and S. W. Cosway, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 747–760. 38. J. L. Davis and A. P. Annan, Geophysics 37, 531–551 (1989). 39. N. Engheta, C. H. Papas, and C. Elachi, Radio Sci. 17, 1557– 1566 (1982). 40. E. Lanz, L. Jemi, R. Muller, A. Green, A. Pugin, and P. Huggenberger, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 1261–1274. 41. P. Huggenberger, E. Meier, and M. Beres, in Proceedings of the Fifth International Conference on Ground Penetrating Radar, Walnut Creek, CA, 1994, pp. 805–815. 42. L. B. Conyers and C. M. Cameron, J. Field Archaeology 25, 417–430 (1998).
H
HIGH RESOLUTION SECONDARY ION MASS SPECTROSCOPY IMAGING
RICCARDO LEVI-SETTI
KONSTANTIN L. GAVRILOV
The University of Chicago, Chicago, IL
INTRODUCTION
Several modern microanalytical techniques strive to describe the chemical composition of materials by images at ever increasing spatial resolution and sensitivity. Digital micrographs that depict the two-dimensional distribution of selected constituents (analytical maps) across small areas of a sample are one of the most effective vehicles for recording the retrieved information quantitatively. One technique in particular, secondary ion mass spectroscopy or, more appropriately, spectrometry (SIMS) imaging, has been advanced during the last two decades to reach analytical image resolution of a few tens of nanometers. High-resolution SIMS imaging has become practical due to the development of finely focused scanning ion probes of high brightness, incorporated in scanning ion microscopes or microprobes (SIM), as will be illustrated in this context. SIMS images will be shown that embody a wealth of correlative interwoven information in a most compact form. These can be thought of as two-dimensional projections of an abstract multidimensional space that encompasses the spatial dimensions (physical structure), the mass or atomic number dimension (chemical structure), and, as a further variable, the concentration of each constituent in a sample (quantification). In this paper, we will review the fundamental principles that underlie the formation of SIMS images, the requirements and limitations in attaining high spatial resolution (<100 nm), the engineering details of a state-of-the-art SIM, and a number of applications, illustrated by images, in the materials sciences. For SIMS applications in the range of ∼1 µm, typical of the more accessible ion microscopes, the reader is referred to the excellent Wiley series of proceedings of the biennial, international SIMS conferences.
FUNDAMENTALS OF SIMS IMAGING
Secondary ion mass spectroscopy or spectrometry (SIMS) exploits the interaction of fast primary ions with solid matter to obtain information about the structure and chemical composition of the bombarded material. Under ion bombardment, the atomic constituents of the near surface of a solid are ejected as positive or negative low-energy secondary ions or sputtered as neutral atoms. The charged secondary ions can be collected directly, energy-analyzed, and mass-separated, according to their mass-to-charge ratio, by a mass filter or spectrometer to provide mass-resolved signals adequate for SIMS microanalysis in the form of mass spectra, depth profiling, and imaging. The last results in creating two-dimensional compositional maps of the analyzed surface. The neutral sputtered atoms can also be collected and identified by various methods of postionization. Secondary electrons are also abundantly emitted in the primary ion bombardment process, and provide signals suitable for imaging the surface topography or to yield material contrast. When the primary ion probe is rastered across the sample, these signals yield images analogous to those obtained by the scanning electron microscope (SEM). It is customary to classify the conditions by which this form of ion analysis is performed into two categories: ‘‘static’’ and ‘‘dynamic’’ SIMS (1). The former refers to primary ion bombardment conditions that affect, essentially, only the topmost monolayer of the sample surface. The latter refers to conditions that perturb the near-surface layers of the sample by complex interactions of the impinging ions with the target material and lead to rapid sputter-erosion of the sample. Dynamic SIMS is generally carried out with primary ion fluences higher than those employed for static SIMS. Reactive ion species are often used in dynamic SIMS to enhance secondary ion yields. Under dynamic SIMS conditions generally necessary for SIMS imaging, the surface of the material is continually eroded at a controllable rate, and new layers of the sample are sequentially exposed. This process, which is effectively analytical tomography on a microscale, permits the three-dimensional reconstruction of the chemical and structural constitution of a volume of the target object. Before describing the methods used for SIMS image formation and discussing issues of analytical sensitivity and image resolution, it is important to recall the fundamental physical processes that govern secondary ion emission and that ultimately contribute to the availability of signals suitable for forming analytical images or maps. The feasibility of SIMS depends primarily on the sputtering yields (2) and on the ionization probabilities of the sputtered atoms (3,4). In fact, for a species A present in a sample at concentration CA, the detected secondary ion current IA can be expressed as

IA = IP SA γA ηA CA , (1)

where IP is the probe current, SA is the sputtering yield (the number of sputtered atoms per incident ion), γA is the fraction of atoms that emerge as charged ions, and ηA is a detection efficiency factor. The product

YA = γA ηA (2)

represents the overall ion yield, also called the useful yield.
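Equations (1) and (2) can be turned into a rough count-rate estimate, which is useful for judging whether a given constituent can be imaged at all. The Python sketch below uses purely illustrative values for the probe current, sputtering yield, ion fraction, and detection efficiency; none of them are measured parameters of the instruments described later in this article.

E_CHARGE_C = 1.602e-19  # elementary charge

def detected_ion_rate_per_s(probe_current_a, sputter_yield, ion_fraction,
                            detection_eff, concentration):
    """Eq. (1): I_A = I_P * S_A * gamma_A * eta_A * C_A, expressed here as a
    count rate (ions per second) for species A at atomic concentration C_A."""
    primary_ions_per_s = probe_current_a / E_CHARGE_C
    useful_yield = ion_fraction * detection_eff      # Eq. (2): Y_A
    return primary_ions_per_s * sputter_yield * useful_yield * concentration

# Illustrative values: a 10 pA Ga probe, S_A = 5 sputtered atoms per ion,
# gamma_A = 1e-2, eta_A = 0.2, and a 100 ppm (1e-4) constituent.
rate = detected_ion_rate_per_s(10e-12, 5.0, 1e-2, 0.2, 1e-4)
per_pixel = rate * 1e-3          # counts in a 1-ms dwell time per pixel
print(round(rate, 1), "ions/s, ~", round(per_pixel, 3), "counts per 1-ms pixel")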
The sputtering yield S is a function (2) of the atomic density- and energy-dependent rates of energy loss via nuclear stopping, the efficiency of energy transfer, the ratio of the masses of target and projectile, and the surface binding energy (or sublimation energy) of the atoms in the material. The sputtered atoms that emerge from the surface have an energy spectrum that peaks at low energy (a few electron volts) and have a rough (cos θ )−F dependence on the incident angle θ (measured from the normal), where F ≈ 1, for θ smaller than ∼60° . Elementdependent variations in surface sublimation energy (a result of atomic shell effects) cause large changes in S as a function of the atomic number of the target. For example, the calculated S for a 40-keV Ga+ ion probe varies from 2–70 for pure elements (5). Clearly, the sputtering yields govern the surface erosion rates. Sputtering is also affected by the crystalline structure of materials due to channeling effects: the crystal orientation dependence of S (6) is primarily responsible for the material contrast observed in ion microscopies (7,8). The ion fraction γA is also species-dependent (4): The positive ion fractions γA+ of elements sputtered from a common matrix exhibit an inverse exponential dependence on the ionization potential of the sputtered atoms. Consequently, the alkali (e.g., sodium, potassium) and cations (e.g. magnesium, calcium) have large γ + factors and are easily detected. Similarly, the negative ion fractions γA− show an exponential dependence on electron affinity. As a result, elements such as carbon, oxygen, halogens, sulfur, and phosphorus, and many compounds or clusters such as C2 , O2 , CN, PO, PO2 , and PO3 have abundant negative ion yields. The ion fractions have a strong Z-dependence, and may vary over a vast range (10−1 –10−5 ). A number of semiempirical and theoretical models serve as guidelines for quantitation (1, Section 3.3; pp. 9–11). The Z-dependence of the sputtering yields S conspires with that of the ion fractions γ to make the useful yield Y strongly species-dependent. Furthermore, the positive ion fractions and consequently, the detected yields are strongly enhanced by the presence of oxygen or other electronegative species on the emitting surface. Negative ion yields are correspondingly enhanced by electropositive species such as cesium, although oxygen also seems effective (12). It is common practice to provide O2 coverage of the sample during sputtering or to use oxygen or cesium as sputtering agents to maximize sputtered ion yields (3,13,14) and ultimately, the feasibility of imaging the distribution of particular ion species. Because of these factors and the additional presence of matrix effects, reliable SIMS quantitation requires the analysis of standards. These are best provided by chemical doping or by ion implantation in the matrix under investigation. These considerations are relevant to the retrieval of absolute quantitative information from digital images. However, within an image derived from a uniform matrix, the relative SIMS signal for a species measured in different areas can be determined very accurately (within the limits of available statistics) without recourse to standards. It should be pointed out that quantification for details embedded in high spatial resolution SIMS images is more daunting. Small (<1 µm) phases and boundaries are often
encountered. Without previous knowledge of the nature of such small features, it is often not possible to select the correct reference standard. However, even qualitative evaluations are often of much value, especially when correlating SIMS maps derived from several secondary ion signals. The basic principles of SIMS, the complexities of SIMS instrumentation and quantitation, and the applications of the SIMS method have been exhaustively covered in the literature, primarily in the treatise by Benninghoven et al., (1), where aspects of SIMS not touched upon here, such as time-of-flight mass spectrometry and postionization, are also discussed. Other publications, relevant to issues of high-resolution SIMS imaging, may be consulted (15–20). METHODS OF HIGH-RESOLUTION DYNAMIC SIMS IMAGING Dynamic SIMS imaging is accomplished using either of two basic instrument designs: the stigmatic, direct imaging emission ion microscope (EIM), an analog of the electron emission microscope (21), and the scanning ion microscope, microprobe, or microanalyzer (SIM) (22), an analog of the scanning electron microscope and of the electron microprobe EDX analyzer. The basic design of the former instrument was later incorporated in the ion microscope series still marketed by CAMECA (France). The principal handicap of the original EIM design is the modest spatial image resolution that it provides, primarily due to the large energy spread of the ejected secondary ions; the latter cause severe chromatic aberration in the image forming system. The ultimate spatial resolution of mass-resolved images obtained by the EIM approach is comparable to that of the best optical microscopes (∼0.5 µm) (23). This limitation also affected the early scanning ion microprobes (22) despite efforts to reduce the probe spot size (24). In this case, the basic limitation was due to the large size and consequent low brightness of the duoplasmatron ion sources employed (approximately 200 A cm−2 sr−1 ), whereby submicrometer size probes would not carry enough current for practical applications. During the last 15 years, ion microprobe instruments have achieved unprecedented analytical image resolution (25) using focused beams extracted from high brightness liquid metal ion sources (LMIS) (26–28). The brightness of these sources is ∼104 times larger than that of duoplasmatron sources and enables the formation of intense submicrometer ion probes (29,30). In the following, we will dwell in detail on the progress toward high-spatial-resolution dynamic SIMS imaging made possible by using LMIS-based, focused heavy ion probes through the scanning ion microprobe approach. Having played a role in this development, our coverage of this topic will be based largely on our first-hand experience in creating the relevant instrumentation, in exploring the requirements and limitations of high-resolution (<100 nm) SIMS imaging, and in its interdisciplinary applications. Thus, for this discussion we shall describe at some length the scanning ion microprobe developed at the University of Chicago (UC-SIM), originally in
collaboration with Hughes Research Laboratories. It is not our intention to exclude from consideration other instrumentation: the design principles and characteristics of the UC-SIM typify the basic operation of a large class of modern scanning ion microprobes, inclusive of a highresolution scanning ion microprobe of novel design now marketed by CAMECA (‘‘NanoSIMS 50’’) (31). The first step toward radically reducing the size of an ion probe and consequent increase in image resolution involved the realization that point-like ion sources of brightness adequate for scanning microscopy had to be developed (32), analogous to the use of field emission sources for high-resolution scanning electron microscopy (33). Such sources are needed to provide a Gaussian image spot size in the nanometer range using minimal demagnification, a condition necessary to obtain probes that carry current suitable for scanning. Thus, initially, the field ionization (34) source has been advocated (32) as the suitable choice for realizing an ion probe in the nanometer range. Using a basic diode ion gun and a cryogenic field ionization source of hydrogen ions, a scanning transmission ion microscope (STIM) was constructed by Escovitz et al. in 1975 (35). It yielded images of biological samples in the 100-nm resolution range using a 55-keV proton probe of ∼10−2 pA. Also using a field ionization source, proton and Ar submicron probes of current density in the 10−2 A cm−2 at 12 keV were obtained by Orloff and Swanson (36) and used in the SIM mode in microfabrication. The introduction of liquid metal ion sources (LMIS), closely related to field ion sources, revolutionized ion probe technology. Because of the point-like geometry of their confined emission cones and their consequent high brightness (∼106 A cm−2 sr−1 ), these sources have made possible high current density (∼1 A/cm2 ), finely focused (<100 nm FWHM) probes (29) adequate for focused ion beam (FIB) lithography, high-resolution scanning ion probe microscopy, and SIMS imaging. An exhaustive discussion of the optical principles that underlie the formation of nanoprobes using field emission ion sources and LMIS, together with a number of optimized optical column designs and the prospects for high-resolution imaging SIMS, was advanced soon afterward (30,37). Using LMIS, it became practical to use such beams in the scanning mode to study contrast mechanisms in secondary ion imaging (38) and for high-spatial-resolution dynamic imaging SIMS (25,39), even with instruments using the relatively inefficient RF quadrupole mass filters. Nevertheless, imaging SIMS at 20-nm lateral resolution was demonstrated by Levi-Setti et al. (40,41) using an earlier version of the UC-SIM. This instrument was upgraded (42,43) to incorporate a magnetic sector mass spectrometer (UC-SIM magSIMS), as detailed in the following. Comparing the two approaches to dynamic SIMS imaging sketched before, it is relevant to recall the following considerations. In the EIM, all points of the bombarded surface contribute simultaneously to the final image; whereas, in the SIM, each picture element (pixel) is recorded in sequence. Consequently, for identical ion dose per unit area, the collection time will be greater in the SIM
by a factor equal to the number of pixels in the image. The features that distinguish the emission electron microscope (EEM) from the scanning electron microscope (SEM) also apply to ion analysis instruments. The increased exposure time required for the serial readout of ion microprobe information is partly compensated for by the high brightness of the LMIS source (and, in turn, of the ion probe). Such trade-offs translate into different amounts of sample material that can be analyzed in a given exposure time and affect the elemental concentrations that can be detected per unit time by imaging SIMS using the two types of instruments. A detailed comparison of the erosion rates inherent in the two approaches to dynamic SIMS imaging has been presented in an earlier publication (45). It is relevant to recall that the scanning approach entails the ability to vary the erosion rate across a wide range by varying either the dwell time of the probe per picture element (pixel) or the size of the scanned area. Thus, although the probe current density for a LMIS SIM is more than 103 times that in the EIM and 108 times that of static SIMS (for a stationary probe), a scanned probe can be made to approach the conditions of static SIMS, whereby only the top monolayer of a sample is eroded during an actual scan. DESIGN AND PERFORMANCE OF THE SCANNING ION MICROPROBE UC-SIM MAGSIMS The UC-SIM, inaugurated in 1983–1984, was conceived to explore the spatial resolution limits of SIMS imaging by using a scanning ion probe instrument, based on an integrally designed secondary ion transport and analysis system (25,40). Equipped with an electrostatic cylindrical energy analyzer (ΔE/E = 5%) and an RF quadrupole mass filter (M/ΔM = 300 at 40 amu), the total secondary ion detection efficiency was 0.2% for Al, already exceptional for a quadrupole SIMS instrument but definitely inferior to that of time-of-flight or magnetic sector microscopes and analyzers. Although the excellent imaging properties of this instrument have been established (41), the low collection efficiency and limited mass resolution often precluded the feasibility of important classes of investigations. As a preferred upgrade, it was decided to couple the original UC-SIM to a high-performance magnetic sector mass spectrometer in a system now named UC-SIM magSIMS (42,43,45). A top view of the layout of the UC-SIM magSIMS in its present configuration (43) is shown in Fig. 1. The primary beam column is normal into the page at the center of the microprobe vacuum chamber at the right of the figure. A differential pumping aperture separates the magnetic spectrometer, operating in the 10−7 –10−8 torr range, from the microprobe chamber, operating at 10−9 torr or better. Alternative operation of the SIMS system using a RF quadrupole mass filter is retained through a ‘‘switchyard’’ deflecting quadrupole, part of the secondary ion transport system. In the retrofit of the magnetic sector spectrometer onto the existing UC-SIM, preservation of the imaging qualities of the original primary optical column was of paramount concern. Several innovations made this possible as will be seen in the following.
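The erosion-rate argument made above — that a scanned probe can approach static-SIMS conditions by adjusting dwell time or scanned area — can be made concrete with a rough dose estimate. In the Python sketch below, the probe current, dwell time, pixel size, sputtering yield, and monolayer density are all order-of-magnitude assumptions chosen only to illustrate the scaling.

E_CHARGE_C = 1.602e-19

def primary_dose_per_cm2(probe_current_a, dwell_s, pixel_size_cm):
    """Primary-ion dose delivered to one pixel during one scan."""
    ions = probe_current_a * dwell_s / E_CHARGE_C
    return ions / pixel_size_cm ** 2

def monolayers_removed(dose_per_cm2, sputter_yield, surface_density_per_cm2=1e15):
    """Fraction of a monolayer sputtered away, assuming ~1e15 atoms/cm^2 in
    the topmost layer (an order-of-magnitude figure)."""
    return dose_per_cm2 * sputter_yield / surface_density_per_cm2

# A 10 pA probe, 100 us per pixel, 50 nm (5e-6 cm) pixels, S ~ 3:
dose = primary_dose_per_cm2(10e-12, 100e-6, 5e-6)
print("dose: %.2e ions/cm^2 -> %.1f monolayers per frame"
      % (dose, monolayers_removed(dose, 3.0)))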
Figure 1. Layout (top view) of the University of Chicago high-resolution scanning ion microprobe UC-SIM magSIMS. Either the magnetic sector mass spectrometer or a RF quadrupole mass filter can be accessed through a secondary ion beam switchyard (adapted from Ref. 43).
Probe-Forming Optical Column The probe-forming optical column was designed (30) to incorporate the following features: (1) decoupling the source extraction voltage (Vs ) from the final probe voltage (Vp ), (2) maintaining a fixed object and image position while varying source and probe voltages, (3) minimizing the chromatic aberration of the column, and (4) providing a crossover where a small aperture could be placed without affecting the transmission of the column. This aperture filters out stray and unwanted beam components (e.g., halo) and also serves as a differential pumping orifice between the source and the specimen enclosures. An exploded view of the column is shown in Fig. 2 (20). The Ga LMIS, which consists of a W needle protruding through a W heater ribbon (designed at Hughes Research Laboratories), and its enclosure are shown in Fig. 3. To activate the LMIS, a droplet of Ga is placed as a reservoir where the W needle pierces the ribbon and the tip assembly placed in a vacuum chamber. When the W ribbon is heated by current to a red glow, the Ga ‘‘wets’’ the protruding needle and the ribbon interior. The emitting tip is surrounded by a tantalum housing that incorporates the extraction aperture. This precaution, which minimizes contamination of the tip caused by the backscattering of material (46), together with the low residual pressure of the tip environment (∼10−10 torr) ensure a practically limitless life of the LMIS. The theory of LMIS ion emission and operating details of this source have been reviewed elsewhere (30,47). When subjected to a high electrical extraction field, the liquid metal that coats the tip assumes a characteristic conical shape (Taylor cone) (48), and intense ion emission occurs. The two-lens column can operate over a range of Vs (5–30 kV) and Vp (20–60 kV). It comprises an asymmetrical triode lens (49) that operates in the decelerating mode and an einzel lens to bring the beam to a final focus. Three octupole scanning arrays are inserted between the two lenses. The top one, above
Figure 2. Exploded view of the two-lens primary ion column of the UC-SIM magSIMS. Gallium ions to form the probe are extracted from the liquid metal ion source at the top and accelerated to 20–60 keV (adapted from Ref. 20).

Figure 3. Cross section of the liquid metal ion source of the UC-SIM magSIMS, its housing, and the extraction and beam-defining apertures. The emitting W tip protrudes through a heating W ribbon (adapted from Ref. 47).
the differential pumping aperture, is used for alignment and beam blanking. The lower pair, located above the einzel lens, provides double beam deflection to scan the sample. This scanning mode has two advantages over scanning using only one deflector above the sample, as practiced in some systems: (1) the beam is pivoted through the coma-free point of the einzel lens (50), thus allowing large beam
deflections without loss of resolution due to coma; and (2) it reduces the column magnification by allowing the sample to be placed a short distance below the einzel lens, hence reducing the probe diameter. A beam-defining aperture restricts the accepted beam semiangle α0 at the source, thus controlling the size of the focused probe, which is determined primarily by the chromatic aberration of the column. This aberration is dominant because of the relatively large energy spread ΔEs of the ions emitted by the LMIS (7–10 eV). In fact, across most of the practical range, the contribution Dc of the chromatic aberration to the probe diameter is (30)

Dc = Cc αp (ΔEs/Ep),   (3)

where Cc is the total chromatic aberration coefficient of the lens system, ΔEs is the FWHM energy spread of the source ions, Ep is the final energy of the probe ions, and αp, the probe convergence semiangle, is related to α0 through the Helmholtz–Lagrange relationship

αp = α0 (Es/Ep)^(1/2) M^(−1).   (4)

Here, Es is the energy of the extracted source ions and M is the system magnification (M = 0.2 in the present system). Another contribution to the limiting probe diameter is given by the Gaussian image size of the source DG:

DG = M ds,   (5)

where ds is the virtual source size. The calculated probe diameter Dp, obtained by adding Dc and DG in quadrature and assuming the value ds = 50 nm, is plotted in Fig. 4 versus the accepted source aperture semiangle α0 for a 40-keV Ga probe. Excellent agreement is observed for a number of experimental points, determined by measurements of image edge definition (44).

Figure 4. Plot of the Ga probe diameter vs. the accepted half-angle of the emitted ions at the source. The solid curve is obtained from the calculated contributions to the final probe size (dashed lines) assuming a virtual source size of 50 nm. Experimental points are shown for comparison (adapted from Ref. 47).

Efforts to decrease the probe diameter below 20 nm have not been deemed worthwhile, first of all because of the inherent decrease in probe current below 1 pA. Second, a study of the spread of the collisional cascade induced by the impinging probe ions shows (Fig. 5) (18) a theoretical resolution limit of the order of 20 nm for a 40-keV Ga probe on aluminum, for example. It should be emphasized that the size of the probe approaches the analytical image resolution only when sufficient signal count statistics are available. These considerations have been the object of a detailed study (16,18). As discussed previously (47), two conclusions emerge as essential to minimize the probe size: (1) the source energy spread should be limited as much as possible, which occurs for extraction (tip) currents of only a few µA; and (2) an optical column design should be chosen that minimizes the overall chromatic aberration coefficient Cc. The probe size also decreases as probe energy increases, at the expense, however, of depth resolution.

Magnetic Sector Mass Spectrometer

In view of (1) the idiosyncrasies of the species-dependent useful ion yields, (2) the limited probe currents available
for the smallest probe sizes (see the probe current scale at the top of the diagram of Fig. 4), and (3) the requirement of high signal statistics to achieve analytical image resolution that approaches the size of the probe, it becomes imperative to maximize the value of the parameter ηA in Eq. (1) for high-resolution imaging SIMS. This measure represents the best that can be done, at least in part, to offset the well-known trade-off between analytical sensitivity and resolution. The parameter ηA is the product of two factors that represent, respectively, the collection/transmission efficiency of the secondary ion transport system before entering the mass spectrometer and the transmission efficiency of the mass spectrometer itself. To maximize the latter, a Finnigan MAT 90 fully laminated, double-focusing, magnetic sector mass spectrometer (reverse Nier–Johnson geometry) that has a 90° electrostatic sector was selected. Its mass range is 3,500 amu at the full accelerating voltage of 5 kV, increasing to 17,000 at lower secondary ion energy. The transmission of the spectrometer (excluding the interface optics) at 5 kV and a resolution of M/ΔM = 1,000 is 70% and mass independent. For comparison, the transmission of the RF quadrupole is only 1% at low masses and decreases as mass increases. The energy acceptance is 0.7% (35 eV at 5 kV). The mass resolution can be varied from a minimum M/ΔM of 570 up to 50,000. Ions are detected by an active film electron multiplier (ETP AF820) that operates in the pulse-counting mode and can accept count rates up to 50 MHz. The spectrometer was modified to allow injection of the secondary ion beam from the SIM.
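As a rough, illustrative aside (not taken from the source), the sketch below estimates the secondary-ion counts collected per pixel for an assumed probe current, useful yield, and dwell time, comparing the 0.2% detection efficiency quoted for the earlier quadrupole system with the ∼20% quoted for magSIMS; it is only meant to show why maximizing ηA matters when probe currents are in the picoampere range. The probe current, useful yield, and dwell time are invented, order-of-magnitude values.

# Rough, illustrative estimate (assumed values, not from the source) of the
# detected secondary-ion counts per pixel, showing why the overall efficiency
# eta_A dominates image statistics when the probe current is only ~1 pA.

E_CHARGE = 1.602e-19  # electron charge in coulombs

def counts_per_pixel(probe_current_a, useful_yield, eta_a, dwell_s):
    """Detected counts for one pixel.
    probe_current_a: primary probe current (A); useful_yield: monitored
    secondary ions emitted per primary ion (assumed); eta_a: combined
    collection/transport and spectrometer transmission; dwell_s: dwell time (s).
    """
    primaries = probe_current_a * dwell_s / E_CHARGE
    return primaries * useful_yield * eta_a

# Assumed inputs: a 1-pA probe (the ~20-nm regime), a useful yield of 1e-3,
# and the 16-ms maximum dwell time quoted in the text.
for eta_a in (0.002, 0.20):  # quadrupole-class vs. magSIMS-class efficiency
    n = counts_per_pixel(1e-12, 1e-3, eta_a, 16e-3)
    print(f"eta_A = {eta_a:.3f}: ~{n:.1f} counts/pixel, "
          f"relative shot noise ~{1.0 / max(n, 1e-9) ** 0.5:.2f}")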
Figure 5. (a) Monte Carlo calculation of sputtered atom intensity as a function of radial distance from the point of primary Ga+ ion impact on Al. (b) Illustration of emission processes for different combinations of probe and target mass m1 and m2. Actually, on average, each emission is the end result of a chain of collisions longer than shown (adapted from Ref. 18).
Secondary Ion Collection and Transport

Traditionally, in magnetic instruments, the secondary ions are collected and accelerated to the energy needed for injection into the mass spectrometer by a large potential applied to the sample region. Such biasing would severely affect the primary ion optics of the UC-SIM, where the probe is incident normally to the sample plane and the secondary ions are collected backward and deflected out of the primary column by a 90° electrostatic energy analyzer (ESA). The solution adopted for magSIMS operation of the microprobe consists of floating the sample and the entire secondary ion transport system together to the final
acceleration voltage needed (5 kV), and the secondary ions are extracted by a modest potential (200–300 V) into a miniature 90° one-quarter spherical ESA and are deflected toward the interface of the mass spectrometer. Using this strategy, the primary optics is practically unaffected, except for a longitudinal acceleration/deceleration (when collecting negative/positive secondary ions) of the probe by ±5 kV. The secondary ions are then gradually accelerated and focused onto the entrance slit of the mass spectrometer by a stack of cylindrical lenses and quadrupoles, as shown in Fig. 6. This figure illustrates the structure and the potential along the ion axis for the two operating modes of the system, leading either to the RF quadrupole mass filter or to the Finnigan MAT 90 through the switchyard quadrupole/deflector. The distance from the sample to the entrance slit of the magnetic sector is 96 cm. The final set of downstream lens/quadrupoles shapes and focuses the secondary ion beam onto the slit in the horizontal plane and onto the magnet center in the vertical plane. The potentials applied to the switchyard quadrupole are ramped synchronously with the raster scanning of the sample to reaim the secondary beam onto the center of the entrance spectrometer slit during scanning (dynamic emittance matching). The energy window of secondary ion acceptance is 30 eV wide.

The result of a computer simulation of the transmission efficiency of the secondary ion transport system versus the initial energy of the secondary ions (45) is shown in Fig. 7. Due to the low extraction potential, only the lowest energy secondary ions are transmitted at high efficiency (a drawback of the ion extraction strategy adopted to preserve the excellent optical properties of the primary column). However, the overall average transmission efficiency ηA of the entire magSIMS system is estimated at ∼20%, an improvement by a factor of 100 over that of the previously used RF quadrupole system for large masses, and by a factor of 20 at the low end of the mass spectrum (the signals at high masses are suppressed by the RF quadrupole). As an example of the improvement in SIMS sensitivity and resolution attained by magSIMS versus the RF quadrupole, the mass spectra of negative ions from pyrolytic graphite are shown in Fig. 8, as obtained (20) from the two systems using a 40-kV Ga+ probe. Note that, because the spherical analyzer and the deflectors break cylindrical symmetry, all optical calculations of the secondary ion collection and transport system had to be performed in 3-D, aided by SIMION (51) and IONCAD simulation software developed in house.

SIMS Imaging and Storage Systems

SIMS images are formed by displaying on a monitor the pulse counts that emerge from the active film electron multiplier detector at the spectrometer exit, synchronously with the probe digital raster scan. The latter can be varied from 16 × 16 to 1,024 × 1,024 pixels/frame, at dwell times that can be varied from 500 ns to 16 ms. The shorter dwell times are suitable for visual observation of the sample topography, focusing, and astigmatism correction using the ion-induced secondary electrons (ISE) or secondary ions (ISI) collected by a
Figure 6. Schematics of the secondary ion transfer optics in the UC-SIM magSIMS for the "low-voltage mode" (a), leading to an RF quadrupole mass filter, and for the "high-voltage mode" (b), leading to the magnetic sector mass spectrometer. The axial potential along the transport optics for the two operating modes is also shown.
channel multiplier detector that overlooks the sample. For high-resolution SIMS image recording, however, single-pass scans at relatively long dwell times are chosen. A discussion of the dwell time conditions needed for high-resolution SIMS imaging has been presented previously (25): to preserve scan-image synchronism, the species-dependent transit time of the secondary ions from target to detector must be taken into account. The size of the scanned area can be varied across a wide range. To locate areas of interest, the raster potentials can be amplified 10X to yield fields of view as large as 3 mm. For high-resolution work, typical fields of view vary between 5 and 160 µm2 . To avoid accumulation of charge deposited by the probe or induced by the secondary emissions, which can lead to complete disappearance of the SIMS signals, insulating samples are mounted on metal supports or gold-coated glass coverslips and sputter-coated with a few nm of gold. The detector pulses are discriminated, amplified, and shaped using fast nuclear counting electronics, and are fed to counting rate meters, a multichannel analyzer for mass spectra acquisition and depth profiling, or displayed for imaging and recording by a KONTRON IMCO 500
image processing and storage system. The SIMS images of raster size 512 × 512 pixels are stored directly in digital memory and later transferred to optical disks for archival storage. Correlative superpositions of color-coded SIMS maps for two or more mass-resolved images are valuable for identifying compositional structures, as will be illustrated in the following. This superposition can be performed using either sequentially retrieved images, after registration, or interlaced multimass SIMS maps (the latter when using the RF quadrupole) obtained by "peak switching" (52). Issues that pertain to the dependence of image quality on signal statistics, such as edge definition and analytical contrast, have been dealt with in previous publications (16,18,44). In these studies, the attainable image quality has been related to the limitations encountered in SIMS microanalysis. These involve considerations of available sample volume, detection sensitivity, elemental concentration, available probe current, and practical scan time. It suffices to say here that high-resolution SIMS imaging is not always feasible and that a trade-off between sensitivity and resolution must always be based on the sample.
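As a back-of-the-envelope illustration of the dwell-time bookkeeping described above, the sketch below computes the single-pass frame time from the raster size and dwell time. The raster sizes and dwell-time range come from the text, but the helper itself is only a sketch: it ignores flyback and the species-dependent ion transit time mentioned earlier.

# Minimal sketch: single-pass frame acquisition time for a square digital
# raster, using the raster sizes and dwell-time range quoted in the text.

def frame_time_s(pixels_per_side, dwell_s):
    """Time for one single-pass scan (ignores flyback and ion transit time)."""
    return pixels_per_side ** 2 * dwell_s

for side, dwell in [(512, 500e-9), (512, 1e-3), (1024, 16e-3)]:
    print(f"{side} x {side} pixels at {dwell * 1e3:g} ms dwell "
          f"-> {frame_time_s(side, dwell):,.1f} s per frame")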
Figure 7. Computed secondary ion transmission efficiency of the transfer optics for the UC-SIM magSIMS vs. initial secondary ion energy. The data are for optics optimized for 6 eV initial ion energy and assuming a cos^2 θ distribution of ion emission angles (adapted from Ref. 45). The acceptance energy window of the secondary ion optics system is ∼30 eV wide.

Figure 8. Negative secondary ion mass spectra of clean pyrolytic graphite comparing the performance of (a) the RF quadrupole mass filter vs. (b) the magnetic sector-based SIMS for the UC-SIM magSIMS (from Ref. 20). The inset in (b) emphasizes the odd-even effect in the emission of carbynes Cn−. Note that the signal accumulation time (abscissa) is 18 times shorter in (b) than in (a).

Figure 9. Categorization of the broad areas of interdisciplinary research that have been investigated by high-resolution imaging SIMS using the University of Chicago high-resolution scanning ion microprobe.
RESEARCH APPLICATIONS OF HIGH-RESOLUTION SIMS IMAGING

High-resolution SIMS imaging using the earlier version UC-SIM and the upgraded UC-SIM magSIMS to address
specific problems in the materials sciences, structural biology, and physiology has been the object of a number of previous reviews (17,19,53–55). The flowchart of Fig. 9 summarizes the broad areas in which this particular instrument has been applied. Each area further encompasses a spectrum of specific interdisciplinary topics, to which the high-resolution imaging SIMS approach has contributed unprecedented and exclusive information. For example, the flowchart of Fig. 10 categorizes the biological and biomedical topics touched on by studies carried out using the UC-SIM. In this context, however, the examples summarily described in the following will be limited to topics in the materials sciences and include images that have contributed to the development of advanced and engineered materials, such as multishell tabular photoemulsion microcrystals, aluminum–lithium alloys, high-temperature superconductors, metal–matrix composites, and high-temperature and corrosion-resistant alumina ceramics.

Silver Halide Photoemulsion Microcrystals

To improve the efficiency of the photographic process, "designer" silver halide microcrystals are created that simulate the behavior of a semiconductor device (56). For example, to reduce the rate of electron–hole recombination following their creation in the photon absorption process, electron or hole traps are introduced into the microcrystals by physically separating regions of different band-gap width. This is the case for core-shell structures of AgBr and AgBr1−xIx (x < 0.1), as illustrated in the energy diagrams of Fig. 11. Some of these structures (57), represented by banded tabular grains (<0.3 µm thick), are shown in Fig. 12, where SIMS maps of different components, for example, Br− and I−, are superposed with
different colors. Figure 12a shows grains that have a core of AgBr surrounded by one shell of AgBrI, and Fig. 12b shows an analogous four-shell structure. Another type of structure is shown in Fig. 12c, where an AgBr core crystal is decorated preferentially at the vertices by epitaxially grown microcrystals of AgCl revealed by a Cl− map. The SIMS signals (for Br−) obtained from these materials often exceeded 5 MHz; all maps were acquired in times shorter than 105 s. On several occasions, images such as these have guided the development of novel photoemulsions.

Figure 10. Categorization of specific topics in the biomedical sciences in which advances were made possible by using high-resolution imaging SIMS with the UC-SIM.

Figure 11. Energy level diagrams illustrating the mechanism by which core-shell engineered photoemulsion microcrystals prevent electron–hole recombination by creating "hole traps."

Figure 12. Images of core-shell, banded tabular photoemulsion microcrystals. (a) Superposition of Br− (red) and I− (yellow) SIMS maps where an AgBr core is surrounded by one shell of AgBrI. (b) Same as in (a) for a core-four shell structure. (c) Superposition of Br− (red) and Cl− (white) SIMS maps of a tabular AgBr grain where epitaxially grown AgCl microcrystals have been directed to grow preferentially at the AgBr grain vertices. Length of image sides: 20, 20, and 10 µm. See color insert.
Aluminum-Lithium Base Alloys

Al–Li alloys are being developed and used as replacements for other Al base alloys in aerospace structural applications. The addition of Li reduces density and increases elastic modulus. Knowledge of the Li distribution in these alloys is critical to alloy development and in evaluating processing methods. Because of its high sensitivity for Li, in contrast to the virtual inability of other microanalytical techniques to detect it, imaging SIMS best attains this objective. Thus, the UC-SIM has been profitably employed in studies of the Li distribution associated with microstructures, phase equilibria, and oxidation behavior of Al–Li alloys (47,58). Figure 13 shows an example of the chemical microstructure of a Cu alloy that contains 18.7 at.% Al and 11.6 at.% Li, aged at 500 °C for 7 days, the object of a study of phase equilibria (58,59). The images depict three areas of a eutectic mixture containing a Li-rich T1 phase (Al2LiCu) and an Al-rich solid solution α phase. In Fig. 13a, the combined distributions of Al+ (in blue) and Li+ (in yellow) are displayed. Li+ is confined to the T1 phase, whereas Al+ dominates in the α phase. Figure 13b is a composite map of Al+ (blue) and O− (red). Here O− in trace amounts is completely segregated to the T1 phase due to the high affinity of oxygen for Li; this is thought to arise from the oxidation of the liquid alloy during casting or from the reaction of Li with the residual oxygen in the specimen chamber. Figure 13c compares the distribution of Cu+ (white) with that of Al+ (blue), as in (a) and (b). Cu+ is seen predominantly in the T1 phase (which appears black here).

Figure 13. Images depicting three areas of a eutectic mixture that contains a Li-rich T1 phase (Al2LiCu) and an Al-rich solid solution α phase. (a) SIMS combined distributions of Al+ (blue) and Li+ (yellow), where Li+ is confined to the T1 phase, whereas Al+ dominates in the α phase. (b) Composite map of Al+ (blue) and O− (red) (from Ref. 47). Here O− in trace amounts is associated with Li in the T1 phase. (c) Comparison of the distribution of Cu+ (white) with that of Al+ (blue). Cu+ is seen predominantly in the T1 phase (which appears black here). Length of image sides: 80, 80, and 80 µm. See color insert.

High Tc Superconductors

Thin-film and bulk Y1Ba2Cu3O7−x (1–2–3) high Tc superconductors have been analyzed using the UC-SIM (the old RF quadrupole version) to correlate the superconducting properties with O stoichiometry and the presence of impurities. Variants of the 1–2–3 stoichiometric ratio have also been examined, for example, bulk superconductors sintered with excess Cu (as CuO) in the proportion 1–2–4 (47). An interesting anomaly in the SIMS emission intensities of Cu+ and Cu− was observed for this compound (60) and is illustrated in the SIMS images of Fig. 14 (47). Figure 14a is a superposition of SIMS maps of Cu+ (green) and Cu− (blue). Figure 14b compares Cu+ (green) with Ba+ (red), and Fig. 14c compares Cu+ (green) with O− (yellow); the latter mimics the distribution of the Cu− signal in Fig. 14a. Figure 14a reveals a striking anomaly: Cu+ emission is predominant (by a factor of ∼60 relative to the matrix) from large crystallites (green areas) that carry the excess Cu and are identified as a solid solution of Cu2O and CuO (∼9:1) from a study of Cu/O ratios. Cu− emission instead is dominant (by a factor of ∼25 relative to the CuO crystals) from the matrix areas (blue) that correspond to the normal 1–2–3 compound. In other words, the ratio of the Cu emission intensities n−/n+ is ∼25 for the superconducting 1–2–3, and ∼1/60 for the insulator CuO.
This extreme "matrix effect" has been related to the O stoichiometry and to Tc (60).

Figure 14. Images of bulk high-Tc superconductor (Y1Ba2Cu3O7−x) (1–2–3) sintered with excess Cu (in the form of CuO) in the proportion 1–2–4 (from Ref. 47). (a) Superposition of SIMS maps of Cu+ (green) and Cu− (blue). The large green crystallites represent CuO precipitates; the blue areas, Cu in the (1–2–3) matrix. Note that the different distributions of Cu+ vs. Cu− represent a striking anomaly of the ion emission process. (b) Comparison of the SIMS distributions of Cu+ (green) with Ba+ (red). (c) Correlative SIMS distributions of Cu+ (green) vs. O− (yellow), the latter mimicking the distribution of the Cu− signal in Fig. 14a. Length of image sides: 40, 40, and 40 µm. See color insert.

Metal–Matrix Composites

Metal–matrix composites are a new class of materials that combine the separate benefits of ceramics and alloys and that have been demonstrated in low-cost, high-performance applications in the automotive industry. However, the incorporation of ceramic materials into metal is complicated by reactions between the two and by binders and additives that may also be included in the composite. High-resolution imaging SIMS by the UC-SIM has been instrumental in signaling deleterious steps in the processing of some of these composites (61–63). One of these, selected from many studied, consists of an Al–Si–Cu–Mg alloy (F332) in which alumina/silica (45/55%: Fiberfrax) fibers have been randomly embedded. The fibers were coated with a silica binder, and the preform was calcined at either 850 or 1,200 °C for 3 h before squeeze casting at high temperature (∼700 °C) and pressure with the metal. After that, the composites were further heat treated at either 200 °C (11 h: "T5") or at 530 °C (6 h: "T6"). Figure 15a is a correlative superposition of SIMS maps of the Al-T5 alloy, where Al is coded blue and Si is coded green. The alloy consists of large Al grains between which an Al-Si eutectic occurs, giving rise to the lacy network of Si precipitates. The black areas contain Mg-rich intermetallic precipitates (not shown). Figure 15b depicts a fiber-containing area from a sample where the preform was calcined at 850 °C and the composite was subjected to T5 heat treatment. Here, Si (red) from the silica component of the fibers fills the fibers, and Mg (yellow) concentrates in a layer coating the fibers. In this case, Mg from the matrix has migrated toward the silica binder and reacted with it, forming MgO according to the dominant reaction SiO2 + 2Mg → 2MgO + Si. The fibers, however, remain intact, and this reaction, in limited amount, is beneficial because it enhances interfacial wetting. Figure 15c shows the same composite after high-temperature T6 treatment, where again Si is shown in red and Mg in yellow. In this case, the previous reaction has taken a dramatic turn: the silica in the fibers has been completely reduced by Mg (as well as by Al), and large precipitates of elemental Si form in the matrix. The reinforcing action of the fibers is lost. Studies such as this led to an optimal set of processing parameters (63).

Figure 15. Images of a metal–matrix composite that consists of an Al–Si–Cu–Mg (F332) matrix in which alumina/silica (45/55%: Fiberfrax) fibers are randomly embedded. (a) Correlative superposition of SIMS maps of a region of the Al-based alloy containing eutectic Si precipitates, where Al is coded blue and Si is coded green. (b) Superposition of SIMS maps of a fiber-containing area where Si+ (red) fills the intact fibers and Mg+ (yellow) concentrates in a layer coating the fibers (low-temperature heat treatment). (c) In the high-temperature treatment (see text), the silica in the fiber has been completely reduced by Mg, and large precipitates of elemental Si (red) form in the matrix. Length of image sides: 80, 40, and 40 µm. See color insert.

Grain Boundary Segregation of Dopants in Alumina

The significance of grain boundaries in controlling processing and properties of ceramics is widely acknowledged. Properties such as creep resistance and corrosion resistance of polycrystalline Al2O3 can be dramatically improved by adding dopants. Dopants may tend to segregate at grain boundaries, but the characterization of grain boundary chemistry is a challenging task. The UC-SIM (pre-1994) and its magSIMS (post-1994) have exhibited unprecedented potential in such characterization (64–67). A few examples drawn from these investigations are shown in Fig. 16. Figure 16a is a superposition of SIMS maps of Al (red) and Ca (blue) for a fracture surface of alumina doped with 500 ppm of CaO (64). A Ca-rich compound, presumably a second phase CaO(Al) or CaAl2O4, envelops the grains and the circular pores on the grain surfaces. The pseudocolor image of Fig. 16b (signal intensities increase from blue to red) is a lanthanum map of a polished section of alumina doped with 500 ppm La. The dopant here is evenly concentrated over all alumina grain boundaries and at higher concentration at sparse pores and precipitates. A boundary enrichment factor of 500 ± 20 was derived for this dopant (67). A similar map is shown in Fig. 16c for alumina doped with 1,000 ppm yttrium. Again the dopant is segregated at grain boundaries by an enrichment factor of 900 ± 80, and at a multitude of second-phase YAG (yttrium–aluminum garnet) precipitates. It is thought that grain boundary precipitates can inhibit the boundary sliding that must accompany diffusional creep.

Figure 16. Images of grain boundary segregation of dopants in alumina. (a) Superposition of SIMS maps of Al (red) and Ca (blue) for a fracture surface of alumina doped with 500 ppm of CaO. (b) Pseudocolor SIMS La+ map (signal intensities increase on a thermal color scale) of a polished section of alumina doped with 500 ppm La. (c) SIMS map, color-coded as in (b), for alumina doped with 1,000 ppm Y. Length of image sides: 40, 20, and 20 µm. See color insert.
Effect of Codoping on Grain Boundary Chemistry in Alumina

The resistance of alumina ceramics to highly aggressive and corrosive aqueous hydrofluoric acid (HF) is severely compromised by the presence of impurities, in particular SiO2, which is also known to promote abnormal grain growth. Corrosion is attributed to the presence of glassy grain boundary films, highly soluble in HF, that lead to the loss of structural integrity and decay of the material. Solute MgO additions to alumina improve sintering characteristics and control the microstructure and also the corrosion resistance (68,69). Only recently has the mechanism for the role of MgO in corrosion resistance been unraveled, in a series of experiments based on high-resolution imaging SIMS (70,71). Alumina samples of the highest purity were prepared with controlled additions of either 29SiO2 or MgO and also codoped with both additives. As shown in the SIMS map of Fig. 17a, single doping with MgO (Mg/Al = 500 ppm) (71) of an alumina sample sintered for 60 min at 1,650 °C leads to sharp segregation of Mg at the alumina grain boundaries and some evidence of bulk solubility of magnesium in the grain interior. The bright particle in the field of view is believed to be a precipitate of spinel. The effect of codoping with MgO and 29SiO2 (Mg/Al = Si/Al = 500 ppm) is portrayed in Fig. 17b,c. Both the Mg map of Fig. 17b and the Si map of Fig. 17c show a striking diffusion of both ions toward the grain centers in the near-grain-boundary regions (70). This evidence supports the conclusion that both Si and Mg are drawn into solid solution in the alumina lattice, presumably by a defect compensation mechanism, as a result of codoping. It was also concluded that the beneficial effect of MgO addition in controlling microstructure and corrosion resistance to aqueous HF stems from its ability to redistribute Si ions from grain boundaries into the bulk. Note that the relative amounts of codoping additives are critical, implying that, for example, the addition of MgO should be "tailored" to the preexisting silica content to achieve the desired result. An improvement in corrosion resistance by a factor of 50 can be attained by this approach, which makes this engineered ceramic suitable for industrial applications.
Figure 17. Images showing the effect of codoping on grain boundary segregation in alumina. (a) Pseudocolor SIMS Mg+ map (signal intensities increase on a thermal color scale) of a polished section of alumina single-doped with MgO (Mg/Al = 500 ppm). Sharp Mg segregation at grain boundaries is observed. (b) Pseudocolor SIMS Mg+ map of a polished section of alumina codoped with MgO and 29 SiO2 (Mg/Al = Si/Al = 500 ppm). (c) 29 Si SIMS map of the same section shown in (b). Codoping enables the segregants to dissolve into the grain interior, and there is a marked increase in resistance to aqueous HF (see text). Length of image sides: 20, 20, 20 (µm). See color insert.
CONCLUDING REMARKS
This review of high-resolution imaging SIMS technology has been partial in its choices of instrumentation and images. The authors do not claim that such choices represent an exclusive domain of best performance. However, because the authors and their former collaborators played an active role in the very development of this particular aspect of SIMS, their claim is to have maintained a high standard of SIMS imaging throughout such development. From the imaging standpoint, the main thrust of the present context, it is felt that the images presented still represent state-of-the-art high-resolution imaging SIMS. Note that such images are not exceptional demonstration examples: they were acquired during routine applications of the technology to attain particular research goals, and they have succeeded in contributing information that could not have been obtained by other means. It has been the authors' experience that even revisiting already well-established aspects of materials science phenomena using the present high-resolution SIMS technology, notwithstanding its limitations, has been rewarded by novel insights.

Acknowledgments
The authors thank Nicholas J. Pritzker and the Pritzker Foundation for support in the preparation of this manuscript.

ABBREVIATIONS AND ACRONYMS

EDX    energy dispersive X-ray (analyzer)
EEM    emission electron microscope
EIM    emission ion microscope
ESA    electrostatic (energy) analyzer
FIB    focused ion beam
FWHM   full width at half maximum
HF     hydrofluoric (acid)
ISE    ion (induced) secondary electron
ISI    ion (induced) secondary ion
LMIS   liquid metal ion source
SEM    scanning electron microscope
SIM    scanning ion microprobe or microscope
SIMS   secondary ion mass spectrometry
STIM   scanning transmission ion microscope
UC     University of Chicago
YAG    yttrium–aluminum garnet

BIBLIOGRAPHY

1. A. Benninghoven, F. G. Rüdenauer, and H. W. Werner, Secondary Ion Mass Spectrometry, John Wiley & Sons, Inc., New York, NY, 1997, p. 664.
2. P. Sigmund, Phys. Rev. 184, 383–416 (1969).
3. G. Blaise and A. Nourtier, Surf. Sci. 90, 495–547 (1979).
4. P. Williams, Surf. Sci. 90, 588–634 (1979).
5. Y.-L. Wang, R. Levi-Setti, and J. Chabala, Scanning Microscopy 1, 1–11 (1987).
6. R. S. Nelson, The Observation of Atomic Collisions in Crystalline Solids, North Holland, Amsterdam, 1968.
7. R. Levi-Setti, T. R. Fox, and K. Lam, Nucl. Instrum. Methods 205, 299–309 (1982).
8. R. Levi-Setti, Scanning Electron Microsc. 1, 1–22 (1983).
9. C. A. Andersen and J. R. Hinthorne, Science 175, 853–860 (1972).
10. C. A. Andersen and J. R. Hinthorne, Anal. Chem. 45, 1,421–1,438 (1973).
11. H. W. Werner, Surf. Interface Anal. 2(2), 56–74 (1980).
12. P. Williams and C. A. Evans, Surf. Sci. 78, 324–338 (1978).
13. M. Bernheim and G. Slodzian, Int. J. Mass Spectrom. Ion Phys. 12, 93–99 (1973).
14. M. Bernheim and G. Slodzian, J. Phys. Lett. 38, L325–L328 (1977).
15. P. Williams, Scanning Electron Microsc. 2, 553–561 (1985).
16. R. Levi-Setti, J. Chabala, and Y.-L. Wang, Scanning Microsc. Suppl. 1, 13–22 (1987).
17. R. Levi-Setti, Ann. Rev. Biophys. Biophys. Chem. 17, 325–347 (1988).
18. J. Chabala, R. Levi-Setti, and Y.-L. Wang, Appl. Surf. Sci. 32, 10–32 (1988).
19. R. Levi-Setti et al., in D. B. Williams, A. R. Pelton, and R. Gronsky, eds., Images of Materials, Oxford University Press, New York, Oxford, 1991, pp. 94–113, Plates 3.1–3.4.
20. J. Chabala et al., Int. J. Mass Spectrom. Ion Process. 143, 191–212 (1995).
21. R. Castaing and G. Slodzian, J. Microscopie 1, 395–400 (1962).
22. H. Liebl, J. Appl. Phys. 38, 5,277–5,283 (1967).
23. M. T. Bernius, Y.-C. Ling, and G. H. Morrison, Anal. Chem. 55, 94 (1986).
24. H. Liebl, Vacuum 22, 619–621 (1972).
25. R. Levi-Setti, Y.-L. Wang, and G. Crow, J. Physique (Paris) 45(C9), 197–205 (1984).
26. V. E. Krohn and G. R. Ringo, Appl. Phys. Lett. 27, 479–481 (1975).
27. G. R. Ringo and V. E. Krohn, Nucl. Instrum. Methods 149, 735–737 (1978).
28. R. Clampitt, K. L. Aitken, and D. K. Jefferies, J. Vac. Sci. Technol. 12, 1208 (1975).
29. R. L. Seliger, J. W. Ward, V. Wang, and R. L. Kubena, Appl. Phys. Lett. 34, 210–212 (1979).
30. R. Levi-Setti, Adv. Electron. Electron Phys. Suppl. 13A, 261–320 (1980).
31. F. Hillion et al., in A. Benninghoven, Y. Nihei, R. Shimizu, and H. W. Werner, eds., Secondary Ion Mass Spectrometry-SIMS IX, John Wiley & Sons, Inc., New York, NY, 1994, pp. 254–257.
32. R. Levi-Setti, Scanning Electron Microsc. I, 125–134 (1974).
33. A. V. Crewe, D. N. Eggenberger, J. Wall, and L. M. Welter, Rev. Sci. Instrum. 39, 576–583 (1968).
34. E. W. Müller, Z. Phys. 131, 136 (1951).
35. W. H. Escovitz, T. R. Fox, and R. Levi-Setti, Proc. Nat. Acad. Sci. USA 72, 1826–1928 (1975).
36. J. H. Orloff and L. W. Swanson, J. Vac. Sci. Technol. 12, 1209–1213 (1975).
37. R. Levi-Setti and T. R. Fox, Nucl. Instrum. Methods 168, 139–149 (1980).
38. R. Levi-Setti et al., Nucl. Instrum. Methods 218, 368–374 (1983).
39. A. R. Bayly, A. R. Waugh, and K. Anderson, Nucl. Instrum. Methods Phys. Res. 218, 375–382 (1983).
40. R. Levi-Setti, G. Crow, and Y.-L. Wang, Scanning Electron Microsc. II, 535–552 (1985).
41. R. Levi-Setti, G. Crow, and Y.-L. Wang, in A. Benninghoven, R. J. Colton, D. S. Simons, and H. W. Werner, eds., Secondary Ion Mass Spectrometry-SIMS V, Springer-Verlag, NY, 1985, pp. 132–138.
42. J. M. Chabala et al., in A. Benninghoven, K. T. F. Janssen, J. Tümpner, and H. W. Werner, eds., Secondary Ion Mass Spectrometry-SIMS VIII, John Wiley & Sons, Inc., New York, NY, 1992, pp. 669–672.
43. R. Levi-Setti et al., in A. Benninghoven, Y. Nihei, R. Shimizu, and H. W. Werner, eds., John Wiley & Sons, Inc., New York, NY, 1994, pp. 233–237.
44. R. Levi-Setti, J. M. Chabala, and Y.-L. Wang, Ultramicroscopy 24, 97–114 (1988).
45. J. M. Chabala, in A. Benninghoven, B. Hagenhoff, and H. W. Werner, eds., Secondary Ion Mass Spectrometry-SIMS X, John Wiley & Sons, Inc., New York, NY, 1996, pp. 23–29.
46. C. S. Galovich and A. Wagner, J. Vac. Sci. Technol. B6, 2108 (1988).
47. R. Levi-Setti et al., Surf. Sci. 246, 94–106 (1991).
48. G. I. Taylor, Proc. R. Soc. (London) A313, 453 (1969).
49. J. H. Orloff and L. W. Swanson, Scanning Electron Microsc. 1, 39–44 (1979).
50. A. V. Crewe and N. W. Parker, Optik 46, 183 (1976).
51. D. A. Dahl, J. E. Delmore, and A. D. Appelhans, Rev. Sci. Instrum. 61, 607 (1990).
52. J. M. Chabala, R. Levi-Setti, and Y.-L. Wang, in P. E. Russell, ed., Microbeam Analysis–1989, San Francisco Press, San Francisco, 1989, pp. 586–590.
53. R. Levi-Setti and K. K. Soni, Proc. Workshop No. 15: "Rastermikroskopie in der Materialprüfung," Deutscher Verband für Materialforschung und Prüfung E. V., Berlin, 1992, pp. 121–131.
54. R. Levi-Setti et al., Scanning Microsc. 7, 1161–1172 (1993).
55. D. B. Williams et al., J. Microsc. 169, 163–172 (1993).
56. T. J. Maternaghan, C. J. Falder, R. Levi-Setti, and J. M. Chabala, J. Imaging Sci. 34, 58–65 (1990).
57. J. M. Chabala et al., in G. Gillen, R. Lareau, J. Bennett, and F. Stevie, eds., Secondary Ion Mass Spectrometry-SIMS XI, John Wiley & Sons, Inc., New York, NY, 1998, pp. 651–654.
58. K. K. Soni et al., Acta Metall. Mater. 40, 663–671 (1992).
59. K. K. Soni et al., Microbeam Analysis 2, 13–21 (1993).
60. R. Levi-Setti et al., in A. Benninghoven, K. T. F. Janssen, J. Tümpner, and H. W. Werner, eds., Secondary Ion Mass Spectrometry-SIMS VIII, John Wiley & Sons, Inc., New York, NY, 1992, pp. 769–772.
61. R. Mogilevsky et al., Mater. Sci. Eng. A 191, 209–222 (1995).
62. K. K. Soni et al., J. Microsc. 177, 414–423 (1995).
63. W. S. Wolbach et al., J. Mater. Sci. 32, 1953–1961 (1997).
64. K. K. Soni et al., in G. W. Bailey, J. Bentley, and J. A. Small, eds., Proc. 50th Annual EMSA Meeting, San Francisco Press, San Francisco, 1992, pp. 1772–1773.
65. K. K. Soni et al., Surf. Interface Analysis 21, 117–122 (1994).
66. K. K. Soni et al., Appl. Phys. Lett. 66, 2795–2797 (1995).
67. A. M. Thompson et al., J. Am. Ceram. Soc. 80, 373–376 (1997).
68. S. J. Bennison and M. P. Harmer, in C. A. Handwerker, J. E. Blendell, and W. A. Kaiser, eds., Ceramic Transactions, vol. 7, Sintering of Advanced Ceramics, American Ceramic Society, Westerville, Ohio, 1990, p. 13.
69. US Pat. 5,411,583, May 1995, S. J. Bennison and K. R. Mikeska.
70. K. L. Gavrilov et al., J. Am. Ceram. Soc. 82, 1001–1008 (1999).
71. K. L. Gavrilov et al., Acta Mater. 47, 4031–4039 (1999).
HIGH SPEED PHOTOGRAPHIC IMAGING

ANDREW DAVIDHAZY
Rochester Institute of Technology
Rochester, NY
INTRODUCTION

High-speed photography is a specialty within the broad discipline of scientific photography that is used to make visible, and to permit quantitative evaluation of, events that transpire over periods of time much too short for visual perception, even when that perception is aided by image recording with normal photographic cameras. The time frame of certain events can be so short that sometimes neither qualitative nor, more importantly, quantitative evaluation of an event is possible using conventional photographic techniques. When the lifetime of the events of interest is very short, both higher shutter speeds and higher framing rates than are normally available on standard cameras are required. When such events happen in adverse and hostile environments, this additionally calls for photographic systems ruggedized for such conditions. In addition, when the motion of subjects over time is of interest, or there is a desire to record subjects that are too small for the unaided eye to appreciate fully, this also calls for special camera equipment.

HIGH-SPEED STILL PHOTOGRAPHY

It is a relatively simple task to record blurry and unsharp images of almost any event. When photography of fast moving events is contemplated, one of the first questions that needs to be answered is whether a given subject will appear sharp and in adequate detail in the final photographs. The answer to this question plays an important role in subsequent decisions about selecting photographic equipment and techniques to achieve the desired result. A basic fact is that any image of an object in motion relative to the photosensitive surface that records it will, in principle, be invariably blurred, regardless of how short the exposure time used to capture the event. In practice, one controls blur to fall below some desired threshold value. The factor primarily responsible for achieving perceptibly blur-free reproduction of subjects in motion is the shutter speed or exposure time used for making the original photograph. In simplified terms, the shorter the exposure time, the less distance a subject can move during the recording phase, and the sharper and more detailed an image will be.

The actual perception of unsharpness or of the blur in a photograph is largely a function of the observer who evaluates the final image and the conditions under which the observer functions. Visual acuity, angle of view, orientation of the subject, viewing distance, lighting level, and image color all affect the ultimate perception of image sharpness. The perceived amount of blur in a reproduction changes depending on the optical magnification that was used when
the image was recorded in the first place, as well as on the magnification of that image introduced when it is made into a final print. A photograph that exhibits obvious blur can be reduced in size to make the blur imperceptible to the viewer, but at the same time, the subject's image size is reduced. If an image does not capture the proper amount of detail (equated here with sharpness) at the outset, no improvement in sharpness is possible by altering the viewing distance or enlargement, using lenses of alternative focal lengths, or changing the camera-to-subject distance. The exposure time necessary to limit blur in a subject moving at a particular rate can be estimated from the following relationship:
ET = Maximum allowable blur (at subject) / (K × Rate of subject motion × cos A),   (1)
where ET is the exposure time, K is a "quality" constant equal to 2–4, and A is the angle between the subject's direction of motion and the film plane. The maximum allowable blur is often a percentage of the subject size, typically about 10%. This means that one must accurately predict the smallest part of a given subject in which detail is desired. This becomes the real subject of the photograph, and 10% of it will be assigned the "maximum allowable blur" factor in Eq. (1).

The major problem in making high-speed still pictures of high-speed events is achieving short exposure times. Conventional mechanical shutters can reach exposures as short as 1/10,000 second, but this might not be short enough. Good light transmission and short exposure times are characteristics that are usually not available simultaneously in most shutters. When the action-stopping ability of a mechanical shutter is not sufficient to achieve blur-free images, short flash illumination sources, either spark or electronic flashes, whose duration ranges from milliseconds to less than a microsecond, are traditionally used to achieve the desired results. Short laser pulses are more frequently used today. Special pyrotechnic lighting units are also available.

Fox-Talbot is credited with having made the first high-speed photograph in 1840. He placed a page from the London Times on a rapidly spinning wheel. Using light flashes generated by Leyden jars and an open shutter on the camera, he captured easily readable images of the otherwise blurry event.

An electronic flash (sometimes also referred to as a "strobe") is a device that stores electrical energy in capacitors for an extended period of time and then discharges it in a brief period of time through a tube most often filled with xenon or, in spark illumination sources, by a simple discharge across an air gap. Electronic flashes are often characterized in terms of light output by their watt-second rating, their lighting efficiency, and also by their duration. The watt-second rating is determined by the operating voltage of the flash
circuit and the capacitance of the main capacitors:

Watt-seconds = (Capacitance × Voltage^2)/2.   (2)
Raising the voltage is a very efficient way of increasing the power level because the stored energy rises as the square of the voltage, whereas increases in capacitance raise it only linearly. A simple watt-second rating of electronic flashes does not completely account for the photographic efficiency of a given flash because it does not take into account the reflector that typically surrounds a flash tube. A better comparison of the photographic effectiveness of electronic flashes is one based on their beam candlepower seconds (BCPS) rating or their effective guide numbers when used with the same film under similar conditions. The duration of a given flash unit can be determined approximately by taking into account the resistance of the circuit while the discharge takes place, plus the capacitance of the main capacitors:

Duration (seconds) = [Capacitance (farads) × Resistance (ohms)]/√2.   (3)
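The three relationships above lend themselves to quick planning estimates. The sketch below is only illustrative: the subject speed, blur budget, and flash-circuit values are invented for the example, and Eq. (3) is used in the approximate form reconstructed above.

# Illustrative use of Eqs. (1)-(3): the blur-limited exposure time, the
# stored flash energy, and an approximate flash duration. All numeric
# inputs below are made-up examples.
import math

def exposure_time(max_blur_m, rate_m_per_s, k=3, angle_deg=0.0):
    """Eq. (1): ET = blur / (K * rate * cos A); K is the 'quality' constant
    (2-4) and A the angle between the motion and the film plane."""
    return max_blur_m / (k * rate_m_per_s * math.cos(math.radians(angle_deg)))

def flash_energy_ws(capacitance_f, voltage_v):
    """Eq. (2): stored energy (watt-seconds) = C * V**2 / 2."""
    return capacitance_f * voltage_v ** 2 / 2

def flash_duration_s(capacitance_f, resistance_ohm):
    """Eq. (3), as reconstructed above: duration ~ C * R / sqrt(2)."""
    return capacitance_f * resistance_ohm / math.sqrt(2)

# Example: a 5-mm detail moving at 10 m/s across the film plane (blur budget
# = 10% of the detail), photographed with a small 100-uF, 350-V flash.
print(exposure_time(0.0005, 10.0))      # ~1.7e-5 s, i.e., about 1/60,000 s
print(flash_energy_ws(100e-6, 350.0))   # ~6.1 watt-seconds
print(flash_duration_s(100e-6, 1.0))    # ~7.1e-5 s for a 1-ohm discharge path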
Because it is impractical to raise the voltage beyond certain safe-to-handle limits, further gains in power must be achieved by increasing capacitance. On the other hand, because it is not possible to drop circuit resistance below certain limits, short exposure times must be achieved by dropping the capacitance of the main capacitors. Thus high power and short duration cannot be achieved simultaneously and are a matter of compromise in any given flash design. Electronic flash tubes can be disconnected from the power capacitors by using suitable solid-state switches (thyristors in a fairly straightforward R/C circuit), so that only a fraction of the power available in the capacitors is used at any given time. This increases the life of batteries used to power portable units and also shortens the duration of a given flash discharge while providing less light. When action-stopping capability is required, the fractional power settings of many common flash units provide an effective means to achieve relatively short durations suitable for many high-speed applications (Fig. 1; see also Fig. 24).

Nonconventional High-Speed Shutters

Kerr cells and Pockels cells are sometimes used when the goal is to achieve very short exposure times. The former is a liquid medium that aligns the plane of polarization of light in one particular direction. It consists of a glass cell holding the active ingredient (nitrobenzene), which is placed between two polarizers lined up so that their axes of polarization are at right angles to each other. Therefore, no light passes through the system. However, when two electrodes placed in the liquid are energized, they cause the plane of polarization of the incident beam to rotate and match the plane of polarization of the second polarizer. This allows light to pass through
the system until the electrodes are disconnected. Exposure times in the nanosecond range are possible using such shutters, but there is a matter of light efficiency: they are not very good in this respect, so such shuttering is suitable for events that are highly luminous or where lots of light can be provided, but otherwise it is somewhat impractical for general-purpose high-speed applications. These shutters are used primarily in weapons and nuclear research applications. A Pockels cell is similar to a Kerr cell, except that the active material is a solid instead of a liquid. Another similar shutter is the Faraday shutter, which responds to the presence of an electrically induced magnetic field. There is always a small amount of light that leaks through the crossed polarizers used in all of these shutters, so generally a second, mechanical, capping shutter is used to exclude light from the camera interior before and after the event is photographed.

Synchronization

A short exposure time is needed to make high-speed still photographs, and one also needs to make sure that the photograph is taken at the right time. For this purpose, synchronizers are used to trigger either the camera or a light source, and sometimes both, from a control device that detects some signal from the imminent high-speed event. Synchronizers are available that respond to just about any kind of an input signal or event; the more popular types include those that respond to a light being turned off or a beam interrupted; a light being turned on; the onset of sound; or changes in pressure, strain, temperature, or motion. Synchronizers typically activate an adjustable delay so that photography takes place at some predetermined time interval after the detection of the synchronization signal. Some can control the sequence of an experiment themselves by using built-in circuitry (Fig. 2; see also Fig. 25).

Figure 1. High-speed photograph of water drop impact. Flash triggered after a short delay from the time the drop passed through a light beam.

Figure 2. Microsecond flash photograph where the flash is triggered by an acoustic synchronizer.

Repetitive Flash or Stroboscopic Photography
An often neglected form of motion analysis is the application of stroboscopes (developed to a high level of sophistication by Dr. Harold Edgerton) to the study and visualization of high-speed events. If these events are repetitive, a stroboscope is often a low-cost instrument that enables the researcher to gather significant data without making permanent records of the subject at all. And if records are needed, the instruments can also be used for that purpose in a variety of ways in conjunction with cameras.

A stroboscope can be either mechanical or electronic. The latter is usually used for event visualization, but there is a lot to be said for the mechanical variety as well. The mechanical stroboscope is generally nothing more than a rotating disk that has one or more "slots" cut into it at regular intervals; the electronic stroboscope is essentially a short-duration electronic flash that has a very fast recycling time that allows it to be triggered at variable and calibrated time intervals. When the time period of a repeatable event is the same as the time between light flashes of the stroboscope, the event appears motionless if the ambient light level is low compared with the light produced by the flash. Because our eyes can see only by the light of the flash in such a case, and the flash of light (usually lasting only microseconds) happens when the subject is in the same position within each subject cycle, the illusion is that the event, such as the rapid rotation of a fan, is stationary. The event could, however, make more than one cycle between light flash periods. To determine the frequency of any cyclic event, all that is needed are two consecutive frequency "readings" (made from the stroboscope's frequency dial) at which the event appears to be standing still. The event frequency (EF) can then be found from

EF = (Frequency 1 × Frequency 2) / (Difference between Frequency 1 and Frequency 2).   (4)
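A minimal worked example of Eq. (4) follows; the two dial readings are invented, but they show how two consecutive "frozen" settings pin down the true rate.

# Eq. (4): event frequency from two consecutive strobe settings at which
# the motion appears frozen. The example readings are invented.

def event_frequency(f1, f2):
    """EF = f1 * f2 / |f1 - f2| for two consecutive 'frozen' readings."""
    return f1 * f2 / abs(f1 - f2)

# A fan that appears frozen at 1,500 and again at 1,200 flashes per minute
# is actually turning at 6,000 revolutions per minute.
print(event_frequency(1500, 1200))  # 6000.0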
Figure 3. Stroboscopic photograph (at 60 records per second) of human motion made by using a Colorado Video freeze frame interface.
Stroboscopic Motion Analysis

Photographs that can give insight into the motion characteristics of a moving subject can be recorded onto a single frame of film by having the subject, illuminated by the flashing stroboscope, perform its motion against a dark background while the shutter of the camera is open for a brief period of time. Those parts of the subject that are in different locations at the time of each light flash are recorded at different locations on the film. By knowing the frequency of the strobe and (by incorporating a scale in the scene) the distance a subject moved between flashes or exposures, the average rate of motion from one point to another can be determined relatively easily (Fig. 3). Unfortunately, the stationary parts of a subject overexpose the film in their locations. If a subject part returns to a previous location, it may also be difficult to determine the actual sequence of the sequential images recorded on the film.

To deal with this eventuality, stroboscopes are sometimes coupled with cameras that are simply shutterless film transport mechanisms. They generally either transport film from a supply roll onto a take-up spool or are cameras equipped to carry film on the inside or the outside surface of a rotating drum. In these cases, for each flash of light from the stroboscope, the subject is in a new location and there is also unexposed film available to record the image associated with each flash. Because there is a period of darkness between flashes, sometimes the subject has a continuously glowing lamp attached to it to enable the researcher to track the subject's position easily over time.
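The arithmetic behind such a measurement is simple; the short sketch below, with invented example numbers, shows how the average speed between two successive strobe images follows from the flash rate and the measured displacement.

# Average speed between two successive strobe exposures: the displacement
# (read against the scale included in the scene) times the flash rate.
# The numbers are invented for illustration.

def average_speed(displacement_m, flash_rate_hz):
    """Speed = displacement between consecutive images * flashes per second."""
    return displacement_m * flash_rate_hz

# A club head that moves 0.30 m between flashes of a 120-Hz strobe is
# traveling at about 36 m/s on average over that interval.
print(average_speed(0.30, 120.0))  # 36.0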
HIGH-SPEED MOTION PICTURE CAMERAS

Standard cameras record the appearance of a subject at a single time; yet much more information can be gained by studying a subject over time. If samples are taken at a sufficiently high frequency and then redisplayed to the viewer at the same frequency, the illusion of motion (and elapsed time) can be perceived from these records. Cameras that use various motion picture film formats, such as 8 mm, 16 mm, 35 mm, and 70 mm, are available, but we will deal primarily with 16-mm format cameras because they are most commonly used for high-speed recording. The image-forming principles of these cameras can be applied to the other formats as well.

Conventional motion picture cameras acquire photographs by intermittently placing unexposed film stock in the camera's gate every 1/24th of a second. While the film is standing still, the camera's shutter exposes it for some fraction of time that is determined by the selected exposure time. Typically these cameras couple a rotating disk to the film advance mechanism built into the camera. This disk is located behind the lens just in front of the film plane and has a fixed or adjustable open sector or slot cut into it. In most cameras, this disk makes one turn for every frame that is exposed, although in some cameras it only makes a 180° turn per frame (Fig. 4).

Figure 4. General layout of the film advance and shutter mechanism in a conventional motion picture camera.

For cameras that use full circular shutter disks, the exposure time ET is a direct function of the size of the opening cut into the shutter and the framing rate as follows:

ET = Shutter opening in degrees / (360° × framing rate).   (5)

The relationship between the full circle of the rotating shutter disk and the opening cut into it is often called the "shutter factor." If the open sector is 180°, then the shutter factor is 2. Exposure time can also be determined as the reciprocal of the product of the framing rate and the shutter factor, that is, 1/(framing rate × shutter factor). This means that at a framing rate of 24 pictures per second and a shutter opening of (for instance) 180°, the exposure time is 1/48 second.
move a new piece of film into the gate of the camera also in 1/48th second. Standard motion picture cameras can move and expose film intermittently up to a rate of about 64 pictures per second, although a very few reach 128 frames per second. The objective in motion picture photography is essentially to duplicate the events that took place in front of the camera, so the time factor on playback generally matches the time during which an event took place when it was recorded that is, under normal conditions, one can say that films have a time magnification of ‘‘1.’’ Thus, time magnification is determined by dividing the image acquisition rate by the ultimate projection rate. Standard motion picture cameras and video cameras acquire images at the same rate as is used later to project them. Intermittent-Action High-Speed Motion Picture Cameras By increasing the acquisition rate to rates higher than the eventual projection rate, time can essentially be ‘‘magnified’’ on playback. If a camera operates at 48 pictures per second and the film is later replayed at 24 pictures per second, the time on the screen will be twice as long as the event originally. To examine smaller and smaller time periods at leisure, it is necessary to operate cameras at ever increasing framing rates and thus achieve ever greater ‘‘magnifications of time.’’ Two reasons that standard or conventional motion picture cameras cannot operate at framing rates higher than about 120 pictures per second both relate to mechanical failure. For one, the perforations, or sprocket holes which the film transport mechanism engages to move the film one frame at a time, cannot tolerate the acceleration forces involved and tear above a certain rate. Estar-based film is superior in this respect and resists tearing. Second, when higher framing rates are involved, the film is not completely motionless during the exposure time, and this leads to blurred records. Cameras that can achieve framing rates beyond 120 pictures per second deal with both of the problems mentioned previously. They are equipped with advance mechanisms that engage from Two up to Four film perforations at a time, and when the film is advanced, it is also placed over a set of pins that match the film’s perforations before the shutter exposes the film. Thus these cameras are intermittent and they are also ‘‘pin registered’’ which means that the film is about as motionless as it can be during exposure. Cameras of this type are available that regularly reach recording rates of up to 500 pictures per second. This means the film is moved a distance of one frame in a time close to 1/1,000 second. The rotating shutters in these cameras can be adjusted to select exposure times as long as 1/1,000 second (at the top framing rate) to exposures in the range of 1/25,000 second or less. In addition, the cameras can often be synchronized with high-speed repeating stroboscopes for even better action-stopping ability (Fig. 5). At their top framing rates, these cameras that use 16-mm motion picture film can magnify time by a factor of approximately 20 times. Typical motion picture cameras of this type are the Milliken, the PhotoSonics 1PL (Instrumentation Marketing Corp, San Diego, CA), and
Figure 5. The Photosonics 1PL intermittent, pin registered, high-speed motion picture camera capable of 500 full frame 16-mm pictures per second.
the Locam (Visual Instrumentation Corp., San Diego, CA). Larger format cameras cannot generally reach these speeds; smaller format cameras exceed them at the cost of lower spatial resolution. Rotating Prism Cameras When the framing rate of the intermittent motion picture camera is not high enough, camera designers compromised somewhat with the sharpness of the records obtained and designed cameras in which the film is in constant motion during filming. Because the film never comes to a standstill, the image of the subject must be moved at a rate that matches, as well as possible, the rate at which the film is moving. Then, a simple segmenting device allows the moving film to record the appearance of the subject periodically (Fig. 6). The device that accomplishes this is a glass block that rotates and is placed between the lens and the moving film. Image-forming rays pass through the rotating block and are deflected so that the image formed by the lens moves at the same rate as the film. As the critical angle between the incoming light rays and the glass block is exceeded, the illumination at the film plane decreases (essentially ‘‘shuttering’’ the view) until such time as the next facet of the block starts to transmit the next view of the subject onto the film. Using interdependent gearing, the glass block and the film maintain a relatively stable relationship to each other as the film is pulled through the camera from a supply spool onto a take-up spool. Cameras are available that accept film loads from 100 to 2,000 feet of 16-mm film (4,000 to 80,000 frames), and the framing rate essentially depends on the rate at which the film can be moved through the camera. Special circuitry is available in some models to reach and keep the preset framing rate as constant as possible (Fig. 7).
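The timing relationships introduced above — the exposure-time formula of Eq. (5), the 16-mm film capacities quoted (100 ft holding 4,000 full frames implies 40 frames per foot), and the time-magnification rule — are easy to evaluate. The sketch below is an illustration; the 400-ft load and 5,000-pps framing rate are assumed values, not figures from the article.

```python
FRAMES_PER_FOOT_16MM = 40   # 100 ft -> 4,000 full frames, as quoted in the text

def exposure_time_s(shutter_opening_deg, framing_rate_pps):
    """Eq. (5): exposure time for a full-circle rotating-shutter camera."""
    return shutter_opening_deg / (360.0 * framing_rate_pps)

def recording_time_s(film_feet, framing_rate_pps):
    """How long a 16-mm film load lasts at a given framing rate."""
    return film_feet * FRAMES_PER_FOOT_16MM / framing_rate_pps

def time_magnification(framing_rate_pps, projection_rate_pps=24):
    """Acquisition rate divided by the eventual projection rate."""
    return framing_rate_pps / projection_rate_pps

print(exposure_time_s(180, 24))     # -> 1/48 s, the case worked in the text
print(recording_time_s(400, 5000))  # assumed 400-ft load at 5,000 pps -> 3.2 s
print(time_magnification(5000))     # -> ~208x slower than real time on playback
```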
Figure 6. Representation of the method whereby the rotating prism moves the image along with the film in a rotating-prism, high-speed motion picture camera.
Ultimately, accurate timing is accomplished only by timing lights that flash at known time intervals (generally 100 or 1,000 flashes per second) and are incorporated into the cameras. They leave a periodic mark on the edge of the film. If the time between marks is known, then the number of frames between marks is an indicator of the camera’s framing rate when the timing marks were impressed on the film. Framing rates up to about 10,000 full frame 16-mm film images per second can be reached for relatively long times. At such framing rates, the time magnifying ability of these cameras is about 400 times. Some cameras can be equipped with magazines holding up to 2,000 feet of film and, though the top framing rate may be compromised, recording time can be extended significantly. When even higher framing rates are desired, these cameras can be equipped with prisms containing a larger number of facets and this allows doubling or quadrupling the full frame capability of the camera by, respectively, reducing the height of the individual frames to one-half or one-quarter of their full frame dimension. Refinements of this basic concept of glass-block motion compensation include cameras where the compensating device is a multifaceted mirror ground onto the same shaft that advances the film through the camera. Cameras of this general type include the Fastax, Hycam, Nova, PhotoSonics 1b (Instrumentation Marketing Corp., Burbank, CA), Photec, etc. Rotating Drum and Mirror Cameras Again, physical limitations put a ceiling on the rate at which film can be transported through a camera. When
Figure 7. Relationship of component parts of the film transport and the image-motion-inducing rotating glass block in a rotating-prism, high-speed motion picture camera (supply spool, camera lens, rotating prism, sprocketed film drive, and take-up spool).
still higher framing rates are needed than those reached by the rotating prism cameras, then cameras based on yet a different image motion compensation scheme are available. These can reach higher speeds but usually at an additional compromise on image sharpness due to small amounts of motion between the film and the image during exposure. The image-forming rays in these cameras are deflected by a rotating multifaceted mirror driven by a rotating drum that also holds the film during exposure. The mirror turns at a rate set by the camera design parameters, so that the image moves at the same rate as the film located along the inside periphery of the rotating drum. The motion-inducing rotating mirror also deflects the image of a ‘‘stop,’’ through which the image-forming rays pass. As the image moves across a physical replica of the stop and because the stop’s image is moved more rapidly than the subject’s image, the shuttering of such a camera is quite remarkable in terms of duration. The principle of operation of image motion compensation and shuttering in these cameras is referred to, after its inventor, David Miller, as the Miller principle (Fig. 8). Cameras like this hold only enough film for about 200 separate full-frame 16-mm pictures but can achieve recording rates up to about 35,000 pictures per second at exposure times for each frame of less than a microsecond. Such a relationship obviously means that the camera is out of film in less than 1/100 second when running at full speed, but the framing rate is uniform along the film, and generally no timing lights are needed. One simply needs to make a record of the framing rate at which the camera was running during the recording. This is displayed by a simple tachometer readout. Unlike most other cameras, drum-type cameras generally do not need any sort of synchronization scheme between the camera and the event, as long as the event is self-luminous. It is said that they are ‘‘always alert.’’ This occurs because the camera runs the same film past the image gate repeatedly at the desired framing rate. The drum is simply brought up to the desired speed, and when an event happens, it turns on a bright light, and the event is recorded by the film. To prevent multiple exposures, one needs only to close a shutter (or use a light that lasts less than one revolution of the drum) before the drum has made a complete turn (Fig. 9). Their framing rate, and thus time magnification capability, is high, but the images captured by these cameras cannot generally be conveniently viewed by using a projector. They are, instead, viewed as a series of still images or are ‘‘animated’’ by duplication onto standard motion picture stock. Cameras of this type include the Dynafax by the Cordin Corporation (Salt Lake City, UT).
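A short check of the drum-camera numbers quoted above (about 200 full-frame pictures at up to roughly 35,000 pictures per second); the helper function is illustrative only.

```python
def drum_recording_time_s(frames_on_drum, framing_rate_pps):
    """Recording time of a rotating-drum camera: one drum's worth of film."""
    return frames_on_drum / framing_rate_pps

# Values quoted in the text: about 200 frames at 35,000 pictures per second.
t = drum_recording_time_s(200, 35000)
print(f"{t * 1000:.1f} ms")   # -> 5.7 ms, i.e. well under 1/100 second
```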
Figure 8. Diagrammatic layout of the Cordin Dynafax rotating mirror and high-speed drum camera. 1, Objective lens; 2, Mask; 3, Field lens; 4, Capping shutter; 5, Entrance stop; 6, Entrance relay mirror; 7, Entrance relay lens; 8, Rotating hexagonal mirror; 9 and 9′, Collimating relay lenses; 10 and 10′, Exit stops; 11 and 11′, Imaging relay lenses; 12 and 12′, Exit relay mirrors; 13, Film.
Rotating Mirror Framing Cameras When framing rates even higher than 35,000 pictures per second are desired, a compromise is then typically made in terms of the total number of images acquired. For photography at rates of millions per second, the film is actually held still and the image of the subject is moved to successive positions by a rotating mirror. The primary image collected by the camera’s objective is brought to a focus and is then reimaged on the surface of a rotating mirror. This mirror is surrounded by an array of lenslets which in turn reimage the subject’s image formed in the mirror onto film which is held in an arc at the appropriate distance from each lenslet (Fig. 10). Because each lens can ‘‘see’’ only the image of the subject reflected by the rotating mirror when the mirror is lined up at a particular angle, as the mirror rotates, each lens records the view of the subject at a time slightly different from the others. The time between which lenslets record their respective images is a function of how fast the main, rotating mirror can be turned. Using rotational rates of thousands of revolutions per second, framing rates of millions of pictures per second are possible, albeit for only a few microseconds or for only a dozen to a few dozen frames. Again, the framing rates of these cameras can be increased by making the frames small when measured in the direction of mirror rotation. To prevent multiple exposures that would be caused when the main mirror makes more than one revolution, a
Figure 9. Sequence of schlieren photos made at 20,000 pictures per second using a Cordin Dynafax rotating drum and mirror high-speed camera.
Figure 10. Basic layout of a rotating mirror ultrahigh-speed framing camera (subject, camera lens, image of the subject formed on the rotating mirror's surface, surrounding lenslets, and film held in a curved track).
variety of capping shutter schemes are employed both before event initiation (in case of a non-self-luminous
event) and after the event. These include mirrors or glass windows that are explosively shattered, as well as surfaces explosively coated with various materials. Timing and synchronization are significant problems when using these cameras. Image Dissection Cameras Framing rates of 100,000 pictures per second and higher can also be achieved by cameras that sacrifice spatial resolution for temporal resolution by using the same piece of film to store information about the appearance of the subject at more than one time. This solution to high-speed recording depends on the fact that one can often make do with images of low spatial resolution and still have enough information to deduce various of the subject’s behavioral parameters. By placing a grid consisting of fine opaque/clear lines in front of a piece of film and assuming that the clear lines are one-tenth the size of the opaque lines, it can be readily understood that when a picture is made through the grid, only one-tenth of the film’s surface area is exposed. If the grid lines are small enough, a fairly good representation of most subjects can be achieved, but only one-tenth of the total image content is possible. Then, by moving the grid a distance equal to the width of the opaque grid, it is possible to make additional records of a subject. If the subject is moving when the grid is in motion, the image information will occupy different locations on the surface of the film. Once the film is processed, the real grid is placed on the film and registered by using its location when the picture was made. Then, movement of the grid across the grid lines presents to the viewer essentially a brief ‘‘motion picture’’ of the subject’s action when the grid moved and while the ‘‘sequence’’ of pictures was being made. Obviously synchronization is a major problem. Cameras of this type are quite rare but can achieve very high framing rates at relatively low cost.
Electrostatic Image Tube Cameras
An alternative to high-speed rotating mirror framing cameras and image dissection cameras is the high-speed framing electrostatic camera based on the use of an image tube, sometimes equipped with an auxiliary microchannel plate image intensifier. A camera of this kind is the DRS-Hadland Imacon. In these cameras, which can make images at a time separation of the order of a few nanoseconds, images are electrostatically shuttered and moved to various locations. They are eventually recorded on a sheet of film (often Polaroid sheet film or, in newer versions, on CCD arrays) so that eight to ten images appear on a single sheet. This means that, although the framing rate of these cameras is the fastest among all high-speed cameras, the number of records is among the fewest.
HIGH-SPEED VIDEO AND CCD SYSTEMS
The high-speed industry was stagnant to a large extent in new developments in instrumentation until recent developments in electronic imaging detectors and integrated systems made significant inroads into a field long dominated by film-based cameras (see Fig. 23). Early high-speed video cameras were no match for the workhorse of the high-speed industry, the rotating-prism high-speed camera. This situation changed dramatically when Eastman Kodak (Rochester, NY) introduced the Ektapro system. This system could capture and store 1,000 full frame video images per second using a sensor made up of 192 × 240 pixels. At first, novel tape transport methods allowed the system to store images on videotape. Although the system captured black-and-white images at a resolution significantly lower than film, the immediate availability of the data and the fact that the equipment did not require a skilled operator made it a highly desirable imaging tool in the manufacturing industry (Figs. 11–12).
Figure 11. One frame taken out of a motion record of a car crash test captured by a high-speed digital camera at 1,000 frames per second. Made with a RedLake MotionScope camera.
Figure 12. The Redlake MotionScope digital high-speed camera.
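To put the early Ektapro specification quoted above in perspective, the sketch below works out the raw pixel throughput of a 192 × 240 sensor read out 1,000 times per second. The assumption of one byte per black-and-white pixel is ours, not the article's.

```python
ROWS, COLS, FRAMES_PER_S = 192, 240, 1000

pixels_per_frame = ROWS * COLS                # 46,080 pixels per image
pixel_rate = pixels_per_frame * FRAMES_PER_S  # ~46 million pixels per second

# Assuming 8 bits (1 byte) per monochrome pixel:
bytes_per_second = pixel_rate                 # ~46 MB of image data per second
print(pixels_per_frame, pixel_rate, bytes_per_second)
```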
Figure 13. Minuteman missile launch recorded at 500 pps by a Photosonics Phantom digital high-speed camera.
A newer version of the system, the EM, as well as several ‘‘close cousins’’ of the fundamental design, store several thousand full frame images in RAM digital memory. This enables the camera to achieve something that the high-speed industry has sought since the advent of photography: recording of random events at a high recording rate and visualization of a subject’s state just before, as well as during and after, catastrophic failure. This is accomplished by erasing the oldest images recorded into RAM memory while continually adding new images to the ‘‘image stack’’ loaded into the camera. By monitoring the event, the camera is made to stop recording at some suitable time after a random event happens, and the stored images are then played back from a time before the initiation of the event or failure (see Fig. 27). Advances continue in solid-state imaging, and high recording rates at brief exposure times are currently achieved in a similar fashion. Often microchannel image intensifiers are used to boost the light sensitivity of these high-speed imaging systems (Fig. 13). STREAK CAMERAS The imaging scheme used in cameras of standard design is that the film remains stationary with respect to the film gate at the time of exposure. A significant number of specialized applications employ cameras in which the film is in motion at the time of exposure. In their simplest
manifestation, film is simply moved past a stationary slit, and the exposure itself takes place during the transit of the film across this opening (Fig. 14). Unlike normal cameras that make photographs that have two spatial dimensions, these ‘‘strip or streak’’ cameras always display time as one of their dimensions and one subject’s spatial dimension as their other dimension. When cameras of this type are used to make records of subjects whose images do not move across the shutterslit, they are basically simple time-recording instruments. When images move along the slit, then the photographic record can be reduced to image velocity or acceleration. Cameras that employ this scheme for making photographs but whose output resembles the subjects they are photographing depend on moving the image of the subject at the same rate as the film moves past the stationary shutter-slit. In its simplest form, the camera system described previously behaves basically as a light-based strip chart recorder where a physical pen is replaced by many light ‘‘pens’’ operating along the open shutter-slit of the camera while the moving film is the recording medium. As such, the instrument can make precise measurements of event duration, simultaneity, velocity, acceleration, frequency, etc (Fig. 15). The time resolution of the camera is first a function of the rate at which the film can be moved past the slit and second of the width of the shutter-slit itself. When the upper limits of film transport methods are reached, the image of the slit can be quickly wiped across stationary film. Rotating mirror streak (or velocity recording) cameras have been made with time resolutions
Figure 14. The Photosonics Phantom digital high-speed camera.
Figure 15. Diagrammatic layout of a high-speed rotating mirror streak camera (subject, camera lens, slit assembly, second lens, rotating mirror, and film in a curved track onto which the slit image is wiped).
Figure 16. Diagram of a moving film strip (or streak) camera (supply and take-up spools, camera lens, and open slit). In the strip or synchroballistic and photofinish mode, the images move in the same direction as the film. In the streak mode, images move perpendicular to the direction of film motion.
of less than 1 cm per microsecond and slit sizes as small as 1/10 millimeter and less. Electrostatic streak cameras are designed so an image line is deflected electrostatically across the face of an image tube for time resolutions that enable researchers to determine the duration of events in the picosecond time realm. Hamamatsu (now DRS-Hadland) and EG&G are two companies that produce such streak cameras. STRIP CAMERAS When it is desired to make a record of a scene so that it resembles its appearance to the eye, the image of a subject can be made to move across the shutter-slit of the streak camera, instantly converting it into a ‘‘strip’’ camera. Images are moved across the slit of the camera by several methods. The easiest by far is simply to have the subject move linearly across the field of view of the lens. There are two major applications in high-speed photography for this imaging method (Fig. 16). SYNCHROBALLISTIC AND PHOTOFINISH PHOTOGRAPHY Synchroballistic photography is a high-speed photographic technique used to photograph fast moving objects such as rockets in flight. It is also the same process that is used to determine accurately the order of the finish of horses at racetracks. There are several problems in making high-speed pictures of high-speed missiles in flight. The major one is that, until lately, it was impossible to make shutters
that had good light transmission characteristics and gave a short exposure time to make sharp pictures of these objects. Missile photography using a high-speed motion picture camera could be used, but there is a problem in tracking the missile as well as in using large amounts of film. And finally, the pictures are generally small. If such photographs are magnified to any extent, the results are not of very high quality. The solution is the ‘‘synchroballistic photography’’ system. In this approach, the problem of high subject speeds and consequently, image speeds is solved by putting the film in motion. The film is moved in the camera at the expected speed of the image of the missile. Because the image and the film are moving a the same rate, the film sees an essentially stationary image of the missile and records each part of the missile at a different time. In a synchroballistic camera (as well as in a horse-race photofinish camera), a narrow slit shutter is generally incorporated just in front of the film plane. The film is transported past this open slit while the image of the object being photographed also moves past this slit at the same speed as the film moves. Thus the film records, sequentially, the image of the object as it moves past the slit. The final record, then, is a view of one line in space spread out over time. The cameras depend on good a priori knowledge of the speed at which a missile will be traveling to end up with sharp pictures, but this is often a solvable problem. Ultimately the photographs that these cameras deliver are sharp, they are full of detail, they provide quantitative information, and they are relatively inexpensive. A ‘‘drawback’’ is that sometimes the images look distorted from their real appearance, but information content is still very high. A variation on the theme is the application of these cameras at racetracks. In this case, horses are substituted for a missile, and as the image of each horse passes over the slit, the horses are recorded sequentially onto the film. The film speed is adjusted to match that of the image of the average racehorse. As each individual ‘‘nose’’ gets to the slit of the camera, it is recorded sequentially. Of course, the noses are followed by the rest of the horses’ bodies, and they too are recorded eventually as they move forward. But the important part is that the sequential arrival of each nose at the slit on the ‘‘strip’’ camera is then displayed in offset and on different positions on the film or final print. The distance between noses can be directly translated to the difference in the time at which each nose arrives at the slit of the camera which is lined up so that it matches the finish line on the real track. This system is much better than trying to determine the finishing order of racers by human judgment, by a single photograph, or by a motion picture sequence. The picture presents, on a single flat sheet of paper, the order of arrival of the horses, accurately, with a high degree of reliability and without a doubt caused by loss of time information. Electronic or ‘‘digital’’ strip cameras are currently being installed at many racetracks. They are ‘‘linear’’ array cameras where a single row of CCD sensors doubles for the
slit of the film-type strip camera. The results are similar to those traditionally achieved, but the darkroom is no longer needed. Several linear array cameras can be installed along the track for basically instantaneous readings of the elapsed time for any given racer between the start and arrival at a given location and to obtain accurate, almost real-time, time-between-racers information, even as the race is developing. At the high end of high-speed photography, time-resolution instrumentation cameras like this, or variations on the theme, are used to detect whether lasers hit targets simultaneously in nuclear fusion research.
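A minimal sketch of the synchroballistic and photofinish arithmetic described above, using hypothetical numbers: the film (or line-scan) speed is set to the expected image speed of the subject, and the offset between two racers' noses on the record then converts directly into their arrival-time difference at the slit.

```python
def required_film_speed(subject_speed_m_s, magnification):
    """Film must move at the image speed: subject speed times the optical
    magnification (image size / subject size) of the camera lens."""
    return subject_speed_m_s * magnification

def time_gap_s(nose_offset_on_film_m, film_speed_m_s):
    """Arrival-time difference of two racers from their offset on the strip."""
    return nose_offset_on_film_m / film_speed_m_s

# Hypothetical example: a horse at 17 m/s imaged at 1/500 magnification.
v_film = required_film_speed(17.0, 1 / 500)   # 0.034 m/s of film travel
print(time_gap_s(0.002, v_film))              # a 2-mm offset -> about 0.059 s
```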
PHOTOGRAPHIC TECHNIQUES OFTEN ASSOCIATED WITH HIGH-SPEED PHOTOGRAPHY
Figure 18. Microsecond illumination direct shadow shadowgraph of bullet in supersonic flight.
Basic Flow Visualization Techniques
Photography is widely used to visualize phenomena that are invisible to the eye because they hardly alter the environment visually. Such events are associated with density gradients formed in liquids or gases as a result of nonhomogeneous mixing or the by-product of local heating, cooling, or compression of gases. Several basic techniques are used to visualize these gradients.
Shadowgraph Method. The shadowgraph is the result of interposing a subject containing density gradients between a small (sometimes referred to as a ''point'') light source and a screen (often plain photographic film or paper). The gradients in the subject cause light rays to fall in locations different from where they would fall if the disturbing medium were not present, and therefore, the effect of the gradients on the light distribution on the screen can be associated with the gradients themselves (Figs. 17–18). If a plain screen is used, a standard camera can be aimed at the screen from the front or, if the screen is translucent, from the rear. High reflectivity (3M's Scotchlite) materials allow visualizing shadowgraphs in daylight. When photographic materials are used as the screen, the system is set up under laboratory conditions in a darkened room, and the lamp is placed in a shuttered enclosure, or a flash is used as the source. Advantages
of this simple system are its ability to deal with large subjects and low cost. A disadvantage is its relatively low sensitivity. Schlieren Methods. When a higher level of sensitivity is desired, optical elements are incorporated into the shadowgraph system (thus practically limiting the size of the subjects that can be photographed), and deviations in the path of light rays caused by density gradients (or departures from surface flatness in reflective systems) cause these deviated rays to interact with various physical obstacles (knife-edges) that emphasize the appearance of these gradients compared to their appearance in a shadowgraph system.
Single-Pass Schlieren Systems. In a popular layout, a small light source is placed at the focal point of an astronomical-quality spherical or parabolic mirror (lenses can also be used, but the system costs increase dramatically). The light emerges from the mirror in a parallel beam that is intercepted some distance away by another similar mirror. It collects the light rays from the first one and brings them to a focus at the focal point where a real image of the light source is formed (Figs. 19–20). The light continues into a camera where the image of the second mirror’s surface appears to be filled with light. Introducing a ‘‘knife-edge’’ into the image of the source causes an overall drop in the light level of the mirror’s surface. When the image of the source is moved by refractive index gradients in the space between the
Figure 17. Diagram of the layout of direct shadow (shadow cast onto film) and reflected (imaged by a camera) shadowgraph systems.
Figure 19. Diagrammatic layout of a single-mirror, double-sensitivity schlieren system (small, compact light source or lamp filament and camera at twice the mirror's focal length, knife-edge, and schlieren field).
source and its image, local variations in light level appear in the image of the second mirror's surface. Photography is almost secondary to the process, although just about every industrial schlieren system has a camera of some sort incorporated in the system.
Figure 20. Microsecond schlieren photograph of a bullet in flight. The flash was triggered by the bullet breaking a wire visible to the right in the photo.
Double-Pass Schlieren Systems. When the previous design is not sufficiently sensitive, light can be made to pass through the density gradients twice, and thus the sensitivity is increased. This is commonly accomplished by using a single mirror and placing the camera and the source close to each other, both located approximately two focal lengths from the mirror's surface. In this way, a life-sized image of the source is formed just in front of the camera lens (Fig. 21). The light from the source again proceeds into the camera lens where it contributes to making the mirror's surface appear fully luminous. Intercepting some of the light rays at the image of the source leads to an overall, uniform darkening of the image of the mirror's surface (as if the source itself had been reduced in size or its luminance decreased). If the knife-edge remains fixed in position and the image of the light source is moved (by density gradients),
then those areas in the image of the mirror's surface made up of light rays that moved away from the knife-edge will appear brighter. Those areas made up of rays that moved toward the edge will appear darker than the brightness level of a steady-state condition. Replacing the opaque, solid knife-edge by transparent, colored filters opens up the possibility of introducing color into the images. Although color systems are visually more appealing, they often are less sensitive than black-and-white systems and do not generally yield additional data (see Fig. 26).
Figure 22. Essential components and layout of a focusing schlieren system (light source and condenser, large-area front- or rear-illuminated grid, subject plane at the focused location, large-aperture long-focal-length lens, physical negative replica of the source grid at the grid's image, and film plane).
Focusing Schlieren Systems. An interesting schlieren scheme that overcomes the relatively high cost and complexity of mirror systems, but does not lose much in sensitivity, is the focusing schlieren system. This system is particularly well suited for dealing with windows (in wind tunnels, for example) of relatively low quality and has the advantage of being highly sensitive to the location of the gradients in the scene, unlike mirror systems which are influenced by the effect on light rays across the entire distance from source to knife-edge. A basic focusing schlieren system consists of a grid of opaque and transparent lines placed in front of a broad light source. A camera lens, although focused at some distance between it and the grid (where the subject will be located), forms an image of the grid located some distance on the opposite side of the lens but closer to the lens than the subject plane on which it is focused. The image of this grid will be distorted to some extent and may be rather badly distorted if low-quality windows are placed between the grid and the lens. Whatever the quality of the image, a reproduction (negative) of it is made onto high contrast film (Figs. 22–23). After processing, the photographic negative is replaced in the same exact position that it had when the image of
Figure 21. Layout of a conventional double-mirror schlieren system (mirrors 1 and 2 with the schlieren field between them, knife-edges 1 and 2 located one focal length from the respective mirrors, and camera).
Figure 23. Missile in flight captured by a RedLake Motion Scope digital high-speed camera.
Figure 24. High-speed photograph of a wasp in flight. Wasp location in space identified with an ‘‘X’’ laser beam synchronizer that activated the camera when the wasp intercepted both laser beams simultaneously.
Figure 25. Four 1-microsecond flash photographs taken at increasing delays showing the separation of the shotgun pellets from the wad.
Figure 26. Color schlieren photograph of a .22 caliber supersonic bullet in flight. See color insert.
the (distorted) grid was made. The consequence is that the negative image of the grid prevents any light rays from proceeding onto the camera’s ground glass, and the field is completely (or mostly) dark.
Figure 27. The NAC Memrecam 400 digital high-speed camera.
Now, the subject that gives rise to, or contains, the refractive index gradients is placed in the location, on which the lens is focused, between the light source and grid combination and the camera lens. Because it is closer to the lens than the grid, an image of it is formed beyond the distance at which the image of the grid was brought to a focus and where the negative of the grid is placed. Disturbances in the path of light rays induced at the subject location will cause some of the light rays making up the image of the grid to move relative to the obstructing negative and those areas on the ground glass will gain brightness, identifying a subject that has a nonhomogeneous refractive index distribution. If directional changes are caused in locations outside the plane of sharp focus of the lens, they tend to move the image of the grid across a much larger area and simply slightly degrade the overall contrast of the image, but cannot be identified with a particular location.
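The focusing-schlieren geometry just described can be checked with the thin-lens equation. The sketch below is an illustration with assumed distances (not values from the article): because the grid lies farther from the lens than the subject plane, its image forms closer to the lens, which is why the photographic negative of the grid can block the grid image while the subject comes to focus beyond it on the ground glass.

```python
def image_distance(focal_length, object_distance):
    """Thin-lens equation 1/f = 1/d_o + 1/d_i, solved for the image distance."""
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)

f = 0.3          # assumed 300-mm large-aperture lens (meters)
d_subject = 2.0  # assumed subject plane, on which the lens is focused
d_grid = 4.0     # assumed grid/light-source distance, beyond the subject

print(image_distance(f, d_grid))     # ~0.32 m: grid image, where the negative sits
print(image_distance(f, d_subject))  # ~0.35 m: subject image, farther from the lens
```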
Reflective Schlieren System. Generally, only transmission schlieren systems are used, but a significant tool for determining surface flatness of reflecting or semireflecting surfaces can also be investigated by using a reflective schlieren system. In this scheme, the light source and camera lens are again next to each other but located one focal length from a ‘‘field’’ lens. Because the source is at the focal point of the lens, a parallel beam of light emerges from it. If this falls on a flat surface and is reflected directly back to the lens, then an image of the source is formed again at the focal point of the lens. Because the lens and source cannot generally occupy the same location, each is slightly off-axis. If the surface is not flat, then it changes the direction in which the light rays are reflected from the surface. When these changes are at right angles to the orientation of the knife-edge, this causes variations in light intensity at the image on the lens’ surface. The change in direction
can be caused by either a surface that is not flat or by refractive index gradients in the space between the source and the image. One attempts to isolate the changes to a known location, generally as close to the flat mirror as possible. Surface qualities of multiple surfaces can be visualized by offsetting the angle of tilt of each individual surface slightly. Each will produce an image of the source, and thus multiple knife-edges are used.
HIGH SPEED AND PHOTOINSTRUMENTATION COMPANIES
EG&G
DRS-Hadland Photonics
Hamamatsu Photonic Systems
Roper Industries
Redlake Corporation
Instrumentation Marketing Corporation
Cordin Corporation
Visual Instrumentation Corp.
Photonics Inc.
NAC
BIBLIOGRAPHY
1. C. R. Arnold, P. J. Rolls, and J. C. J. Stewart, in D. A. Spencer, ed., Applied Photography, Focal Press, London, UK, 1971.
2. S. Ray, ed., High Speed Photography and Photonics, Focal Press, Butterworth-Heinemann, Oxford, UK, 1997.
3. R. F. Saxe, High Speed Photography, Focal Press, London, UK, 1966.
4. H. Edgerton, Electronic Flash, Strobe, MIT Press, Cambridge, MA, 1987.
5. E. Hecht, Optics, Addison-Wesley, Reading, MA, 1987.
6. Proc. Int. Congr. High Speed Photogr. Photonics, starting from the first meeting in 1952. Recent issues published by SPIE, the International Society for Optical Engineering, Bellingham, WA.
HOLOGRAPHY
TUNG H. JEONG
Lake Forest College, Lake Forest, IL
INTRODUCTION
In an attempt to correct the spherical aberration of the electron microscope, Dennis Gabor (1,2) gave birth to the basic concepts of holography in 1948, for which he was awarded the Nobel prize in physics in 1971. However, it was not until after the invention of the laser and further contributions by E. N. Leith and J. Upatnieks (3–5) and independently by Yu. N. Denisyuk (6–8) that holography was developed to its present stage. In the discussion that follows, we will concentrate on optical holography, although the concepts involved are applicable to any type of coherent waves. In particular, we will analyze the recording and reconstruction of the wave fronts of a three-dimensional object. These analyses are applicable to hybrid developments to be summarized. Generally, the process of holography can be defined as recording the interference pattern between two mutually coherent radiative fields on a two- or three-dimensional medium. The result is said to be a hologram. It is a complex diffraction grating. When one of the fields is directed at the hologram, the diffraction reconstructs the wave fronts of the other field. Holography can be studied and appreciated at many levels. It is sufficiently simple for those without a technical background to produce beautiful works of art. Indeed, it can now be pursued at the same level as photography.
THEORY OF HOLOGRAPHIC IMAGING
The most elegant exposition on holography is through the mathematics of communication theory (3) and Fourier transformations (9). However, for practicality, we will present two complementary treatments, one through rigorous mathematics which leads to numerical results and the other through a graphic model that offers a conceptual understanding of all of the major features of holography.
Basic Mathematics of Holographic Recording and Reconstruction
Figure 1 depicts a generalized system for recording and playing back a hologram. Let the origin of the x, y, z coordinate system be located in the plane of a hologram. A point S(xS, yS, zS) on the object, which is illuminated by an ideal coherent source, scatters light toward the hologram being recorded. It interferes with a coherent wave from a point source located at R(xR, yR, zR). The subsequent interference pattern is recorded all over the surface of the hologram, which is in the x, y plane.
Figure 1. Generalized system for recording and playing back a hologram (holographic plate in the x, y plane; object point S(xS, yS, zS); reference source R(xR, yR, zR); reconstruction source C(xC, yC, zC); point P(x, y, 0) on the plate; and the distances a, b, and c).
During the playback, a point source C(xC, yC, zC) located in the vicinity of R, whose wavelength can be different from the recording wavelength λ, is directed at the hologram. We wish to study the nature of the resultant diffraction. In particular, let us concentrate on point P(x, y, 0). Let a be the distance between S and P, and let b be the distance between R and P, where
$$a = \sqrt{(x - x_S)^2 + (y - y_S)^2 + z_S^2} \qquad (1)$$
and
$$b = \sqrt{(x - x_R)^2 + (y - y_R)^2 + z_R^2}. \qquad (2)$$
Furthermore, let the instantaneous amplitudes of the object and reference waves at P be S(x, y, t) and R(x, y, t), respectively, having duly noted that they both have spherical wave fronts. Thus,
$$S(x, y, t) = \frac{S_0}{a} \exp(j\phi_S + j\omega t) \qquad (3)$$
and
$$R(x, y, t) = \frac{R_0}{b} \exp(j\phi_R + j\omega t), \qquad (4)$$
where
$$\phi_S = -\frac{2\pi}{\lambda}\, a + \psi_S \qquad (5)$$
and
$$\phi_R = -\frac{2\pi}{\lambda}\, b + \psi_R. \qquad (6)$$
Here ω is the angular frequency, and S0, R0, ψS, and ψR are constants. From Eqs. (3) and (4), the total amplitude at P is
$$H(x, y, t) = S + R = \left[ \frac{S_0}{a} \exp(j\phi_S) + \frac{R_0}{b} \exp(j\phi_R) \right] \exp(j\omega t), \qquad (7)$$
and the total intensity is
$$I(x, y, t) = |H(x, y, t)|^2. \qquad (8)$$
At this point, the nature of the recording medium as well as the processing procedure play an important role in the amplitude transmittance T(x, y) at P. For simplicity, let
$$T(x, y) = K\, I(x, y), \qquad (9)$$
where K is a positive constant. From Eqs. (7)–(9), we obtain
$$T(x, y) = K \left\{ \frac{S_0^2}{a^2} + \frac{R_0^2}{b^2} + \frac{S_0 R_0}{ab} \left[ \exp[j(\phi_S - \phi_R)] + \exp[-j(\phi_S - \phi_R)] \right] \right\}. \qquad (10)$$
This is the hologram at P.
Next we consider illuminating P by a spherical wave front originating from C(xC, yC, zC) whose wavelength is µλ, where µ is the ratio of the illuminating and the recording wavelengths. At the left side of P, we can denote the amplitude of the illuminating light as
$$C(x, y, z = -0, t) = \frac{C_0}{c} \exp\!\left( j\phi_C + j\frac{\omega}{\mu} t \right), \qquad (11)$$
where
$$\phi_C = -\frac{2\pi}{\mu\lambda}\, c + \psi_C \qquad (12)$$
and
$$c = \sqrt{(x_C - x)^2 + (y_C - y)^2 + z_C^2}. \qquad (13)$$
Thus, the amplitude at the right side of P is
$$\begin{aligned} C(x, y, z = +0, t) &= T(x, y)\, C(x, y, z = -0, t) \\ &= K \frac{C_0}{c} \left( \frac{S_0^2}{a^2} + \frac{R_0^2}{b^2} \right) \exp\!\left( j\phi_C + j\frac{\omega}{\mu} t \right) \\ &\quad + \frac{K S_0 R_0 C_0}{abc} \exp\!\left[ j(\phi_S - \phi_R + \phi_C) + j\frac{\omega}{\mu} t \right] \\ &\quad + \frac{K S_0 R_0 C_0}{abc} \exp\!\left[ j(-\phi_S + \phi_R + \phi_C) + j\frac{\omega}{\mu} t \right] \\ &= \mathrm{I} + \mathrm{II} + \mathrm{III}. \end{aligned} \qquad (14)$$
Notice that for the usual case where µ = 1, C = R, and b = c, we have the essence of holography. The II and III terms become
$$\mathrm{II} \propto \frac{S_0}{a} \exp(j\phi_S + j\omega t) \qquad (15)$$
and
$$\mathrm{III} \propto \frac{S_0}{a} \exp(-j\phi_S + j\omega t). \qquad (16)$$
Except for a constant, Eq. (15) is identical to the object. This is called the direct image. Equation (16) represents a conjugate image. Note that the direct or conjugate images can be real, virtual, or a combination of the two. This can occur when the ‘‘object’’ is actually a real image projected onto the holographic plane. As depicted in Fig. 1, however, the direct image is virtual, and the conjugate is real. The first term in Eq. (14) represents the directly transmitted wave from C, sometimes called the ‘‘dc’’ component, parallel to language used in electronics. Properties of Holographic Images (10) To design holographic optical elements and to display holograms that will be illuminated by nonlaser sources, it is useful to understand some optical characteristics peculiar to holography. Although a more general treatment is possible (11), a useful understanding can be derived with a paraxial approximation, as for simple lenses. Assuming that S, R,
and C are located relatively closely to the z axis, we can perform an expansion on a and b as follows:
$$\begin{aligned} a &= \left[ z_S^2 + (x - x_S)^2 + (y - y_S)^2 \right]^{1/2} = z_S \left[ 1 + \frac{(x - x_S)^2 + (y - y_S)^2}{z_S^2} \right]^{1/2} \\ &\approx z_S \left[ 1 + \frac{(x - x_S)^2 + (y - y_S)^2}{2 z_S^2} \right] \approx z_S + \frac{x^2 + y^2 - 2 x_S x - 2 y_S y}{2 z_S} + \frac{x_S^2 + y_S^2}{2 z_S}, \end{aligned} \qquad (17)$$
and similarly,
$$b \approx z_R + \frac{x^2 + y^2 - 2 x_R x - 2 y_R y}{2 z_R} + \frac{x_R^2 + y_R^2}{2 z_R}. \qquad (18)$$
Using these approximations on Eqs. (5) and (6) yields
$$\phi_S = -\frac{2\pi}{\lambda} \left[ \frac{1}{2 z_S} (x^2 + y^2) - \frac{x_S}{z_S} x - \frac{y_S}{z_S} y \right] + \mathrm{const.} \qquad (19a)$$
and
$$\phi_R = -\frac{2\pi}{\lambda} \left[ \frac{1}{2 z_R} (x^2 + y^2) - \frac{x_R}{z_R} x - \frac{y_R}{z_R} y \right] + \mathrm{const.} \qquad (19b)$$
Therefore,
$$\phi_S - \phi_R = -\frac{2\pi}{\lambda} \left[ \frac{1}{2} \left( \frac{1}{z_S} - \frac{1}{z_R} \right) (x^2 + y^2) - \left( \frac{x_S}{z_S} - \frac{x_R}{z_R} \right) x - \left( \frac{y_S}{z_S} - \frac{y_R}{z_R} \right) y \right] + \mathrm{const.} \qquad (20)$$
Now, we can represent the phase factors from the second and third terms of Eq. (14) as
$$\phi_1 = \phi_S(x, y) - \phi_R(x, y) + \phi_C(x, y) \approx -\frac{2\pi}{\mu\lambda} \left[ \frac{1}{2} \left( \frac{\mu}{z_S} - \frac{\mu}{z_R} + \frac{1}{z_C} \right) (x^2 + y^2) - \left( \frac{\mu x_S}{z_S} - \frac{\mu x_R}{z_R} + \frac{x_C}{z_C} \right) x - \left( \frac{\mu y_S}{z_S} - \frac{\mu y_R}{z_R} + \frac{y_C}{z_C} \right) y \right] + \mathrm{const.} \qquad (21)$$
and similarly
$$\phi_2 \approx -\frac{2\pi}{\mu\lambda} \left[ \frac{1}{2} \left( -\frac{\mu}{z_S} + \frac{\mu}{z_R} + \frac{1}{z_C} \right) (x^2 + y^2) - \left( -\frac{\mu x_S}{z_S} + \frac{\mu x_R}{z_R} + \frac{x_C}{z_C} \right) x - \left( -\frac{\mu y_S}{z_S} + \frac{\mu y_R}{z_R} + \frac{y_C}{z_C} \right) y \right] + \mathrm{const.} \qquad (22)$$
Just as in Eq. (19), we readily recognize the form A(x² + y²) + Bx + Cy, which represents spherical wave fronts. By finding the centers of these spheres, we can find the locations of the real and the virtual images from the hologram. Let the centers of the direct and conjugate images be located at (x1, y1, z1) and (x2, y2, z2), respectively. Notice that Eq. (19) is a general expression of the phase distribution of a point source in the x, y plane (in the paraxial approximation). Therefore, the phase distribution of a spherical wave front that originates from (x1, y1, z1) on the x, y plane can be written in the similar form
$$\phi_1 = -\frac{2\pi}{\mu\lambda} \left[ \frac{1}{2 z_1} (x^2 + y^2) - \frac{x_1}{z_1} x - \frac{y_1}{z_1} y \right] + \mathrm{const.} \qquad (23)$$
By comparing Eq. (23) and Eq. (21), it is easy to solve for the coordinates and arrange them in a form similar to lens equations:
$$\frac{1}{\mu z_1} - \frac{1}{z_S} = \frac{1}{\mu z_C} - \frac{1}{z_R}, \qquad (24a)$$
$$\frac{x_1}{z_1} = \frac{\mu x_S}{z_S} + \frac{x_C}{z_C} - \frac{\mu x_R}{z_R}, \qquad (24b)$$
$$\frac{y_1}{z_1} = \frac{\mu y_S}{z_S} + \frac{y_C}{z_C} - \frac{\mu y_R}{z_R}. \qquad (24c)$$
The location of the conjugate image can be solved in the same fashion:
$$\frac{1}{\mu z_2} + \frac{1}{z_S} = \frac{1}{\mu z_C} + \frac{1}{z_R}, \qquad (25a)$$
$$\frac{x_2}{z_2} = -\frac{\mu x_S}{z_S} + \frac{x_C}{z_C} + \frac{\mu x_R}{z_R}, \qquad (25b)$$
$$\frac{y_2}{z_2} = -\frac{\mu y_S}{z_S} + \frac{y_C}{z_C} + \frac{\mu y_R}{z_R}. \qquad (25c)$$
For a special case where µ = 1 and C = R, Eqs. (24a) and (25a) show that the hologram is a weird lens that has two focal lengths. From Eqs. (24) and (25), we can also derive the following useful lateral, longitudinal, and angular magnifications for both the direct and conjugate images. For a direct image,
$$M_{\mathrm{lat}} = \frac{dx_1}{dx_S} = \frac{dy_1}{dy_S} = \mu \frac{z_1}{z_S} = \left[ 1 + \frac{1}{\mu}\frac{z_S}{z_C} - \frac{z_S}{z_R} \right]^{-1}, \qquad (26a)$$
$$M_{\mathrm{long}} = \frac{dz_1}{dz_S} = \frac{1}{\mu} M_{\mathrm{lat}}^2, \qquad (26b)$$
and
$$M_{\mathrm{ang}} = \frac{d(x_1/z_1)}{d(x_S/z_S)} = \mu. \qquad (26c)$$
For a conjugate image,
$$M_{\mathrm{lat}} = -\mu \frac{z_2}{z_S} = \left[ 1 - \frac{1}{\mu}\frac{z_S}{z_C} - \frac{z_S}{z_R} \right]^{-1}, \qquad (27a)$$
$$M_{\mathrm{long}} = \frac{1}{\mu} M_{\mathrm{lat}}^2, \qquad (27b)$$
and
$$M_{\mathrm{ang}} = -\mu. \qquad (27c)$$
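The paraxial relations of Eqs. (24)–(26) are easy to evaluate numerically. The sketch below is our illustration (the geometry values are assumed, and the distances follow the article's coordinate convention): it returns the direct-image distance and the corresponding magnifications for a given object, reference, and reconstruction geometry.

```python
def direct_image(z_s, z_r, z_c, mu=1.0):
    """Direct-image distance and magnifications from Eqs. (24a), (26a)-(26c).

    z_s, z_r, z_c: z coordinates of the object point S, reference source R,
                   and reconstruction source C (article's sign convention).
    mu:            ratio of reconstruction to recording wavelength.
    """
    inv_mu_z1 = 1.0 / z_s + 1.0 / (mu * z_c) - 1.0 / z_r   # Eq. (24a)
    z1 = 1.0 / (mu * inv_mu_z1)
    m_lat = mu * z1 / z_s                                   # Eq. (26a)
    m_long = m_lat ** 2 / mu                                # Eq. (26b)
    m_ang = mu                                              # Eq. (26c)
    return z1, m_lat, m_long, m_ang

# With mu = 1 and C = R the direct image falls back on the object (z1 = zS,
# unit magnification), as the text's special case requires:
print(direct_image(z_s=0.30, z_r=0.60, z_c=0.60, mu=1.0))
# Assumed example with the reconstruction source moved closer to the plate:
print(direct_image(z_s=0.30, z_r=0.60, z_c=0.45, mu=1.0))
```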
VOLUME HOLOGRAMS — A GRAPHIC MODEL The theoretical analysis presented assumes a twodimensional recording medium. Like ‘‘thin’’ lens treatments in geometric optics, the results are generally useful. In reality, all media are at least 1 µm thick. When holograms are recorded on photorefractive crystals, the medium is truly three-dimensional. Mathematical treatments for volume holograms are a great deal more complex (12) and beyond the scope of this chapter. However, we present here a graphic model (13) that offers an intuitive understanding of all of the major characteristics. Figure 2 depicts in two dimensions the constructive interference pattern formed by A and B, two point sources of coherent radiation. It consists of a family of hyperbolic lines. A figure of revolution around axis AB is a corresponding family of hyperboloidal surfaces. The difference in optical path between rays from A and B on any point of a given surface is constant in integral numbers of wavelengths. The zeroth order, for example, is a flat plane that bisects AB perpendicularly. Any point on it is equidistant from A and B. Likewise, any point on the nth-order surface represents an optical path difference of n wavelengths. Transmission Holograms Region I of Fig. 2 depicts the edge view of a thick recording medium. The silver halide, such as the red-sensitive PFG01 emulsion from Slavich, is approximately 6 µm thick. Light from A and B arrives at the plane of the emulsion from the same side and forms a family of narrow hyperboloidal surfaces throughout the volume after chemical processing. Halfway between any two surfaces of interference maxima is a surface of interference minimum.
The essential feature of the graphic model is to regard these hyperboloidal surfaces as if they were partially reflecting mirrors. Rays shown by dotted lines in Fig. 2 show a unique property of hyperbolic mirrors: Any light from A incident on any part of any mirror is reflected to form a virtual image of B, and vice versa. Of course, retracing the reflected rays results in a real image of A. Suppose that photoplate is exposed in region I and properly processed. Source A, now called the reference beam, is directed alone at the finished hologram placed at its original position. Some light will be directly transmitted through the hologram, but the remainder will be reflected to create a virtual image of B. An observer viewing the reflected light will see a bright spot located at B. If a converging beam is directed through the hologram in a reversed direction and is focused at A (now called the conjugate beam) along the same path, a real image of B is created at the precise original location. Thus, we have created a transmission hologram of an object that consists of a single point. If we substitute a three-dimensional object illuminated by coherent light in the vicinity of B, each point on the object interferes with A to form a unique set of hyperbolic mirrors. Any elementary area on the photoplate receives light from different points on the object and records a superposition of all sets of mirrors. Thus, each elementary area records a distinctly different set of points holographically. All together, the photoplate records all of the views that can be seen through the plate at the object. When the finished hologram is illuminated by A, light is ‘‘reflected’’ by the mirrors and recreates the virtual image of the object. By moving one’s eye to different locations, different views of the original object are observed from the reconstructed wave fronts. Because the ‘‘reflected’’ light is actually transmitted from source A through the hologram to the viewer on the other side, this is a transmission hologram. Such a hologram can be broken into small pieces, and each piece can recreate a complete view of the object. If a narrow laser beam is directed in a conjugate direction toward A, a real image is projected onto a screen located at B. By moving the beam around (but always directing it toward A), a different view of the object is
Figure 2. Constructive interference pattern formed by point sources A and B. Region I records transmission holograms, and Region II records reflection holograms.
Figure 3. The microscopic cross-sectional view of a typical ''thick'' or ''volume'' transmission hologram that shows the effect of Bragg diffraction.
projected. Figure 3 shows a realistic microscopic cross section of an emulsion after exposure to two beams, A and B, each at 45° from opposite sides of the normal. Because the emulsion is about 10 wavelengths thick, the ‘‘mirrors’’ are very close together. When A alone is directed through the hologram, the rays penetrate through not one but several partial mirrors. This volume grating exhibits diffraction that is wavelength-selective, a phenomenon explored by Lippmann (14) and Bragg (15) for which each was awarded a Nobel prize. If red light from a laser were used for the recording and a point source of white (incandescent) light is located at A, a predominantly red image is recreated. But because there are only a few planes in this quasi-thick hologram, other colors still come through and show a spectral smear. By using a photopolymer 50 µm thick, transmission holograms have been made which are viewable under incandescent light. In this case, the recording wavelength is selected by Bragg diffraction. Reflection Hologram As shown in Fig. 2, region II contains standing waves formed by light from A and B in opposite directions. Nodal planes are one-half wavelength apart. By placing a photoplate in region II, it is exposed to hyperboloidal planes that are practically flat and parallel to the plate. An emulsion layer 10 wavelengths thick would create twenty partially reflecting surfaces. Reconstruction can be accomplished readily by using an incandescent point source. When a hologram is made in this manner, the reconstructed image is viewed from the side of the reference beam and is called a reflection hologram. A particularly simple configuration for making such a hologram was discovered by Denisyuk (6–8). The object is placed directly on one side of the photoplate, and the exposure is made by a single source from the opposite side. The direct incidence is the reference beam, and the transmitted and subsequently scattered light is the object beam.
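Following the counting argument above (nodal planes spaced half a wavelength apart), here is a one-line estimate of how many partially reflecting planes a given emulsion thickness supports; expressing the thickness in wavelengths measured inside the emulsion is our simplification.

```python
def reflection_planes(thickness, wavelength_in_emulsion):
    """Number of half-wavelength-spaced fringe planes across the emulsion."""
    return int(thickness / (wavelength_in_emulsion / 2))

# The text's example: a layer 10 wavelengths thick -> 20 reflecting surfaces.
print(reflection_planes(10.0, 1.0))   # -> 20
```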
MATERIAL REQUIREMENTS
Because low cost Class IIIa lasers are available, excellent holograms can be made safely at home by absolute beginners, including children. Simple Holography (16) shows, in pictorial detail, how to make reflection and transmission holograms by using a diode laser, holoplates, prepackaged processing chemicals, and readily available household items. Beyond the first level, more serious holography requires much more equipment. Higher power lasers that have long coherence length are desirable, followed by optical components such as first-surfaced mirrors, beam splitters, lenses, and stable holders for all components. In general, a vibration isolation table is needed except for those who work on a basement floor in an area free from external disturbance.
Lasers
The helium–neon laser was first used by Leith and Upatnieks (3–5) and remains the most popular source today. However, there are numerous alternatives, both continuous wave (cw) and pulsed. Table 1 summarizes lasers now available and their wavelengths and typical outputs.

Table 1. Available Lasers
Type       Wavelength (nm)   Typical Power Output (mW)
He–Ne      633               1–80
Argon      458               400
Argon      477               800
Argon      488               2,000
Argon      514               3,000
Krypton    476               200
Krypton    521               300
Krypton    647               1,000
He–Cd      442               100
Diode      635–670           3–100
Nd:YAG     532               500

Blue light-emitting solid-state lasers are becoming available. The best choices in the future are solid-state lasers that give highly coherent light and are practically maintenance free. For direct portraiture of human subjects (17) and other mechanically unstable objects, the most popular choice has been the Q-switched ruby laser that has a 694-nm output in the vicinity of 1–10 joules per pulse and a pulse duration of about 20 ns. Lately, pulsed and frequency-doubled Nd:YAG lasers (18) that have a 530-nm output are commercially available. In choosing a laser, the most important factors are
1. Coherence (19): Determines the range of optical path differences, that is, the depth of the scene, for the recording.
2. Power: Determines the exposure time.
3. Wavelength: Determines the recording material and the reconstruction color.
Optics
In general, with few exceptions, holography requires high-quality optics. The profile of the Gaussian beam must be preserved.
Front-Surfaced Mirrors. Because laser beam diameters are small, flat mirrors need not be large. Flatness should be one-fourth wavelength or less. Special coatings for the wavelengths chosen are sometimes necessary for efficiency and for use with high-power lasers.
Beam Splitters. The simplest beam splitter is a piece of glass. However, there are two parallel surfaces that result in multiple reflections. Disks that have a graduated coating on one side and an antireflection coating on the other side are available. Cubes made by combining two prisms that have a coated diagonal surface are superior for fixed beam ratios. Because most laser outputs are
plane-polarized, cubes made by a judicious combination of two calcite prisms afford the possibility of variable beam ratios. Polarization Rotators. All high-quality lasers have plane-polarized outputs. When the beam is split and redirected around the system, sometimes in three dimensions, the polarization directions may be rotated. When two or more of these beams meet at the holoplate to create the hologram, it is highly desirable that their polarization planes be parallel to each other.
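One way to see why parallel polarization planes matter: for two linearly polarized beams, the recorded fringe contrast falls off as the cosine of the angle between their polarization directions. The sketch below uses the standard two-beam interference visibility formula, which is our addition rather than something stated in this article.

```python
import math

def fringe_visibility(pol_angle_deg, intensity_ratio=1.0):
    """Fringe visibility for two linearly polarized interfering beams.

    pol_angle_deg:   angle between the two polarization planes (degrees).
    intensity_ratio: intensity ratio I2/I1 of the two beams.
    """
    r = intensity_ratio
    return 2 * math.sqrt(r) / (1 + r) * abs(math.cos(math.radians(pol_angle_deg)))

for angle in (0, 30, 60, 90):
    print(angle, round(fringe_visibility(angle), 2))
# 0 deg -> 1.0 (full contrast), 60 deg -> 0.5, 90 deg -> 0 (no fringes recorded)
```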
Beam Expanders. For high-power pulsed lasers, one must avoid focusing the beam to a small point where ionization of the air may take place. Here, very clean plano-concave lenses or convex front surface mirrors must be used. For cw lasers, a microscope objective that focuses the beam through a pinhole of appropriate size is ideal; this combination is called a spatial filter.
Large Collimators. In making high-quality holographic optical elements (HOEs) as well as master holograms, one often needs to create a conjugate beam through a large recording plate. The most obvious element to be used is a large positive lens. A better alternative is a concave mirror. It serves as a single-surfaced reflector, which folds the beam to minimize the table size and collimates or focuses as well. For display holography, a telescope mirror does well. For HOEs, off-axis elliptical or parabolic mirrors are sometimes necessary.
Diffusers. Sometimes ''soft'' light for the object is necessary. An expanded beam more than 2 cm in diameter directed on opal or ground glass usually serves the purpose.
HOEs. In a basic holographic laboratory, one can create HOEs for more efficient and exotic work (20). For example, the conjugate image of a transmission hologram of a diffused beam (on opal glass) recreates a similar beam but preserves the plane of polarization and localizes the light. A hologram of a plane beam is in fact a beam splitter. When the diffraction efficiency is varied across the hologram, it serves as a variable beam splitter.
Optical Fibers (21). It is possible to launch the output of a laser directly into fiber-optic elements that contain fiber couplers as beam splitters and to perform all the functions provided by these components, except for collimation and conjugation (22). Single-mode fibers are necessary for the reference beam. The only disadvantages at this time are insertion losses and higher requirements for mechanical and temperature stability.
Hardware
Except for pulsed work, the most dominant piece of hardware in a holographic laboratory is the vibration isolation table. Other equipment includes holders for each item of optical equipment, various sizes of holders for photoplates, and vacuum platens for film. For long exposures, fringe lockers (23) should be considered a necessity. This device detects the movement of interference fringes and provides feedback to control a mirror (typically the reference beam) by using speaker or piezoelectric movements. Sometimes more than one system is required. Electronic shutters are not necessary for low-power lasers, when the exposure exceeds 1 s. However, a commercial shutter is advisable for high-power lasers. Power meters, especially those calibrated for each laser wavelength, are necessary for professional work.
Recording Material and Processing Chemicals
Silver halide emulsion coated on glass or flexible backing (triacetate or polyester) has been and will remain, for a long time, the workhorse of holographic recording (24). The most commonly used silver-halide materials today (25) are manufactured by Slavich (26) in Russia and HRT (27) in Germany. Plates and film (Slavich only) are available that are sensitized for all visible colors. These materials have resolutions of 5,000–10,000 lines/mm. Dichromated gelatin (DCG) (26,28) is a popular medium if one has high-output lasers in the blue region. It offers high diffraction efficiency, low scatter, and environmental stability when correctly sealed in glass. Currently, DCG is used in most HOE applications such as head-up displays (HUDs) and common jewelry items in gift shops. Photoresist-coated plates (29) are used for making surface-relief master holograms. After proper processing, the plates can be electroformed to create metal masters for embossing. Thermoplastic (31) material is amenable to electronic processing and can be erased and recorded as many as 1,000 times. It is used in holographic nondestructive testing systems. Photopolymers (32) are the material of the future. They offer high diffraction efficiency and low scatter, as does DCG, but can be sensitized to all regions of the visible spectrum. They are promising for HOE and full-color holographic applications (33).
GENERAL PROCEDURES
Like photography, holography can be practiced at all levels of sophistication and cost. In general, one should begin with the simplest procedures (16) by making single-beam reflection and transmission holograms, using commercially available processing kits (25). From there, one can progress to a more professional level by consulting books and Web sites (34–36). Figure 4 represents a typical configuration for creating a high-quality transmission hologram. The general procedure is as follows:
1. Lay out the optical configuration on a vibration-free table approximately as shown in Fig. 4. Pinholes for spatial filters need not be inserted at this point.
2. This step is optional if the laser used has an internal etalon for selecting a single frequency, thereby emitting a beam of long coherence length. In general, beam path lengths, beginning from the first beam splitter to the photoplate, must be equalized well
within the coherence length available. Tie one end of a long string to the base of the first beam splitter, and measure the total optical paths of the reference beam and the object beam(s). Equalize them by judicious relocation of beam-folding mirrors.
3. Measure the relative intensity of light that arrives at the photoplate from the reference beam and from the illuminated object. The ratio should be at least 4 : 1.
4. Measure the total intensity at the photoplate, and determine the exposure time. In general, a strip test is advisable. This is done by covering the plate with an opaque mask and exposing one vertical strip at a time. When each adjacent strip receives twice the exposure of the previous one, the best choice can then be made from the processed hologram. Often, air movement alone can be a critical deterrent; the system should be covered without introducing further disturbance.

Figure 4. Typical optical design for recording a transmission hologram. M = mirror; L = microscope objective with pinhole at focal point; BS = beam splitter.
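The bookkeeping in steps 2–4 can be sketched in a few lines of code. In the following Python sketch the coherence length, measured path lengths, intensities, and base exposure are all assumed, illustrative numbers rather than values taken from this article:

    # Rough bookkeeping sketch for steps 2-4 above (all numbers are made up).
    coherence_length_cm = 30.0                 # assumed laser coherence length
    reference_path_cm   = 212.0                # measured with the string method
    object_path_cm      = 221.0

    mismatch = abs(reference_path_cm - object_path_cm)
    print(f"path mismatch {mismatch:.0f} cm "
          f"({'OK' if mismatch < coherence_length_cm else 'relocate a folding mirror'})")

    i_reference = 20.0                          # relative intensity at the plate
    i_object    = 4.0
    ratio = i_reference / i_object
    print(f"reference:object beam ratio {ratio:.1f}:1 "
          f"({'OK' if ratio >= 4.0 else 'too low'})")

    # Strip test: each successive strip receives twice the exposure of the last.
    base_exposure_s = 0.5
    strips = [base_exposure_s * 2 ** k for k in range(5)]
    print("strip exposures (s):", ", ".join(f"{t:g}" for t in strips))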
CURRENT APPLICATIONS
With few exceptions, the current applications of holography can be generally classified into the following categories. Because of space limitations, they are simply listed with pertinent references.
Display
This is the most obvious use of the three-dimensional medium. There is now a small but active group of professional artists around the world who create highly imaginative images (37). Most large cities have galleries that exhibit and sell their works. Custom portraits (38) are recorded routinely by using pulsed lasers. In 1991, group portraits of 200 people in motion were made during the Fourth International Symposium on Display Holography in Lake Forest, Illinois (39). For commercial applications, embossed holograms are the most popular because they can be mass-produced; National Geographic Magazine has used them on three covers. Because it is difficult to duplicate a hologram without the original master, anticounterfeiting applications have become a large segment of the business. Holograms are now used on credit cards, postage stamps, paper currency, baseball cards, clothing labels, and Super Bowl tickets, among many other products. Currently, true full-color holograms are being recorded by using red, green, and blue laser light (40).
Holographic Interferometry (41–43)
Holograms can reveal microscopic changes on a specimen and yield quantitative data. The three general methods are (1) double exposure, where a change occurs between exposures; (2) time average, where a single exposure is made while the subject undergoes many normal-mode vibrations; and (3) real time, where the changing specimen is viewed through a previously recorded hologram.
Holographic Optical Elements (HOEs)
As shown in a previous section, a hologram behaves simultaneously like a lens and a grating. Volume holograms are effective notch filters. Among other current applications are scanning (44), head-up displays (HUDs) for aircraft (45), and general optical elements (46). Many current HOEs can be generated by computers (47).
Precision Measurements
Because of the development of powerful, coherent lasers, deep three-dimensional scenes can be recorded in a very short time (about 20 ns). Particle sizing (48) was an early application. A more recent example is the recording of bubble-chamber tracks (49) at Fermilab. By using picosecond lasers, "light-in-flight" measurements are possible (50,51).
Optical Information Processing and Computing (52)
Besides three-dimensional displays, the first dramatic application of holography was in complex spatial filtering (53). That was the first use of holography in the emerging field of optical computing. The fact that holograms are formed by the interference of two wave fronts and that one can be used to recall the other is directly related to the phenomenon of associative memory in the field of neural networks (54). When a hologram is addressed, a parallel rather than serial output is obtained. Thus, information can be stored and/or transferred at a rate not possible by electronic computers. Furthermore, holograms can be duplicated more easily and economically. A major limitation of electronic computers is interconnections. Thus, connecting two n × n arrays of processors would require n⁴ wires (55). This is not a problem with holograms. Using new exotic recording materials such as photorefractive crystals, read/write computer memory systems that can handle terabytes of information are on the horizon (56). Based on the development of exotic photorefractive materials, the subject of phase conjugation (57) promises parallel-processing optical amplifiers. Judged by the current activities and funding in research,
optical computing may be the most important application of holography, both in extent and in its economic impact.
Education
Holography is a subject that can be learned at all levels by students of all ages and of divergent interests (58). Basic holograms can be made on a budget below that for photography and without previous training (16). Yet the basic ideas involved relate directly to many Nobel prizes in physics; those won by Michelson, Lippmann, Bragg, Zernike, and Gabor are a few examples. Increasingly, the subject is being successfully integrated into elementary science education programs (59). It is already an integral part of basic university physics texts (60).
BIBLIOGRAPHY
1. D. Gabor, Nature 161, 777–779 (1948).
2. D. Gabor, Proc. Phys. Soc. A194, 454–487 (1949).
3. E. N. Leith and J. Upatnieks, J. Opt. Soc. Am. 53(12), 1,377–1,381 (1963).
4. E. N. Leith and J. Upatnieks, J. Opt. Soc. Am. 53(12), 1,377–1,381 (1963).
5. E. N. Leith and J. Upatnieks, J. Opt. Soc. Am. 54(11), 1,295–1,301 (1964).
6. Yu. N. Denisyuk, Sov. Phys. Dokl. 7, 543–545 (1962).
7. Yu. N. Denisyuk, Opt. Spectrosc. 15, 279–284 (1963).
8. Yu. N. Denisyuk, Opt. Spectrosc. 18, 152–157 (1965).
9. J. Goodman, Introduction to Fourier Optics, McGraw-Hill, NY, 1968.
10. R. W. Meier, J. Opt. Soc. Am. 55, 987–992 (1965).
11. F. B. Champagne, J. Opt. Soc. Am. 57, 51–55 (1967).
12. R. R. A. Syms, Practical Volume Holography, Oxford University Press, London, 1991.
13. T. H. Jeong, Am. J. Phys. 714–717 (August 1975).
14. G. Lippmann, J. Phys. Paris 3 (1894).
15. W. L. Bragg, Nature 143, 678 (1939).
16. T. H. Jeong, R. J. Ro, and M. Iwasaki, Simple Holography, http://members.aol.com/integraf.
17. H. Bjelkhagen, T. H. Jeong, ed., Proc. ISDH, vol. I, Lake Forest College, Lake Forest, IL, 1982, pp. 45–54.
18. H. E. Bates, Appl. Opt. 12, 1,172–1,178 (1973).
19. T. H. Jeong, Z. Qu, E. Wesly, and Q. Feng, Proc. Lake Forest Fourth Int. Symp. Display Holography, SPIE, 1992, vol. 1,600, pp. 387–401.
20. T. H. Jeong, Holography '89, Varna, Bulgaria, Yu. N. Denisyuk and T. H. Jeong, eds., Proc. SPIE, 1989, vol. 1,183.
21. J. A. Gilbert and T. D. Dudderar, SPIE Inst. Adv. Technol., P. Greguss and T. H. Jeong, eds., 1991, vol. IS 8, pp. 146–159.
22. T. H. Jeong, Proc. SPIE, R. J. Pryputniewicz, ed., 1987, vol. 746, pp. 16–19.
23. N. Phillips, Lake Forest Int. Symp. Display Holography (ISDH), T. H. Jeong, ed., 1985, vol. II, pp. 111–130.
24. H. Bjelkhagen, Silver-Halide Recording Materials for Holography and Their Processing, 2nd ed., Springer-Verlag, London, 1995.
25. For updates on availability and current prices of emulsions and chemicals, visit the following Web site: http://members.aol.com/integraf.
26. Y. A. Sazonov and P. I. Kumonko, Proc. SPIE 3,358, 31–40 (1997).
27. R. Birenheide, Proc. SPIE 3,358, 28–30 (1997).
28. A. B. Coblijn, SPIE Inst. Adv. Technol., P. Greguss and T. H. Jeong, eds., IS 8, 305–325 (1991).
29. T. J. Cvetkovich, ISDH, 1988, vol. III, pp. 501–510.
30. J. R. Burns, Proc. SPIE, L. Huff, ed., 523, 7–14 (1984).
31. L. H. Lin and H. L. Beauchamp, Appl. Opt. 9, 2,088–2,092 (1970).
32. W. K. Smothers, T. I. Trout, A. M. Weber, and D. I. Mickish, IEEE 2nd Intl. Conf. Holographic Systems, Components and Appl., University of Bath, U.K., 1989.
33. T. H. Jeong and E. Wesly, Kiev, USSR, SPIE, V. Markov and T. H. Jeong, eds., 1,238, 298–305 (1989).
34. R. J. Collier, C. B. Burckhardt, and L. H. Lin, Optical Holography, Academic Press, NY, 1971.
35. G. Saxby, Practical Holography, 2nd ed., Prentice Hall, Englewood Cliffs, NJ, 1995.
36. This Web site gives detailed instructions for making simple holograms: http://members.aol.com/integraf.
37. Section 5 of ISDH, vol. III, Artistic Techniques and Concepts (10 articles by artists).
38. F. Unterseher, T. H. Jeong, ed., ISDH, vol. III, pp. 403–420, 1988.
39. Method and Apparatus for Producing Full Color Stereographic Holograms, U.S. Pat. 5,022,727, June 11, 1991, S. Smith and T. H. Jeong.
40. H. Bjelkhagen, T. H. Jeong, and D. Vukicevic, J. Imaging Sci. Technol. 40(2), 134–146 (1996).
41. C. Vest, Holographic Interferometry, Wiley Interscience, NY, 1979.
42. N. Abramson, The Making and Evaluation of Holograms, Academic Press, NY, 1981.
43. R. J. Pryputniewicz, SPIE Inst. Adv. Technol. IS 8, pp. 215–246 (1991).
44. L. Beiser, Holographic Scanning, John Wiley & Sons, Inc., New York, NY, 1988.
45. D. G. McCauley and C. E. Simpson, Appl. Opt. 12, 232–242 (1973).
46. B. J. Chang and C. D. Leonard, Appl. Opt. 15, 2,407 (1979).
47. I. N. Cindrich and S. H. Lee, eds., Proc. SPIE, vol. 1,052, 1989.
48. B. J. Thompson, J. Phys. E: Sci. Instrum. 7, 781–788 (1974).
49. H. Akbari and H. Bjelkhagen, Big Bubble Chamber Holography, T. H. Jeong, ed., ISDH II, 1988, pp. 97–110.
50. N. Abramson, Proc. SPIE, vol. 22, pp. 215–222, 1983.
51. S. G. Pettersson, H. Bergstrom, and N. Abramson, ISDH III, T. H. Jeong, ed., pp. 315–325 (1988).
52. H. J. Caulfield, SPIE Inst. Adv. Opt. Technol. IS 8, pp. 54–61 (1991).
53. A. VanderLugt, IEEE Trans. Inf. Theory IT-10, 139–145 (1964).
54. H. J. Caulfield, J. Kinser, and S. K. Rogers, Proc. IEEE 77, 1,573–1,590 (1989).
55. H. J. Caulfield, Appl. Opt. 26, 4,039–4,040 (1987).
56. T. Parish, Byte, pp. 283–288 (November 1990).
57. R. Fisher, Optical Phase Conjugation, Academic Press, NY, 1983.
58. T. H. Jeong, P. Greguss and T. H. Jeong, eds., SPIE Inst. Adv. Opt. Technol., vol. IS 8, 1991, pp. 360–369.
59. F. Tomaszkiewicz, Proc. SPIE, vol. 1,600, 1992, pp. 263–267.
60. D. Halliday and R. Resnick, Fundamentals of Physics, 3rd ed., John Wiley & Sons, Inc., NY, 1988, essay 18.
HUMAN VISUAL SYSTEM — COLOR VISUAL PROCESSING MARK D. FAIRCHILD Rochester Institute of Technology Rochester, NY
INTRODUCTION
This article covers the basic structure and function of the human visual system with respect to our perception of color and procedures for quantifying those color perceptions, colorimetry, by measuring and specifying light sources, materials, and human observers. Specific topics addressed include visual anatomy and physiology related to color perception, the spatial and temporal properties of color vision, color vision deficiencies, basic and advanced colorimetry, light sources and illuminants, materials, tristimulus values, CIE colorimetry, chromaticity coordinates, color spaces, color difference metrics, and color appearance models. A basic understanding of human color vision and color specification is important for imaging science and technology to develop insight into the design, engineering, and limitations of color imaging systems and further improve such systems. CIE colorimetry is also the fundamental basis of modern color management systems that enable device-independent color imaging in digital systems. It is very difficult to cover all of this material in sufficient detail in a single encyclopedia article, so interested readers should look to books devoted to color science for more detailed information. Wyszecki and Stiles’ Color Science (1) is the definitive reference book in the field. Hunt’s Reproduction of Colour (2) provides a good overview of current and historical color reproduction technology, and his Measuring Colour (3) is a good introductory colorimetry text oriented toward imaging industries. Kaiser and Boynton’s Human Color Vision (4) provides a comprehensive overview of visual anatomy, physiology, and performance. Gegenfurtner and Sharpe’s Color Vision: From Genes to Perception (5) includes a detailed examination of recent and historical research on color vision. Last, Fairchild’s Color Appearance Models (6) provides a concise overview of human color perception and colorimetry and includes the most advanced color appearance models. VISUAL ANATOMY OF COLOR VISION Our visual perceptions are initiated and strongly influenced by the anatomical structure of the eye. Figure 1 shows a schematic representation of the optical structure of the human eye, and some key features are labeled. The human eye is analogous to a camera. The cornea and lens
Figure 1. Schematic structure of the human eye. Key features for color vision are highlighted, including the cornea, pupil, iris, lens, retina, fovea, and optic nerve.
act together like a camera lens to focus an image of the visual world on the retina at the back of the eye, which acts like the image sensor of a camera. These and other structures have a significant impact on our perception of color. The cornea is the transparent outer surface of the front of the eye through which light passes. It is the most significant image-forming element of the eye because its curved surface at the interface with air represents the largest change in refractive index in the eye’s optical system. The cornea is avascular (i.e., lacks blood vessels) and receives its nutrients from marginal blood vessels and fluids surrounding it. Refractive errors such as nearsightedness (myopia), farsightedness (hyperopia), or astigmatism can be attributed to variations in the shape of the cornea. The volume between the cornea and lens is filled with the aqueous humor, which is essentially water. The region between the lens and retina is filled with vitreous humor, which is also a fluid, but has a substantially higher viscosity. Both humors exist in a state of slightly elevated pressure to ensure that the flexible eyeball retains its shape and dimensions and to avoid the deleterious effects of wavering retinal images. The flexibility of the entire eyeball increases its resistance to injury because it is difficult to break a structure that gives way under impact. Because the refractive indexes of the humors are roughly equal to that of water and those of the cornea and lens are only slightly higher, the rear surface of the cornea and the entire lens have relatively little optical power. The iris is the pigmented muscle that controls pupil size. The iris’ pigment gives each of us our specific eye color. Eye color is determined by the concentration and distribution of melanin within the iris. The pupil, which is the hole in the middle of the iris through which light passes, controls the level of illuminance
on the retina. Pupil size is largely determined by the overall level of scene illumination, but it is important to note that it can also vary with nonvisual phenomena such as arousal. Thus, it is difficult to predict pupil size accurately from the prevailing illumination. In practical situations, pupil diameter varies from about 3 mm to about 7 mm. This change in pupil diameter results in approximately a fivefold change in pupil area and therefore retinal illuminance. The visual sensitivity change with pupil area is further limited by the fact that marginal rays are less effective at stimulating visual response in the cones than central rays (the Stiles–Crawford effect). The change in pupil diameter alone is not sufficient to explain excellent human visual function across prevailing scene illumination levels that can vary across more than 10 orders of magnitude. Various mechanisms of adaptation, discussed later, are also required. The lens has the function of accommodation, our ability to focus on objects at various viewing distances. It is a layered, flexible structure that varies in index of refraction. It is a naturally occurring, gradient-index, optical element whose index of refraction is higher in the center than at the edges. This feature reduces some of the aberrations that might normally be present in a simple optical system. The shape of the lens is controlled by the ciliary muscles. When we gaze at a nearby object, the lens becomes ‘‘fatter’’ and thus has increased optical power to allow us to focus on the near object. When we gaze at a distant object, the lens becomes ‘‘flatter’’ resulting in the decreased optical power required to bring far away objects into sharp focus. As we age, the internal structure of the lens changes, resulting in a loss of flexibility. Generally, at about 50 years of age, the lens has lost most of its flexibility, and observers can no longer focus on near objects (this is called presbyopia, or ‘‘old eye’’). At this point, most people need reading glasses or bifocals. Concurrent with the hardening of the lens is an increase in its optical density. The lens absorbs and scatters short wavelength (blue and violet) energy. As it hardens, the level of this absorption and scattering increases. In other words, the lens becomes more and more yellow with age. Various mechanisms of chromatic adaptation generally make us unaware of these gradual changes. However, we are all looking at the world through a yellow filter that changes with age, and is also significantly different from observer to observer. The effects are most noticeable when performing critical color matching or comparing color matches with other observers. The effect is particularly apparent for purple objects. An older lens absorbs most of the blue energy reflected from a purple object but does not affect the reflected red energy, so older observers tend to report that the object is significantly redder than reported by younger observers. The optical image formed by the eye is projected onto the retina. The retina is a thin layer of cells, approximately the thickness of tissue paper. It is located at the back of the eye and incorporates the visual system’s photosensitive cells and initial signal-processing and transmission ‘‘circuitry.’’ These cells are neurons, part of the central nervous system,
and can appropriately be considered part of the brain. The photoreceptors, rods, and cones transduce the information in the optical image into chemical and electrical signals that can be transmitted to the later stages of the visual system. These signals are then processed by a network of cells and transmitted to the brain through the optic nerve. More detail about the retina is presented in the next section. Behind the retina is a layer known as the pigment epithelium. This dark pigment layer absorbs any light that happens to pass through the retina without being absorbed by the photoreceptors. The optical function of the pigment epithelium is to prevent light from being scattered back through the retina and thus reducing the sharpness and contrast of the perceived image. Nocturnal animals give up this improved image quality in exchange for a highly reflective tapetum that reflects the light back to provide a second chance for the photoreceptors to absorb the energy. This is the reason that the eyes of a nocturnal animal caught in the headlights of an oncoming automobile glow. Perhaps the most important structural area on the retina is the fovea, the area on the retina where we have the best spatial and color vision. When we look at, or fixate, an object in our visual field, we move our heads and eyes so that the image of the object falls on the fovea. To illustrate how drastically spatial acuity falls off as the stimulus moves away from the fovea, try to read preceding text in this paragraph while fixating on the period at the end of this sentence. It is difficult to read text that is only a few lines away from the point of fixation. The fovea covers an area that subtends about 2° of visual angle in the central field of vision. To visualize 2° of visual angle, the width of your thumbnail, held at arm’s length, is approximately 1° of visual angle. The fovea is also protected by a yellow filter known as the macula. The macula protects this critical area of the retina from intense exposures to short wavelength energy. It might also reduce the effects of chromatic aberration that cause the short wavelength image to be rather severely out of focus most of the time. Unlike the lens, the macula does not become yellower with age. However, there are significant differences in the optical density of the macular pigment from observer to observer and in some cases between a single observer’s left and right eyes. The yellow filters of the lens and macula, through which we all view the world, are the major sources of variability in color vision among observers who have normal color vision. A last key structure of the eye is the optic nerve. The optic nerve is made up of the axons (outputs) of the ganglion cells, the last level of neural processing in the retina. It is interesting to note that the optic nerve contains approximately one million fibers that carry information generated by approximately 130 million photoreceptors. Thus, there is a clear compression of the visual signal before transmission to higher levels of the visual system. A one-to-one ‘‘pixel map’’ of the visual stimulus is never available for processing by the brain’s higher visual mechanisms. This processing is explored in greater detail later. Because the optic nerve takes up all of the space that would normally be populated by photoreceptors, there is a
small area in each eye in which no visual stimulation can occur. This area is known as the blind spot. The structures described before have a clear impact in shaping and defining the information available to the visual system that ultimately results in the perception of color appearance. The action of the pupil defines retinal illuminance levels that, in turn, have a dramatic impact on color appearance. The yellow filtering effects of the lens and macula modulate the spectral responsivity of our visual system and introduce significant interobserver variability. The neural networks in the retina reiterate that visual perception in general, and specifically color appearance, cannot be treated as simple pointwise imageprocessing problems. Several of these important features are discussed in more detail in the following sections on the retina, visual physiology, and visual performance. More details on the anatomy, or optics, of the eye can be found an excellent overview by Fry, The Eye as an Optical System, which is Chap. 2 of Bartleson and Grum’s volume, Visual Measurements (7) and in Chap. 2 of Wyszecki and Stiles (1). Ophthalmology texts [e.g., (8)] also offer an abundance of useful information on the optics and anatomy of the eye. THE RETINA Figure 2 shows a cross-sectional representation of the retina (note that light comes from the top in Fig. 2). The retina includes several layers of neural cells beginning with the photoreceptors, the rods and the cones. A vertical signal-processing chain through the
retina can be constructed by examining the connections of photoreceptors to bipolar cells, which are connected to ganglion cells, which form the optic nerve. Even this simple pathway compares and combines the signals from multiple photoreceptors because multiple photoreceptors provide input to many of the bipolar cells and multiple bipolar cells provide input to many of the ganglion cells. More importantly, this simple concept of retinal signal processing ignores two other significant types of cells. These are the horizontal cells that connect photoreceptors and bipolar cells laterally to one another and the amacrine cells that connect bipolar cells and ganglion cells laterally to one another. Figure 2 provides only a slight indication of the extent of these various interconnections. The specific processing that occurs in each type of cell is not completely understood and is beyond the scope of this article, but it is important to realize that the signals transmitted from the retina to the higher levels of the brain via the ganglion cells are not simple pointwise representations of receptor signals, but rather consist of sophisticated combinations of receptor signals. To envision the complexity of the retinal processing, keep in mind that each synapse between neural cells can effectively perform a mathematical operation (add, subtract, multiply, divide) in addition to the amplification, gain control, and nonlinearities that can occur within the neural cells (9). Thus, the network of cells within the retina is a sophisticated imageprocessing system. This is the way information from 130 million photoreceptors can be reduced to signals in approximately one million ganglion cells without losing visually meaningful data. It is noteworthy that light passes through all of the neural machinery of the retina before reaching the photoreceptors. This has little impact on visual performance because these cells are transparent and in fixed position, and thus are not perceived. Retinal blood vessels are not transparent but are normally not perceived due to their fixed spatial location relative to the photoreceptors — the visual system responds only to spatial or temporal changes. This design also allows the significant amounts of nutrients required and the waste produced by the photoreceptors to be processed through the back of the eye.
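One reason such pooling and combining is useful can be shown numerically: averaging the outputs of many noisy receptors into a single ganglion-cell signal improves the signal-to-noise ratio by roughly the square root of the number of receptors pooled, at the cost of spatial resolution. The Python sketch below uses made-up signal and noise values purely for illustration:

    # Toy illustration (not from the article): pooling many noisy receptor
    # signals trades spatial resolution for sensitivity, because averaging
    # N independent samples shrinks the noise by roughly sqrt(N).
    import random, statistics

    random.seed(1)
    true_signal = 0.05          # a dim stimulus, arbitrary units
    noise_sigma = 0.2           # receptor noise, arbitrary units

    def pooled_estimate(n_receptors):
        samples = [true_signal + random.gauss(0.0, noise_sigma) for _ in range(n_receptors)]
        return statistics.mean(samples)

    for n in (1, 100, 10000):
        estimates = [pooled_estimate(n) for _ in range(200)]
        print(f"pool of {n:5d} receptors: spread of the estimate (std) = "
              f"{statistics.stdev(estimates):.4f}")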
Figure 2. Schematic cross-section of the human retina, showing the ganglion, amacrine, bipolar, and horizontal cells, the photoreceptors, and the pigmented epithelium. Note that light enters the retina from the top of the diagram.
Figure 3. Spectral responsivities of the three classes of human cones.
The two classes of retinal photoreceptors are rods and cones. Rods and cones derive their respective names from their prototypical shape. Rods tend to be long and slender, whereas peripheral cones are conical. This distinction is misleading because foveal cones, which are tightly packed due to their high density in the fovea, are long and slender and resemble peripheral rods. The more important distinction between rods and cones is visual function. Rods serve vision at low luminance levels (e.g., less than 1 cd/m2 ), whereas cones serve vision at higher luminance levels. Thus the transition from rod to cone vision is one mechanism that allows our visual system to function over a wide range of luminance levels. At high luminance levels (e.g., greater than 100 cd/m2 ) the rods are effectively saturated, and only the cones function. In the intermediate luminance levels, both rods and cones function and contribute to vision. Vision when only rods are active is referred to as scotopic vision. Vision served only by cones is referred to as photopic vision, and the term mesopic vision refers to vision in which both rods and cones are active. Rods and cones also differ substantially in their spectral responsivities, as illustrated in Fig. 3. Only one type of rod receptor has a peak spectral responsivity at approximately 510 nm. There are three types of cone receptors that have peak spectral responsivities spaced through the visual spectrum. The three types of cones are most properly referred to as L, M, and S cones. These names refer to the longwavelength, middle-wavelength, and short-wavelength sensitive cones, respectively. Sometimes the cones are denoted by other symbols such as RGB, suggestive of red, green, and blue sensitivities. However, the LMS names are more appropriately descriptive. Note that the spectral responsivities of the three cone types are broadly overlapping; this design is significantly different from the ‘‘color separation’’ responsivities that are often built into physical imaging systems. Such sensitivities, typically incorporated in imaging systems for practical reasons, are the fundamental reason that accurate color reproduction is often difficult, if not impossible. Recent work on determining cone spectral responsivities is reviewed by Sharpe et al. and Stockman et al. in chapters 1 and 2 of Ref. (5). The three types of cones clearly serve color vision. Because there is only one type of rod, the rod system is incapable of color vision alone. This can easily be observed by viewing a normally colorful scene at very low luminance levels. These properties of color vision are explained by the principle of univariance. Univariance reiterates the point that the output of a photoreceptor class is one-dimensional and, despite the fact that a given photoreceptor type responds to more than one wavelength, there is no way to examine the photoreceptor output to determine which wavelength was present. Only with the comparative signals between two or more photoreceptor classes is color vision possible. This principle generalizes beyond human vision to any sort of imaging system. Kaiser and Boynton (4) provide an enlightening overview of the importance of the concept of univariance. Figure 4 illustrates the two CIE spectral luminous efficiency functions, the V (λ) function for scotopic (rod) vision and the V(λ) function for photopic (cone) vision.
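The principle of univariance can be made concrete with a small numerical sketch. The spectral sensitivities below are arbitrary Gaussian stand-ins, not measured cone fundamentals; the point is only that a single receptor class cannot separate wavelength from intensity, whereas two classes can:

    # Toy illustration of univariance (sensitivities are invented Gaussians).
    import math

    def gaussian_sensitivity(peak_nm, width_nm):
        return lambda wl: math.exp(-0.5 * ((wl - peak_nm) / width_nm) ** 2)

    m_cone = gaussian_sensitivity(540.0, 40.0)   # assumed, illustrative
    l_cone = gaussian_sensitivity(570.0, 45.0)   # assumed, illustrative

    # Two different stimuli tuned to give the M cone exactly the same response:
    wl_a, intensity_a = 540.0, 1.0
    wl_b = 600.0
    intensity_b = intensity_a * m_cone(wl_a) / m_cone(wl_b)

    for name, wl, inten in (("A", wl_a, intensity_a), ("B", wl_b, intensity_b)):
        print(f"stimulus {name}: M response {inten * m_cone(wl):.3f}, "
              f"L response {inten * l_cone(wl):.3f}")

The M-cone responses to the two stimuli are identical by construction, so the M system alone cannot tell them apart; comparing M and L responses distinguishes them immediately.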
Figure 4. CIE V(λ) and V′(λ) luminous efficiency functions.
These functions represent the overall sensitivity of the two systems with respect to the perceived brightness of various wavelengths. Because there is only one type of rod, the V (λ) function is identical to the spectral responsivity of the rods and depends on the spectral absorption of rhodopsin, the photosensitive pigment in rods. The V(λ) function, however, represents a weighted combination of the three types of cone signals rather than the responsivity of any single cone type. Note the difference in peak spectral sensitivity between scotopic and photopic vision. In scotopic vision, we are more sensitive to shorter wavelengths. This effect, known as the Purkinje shift, can be observed by finding two objects, one blue and the other red, that appear to have the same lightness when viewed in daylight. When the same two objects are viewed under very low luminance levels, the blue object will appear quite light whereas the red object will appear nearly black because of the scotopic spectral sensitivity function. Another important feature of the three cone types is their relative distribution in the retina. The S cones are relatively sparsely populated throughout the retina and are completely absent in the most central area of the fovea. There are far more L and M cones than S cones and there are approximately twice as many L cones as M cones. The relative populations of the L:M:S cones are approximately 40:20:1, on average, and significant interobserver variability is quite likely. These relative populations must be considered when combining the cone responses (plotted with independent normalization in Fig. 3) to predict higher level visual responses. Note that no rods are present in the fovea. This feature of the visual system can also be observed when trying to look directly at a small dimly illuminated object, such as a faint star at night. It disappears because its image falls on the foveal area where there are no rods to detect the dim stimulus. Figure 5 shows the distribution of rods and cones across the retina, where the perimetric angle is a horizontal visual angle, the fovea is at 0° , and positive angles run toward the nasal side of the retina. Several important features of the retina can be observed in Fig. 5. First, notice the extremely large numbers of photoreceptors. In some retinal regions, there are about 150,000 photoreceptors per square millimeter of
Figure 5. Packing density of rod and cone photoreceptors along a horizontal cross section of the human retina. The fovea is located at a perimetric angle of 0°.
retina! Also notice that there are far more rods (around 120 million per retina) than cones (around 7 million per retina). This might seem somewhat counterintuitive because cones function at high luminance levels and produce high visual acuity, whereas rods function at low luminance levels and produce significantly reduced visual acuity (analogous to low-speed, fine-grain photographic film vs. a high-speed, coarse-grain film). The solution to this apparent mystery lies in the fact that single cones feed into ganglion cell signals, whereas rods pool their responses over hundreds of receptors (feeding into a single ganglion cell) to produce increased sensitivity at the expense of acuity. This also partially explains how the information from so many receptors can be transmitted through one million ganglion cells. Figure 5 also illustrates that cone receptors are highly concentrated in the fovea and more sparsely populated throughout the peripheral retina, whereas no rods are in the central fovea. The lack of rods in the central fovea allows using that valuable space to produce the highest possible spatial acuity from the cone system. A final feature to be noted in Fig. 5 is the blind spot. This is the area, 12–15° from the fovea, where the optic nerve is formed, and there is no room for photoreceptors. Figure 6 provides some stimuli that can be used to demonstrate the existence of the blind spot. One reason that the blind spot generally goes unnoticed is that it is located on opposite sides of the visual field in each of the two eyes. However, even when one eye is closed, the blind spot is not generally noticed. To observe your blind spot, close your left eye, and fixate the cross in Fig. 6a with your right eye. Then adjust the viewing distance of the book until the spot to the right of the cross disappears when it falls on the blind spot. Note that what you see when the spot disappears is not a black region, but rather
Figure 6. Stimuli (a) and (b) used to illustrate the retinal blind spot. See text for explanation.
it appears to be an area of blank paper. This is an example of a phenomenon known as filling-in. Because your brain no longer has any signal indicating a change in the visual stimulus at that location, it simply fills in the most probable stimulus, in this case a uniform white piece of paper. The strength of this filling-in can be illustrated by using Fig. 6b to probe your blind spot. In this case, with your left eye closed, fixate the cross with your right eye, and adjust the viewing distance until the gap in the line disappears when it falls on your blind spot. Amazingly, the perception is that of a continuous line because now that is the most probable visual stimulus. The filling-in phenomenon goes a long way toward explaining the function of the visual system. The signals present in the ganglion cells represent only local changes in the visual stimulus. Effectively, only information about spatial or temporal transitions (i.e., edges) is transmitted to the brain. Perceptually, this code is sorted out by examining the nature of the changes and filling in the appropriate uniform perception until a new transition is signaled. This coding provides tremendous savings in bandwidth to transmit the signal and can be thought of as somewhat similar to run-length encoding that is sometimes used in digital imaging. In addition to the previously mentioned sources (1,4), further information about retinal encoding and its relationships with perception can be found in the classic work of Cornsweet (10) and the congruous and unique treatment of the visual system by Wandell (11). VISUAL SIGNAL PROCESSING The neural processing of visual information is quite complex within the retina and becomes significantly more complex at later stages. This section provides a brief overview of the paths that some of this information takes. It is helpful to begin with a general map of the steps along the way. The optical image on the retina is first transduced into chemical and electrical signals in the photoreceptors. Then, these signals are processed through the network of retinal neurons (horizontal, bipolar, amacrine, and ganglion cells) described earlier. The ganglion-cell axons gather to form the optic nerve, which projects, or connects upstream, to the lateral geniculate nucleus (LGN) in the thalamus. After the LGN cells gather input from the ganglion cells, they project to visual area one (V1) in the occipital lobe of the cortex. At this point, the information processing becomes amazingly complex. Approximately 30 visual areas have been defined in the cortex named, for example, V2, V3, V4, and MT. Signals from these areas project to several other areas and vice versa. The cortical processing includes many instances of feed-forward, feedback, and lateral processing. Our ultimate perceptions are formed somewhere in this network of information.
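The retinal coding analogy mentioned above, in which only transitions (edges) are transmitted and uniform regions are filled in, much as in run-length encoding, can be sketched in a few lines of Python (the "scene" values are arbitrary):

    # Toy sketch: transmit only the positions where the signal changes plus
    # the new value, then "fill in" everything between the transitions.
    def encode_transitions(signal):
        code = [(0, signal[0])]
        for i in range(1, len(signal)):
            if signal[i] != signal[i - 1]:
                code.append((i, signal[i]))
        return code

    def fill_in(code, length):
        out = []
        for (start, value), nxt in zip(code, code[1:] + [(length, None)]):
            out.extend([value] * (nxt[0] - start))
        return out

    scene = [0, 0, 0, 1, 1, 1, 1, 0, 0, 2, 2, 2]
    code = encode_transitions(scene)
    print("transmitted:", code)               # far fewer symbols than the scene
    print("filled in  :", fill_in(code, len(scene)))
    assert fill_in(code, len(scene)) == scene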
A few more details of these processes are described in the following paragraphs. Further details on the various physiological stages of visual processing can be found in the collection of papers edited by Spillmann and Werner (12). Light incident on the retina is absorbed by photopigments in the various photoreceptors. In rods, the photopigment is rhodopsin. Upon absorbing a photon, rhodopsin changes in structure, setting off a chemical chain reaction that ultimately results in closing ion channels in its cell walls which produce an electrical signal based on the relative concentrations of various ions (e.g., sodium and potassium) inside and outside the cell wall. A similar process takes place in cones. Rhodopsin is made up of opsin and retinal. Cones have similar photopigment structures. However, the ‘‘cone-opsins’’ in cones have slightly different molecular structures that result in the various spectral responsivities observed in the cones. Each type of cone (L, M, or S) contains a different form of ‘‘cone-opsin.’’ Figure 7 illustrates the relative responses of the photoreceptors as a function of retinal exposure(see Chap. 5 in Ref. 12). It is interesting to note that these functions have characteristics similar to those found in all imaging systems. There is a threshold at the low end of the receptor responses, below which the receptors do not respond. Then, there is a fairly linear portion of the curves followed by response saturation at the high end. Such curves are representations of the photocurrent at the receptors and represent the very first stage of visual processing. These signals are then processed through the retinal neurons and synapses until a transformed representation is generated in the ganglion cells for transmission through the optic nerve. For various reasons, including noise suppression and transmission speed, the amplitude-modulated signals in the photoreceptors are converted into frequencymodulated representations at the ganglion-cell and higher levels. In these, and indeed most, neural cells, the magnitude of the signal is represented in terms of the number of spikes of voltage per second fired by the cell rather than by the voltage difference across the cell wall (9). The concept of receptive fields becomes useful to represent the physiological properties of these cells. A
receptive field is a spatial map of the area in the visual field to which a given cell responds. In addition, the nature of the response (e.g., positive, negative, spectral bias,) is typically indicated for various regions in the receptive field. As a simple example, the receptive field of a photoreceptor is a small circular area that represents the size and location of that particular receptor’s sensitivity in the visual field. Figure 8 represents some prototypical receptive fields for ganglion cells. They illustrate centersurround antagonism, which is characteristic at this level of visual processing. The receptive field in Fig. 8a illustrates a positive central response, typically generated by a positive input from a single cone that is surrounded by a negative surround response, typically driven by negative inputs from several neighboring cones. Note that the notation of positive and negative is arbitrary and should be considered only relative to one another. Thus the response of this ganglion cell is made up of inputs from a number of cones that have both positive and negative signs. The result is that the ganglion cell does not simply respond to points of light but is an edge detector (actually a ‘‘spot’’ detector). Those familiar with concepts of digital image processing can think of the ganglion-cell responses as similar to the output of a convolution kernel designed for edge detection. Figure 8b illustrates a ganglion cell of opposite response polarity, which is equally likely to occur. The response in Fig. 8a is considered an on-center ganglion cell whereas that in Fig. 8b is called an off-center ganglion cell. Often on-center and off-center cells exist at the same spatial location and are fed by the same photoreceptors, resulting in an enhancement of the system’s dynamic range. Note that the ganglion cells represented in Fig. 8 do not respond to uniform fields (because the positive and negative areas are balanced). This illustrates one aspect of image compression carried out in the retina. The brain is not bothered with redundant visual information; only information about changes in the visual world is transmitted. This spatial information processing in the visual system is the fundamental basis of the important impact of the background on color appearance. Figure 8 illustrates spatial opponency in ganglion-cell responses. Figure 9 shows that in addition to spatial opponency, there is often spectral opponency in ganglion-cell responses. Figure 9a shows a red-green opponent response where the center is fed by positive input from an L cone and the surround is fed by negative input from M cones. Figure 9b illustrates the off-center version of this cell. Thus, before the visual information has even left the retina, processing has occurred that has a profound effect on color appearance.
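The comparison with an edge-detecting convolution kernel can be made concrete. The following Python sketch applies a toy center-surround weighting (weights chosen arbitrarily so that they sum to zero) to a one-dimensional stimulus; like the ganglion cells described above, it produces no output on a uniform field and responds only near an edge:

    # Toy center-surround "receptive field": positive center, negative surround.
    center_surround = [-0.25, -0.25, 1.0, -0.25, -0.25]   # weights sum to zero

    def respond(stimulus, weights):
        half = len(weights) // 2
        out = []
        for i in range(half, len(stimulus) - half):
            region = stimulus[i - half:i + half + 1]
            out.append(sum(w * s for w, s in zip(weights, region)))
        return out

    uniform = [5.0] * 12
    edge    = [1.0] * 6 + [5.0] * 6
    print("uniform field:", [round(r, 2) for r in respond(uniform, center_surround)])
    print("edge stimulus:", [round(r, 2) for r in respond(edge, center_surround)])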
Figure 7. Prototypical output of rod and cone photoreceptors as a function of exposure.
Figure 8. Typical receptive fields for (a) on-center and (b) off-center ganglion cells.
Figure 9. Typical receptive fields for (a) on-center (+R center, −G surround) and (b) off-center (−R center, +G surround) red–green opponent ganglion cells.
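The cone-opponent organization sketched in Figs. 8 and 9 can also be written as simple arithmetic on the L, M, and S signals, anticipating the opponent encoding discussed later in this article. The LMS values below are invented for illustration, and the sums are unweighted, whereas the actual achromatic channel weights the cones roughly in proportion to their populations:

    # Illustrative only: combining cone signals into achromatic and opponent
    # channels, following the sign conventions described later in this article.
    def opponent_channels(l, m, s):
        achromatic  = l + m + s        # light-dark
        red_green   = l - m + s        # +R / -G
        yellow_blue = l + m - s        # +Y / -B
        return achromatic, red_green, yellow_blue

    for name, lms in (("reddish patch ", (0.8, 0.4, 0.2)),
                      ("greenish patch", (0.4, 0.8, 0.2)),
                      ("bluish patch  ", (0.3, 0.3, 0.9))):
        a, rg, yb = opponent_channels(*lms)
        print(f"{name}  A={a:.2f}  R-G={rg:+.2f}  Y-B={yb:+.2f}")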
Figures 8 and 9 illustrate typical ganglion-cell receptive fields. There are other types and varieties of ganglion-cell responses, but they all share these basic concepts. On their way to the primary visual cortex, visual signals pass through the LGN. The ganglion cells terminate at the LGN and make synapses with LGN cells, and there appears to be a one-to-one correspondence between ganglion cells and LGN cells. Thus, the receptive fields of LGN cells are identical to those of ganglion cells. The LGN acts as a relay station for the signals. However, it probably serves some visual function because there are neural projections from the cortex back to the LGN that could serve as some type of switching or adaptation feedback mechanism. It is also possible that the physiological recordings made to date simply have not appropriately quantified the nature of LGN responses. The axons of LGN cells project to visual area one (V1) in the visual cortex. Lennie and D’Zmura (13) provide an excellent review of the physiological mechanisms of color vision, including those of the LGN and cortex. In area V1 of the cortex, the encoding of visual information becomes significantly more complex. Much as the outputs of various photoreceptors are combined and compared to produce ganglion-cell responses, the outputs of various LGN cells are compared and combined to produce cortical responses. As the signals move further up in the cortical processing chain, this process repeats itself, and the level of complexity increases very rapidly to the point that receptive fields begin to lose meaning. Cells can be found in V1 that selectively respond to oriented edges or bars; input from one eye, the other, or both; various spatial frequencies; various temporal frequencies; particular spatial locations; and various combinations of these features. Classic work in this area was published by Hubel and Wiesel (14), for which they received a Nobel prize. In addition, cells can be found that linearly combine inputs from LGN cells and others with nonlinear summation. All of these various responses are necessary to support visual capabilities such as the perceptions of size, shape, location, motion, depth, and color. Given the complexity of cortical responses in V1 cells, it is not difficult to imagine how complex visual responses can become in an interwoven network of approximately 30 visual areas. Bear in mind that this refers to connections of areas, not cells. On the order of 109 cortical neurons serve visual functions. At these stages, it becomes exceedingly difficult to explain the function of single cortical cells in simple terms. In fact, the function of a single cell might not have meaning because the representation of various perceptions must be distributed across collections
of cells throughout the cortex. Rather than attempting to explore the physiology further, the following sections describe some of the overall perceptual and psychophysical properties of the visual system that help to specify its performance. More on cortical physiology with respect to color vision can be found in Ref. (5) and (13). MECHANISMS OF COLOR VISION Many theories have attempted to explain the function of color vision. A brief look at some of the more modern ideas provides useful insight into current concepts. In the later half of the nineteenth century, the trichromatic theory of color vision was developed, based on the work of Maxwell, Young, and Helmholtz. They recognized that there must be three types of receptors, approximately sensitive to the red, green, and blue regions of the spectrum. The trichromatic theory simply assumed that three images of the world were formed by these three sets of receptors and then transmitted to the brain where the ratios of the signals in each of the images was compared to sort out color appearances. The trichromatic (three-receptor) nature of color vision was not in doubt, but the idea that three images are transmitted to the brain is both inefficient and fails to explain several visually observed phenomena. At around the same time, Hering proposed an opponent-colors theory of color vision based on many subjective observations of color appearance. These observations included the appearance of hues, simultaneous contrast, afterimages, and color vision deficiencies. Hering noted that it was never perceived that certain hues occur together. For example, a color perception is never described as reddishgreen or yellowish-blue, whereas combinations of red and yellow, red and blue, green and yellow, and green and blue are readily perceived. This suggested to Hering that there was something fundamental about the red–green and yellow–blue pairs that caused them to oppose one another. Similar observations were made of simultaneous contrast in which objects placed on a red background appear greener, on a green background appear redder, on a yellow background appear bluer, and on a blue background appear yellower. Visual afterimages also have an opponent nature. The afterimage of red is green, green is red, yellow is blue, and blue is yellow. Last, Hering observed that those who have color vision deficiencies lack the ability to distinguish hues in red–green or yellow–blue pairs. All of these observations provide clues about color information processing in the visual system. Hering proposed that there were three types of receptors, but Hering’s receptors had bipolar responses to light–dark, red–green, and yellow–blue. At the time, this was considered physiologically implausible, and Hering’s opponent theory did not receive appropriate acceptance. Turner (15) provides an interesting historical perspective on the disputes between Helmholtz and Hering. In the middle of the twentieth century, Hering’s opponent theory enjoyed a revival of sorts when quantitative data that supported it began to appear. For example, Svaetichin (16) found opponent signals in electrophysiological measurements of responses in the retinas of
by becoming more sensitive and therefore produces a meaningful visual response at the lower illumination level. Figure 10 illustrates the recovery of visual sensitivity (decrease in threshold) after transition from an extremely high illumination level to complete darkness. At first, the cones gradually become more sensitive, until the curve levels off after a couple of minutes. Then, until about 10 minutes have passed, visual sensitivity is roughly constant. At that point, the rod system, which has a longer recovery time, has recovered enough sensitivity to outperform the cones, and thus the rods begin controlling overall sensitivity. The rod sensitivity continues to improve until it becomes asymptotic after about 30 minutes. Recall that the fivefold change in pupil diameter is not sufficient to serve vision across the wide range of illumination levels typically encountered, and data such as those illustrated in Fig. 10 are often collected by using fixed, artificial pupils. Therefore, neural mechanisms must produce some adaptation. Mechanisms thought responsible for various types of adaptation include depletion and regeneration of photopigment, the rod–cone transition, gain control in the receptors and other retinal cells, variation of pooling regions across photoreceptors, spatial and temporal opponency, gain control in opponent and other higher level mechanisms, neural feedback, response compression, and cognitive interpretation. The colorimetric characteristics of adaptation are discussed in more detail by Fairchild (6), and physiological details can be found in the chapter by Walraven et al. in Ref. (12). Light adaptation is essentially the inverse process of dark adaptation. However, it is important to consider it separately because its visual properties differ. Light adaptation occurs when leaving a darkened theater and returning outdoors on a sunny afternoon. In this case, the visual system must become less sensitive to produce useful perceptions because significantly more visible energy is available. The same physiological mechanisms serve light adaptation, but the forward and reverse kinetics are asymmetrical, so that the time course of light adaptation is on the order of five minutes, rather than 30 minutes. Figure 11 illustrates the utility of light adaptation. The
goldfish (which happen to be trichromatic!). DeValois et al. (17), found similar opponent physiological responses in the LGN cells of the macaque monkey. Jameson and Hurvich (18) also added quantitative psychophysical data through their hue-cancellation experiments with human observers that allowed measurement of the relative spectral sensitivities of opponent pathways. These data, combined with the overwhelming support of much additional research since that time, led to the development of the modern opponent theory of color vision (sometimes called a stage theory). An overview of the modern interpretation of the opponent theory can be found in Hurvich’s book Color Vision (19). According to a stage theory, the first stage of color vision, the receptors, is indeed trichromatic, as hypothesized by Maxwell, Young, and Helmholtz. However, contrary to simple trichromatic theory, the three ‘‘color-separation’’ images are not transmitted directly to the brain. Instead the neurons of the retina (and perhaps higher levels) encode the color into opponent signals. The outputs of all three cone types are summed (L + M + S) to produce an achromatic response that matches the CIE V(λ) curve, as long as the summation is taken in proportion to the relative populations of the three cone types. Note here that some evidence suggests that the luminance channel does not receive input from shortwavelength sensitive cones, but it is a sum only of L and M signals. Although the contribution of S cones to luminance is still debated, it is clear that any contribution would be small due to the paucity of S cones and can be neglected in many practical applications (see Chap. 2 in Ref. 5). Differencing of the cone signals allows constructing of red–green (L − M + S) and yellow–blue (L + M − S) opponent signals. The transformation from LMS signals to the opponent signals decorrelates the color information carried in the three channels and thus allows more efficient signal transmission and reduces difficulties with noise. The three opponent pathways also have distinct spatial and temporal characteristics that are important for predicting color appearance. They are discussed further later. The importance of the transformation from trichromatic to opponent signals for color appearance is reflected in the prominent place that it finds within the formulation of mathematical color spaces and practical color imaging systems such as NTSC television and JPEG image compression (2). It is not enough to consider the processing of color signals in the human visual system as a static ‘‘wiring diagram.’’ The dynamic mechanisms of adaptation that optimize the visual response to the particular viewing environment must also be considered. Of particular relevance to the study of color are the mechanisms of dark, light, and chromatic adaptation. Dark adaptation refers to the change in visual sensitivity that occurs when the prevailing level of illumination is decreased, such as when walking into a darkened theater on a sunny afternoon. At first, the entire theater appears completely dark, but after a few minutes one can clearly see objects in the theater such as the aisles, seats, and other people. This happens because the visual system is responding to the lack of illumination
Figure 10. The recovery of light sensitivity after a bleaching exposure (plotted as relative threshold versus time in minutes) illustrates the rod–cone transition after about 10 minutes.
Figure 11. Illustration of the result of light and dark adaptation (relative response versus log relative energy). The solid lines show responses adapted to various exposure levels, and the dashed line shows the response that would be necessary to cover the same input dynamic range without any mechanisms of adaptation.
visual system has a limited output dynamic range, say 100:1, available for the signals that produce our perceptions. The world in which we function, however, includes illumination levels that cover at least 10 orders of magnitude from a starlit night to a sunny afternoon. Fortunately, it is almost never important to view the entire range of illumination levels at the same time. If a single response function were used to map the large range of stimulus intensities into the visual system's output, then only a small range of the available output would be used for any given scene. Such a response is shown by the dashed line in Fig. 11. Clearly, such a response function would limit the perceived contrast of any given scene, and visual sensitivity to changes would be severely degraded due to signal-to-noise issues. On the other hand, light adaptation produces a family of visual response curves illustrated by the solid lines in Fig. 11. These curves map the useful illumination range in any given scene into the full dynamic range of the visual output, thus resulting in the best possible visual perception for each situation. Light adaptation can be thought of as the process of sliding the visual response curve along the illumination level axis in Fig. 11 until the optimum level for the given viewing conditions is reached. Light and dark adaptation can be thought of as analogous to an automatic exposure control in a photographic system. The third type of adaptation, closely related to light and dark adaptation, is chromatic adaptation. Again, similar physiological mechanisms are thought to produce chromatic adaptation. Chromatic adaptation refers to the largely independent sensitivity control of the three mechanisms of color vision. As illustrated schematically in Fig. 12, the overall heights of the three cone spectral responsivity curves can vary independently. Chromatic adaptation is often discussed and modeled as independent sensitivity control in the cones, but there is no reason to believe that it does not also occur in opponent and other color mechanisms. Chromatic adaptation can be observed by examining a white object, such as a piece of paper, under various types of illumination
Figure 12. The concept of chromatic adaptation whereby the three cone mechanisms can independently adjust the magnitudes of their relative spectral responsivities.
(e.g., daylight, fluorescent, and incandescent). Daylight contains relatively more short-wavelength energy than fluorescent light, and incandescent illumination contains relatively more long-wavelength energy than fluorescent light. However, the paper retains its approximate white appearance under all three light sources because the S-cone system becomes relatively less sensitive in daylight to compensate for the additional short-wavelength energy and the L-cone system becomes relatively less sensitive in incandescent illumination to compensate for the additional long-wavelength energy. Chromatic adaptation can be thought of as analogous to an automatic white balance in video cameras. Von Kries (20) published an early and seminal suggestion that chromatic adaptation can be considered independent multiplicative normalization of the cone responses. More details on von Kries' contribution and its more recent enhancements can be found in (6). There are many important cognitive visual mechanisms that impact color appearance, including memory color, color constancy, discounting the illuminant, and object recognition. Memory color is the phenomenon that recognizable objects often have a prototypical color that is associated with them. For example, most people have a memory of the typical blue sky color and can produce a stimulus of this color if requested to do so in an experiment. Interestingly, the memory color often differs from the color of the actual objects. For example, green grass and blue sky are typically remembered as more saturated than the actual stimuli (21). Color constancy refers to the everyday perception that the colors of objects remain unchanged across significant changes in illumination color and luminance level. Color constancy is served by the mechanisms of chromatic adaptation and memory color, and it can easily be shown that it is very poor when careful observations are made. Discounting the illuminant refers to an observer's ability to interpret illumination conditions automatically and perceive the colors of objects after discounting the influences of illumination color. Object recognition is generally driven by the spatial, temporal, and light–dark properties of the objects rather than by chromatic properties (22). Thus, once the objects
are recognized, the mechanisms of memory color and discounting the illuminant can fill in the appropriate color. Such mechanisms have fascinating impacts on color appearance and become critically important when comparing color across different media. The psychological aspects of color perception are discussed clearly in Chap. 5 in the original edition of The Science of Color, published by the OSA (23). Clearly, visual information processing is extremely complex and not fully understood. It is of interest to consider the increasing complexity of cortical visual responses as the signal moves through the visual system. Single-cell electrophysiological studies have found cortical cells whose driving stimuli are extremely complex. For example, cells in monkeys that respond only to images of monkey paws or faces have been occasionally found in physiological experiments (12). The existence of such cells raises the question of how complex a single-cell response can become. Clearly, it is not possible for every perception to have its own cortical cell. Thus, at some point in the visual system, the representation of perceptions must be distributed by combinations of various signals that produce various perceptions. Such distributed representations open up possibilities for numerous permutations of a given perception, such as color appearance.
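Before turning to spatial and temporal properties, the von Kries idea described earlier, that chromatic adaptation acts approximately as an independent multiplicative rescaling of each cone signal, can be made concrete with a short sketch. The Python fragment below is illustrative only; the cone signals and the responses to the two adapting whites are invented numbers, not measured data, and a full treatment would derive them by integrating measured spectra against cone responsivities.

def von_kries_adapt(lms, white_src, white_dst):
    """Scale each cone signal by the ratio of the destination white to the
    source white, the multiplicative normalization suggested by von Kries."""
    return [c * (wd / ws) for c, ws, wd in zip(lms, white_src, white_dst)]

# Hypothetical cone signals for a surface viewed under incandescent light,
# and hypothetical LMS responses to the two adapting whites.
lms_sample = [0.50, 0.40, 0.10]
white_incandescent = [1.00, 0.80, 0.35]   # long-wavelength-rich illuminant
white_daylight = [0.95, 0.90, 0.85]       # relatively more short-wavelength energy

corresponding = von_kries_adapt(lms_sample, white_incandescent, white_daylight)
print([round(v, 3) for v in corresponding])

Dividing each cone signal by the response to the prevailing white is what allows, for example, the S-cone system to become relatively less sensitive under daylight, in the manner described above.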
SPATIAL AND TEMPORAL PROPERTIES OF COLOR VISION No dimension of visual experience can be considered in isolation. The color appearance of a stimulus is not independent of its spatial and temporal characteristics. For example, a black-and-white stimulus that flickers at an appropriate temporal frequency can be perceived as quite colorful, as illustrated by Benham's top (19). The spatial and temporal characteristics of the human visual system are typically explored by measuring contrast sensitivity functions. Contrast sensitivity functions (CSFs) in vision science are analogous to modulation transfer functions (MTFs) in imaging science. However, CSFs cannot legitimately be considered MTFs because the human visual system is highly nonlinear and not shift invariant. A contrast sensitivity function is defined by the threshold response to contrast (sensitivity is the inverse of threshold) as a function of spatial or temporal frequency. Contrast is typically defined as the difference between the maximum and minimum luminance of a stimulus divided by the sum of the maximum and minimum luminances, and CSFs are typically measured by stimuli that vary sinusoidally across space or time. Thus, a uniform pattern has a contrast of zero, and sinusoidal patterns whose troughs reach a luminance of zero have a contrast of 1.0, no matter what their mean luminance. Figure 13 illustrates typical spatial contrast sensitivity functions for luminance (black–white) and chromatic (red–green and yellow–blue at constant luminance) contrast. The luminance contrast sensitivity function is band-pass and peaks at a spatial frequency of about 5 cycles per degree. This function approaches zero at zero cycles per degree and illustrates the tendency for the visual system to be insensitive to uniform fields. It also approaches zero
Figure 13. Prototypical spatial contrast sensitivity functions for luminance contrast and isoluminant red–green or yellow–blue chromatic contrast (contrast sensitivity versus log spatial frequency in cycles per degree).
at about 60 cycles per degree, the point at which detail can no longer be resolved by the optics of the eye or the photoreceptor mosaic. The band-pass contrast sensitivity function correlates with the concept of center-surround antagonistic receptive fields that are most sensitive to an intermediate range of spatial frequency. The chromatic mechanisms are low-pass and have significantly lower cutoff frequencies. This indicates the reduced availability of chromatic information for fine details (high spatial frequencies) that is often taken advantage of in image coding and compression schemes (e.g., NTSC or JPEG). The low-pass characteristics of the chromatic mechanisms also illustrate that edge detection/enhancement does not occur along these dimensions. The blue–yellow chromatic CSF has a lower cutoff frequency than the red–green chromatic CSF due to the paucity of S cones in the retina. It is noteworthy that the luminance CSF is significantly higher than the chromatic CSFs; this indicates that the visual system is more sensitive to small changes in luminance contrast compared to chromatic contrast. The spatial CSFs for luminance and chromatic contrast are generally not directly incorporated in colorimetric models although there is significant interest in doing so. Zhang and Wandell (24) presented an interesting technique for incorporating these types of responses into the CIELAB color space calculations. Pattanaik et al. (25,26) also present a comprehensive model that simulates the spatial aspects of color vision and adaptation and functions over wide dynamic ranges. Wandell (11) provides a good overview of the fundamentals of color and spatial vision. Typical temporal contrast sensitivity functions for luminance and chromatic contrast are similar to those for spatial contrast. They share many characteristics with the spatial CSFs shown in Fig. 13. Again, the luminance temporal CSF is higher in both sensitivity and cutoff frequency than the chromatic temporal CSFs, and it shows band-pass characteristics that suggest the enhancement of temporal transients in the human visual system. Again, temporal CSFs are not directly incorporated in colorimetric models, but they might be important to
consider when viewing time-varying images such as digital video clips that are rendered at differing frame rates. It is important to realize that the functions in Fig. 13 are prototypical, not universal. As stated earlier, the dimensions of human visual perception cannot be examined independently. The spatial and temporal CSFs interact with one another. A spatial CSF measured at different temporal frequencies varies tremendously, and the same is true for a temporal CSF measured at various spatial frequencies. These functions also depend on other variables such as luminance level, stimulus size, and retinal locus. Another important factor is the definition of contrast when comparing equiluminant chromatic contrast with monochromatic luminance contrast. The difficulty in expressing both types of contrast on equal scales means that one must take care in interpreting the visual system’s relative sensitivity to luminance and chromatic variation. For further discussion on this topic, see Lennie’s Chap. 12 in Ref. (5) and the work of Chaparro et al. (27). The oblique effect is an interesting spatial vision phenomenon. This refers to the fact that visual acuity is better for gratings oriented at 0° or 90° (relative to the line connecting the two eyes) than for gratings oriented at 45° . This phenomenon is considered in the design of rotated halftone screens that are set up so that the most visible pattern is oriented at 45° . The effect can be observed by taking a black-and-white halftone newspaper image and adjusting the viewing distance until the halftone dots are just barely imperceptible. If the image is kept at this viewing distance and then rotated 45° , the halftone dots become clearly perceptible. The spatial and temporal CSFs are also closely linked to the study of eye movements. A static spatial pattern becomes a temporally varying pattern when observers move their eyes across the stimulus. Noting that both the spatial and temporal luminance CSFs approach zero as either form of frequency variation approaches zero, it follows that a completely static stimulus is invisible. This is indeed the case. If the retinal image can be fixed by using a video feedback system attached to an eye tracker, the stimulus does disappear after a few seconds (28). Sometimes this can be observed by carefully fixating an object and noting that the perceptions of objects in the periphery begin to fade away after a few seconds. The centrally fixated object does not fade away because the ability to hold the eye completely still has variance greater than the high spatial resolution in the fovea. To avoid this rather unpleasant phenomenon in typical viewing, our eyes are constantly moving. Large eye movements take place to allow viewing different areas of the visual field with the high-resolution fovea. Small constant eye movements also keep the visual world nicely visible. This also explains why the shadows of retinal cells and blood vessels are generally not visible because they do not move on the retina, but rather move with the retina. The history of eye movements has significant impact on adaptation and appearance through integrated exposure of various retinal areas and the need for movements to preserve apparent contrast.
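The definitions above can be exercised numerically. The sketch below implements the contrast definition given earlier and, as a stand-in for the prototypical band-pass luminance curve of Fig. 13, uses the analytic CSF approximation published by Mannos and Sakrison; that formula is not part of this article, and its peak lies near 8 cycles per degree rather than the 5 quoted above, so it should be read only as an illustration of the general band-pass shape.

import math

def michelson_contrast(l_max, l_min):
    """Contrast as defined in the text: (Lmax - Lmin) / (Lmax + Lmin)."""
    return (l_max - l_min) / (l_max + l_min)

def luminance_csf(f_cpd):
    """Mannos-Sakrison analytic approximation of a band-pass luminance CSF
    (relative sensitivity versus spatial frequency in cycles per degree)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-(0.114 * f_cpd) ** 1.1)

# A sinusoidal grating whose troughs reach zero has contrast 1.0
# regardless of its mean luminance, as noted in the text.
print(michelson_contrast(100.0, 0.0))   # 1.0
print(michelson_contrast(60.0, 40.0))   # 0.2

# The approximation is band-pass: low at 0.1 cpd, highest at a few cpd,
# and falling toward zero well before 60 cpd.
for f in (0.1, 1, 4, 8, 16, 32, 60):
    print(f, round(luminance_csf(f), 3))

The printed values rise from low frequencies to a peak at a few cycles per degree and then fall toward zero well before 60 cycles per degree, the same qualitative behavior described above for the luminance CSF.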
COLOR-VISION DEFICIENCIES There are various types of inherited and acquired color-vision deficiencies. Kaiser and Boynton (4) provide a current and comprehensive overview of the topic. Sharpe et al. (5) also provide more details on cone photopigments and color-vision deficiencies. This section concentrates on the most common inherited deficiencies. Some color-vision deficiencies are caused by the lack of a particular type of cone photopigment. There are three types of cone photopigments, so there are three general classes of these color-vision deficiencies, namely protanopia, deuteranopia, and tritanopia. An observer who has protanopia, known as a protanope, is missing the L-cone photopigment and therefore cannot discriminate between reddish and greenish hues because the red–green opponent mechanism cannot be constructed. A deuteranope is missing the M-cone photopigment and therefore also cannot distinguish between reddish and greenish hues due to the lack of a viable red–green opponent mechanism. Protanopes and deuteranopes can be distinguished by their relative luminous sensitivity because it is constructed from the summation of different cone types. The protanopic luminous sensitivity function is shifted toward shorter wavelengths than that of a deuteranope. A tritanope is missing the S-cone photopigment and therefore cannot discriminate between yellowish and bluish hues due to the lack of a yellow–blue opponent mechanism. There are also anomalous trichromats who have trichromatic vision, but their ability to discriminate particular hues is reduced either due to shifts in the spectral sensitivities of the photopigments or contamination of photopigments (e.g., some L-cone photopigment in the M-cones). Among the anomalous trichromats are those who have protanomaly (L-cone absorption shifted toward shorter wavelengths), deuteranomaly (M-cone absorption shifted toward longer wavelengths), and tritanomaly (S-cone absorption shifted toward longer wavelengths). There are also some cases of cone monochromatism (effectively only one cone type) and rod monochromatism (no cone responses). The study of color-vision deficiencies is of more than academic interest in the field of color measurement and reproduction. This is illustrated in Table 1, which shows the approximate percentages of the population which

Table 1. Approximate Occurrences of Various Color-Vision Deficiencies (a)

Type                     Male (%)    Female (%)
Protanopia               1.0         0.02
Deuteranopia             1.1         0.01
Tritanopia               0.002       0.001
Cone monochromatism      ~0          ~0
Rod monochromatism       0.003       0.002
Protanomaly              1.0         0.02
Deuteranomaly            4.9         0.38
Tritanomaly              ~0          ~0
TOTAL                    8.0         0.4

(a) Based on data in (3)
have various types of color-vision deficiencies. Clearly, color deficiencies are not rare, particularly in the male population (about 8%), and it might be important to account for the possibility of color deficient observers in many applications. Why the disparity between the occurrence of colorvision deficiencies in males and females? This can be traced to the genetic basis of color-vision deficiencies. The most common forms of color-vision deficiencies are sex-linked genetic traits. The genes for photopigments are located on the X chromosome. Females inherit one X chromosome from their mothers and one from their fathers. Only one of these need have the genes for the normal photopigments to produce normal color vision. On the other hand, males inherit an X chromosome from their mothers and a Y chromosome from their fathers. If the single X chromosome does not include the genes for the photopigments, the sons will have color-vision deficiencies. If a female is color deficient, it means that she has two deficient X chromosomes and all male children are destined to have color-vision deficiencies. Knowledge regarding the genetic basis of color vision has grown tremendously in recent years. John Dalton was an early investigator of deficient color vision. He studied his own vision, which was historically considered protanopic based on his observations, and came up with a theory for the cause of his deficiencies. Dalton hypothesized that his color-vision deficiency was due to coloration of his vitreous humor that caused it to act like a filter. Upon his death, he donated his eyes to have them dissected to confirm his theory. Unfortunately Dalton’s theory was incorrect. However, Dalton’s eyes have been preserved to this day in a museum in Manchester, UK. Hunt et al. (29) performed DNA tests on Dalton’s preserved eyes and showed that Dalton was a deuteranope rather than a protanope, but had an L-cone photopigment whose spectral responsivity was shifted toward the shorter wavelengths. They also completed a colorimetric analysis to show that their genetic results were consistent with the observations originally recorded by Dalton. Given the fairly high occurrence rate of colorvision deficiencies, it is necessary to screen observers before allowing them to make critical color appearance or color matching judgments. A variety of tests are available, but two types, pseudoisochromatic plates and the Farnsworth–Munsell 100-Hue test, are of practical importance. Pseudoisochromatic plates (e.g., Ishihara’s Tests for Colour-Blindness) are color plates made up of dots of random lightness that include a pattern or number in the dots formed from an appropriately contrasting hue. The random lightness of the dots is a design feature to avoid discrimination of the patterns based only on lightness difference. The plates are presented to observers under properly controlled illumination, and they are asked to respond either by tracing the pattern or by reporting the number observed. Various plates are designed with color combinations that observers who have the different types of color-vision deficiencies would find difficult to discriminate. These tests are commonly administered as part of a normal ophthalmological examination and can be obtained from optical suppliers and general scientific
suppliers. Screening by using a set of peudoisochromatic plates should be considered a minimum evaluation for anyone who carries out critical color judgments. The Farnsworth–Munsell 100-Hue test, available from the Munsell Color Company, consists of four sets of chips that must be arranged in an orderly progression of hue. Observers who have various types of color-vision deficiencies make errors in arranging the chips at various locations around the hue circle. The test can be used to distinguish among the different types of deficiencies and also to evaluate the severity of color discrimination problems. This test can also be used to identify observers who have normal color vision, but poor discrimination for all colors. BASIC AND ADVANCED COLORIMETRY The following sections review the well-established practice of colorimetry according to the International Commission on Illumination (CIE) system first established in 1931. This system, which allows specifying color matches by an average observer, has amazingly withstood an onslaught of technological pressures and remained a useful international standard for about 70 years (30,31). However, CIE colorimetry provides only the starting point. Color appearance models enhance this system in an effort to predict the actual appearances of stimuli in various viewing conditions, rather than simply whether two stimuli match (6). This chapter provides a general review of the concepts of colorimetry, but it is not intended to be a complete reference. There are a number of excellent texts on the subject. For introductions to colorimetry, readers are referred to the texts of Berns (32), Hunt (3), BergerSchunn (33), and Hunter and Harold (34). The precise definition of colorimetry can be found in the primary reference, CIE Publication 15.2 (35). For complete details in an encyclopedic reference volume, consult the classic book by Wyszecki and Stiles, Color Science (1). Colorimetry is the measurement of color. Wyszecki (36,37) described an important distinction between basic and advanced colorimetry. It is most enlightening to quote Wyszecki’s exact words in making the distinction. Wyszecki’s description of basic colorimetry is as follows: Colorimetry, in its strict sense, is a tool used to making a prediction on whether two lights (visual stimuli) of different spectral power distributions will match in colour for certain given conditions of observation. The prediction is made by determining the tristimulus values of the two visual stimuli. If the tristimulus values of a stimulus are identical to those of the other stimulus, a colour match will be observed by an average observer with normal color vision (36).
Wyszecki went on to describe the realm of advanced colorimetry. Colorimetry in its broader sense includes methods of assessing the appearance of colour stimuli presented to the observer in complicated surroundings as they may occur in everyday life. This is considered the ultimate goal of colorimetry, but because of its enormous complexity, this goal is far from being reached. On the other hand, certain more restricted aspects of the
overall problem of predicting colour appearance of stimuli seem somewhat less elusive. The outstanding examples are the measurement of colour differences, whiteness, and chromatic adaptation. Though these problems are still essentially unresolved, the developments in these areas are of considerable interest and practical importance (36).
The following sections describe the well-established techniques of basic colorimetry that form the foundation for color appearance specification. It also describes some of the widely used methods for color difference measurement, one of the first objectives of advanced colorimetry. Wyszecki’s distinction between basic and advanced colorimetry highlights the ongoing challenges of color science, the research and modeling to extend basic colorimetry toward the ultimate goals of advanced colorimetry. WHY IS COLOR? To begin a discussion of the measurement of color, one must first consider the nature of color. ‘‘Why’’ is a more appropriate question than the more typical ‘‘what’’ because color is not a simple thing that can be easily described to someone who has never experienced it. Color cannot even be defined without resort to examples. Color is an attribute of visual sensation, and the color appearance of objects depends on three components that can be thought of as a triangle, as illustrated in Fig. 14. The first requirement is a source of visible electromagnetic energy to initiate the sensory process of vision. This energy is then modulated by the physical and chemical properties of an object. The modulated energy is imaged by the eye, detected by photoreceptors, and processed by the neural mechanisms of the human visual system to produce our perceptions of color. All of these components (light source, objects, observer) interact to produce color perceptions and, indeed, color itself. All three aspects of the triangle are required to produce color, so they must also be quantified to produce a reliable system of physical colorimetry. Light sources are quantified through their spectral power distribution
Figure 14. The triangle of color science illustrating its multidisciplinary nature and dependency on the properties of light sources (physics, radiometry), objects (chemistry, physics), and observers (anatomy, physiology, psychophysics, psychology).
and are standardized as illuminants. Material objects are specified by the geometric and spectral distribution of the energy that they reflect or transmit. The human visual system is quantified through its color matching properties that represent the first-stage response (cone absorption) in the system. Thus, colorimetry, as a combination of all of these areas, draws upon techniques and results from the fields of physics, chemistry, psychophysics, physiology, and psychology. LIGHT SOURCES AND ILLUMINANTS The first component of the triangle of color is the light source. Light sources provide the electromagnetic energy required to initiate visual responses. The color properties of light sources are specified in two ways for basic colorimetry, through measurement and through standardization. The distinction between these two techniques is clarified in the definition of light sources and illuminants. Light sources are actual physical emitters of visible energy. Incandescent light bulbs, the sky at any given moment, and fluorescent tubes are examples of light sources. Illuminants, on the other hand, are standardized tables of values that represent a spectral power distribution typical of some particular light source. CIE illuminants A, D65, and F2 are standardized representations of typical incandescent, daylight, and fluorescent sources, respectively. Some illuminants have corresponding sources that are physical embodiments of standardized spectral power distributions. For example, CIE source A is a particular type of tungsten source that produces the relative spectral power distribution of CIE illuminant A. Other illuminants do not have corresponding sources. For example, CIE illuminant D65 is a statistical representation of average daylight whose correlated color temperature is approximately 6,500 K, and thus, there is no CIE source D65 that produces the illuminant D65 spectral power distribution. The measurement of the spectral power distributions of light sources is the realm of spectroradiometry. Spectroradiometry is the measurement of radiometric quantities as a function of wavelength. In color measurement, the wavelength region of interest encompasses electromagnetic energy of wavelengths from approximately 400 nm (violet) to 700 nm (red). A variety of radiometric quantities can be used to specify the properties of a light source. Irradiance and radiance are of particular interest in color appearance measurement. Both are measurements of the power of light sources whose basic units are watts. Irradiance, the radiant power per unit area incident on a surface, has units of watts per square meter (W/m2 ). Spectral irradiance adds the wavelength dependency and has units of W/m2 nm, sometimes expressed as W/m3 . Radiance differs from irradiance in that it is a measure of the power emitted from a source (or surface), rather than incident on a surface, per unit area per unit solid angle, whose units are watts per square meter per steradian (W/m2 sr). Spectral radiance includes the wavelength dependency whose units are W/m2 sr nm or W/m3 sr. Radiance has the interesting properties that it is preserved through
optical systems (neglecting absorption) and is independent of distance. Thus, the human visual system responds commensurably to radiance which makes it a key measurement in color appearance specification. The retina itself responds commensurably to the irradiance incident on it, but in combination with the optics of the eyeball, retinal irradiance is proportional to the radiance of a surface. This can be demonstrated by viewing an illuminated surface from various distances and observing that the perceived brightness does not change (consistent with the radiance of the surface). The irradiance at the eye from a given surface falls off approximately with the square of the distance from the surface. Radiance does not fall off in this fashion because the decrease in power incident on the pupil is directly canceled by a proportional decrease in the solid angle subtended by the pupil with respect to the surface in question. The spectral radiance L(λ) of a surface whose spectral reflectance factor is R(λ) can be calculated from the spectral irradiance E(λ) that falls on the surface by using Eq. (1), assuming that the surface is a Lambertian diffuser (i.e., equal radiance distribution in all directions). R(λ)E(λ) (1) L(λ) = π A spectral power distribution (SPD) (see Fig. 15) is simply a plot, or table, of a radiometric quantity as a function of wavelength. The overall power levels of light sources can vary over many orders of magnitude, so spectral power distributions are often normalized to facilitate comparisons of color properties. A typical approach is to normalize a spectral power distribution so that it has a value of 100 (or sometimes 1.0) at a wavelength of 560 nm (arbitrarily chosen near the center of the visible spectrum). Such normalized spectral power distributions are called relative spectral power distributions and are unitless. The color temperature of a light source is another important radiometric quantity. A special type of theoretical light source, known as a blackbody radiator, or Planckian radiator, emits energy due only to thermal excitation and is a perfect emitter of energy. The power
Figure 15. Relative spectral power distributions of CIE Illuminants C (theoretical daylight) and A (incandescent).
emitted by a blackbody increases, and its spectral composition shifts toward shorter wavelengths as the temperature of the blackbody increases. The spectral power distribution of a blackbody radiator can be specified by using Planck’s equation as a function of a single variable, absolute temperature (in Kelvin). Thus, if the absolute temperature of a blackbody is known, so is its spectral power distribution. The temperature of a blackbody is called its color temperature because it uniquely specifies the color of the source. Because blackbody radiators seldom exist outside specialized laboratories, color temperature is not a generally useful quantity. A second quantity, correlated color temperature, is more generally useful. A light source need not be a blackbody radiator to be assigned a correlated color temperature. The correlated color temperature (CCT) of a light source is simply the color temperature of a blackbody radiator that has nearly the same color as the source in question. As examples, an incandescent source might have a CCT of 2,800 K, a typical fluorescent tube 4,000 K, an average daylight 6,500 K, and the white point of a computer graphics display 9,300 K. As the correlated color temperature of a source increases, it becomes more bluish. The CIE has established a number of spectral power distributions as CIE illuminants for colorimetry. These include CIE illuminants A, C, D65, D50, F2, F8, and F11 among others. CIE illuminant A represents a Planckian radiator whose color temperature is 2,856 K; it is used for colorimetric calculations when incandescent illumination is of interest. CIE illuminant C is the spectral power distribution of illuminant A modified by particular liquid filters defined by the CIE and represents a daylight simulator whose CCT is 6,774 K. Illuminants D65 and D50 are part of the CIE D-series of illuminants that have been statistically defined on the basis of a large number of measurements of real daylight. Illuminant D65 represents average daylight whose CCT is 6,504 K, and D50 represents average daylight whose CCT is 5,003 K. D65 is commonly used in colorimetric applications, and D50 is often used in graphic arts applications. CIE D illuminants that have other correlated color temperatures can be easily obtained. The CIE F illuminants (12 in all) represent typical spectral power distributions for various types of fluorescent sources. CIE illuminant F2 represents cool-white fluorescent whose CCT is 4,230 K, F8 represents a fluorescent D50 simulator whose CCT is 5,000 K, and F11 represents a triband fluorescent source whose CCT is 4,000 K. Triband fluorescent sources are popular because of their efficiency, efficacy, and pleasing color-rendering properties. The equal-energy illuminant (sometimes called illuminant E) often has mathematical utility. It is defined as a relative spectral power of 100.0 at all wavelengths. Table 2 includes the spectral power distributions and useful colorimetric data for the CIE illuminants described before. The relative spectral power distributions of these illuminants are plotted in Figs. 15 through 17.
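Because a blackbody's spectral power distribution is fixed entirely by its absolute temperature, Planck's equation can be evaluated directly. The sketch below computes a blackbody spectrum and normalizes it to 100.0 at 560 nm, the convention for relative spectral power distributions described above; the 2,856 K temperature matches the color temperature given for illuminant A, and the physical constants are standard values.

import math

H = 6.626e-34   # Planck constant (J s)
C = 2.998e8     # speed of light (m/s)
K = 1.381e-23   # Boltzmann constant (J/K)

def planck(wavelength_nm, temp_k):
    """Spectral radiance of a blackbody (Planck's equation), arbitrary scale."""
    lam = wavelength_nm * 1e-9
    return (2 * H * C**2 / lam**5) / (math.exp(H * C / (lam * K * temp_k)) - 1.0)

def relative_spd(temp_k, wavelengths_nm):
    """Relative SPD normalized to 100.0 at 560 nm, as described in the text."""
    ref = planck(560, temp_k)
    return {w: 100.0 * planck(w, temp_k) / ref for w in wavelengths_nm}

# A 2,856 K blackbody approximates CIE illuminant A; the relative power
# rises steadily toward long wavelengths, as in Fig. 15.
spd = relative_spd(2856, range(400, 701, 50))
for w, p in spd.items():
    print(w, round(p, 1))

For 2,856 K the computed relative values rise from roughly 15 at 400 nm to nearly 200 at 700 nm, which agrees with the illuminant A column of Table 2.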
Table 2. Relative Spectral Power Distributions and Colorimetric Data for Some CIE Illuminants. Colorimetric Data are for the CIE 1931 Standard Colorimetric Observer (2° ) Wavelength (nm)
A
C
D65
D50
F2
F8
F11
360 365 370 375 380
6.14 6.95 7.82 8.77 9.80
12.90 17.20 21.40 27.50 33.00
46.64 49.36 52.09 51.03 49.98
23.94 25.45 26.96 25.72 24.49
0.00 0.00 0.00 0.00 1.18
0.00 0.00 0.00 0.00 1.21
0.00 0.00 0.00 0.00 0.91
385 390 395 400 405
10.90 12.09 13.35 14.71 16.15
39.92 47.40 55.17 63.30 71.81
52.31 54.65 68.70 82.75 87.12
27.18 29.87 39.59 49.31 52.91
1.48 1.84 2.15 3.44 15.69
1.50 1.81 2.13 3.17 13.08
0.63 0.46 0.37 1.29 12.68
410 415 420 425 430
17.68 19.29 21.00 22.79 24.67
80.60 89.53 98.10 105.80 112.40
91.49 92.46 93.43 90.06 86.68
56.51 58.27 60.03 58.93 57.82
3.85 3.74 4.19 4.62 5.06
3.83 3.45 3.86 4.42 5.09
1.59 1.79 2.46 3.33 4.49
435 440 445 450 455
26.64 28.70 30.85 33.09 35.41
117.75 121.50 123.45 124.00 123.60
95.77 104.87 110.94 117.01 117.41
66.32 74.82 81.04 87.25 88.93
34.98 11.81 6.27 6.63 6.93
34.10 12.42 7.68 8.60 9.46
33.94 12.13 6.95 7.19 7.12
460 465 470 475 480
37.81 40.30 42.87 45.52 48.24
123.10 123.30 123.80 124.09 123.90
117.81 116.34 114.86 115.39 115.92
90.61 90.99 91.37 93.24 95.11
7.19 7.40 7.54 7.62 7.65
10.24 10.84 11.33 11.71 11.98
6.72 6.13 5.46 4.79 5.66
485 490 495 500 505
51.04 53.91 56.85 59.86 62.93
122.92 120.70 116.90 112.10 106.98
112.37 108.81 109.08 109.35 108.58
93.54 91.96 93.84 95.72 96.17
7.62 7.62 7.45 7.28 7.15
12.17 12.28 12.32 12.35 12.44
14.29 14.96 8.97 4.72 2.33
510 515 520 525 530
66.06 69.25 72.50 75.79 79.13
102.30 98.81 96.90 96.78 98.00
107.80 106.30 104.79 106.24 107.69
96.61 96.87 97.13 99.61 102.10
7.05 7.04 7.16 7.47 8.04
12.55 12.68 12.77 12.72 12.60
1.47 1.10 0.89 0.83 1.18
535 540 545 550 555
82.52 85.95 89.41 92.91 96.44
99.94 102.10 103.95 105.20 105.67
106.05 104.41 104.23 104.05 102.02
101.43 100.75 101.54 102.32 101.16
8.88 10.01 24.88 16.64 14.59
12.43 12.22 28.96 16.51 11.79
4.90 39.59 72.84 32.61 7.52
560 565 570 575 580
100.00 103.58 107.18 110.80 114.44
104.11 102.30 100.15 97.80 95.43
100.00 98.17 96.33 96.06 95.79
100.00 98.87 97.74 98.33 98.92
16.16 17.56 18.62 21.47 22.79
11.76 11.77 11.84 14.61 16.11
2.83 1.96 1.67 4.43 11.28
585 590 595 600 605
118.08 121.73 125.39 129.04 132.70
93.20 91.22 89.70 88.83 88.40
92.24 88.69 89.35 90.01 89.80
96.21 93.50 95.59 97.69 98.48
19.29 18.66 17.73 16.54 15.21
12.34 12.53 12.72 12.92 13.12
14.76 12.73 9.74 7.33 9.72
610 615 620 625 630
136.35 139.99 143.62 147.24 150.84
88.19 88.10 88.06 88.00 87.86
89.60 88.65 87.70 85.49 83.29
99.27 99.16 99.04 97.38 95.72
13.80 12.36 10.95 9.65 8.40
13.34 13.61 13.87 14.07 14.20
55.27 42.58 13.18 13.16 12.26
Table 2. (continued) Wavelength (nm)
A
C
D65
D50
F2
F8
F11
635 640 645 650 655
154.42 157.98 161.52 165.03 168.51
87.80 87.99 88.20 88.20 87.90
83.49 83.70 81.86 80.03 80.12
97.29 98.86 97.26 95.67 96.93
7.32 6.31 5.43 4.68 4.02
14.16 14.13 14.34 14.50 14.46
5.11 2.07 2.34 3.58 3.01
660 665 670 675 680
171.96 175.38 178.77 182.12 185.43
87.22 86.30 85.30 84.00 82.21
80.21 81.25 82.28 80.28 78.28
98.19 100.60 103.00 101.70 99.13
3.45 2.96 2.55 2.19 1.89
14.00 12.58 10.99 9.98 9.22
2.48 2.14 1.54 1.33 1.46
685 690 695 700 705
188.70 191.93 195.12 198.26 201.36
80.20 78.24 76.30 74.36 72.40
74.00 69.72 70.67 71.61 72.98
93.26 87.38 89.49 91.60 92.25
1.64 1.53 1.27 1.10 0.99
8.62 8.07 7.39 6.71 6.16
1.94 2.00 1.20 1.35 4.10
710 715 720 725 730
204.41 207.41 210.37 213.27 216.12
70.40 68.30 66.30 64.40 62.80
74.35 67.98 61.60 65.74 69.89
92.89 84.87 76.85 81.68 86.51
0.88 0.76 0.68 0.61 0.56
5.63 5.03 4.46 4.02 3.66
5.58 2.51 0.57 0.27 0.23
735 740 745 750 755 760
218.92 221.67 224.36 227.00 229.59 232.12
61.50 60.20 59.20 58.50 58.10 58.00
72.49 75.09 69.34 63.59 55.01 46.42
89.55 92.58 85.40 78.23 67.96 57.69
0.54 0.51 0.47 0.47 0.43 0.46
3.36 3.09 2.85 2.65 2.51 2.37
0.21 0.24 0.24 0.20 0.24 0.32
            A        C        D65      D50      F2       F8       F11
X           109.85   98.07    95.05    96.42    99.20    96.43    100.96
Y           100.0    100.0    100.0    100.0    100.0    100.0    100.0
Z           35.58    118.23   108.88   82.49    67.40    82.46    64.37
x           0.4476   0.3101   0.3127   0.3457   0.3721   0.3458   0.3805
y           0.4074   0.3162   0.3290   0.3585   0.3751   0.3586   0.3769
CCT (K)     2,856    6,800    6,504    5,003    4,230    5,000    4,000
Figure 16. Relative spectral power distributions of CIE Illuminants D65 (average daylight of 6,500-K CCT) and D50 (average daylight of 5,000-K CCT).
COLORED MATERIALS Once the light source or illuminant is specified, the next step in the colorimetry of material objects is characterizing their interaction with visible radiant energy represented by the second component of the triangle of color. The interaction of radiant energy with materials obeys the law of conservation of energy. Only three fates can befall radiant energy incident on an object: absorption, reflection, and transmission. Thus the absorbed, reflected, and transmitted radiant power must sum to the incident radiant energy at each wavelength, as illustrated in Eq. (2), where Φ(λ) is used as a generic term for the incident radiant flux, R(λ) is the reflected flux, T(λ) is the transmitted flux, and A(λ) is the absorbed flux:

Φ(λ) = R(λ) + T(λ) + A(λ)   (2)
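Equation (2) is a bookkeeping statement, and a few lines of code make the bookkeeping explicit. The flux values below are invented for illustration; in practice each quantity would be a measured spectral value at a single wavelength.

def absorbed_flux(incident, reflected, transmitted):
    """Conservation of energy, Eq. (2): whatever is not reflected or
    transmitted at a given wavelength must have been absorbed."""
    return incident - reflected - transmitted

# Hypothetical fluxes (arbitrary units) at a single wavelength.
incident, reflected, transmitted = 1.00, 0.42, 0.33
absorbed = absorbed_flux(incident, reflected, transmitted)

# Expressed as ratios of the incident flux, the three quantities are the
# reflectance, transmittance, and absorptance defined in the next paragraph,
# and they necessarily sum to 1.0.
reflectance = reflected / incident
transmittance = transmitted / incident
absorptance = absorbed / incident
print(reflectance, transmittance, absorptance, reflectance + transmittance + absorptance)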
To define terms, reflection, transmission, and absorption are the phenomena that take place when light interacts with matter, and reflectance, transmittance, and absorptance are the quantities measured to describe these phenomena. Because these quantities always must sum to the incident flux, they are typically measured in relative terms as percentages of the incident flux, rather than as absolute radiometric quantities. Thus, reflectance can be defined as the ratio of the reflected energy to the incident energy, transmittance as the ratio of transmitted energy to incident energy, and absorptance as the ratio of
Figure 17. Relative spectral power distributions of CIE Illuminants F2 (cool-white fluorescent), F11 (narrow-band fluorescent), and F8 (daylight fluorescent).
absorbed energy to incident energy. Note that all of these quantities are ratio measurements, the subject of spectrophotometry, which is defined as the measurement of ratios of radiometric quantities. Spectrophotometric quantities are expressed either as percentages (0–100%) or as factors (0.0–1.0). Because the three quantities sum to 100%, it is typically unnecessary to measure all three. Generally either reflectance or transmittance is of particular interest in a given application. Importantly, the interaction of radiant energy with objects is not just a simple spectral phenomenon. The reflectance or transmittance of an object is not just a function of wavelength, but is also a function of the illumination and viewing geometry. Such differences can be illustrated by the phenomenon of gloss. Imagine matte, semigloss, and glossy photographic paper or paint. The various gloss characteristics of these materials can be ascribed to the geometric distribution of the specular reflectance from the objects surface. This is just one geometric appearance effect. Many others exist such as interesting changes in the color of automotive finishes with illumination and viewing geometry (e.g., metallic, pearlescent, and other ‘‘effect’’ coatings). To quantify such effects fully, complete bidirectional reflectance (or transmittance) distribution functions (BRDFs), must be obtained for each possible combination of illumination angle, viewing angle, and wavelength. Measurement of such functions is prohibitively difficult and expensive and produces massive quantities of data that are difficult to use meaningfully. To avoid this explosion of colorimetric data, a small number of standard illumination and viewing geometries have been established for practical colorimetry. The CIE has defined four standard illumination and viewing geometries for spectrophotometric reflectance measurements. These come as two pairs of optically reversible geometries. The first are the diffuse/normal (d/0) and normal/diffuse (0/d) geometries. The designations indicate first the illumination geometry and then the viewing geometry following the slash (/). In the diffuse/normal geometry, the sample is illuminated from all angles by using an integrating sphere and is viewed at an
angle near the normal to the surface. In normal/diffuse geometry, the sample is illuminated from an angle near its normal, and the reflected energy is collected from all angles by using an integrating sphere. These two geometries are optical reverses of one another and therefore produce the same measurement results (assuming that all other instrumental variables are constant). The measurements made are of total reflectance. In many instruments, an area of the integrating sphere that corresponds to the angle of specular (regular) reflection of the illumination in a 0/d geometry or the angle from which specular reflection would be detected in a d/0 geometry can be replaced by a black trap so that the specular component of reflection is excluded and only diffuse reflectance is measured. Such measurements are called ‘‘specular component excluded’’ measurements, as opposed to ‘‘specular component included’’ measurements made when the entire sphere is intact. The second pair of geometries is the 45/normal (45/0) and normal/45 (0/45) measurement configurations. In a 45/0 geometry, the sample is illuminated with one or more beams of light incident at an angle of 45° from the normal, and measurements are made along the normal. In the 0/45 geometry, the sample is illuminated normal to its surface, and measurements are made using one or more beams at a 45° angle to the normal. Again, these two geometries are optical reverses of one another and produce identical results, if all other instrumental variables are equal. Use of the 45/0 and 0/45 measurement geometries ensures that all components of gloss are excluded from the measurements. Thus these geometries are typically used in applications where it is necessary to compare the colors of materials that have various levels of gloss (e.g., graphic arts and photography). It is critical to note the instrumental geometry used whenever reporting colorimetric data for materials. The definition of reflectance as the ratio of reflected energy to incident energy is perfectly appropriate for measurements of total reflectance (d/0 or 0/d). However, for bidirectional reflectance measurements (45/0 and 0/45), the ratio of reflected energy to incident energy is exceedingly small because only a small range of angles of the distribution of reflected energy is detected. Thus, to produce more practically useful values for any type of measurement geometry, reflectance factors are measured relative to a perfect reflecting diffuser. A perfect reflecting diffuser is a theoretical material that is both a perfect reflector (100% reflectance) and perfectly Lambertian (radiance equal in all directions). Thus, reflectance factor is defined as the ratio of the energy reflected by the sample to the energy that would be reflected by a perfect reflecting diffuser (PRD) illuminated and viewed in the identical geometry. For the integrating sphere geometries that measure total reflectance, this definition of reflectance factor is identical to the definition of reflectance. For bidirectional geometries, measurement of reflectance factor relative to the PRD results in a zero-to-one scale similar to that obtained for total reflectance measurements. Because PRDs are not physically available, reference standards that are calibrated relative to the theoretical are provided
by national standardizing laboratories (such as NIST in the United States) and instrument manufacturers. One last topic of importance in the colorimetric analysis of materials is fluorescence. Fluorescent materials absorb energy in a region of wavelengths and then emit this energy in a region of longer wavelengths. For example, a fluorescent orange material might absorb blue energy and emit it as orange energy. Fluorescent materials obey the law of conservation of energy, as stated in Eq. (2). However, their behavior is different in that some of the absorbed energy is normally emitted at longer wavelengths. In general, a fluorescent material is characterized by its total radiance factor, which is the sum of the reflected and emitted energy at each wavelength relative to the energy that would be reflected by a PRD. This definition allows total radiance factors greater than 1.0, which is often the case. It is important to note that the total radiance factor depends on the light source used in the measuring instrument because the amount of emitted energy is directly proportional to the amount of absorbed energy in the excitative wavelengths. Spectrophotometric measurements of reflectance or transmittance of nonfluorescent materials are insensitive to the light source in the instrument because its characteristics are normalized in the ratio calculations. This important difference highlights the major difficulty in measuring fluorescent materials. Unfortunately, many man-made materials (such as paper and inks) are fluorescent and thus are significantly more difficult to measure accurately. THE HUMAN VISUAL RESPONSE Measurement or standardization of light sources and materials provides the necessary physical information for colorimetry. What is needed is a quantitative technique to predict the response of the human visual system, the third aspect of the triangle of color. Following Wyszecki’s (36) definition of basic colorimetry, the quantification of the human visual response focuses on the earliest level of vision, absorption of energy in the cone photoreceptors, through the psychophysics of color matching. The ability to predict when two stimuli match for an average observer, the basis of colorimetry, provides great utility in a variety of applications. Such a system does not specify color appearance, but it provides the basis of color appearance specification and allows predicting matches for various applications and the tools required to set up tolerances for matches necessary for industry. The properties of human color matching are defined by the spectral responsivities of the three cone types because once the energy is absorbed by the three cone types, the spectral origin of the signals is lost (principle of univariance). If the signals from the three cone types are equal for two stimuli, regardless of their spectral power distributions, they must match in color when seen in the same conditions because no further information is introduced within the visual system to distinguish them. Thus, if the spectral responsivities of the three cone types are known, two stimuli, denoted by their spectral power distributions 1 (λ) and 2 (λ), will match in color if the product of their spectral power distributions and
each of the three cone responsivities, L(λ), M(λ), and S(λ), integrated over wavelength, are equal. This equality for a visual match is illustrated in Eqs. (3)–(5). Two stimuli match if all three of the equalities in Eqs. (3)–(5) hold true.
Here, Φ1(λ) and Φ2(λ) denote the spectral power distributions of the two stimuli:

∫ Φ1(λ) L(λ) dλ = ∫ Φ2(λ) L(λ) dλ,   (3)

∫ Φ1(λ) M(λ) dλ = ∫ Φ2(λ) M(λ) dλ,   (4)

and

∫ Φ1(λ) S(λ) dλ = ∫ Φ2(λ) S(λ) dλ.   (5)
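The match criterion of Eqs. (3)–(5) can be turned into a short numerical test. Everything in the sketch below is hypothetical: the spectra are five-sample lists rather than real spectral power distributions, and the cone curves are toy numbers chosen so that the two stimuli come out as an exact metameric pair, but the logic, three integrals compared between two stimuli, is exactly that of the equations.

def integrate(spd, responsivity, d_lambda=10.0):
    """Riemann-sum approximation of the integrals in Eqs. (3)-(5)."""
    return sum(p * r for p, r in zip(spd, responsivity)) * d_lambda

def cone_signals(spd, cones):
    return tuple(integrate(spd, cones[name]) for name in ("L", "M", "S"))

# Toy cone responsivities at five wavelength samples (assumed, not real data).
cones = {
    "L": [0.10, 0.30, 0.50, 0.80, 0.40],
    "M": [0.20, 0.40, 0.60, 0.50, 0.10],
    "S": [0.90, 0.50, 0.10, 0.02, 0.00],
}

# Two stimuli with clearly different spectral power distributions.
spd_1 = [0.50, 0.50, 0.50, 0.50, 0.50]
spd_2 = [0.60, 0.30, 0.60, 0.50, 0.50]

sig_1 = cone_signals(spd_1, cones)
sig_2 = cone_signals(spd_2, cones)

# They are metamers: all three cone integrals agree, so Eqs. (3)-(5) hold
# and the two stimuli would match when viewed in the same conditions.
match = all(abs(a - b) < 1e-6 for a, b in zip(sig_1, sig_2))
print(sig_1, sig_2, match)

Replacing the toy arrays with tabulated cone responsivities and measured spectra turns the same few lines into a practical metamerism check.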
Equations (3)–(5) illustrate the definition of metamerism. Because only the three integrals need be equal for a color match, it is not necessary for the spectral power distributions of the two stimuli to be equal for every wavelength. Metamerism is what allows color reproduction systems to work with just three primaries because it is not necessary to reproduce the spectral power distributions of objects to match their color. Metamerism is also the cause of many difficulties in producing colored materials because two objects might match in color under one light source despite disparate spectral properties. However, when the same two objects are viewed under a different light source, they might no longer match in color. The cone spectral responsivities are quite well known today (5). Using such knowledge, a system of basic colorimetry becomes nearly as simple to define as Eqs. (3)–(5). However, reasonably accurate knowledge of the cone spectral responsivities is a rather recent scientific development. The need for colorimetry predates this knowledge by several decades. Thus, in establishing the 1931 system of colorimetry, the CIE needed to take a less direct approach. To illustrate the indirect nature of the CIE system of colorimetry, it is useful first to explore the system of photometry, which was established in 1924. The aim for a system of photometry was the development of a spectral weighting function that could be used to describe the perception of brightness matches. In 1924, the CIE spectral luminous efficiency function V(λ) was established for photopic vision. This function, plotted in Fig. 4 and enumerated in Table 3, indicates that the visual system is more sensitive (with respect to the perception of brightness) to wavelengths in the middle of the spectrum and becomes less and less sensitive to wavelengths near the extremes of the visual spectrum. The V(λ) function is used as a spectral weighting function to convert radiometric quantities into photometric quantities via spectral integration, as shown in Eq. 6. V =
∫ Φ(λ) V(λ) dλ   (6)
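In practice, Eq. (6) is evaluated as a discrete sum over tabulated values of V(λ) such as those in Table 3. The sketch below uses only a handful of 50-nm samples from that table together with an invented spectral radiance, so the result is purely illustrative; the 683 lm/W factor is the standard constant, noted later in this section, for converting the weighted integral of an absolute radiometric quantity into photometric units.

# Coarse samples of the photopic luminous efficiency function V(lambda)
# taken from Table 3 at 50-nm intervals (400-700 nm).
V = {400: 0.0004, 450: 0.0380, 500: 0.3230, 550: 0.9950,
     600: 0.6310, 650: 0.1070, 700: 0.0041}

# A hypothetical spectral radiance, W/(m^2 sr nm), at the same wavelengths.
radiance = {400: 0.002, 450: 0.004, 500: 0.006, 550: 0.006,
            600: 0.005, 650: 0.003, 700: 0.001}

KM = 683.0       # lm/W, maximum luminous efficacy for photopic vision
D_LAMBDA = 50.0  # nm, spacing of the coarse samples used here

# Discrete version of Eq. (6): luminance in cd/m^2.
luminance = KM * sum(radiance[w] * V[w] * D_LAMBDA for w in V)
print(round(luminance, 1))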
The term V in Eq. (6) refers to the appropriate photometric quantity corresponding to the radiometric
Table 3. CIE Photopic Luminous Efficiency Function V(λ) and Scotopic Luminous Efficiency Function V′(λ)
Table 3. (continued) Wavelength (nm)
V(λ)
V (λ)
Wavelength (nm)
V(λ)
V (λ)
645 650 655
0.1382 0.1070 0.0816
0.0011 0.0007 0.0005
360 365 370 375 380
0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0006
660 665 670 675 680
0.0610 0.0446 0.0320 0.0232 0.0170
0.0003 0.0002 0.0001 0.0001 0.0001
385 390 395 400 405
0.0001 0.0001 0.0002 0.0004 0.0006
0.0014 0.0022 0.0058 0.0093 0.0221
685 690 695 700 705
0.0119 0.0082 0.0057 0.0041 0.0029
0.0000 0.0000 0.0000 0.0000 0.0000
410 415 420 425 430
0.0012 0.0022 0.0040 0.0073 0.0116
0.0348 0.0657 0.0966 0.1482 0.1998
710 715 720 725 730
0.0021 0.0015 0.0010 0.0007 0.0005
0.0000 0.0000 0.0000 0.0000 0.0000
435 440 445 450 455
0.0168 0.0230 0.0298 0.0380 0.0480
0.2640 0.3281 0.3916 0.4550 0.5110
460 465 470 475 480
0.0600 0.0739 0.0910 0.1126 0.1390
0.5670 0.6215 0.6760 0.7345 0.7930
735 740 745 750 755 760
0.0004 0.0002 0.0002 0.0001 0.0001 0.0001
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
485 490 495 500 505
0.1693 0.2080 0.2586 0.3230 0.4073
0.8485 0.9040 0.9430 0.9820 0.9895
510 515 520 525 530
0.5030 0.6082 0.7100 0.7932 0.8620
0.9970 0.9660 0.9350 0.8730 0.8110
535 540 545 550 555
0.9149 0.9540 0.9803 0.9950 1.0000
0.7305 0.6500 0.5655 0.4810 0.4049
560 565 570 575 580
0.9950 0.9786 0.9520 0.9154 0.8700
0.3288 0.2682 0.2076 0.1644 0.1212
585 590 595 600 605
0.8163 0.7570 0.6949 0.6310 0.5668
0.0934 0.0655 0.0494 0.0332 0.0246
610 615 620 625 630 635 640
0.5030 0.4412 0.3810 0.3210 0.2650 0.2170 0.1750
0.0159 0.0117 0.0074 0.0054 0.0033 0.0024 0.0015
quantity (λ) used in the calculation. For example, the radiometric quantities of irradiance, radiance, and reflectance factor are used to derive the photometric quantities of illuminance (lumens/m2 or lux), luminance (cd/m2 ), or luminance factor (unitless). More details on various radiometric and photometric quantities can be found in numerous texts on radiometry (38,39). All of the optical properties and relationships for irradiance and radiance are preserved for illuminance and luminance. To convert irradiance or radiance to illuminance or luminance, a normalization constant of 683 lumens/watt is required to preserve the appropriate units. To calculate the luminance factor, a different type of normalization, described in the next section, is required. The V(λ) function is clearly not one of the cone responsivities. Instead, as suggested by the opponent theory of color vision, the V(λ) function corresponds to a weighted sum of cone responsivity functions. When the cone functions are weighted according to their relative population in the retina and summed, the overall responsivity matches the CIE 1924 V(λ) function. Thus, the photopic luminous response represents a combination of cone signals. This sort of combination is present in the entire system of colorimetry. The use of a spectral weighting function to predict luminance matches is the first step toward a system of colorimetry. There is also a luminous efficiency function for scotopic vision (rods), known as the V (λ) function. This function, used for photometry at very low luminance levels, is plotted in Fig. 4 and presented in Table 3 along with the V(λ) function. Because there is only one type of rod photoreceptor, the V (λ) function corresponds exactly to the spectral responsivity of the rods after transmission
through the ocular media. Figure 4 illustrates the shift in peak spectral sensitivity toward shorter wavelengths during the transition from photopic to scotopic vision. This shift, known as the Purkinje shift, explains why blue objects tend to look lighter than red objects at very low luminance levels. The V (λ) function is applied mathematically in a way similar to the V(λ) function. It has long been recognized in vision research that the V(λ) function might underpredict observed responsivity in the short-wavelength region of the spectrum. To address this issue and standard practice in the vision community, an additional function, the CIE 1988 spectral luminous efficiency function, VM (λ), was established (40).
TRISTIMULUS VALUES AND COLOR-MATCHING FUNCTIONS Following the establishment of the CIE 1924 luminous efficiency function V(λ), attention was turned to the development of a system of colorimetry to specify when two metameric stimuli match in color for an average observer. Because the cone responsivities were unavailable at that time, a system of colorimetry was constructed that was based on the principles of trichromacy and Grassmann's laws of additive color mixture. The concept of this system is that color matches can be specified in terms of the amounts of three additive primaries required to match a stimulus visually. This is illustrated in the equivalence statement of Eq. (7):

C ≡ R(R) + G(G) + B(B)   (7)
R
0
TRISTIMULUS VALUES AND COLOR-MATCHING FUNCTIONS Following the establishment of the CIE 1924 luminous efficiency function V(λ), attention was turned to developing of a system of colorimetry to specify when two metameric stimuli match in color for an average observer. Because the cone responsivities were unavailable at that time, a system of colorimetry was constructed that was based on the principles of trichromacy and Grassmann’s laws of additive color mixture. The concept of this system is that color matches can be specified in terms of the amounts of three additive primaries required to match a stimulus visually. This is illustrated in the equivalence statement of Eq. (7). C ≡ R(R) + G(G) + B(B) (7)
531
500 600 Wavelength (nm)
700
Figure 18. CIE 1931 RGB color-matching functions.
primaries at 435.8 (B), 546.1 (G), and 700.0 nm (R). Spectral tristimulus values for the complete spectrum are also known as color-matching functions, or sometimes color mixture functions. Notice that some of the spectral tristimulus values plotted in Fig. 18 are negative. This implies the addition of a negative amount of power to the match. For example, a negative amount of the R primary is required to match a monochromatic 500-nm stimulus because that wavelength is too saturated to be matched by the particular primaries (i.e., it is out of gamut). Clearly, a negative amount of light cannot be added to a match. Negative tristimulus values are obtained by adding the primary to the monochromatic light to desaturate it and bring it within the gamut of the primaries. Thus, a 500-nm stimulus mixed with a given amount of the R primary is matched by an additive mixture of appropriate amounts of the G and B primaries. The color-matching functions illustrated in Fig. 18 indicate the amounts of the primaries required to match unit amounts of power at each wavelength. By considering the spectral power of any given stimulus as an additive mixture of various amounts of monochromatic stimuli, the tristimulus values for a stimulus can be obtained by multiplying the color-matching functions by the amount of energy in the stimulus at each wavelength (Grassmann’s proportionality) and integrating across the spectrum (Grassmann’s additivity). Thus, the generalized equations for calculating the tristimulus values of a stimulus whose spectral power distribution is (λ), are given by Eqs. (8)–(10), where r(λ), g(λ), and b(λ) are the color matching functions. R=
R = ∫λ Φ(λ) r(λ) dλ,    (8)

G = ∫λ Φ(λ) g(λ) dλ,    (9)

B = ∫λ Φ(λ) b(λ) dλ.    (10)
Having established the utility of tristimulus values and color-matching functions, it remains to derive a set
of color-matching functions that are representative of the population of observers who have normal color vision. Color-matching functions for individual observers, all of whom have normal color vision, can be significantly different due to variations in lens transmittance, macula transmittance, cone density and population, photopigment optical density, and photopigment spectral absorption spectra. Thus, to establish a standardized system of colorimetry, it is necessary to obtain a reliable estimate of the average color-matching functions of the population of observers whose color vision is normal. In the late 1920s, two sets of experiments were completed to estimate average color-matching functions. These experiments differed in that Wright (41) used monochromatic primaries and Guild (42) used broadband primaries. The primaries from one experiment can be specified in terms of tristimulus values to match them using the other system, so it is possible to derive a linear transform (3 × 3 matrix transformation) to convert tristimulus values from one set of primaries to another. This transformation also applies to the color-matching functions because they are themselves tristimulus values. Thus, a transformation was derived to place the data from Wright’s and Guild’s experiments into a common set of primaries. When this was done, the agreement between the two experiments was extremely good and verified the underlying theoretical assumptions in the derivation and the use of color-matching functions. Given this agreement, the CIE decided to establish a standard set of color-matching functions based on the mean results of the Wright and Guild experiments. These mean functions were transformed to RGB primaries of 700.0, 546.1, and 435.8 nm, respectively, and are illustrated in Fig. 18. In addition, the CIE decided to transform to yet another set of primaries, the XYZ primaries. The main objectives in performing this transformation were to eliminate the negative values in the color-matching functions and to have one of the color-matching functions equal the CIE 1924 photopic luminous efficiency function V(λ). The negative values were removed by selecting primaries that could be used to match all physically realizable color stimuli. This can be accomplished only by using imaginary primaries that are more saturated than monochromatic lights. This is a straightforward mathematical construct, and it should be noted that, although the primaries are imaginary, the color-matching functions derived for those primaries are based on very real color-matching results and the validity of Grassmann’s laws. Forcing one of the color-matching functions to equal the V(λ) function serves the purpose of incorporating the CIE system of photometry (established in 1924) into the CIE system of colorimetry (established in 1931). This is accomplished by choosing two of the imaginary primaries, X and Z, so that they produce no luminance response, leaving all of the luminance response in the third primary Y. The colormatching functions for the XYZ primaries, x(λ), y(λ), and z(λ), respectively, known as the color-matching functions of the CIE 1931 standard colorimetric observer, are plotted in Fig. 19 and are listed in Table 4 for the practical wavelength range of 360–760 nm in 5-nm increments. The CIE defines color-matching functions from 360–830 nm in
1-nm increments that have more digits after the decimal point (35). XYZ tristimulus values for colored stimuli are calculated in the same fashion as the RGB tristimulus values described before. The general equations are given in Eqs. (11)–(13), where Φ(λ) is the spectral power distribution of the stimulus, x(λ), y(λ), and z(λ) are the color-matching functions, and k is a normalizing constant:

X = k ∫λ Φ(λ) x(λ) dλ,    (11)

Y = k ∫λ Φ(λ) y(λ) dλ,    (12)

Z = k ∫λ Φ(λ) z(λ) dλ.    (13)
The spectral power distribution of the stimulus is defined in different ways for various types of stimuli. For self-luminous stimuli (e.g., light sources and CRT displays), Φ(λ) is typically spectral radiance or a relative spectral power distribution. For reflective materials, Φ(λ) is defined as the product of the spectral reflectance factor R(λ) of the material and the relative spectral power distribution of the light source or illuminant of interest S(λ), that is, R(λ)S(λ). For transmitting materials, Φ(λ) is defined as the product of the spectral transmittance of the material T(λ) and the relative spectral power distribution of the light source or illuminant of interest S(λ), that is, T(λ)S(λ). The normalization constant k in Eqs. (11)–(13) is defined differently for relative and absolute colorimetry. In absolute colorimetry, k is set equal to 683 lumens/watt to make the system of colorimetry compatible with the system of photometry. For relative colorimetry, k is defined by Eq. (14):

k = 100 / ∫λ S(λ) y(λ) dλ.    (14)

The normalization for relative colorimetry in Eq. (14) results in tristimulus values that are scaled from zero to approximately 100 for various materials.
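As a concrete illustration, the sketch below (Python/NumPy; the function name, argument names, and sampling grid are illustrative, and the color-matching functions must be supplied from Table 4 or the official CIE tables) computes relative-colorimetry XYZ values for a reflecting sample:

```python
import numpy as np

def xyz_from_reflectance(wavelengths_nm, reflectance, illuminant, xbar, ybar, zbar):
    """Relative-colorimetry XYZ of a reflecting sample, after Eqs. (11)-(14).

    All inputs are 1-D arrays sampled at the same wavelengths (e.g., 380-760 nm
    in 5-nm steps). 'reflectance' is the spectral reflectance factor R(lambda),
    'illuminant' the relative spectral power S(lambda), and xbar/ybar/zbar the
    CIE color-matching functions.
    """
    phi = reflectance * illuminant                                  # R(lambda) * S(lambda)
    k = 100.0 / np.trapz(illuminant * ybar, wavelengths_nm)         # Eq. (14)
    X = k * np.trapz(phi * xbar, wavelengths_nm)                    # Eq. (11)
    Y = k * np.trapz(phi * ybar, wavelengths_nm)                    # Eq. (12)
    Z = k * np.trapz(phi * zbar, wavelengths_nm)                    # Eq. (13)
    return X, Y, Z
```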
Figure 19. CIE 1931 XYZ color-matching functions.
Table 4. Color-Matching Functions for the CIE 1931 Standard Colorimetric Observer (2°)

Wavelength (nm)   x(λ)     y(λ)     z(λ)
360               0.0000   0.0000   0.0000
365               0.0000   0.0000   0.0000
370               0.0000   0.0000   0.0000
375               0.0000   0.0000   0.0000
380               0.0014   0.0000   0.0065
385               0.0022   0.0001   0.0105
390               0.0042   0.0001   0.0201
395               0.0077   0.0002   0.0362
400               0.0143   0.0004   0.0679
405               0.0232   0.0006   0.1102
410               0.0435   0.0012   0.2074
415               0.0776   0.0022   0.3713
420               0.1344   0.0040   0.6456
425               0.2148   0.0073   1.0391
430               0.2839   0.0116   1.3856
435               0.3285   0.0168   1.6230
440               0.3483   0.0230   1.7471
445               0.3481   0.0298   1.7826
450               0.3362   0.0380   1.7721
455               0.3187   0.0480   1.7441
460               0.2908   0.0600   1.6692
465               0.2511   0.0739   1.5281
470               0.1954   0.0910   1.2876
475               0.1421   0.1126   1.0419
480               0.0956   0.1390   0.8130
485               0.0580   0.1693   0.6162
490               0.0320   0.2080   0.4652
495               0.0147   0.2586   0.3533
500               0.0049   0.3230   0.2720
505               0.0024   0.4073   0.2123
510               0.0093   0.5030   0.1582
515               0.0291   0.6082   0.1117
520               0.0633   0.7100   0.0782
525               0.1096   0.7932   0.0573
530               0.1655   0.8620   0.0422
535               0.2257   0.9149   0.0298
540               0.2904   0.9540   0.0203
545               0.3597   0.9803   0.0134
550               0.4334   0.9950   0.0087
555               0.5121   1.0000   0.0057
560               0.5945   0.9950   0.0039
565               0.6784   0.9786   0.0027
570               0.7621   0.9520   0.0021
575               0.8425   0.9154   0.0018
580               0.9163   0.8700   0.0017
585               0.9786   0.8163   0.0014
590               1.0263   0.7570   0.0011
595               1.0567   0.6949   0.0010
600               1.0622   0.6310   0.0008
605               1.0456   0.5668   0.0006
610               1.0026   0.5030   0.0003
615               0.9384   0.4412   0.0002
620               0.8544   0.3810   0.0002
625               0.7514   0.3210   0.0001
630               0.6424   0.2650   0.0000
635               0.5419   0.2170   0.0000
640               0.4479   0.1750   0.0000
645               0.3608   0.1382   0.0000
650               0.2835   0.1070   0.0000
655               0.2187   0.0816   0.0000
660               0.1649   0.0610   0.0000
665               0.1212   0.0446   0.0000
670               0.0874   0.0320   0.0000
675               0.0636   0.0232   0.0000
680               0.0468   0.0170   0.0000
685               0.0329   0.0119   0.0000
690               0.0227   0.0082   0.0000
695               0.0158   0.0057   0.0000
700               0.0114   0.0041   0.0000
705               0.0081   0.0029   0.0000
710               0.0058   0.0021   0.0000
715               0.0041   0.0015   0.0000
720               0.0029   0.0010   0.0000
725               0.0020   0.0007   0.0000
730               0.0014   0.0005   0.0000
735               0.0010   0.0004   0.0000
740               0.0007   0.0002   0.0000
745               0.0005   0.0002   0.0000
750               0.0003   0.0001   0.0000
755               0.0002   0.0001   0.0000
760               0.0002   0.0001   0.0000
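The 5-nm tabulation above is often resampled to match a measurement grid. A minimal sketch, assuming the table has been loaded into NumPy arrays (only the wavelength grid is written out; the x(λ), y(λ), and z(λ) arrays would hold the 81 values listed above), is shown below. For standards work, the official 1-nm CIE tables should be used instead (35).

```python
import numpy as np

# Wavelength grid of Table 4 (360-760 nm in 5-nm steps).
wl_5nm = np.arange(360, 765, 5)
# xbar_5nm = np.array([...])   # 81 values from the x(lambda) column
# ybar_5nm = np.array([...])   # 81 values from the y(lambda) column
# zbar_5nm = np.array([...])   # 81 values from the z(lambda) column

def resample_cmf(wl_out, wl_in, cmf_in):
    """Linearly interpolate a color-matching function onto another wavelength grid."""
    return np.interp(wl_out, wl_in, cmf_in)

# Example: resample to a 1-nm grid.
# xbar_1nm = resample_cmf(np.arange(360, 761, 1), wl_5nm, xbar_5nm)
```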
It is useful to note that if relative colorimetry is used to calculate the tristimulus values of a light source, the Y tristimulus value is always equal to 100. The term relative colorimetry is also used, improperly, in the graphic arts and other color reproduction industries. In some cases, tristimulus values are normalized to the paper white rather than to a perfect reflecting diffuser. This results in a Y tristimulus value of 100 for the paper, rather than its more typical value of about 85. The advantage of this is that it allows transformation between different paper types and preserves the paper white as the lightest color in an image, without having to keep track of the paper color. Such a practice might be useful, but it is actually a gamut mapping issue rather than a color measurement issue. A more appropriate terminology for this practice might be normalized colorimetry (or perhaps media-relative colorimetry, which has been suggested) to avoid confusion with the long-established practice of relative colorimetry. It is also worth noting that the practice of normalized colorimetry is not always consistent. In some cases, reflectance is measured relative to the paper white. This ensures that the Y tristimulus value is normalized between zero for a perfect black and 1.0 (or 100.0) for the paper white; however, the X and Z tristimulus values might still range above 1.0 (or 100.0) depending on the particular illuminant used in the colorimetric calculations. Another approach is to normalize the tristimulus values for each stimulus color, XYZ, by the tristimulus values of the paper, XpYpZp, individually (X/Xp, Y/Yp, and Z/Zp). This is analogous to the white-point normalization in CIELAB and is often used to adjust for white-point changes and a limited dynamic range in imaging systems.
It is important to know with which type of normalized colorimetry one is dealing in various applications. The relationship between CIE XYZ tristimulus values and cone responses (sometimes referred to as fundamental tristimulus values) is of great importance and interest in advanced colorimetry. Like the V(λ) function, the CIE XYZ color-matching functions each represent a linear combination of cone responsivities. Thus, the relationship between the two is defined by a 3 × 3 linear matrix transformation. Cone spectral responsivities can be thought of as the color-matching functions for a set of primaries that are constructed so that each primary stimulates only one cone type. It is possible to isolate the S cones with real primaries by choosing two of the three so that the S cones are not excited. However, no real primaries can be produced that stimulate only the M or L cones because their spectral responsivities overlap across the visible spectrum. Thus, the required primaries are also imaginary and produce all-positive color-matching functions but do not incorporate the V(λ) function as a color-matching function (because that requires a primary that stimulates all three cone types). The historical development and recent progress of colorimetric systems based on physiological cone responsivities have been reviewed by Boynton (43). It is important to be aware that two sets of color-matching functions have been established by the CIE. The CIE 1931 standard colorimetric observer was determined from experiments using a visual field that subtended 2°. Thus, the matching stimuli were imaged onto the retina completely within the fovea. Often they are referred to as the 2° color-matching functions or the 2° observer. It is of historical interest to note that the 1931 standard colorimetric observer is based on data collected from fewer than 20 observers. In the 1950s, experiments were completed to collect 2° color-matching functions for more observers using more precise and accurate instrumentation (44). The results showed slight systematic discrepancies, but not of sufficient magnitude to warrant a change in the standard colorimetric observer. At the same time, experiments were completed to collect color-matching function data for large visual fields. This was prompted by observed discrepancies between colorimetric and visual determination of the whiteness of paper that suggested that the 2° color-matching functions did not properly incorporate short-wavelength response. These experiments were completed using a 10° visual field that excluded the central fovea either by physically blocking it or by asking observers to ignore it. Thus, the color-matching functions include no influence of macular absorption. The results for large fields were deemed sufficiently different from the 2° standard to warrant establishing the CIE 1964 supplementary standard colorimetric observer, sometimes called the 10° observer. The difference between the two standard observers is significant, so care should be taken to report which observer is used for any colorimetric data. The differences are computationally significant but are certainly within the variability of color-matching functions found for either 2° or 10° visual fields. Thus, the two standard colorimetric observers can
be thought of as representing the color-matching functions of two individuals. For practical applications, it is often recommended that the 1931 (2°) observer be used for visual fields of less than a 4° subtended angle and that the 1964 (10°) observer be used for larger fields.

CHROMATICITY DIAGRAMS

The color of a stimulus can be specified by a triplet of tristimulus values. Chromaticity diagrams were developed to provide a convenient two-dimensional representation of colors. The transformation from tristimulus values to chromaticity coordinates is accomplished through a normalization that removes luminance information. This transformation is a one-point perspective projection of data points in the three-dimensional tristimulus space onto the unit plane of that space (whose center of projection is at the origin), as defined by Eqs. (15)–(17):

x = X / (X + Y + Z),    (15)

y = Y / (X + Y + Z),    (16)

z = Z / (X + Y + Z).    (17)

Because there are only two dimensions of information in chromaticity coordinates, the third chromaticity coordinate can always be obtained from the other two by noting that the three always sum to unity. Thus z can be calculated from x and y using Eq. (18):

z = 1.0 − x − y.    (18)

Chromaticity coordinates should be used with great care because they attempt to represent a three-dimensional phenomenon by using just two variables. To specify a colored stimulus fully, one of the tristimulus values must be reported in addition to two (or three) chromaticity coordinates. Usually the Y tristimulus value is reported because it represents the luminance information. The equations for obtaining the other two tristimulus values from chromaticity coordinates and the Y tristimulus value are often useful and therefore are given in Eqs. (19) and (20):

X = xY / y,    (19)

Z = (1.0 − x − y) Y / y.    (20)
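A small sketch of these projections (Python; the function names are illustrative) makes the bookkeeping explicit. Both functions assume the denominators are nonzero, which fails only for a stimulus with no power:

```python
def xyz_to_xy(X, Y, Z):
    """Project XYZ tristimulus values onto the chromaticity plane, after Eqs. (15)-(16)."""
    s = X + Y + Z
    return X / s, Y / s

def xyY_to_xyz(x, y, Y):
    """Recover X and Z from chromaticity coordinates plus luminance Y, after Eqs. (19)-(20)."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z
```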
Chromaticity coordinates, alone, provide no information about the color appearance of stimuli because they include no luminance (or therefore lightness) information and do not account for chromatic adaptation. As an observer’s state of adaptation changes, the color appearance that corresponds to a given set of chromaticity coordinates can change dramatically (e.g., a change from
yellow to blue could occur with a change from daylight to incandescent-light adaptation). Much effort has been expended to make chromaticity diagrams more perceptually uniform. This is an impossible task, but it is worth mentioning one of the results, which is actually the chromaticity diagram currently recommended by the CIE for general use, the CIE 1976 Uniform Chromaticity Scales (UCS) diagram, defined by Eqs. (21) and (22):

u′ = 4X / (X + 15Y + 3Z),    (21)

v′ = 9Y / (X + 15Y + 3Z).    (22)
The use of chromaticity diagrams should be avoided in most circumstances, particularly when the phenomena being investigated are highly dependent on the three-dimensional nature of color. For example, the display and comparison of the color gamuts of imaging devices in chromaticity diagrams is misleading to the point of being almost completely erroneous.

CIE COLOR SPACES

The advent of the CIE color spaces, CIELAB and CIELUV, has made the general use of chromaticity diagrams largely obsolete. These spaces extend tristimulus colorimetry to three-dimensional spaces whose dimensions correlate approximately with the perceived lightness, chroma, and hue of a stimulus. This is accomplished by incorporating features to account for chromatic adaptation and nonlinear visual responses. The main aim in developing these spaces was to provide uniform practices for the measurement of color differences, something that cannot be done reliably in tristimulus or chromaticity spaces. In 1976, two spaces were recommended for use because there was no clear evidence to support one over the other at that time. Wyszecki (37) provides an overview of the development of the CIE color spaces. Of the two, CIELUV is rarely used any more, so it will be mentioned only briefly. The CIE 1976 (L∗u∗v∗) color space, abbreviated CIELUV, is defined by Eqs. (23)–(27). Equation (23) is restricted to tristimulus values normalized to white that are greater than 0.008856; see the CIELAB description later for the full definition of L∗ given in Eq. (28).

L∗ = 116(Y/Yn)1/3 − 16,    (23)

u∗ = 13L∗(u′ − u′n),    (24)

v∗ = 13L∗(v′ − v′n),    (25)

C∗uv = (u∗2 + v∗2)1/2,    (26)

huv = tan−1(v∗/u∗).    (27)
In these equations, u′ and v′ are the chromaticity coordinates of the stimulus, and u′n and v′n are the chromaticity coordinates of the reference white. L∗ represents lightness, u∗ redness–greenness, v∗ yellowness–blueness, C∗uv
chroma, and huv hue. As in CIELAB, the L∗, u∗, and v∗ coordinates are used to construct a Cartesian color space, and the L∗, C∗uv, and huv coordinates are the cylindrical representation of the same space. In 1976, the CIELAB and CIELUV spaces were both recommended as interim solutions to the problem of color difference specification of reflecting samples. Since that time, CIELAB has become almost universally used for color specification and particularly color difference measurement. Now, there is no technical reason to use CIELUV over CIELAB.

CIELAB was developed as a color space for specifying color differences. In the early 1970s, there were as many as 20 different formulas used to calculate color differences. To promote uniformity of practice pending the development of a better formula, in 1976, the CIE recommended two color spaces, CIELAB and CIELUV (45,46). Two spaces were recommended at the time because there was not sufficient experimental evidence to suggest a preference of one over the other. The Euclidean distance between two points in these spaces is taken as a measure of their color difference (ΔE∗ab or ΔE∗uv). Note that, in 1994, the CIE recommended a single better formula for color difference measurement, based on the CIELAB space. This is known as ΔE∗94 (47,48), and further enhancement is expected to result in a new color difference equation, tentatively named CIE2000, to be published in the near future.

To calculate CIELAB coordinates, one must begin with two sets of CIE XYZ tristimulus values, those of the stimulus, XYZ, and those of the reference white, XnYnZn. These data are used in a modified form of the von Kries chromatic adaptation transform by normalizing the stimulus tristimulus values by those of the white (i.e., X/Xn, Y/Yn, and Z/Zn). Note that the CIE tristimulus values are not first transformed to cone responses as would be necessary for a true von Kries adaptation model. These adapted signals are then subject to a compressive nonlinearity represented by a cube root in the CIELAB equations. This nonlinearity is designed to model the compressive response typically found between physical energy and perceptual measurements (49). Then, these signals are combined into three response dimensions that correspond to the light–dark, red–green, and yellow–blue responses of the opponent theory of color vision. Finally, appropriate multiplicative constants are incorporated into the equations to provide the required uniform perceptual spacing and proper relationship among the three dimensions. The full CIELAB equations are given in Eqs. (28)–(31):

L∗ = 116 f(Y/Yn) − 16,    (28)

a∗ = 500 [f(X/Xn) − f(Y/Yn)],    (29)

b∗ = 200 [f(Y/Yn) − f(Z/Zn)],    (30)

f(ω) = ω1/3 for ω > 0.008856;  f(ω) = 7.787ω + 16/116 for ω ≤ 0.008856.    (31)
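A compact sketch of Eqs. (28)–(31) follows (Python/NumPy; the function name is illustrative, and the stimulus and white-point tristimulus values are assumed to be on the same scale):

```python
import numpy as np

def xyz_to_lab(X, Y, Z, Xn, Yn, Zn):
    """CIELAB coordinates from stimulus XYZ and reference white XnYnZn, after Eqs. (28)-(31)."""
    def f(w):
        # Cube root above 0.008856, linear segment below (the alternative form
        # for low tristimulus values discussed in the text that follows).
        return np.where(w > 0.008856, np.cbrt(w), 7.787 * w + 16.0 / 116.0)

    fx, fy, fz = f(X / Xn), f(Y / Yn), f(Z / Zn)
    L = 116.0 * fy - 16.0
    a = 500.0 * (fx - fy)
    b = 200.0 * (fy - fz)
    return L, a, b
```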
The alternative forms for low tristimulus values were introduced by Pauli (50) to overcome limitations in the original CIELAB equations that restricted their application to values of X/Xn , Y/Yn , and Z/Zn greater than
0.01. Such low values are usually encountered in colored materials that have glossy surfaces, and sometimes are found in flare-free specifications of imaging systems. It is critical to use the full set of equations given before when low values might be encountered. The L∗ measure given in Eq. (28) is a correlate of perceived lightness and ranges from 0.0 for black to 100.0 for a perfect diffuse white (the perfect reflecting diffuser, PRD); L∗ can sometimes exceed 100.0 for stimuli such as specular highlights in images. The a∗ and b∗ dimensions correlate approximately with red–green and yellow–blue chroma perceptions. They take on both negative and positive values. Both a∗ and b∗ have values of 0.0 for achromatic stimuli (i.e., white, gray, black). Their maximum values are limited by the physical properties of materials rather than the equations themselves.

The CIELAB L∗, a∗, and b∗ dimensions are combined as Cartesian coordinates to form a three-dimensional color space, as illustrated in Fig. 20. This color space can also be represented in cylindrical coordinates, as shown in Fig. 21. The cylindrical coordinate system provides predictors of chroma C∗ab and hue hab (hue angle in degrees), as expressed in Eqs. (32) and (33):

C∗ab = (a∗2 + b∗2)1/2,    (32)

hab = tan−1(b∗/a∗),    (33)

where C∗ab has the same units as a∗ and b∗. Achromatic stimuli have C∗ab values of 0.0 (i.e., no chroma). Hue angle hab is expressed in positive degrees that start from 0° at the positive a∗ axis and progress counterclockwise. Thus, the CIELAB space takes the XYZ tristimulus values of a stimulus and the reference white as input and produces correlates to lightness L∗, chroma C∗ab, and hue hab as output. Thus, CIELAB is a very simple form of a color appearance model.

Figure 20. Illustration of the meaning of the rectangular coordinates of the CIELAB color space (L∗ runs from dark to light, a∗ from greenish to reddish, and b∗ from bluish to yellowish).

Figure 21. Illustration of the meaning of the cylindrical coordinates of the CIELAB color space (lightness L∗, chroma C∗, and hue angle hab).

COLOR DIFFERENCE SPECIFICATION

Color differences are measured in the CIELAB space as the Euclidean distance between the coordinates for the two stimuli. This is expressed in terms of a CIELAB ΔE∗ab, which can be calculated from Eq. (34). It can also be expressed in terms of lightness, chroma, and hue differences, as illustrated in Eq. (35), by using a combination of Eqs. (34) and (36):

ΔE∗ab = (ΔL∗2 + Δa∗2 + Δb∗2)1/2,    (34)

ΔE∗ab = (ΔL∗2 + ΔC∗ab2 + ΔH∗ab2)1/2,    (35)

ΔH∗ab = (ΔE∗ab2 − ΔL∗2 − ΔC∗ab2)1/2.    (36)

The CIELAB color space was designed to make color differences perceptually uniform throughout the space (i.e., a ΔE∗ab of 1.0 for a pair of red stimuli is perceived as equal in magnitude to a ΔE∗ab of 1.0 for a pair of gray stimuli), but this goal was not strictly achieved. To improve the uniformity of color difference measurements, the CIELAB ΔE∗ab equation has been modified based on various empirical data. One of the most widely used modifications is the CMC color difference equation (51), which is based on a visual experiment on color difference perception in textiles. The CIE (48) recently evaluated such equations and the available visual data and recommended a new color difference equation for industrial use. This system for color difference measurement is called the CIE 1994 (ΔL∗ ΔC∗ab ΔH∗ab) color-difference model, whose symbol is ΔE∗94 and abbreviation is CIE94. Use of the CIE94 equation is preferred to a simple CIELAB ΔE∗ab. CIE94 color differences are calculated from Eqs. (37)–(40):

ΔE∗94 = [(ΔL∗/kLSL)2 + (ΔC∗ab/kCSC)2 + (ΔH∗ab/kHSH)2]1/2,    (37)

where SL = 1,    (38)

SC = 1 + 0.045C∗ab,    (39)

and SH = 1 + 0.015C∗ab.    (40)
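The following sketch implements Eqs. (34)–(40) (Python/NumPy; the function names are illustrative). The weights SC and SH are computed here from the chroma of the first sample, which is one common convention when neither sample is designated the standard:

```python
import numpy as np

def delta_e_ab(lab1, lab2):
    """CIELAB color difference, after Eq. (34)."""
    return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))

def delta_e_94(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIE94 color difference, after Eqs. (37)-(40)."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    dL = L1 - L2
    C1 = (a1**2 + b1**2) ** 0.5
    C2 = (a2**2 + b2**2) ** 0.5
    dC = C1 - C2
    dE_ab = delta_e_ab(lab1, lab2)
    # Hue difference squared from Eq. (36); clamp tiny negatives caused by rounding.
    dH2 = max(dE_ab**2 - dL**2 - dC**2, 0.0)
    SL, SC, SH = 1.0, 1.0 + 0.045 * C1, 1.0 + 0.015 * C1
    return ((dL / (kL * SL))**2 + (dC / (kC * SC))**2 + dH2 / (kH * SH)**2) ** 0.5
```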
The parametric factors kL, kC, and kH are used to adjust the relative weighting of the lightness, chroma, and hue components, respectively, of color difference for various viewing conditions and applications that depart from the CIE94 reference conditions. It is also worth noting that, when averaged across the color space, CIE94 color differences are significantly smaller than CIELAB color differences for the same stimulus pairs. Thus, use of CIE94 color differences to report overall performance in applications such as the colorimetric characterization of imaging devices suggests significantly improved performance if the numbers are mistakenly considered equivalent to CIELAB color differences. The same is true for CMC color differences. The CIE (48) established the following set of reference conditions for using the CIE94 color difference equations:

Illumination: CIE Illuminant D65 simulator
Illuminance: 1,000 lux
Observer: Normal color vision
Background: Uniform, achromatic, L∗ = 50
Viewing mode: Object
Sample size: Greater than a 4° visual angle
Sample separation: Direct edge contact
Sample color-difference magnitude: 0–5 CIELAB units
Sample structure: No visually apparent pattern or nonuniformity

The tight specification of reference conditions for the CIE94 equations highlights the vast amount of research that remains to generate a universally useful process for color difference specification. These issues are some of the same problems that must be tackled in developing color-appearance models. Also note that for sample sizes greater than 4°, use of the CIE 1964 supplementary standard colorimetric observer is recommended. Once again, note that a revised color difference formula, tentatively called CIE2000, is likely to be published by the CIE in late 2001. Its use is recommended over either ΔE∗ab or ΔE∗94.

COLOR-APPEARANCE MODELS

This article reviewed the fundamentals of basic colorimetry and began touching on advanced colorimetry. These techniques are well established and have been successfully used for decades, but much more information must be added to the triangle of color to extend basic colorimetry toward the specification of the color appearance of stimuli under a wide variety of viewing conditions. Some of the additional information that must be considered includes chromatic adaptation, light adaptation, luminance level, background color, and surround color. To describe the actual color appearance of stimuli, predictors of appearance parameters must be derived.
These parameters are the absolute color-appearance attributes of brightness, colorfulness, and hue and the relative color-appearance attributes of lightness, chroma, saturation, and, again, hue. (Hue retains a unique position as a color-appearance attribute that is really neither absolute nor relative, but important in both types of viewing and readily apparent in both related and unrelated colors. See Fairchild (6) for further details on this topic.) These terms are used to describe the colorappearance of stimuli independent of viewing condition. Color-appearance models aim to make a mathematical link between the basic colorimetery of light sources, materials, and the viewing environment and the apparent color of a given stimulus in that environment, as described by its color-appearance attributes. Given the definition mentioned, some fairly simple uniform color spaces such as the CIE 1976 L∗ a∗ b∗ color space (CIELAB) and the CIE 1976 L∗ u∗ v∗ color space (CIELUV) can be considered color-appearance models. These color spaces include simple chromatic adaptation transforms and predictors of lightness, chroma, and hue. Some general concepts that apply to the construction of all color-appearance models are described in this section. All color-appearance models for practical applications begin with specifying the stimulus and viewing conditions in terms of CIE XYZ tristimulus values (along with certain absolute luminances for some models). The first process applied to these data is generally a linear transformation from XYZ tristimulus values to cone responses to model the physiological processes in the human visual system more accurately. The importance of beginning with CIE tristimulus values is a matter of practicality. A great deal of color measurement instrumentation is available for measuring stimuli quickly and accurately in CIE tristimulus values. The CIE system is also a wellestablished, international standard for color specification and communication. Given tristimulus values for the stimulus, other data regarding the viewing environment must also be considered to predict color appearance. As a minimum, the tristimulus values of the adapting stimulus (usually taken as the light source) are also required. Additional data that might be used include the absolute luminance level, colorimetric data on the background and surround, and perhaps other spatial or temporal information. Given some or all of these data, the first step in a color-appearance model is generally a chromatic adaptation transform. Then the postadaptation signals are combined into higher level signals, usually modeled after the opponent theory of color vision and including threshold and/or compressive nonlinearities. These signals are combined in various ways to produce predictors of the various appearance attributes. Data on the adapting stimulus, background, and surround, are incorporated into the model at the chromatic adaptation stage and at later stages, as necessary. This general process can be witnessed within all of the color-appearance models that have been proposed (6). However, each model has been derived by a different approach, and various of the aspects mentioned before are
stressed to a greater or lesser degree. A number of colorappearance models have been proposed and described in the color science literature. These are described in detail by Fairchild (6). Color-appearance models are still an active area of research, and the CIE has endeavored to provide some guidance to those interested in implementing colorappearance models when some form of standardization is required. To address this need, in 1997, the CIE formulated an interim color-appearance model called CIECAM97s. CIECAM97s represents a combination of many of the best features of previously published colorappearance models to provide some uniformity of practice in the field. CIECAM97s was never intended to be the final word on the subject (hence the inclusion of the year in its name), and it is a work in progress. It can be reasonably expected that the CIE will publish improvements to the model on a fairly regular basis during the coming years. Because of the interim nature of CIECAM97s, its formulation is not included in this article. Readers are encouraged to see Fairchild (6) or Hunt (3) for overviews of CIECAM97s. The formal CIE definition of the model can be found in CIE Publication 131 (52). Readers should also look for more recent formulations of the model. REVIEW This article has provided a brief overview of human color vision and accepted techniques of color measurement, CIE colorimetry. It is clear that both our understanding of color vision and our techniques to measure and specify color-appearance are still advancing at a rapid rate. This chapter focuses on the aspects of color science that are likely to remain fixed for the foreseeable future and are necessary foundations for moving forward to develop and use imaging systems and research in color science. The future of this evolving field promises to be fascinating. BIBLIOGRAPHY 1. G. Wyszecki and W. S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, John Wiley & Sons, Inc., New York, NY, 1982. 2. R. W. G. Hunt, The Reproduction of Colour, 5e, Fountain Press, England,1995. 3. R. W. G. Hunt, Measuring Colour, 3e, Fountain Press, England, 1998. 4. P. K. Kaiser and R. M. Boynton, Human Color Vision, 2e, Optical Society of America, Washington, 1996. 5. K. R. Gegenfurtner and L. T. Sharpe, eds., Color Vision: From Genes to Perception, Cambridge, 1999. 6. M. D. Fairchild, Color Appearance Models, Addison-Wesley, Reading, PA, 1998. 7. C. J. Bartleson and F. Grum, Optical Radiation Measurements, vol. 5, Visual Measurements, Academic Press, Orlando, 1984. 8. F. W. Newll, Ophthalmology: Principles and Concepts, 6e, Mosby, St. Louis, 1986. 9. J. G. Nicholls, A. R. Martin, and B. G. Wallace, From Neuron to Brain, 3e, Sinauer, Sunderland, 1992.
10. T. N. Cornsweet, Visual Perception, Academic Press, NY, 1970. 11. B. A. Wandell, Foundations of Vision, Sinauer, Sunderland, 1995. 12. L. Spillmann and J. S. Werner, Visual Perception: The Neurophysiological Foundations, Academic Press, San Diego, 1990. 13. P. Lennie and M. D’Zmura, CRC Crit. Rev. Neurobiology 3, 333–400 (1988). 14. D. H. Hubel and T. N. Wiesel, J. Physiol. 195, 215–243 (1968). 15. R. S. Turner, in The Eye’s Mind: Vision and the Helmholts–Hering Controversy, Princeton University Press, Princeton, 1994. 16. G. Svaetichin, Acta Physiologica Scandinavia 39, 17–46 (1956). 17. R. L. DeValois, C. J. Smith, S. T. Kitai, and S. J. Karoly, Science 127, 238–239 (1958). 18. D. Jameson and L. M. Hurvich, Ann. Rev. Psychol. 40, 1–22 (1955). 19. L. M. Hurvich, Color Vision, Sinauer, Sunderland, 1981. 20. J. von Kries, Festschrift der Albrecht-Ludwig-Universitat, ¨ Fribourg, 1902; [Translation: D. L. MacAdam, Sources of Color Science, MIT Press, Cambridge, 1970.] [See also D. L. MacAdam, Selected Papers on Colorimetry — Fundamentals. SPIE, Bellingham, 1993.] 21. R. W. G. Hunt, I. T. Pitt, and L. M. Winter, J. Photogr. Sci. 22, 144–150 (1974). 22. J. Davidoff, Cognition Through Color, MIT Press, Cambridge, 1992. 23. OSA, The Science of Color, OSA, Washington, 1963. 24. X. Zhang and B. A. Wandell, SID 96, San Diego, CA, 1996. 25. S. N. Pattanaik, J. A. Ferwerda, M. D. Fairchild, and D. P. Greenberg, SIGGRAPH 98, Orlando, 1998, pp. 287–298. 26. S. N. Pattanaik, M. D. Fairchild, J. A. Ferwerda, and D. P. Greenberg, IS& T/SID 6th Color Imaging Conf., Scottsdale, NY, 1998. pp. 2–7. 27. A. Chaparro, C. F. Stromeyer, E. P. Huang, R. E. Kronauer, and R. T. Eskew, Nature 361, 348–350 1993. 28. D. H. Kelly, ed., Visual Science and Engineering: Models and Applications, Marcel Dekker, NY, 1994. 29. D. M. Hunt, S. D. Kanwaljit, J. K. Bowmaker, and J. D. Mollon, Science 267, 984–988 (1995). 30. W. D. Wright, AIC Color 81, Berlin, 1981. 31. M. D. Fairchild, Color Res. Appl. 18, 129–134 (1993). 32. R. S. Berns, Billmeyer and Saltzmann’s Principles of Color Technology, 3e, John Wiley & Sons, Inc., New Tork, NY, 2000. 33. A. Berger-Schunn, Practical Color Measurement, John Wiley & Sons, Inc., New York, NY, 1994. 34. R. S. Hunter and R. W. Harold, The Measurement of Appearance, 2e, John Wiley & Sons, Inc., New York, NY, 1987. 35. CIE, Colorimetry, Vienna, 1986. 36. G. Wyszecki, AIC Color 73, York, 1973, pp. 21–51. 37. G. Wyszecki, in Handbook of Perception and Human Performance, John Wiley & Sons, Inc., New York, NY, 1986, chap 9. 38. F. Grum and R. J. Becherer, Optical Radiation Measurements, vol. 1, Radiometry, Academic Press, NY, 1979. 39. R. W. Boyd, Radiometry and the Detection of Optical Radiation, John Wiley & Sons, Inc., New York, NY, 1983.
40. CIE, CIE 1988 2° Spectral Luminous Efficiency Function for Photopic Vision, Vienna, 1990.
41. W. D. Wright, Trans. Opt. Soc. 30, 141–161 (1928–29).
42. J. Guild, Philos. Trans. R. Soc. A 230, 149–187 (1931).
43. R. M. Boynton, J. Opt. Soc. Am. A 13, 1609–1621 (1996).
44. W. S. Stiles and J. M. Burch, Optica Acta 6, 1–26 (1959).
45. A. R. Robertson, Color Res. Appl. 2, 7–11 (1977).
46. A. R. Robertson, Color Res. Appl. 15, 167–170 (1990).
47. R. S. Berns, AIC Color 93, Budapest, 1993.
48. CIE, Industrial Color-Difference Evaluation, Vienna, 1995.
49. S. S. Stevens, Science 133, 80–86 (1961).
50. H. Pauli, J. Opt. Soc. Am. 36, 866–867 (1976).
51. F. J. J. Clarke, R. McDonald, and B. Rigg, J. Soc. Dyers Colourists 100, 128–132 (1984).
52. CIE, The CIE 1997 Interim Colour Appearance Model (Simple Version), CIECAM97s, Vienna, 1998.
HUMAN VISUAL SYSTEM — IMAGE FORMATION

AUSTIN ROORDA
University of Houston College of Optometry, Houston, TX
INTRODUCTION The ultimate stage in most optical imaging systems is the formation of an image on the retina, and the design of most optical systems takes this important fact into account. For example, the light output from optical systems is often limited (or should be limited) to the portion of the spectrum to which we are most sensitive (i.e., the visible spectrum). The light level of the final image is within a range that is not too dim or bright. Exit pupils of microscopes and binoculars are matched to typical pupil sizes, and images are often produced at a suitable magnification, so that they are easily resolved. We even incorporate focus adjustments that can adapt when the user is near- or farsighted. Of course, it is understandable that our man-made environment is designed to fit within our sensory and physical capabilities. But this process is not complete. There is still much to know about the optical system of the human eye, and as we increase our understanding of the eye, we learn better ways to present visual stimuli, and to design instruments for which we are the end users. This article focuses on the way images are formed in the eye and the factors in the optical system that influence image quality. THE OPTICAL SYSTEM OF THE EYE The optical system of the eye is composed of three main components, the cornea, the iris, and the lens (Fig. 1). The structures of these three components are complex, and their description could fill volumes. Discussion in this article is limited to their optical properties, and the
Figure 1. Schematic of the eye, showing the cornea, pupil, lens, sclera, retina, fovea, and optic nerve.
anatomy is discussed only to the extent to which it impacts image quality. Cornea The cornea is the transparent first surface of the eye. It is an extension of the sclera, which is the tough, white outer shell of the eye. The transparency of the cornea is facilitated by the regular arrangement of the layers of collagen fibers that comprise most of the corneal thickness (1). Periodic closures of the eyelid maintain a thin tear film on the cornea’s external surface, which ensures a smooth refracting surface. However, changes in the tear film give rise to scattering and small changes in optical aberrations (2,3). The cornea is about 0.5 to 0.6 mm thick at its center, it has a mean refractive index of about 1.376 and its first surface has a radius of curvature of about 7.7 mm. Combining this with a back surface whose a radius is about 6.8 mm gives the cornea a total power of roughly 43 diopters. Because the cornea accounts for most of the power of the eye, it follows that it is also a key contributor to aberrations of the eye. The high magnitude of aberrations that might have existed in the cornea is reduced by virtue of its conic, rather than spherical shape. The slight flattening of the corneal curvature toward the periphery reduces the amount of spherical aberration to about one-tenth of that in spherical lenses of similar power (4). But corneal shapes vary and give rise to astigmatism and higher order aberrations (5,6). Laser refractive surgery induces corneal shape changes that may correct the mean defocus error but often leave large amounts of aberration (7–9). These postsurgical corneal aberrations have been correlated with losses in visual performance (10,11).
Pupil The pupil serves two main optical functions. It limits the amount of light that reaches the retina, and it alters the numerical aperture of the eye’s image system. Because of aberrations in the eye (see later section), the biggest pupil does not necessarily provide the best image quality. The optimal pupil size is criterion dependent, but for imaging the important spatial frequencies that humans use, the optimal pupil size is between 2 and 3 mm in diameter (12–14) (see later section for more discussion). Because the pupil is between the cornea and lens, it acts as a true aperture stop, which means that changing its size does not affect the field of view of the eye. This configuration limits off-axis aberrations and also gives the eye a field of view that spans nearly the full hemisphere. Lens The lens is positioned immediately behind the iris. The lens in the eye adds another 20–30 diopters to the optical system. It is held in place near its equator via zonules attached to the ciliary body. The tension on the zonules is relaxed by contraction of the ciliary muscle. This action increases lens curvatures making the eye focus, or ‘‘accommodate’’, on near objects (15). Tension on the zonules increases by relaxing the ciliary muscle, thereby flattening the lens and allowing the eye to focus on distant objects. The lens tends to harden as it ages, which diminishes its ability to change shape (16). This, combined with other lenticular and extralenticular factors, such as increasing lens size, the changing location of the insertion points of the zonules, and the aging of the ciliary muscle [see (17) for a review], results in a loss of accommodation in a condition called ‘‘presbyopia’’, as shown in Fig. 2 (18,19). The crystalline lens is composed of multiple layers of long, lens fiber cells that originate from the equator and stretch toward the poles of the lens. At the point where the cells meet, they form suture patterns. In nonprimate species, these suture patterns have simple shapes, for example a ‘‘Y’’, whose orientations are the same in each new layer, making it easily visible in the isolated lens (20). In the human, the embryonic eye has ‘‘Y’’ sutures, but as the eye ages, the suture patterns in the new layers become increasingly more complex, resulting in a lens whose suture patterns have a starlike appearance (21). Some scatter and refractive changes occur at the suture points, especially along the optical axis of the lens (21). Crystalline layers in the lens continue to form throughout life, and the lens continues to grow with age. The development pattern gives rise to a crystalline lens that has a gradient refractive index (22). Since the gradient index was first discovered, it has been the topic of much speculation and research (15,23,24). In simpler lenses, the gradient index of refraction can be determined by optical means [ellipsoidal rat lens (25) and the spherical fish lens (26)]. The exact form of the gradient in the human eye is still unknown, but its index peaks in the center at about 1.415 and drops off slowly at first and then quickly near the surface to a value of about 1.37 (27). Whether this gradient is by design or reflects the natural properties of biological tissue is arguable, but it cannot be denied
Figure 2. The ability of the lens to accommodate to different object distances decreases as it ages. By the age of 58, essentially all accommodative ability is lost. Shown here are data showing excellent agreement between a subjective test of more than 1,000 eyes by Duane and objective results from optical measurements on the isolated human crystalline lens by Glasser et al. (18). [Reproduced from Fig. 5 in (18)].
that this gradient index design is far superior to a lens of similar shape that has a homogenous refractive index. First, it has been shown that the gradient refractive index creates a lens that has better optical properties than a homogenous lens (28). Second, although the gradient index crystalline lens has a peak refractive index at its core of about 1.41, a homogenous lens would require an index of refraction that is higher than the peak of the gradient, typically around 1.43 (22,29). In addition, the lower refractive indexes minimize reflection and scattering losses, particularly at the lens–aqueous and lens–vitreous interfaces. Relative Orientation of the Components The optical elements described work in concert to create an image on the retina. But the individual components alone cannot be analyzed without knowing their locations with respect to each other. Their approximate locations are shown in Fig. 1, but these locations vary among individuals. The optical axis of the eye would be a line joining the centers of curvature of all the optical surfaces. These points, however, do not line up, and a true optical axis does not exist. Nonetheless, one can identify an axis, called the best optical axis, that most closely approximates the eye’s optical axis (30). The values that one finds for the tilt and decentrations of the optical components depend on the selection of this best optical axis (31,32). Even if the optics were perfectly aligned, the fact that the fovea is positioned away from this best optical axis forces us to view obliquely though the eye’s optics.
An appropriate and convenient axis that can be used for the optical system of the eye is the line of sight, which is defined as the ray that passes through the center of the entrance pupil and strikes the fovea (the area of the retina that has the highest density of cone photoreceptors that is used for high acuity vision) (33). In other words, the line of sight is the chief ray of the bundle of rays from any object that focuses on the fovea. The line of sight can be determined quite easily in a lab setting, and it also has functional importance for vision. In the human eye, the fovea is located away from the location where the best optical axis intersects the retina. The angle between the line of sight and the best optical axis is called angle alpha. Angle alpha is variable, but, on average, it is displaced about 5° in the temporal direction. Based on this configuration, if the eye’s best optical axes were pointing straight forward, the eyes’ lines of sight would meet at a distance of 34.2 cm (assuming a distance of 60 mm between the eyes).
Changes in the Eye as It Ages

Everything in the human body changes as it ages. It is not surprising that the optics of the eye and retinal image quality also change. Early aberration measurements (34) have shown that young eyes have more negative or overcorrected spherical aberrations. As the eye ages, the prevalence of positive spherical aberration tends to increase. Similar trends have been observed in the isolated primate crystalline lens (18), an optical component that continues to grow as it ages (16,35,36). This continual growth is reflected in Fig. 3, which shows the weight of the lens as a function of age. Constant environmental changes, eye growth, and aging also alter the corneal topography (37). It is reported that the lens and the cornea mutually compensate for the aberrations that the other imposes (38), but it is suspected that this balance is disrupted by the changing properties of each element as age increases.
Figure 3. This graph shows the clear trend of increasing weight in the human crystalline lens as it ages, indicating an increase in lens size (the plotted linear fit is weight = 0.00133 × age + 0.179 g; r² = 0.852, p < 0.001). Although the lens continues to grow, it does so nonlinearly; for example, lens equatorial diameter does not change with age (148). [Reproduced from Fig. 3 in (16)].
IMAGE FORMATION IN THE EYE The Mathematics of Image Formation Even though the optical system of the eye is complex, the process of image formation can be simplified by adopting a ‘‘black box’’ approach to the problem. If we know how the wave front of a parallel beam is altered in a given direction, then we can predict how the image will be formed. This approach focuses on the image-forming properties of the optical system as a whole, rather than on analyzing the physical and optical properties of every component in the system. To date, this has been the most successful approach for studying image quality in the human eye. To begin, we must first establish a coordinate system (Fig. 4). The mathematics of image formation will be based on the assumption that we have managed to measure how light has been changed by passing through the optics of the eye. The techniques for this measurement are outside the scope of this article, but the reader is referred to the following references to learn more about subjective methods (39–42) and about objective methods (2,14,43,44)
Figure 4. The standard coordinate system shown here will be used for all mathematical analyses in this article: the object plane (xo, yo), the lens (pupil) plane (x, y), and the image plane (xi, yi).
for measuring wave aberrations. The way light is changed by the optical system of the eye can be represented by the complex pupil function: P(x, y) · e−i(2π/λ)W(x,y) .
(1)
The pupil function has two components, an amplitude component P(x, y) and a phase component that contains the wave aberration W(x, y). The amplitude component P(x, y) defines the shape, size, and transmission of the optical system. The most common shape for the aperture
function is a circ function that defines a circular aperture. The size is simply the size of the pupil that the pupil function defines. A circ function may not always be the correct choice. For example, off-axis imaging would employ an elliptical aperture function. A clear optical system would have a value of unity at all points across the pupil function. To model variable transmission, one represents the pupil function as the fraction of transmitted light as a function of pupil location. Variable transmission may arise because of absorption in the lens. Variable transmission can also be used to model the way the eye ‘‘sees’’ by incorporating phenomena like the Stiles–Crawford effect (45) (see later section). In this case, the amplitude component of the pupil function takes the form of a Gaussian function (45). Another reason to use a nonuniform aperture function is to model the effect of a nonuniform beam entering or exiting the eye. The wave aberration W(x, y) defines how the phase of light is affected as it passes through the optical system. The wave front is a line that joins every point on a wave that has the same phase (see Fig. 5). In other words, it is a surface that joins the leading edges of all rays at some instant. The wave aberration is defined as the deviation of this wave front from a reference surface. The reference surface is commonly defined as a surface of curvature near the wave front whose origin is located at the Gaussian image point (where the light would be focused if the eye were perfect). If the Gaussian image is at infinity, then it follows that the reference surface is a plane. For the human eye, the natural choices for the reference surface would be a sphere whose center of curvature is at the photoreceptor
inner segments in the retina (choose the fovea for line-ofsight measurements) or at infinity for light emerging from the eye. Using such a reference sphere, departures from emmetropia (corrected vision) would appear as residual defocus and astigmatic shapes in the wave front. The wave aberration is often defined mathematically by a series of polynomials. Common choices are the Seidel polynomials, but these can define only a limited range of aberrations. Another choice is the Taylor polynomials, which can be used to define any surface as long as enough orders are used (39). Currently, the most popular choice of polynomials is Zernike polynomials because they have convenient properties that simplify the analysis of wave aberrations (46). These polynomials can also be used to represent any surface, and the quality of the fit is limited only by the number of polynomial terms that are used. To be complete, there is more than one wave aberration because the optics of the eye are birefringent. The birefringence of ocular structures is discussed later. Once the nature of the light emerging from the optical system is known, optical image formation in the eye becomes relatively straightforward. The mathematics of image formation begins with the computation of the pointspread function (PSF), which is the image of a point source formed by the optical system. The PSF can be computed using the Fraunhofer approximation (for PSFs near the image plane) (47): 2 PSF (xi , yi ) = K · FT P(x, y) · e−i(2π/λ)W(x,y) ,
(2)
where FT represents the Fourier transform operator and K is a constant. The actual Fourier transform is not discussed in this article. Once the concepts are understood, computations like the Fourier transform can be done numerically by using one of many common software packages such as MatlabTM . Representation of the Wave Aberration
Wave front
W (x,y ) Reference plane Figure 5. In a diffraction-limited optical system, the wave front takes a spherical shape (or a plane in the special case of a parallel beam). The wave front of the aberrated beam takes an irregular shape. The wave aberration is the difference between the reference wave front (which in most cases is the ideal diffraction-limited wave front) and the actual wave front.
In this section, we will limit the polynomial description of wave aberrations to Zernike polynomials (see Fig. 6). There are various forms and orders of the Zernike polynomials, but recently a committee has undertaken the task of coming up with a standard and acceptable form for use in vision (48). Any other representation of the wave aberration will work also. To predict optical quality in the human eye, it is important that to anchor the wave aberration to a precise location in the eye. It is common to choose the geometric center of the entrance (and exit) pupil as the point of origin, but in the human eye, the pupil does not always dilate symmetrically, and so the pupil center for a small pupil might be different from the pupil center for a large pupil (49). It is also known that the application of drugs to dilate the pupil causes asymmetrical dilation (50,51). A potentially stable landmark for describing aberrations might be the reflection of an on-axis source off the cornea when the eye is fixated along the optical axis, also called the coaxially sighted corneal reflex (32). This position is stable and independent of the pupil, but because it is not centered in the pupil, the computation of image quality is more difficult. So the selection of the origin depends
Figure 6. Selected wave aberrations and their respective far-field PSFs. Five single Zernike terms are shown, numbered according to the standard method for vision science (48). The wave front in the lower right corner is from a typical human eye over a 7-mm pupil.
on the application. In either case, the necessary data for determining retinal image quality are the wave aberration, the pupil size, and the relative shift between the origin of the wave-front description and the center of the eye’s pupil (if the origin of the wave-front description is the line of sight, then these are the same). In this article, wave aberration will be defined with respect to the geometric center of the entrance pupil.

Formation of the Retinal Image

The image of a point source on the retina is called the point-spread function (PSF). The next stage is to compute how the image of an object is blurred on the retina. One can think of the object as an array of point sources; each has its respective location, intensity, and spectral composition. In the limit where the object is small and composed of a single wavelength, the computation of the image can be simplified. Under this assumption (isoplanatism), each point on the object has the same PSF. To compute the image, first one computes the size of the image as if the imaging system were perfect (magnification = object vergence/image vergence). This is called the ideal image. Each point on that ideal image takes the form of the PSF whose shape is constant but whose intensity is scaled
by the intensity of the point. This process is called a convolution and is shown in the following equation: I(xi , yi ) = PSF ⊗ O(xi , yi ),
(3)
where O(xi, yi) is the ideal image of the object (corrected for magnification), PSF is the point-spread function, and I(xi, yi) is the convolved image. Intuitively, the process can be pictured in real space, but in practice, the convolution is performed mathematically by computing the product of the Fourier transform of the ideal image with the Fourier transform of the PSF and then inverse transforming that product (47,52).
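To make the convolution of Eq. (3) concrete, here is a minimal Python sketch (the square-wave ‘‘ideal image’’ and the Gaussian stand-in for the eye’s PSF are assumptions chosen only for illustration); the product of Fourier transforms is inverse transformed to give the blurred retinal image:

```python
import numpy as np

n = 256
yy, xx = np.mgrid[0:n, 0:n]

# Ideal image O(xi, yi): a bright vertical bar on a dark background.
ideal = np.zeros((n, n))
ideal[:, n // 2 - 4 : n // 2 + 4] = 1.0

# Stand-in PSF: an isotropic Gaussian blur, normalized to unit volume.
sigma = 3.0
psf = np.exp(-((xx - n / 2) ** 2 + (yy - n / 2) ** 2) / (2 * sigma ** 2))
psf /= psf.sum()

# Eq. (3) via the convolution theorem: multiply the transforms, then invert.
# ifftshift moves the PSF center to the origin so the image is not displaced.
image = np.real(np.fft.ifft2(np.fft.fft2(ideal) * np.fft.fft2(np.fft.ifftshift(psf))))

print("Contrast before blur:", ideal.max() - ideal.min())
print("Contrast after blur :", round(image.max() - image.min(), 3))
```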
Calculation of Image Quality Metrics That Are Relevant for the Human Eye

No single number can be used to define image quality in the human eye. When image quality is good, numbers such as the Strehl ratio and the root mean square (rms) aberration are commonly used, but in this section, I show that these do not always work for the magnitude and type of aberrations encountered in the human eye. When aberrations are high, the rms wave aberration and the Strehl ratio no longer correlate.

Figure 7. (a) 20/20 letters at minimum rms aberration; (b) 20/20 letters at maximum Strehl ratio. In this figure, the aberrations of a typical human eye were calculated over a 7-mm pupil. The PSFs were calculated in the focal plane that had the minimum rms wave-front error and the focal plane that had the maximum Strehl ratio. In this example, the separation between the two was about 0.5 D of defocus. The PSFs were convolved with three 20/20 Sloan letters (5 minutes of arc per side). The image in the focal plane that had the maximum Strehl ratio is more readable than that of the minimum rms. Note that when you view such simulations, they are further blurred by the optics of your own eye. However, when the figure is viewed at a close distance, your own PSF is small relative to the blur simulated in the figure.
Figure 7a shows what happens if a typical eye that has a 6-mm pupil chooses the minimum rms as the optimal condition for reading the 20/20 letters on a Snellen chart. Figure 7b shows the same letters blurred by the PSF when the criterion of the highest Strehl ratio is chosen. These two focal planes are separated by more than 0.5 diopters. Similar trends are observed in many eyes, as shown in the plots of Fig. 8. To put that in perspective, 0.5 diopters is a sufficient amount of defocus to warrant wearing spectacle lenses. Based on Fig. 7 alone, one might argue that the defocus that gives the highest Strehl ratio is the most relevant choice for human vision, but this may apply only to visual acuity measurements. Recent findings suggest that the image quality metrics that correlate best with subjective preference are those based on the image plane (e.g., the Strehl ratio and encircled energy) rather than the pupil plane (e.g., rms and peak-to-valley aberration) (53). In summary, the rms can be a good indicator of optical quality, but minimizing the rms does not necessarily ensure the best image quality.
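The divergence between the two metrics can be seen in a small numerical experiment. The sketch below (an illustration only; the wavelength, pupil size, and aberration coefficients are assumed values, not measurements behind Figs. 7 and 8) sweeps defocus through a hypothetical aberrated pupil and reports the rms wave-front error and the Strehl ratio, here computed as the ratio of the aberrated PSF peak to the diffraction-limited peak:

```python
import numpy as np

wavelength_mm = 550e-6
pupil_radius_mm = 3.0          # 6-mm pupil
n = 256

x = np.linspace(-pupil_radius_mm, pupil_radius_mm, n)
X, Y = np.meshgrid(x, x)
r2_mm = X ** 2 + Y ** 2
inside = r2_mm <= pupil_radius_mm ** 2
P = inside.astype(float)

# Hypothetical fixed higher-order term: balanced spherical aberration (microns).
rho2 = r2_mm / pupil_radius_mm ** 2
W_fixed_um = 0.3 * (6 * rho2 ** 2 - 6 * rho2 + 1)

def rms_and_strehl(defocus_diopters):
    # Dioptric defocus to wave-front error in microns: W = (D / 2) * r^2, r in mm.
    W_um = W_fixed_um + 0.5 * defocus_diopters * r2_mm
    rms_um = W_um[inside].std()          # rms wave-front error (piston removed)
    gp = P * np.exp(-1j * 2 * np.pi / wavelength_mm * (W_um * 1e-3))
    peak = np.abs(np.fft.fft2(gp)).max() ** 2
    ideal_peak = np.abs(np.fft.fft2(P)).max() ** 2
    return rms_um, peak / ideal_peak

for d in np.linspace(-1.0, 1.0, 9):
    rms, strehl = rms_and_strehl(d)
    print(f"defocus {d:+.2f} D   rms {rms:.3f} um   Strehl {strehl:.4f}")
```

Running such a sweep shows that the defocus value that minimizes the rms is generally not the one that maximizes the Strehl ratio, which is the behavior plotted in Fig. 8.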
Humans rarely work at the limits of their visual capability. The eye can perceive spatial frequencies as high as 50 cycles per degree, but the range of spatial frequencies that is most important for vision and that dominates the visual scene is predominantly lower. Spatial frequencies in the environment have amplitude spectra that approximately follow a 1/f law, where f is the spatial frequency (54,55). An image quality metric that appreciates the importance of this range of spatial frequencies might be the best choice. Best image quality might occur when the area under the modulation transfer function (MTF) is maximized for those spatial frequencies. Two methods are commonly used to assess subjective image quality: visual acuity and contrast sensitivity. Visual acuity is a classic measure, but it tests very high spatial frequencies and may not be a meaningful measure of the quality of vision. This argument is supported by the fact that visual acuity does not correlate with typical variations in aberration until very high amounts of aberration are present (10). The contrast sensitivity function is the visual analog of the modulation transfer function, except that it combines the optics of the eye with the neural limits imposed in part by the discrete sampling array of photoreceptors (12). Although the contrast sensitivity function is a more tedious measurement to make, it reflects the quality of vision at all spatial frequencies, whereas visual acuity tests vision only at its limits.

IMAGE QUALITY IN THE HUMAN EYE

Chromatic Aberrations

Dispersion of the refracting media in the human eye gives rise to chromatic aberration. This chromatic aberration manifests itself in two ways, longitudinal and transverse.

Longitudinal Chromatic Aberration. Longitudinal chromatic aberration refers to the change in focus as a function of wavelength. The degree of chromatic aberration is relatively constant between eyes, and there is general agreement among the measured values in the literature (33). Chromatic aberration is often described as the change in the power of the optical system with wavelength, but it is more common to describe it in terms of the way it affects the refraction of the eye. This is more appropriately called the chromatic difference of refraction. Refraction defines the optical correction required outside the eye to correct it for focusing at infinity. It is a more clinically meaningful quantity and is easily measured by a variety of techniques. Bennett and Rabbetts published chromatic difference of refraction curves summarizing the results from a collection of studies, and their figure is reproduced here as Fig. 9. The graph shows that the chromatic difference of refraction is 2.2 D across the visible spectrum (from 400 to 700 nm). This implies that if the eye is properly focused on 700 nm light at infinity, 400 nm light must be at a distance of 45 cm from the eye to be in focus. The first impression might be that this amount of chromatic aberration of the eye would be seriously deleterious for vision.
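As a quick check (a worked conversion added here, using only the 2.2 D figure quoted above), the dioptric difference converts to a focusing distance through the reciprocal relationship between vergence and distance:

distance = 1/(chromatic difference of refraction) = 1/(2.2 D) ≈ 0.45 m = 45 cm.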
Figure 8. The aberrations of three individuals were measured over a 6-mm pupil. The rms and the Strehl ratio were calculated as a function of defocus for each eye. The graph shows that the best focal plane is criterion dependent.
After all, a refractive error of 0.25 D is severe enough to warrant wearing a spectacle correction. The severity of the chromatic aberration is lessened because the eye has a tuned spectral bandwidth. This is discussed in more detail later. Defocus is certainly not the only aberration that changes as wavelength changes. Marcos et al. showed that other aberrations, including astigmatism and spherical aberration, also change as wavelength changes, although to a lesser extent (56). Even without aberrations, diffraction itself causes changes in the image-forming properties as a function of wavelength [see Eq. (2)].

Transverse Chromatic Aberration. Transverse chromatic aberration arises because the refraction of the chief ray differs as a function of wavelength. So, in addition to a difference in focus, chromatic aberration also gives rise to differences in the retinal location of the image as a function of wavelength. Transverse chromatic aberration affects mainly off-axis objects. The reason for this is that the entrance pupil is axially displaced from the nodal points of the eye. The amount of off-axis transverse chromatic aberration nearly matches the sampling efficiency of the eye as a function of retinal eccentricity, so it is not a serious problem. However, the most serious effects occur at the fovea.
Transverse aberration at the fovea occurs because the nodal points do not lie along the line of sight. In other words, the foveal achromatic axis (the ray that joins the nodal points with the fovea) is not necessarily collinear with the line of sight. This aberration has been carefully measured using vernier alignment techniques (57–59). In these studies, the transverse chromatic aberration between 400 and 700 nm at the fovea was typically within 1 minute of arc, but larger variations have been found in other studies (56). Because the photoreceptor sampling density is very high at the fovea, any amount of transverse aberration can be a problem. The amount of transverse chromatic aberration is only slightly larger than the size of a single foveal cone, indicating that there is very little disparity between the line of sight and the achromatic axis. Nonetheless, this disparity is greater than the minimum disparity that the human eye can detect [the ability to detect tiny displacements in the alignment between two objects is called vernier acuity or hyperacuity and is about 6 seconds of arc (60)].

Monochromatic Aberrations

On-Axis Aberrations. Scientists have studied the monochromatic aberrations of the eye for the last two centuries (13,15,22,31,39–44,61–69). The on-axis aberrations of the eye refer to those aberrations that occur at the fovea of the eye. To be exact, these aberrations are those that
occur along the line of sight, which is about 5° from the best optical axis of the eye. Current computing technology allows us to routinely derive and manipulate the mathematical forms that represent the wave aberration of the eye. In 1977, Howland and Howland generated the first mathematical expressions that defined the total wave aberration of the eye (39). Since then, enough reports of the total aberration have been published to permit calculating the typical trends in the total aberration of the eye. Full descriptions of ocular aberrations are still developing. To date, the largest single published study of ocular aberrations included only 55 eyes (39). In total, the published literature in which the wave aberrations are described mathematically includes fewer than 300 eyes. However, the recent development of the Hartmann–Shack wave-front sensor for the human eye has allowed rapid, accurate, and objective ocular aberration measurements, and this technique is currently being used to obtain the first extensive population studies of human eye aberrations. Table 1 shows the rms aberration drawn from studies in which the appropriate data have been published. Figure 10 is a plot of the same data. The graph shows a nearly linear increase in rms as pupil size increases. In general, there is good agreement among the separate studies. It is now well known that the aberrations of the human eye vary greatly among individuals. Do similar variations exist in the magnitude of the aberrations? A list of rms aberrations as a function of pupil size reveals that, on average, aberrations increase as pupil size increases. The variation is not small, however. In a study of 14 subjects, Liang and Williams found a range of rms errors from 0.26 to 0.92 microns for a 7.3-mm pupil (14). These variations have a significant impact on image quality and are likely factors contributing to variability in humans’ best-corrected visual acuity. The only trend that has been found in the higher order aberrations is a tendency for the relaxed eye to have an average spherical aberration component that is positive (4,14). No trends have been found in any other aberrations. Although the on-axis measurements are actually displaced from the best optical axis by about 5° (angle alpha), no investigator has found systematic off-axis aberrations, such as coma, at the fovea.

Figure 9. The chromatic aberration of the eye causes a chromatic difference of refraction. It is plotted here as the refractive error of the eye as a function of wavelength, where all data are normalized to a standard wavelength of 587.6 nm. Symbols are from Wald and Griffin (149), Bedford and Wyszecki (150), and Ivanoff (151), along with calculated results for the Gullstrand–Emsley model eye by Bennett and Rabbetts (33). The line represents the mean of all the data [Reproduced from Fig. 15.5 in (33)].
Table 1. Average RMS Aberration for Human Eyes Drawn from Selected Papers

Pupil Size (mm)   RMS Aberration (microns)   Standard Dev. (microns)   Number of Eyes   Ref.
4                 0.210028                   0.060594                  5                68
6.7               0.45                       0.1                       4                80
5.4               0.241885                   0.03894                   2                2
7.3               0.47                       0.19                      14               14
3.4               0.207                      —                         12               14
5                 0.162                      0.041503                  11               44
7.33              0.83                       0.05                      8                85
Figure 10. Each point on the graph represents results from a different study (rms aberration in microns plotted against pupil size in mm). Symbols represent the following studies: He et al. (85); Navarro et al. (80); Liang and Williams (14); Liang et al. (2); Iglesias et al. (152); Walsh et al. (44); and Calver et al. (153). The plot shows that the rms aberration increases as pupil size increases.
Off-Axis Aberrations. Like most optical systems, the optical quality of the eye degrades as the image moves away from the best optical axis. The most dominant off-axis aberration is astigmatism and its associated curvature of field. Compensation for curvature of field is facilitated by a curved image plane, or retina (22), so that in most eyes, the retina sits between the sagittal and tangential image planes. Objective techniques to measure off-axis astigmatism have found a nearly parabolic increase in astigmatism to between 4 and 5 diopters at 40° eccentricity from the fovea (70–73). Earlier refractometric measurements show similar characteristics at slightly less magnitude (74–77). As expected, other aberrations are also present off-axis, but these aberrations have yet to be completely characterized. Double-pass techniques have been effective for measuring the MTF of the eye, but they cannot extract the magnitude of odd aberrations (70,78,79). Navarro et al. used an objective ray-tracing model to estimate coma and other high-order aberrations in the periphery (80). In both cases, the amount of coma measured off-axis was significantly greater than that measured at the fovea. Overall, Navarro et al. found that the rms of higher order aberrations (5-mm pupil corrected for defocus and astigmatism at all eccentricities) increased linearly from 0.45 microns at the fovea to 1.13 microns at 40°. Incidentally, Rempt et al. (76) were the first to observe the effects of off-axis coma by using a retinoscope, but they did not identify coma as its cause. They discussed the common occurrence of the ‘‘sliding-door’’ appearance of the retinoscopic reflex at off-axis locations, which is a phenomenon that can be entirely explained by the presence of coma (81). How much degradation in vision is expected from these off-axis aberrations? As it turns out, off-axis aberrations are not that problematic for vision because the detail we can see is most limited by the coarseness of the sampling array in the retina. This will be discussed more later. But, even though retinal sampling imposes the limit on resolution in the periphery, the blur due to off-axis aberrations significantly reduces the amount of aliasing that might occur (71,82). This is another example that suggests an exquisite codevelopment of the sampling and imaging capabilities of the human eye.

Aberrations and Accommodation. During accommodation, the shape of the lens is changed by application and relaxation of zonular tension at its equator, so it follows that there are associated changes in the wave aberration of the lens. The predominant result revealed in the literature has been a tendency for the spherical aberration of the eye to move in a negative direction as it accommodates (34,62,63,83–85), although large interindividual variability was observed. Some subjects showed no dependence of aberration on accommodation, whereas a subset of subjects experienced a reversal of the sign of the spherical aberration. The He et al. study found that the overall aberrations (measured as the rms wave aberrations) decreased to a minimum and then increased with further accommodation (85). They found that for most eyes there was an accommodative state for which the aberrations were at a
minimum and that this accommodative state was typically near the resting state of accommodation. The resting state of accommodation, also called the dark focus, is where the unstimulated eye focuses (for example, if left in a dark room). It is typically at an accommodative state of about 2 diopters or at a focusing distance of about 50 cm. Studies of the isolated human lens have shown similar changes toward more negative spherical aberration with accommodation (18,86). It is important to add that the change in aberration from accommodation is roughly of the same magnitude as the total aberrations in any accommodation condition (87). This result has two important implications. First, because the change in aberration from accommodation is high, the lens must be a significant contributor to the total aberration of the eye. Second, this result means that a fixed aberrational correction in the eye applies only for a single accommodative state, and any benefit will be diminished in a departure from that optimal state. This is important, given that refractive surgical techniques (e.g., LASIK) are working toward the possibility of improved best-corrected vision by correcting higher order aberrations (88).

Sources of Aberration in the Eye. Few studies have determined the relative contributions of the various optical components to the total aberration of the eye. The reason for the lack of studies has been that techniques to measure the total aberration of the eye and techniques to compute the aberration of the cornea and lens are still maturing (18,38,89). For this reason, this section will report on some results for which full papers have yet to be published. The main contributors to the total aberration of the eye are the cornea and the lens and their relative locations with respect to the pupil. In the cornea, the main contributions are from the outer surface, which has most of the corneal power. As stated in the previous section, the change in aberration from accommodation is roughly of the same magnitude as the total aberrations in any accommodation condition (90). So the lens is not a minor contributor. A recent study by Artal et al. measured the internal ocular surfaces in living eyes by using a Hartmann–Shack wave-front sensor. The aberrations of the cornea were neutralized by immersing the cornea in water. They found that the aberrations of the internal ocular surfaces in a 4-mm pupil were often higher than the aberrations of the whole eye and concluded that the lens in some eyes must have a compensatory effect on the aberrations of the cornea (89,91). Another study compared the aberrations of the cornea with the total aberration of the eye (92). They found only one of three eyes where the internal surfaces had a compensatory effect on the corneal aberrations. Bartsch et al. obtained slight improvements in image quality in a scanning laser ophthalmoscope after correcting for the cornea by using a contact lens. They imaged the retina using a 7-mm scanning beam (93). The Bartsch et al. results suggest that, although the lens may have a compensatory effect, the aberrations of the cornea still dominate. In general, these early results indicate that there is some compensatory effect of the lens, but it is not present
in all eyes, and there is individual variability. Some variability of the results may be due to studies using different pupil sizes.

Aberrations and Refractive Error. Defocus is the dominant aberration in the human eye. It is also the easiest to correct. But do high refractive errors come with increased aberrations? Few studies have addressed this question to date, but several studies are currently underway, and early results suggest that high myopes tend to suffer from higher aberrations also (94), even though the one published study states otherwise (95). These findings raise some interesting questions. For example, if the eye uses a feedback system to maintain emmetropia, does the presence of aberrations disrupt the feedback system to an extent that leaves the eye myopic? Or, conversely, does the myopic eye lack a sufficient feedback signal to drive the correction of higher order aberrations? The relationship between refractive error and aberrations remains unknown at this time.

Scattering in the Human Eye

Scattering occurs whenever light encounters refractive index discontinuities in its path. The specific types of scattering that can occur depend on the nature of these discontinuities. In a simple example, light striking a spherical particle has a scattering profile that depends on the relationship between particle size and wavelength. Scattering from particles that are much smaller than the wavelength of light is equal in the forward and backward directions. This is called Rayleigh scattering. As the particle size increases to a size that is close to the wavelength, the scattering, called Mie scattering, is predominantly in the forward direction. Finally, when particle sizes are much greater than the wavelength, geometrical approximations can be applied, and typically all of the scattered light is in the backward direction. Relative proportions of scattered light for different scattering particle sizes are listed in Table 2. Scattering depends on particle size with respect to wavelength, so it follows that the amount of scattered light for a fixed particle size depends on wavelength. But
only a weak wavelength dependence has ever been observed in the human eye (96). There is no doubt that Rayleigh scattering exists; in fact, it has been measured in the nuclei of excised human lenses (97), but the scattering in the optics of the eye is dominated by the larger scattering particles, whose scattering has less wavelength dependence. How scattering affects the human eye depends on the specific application. For vision, the most important type of scatter is forward scatter. However, for retinal imaging, both forward and backward scatter affect image quality. When scattering increases in the eye, it is primarily the amplitude of the scatter that changes, not the angular distribution of the scattered light. It has been found that the change in scatter as a function of angle follows the rule

Leq = (k/θ²) · Egl,    (4)
where Leq is the equivalent veiling luminance (in cd/m²), Egl is the illuminance at the eye, k is a constant, and θ is the scattering angle (98). As shown in the equation, the profile of scattering in the eye as a function of angle follows a power law of exponent approximately −2. This exponent is close to constant, even in the presence of cataract. With regard to k, the Stiles–Holladay approximation (99,100) puts the value of this constant at 10 for a healthy eye, but more recent work has shown an important dependence of the constant on age (101,102); also see (98).

Transmission by the Human Eye

The optics of the eye are not completely transparent across the range of visible wavelengths. Even though almost all red light incident on the cornea reaches the retina, a significant fraction of light toward the blue end of the spectrum does not, and the amount of light that is absorbed changes dramatically as the eye ages. All of the optical components, including the aqueous and vitreous, act as band-pass filters in the human eye. But the cornea, the aqueous, and the vitreous have bandwidths that essentially exceed the visible spectrum. The lens, on the other hand, has significant absorption at the blue end of the visible spectrum, cutting off most of the
Table 2. Scattering of a Spherical Particle (n = 1.25) as a Function of Size and Direction of Scatter
(The leftmost columns fall in the Rayleigh regime; the rightmost columns fall in the Mie regime.)

θ                 a = λ/100π   a = λ/10π   a = λ/2π      a = λ/π       a = λ/0.5π    a = λ/0.2π     a = λ/0.12π
0° (forward)      1.00         10,000.00   9.60 × 10^6   4.60 × 10^8   2.15 × 10^9   7.84 × 10^10   2.34 × 10^11
90° (sideways)    0.50         5,000.00    4.03 × 10^6   7.36 × 10^7   1.25 × 10^8   2.20 × 10^8    2.23 × 10^8
180° (backward)   1.00         9,800.00    6.24 × 10^6   3.82 × 10^6   1.01 × 10^7   1.02 × 10^8    2.81 × 10^7

Note: a is the diameter of the scattering particle. Scattered light in all directions is normalized to the amount of scattered light in the forward direction for the smallest particle size. All polarizations have been added for each scattering direction. The table shows that the total amount of scattered light increases but that the relative amount of backscattered light decreases as the scattering particle size increases. As the size of the scattering particles increases further, backscattering begins to dominate again (157).
light below 400 nm (103). Boettner and Wolters published a comprehensive study of the contributions of these components by directly measuring spectral transmission of freshly enucleated donor eyes (104). Their results may suffer from postmortem artifacts, but the graph is reproduced here as Fig. 11 because the data are presented in an informative way that illustrates the lens contribution relative to the other components and the cumulative effect of the absorbing tissue. The absorbance of blue light in the crystalline lens is commonly referred to as ‘‘yellowing’’ of the human lens. This yellowing of the lens increases dramatically as it ages. The lens optical density toward the blue wavelengths increases by about 0.1 to 0.15 log units per decade, provided that we assume that components other than the lens change little as they age (105,106). Accelerations in this rate have been found for eyes over age 60 (106,107), and it is proposed that this increase is due to the prevalence of cataract in that age group. When eyes that had cataract were excluded from the study, it was found that a linear increase in optical density versus age was maintained (105). The optical density of the lens is highly variable. The optical density for any given age group might span 0.8 log units or more (105). Tabulated data on the lens transmission spectrum can be found in Pokorny et al. (106) or in van Norren and Vos (103). Plots of the change in lens optical density versus age can be found in Savage et al. (105) and Delori and Burns (107). In addition to the absorbance by the optics, the retina itself has pigments that filter the light reaching the photoreceptors. The first filter arises from the blood vessels and capillaries that course through the retina and line the most anterior layers of the retina. No blood vessels lie
Figure 11. The spectral distribution of light that reaches the retina depends on the transmission of the components that precede it. In this plot, the successive spectra at a series of interfaces are shown as percent transmittance vs. wavelength: the curves give the total transmittance at the anterior surfaces of (1) the aqueous, (2) the lens, (3) the vitreous, and (4) the retina. The data are drawn from young eyes and are the ‘‘direct’’ transmittance (i.e., the spectra were measured in the vicinity of the focal point of the eye). By the time the light reaches the retina (curve 4), it is limited to a wavelength band from 400–1,200 nm [Reproduced from Fig. 7 in (104)].
directly over the fovea, but in more peripheral regions, the areal coverage can be as high as 30% (108). Primary absorption in the blue end of the spectrum gives blood its red appearance, but some absorption peaks also exist in the 500–600 nm spectral region. The choroid, which is posterior to the photoreceptors, is also rich in blood, which may alter the spectral properties of light scattered from the retina. The penetration of light into the choroid is reduced, however, by the presence of melanin pigment in the retinal pigment epithelium (see end of section). The most dominant filter in the retina is a yellowish pigment in the macular region near the fovea, called the macular pigment. Incidentally, this pigment is very similar to xanthophyll in the leaves of green plants (109). The amount of macular pigment varies among individuals and ranges in optical density from 0.21 to 1.22 (110,111). The density depends on factors such as iris pigmentation, diet, and smoking (112). The density peaks at the fovea and declines exponentially to half its peak value at an average foveal eccentricity of 0.95° (±0.4° sd) visual angle or about 285 microns, and the distribution of macular pigment is nearly symmetrical about the fovea (113). The functional role of the macular pigment is still under debate. Some argue that the macular pigment is intended to filter out blue light (114), like an anatomical version of the yellow-tinted sunglasses that are often worn by target shooters. Its role would be to increase the contrast of the retinal image by filtering out the out-of-focus blue light from the image. A more popular argument contends that the macular pigment reduces the amount of damaging blue light exposure. Reduced blue light exposure might prevent the onset of age-related macular degeneration, a disease that impacts central vision (115). Others contend that the macular pigment is simply a by-product of the increased metabolic activity that occurs in the cone-rich fovea. Whether accidental or intentional, the presence of macular pigment affects vision and reduces harmful blue light radiation. In addition to the macular pigment, Snodderly et al. found two other yellow pigments that have lower optical density and change more slowly across the retina (116). These pigments are not expected to have a significant impact on vision because they have spectral peaks at 435 and 410 nm and have maximum optical densities of less than 0.05. However, they should not be ignored if one wants to infer the spectral properties of the macular pigment by comparing psychophysical responses between the periphery and the fovea. The areal distributions of the macular pigment and these other pigments are shown in Fig. 12. Finally, there are other pigments in the eye, such as the blood in the choroid, that do not lie along the direct optical path from the object to the retina but still might affect vision via multiple scattering from deeper layers in the retina. But the eye is equipped with an optically dense, wideband, absorbing layer of melanin in the retinal pigment epithelium that lies immediately posterior to the photoreceptor layer. The retinal pigment epithelium’s primary role is to maintain and nourish the photoreceptors. It is also considered responsible for
Figure 12. The concentration of some of the retinal pigments varies with retinal location (maximum absorbance plotted vs. retinal eccentricity in microns). P435 and P410 are relatively uniform across the retina compared to the macular pigment, which is concentrated near the fovea [after (116)].
mopping up light that is unabsorbed by the photoreceptors. This light would otherwise penetrate and scatter from the choroidal layers. The spectra of all of the important pigments that are not part of the optical system are shown in Fig. 13.

Polarization Properties of the Ocular Optics

The speed of propagation of light in a medium depends on the structure of that medium. Some media, particularly crystalline media, can be anisotropic, that is, the atomic structure of the crystal differs in different directions. The speed of propagation in these anisotropic media depends
on the direction of the applied electromagnetic field in relation to the crystal lattice. Thus, there can be two indexes of refraction and two possible values of the speed of propagation of light in a crystal in any given direction. Polarization refers to the direction of the applied field perpendicular to the direction of propagation. Any polarization orientation can be expressed as the sum of two orthogonal components, and their respective directions are along the crystal axes. The two values of the speed of propagation, therefore, are associated with the mutually orthogonal polarizations of light waves. It has been found that the human cornea has polarization-altering properties, and there have been two common models for the cornea that give rise to these properties. The first is to assume the cornea has a uniaxial structure. There is only one optical axis in a uniaxial crystal. An optical axis defines a propagative direction for which both orthogonal components of the polarization travel at the same speed. This implies that in two major polarization directions, there is no difference in the structure and there is only one light-propagative direction for which both orthogonal directions of polarization are retarded equally. That direction is normal to the corneal surface. This model, applied to the cornea, assumes that there is essentially a random orientation of the sheets of lamellae in the corneal stroma. If the optical axis is normal to the surface of the cornea at its apex, then its curved structure will give rise to polarization effects for more peripheral rays because it introduces a component in the polarization that is normal to the surface. This property was measured in the cat by Stanworth and Naylor (117). A more sophisticated and accurate model assumes that the cornea has a biaxial structure in which there are two optical axes in the crystal (118–121). In each major direction in the plane of the corneal surface, there is a different structure. In such crystals, there are two directions of light propagation for which both orthogonal components of the light travel at the same speed. The biaxial nature of the human cornea arises because of the preferred orientation (nasally downward) of the lamellar sheets in the stroma. The results of measured birefringence from a number of authors are shown in Table 3. A plot showing the polarization altering properties of the cornea from the paper of van Blokland and Verhelst is shown in Fig. 14.
Figure 13. Plot of some of the important retinal pigments. Macular pigment (MP), P410, and P435 are anterior to the photoreceptor layer. Hemoglobin (HbO2) absorption occurs both anterior and posterior to the photoreceptors, in the retinal vasculature and the choroidal vasculature, respectively. Melanin resides posterior to the photoreceptors in the retinal pigment epithelium and the choroid.

Table 3. Birefringent Properties of the Cornea

Species   nz − nx   ny − nx   Ref.
Human     0.00159   0.00014   (118)
Human     0.0020    NA        (158)
Cat       0.0014    0         (117)
Bovine    NA        0.00013   (159)
Cat       0.0017    NA        (160)

Note: nx is the index along the slow axis (down and nasal); ny is the index in the orthogonal direction, tangential to the corneal surface; and nz is the index in a direction normal to the corneal surface. The birefringence between lateral and normal (nz − nx) is about an order of magnitude greater than that between nx and ny, which explains why some investigators, like (117), were close when they assumed that the eye was simply a curved uniaxial crystal.
How might this affect image quality in the eye? A biaxial crystal has different indexes of refraction; hence, phase errors depend on the orientation of the polarization of the light. Take, for example, the calcite crystal, one of the most birefringent structures in nature. The difference in the refracted angle between two orthogonal polarizations is 6.2°. If calcite were used in an imaging system, the PSF would comprise two discrete points. But the birefringence of calcite is 0.172, which is three orders of magnitude higher than that found in the cornea. Nonetheless, the effects of birefringence of the optics are likely to play a role, albeit a minor one compared to aberrations, in degrading retinal image quality (122). The difference in phase error across the cornea between two orthogonal states of polarization is less than one-tenth of a wave in the center of the cornea (118,123). Differences of about one-third of a wave have been measured at the edge of the optics (118). Now, consider a case where one is looking through a polarizing filter at a retina that is illuminated by polarized light (laser). In this case, because the degree and orientation of polarization of the emergent light vary across the pupil, it is conceivable that the polarizing filter will attenuate some areas of the pupil, and other areas will transmit fully. According to the figures by van Blokland and Verhelst, this attenuation can be as much as 100%. So the amplitude modulation [see the equation for P(x, y) in a previous section] of the pupil will vary greatly and might give rise to a degraded (or improved) PSF (122). The lens contributes very little to the overall polarization in the eye. Its values of birefringence (comparing radial vs. tangential index of refraction) are more than two orders of magnitude less than those found in the cornea. Bettelheim finds birefringence values in the range of 10^−6 to 10^−7 (124). Still, an effect of crystalline lens birefringence can be observed if one looks at an isolated lens through a pair of crossed polarizers (125). Such an example of a lens imaged through crossed polarizers is shown in Fig. 15.

Figure 14. Birefringent properties of the cornea alter the polarization of transmitted light. This plot shows the orientation, ellipticity, and retardation after a double passage of 514-nm light through the ocular optics. The incident beam was positioned centrally, and each diamond represents a measured corneal location. The changes in ellipticity and orientation of the polarization of the emerging light are indicated by the orientation and ratio of the short to the long axes of each diamond. The contour lines represent points of equal retardation in 25-nm intervals. The plot has a saddle-shaped function, pointing nasally downward, a direction consistent with the preferred orientation of the lamellar fibers that comprise the corneal stroma [Reproduced from Fig. 1 in (118)].

Figure 15. This picture shows the Maltese cross appearance of the isolated crystalline monkey lens when viewed through a polarizer-analyzer combination. The pattern arises because of the birefringent structures in the lens. (Courtesy of Adrian Glasser and Austin Roorda).

RECEIVING THE RETINAL IMAGE

Sampling by the Photoreceptor Mosaic

The retina is lined by millions of tiny cells, called photoreceptors, that sample the image that reaches the retina. Unlike a CCD camera, this array is far from uniform. Each photoreceptor is like a fiber-optic waveguide that funnels the light into its outer segment, which is filled with photosensitive pigment. Rod photoreceptors make up by far the majority of the photoreceptors, totaling more than 100 million in a typical human retina. These cells are very sensitive and can signal the absorption of a single photon. Cone photoreceptors are less sensitive than rods and come in three types, each sensitive to a different portion of the visible light spectrum. The combined signals from the three cone types provide color vision to the human eye at high light levels. There are only about 5 million cones, and they are most dense
Figure 16. This classic figure generated from Østerberg’s data shows the spatial density of rods and cones in the human retina (cones, rods, and the optic disk plotted as spatial density, in number/mm², vs. retinal eccentricity in mm, nasal to temporal). It illustrates the extreme nonuniformity of photoreceptor distributions across the retina. Only cones comprise the fovea, but rods quickly dominate outside that area. The decrease in spatial density toward the periphery occurs because of an increase in photoreceptor size with eccentricity [generated from Table 3 in (154)].

Figure 17. The photoreceptor mosaic shown here was taken of a living human retina using an adaptive optics ophthalmoscope (155). This image shows photoreceptors at a location 1° from the foveal center. The cones are about 5 microns in diameter and are packed into a quasi-crystalline array. Rods are also present in this portion of the retina, but they were too small to be resolved by the ophthalmoscope. Scale bar: 10 arcmin (48.6 µm) (courtesy of Austin Roorda and David Williams).
near the posterior pole of the eyeball in an area called the macula, which contains the fovea. The density of cones and rods across the retina is illustrated in Fig. 16. An actual image of the photoreceptor array in a living human eye is shown in Fig. 17. Circuitry in the retina transforms signals from the photoreceptors into a compact representation of color and luminance across the visual field. In fact, the
number of wires (optic nerves) that carries the signal to the brain totals only 1.2 million. This reduction is accomplished by devoting a disproportionate amount of wiring to a small region of the retina called the fovea, which is comprised only of cones. The number of optic nerves for each cone in the fovea is estimated to be from two to four (126,127). Across the rest of the retina, signals from individual rod and cone photoreceptors are pooled together, thereby reducing the number of dedicated optic nerves. The impressive feature of the human retina is that it offers both high visual acuity and a large field of view and uses fewer fibers than there are pixels in a modern CCD camera! The nature of the detecting surface is very important for vision because the sampling array imposes a final limit on what is seen and not seen by the human eye. Nyquist’s sampling theorem states that to measure a spatially varying signal properly, one must sample it at twice its highest spatial frequency. In the human retina, the density of the sampling is greatest at the fovea, where there is one optic nerve for each photoreceptor. The typical packing density of cones in the human fovea is about 199,000 cones per mm² (128). This converts to a lateral sampling density of 480 cycles/mm or 140 cycles/degree. [NOTE: In vision science, space is often represented in degrees because its value is the same in object space as in image space.] Given this sampling density, the maximum spatial frequency that the eye could resolve would be 70 cycles/degree. To measure this property, investigators had to project spatial frequency patterns on the retina that were unaffected by the optics of the eye. This was done by projecting interference fringes directly onto the retina via two tiny, mutually coherent entrance beams (129,130). The interferometric technique is simply Young’s double-slit phenomenon where the fringes are projected directly onto the retina. This technique has since been used to isolate effectively the optical and neural factors affecting vision (12,131,132). The human retina is very economical in design. It has a sampling array that is just dense enough to detect the spatial frequencies that the optical system can transmit to the retina. For example, the cutoff spatial frequency of a 2-mm pupil for 550 nm light is 77 cycles/degree. This is closely matched to the maximum frequency that the retina is equipped to detect. For larger pupil sizes, the cutoff would increase linearly, but because of aberrations, the practical spatial frequency cutoff does not increase (13). At retinal locations away from the fovea, the sampling density drops off precipitously due to an increase in photoreceptor size and a decrease in the optic nerve:photoreceptor ratio. To deal with this drop-off in vision, the eye (and head) is equipped with muscles that allow us to align objects of regard so that their images always land on the fovea.
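The quoted sampling figures can be reproduced with a short calculation. In the sketch below (added for illustration), hexagonal packing of the foveal cones is assumed, and the conversion of roughly 0.291 mm of retina per degree of visual angle is an assumed value for a typical eye rather than a figure from this article:

```python
import math

cone_density_per_mm2 = 199_000.0   # foveal cone density quoted in the text
mm_per_degree = 0.291              # assumed retinal distance per degree of visual angle

# Row-to-row spacing of a hexagonally packed mosaic of this areal density.
row_spacing_mm = math.sqrt(math.sqrt(3.0) / (2.0 * cone_density_per_mm2))

samples_per_mm = 1.0 / row_spacing_mm                 # ~480 samples/mm
samples_per_degree = samples_per_mm * mm_per_degree   # ~140 samples/degree
nyquist_limit = samples_per_degree / 2.0              # ~70 cycles/degree

print(f"row spacing      : {row_spacing_mm * 1000:.2f} microns")
print(f"sampling density : {samples_per_mm:.0f} samples/mm, "
      f"{samples_per_degree:.0f} samples/degree")
print(f"Nyquist limit    : {nyquist_limit:.0f} cycles/degree")
```

The result, about 70 cycles/degree, matches the maximum resolvable spatial frequency given above.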
The Stiles–Crawford Effect

Rod and cone photoreceptors act as fiber-optic waveguides. Because they are optical waveguides, they have angular tuning properties. In the human eye, these properties give rise to both perceptual and reflective effects, which
are important for vision and for imaging of the retina, respectively. The perceptual effect, called the Stiles–Crawford effect, describes the change in the perceived brightness of light as a function of the pupil location at which the light enters. (Changing the pupil location of the entrance beam changes the angle at which the incident light strikes the photoreceptor.) The tuning is sensitive enough that the angular sensitivity reduces to about half at the edge of an 8-mm pupil (see Figure 18) (133). The reflected light from the photoreceptors has similar directional properties. If the light emerging from the eye from a small illuminated patch of retina is measured as a function of pupil position, the reflectance is highest near the center of the pupil (134–138). Directional effects on the way into and out of the eye play an important role in determining image quality in the human eye. Although the optical system will produce a point spread on the retina, the retina preferentially accepts that portion of the PSF that is generated by the more central rays. It is hypothesized that the reason for this angular tuning is to suppress the contribution from the more aberrated peripheral rays in the optics to improve image quality. The expected improvements near best focus are quite small (139), but Zhang et al. showed that for defocused images, the Stiles–Crawford effect tends to increase depth of focus and prevent phase reversals (140). For imaging, the light emerging from the eye will always be weighted more for central rays than for peripheral rays.
Figure 18. The optical fiber properties of the photoreceptors give rise to a perceived phenomenon called the Stiles–Crawford effect. This plot shows the log relative sensitivity of the eye to light for changing pupil entry positions (nasal–temporal and superior–inferior pupil entry, in mm). Changing pupil entry position is the method used to change the angle of the incident beam on the retina. The sensitivity to light reduces to about half at the edges of an 8-mm pupil [Reproduced from Fig. 1 in (133)].

Figure 19. We calculated the PSF of a typical human eye with and without incorporating the Stiles–Crawford effect in the calculation. PSFs were calculated for 550-nm light over a 7-mm pupil at the focal plane that had the highest Strehl ratio. The tuning value we chose was ρ = 0.047 mm−1. The angular tuning has the effect of apodizing rays from the margins of the pupil. The result is that high spatial frequencies in the PSF are attenuated and low spatial frequencies are enhanced.
A simple way to incorporate these angular tuning effects into computations of image quality is to project the tuning properties into the pupil function [see Eq. (1)]. A detector whose angular tuning reduces its sensitivity to light emerging from the edge of the pupil is, in effect, the same as a uniformly sensitive detector behind a pupil whose transmission is reduced at its edges. Likewise, light reflected from the retina that is preferentially directed toward the center of the exit pupil can be modeled as light from a uniformly reflecting retina passing through a pupil where absorbance increases toward the periphery. Figure 19 shows the computed PSF of an eye that has typical aberrations before and after angular tuning is incorporated into the pupil function. Figure 20 shows the corresponding average radial MTFs for the same pupil functions.
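A minimal sketch of this apodization approach is shown below. It assumes the conventional Gaussian form of the Stiles–Crawford sensitivity function, η(r) = 10^(−ρr²); the pupil size, wavelength, and spherical aberration term are illustrative assumptions, and only the tuning value ρ = 0.047 mm−1 is taken from Figs. 19 and 20. The square root of η(r) is applied to the amplitude of the pupil function:

```python
import numpy as np

wavelength_mm = 550e-6
pupil_radius_mm = 3.5          # 7-mm pupil
rho = 0.047                    # Stiles-Crawford tuning value quoted for Figs. 19 and 20
n = 256

x = np.linspace(-pupil_radius_mm, pupil_radius_mm, n)
X, Y = np.meshgrid(x, x)
r2 = X ** 2 + Y ** 2
inside = r2 <= pupil_radius_mm ** 2

uniform = inside.astype(float)
# Amplitude apodization: square root of the sensitivity eta(r) = 10**(-rho * r**2).
apodized = uniform * 10.0 ** (-0.5 * rho * r2)

# A hypothetical aberration (balanced spherical aberration, 0.4 micron) to blur the PSF.
rho2 = r2 / pupil_radius_mm ** 2
W_mm = 0.4e-3 * (6 * rho2 ** 2 - 6 * rho2 + 1)

def psf(pupil_amplitude):
    gp = pupil_amplitude * np.exp(-1j * 2 * np.pi / wavelength_mm * W_mm)
    p = np.abs(np.fft.fftshift(np.fft.fft2(gp))) ** 2
    return p / p.sum()

print("PSF peak, uniform pupil  :", psf(uniform).max())
print("PSF peak, apodized (SCE) :", psf(apodized).max())
```

Weighting the pupil amplitude this way attenuates the marginal rays, which is the apodization effect described in the captions of Figs. 19 and 20.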
Chromatic Sampling of the Retinal Image

There are three types of cone photoreceptors: long (L), middle (M), and short (S) wavelength-sensitive cones. These are arranged in a single plane on the retina, and therefore, chromatic sampling on the retina is worse than spatial sampling. The S cones comprise only about 5% of the cones and thus are poor at spatial tasks. Furthermore, there are no S cones in the fovea. Without S cones, the eye confuses the distinction between blue and yellow in a condition called tritanopia. For this reason, the region around the fovea that is devoid of S cones is called the tritanopic zone. In the periphery, the S cones approach a partially crystalline arrangement (141) to achieve relatively uniform coverage. L and M cones comprise the remaining 95% of the cones in the retina; recent measurements have shown that these cones are randomly distributed in the retina (142,143) (see Fig. 21). This arrangement leaves the eye locally color-blind over patches as large as 5 minutes of arc.
Figure 20. The MTFs in this plot show the spatial-frequency-dependent effects of pupil apodization via cone angular tuning. When angular tuning is incorporated in calculations of the PSF in the focal plane that has the highest Strehl ratio, the contrasts for spatial frequencies that the eye can see (60 c/deg and below) are enhanced and higher spatial frequencies are attenuated. Therefore, the retinal image quality for a large pupil is improved and, at the same time, aliasing effects due to the presence of spatial frequencies beyond the eye’s maximum detectable spatial frequency are reduced. The SCE used ρ = 0.047 mm−1, and PSF calculations were done over a 7-mm pupil for a single eye. Incidentally, similar calculations of the MTF in the focal plane that had the lowest rms aberration show less of an effect.

Figure 21. Pseudocolor images of the trichromatic cone mosaic in two human eyes. Blue, green, and red colors represent the short (S), middle (M), and long (L) wavelength-sensitive cones, respectively. The two human subjects have a more than threefold difference in the number of L vs. M cones (142), yet both eyes have essentially the same color vision (156). In both mosaics shown, the arrangement of the S, M, and L cones is essentially random (143). Scale bar: 5 arc min (23.5 µm). See color insert.
This effect goes largely unnoticed, but it is likely one of the factors that gives rise to the blotchy colored appearance of high-spatial-frequency gratings, a phenomenon called Brewster’s colors (144).

The Spectral Luminosity Function and Chromatic Aberration

The combination of cone photopigments, adaptation, absorption, and retinal wiring limits the spectral response to wavelengths that make up the visible spectrum. The
exact spectrum to which we are sensitive depends on light levels. Because rods saturate at moderate and high light levels, cones largely govern the spectral sensitivity in that region. Because the more numerous M and L cones also mediate luminance, the spectral sensitivity function is weighted toward their end of the spectrum. This spectrum is referred to as the photopic luminous efficiency function. For lower light levels, cones lose sensitivity and rods, which are no longer saturated, dominate the response. At the lowest light levels, the spectral sensitivity curve is called the scotopic luminous efficiency function. The narrow spectral range of both of these functions filters some of the effects of chromatic aberration. Recall that the amount of chromatic aberration in the eye is about 2.2 D from 400 to 700 nm. This degree of chromatic aberration would normally be deleterious for image quality, but its severity is lessened because the eye has a tuned spectral bandwidth. Thibos et al. illustrate this well in Fig. 22, showing that although the longitudinal chromatic aberration is 2 D, more than 70% of the luminous energy is confined to a defocus range of less than 0.25 D on either side of focus, provided that the eye is optimally focused at 550 nm (145). An analysis of the optical quality of the eye in frequency space using the MTF further demonstrates that chromatic aberration contributes surprisingly little to image degradation in the eye. By calculating the MTF as a function of chromatic defocus and generating the white light MTF as a sum of all visible wavelengths, weighted by the luminance spectrum, Thibos et al. found that the uncorrectable blur due to chromatic aberration is equivalent to a monochromatic defocus of less than 0.2 D for an eye that has a 2.5-mm pupil, a tolerable amount of defocus in a human eye (145). These predictions support the earlier work of Campbell and Gubisch, who found that contrast sensitivity improved by less than 0.2 log units across spatial frequencies of 10–40 cycles per degree when monochromatic light was used (146). For large pupils, the image degradation is expected to be greater, but the increased depth of focus that arises from the presence of monochromatic aberrations tends to reduce these deleterious effects (147).

Figure 22. The spectral luminosity function of the eye limits the deleterious effect of chromatic aberration. This figure shows that if 550-nm light is in focus, more than 70% of the energy detected in the white light spectrum is within ±0.25 D of defocus. One-quarter of a diopter is considered a tolerable defocus for a human eye under typical conditions [after Fig. 3 in (145)].

BIBLIOGRAPHY

1. G. B. Benedek, Appl. Opt. 10(3), 459–473 (1971).
2. J. Liang, B. Grimm, S. Goelz, and J. F. Bille, J. Opt. Soc. Am. A 11, 1,949–1,957 (1994).
3. L. N. Thibos and X. Hong, Optometry Vision Sci. 76(12), 817–825 (1999).
4. W. N. Charman and J. Cronly-Dillon, eds., Visual Optics and Instrumentation, CRC Press, Boca Raton, 1991, pp. 1–26.
5. H. C. Howland, J. Buettner, and R. A. Applegate, Vision Sci. Appl. Tech. Dig. 2, 54–57 (1994).
6. J. Schwiegerling and J. E. Greivenkamp, Optometry Vision Sci. 74(11), 906–916 (1997).
7. T. Oshika et al., Am. J. Ophthalmol. 127(1), 1–7 (1999).
8. K. M. Oliver et al., J. Refract. Surg. 13(3), 246–254 (1997).
9. R. A. Applegate and H. C. Howland, J. Refract. Surg. 13, 295–299 (1997).
10. R. A. Applegate et al., J. Refract. Surg. 16, 507–514 (2000).
11. E. Moreno-Barriuso et al., Invest. Ophthalmol. Vision Sci. 42(6), 1,396–1,403 (2001).
12. F. W. Campbell and D. G. Green, J. Physiol. 181, 576–593 (1965).
13. F. W. Campbell and R. W. Gubisch, J. Physiol. 186, 558–578 (1966).
14. J. Liang and D. R. Williams, J. Opt. Soc. Am. A 14(11), 2,873–2,883 (1997).
15. H. Helmholtz and J. P. C. Southall, eds., Helmholtz’s Treatise on Physiological Optics, Optical Society of America, Rochester, 1924.
16. A. Glasser and M. C. W. Campbell, Vision Res. 39, 1,991–2,015 (1999).
17. D. A. Atchison, Ophthal. Physiol. Opt. 15(4), 255–272 (1995).
18. A. Glasser and M. C. W. Campbell, Vision Res. 38(2), 209–229 (1998).
19. A. Duane, JAMA 59, 1,010–1,013 (1912).
20. J. R. Kuszak, B. A. Bertram, M. S. Macsai, and J. L. Rae, Scanning Electron Micros. 3, 1,369–1,378 (1984).
21. J. R. Kuszak, K. L. Peterson, J. G. Sivak, and K. L. Herbert, Exp. Eye Res. 59, 521–35 (1994).
22. T. Young, Philos. Trans. R. Soc. London 91, 23–88 (1801).
23. L. Matthiessen, Pflügers Archiv 27, 510–523 (1882).
24. J. C. Maxwell, Cambridge and Dublin Math. J. 8, 188–195 (1854).
25. M. C. W. Campbell, Vision Res. 24(5), 409–415 (1984).
26. R. H. H. Kröger, M. C. W. Campbell, and R. Munger, Vision Res. 34(14), 1,815–1,822 (1994).
27. B. K. Pierscionek and D. Y. C. Chan, Optometry Vision Sci. 66(12), 822–829 (1989).
28. M. C. W. Campbell and A. Hughes, Vision Res. 21, 1,129–1,148 (1981).
29. L. F. Garner and G. Smith, Optometry Vision Sci. 74(2), 114–119 (1997).
60. G. Westheimer, Vision Res. 22, 157–162 (1982).
30. D. A. Atchison and G. Smith, Optics of the Human Eye, Butterworth-Heinemann, Oxford, 2000.
62. M. Koomen, R. Tousey, and R. Scolnik, J. Opt. Soc. Am. 39, 370–376 (1949).
31. M. Tscherning and C. Weiland, eds., Physiologic Optics, 4th ed., Keystone Publishing Co., Philadelphia, 1924.
63. A. Ivanoff, J. Opt. Soc. Am. 46, 901–903 (1956).
32. C. Cui and V. Lakshminarayanan, J. Opt. Soc. Am. A 15(9), 2,488–2,496 (1998). 33. A. G. Bennett and R. B. Rabbetts, Clinical Visual Optics, 2nd ed., Butterworths, London, 1989.
61. E. Jackson, Trans. Am. Ophthalmol. Soc. 5, 141–149 (1888).
64. G. van den Brink, Vision Res. 2, 233–244 (1962).
65. F. Berny and S. Slansky, in J. Home Dickson, ed., Optical Instruments and Techniques, Oriel Press, London, 1969, pp. 375–385.
34. T. C. A. Jenkins, Br. J. Physiol. Opt. 20, 59–91 (1963).
66. J. Santamaria, P. Artal, and J. Bescos, J. Opt. Soc. Am. A 4, 1,109–1,114 (1987).
35. R. E. Scammon and M. B. Hesdorffer, Arch. Ophthalmol. 17, 104–112 (1937).
67. J. C. He, S. Marcos, R. H. Webb, and S. A. Burns, J. Opt. Soc. Am. A 15(9), 2,449–2,457 (1998).
36. R. A. Weale, Nature 198, 944–946 (1963).
68. I. Iglesias, E. Berrio, and P. Artal, J. Opt. Soc. Am. A 15(9), 2,466–2,476 (1998).
37. A. Guirao, M. Redondo, and P. Artal, J. Opt. Soc. Am. A 17(10), 1,697–1,702 (2000). 38. P. Artal, A. Guirao, and D. R. Williams, Invest. Ophthalmol. Vision Sci. Suppl. 40(4), 39 (1999). 39. H. C. Howland and B. Howland, J. Opt. Soc. Am. 67, 1,508–1,518 (1977).
69. T. Salmon, L. N. Thibos, and A. Bradley, J. Opt. Soc. Am. A 15(9), 2,457–2,465 (1998). 70. A. Guirao and P. Artal, Vision Res. 39, 207–217 (1999). 71. D. R. Williams et al., Vision Res. 36(8), 1,103–1,114 (1996).
40. M. S. Smirnov, Biophysics 6, 776–795 (1962).
72. J. A. M. Jennings and W. N. Charman, Vision Res. 21, 445–455 (1981).
41. M. C. W. Campbell, E. M. Harrison, and P. Simonet, Vision Res. 30, 1,587–1,602 (1990).
73. P. Artal, A. M. Derrington, and E. Colombo, Vision Res. 35(7), 939–947 (1995).
42. R. H. Webb, M. Penney, and K. P. Thompson, Appl. Opt. 31, 3,678–3,686 (1992).
74. W. Lotmar and T. Lotmar, J. Opt. Soc. Am. 64(4), 510–513 (1974).
43. R. Navarro and M. A. Losada, Vision Sci. Appl: Tech. Dig. 1, 230–233 (1996).
75. M. Millodot and A. Lamont, J. Opt. Soc. Am. 64, 110–111 (1974).
44. G. Walsh, W. N. Charman, and H. C. Howland, J. Opt. Soc. Am. A 1, 987–992 (1984).
76. F. Rempt, J. Hoogerheide, and W. P. H. Hoogenboom, Ophthalmologica 162, 1–10 (1971).
45. W. S. Stiles and B. H. Crawford, Proc. R. Soc. London B. 112, 428–450 (1933).
77. C. E. Ferree, G. Rand, and C. Hardy, Arch. Ophthalmol. 5, 717–731 (1931).
46. D. Malacara and S. L. DeVore, in D. Malacara, ed., Optical Shop Testing, 2nd ed., Wiley-Interscience, New York, 1996, pp. 455–499.
47. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, New York, 1968.
48. L. N. Thibos, R. A. Applegate, J. Schwiegerling, R. H. Webb, and VSIA Standards Taskforce Members, in V. Lakshminarayanan, ed., Standards for Reporting the Optical Aberrations of Eyes, Optical Society of America, Washington, DC, 2000, pp. 232–235.
49. M. A. Wilson, M. C. W. Campbell, and P. Simonet, Optometry Vision Sci. 69, 129–136 (1992).
78. P. Artal, S. Marcos, R. Navarro, and D. R. Williams, J. Opt. Soc. Am. A 12, 195–201 (1995).
79. A. Roorda and M. C. W. Campbell, Invest. Ophthalmol. Vision Sci. Suppl. 35, 1,258 (1994).
80. R. Navarro, E. Moreno, and C. Dorronsoro, J. Opt. Soc. Am. A 15(9), 2,522–2,529 (1998).
81. A. Roorda and W. R. Bobier, J. Opt. Soc. Am. A 13, 3–11 (1996).
82. P. Artal and R. Navarro, Appl. Opt. 31, 3,646–3,656 (1992).
83. D. A. Atchison et al., Vision Res. 35, 313–323 (1995).
50. G. Walsh, Ophthal. Physiol. Opt. 8, 178–181 (1988).
84. C. Lu, M. C. W. Campbell, and R. Munger, Ophthalmic Visual Opt. Tech. Dig. 3, 160–163 (1994).
51. H. J. Wyatt, J. Ocular Pharmacol. Ther. 12(4), 441–459 (1996).
85. J. C. He, S. A. Burns, and S. Marcos, Vision Res. 40(1), 41–48 (2000).
52. R. Bracewell, The Fourier Transform and its Applications, McGraw-Hill, New York, 1965.
86. A. Glasser and M. C. W. Campbell, Vision Sci. Appl.: Tech. Dig. 1, 246–249 (1996).
53. A. Guirao and D. R. Williams, Invest. Ophthalmol. Vision Sci. Suppl. 42(4), 98 (2001).
87. H. Hofer, J. Porter, and D. R. Williams, Invest. Ophthalmol. Vision Sci. Suppl. 39(4), 203 (1998).
54. D. C. Knill, D. Field, and D. Kersten, J. Opt. Soc. Am. A 7(6), 1,113–1,123 (1990).
88. M. Mrochen, M. Kaemmerer, and T. Seiler, J. Refract. Surg. 16, 116–121 (2000).
55. D. J. Field and N. Brady, Vision Res. 37(23), 3,367–3,383 (1997).
89. P. Artal and A. Guirao, Opt. Lett. 23, 1,713–1,715 (1998).
56. S. Marcos, S. A. Burns, E. Moreno-Barriuso, and R. Navarro, Vision Res. 39, 4,309–4,323 (1999).
57. P. Simonet and M. C. W. Campbell, Vision Res. 30(2), 187–206 (1990).
90. H. Hofer, P. Artal, J. L. Aragon, and D. R. Williams, J. Opt. Soc. Am. A 18(3), 497–506 (2001). 91. P. Artal, A. Guirao, E. Berrio, and D. R. Williams, Invest. Ophthalmol. Vision Sci. Suppl. 42(4), 895 (2001)
58. L. N. Thibos et al., Vision Res. 30(1), 33–49 (1990).
92. T. Salmon and L. N. Thibos, [Abstract] OSA Annu. Meet. Tech. Dig., p. 70 (1998).
59. M. Rynders, B. Lidkea, W. Chisholm, and L. N. Thibos, J. Opt. Soc. Am. A 12(10), 2,348–2,357 (1995).
93. D. Bartsch, G. Zinser, and W. R. Freeman, Vision Sci. Appl.: Tech. Dig., pp. 134–137 (1994).
94. J. C. He et al., Myopia 2000 Meet., Boston, July, 2000; S. Marcos et al., ibid.; X. Cheng et al., ibid.; M. C. W. Campbell et al., ibid.
127. J. Sjöstrand, N. Conradi, and L. Klarén, Graefe's Arch. Clin. Exp. Ophthalmol. 232, 432–437 (1994).
128. C. A. Curcio, K. R. Sloan, R. E. Kalina, and A. E. Hendrickson, J. Comp. Neurol. 292, 497–523 (1990).
95. M. J. Collins, C. F. Wildsoet, and D. A. Atchison, Vision Res. 35, 1,157–1,163 (1995).
129. G. Westheimer, J. Physiol. 152, 67–74 (1960).
96. D. Whitaker, R. Steen, and D. B. Elliot, Optometry Vision Sci. 70(11), 963–968 (1993).
130. M. A. Arnulf and M. O. Dupuy, C. R. Acad. Sci. (Paris) 250, 2,757–2,759 (1960).
97. T. J. van den Berg, Invest. Ophthalmol. Vision Sci. 38(7), 1,321–1,332 (1997).
131. D. R. Williams, Vision Res. 28, 433–454 (1988).
98. J. J. Vos, CIE J. 3(2), 39–53 (1984). 99. W. S. Stiles, Proc. R. Soc. London B 104, 322–355 (1929).
132. D. R. Williams, D. Brainard, M. J. McMahon, and R. Navarro, J. Opt. Soc. Am. A 11, 3,123–3,135 (1994).
100. L. L. Holladay, J. Opt. Soc. Am. 12, 271–319 (1926).
133. R. A. Applegate and V. Lakshminarayanan, J. Opt. Soc. Am. A 10(7), 1,611–1,623 (1993).
101. M. L. Hennelly, J. L. Barbur, D. F. Edgar, and E. G. Woodward, Ophthalmol. Physiol. Opt. 18(2), 197–203 (1998).
134. J.-M. Gorrand, Vision Res. 19, 907–912 (1979).
102. J. K. Ijspeert, P. W. de Waard, T. J. van den Berg, and P. T. de Jong, Vision Res. 30(5), 699–707 (1990).
136. J.-M. Gorrand, R. Alfieri, and J.-Y. Boire, Vision Res. 24, 1,097–1,106 (1984).
103. D. van Norren and J. J. Vos, Vision Res. 14, 1,237–1,244 (1974). 104. E. A. Boettner and J. R. Wolter, Invest. Ophthalmol. Vision Sci. 1(6), 776–783 (1962). 105. G. L. Savage, G Haegerstrom-Portnoy, A. J. Adams, and S. E. Hewlett, Clin. Vision. Sci. 8(1), 97–108 (1993). 106. J. Pokorny, V. C. Smith, and M. Lutze, Appl. Opt. 26(8), 1,437–1,440 (1987). 107. F. C. Delori and S. A. Burns, J. Opt. Soc. Am. A 13(2), 215–226 (1996). 108. D. M. Snodderly, R. S. Weinhaus, and J. C. Choi, J. Neuroscience 12(4), 1,169–1,193 (1992). 109. O. Sommerburg, J. E. Keunen, A. C. Bird, and F. J. van Kuijk, Br. J. Ophthalmol. 82(8), 907–910 (1998). 110. P. Pease, A. J. Adams, E. Nuccio, Vision Res. 5, 705–710 (1973). 111. D. M. Snodderly, J. D. Auran, and F. C. Delori, Invest. Ophthalmol. Vision Sci. 25(6), 674–685 (1984). 112. B. R. J. Hammond, B. R. Wooten, and D. M. Snodderly, Vision Res. 36(18), 3,003–3,009 (1996). 113. B. R. J. Hammond, B. R. Wooten, and D. M. Snodderly, J. Opt. Soc. Am. A 14(6), 1,187–1,196 (1997). 114. J. T. Landrum, R. A. Bone, and M. D. Kilburn, Adv. Pharmacol. 38, 537–556 (1997). 115. B. R. J. Hammond, B. R. Wooten, and D. M. Snodderly, Invest. Ophthalmol. Vision Sci. 39, 397–406 (1998). 116. D. M. Snodderly, P. K. Brown, F. C. Delori, and J. D. Auran, Invest. Ophthalmol. Vision Sci. 25(6), 660–673 (1984).
135. J.-M. Gorrand, Ophthalmol. Physiol. Opt. 9, 53–60 (1989).
137. G. J. van Blokland, Vision Res. 26, 495–500 (1986). 138. S. A. Burns, S. Wu, F. C. Delori, and A. E. Elsner, J. Opt. Soc. Am. A 12, 2,329–2,338 (1995). 139. D. A. Atchison, D. H. Scott, A. Joblin, and G. Smith, J. Opt. Soc. Am. A 18(6), 1,201–1,211 (2001). 140. X. Zhang, M. Ye, A. Bradley, and L. N. Thibos, J. Opt. Soc. Am. A 16(4), 812–820 (1999). 141. K. Bumstead and A. Hendrickson, J. Comp. Neurol. 403, 502–516 (1999). 142. A. Roorda and D. R. Williams, Nature 397, 520–522 (1999). 143. A. Roorda, A. B. Metha, P. Lennie, and D. R. Williams, Vision Res. 41(12), 1,291–1,306 (2001). 144. Brewster D., London Edinboro Philos. Mag. J. Sci. 1, 169–174 (1832). 145. L. N. Thibos, A. Bradley, and X. X. Zhang, Optometry Vision Sci. 68(8), 599–607 (1991). 146. F. W. Campbell and R. W. Gubisch, J. Physiol. 192, 345–358 (1967). 147. G. Yoon, I. Cox, and D. R. Williams, Invest. Ophthalmol. Vision Sci. Suppl. 40(4), 40 (1999). 148. S. A. Strenk et al., Invest. Ophthalmol. Vision Sci. 40(6), 1,162–1,169 (1999). 149. G. Wald and D. R. Griffin, J. Opt. Soc. Am. 37, 321–336 (1947). 150. R. E. Bedford and G. Wyszecki, J. Opt. Soc. Am. 47, 564–565 (1957). 151. A. Ivanoff, C. R. Acad. Sci. 223, 170–172 (1946).
117. A. Stanworth and E. Naylor, Br. J. Ophthalmol. 34, 201–211 (1950).
152. I. Iglesias, N. López-Gil, and P. Artal, J. Opt. Soc. Am. A 15(2), 326–339 (1998).
118. G. J. van Blokland and S. C. Verhelst, J. Opt. Soc. Am. A 4, 82–90 (1987).
153. R. I. Calver, M. J. Cox, and D. B. Elliot, J. Opt. Soc. Am. A 16(9), 2,069–2,078 (1999).
119. C. C. D. Shute, Nature 250, 163–164 (1974).
154. G. Østerberg, Acta Ophthalmologica Suppl. 6, 1–103 (1935).
120. D. J. Donohue, B. J. Stoyanov, R. L. McCally, and R. A. Farrel, Cornea 15(3), 278–285 (1996).
155. J. Liang, D. R. Williams, and D. Miller, J. Opt. Soc. Am. A 14(11), 2,884–2,892 (1997).
121. D. J. Donohue, B. J. Stoyanov, R. L. McCally, and R. A. Farrel, J. Opt. Soc. Am. A 12(7), 1,425–1,438 (1995).
156. D. H. Brainard et al., J. Opt. Soc. Am. A 17(3), 607–614 (2000).
122. J. M. Bueno and P. Artal, J. Opt. Soc. Am. A 18(3), 489–496 (2001).
157. M. Born and E. Wolf, Principles of Optics, Pergamon Press, New York, 1980.
123. J. M. Bueno, Vision Res. 40(28), 3,791–3,799 (2000).
158. L. J. Bour and N. J. Lopes Cardozo, Vision Res. 21, 1,413–1,421 (1981).
124. F. A. Bettelheim, Exp. Eye Res. 21, 231–234 (1975). 125. D. G. Cogan, Arch. Ophthalmol. 25(3), 391–400 (1941). 126. C. A. Curcio and K. A. Allen, J. Comp. Neurol. 300, 5–25 (1990).
159. D. Kaplan and F. A. Bettelheim, Exp. Eye Res. 13, 219–226 (1972). 160. D. Post and G. E. Gurland, Exp. Eye Res. 5, 286–295 (1966).
HUMAN VISUAL SYSTEM — SPATIAL VISUAL PROCESSING

KAREN K. DE VALOIS
RUSSELL L. DE VALOIS
University of California, Berkeley
Berkeley, CA
INTRODUCTION The human visual system is a remarkably versatile and powerful spatial image processor. The eye itself is only a small part of the visual mechanism. The task of analyzing and comprehending the visual world is so complex that the brain devotes between one-third and one-half of the entire cerebral cortex to solving it: all of the occipital lobe and large parts of the parietal, temporal, and frontal lobes as well. To appreciate the magnitude of the task of visual processing, it is useful to consider some of the problems that the visual system must address in the course of analyzing visual images. Some Initial Problems Faced by the Visual System Objects Do Not Emit Light. One set of initial problems faced by the visual system results from the fact that most naturally occurring objects do not themselves emit light; rather, they merely reflect some proportion of whatever light is incident upon them. Each photoreceptor receives a certain number of photons coming from a given direction in space, but that number is the product of the intensity of the illuminant (the light source) and the reflectance of the object at that location. Somehow the visual system must disentangle the image characteristics due to the object of interest from those due to the illuminant. Given no independent knowledge about the intensity of the illuminant, this is one equation that has two unknowns and is formally unsolvable. This is not a trivial problem because the intensity of illumination varies enormously in the course of a day and also between sunny and shaded regions within a given scene. Knowing that 1,000 photons came from a given direction, for instance, does not in any way allow one to know whether the object out there is black or white or something in between. Mismatch of Light Levels Versus the Dynamic Range of Neurons. A related problem is that there is a large mismatch between the very limited dynamic range over which individual neurons can signal and the huge range of luminances in the visual world. In the course of 24 hours, the intensity of light coming from a given outdoor surface may vary by a factor of a billion or more, but the receptors have a more limited dynamic range, and the neurons that carry information to the brain are even more limited; they can signal a range of at most 40 to 1 in spikes per 200 ms interval, the duration of the average fixation. What makes this problem tractable is that almost all of the huge range of light intensities is due to variations in the illuminant, which are largely irrelevant for vision, rather than to variations in the reflectance of visual objects. At a given illumination level, the whitest object reflects only about
20 times as much as the blackest object, so information about object reflectance could be encoded neurally if the enormous variations due to the illuminant could somehow be eliminated or at least minimized.

Some Solutions to These Initial Problems

There are three main mechanisms within the eye for dealing with these problems related to the illuminant. All involve minimizing the (presumed) contribution of the illuminant so as to capture the small amount of information in the visual image that is due to reflective objects. All operate on what we can consider a basic assumption made by the visual system, acquired in the course of evolution: that the mean light level averaged across (limited regions in) space and time is probably due to the illuminant and that only spatial and temporal variations around this mean level carry information about objects. One mechanism for discounting the illuminant is the pupil of the eye, which constricts as the mean light level increases and dilates as it decreases, thereby cutting down the total range of variations in the amount of light that enters the eye by a factor of about 16. A second mechanism for dealing with the problem of the dynamic range is the possession (by diurnal animals) of two separate, interlaced visual systems, the rods and cones and their associated neural processing mechanisms. The rod system operates at low intensities and the cone system at high intensities, thus dividing the intensity range between them. A third, and by far the most important, way the visual system minimizes the contribution of the illuminant is by multiple stages of adaptation that adjust the operating range of the neural processing system as light levels change. As a consequence, higher processes in the visual system only have to deal with a small portion of the large range of possible light intensities at any given time.

The information available in the retinal image is not sufficient to allow an unambiguous solution to these problems of the illuminant or others that we discuss later. The visual nervous system must, in effect, make assumptions and guesses at each level of processing concerning the object characteristics most likely to have produced the image in question. Within the eye, most of the "assumptions", such as those discussed before, are built into the system as it develops under genetic control, having been incorporated through evolution. Many other "assumptions" about the visual world that are needed for object processing are developed through our individual experience in the visual world and are modifiable. The higher level computations are adjusted and refined by massive feedback loops at every central processing level. Given the degree to which the visual image is underspecified, it is not surprising that we occasionally make errors of perception, some of which are quite dramatic.

Optics, Filtering, Contrast Sensitivity, and Acuity

The optical system of the eye refracts and focuses light from different directions in visual space to form a two-dimensional image of the three-dimensional external world on the retina, the thin sheet of photoreceptors and neural processing cells that lines the posterior inner
surface of the eyeball. Because the process of image formation by the eye is discussed in the section on Optical Image Formation, we will not consider it further here, except to mention briefly that the eye's optical system acts as both a spatial and a chromatic filter.

In considering the consequences of image formation by the eye's optical system, it is useful to use the tools of linear systems analysis. The visual stimulus can be described in the space domain by quantifying the amount of light present at every discrete point in the image, or it can be described in the two-dimensional spatial frequency domain by identifying the frequency, amplitude, phase, and orientation of the sinusoidal variations in intensity across space into which the pattern can be analyzed (1). In the context of the visual system, spatial frequency is specified in cycles per degree of visual angle (c/deg), where the visual angle refers to the number of degrees of arc subtended at the eye. If it were possible to measure the modulation transfer function (MTF, the extent to which sine waves of different frequencies are attenuated in passing through the system) of the eye's optical system directly, it would be seen that high spatial frequency information is markedly attenuated by the optics, but low spatial frequencies are affected very little. The common optical aberrations of the eye selectively reduce the amplitude of high spatial frequencies. Thus, the retinal image in most cases will differ from the external scene it represents; very fine spatial detail in the external scene may not be faithfully reproduced on the retina. In a normal observer, spatial frequencies higher than about 30–60 c/deg are not passed, and spatial frequencies that high are present in the retinal image only under optimal conditions. As the light level falls, the increasing diameter of the pupil increases optical aberrations and thus further degrades the high-spatial-frequency content of the retinal image. High spatial frequencies are also selectively attenuated by movement of the eye, when one is walking or driving and even when one attempts to hold the eyes still. The eyeball is unstably mounted in its orbit and cannot be held perfectly still. In addition to the information loss produced by compressing a three-dimensional visual scene into a two-dimensional image, the spatial information available to the visual system is compromised by the eye's optical system, its mounting, and by bodily movement.

The visual system as a whole is complex and decidedly nonlinear in some respects, but in many circumstances, its behavior approaches that of a linear system. Under those conditions, the tools of linear systems analysis can also be profitably applied to the behavior of the entire system. A good example is the spatial contrast sensitivity function, an analog to the modulation transfer function (MTF). To be strictly comparable to the MTF, a visual measure of perceived contrast would be required, but that is problematic. A widely used substitute measure is based on the assumption that the effective contrasts of all patterns are equal at their detection thresholds, that is, that determining the minimum detectable contrast for a series of patterns identifies contrasts that are equal in their effects on the visual system. In practice, the minimum detectable contrast is first determined for sinusoidal luminance-varying patterns of each of many spatial frequencies. The inverse of this minimum detectable (or threshold) contrast is then used to define a contrast sensitivity function (CSF). The spatial luminance CSF can be strongly affected by the mean light level, the retinal position at which it is measured, and the temporal frequency at which the pattern is presented. For most imaging applications, however, it is reasonable to consider the CSF obtained at daylight (photopic) light levels when the pattern itself is stationary but presented only briefly, the observer has free eye movements, and the pattern is viewed directly (i.e., it is presented at or near the retinal fovea). Under those conditions, the spatial CSF is a band-pass function of spatial frequency, and contrast sensitivity peaks at about 2–5 c/deg. The maximum contrast sensitivity under these conditions is often on the order of 200, which represents a minimum detectable contrast of around 0.005, or 0.5%, where contrast is defined as C = (Lmax − Lmin)/(Lmax + Lmin), Lmax being the maximum luminance of the pattern and Lmin the minimum luminance; contrast is often expressed as a percentage. Figure 1 shows examples of CSFs measured for two normal human observers tested at low photopic light levels. It is customary to plot both spatial frequency and contrast sensitivity on logarithmic scales.

Figure 1. The spatial contrast sensitivity function for two normal human observers tested at a low photopic luminance level. The reciprocals of the threshold contrast required for detecting sine wave gratings of various spatial frequencies are plotted. (Axes: spatial frequency in c/deg versus contrast sensitivity, 1/threshold contrast, both on logarithmic scales.)

Note that although contrast sensitivity is reasonably high in the middle spatial frequency range, it falls off rapidly at both higher and lower spatial frequencies. For most observers, the highest spatial frequency that can be detected when the pattern is normally imaged by the eye's optical system does not exceed about 60 c/deg (see earlier). For many
individuals, it is lower still. The loss of contrast sensitivity at high spatial frequencies is attributable in part to the optics of the eye and in part to sampling limitations imposed by the density of photoreceptors within the retina, as discussed later in the section on Photoreceptors: Factors Limiting Resolution. In part, it may also result from sampling limitations imposed by later neural elements. The fall-off in sensitivity at high spatial frequencies is not unexpected in any comparable system. The reduction in contrast sensitivity at lower spatial frequencies, however, cannot be accounted for either by optical system limitations or by the characteristics of the sampling photoreceptor mosaic. It is produced by neural interactions involving center-surround receptive fields (and later oriented receptive fields) such as those discussed in the sections on Anatomy and Physiology of Retinal Processing and on Selectivities later. In effect, it means that the visual system is insensitive to very gradual changes across space, as it is also to very finely spaced variations.

The high-spatial-frequency cutoff defines the resolution limit of the visual system. In a 60 c/deg square-wave grating, two nearest-neighbor black (or white) stripes are separated by 0.5 arcmin, which is the smallest gap that can be visually resolved by an observer using the optical system of the eye. Clinical measures of spatial visual function typically estimate this value and use it to characterize the system. Clinically, a resolution limit of 1 arcmin (equivalent to a high-frequency cutoff of 30 c/deg) is considered normal, though many observers exceed that. It may seem arbitrary and unfortunate to limit clinical measures of spatial vision to determining just the resolution limit rather than the whole spatial frequency range, but in most cases it is sensible. Of the three factors that normally limit best visual acuity — optics, photoreceptor sampling, and neural processing — only the optical system can be corrected, and defects in the optics only produce losses at high spatial frequencies. Furthermore, refractive errors account for the great majority of all spatial vision deficits among normal observers. Thus, in most cases there is little to be gained from the time-consuming determination of the whole spatial CSF in the office of a busy clinician.

The CSF is similar to an MTF for the entire visual system, but it does not reflect the filtering characteristics of the many individual mechanisms that together determine its shape. Rather, the CSF represents the envelope of the sensitivity functions of many more narrowly tuned mechanisms (often called channels). Each channel is band-pass for both spatial frequency and orientation, and thus for two-dimensional spatial frequency. In the following sections, we discuss the physiological organization responsible and the psychophysical tests that demonstrate its characteristics. The CSF, in combination with knowledge of the underlying channel organization, can be used to predict whether any arbitrarily selected pattern will be visible and whether or not two complex patterns are discriminable. For example, a square wave grating will first be detected when one of its component frequencies exceeds its own individual detection threshold (2). The overall contrast of the square wave is not the determining factor; rather, it is
the amplitude of the individual frequency components that make it up. Similarly, a square wave grating and a sine wave grating of the same intermediate spatial frequency will appear different only when the contrast is high enough that the third harmonic of the square wave reaches its own detection threshold. The optical system of the eye acts as a chromatic filter as well as a spatial filter. As the eye ages, the crystalline lens becomes progressively darker and less transparent and absorbs more and more of the light that reaches it. Because the lens selectively absorbs short-wavelength light, the spectral content of the retinal image is gradually shifted toward longer wavelengths as the lens ages. As a result, individuals who have a lens removed because of a cataract often notice a significant shift toward blue in the colors of familiar objects. VISUAL PROCESSING IN THE EYE Photoreceptors Rods and Cones. Once light reaches the retina, some of the photons are captured by photopigments in the photoreceptors and transduced into neural activity. The retina of each eye contains about 100 million rod photoreceptors and 3–5 million cones (3). Rods themselves are about four times as sensitive to light as cones (4), and the whole rod system is many times more sensitive than the cone system. Rods mediate vision at very low (scotopic) light levels. They all contain the same photopigment, rhodopsin, which absorbs light across the entire visible spectrum but is most sensitive to light in the middle of the visible spectrum; it has a peak sensitivity at about 500 nm. Cones, on the other hand, are divided into three types, according to which of the classes of photopigments they contain. The L (long-wavelength-sensitive) cones contain one of a class of photopigments that are maximally sensitive at 555–565 nm, the M (middle-wavelength) cones at 530–540 nm, and the S (short-wavelength) cones at 440 nm (5). Although each class of cone photopigments has a maximally sensitive range, all three have broad spectral sensitivity; the L and M cone pigments are sensitive across the whole visible wavelength range. There is variation in the human population in which of the L and M class cone pigments a given individual may possess, and thus considerable variation among individuals in their color vision. Rods, but not cones, saturate and thus become nonfunctional at high (photopic) light levels, so cones are responsible for vision under most daylight viewing conditions. Although there are many fewer cones than rods, most of the retinal neural processing in diurnal animals is devoted to the outputs of cones, and the cone system is responsible for the finest spatial and temporal resolution. Rods and the three cone types differ in their photopigments and also in their distribution across the retina. The center of the fovea, a region about 1.0° visual angle in diameter near the center of the retina where spatial acuity is highest, contains no rods and few S cones. In the very center of this small region (the foveola), the L and M cones have a very small diameter
(about 1 µm) and their density is at its highest, about 140,000 cones/mm2 (3). The cones increase in diameter and their density falls very rapidly as distance from the foveola increases. From the edge of the fovea out to the far periphery, the cones increase in diameter (up to 15 µm in the far periphery) and are also separated from each other by the far more numerous rods. Across small distances in the foveola, L and M cones are packed in fairly regular hexagonal arrays. As the distance from the center of the fovea increases, however, rods and S cones begin to intrude and disorder the array. The S cones constitute about 10% of the cone population and increase rapidly in density away from the center of the fovea, reaching a peak at about 1° ; then, they decrease in number slowly through the retinal periphery. Rods are very small in diameter, about the same size as foveal cones. Their distribution across the retina is markedly different from that of cones. From their absence in the foveola, they increase in density as eccentricity increases, reach a peak at about 20° , and then gradually fall in density. Factors Limiting Resolution. The size and distribution of the photoreceptors, like the eye’s optical system, can be limiting factors in spatial resolution. The retina is, of course, not a continuous sheet of photosensitive material; rather, the individual photoreceptors discretely sample the retinal image, and the sampling resolution depends on the classes of receptors involved and the retinal eccentricity. Insofar as photoreceptors converge onto later neural elements, however, these later elements, the bipolar and ganglion cells, rather than the photoreceptors, set the resolution limit of the system. Under scotopic conditions, when only the rods are active, photoreceptor spacing does not limit resolution because of the very substantial neural pooling that follows. In the central 10° of the visual field under photopic conditions, however, the sampling limit set by the cone distribution is a significant factor. In the foveola, the Nyquist limit, the highest spatial frequency that can be adequately sampled by the array of tiny-diameter, tightly packed L and M cones is about 60 c/deg, similar to the limit imposed by the eye’s optics. This is as it should be to avoid aliasing, the artifactual production of low-frequency information from unresolved high spatial frequencies. Thus, it is not surprising to note that the highest spatial frequency that can be resolved by a human observer who has excellent vision under optimal conditions is about 60 c/deg (the official ‘‘normal’’ 20/20 visual acuity in American terminology actually corresponds to a resolution of 30 c/deg). Functional Mechanisms. When a receptor absorbs light, it produces a neural signal proportional to the number of photons absorbed, regardless of the wavelengths of the photons. This single value — in effect, one number — is the only information that a given photoreceptor can transmit. Because the receptor pigments are broadly (although not uniformly) sensitive across the spectrum, a given receptor signal can result from a bright light of a wavelength to which the receptor is relatively insensitive or a dimmer light of a wavelength to which the receptor is highly sensitive, or anything in between. Information about
wavelength and intensity, thus, is inextricably confounded within each receptor. Only by later neural processing involving comparisons between nearby cones containing different photopigments can the two kinds of information be separated (6). The main functions of the retina are to transduce light energy into neural activity and to transform the information into a usable form. The transduction of light energy into the flow of ions across nerve membranes, the language of the nervous system, takes place in the outer segments of the receptors. The capture of a single photon of light by a uniquely sensitive photopigment, rhodopsin (or the similar cone opsins), turns rhodopsin into an enzyme and initiates a cascade of chemical reactions within the cell. The cell membrane is maintained in a partially depolarized state at any illumination level. Variations in the chemical cascade produced by momentary increases or decreases in the amount of light absorbed produce momentary changes in polarization of the receptor outer membrane. A decrement in light absorbed results in depolarization of the membrane; an increment in light absorbed results in hyperpolarization of the membrane. The resulting current flow is conducted to the synaptic region of the receptor, where a synaptic transmitter is secreted in proportion to the net amount of depolarization. The cascade of chemical reactions within the outer segment, with two enzymatic sites provides the amplification to bridge the large energy gap between a single photon and the flow of ions across a membrane. It also has other effects directly relevant to spatial (and temporal) vision. It is a relatively slow process spread out across several milliseconds. This introduces a significant delay into the visual response, but it also provides for useful temporal integration that results in an increase in sensitivity. The resulting ‘‘smearing’’ in time filters out high temporal frequencies which, in the case of vision (as opposed to audition), are not informative and are mainly noise due to quantum fluctuations. Few objects in the natural world produce visually important variations in light at high temporal frequencies. Temporal information in vision is primarily useful for detecting motion and that can be accomplished without sensitivity to high temporal frequencies. The chemical cascade within the receptor outer segment also contributes to the solution of the problem discussed before of separating the signal due to reflective objects from that due to the illuminant by attenuating low temporal frequencies. Although we cannot know the size of the signal from the illuminant, there are certain differences between visual objects and illuminants that the visual system exploits to deal with this problem (although not without occasional large errors). The visual system correctly makes the assumption that light from the illuminant tends to be uniform over large spatial and temporal extents, whereas reflective objects of visual interest usually vary across space and also across time as we move our eyes. Therefore, the visual system, in effect, treats any image characteristic that is largely invariant in space and time as if it were due to the illuminant and ignores it, whereas aspects of the image that change more rapidly in space and time are considered to come from reflective objects and are further analyzed. Any
process that strongly attenuates very low spatial and temporal frequencies will selectively attenuate signals from the illuminant, thereby allowing the visual system to detect and process the small variations due to objects by minimizing the contribution of larger variations in light intensity due to the illuminant (1). The attenuation of very low temporal frequencies starts within the photoreceptors themselves. A feedback circuit in the receptor outer segment reduces responses to steady illumination and passes responses to transient increments and decrements (7). Functionally similar interactions occur at later neural levels as well, so that most cortical cells respond poorly to any unchanging stimulus. The attenuation of very low spatial frequencies is also a multistage process, and it starts at the first synaptic level, as we discuss later. What is required to deal with the vast, but largely irrelevant, changes in light level during a day is not only low temporal and spatial frequency attenuation, but variations in sensitivity with light level as well. Light and dark adaptation refer to the processes by which the visual system adjusts its sensitivity as the illumination level goes up and down. A white object might reflect 80% of incident light, and a slightly grayer one 70%. In bright daylight, there might be 10,000,000 incident photons/s/unit area, so these two surfaces would reflect 8,000,000 and 7,000,000 photons/s/unit area, respectively. However, on a very dark night, the illumination might be only 10 photons/s/unit area, and then the two surfaces would reflect only 8 and 7 photons, respectively. To distinguish equally well between the two surfaces under these dim conditions, the visual system would need to be 1,000,000 times more sensitive. Furthermore, if the visual system did not change its sensitivity as illumination levels changed, objects would not maintain constant appearance as light levels vary in the course of a day. Much of the change in sensitivity required to maintain object constancy takes place within the receptors themselves. The decrease in sensitivity from dark to light is fast, but the reverse change can be very slow (as one notices on entering a dark theater from a brightly lit street). The breakdown of a photopigment molecule when a photon is captured is fast but the regeneration of the photopigment is very slow, and the presence of unregenerated photopigment molecules sends a signal that slows the increase in sensitivity when one suddenly enters a dark environment. Retinal Processing Retinal Problems. To understand the nature of the processing that takes place within the retina, one needs to consider two additional problems faced by the visual system. One is that neurons are very slow compared, for instance, with the processing speed of computers, and images have to be rapidly analyzed and recognized to deal with the world in real time. The only way that the immense amount of information in the visual image can be analyzed in the few milliseconds available is by massive parallel processing. There are two types of parallel processing that operate both in the retina and at later cortical levels. One, that we may term spatial parallelism,
consists of multiple cells at a given processing level; each simultaneously does the same analysis on different small portions of the image. For instance, there are about 100,000 of one type of ganglion cells (the parasol ganglion cells discussed later); each identifies luminance variations within a different small retinal region. There is also functional parallelism, in which there are separate processing channels within which cells carry out different types of analyses on the same small portion of the image to extract different types of information from a given part of the local image (e.g., its color versus its luminance contrast). A second problem that faces the visual system at the retinal level is the serious bottleneck in the visual path formed as a consequence of the fact that the retina is inverted in the vertebrate eye: the photoreceptors lie at the very back of the eye, and all of the retinal circuitry is in front of them. Consequently, the ganglion cell axons, which carry the outputs of the retinal processing to the brain, are inside the eye. For these axons to exit the eye (which they do at about 15° from the center on the nasal side), they must necessarily produce a large hole in the receptor layer of the retina. There are more than 100 million receptors in each eye, and 2,000 million or more cells in the cortex that process their output, but each optic nerve contains only 1 million axons and thus is a serious bottleneck in the processing path. Even this relatively small number of fibers exiting the eye produces a blind spot large enough to encompass some 50 nonoverlapping images of the moon. If there were even more than a million ganglion cells, the loss of vision would be great. To cope with this optic nerve bottleneck, the emphasis in retinal processing must be on compressing and efficiently encoding the visual information, rather than on carrying out a detailed analysis of it. That is done later by the multiplicity of cells in the cortex. Retinal Solutions. It can be seen that these two problems, the need for parallel processing to overcome neural slowness and the need to compress the information one-hundredfold from receptors to ganglion cells to deal with the optic nerve bottleneck, place contradictory demands on the visual system. The retinal organization deals with this problem by treating different regions of the retina quite differently. In the central retina — the region of the fovea, on which the image of whatever we fixate falls — the system is highly parallel. The output of each central cone is processed individually, and each cone feeds down multiple paths, so that there are about three times as many ganglion cells in this region as cones. This, of course, just exacerbates the problem of the optic nerve bottleneck, which the system deals with by a large convergence of cones onto ganglion cells in the far periphery (and throughout the retina for the rods). As a result, we gain relatively little detailed information about the world far away from the fixation point — just enough to detect large objects and anything that moves. We can then gain more information about some potentially important peripheral region by moving our eyes to fixate that region, putting that part of the image onto the central retina that has superior processing capabilities.
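To make the scale of this bottleneck concrete, the arithmetic sketched below simply combines the approximate counts quoted in this article (about 100 million rods, 3–5 million cones, and roughly 1 million optic nerve fibers per eye, with each central cone feeding about three ganglion cells). The fraction of cones treated as "central" is an illustrative assumption, not a measured value.

# Back-of-the-envelope arithmetic for the optic nerve bottleneck,
# using the approximate counts quoted in the text. The 2% "central
# cone" fraction is an illustrative assumption.

RODS = 100_000_000               # ~100 million rods per eye
CONES = 4_000_000                # text gives 3-5 million cones per eye
OPTIC_NERVE_FIBERS = 1_000_000   # ~1 million ganglion-cell axons

# Overall compression from receptors to optic nerve fibers.
overall = (RODS + CONES) / OPTIC_NERVE_FIBERS
print(f"receptor-to-fiber compression: ~{overall:.0f}:1")

# In the central retina, each cone drives about three ganglion cells.
central_cones = int(0.02 * CONES)     # assumed central fraction
central_fibers = 3 * central_cones

# The remaining fibers must carry everything else, so the average
# convergence onto each peripheral fiber is correspondingly large.
peripheral_fibers = OPTIC_NERVE_FIBERS - central_fibers
peripheral_receptors = RODS + (CONES - central_cones)
print(f"average peripheral convergence: "
      f"~{peripheral_receptors / peripheral_fibers:.0f} receptors per fiber")

The point of the sketch is simply that the roughly hundredfold compression mentioned above cannot be uniform: an expanded, nearly one-to-one representation of the central retina forces very heavy pooling in the periphery.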
Anatomy and Physiology of Retinal Processing. The receptor output is processed at two synaptic levels in the retina and one in the lateral geniculate nucleus (LGN) of the thalamus before arriving at the primary visual cortex (V1). The general anatomical arrangement in the retina consists of two paths: a direct path from receptors to bipolars to ganglion cells and indirect paths formed by lateral connections made by horizontal and amacrine cells that bring in information about neighboring regions at each of the two synaptic levels. At each stage, the outputs of a small group of receptors to which the bipolars, and through them the ganglion cells, are directly connected are compared with the outputs of a larger group of receptors to which the bipolars and ganglion cells are connected laterally through the horizontal and amacrine cells. The lateral inputs at each level are opposite in sign to the direct inputs, so that each ganglion cell reports not the output of a group of receptors, but rather the relation between the outputs of two different populations of receptors, either the difference between their outputs or their ratio. All of the receptors that feed into a particular cell constitute its receptive field (RF). (The term RF can equivalently refer to the region in visual space within which a stimulus changes the cell’s response.) Thus, the RF of a ganglion cell might consist of a disk-shaped center region that responds to light decrements and a larger, superimposed, disk-shaped ‘‘surround’’ region that responds to light increments, or vice versa. Depending on the eccentricity of a given ganglion cell, the spatial extent of its RF will vary from a tiny region in the foveal representation to a much larger region in the periphery. The functional parallelism in the retina consists of three main paths, or channels; each has two subtypes of cells. The so-called magnocellular, or Mc , pathway leads from receptors to diffuse bipolar cells to parasol ganglion cells. Their axons, in turn, project to the magnocellular layers of the LGN. The parvocellular, or Pc , path leads from receptors to midget bipolars to midget ganglion cells, whose axons project to the parvocellular layers of the LGN. All of the L and M cones feed into both of these paths, but different analyses of their outputs are carried out in the two cases. In addition, the relatively few S cones feed down a third (koniocellular, Kc ) path where their output is compared to that of the L and M cones in the region (8). The Kc system is involved almost exclusively in color vision and will not be discussed further. Each diffuse bipolar cell in the central retina directly contacts a total of about six L and M cones and receives indirect input from these and surrounding L and M cones by way of the lateral connections made by the horizontal cells. As retinal eccentricity increases, the numbers of cones that feed into both the center and the surround steadily increase. The horizontal cell output is mainly a negative feedback onto the receptors and forms a second gain-control adaptational process. As light levels increase or decrease in the course of a day (or differ between sunny and shadowed regions of a scene), the steady-state output of the receptors in that region stays about the same. The diffuse bipolars thus respond to the extent that the output of a small group of L and M cones in each region deviates
from the outputs of a larger group of L and M cones in a somewhat larger local region.

There are two subtypes of diffuse bipolar cells that feed, in parallel, up the Mc pathway. One subtype receives direct excitatory input from L and M cones and thus is depolarized by decrements in light absorption. The other bipolar subtype inverts the receptor signal, so it is depolarized by increments in light. Each L or M cone feeds into one or more of each of these bipolar types. The crucial information for vision is not the absolute light level, which is due mainly to the illuminant, but the increments and decrements from this mean level that characterize reflective objects. For a single cell to encode both increments and decrements equally well would require a very high (and thus metabolically expensive) average maintained firing rate day and night. By having some cells that respond with excitation to increments and others that are excited by decrements, the visual system doubles its capability to transmit the information, eliminates the need for a high maintained firing rate, and at the same time ensures that the equally important signals about increments and decrements are equally represented in the message to the brain. Of course, the price paid for this is a halving of the spatial information that can be transmitted through the optic nerve bottleneck.

In addition to the synapses that each L and M cone makes with diffuse bipolars, each one also feeds into two midget bipolar cells of the Pc pathway, one sign-retaining and one sign-inverting. In the whole central 10° of the retina, each midget bipolar cell directly contacts just a single L or M cone (9). The midget bipolars also contact several horizontal cells, which bring in opposing information from neighboring L and M cones. The result is four types of midget bipolars and four types of midget ganglion cells into which the midget bipolars feed. The surround input from the same receptor type as the center just somewhat diminishes the center strength and can be ignored. We can characterize the four midget bipolar types as +L − M, −L + M, +M − L, and −M + L, where + and − refer to depolarization to increments or decrements, respectively. Thus, each of these cells reports the extent to which the activation in one particular L or M cone exceeds that of a small group of neighboring L and M cones.

As described before, cone photopigments absorb light of all visible wavelengths, so cones do not by their output specify the wavelength of the light. In fact, the L cones are more sensitive to the short wavelengths that we see as blue than they are to the long wavelengths we see as red. However, long-wavelength light, regardless of its intensity, is absorbed more by an L cone than by adjacent M cones, and so preferentially activates a +L − M cell. Thus, color information does not lie in the outputs of different cone types but in the difference between the activation of receptors of different spectral sensitivity. It is this difference that is computed by the cells in the Pc path. Thus, cells in the Mc and Pc pathways differ in their chromatic properties because the cone inputs to the RF center and surround are the same in Mc cells, M + L in each case, whereas the RF center of each Pc cell (in the central retina) is derived from one cone type and its surround largely from a different cone type. Thus, the Mc cells can
convey information about increments and decrements from the mean luminance level (that is, about luminance contrast), but they cannot convey information about chromatic differences. Cells in the Pc pathway, on the other hand, convey information about both color and luminance (10–12). In the peripheral retina, cells in the midget system pick up from more than one L or M cone and thus lose some or all of their chromatic selectivity. They multiplex these parameters; the color and luminance information is transmitted mixed within the same cells' outputs. Because of their radially symmetrical RFs, cells at this level have minimal selectivity for the shapes of objects, nor are they selective for motion or, of course, for depth. All of this changes at the cortex.

VISUAL ANALYSIS AND CODING STRATEGIES

Retinotopic Organization and Coding

The optical system of the eye forms an image of the visual world on the retina, so it is easy to think of spatial vision in terms of a simple mapping of this image into some perceptual representation. However, there are several reasons why this is erroneous. First, the eye's optical system forms a two-dimensional image of the three-dimensional spatial world on the receptors, so that activation of adjacent receptors represents adjacent visual directions, not adjacent locations in space. Therefore, our three-dimensional perceptual world must be computed, as it is in the cortical processing, not just remapped from the image. Less obviously, even the precise two-dimensional image relationships are not maintained in cortical processing. Adjacent photoreceptors receive light from adjacent visual directions, and two receptors spaced, say, 100 µm apart are stimulated by points about equally far apart in visual angle in the image, regardless of whether the two are in the fovea or in the far periphery. Both the adjacency relationship and the equal-separation-with-eccentricity relationship break down in retinal processing and projection to the cortex. There is a systematic retinotopic organization in the projection to the cortex, where neighboring retinal regions are projected to neighboring cortical areas, but the topographical map becomes grossly distorted because what is mapped onto different regions of the cortex is not the receptor array but the array of ganglion cells. Each retinal ganglion cell in the projection through the LGN to the cortex commands roughly the same amount of cortical territory. There are far greater numbers of ganglion cells processing information from the central than from the peripheral retina, so the foveal representation is enormously magnified in the cortex. To a first approximation, the density of ganglion cells falls off in proportion to the log of the distance from the fovea, so an annulus on the retina from 1 to 2° contains about the same number of ganglion cells as an annulus from 10 to 20°, although the latter covers a region 100 times larger in the image (13). Thus, equal distances along the cortex no longer correspond to equal distances in the two-dimensional image, resulting in a gross distortion of the image. For example, when you fixate on the nose
of someone with whom you are talking, the cortical representation of his nose might be larger than that of all the rest of the person’s body. A second reason why one should not think of spatial vision as just a mapping of the retinal image onto perception is that there is no longer a precise spatial map of the image on a fine scale at the cortical level. Adjacent cells in the cortex no longer necessarily represent adjacent visual locations but are most likely to be cells responsive to different types of stimuli in the same location. There are some 500 times as many cells in the striate cortex (V1) as there are incoming optic radiation fibers. Large groups of neighboring cells all process input from the same spatial region but are responsive to different aspects of the stimulus in that region. On a fine scale, what is mapped across the surface of the striate cortex, then, is not adjacent locations in space but different stimulus features. Retinal information from receptors to ganglion cells is condensed and efficiently coded for transmitting the maximum amount of information through the limited number of cells that form the bottleneck of the optic nerve. Therefore, each ganglion cell responds to many different local spatial patterns. Its responses are independent of the local orientation of the pattern and largely independent of its spatial frequency, or whether the object is moving or stationary. Thus the fact that a particular ganglion cell fires tells one relatively little about the nature of the stimulus. Instead, the information is carried in the relative firing rates and locations within the array of each of a number of cells. The situation is quite different in the striate cortex, where large numbers of cells are available to process the data. The essence of the neural organization in the striate cortex that is relevant to spatial vision is the generation of cells each of which is selective for the presence of certain local stimulus characteristics and that then fire only if the local stimulus contains its particular set of features. Each cell responds if and only if the information in its local region is within the particular range of spatial frequencies, orientations, and so forth to which the neuron is tuned. The majority of the cells within a cortical region do not fire at all to most stimuli [this is thus termed ‘‘sparse coding’’ (14)], but when a particular cell fires, it conveys a great deal of information about the stimulus in that region: it must be a pattern of this orientation range and this range of spatial frequencies, for example. Modular and Columnar Organization In addition to the systematic, but distorted, retinotopic organization of the visual cortex, two other general arrangements of cells can be seen. One is a mosaic-like parcelation of the striate cortex into about 1,000–1,500 modules (15). Each of these modules contains all of the cells that deal with a specific small patch of the visual field. For cells in the foveally related cortex, this is a tiny patch; the patches become increasingly larger as retinal eccentricity increases. The different modules thus constitute a spatially parallel organization. About 700 optic radiation fibers from each eye project into each module, where about 400,000 cortical cells analyze
this input by combining and comparing the incoming information in various ways. Within each cortical module is a columnar organization. All parts of the cerebral cortex, including V1, have six distinct layers of cells in the 2-mm thick layer of tissue. Although there are important lateral connections made between cells lying in different cortical regions, most of the connections and interactions are between cells lying in different layers within a column from the cortical surface to the white matter below. As a consequence, most of the cells within a column have some of the same stimulus selectivities. The various columns of cells within a module are doing functionally separate analyses of the information in that region, in a massive functionally parallel organization. By comparing the number of ganglion cells to the number of cortical modules, it can be seen that the spatial parallelism decreases from more than 100,000 parallel channels at the retinal ganglion cell level to only 1,000 or so spatially parallel channels in V1. However, the functional parallelism greatly increases from some six functionally parallel channels in the retina to some 100,000 functionally parallel channels within each V1 module. Thus, the information is recoded from a spatial representation in the receptors to a largely functional representation at the cortex. Following, we discuss the anatomical arrangement of the different functional columns within each module and some of the principal functional selectivities of V1 cells. Selectivities As noted before, striate cortex cells are much more selective for certain stimulus characteristics than units earlier in the path. Cells in the striate cortex pick up from somewhat larger spatial regions than cells earlier in the path, but they still perform a quite local analysis of the visual information and are largely unresponsive to whatever is present in any but their small part of the visual field. However, they do not respond just to any pattern within their local region. Rather, the stimulus for any given cell must have certain specific properties. Let us examine some of these selectivities. Orientation. Most neurons in V1 are selective for stimulus orientation, that is, they respond to a bar or grating within their RF only if it is within a certain range of orientations; the optimum orientation varies from cell to cell (16). Although the degree of orientational selectivity varies considerably across cells, the median bandwidth at half-maximum response is about 45° (a typical cell gives half-maximum response or better across a range of 45° ). So, though LGN cells have little or no orientation tuning and respond equally well to patterns of any orientation, a typical striate cortex cell responds only to patterns within about one-quarter of the total range of possible orientations. Each cortical module contains columns of cells tuned to each of many different orientations, along different axes, in a systematic polar arrangement (1). Spatial Frequency. In response to sinusoidal gratings of different spatial frequencies, LGN cells have a small
degree of spatial frequency selectivity for luminancevarying patterns and show attenuated responses to both very high and very low spatial frequencies. However, most V1 cells are much more narrowly tuned and have a median spatial frequency bandwidth of about 1.4 octaves (an octave is a 2 : 1 ratio of frequencies). Thus, their spatial frequency selectivity, like their orientation selectivity, encompasses about one-quarter of the total range to which we are sensitive. Each cortical module thus also contains columns of cells tuned to each of many different spatial frequencies (17). The overall contrast sensitivity function therefore reflects the envelope of the contrast sensitivity functions of these multiple units tuned to different spatial frequency ranges. Neurons in the foveal representation are tuned to various points that cover the total spatial frequency range to which we are sensitive, but those cells tuned to the highest spatial frequencies are progressively absent as retinal eccentricity increases. In V1 are also found, for the first time, neurons selective for the spatial frequency of chromatic- as well as luminance-varying patterns (18). Evidence for such striate-cortex selectivity for orientation and spatial frequency comes from psychophysical studies of human observers, as well as from physiological studies of monkeys. After fixating a moving sine wave grating of a particular spatial frequency for a minute or so, one is less sensitive to that pattern for a short time. This reflects a type of gain-control process by which the visual system deemphasizes unchanging aspects of the environment to facilitate the capture of information about change. However, adaptation to a grating of one spatial frequency produces no loss in sensitivity to patterns of a quite different spatial frequency, thus indicating the presence of cells that have fairly narrow spatial frequency tuning and of multiple spatial frequency channels (19). This loss in sensitivity to a grating of one spatial frequency also results in a short-term change in the apparent spatial frequency of gratings of slightly different spatial frequencies seen subsequently. A grating of a spatial frequency higher than the adapting pattern will appear to be higher still, and one lower in frequency will appear to be lower still (20). Similar selective sensitivity losses and aftereffects are seen in adaptation to a pattern of a particular orientation. For instance, after adapting to a grating tilted just off vertical, a vertical grating will appear to be tilted in the opposite direction. Spatial frequency selectivity and orientational selectivity are also revealed by psychophysical masking experiments. Detection of a faint pattern of a particular spatial frequency and/or orientation is interfered with (masked) by a grating of the same spatial frequency and orientation but not at all by gratings of very different spatial frequency and orientation (21). The bandwidths of the masking and of the adaptational losses in spatial frequency and orientation are consistent with the bandwidths of the most selective cells found in the striate cortex of monkeys. There are not different populations of cortical cells that have just orientation selectivity or just spatial frequency selectivity; rather, almost every cortical cell is selective for both a particular orientation and a particular spatial frequency (17). Arrays of cells that have both orientation
and spatial frequency tuning encode information about the two-dimensional spatial frequency spectrum of the pattern within the small region of the visual world to which they are responsive. Spatial information is encoded purely in the space domain at the receptor level, and it is transformed into the local spatial frequency domain at the V1 cortical level by being filtered through multiple spatial frequency and orientationally selective cells within each cortical module. The average orientation and spatial frequency bandwidths of V1 cells are such that the 2-D spatial frequency spectrum of patterns could be specified by the relative activity of cells tuned to each of about six different spatial frequencies of two different phases, at each of about six orientations, thus about 70 cells in each region. A sine wave grating pattern of a particular spatial frequency and orientation would significantly activate only one of these 70 cells; more complex patterns would activate a small number of them. With this sparse code, the set of cells in a region that is activated uniquely defines the spectrum of the portion of the pattern in that local region. Other Cortical-Cell Selectivities. Spatial frequency and orientation selectivities not only play important roles in carrying out useful encoding of spatial patterns, but they also contribute crucially to two other types of selectivity seen at the striate cortex level, that for motion and that for stereoscopic depth. Within each cortical module is a subset of cells that responds to a pattern of a particular 2-D spatial frequency only if that pattern moves in a particular direction through the cell’s RF. Thus, such neurons extract a local motion signal. By definition, a moving object is at one location at time 1 and a different location at time 2, and the presence of motion can be detected by comparing the activity at these two locations at these two times. However, for a small portion of a complex moving image (which is all that cells at this level that have small RFs can ‘‘see’’), determining what is to be compared with what in the two time intervals is a nontrivial problem. Consider the problem of detecting the motion of a textured or a random dot pattern, in which each dot at time 1 could be related to any one of several different identical dots at time 2, with each combination signaling local motion in a different direction. This correspondence problem is vastly simplified by the fact that the motion-detecting neurons are selective for orientation and spatial frequency. Thus, each motion cell would compare only two patterns of the same orientation and spatial frequency at the two different times, resulting in far fewer false matches. Within each cortical module, which deals with stimuli from only a certain small region of the visual field, there are multiple local motion detectors. Each responds only to the components of a pattern of a particular spatial frequency that move in a particular direction within that small region. Although such local motion detectors play an important role, they constitute only the first step in motion perception. The local motion signals from different regions need to be integrated and processed in later cortical regions to determine the motion direction and speed of large, complex objects. The visual system uses many kinds of information to construct our three-dimensional visual world. Many of
these involve high-level processes that have not been examined for possible physiological mechanisms. The problem of determining the depth or three-dimensional structure from stereoscopic cues, however, has been studied and turns out to be similar to the problem of determining motion direction. When some point is binocularly fixated, light from any point at that distance from the eyes falls on corresponding locations in the two eyes, for example, a point 2° to the right of fixation in the visual field would stimulate receptors 2° to the temporal side of the fovea in the left eye and 2° to the nasal side of the fovea in the right eye. This curved surface, going through the fixation point on which corresponding points are located, is known as the horopter. Objects in the visual field that are closer or further away than the horopter, however, stimulate noncorresponding, disparate points in the two eyes. The direction and amount of disparity in the images in the two eyes carry information as to whether the point is nearer or further than the object being fixated and by what amount. The visual system uses this stereoscopic information as one of many cues to the relative depth of objects in the visual field. Note that this tells nothing about the absolute distance, but only about the distance with respect to the horopter at the fixation distance. Stereoscopic information is also useful only for relatively nearby objects because the retinal disparity decreases as distance increases; objects beyond about 6 meters produce essentially identical images in the two eyes, and this cue is no longer useful. Information from the two eyes first comes together at the level of the striate cortex. The two eyes project to slightly different regions in each V1 module, but most cells later in the processing path combine input from the two eyes. A subset of these binocular cells within each cortical module appears to compute the binocular disparity in that local region (22). Some of them receive input from corresponding regions in the two eyes and thus respond optimally to objects in the fixation plane. Other cells receive binocular input from slightly noncorresponding regions in the two eyes and respond best to objects in front of or behind the fixation plane. Again, as in motion, only inputs from monocular cells that have similar 2-D spatial frequency tuning are combined, thus vastly simplifying the correspondence problem. As with motion, separate systems process stereoscopic information across each of a number of different spatial frequency ranges. Binocular cells tuned to high spatial frequencies detect small binocular disparities; those tuned to lower spatial frequencies detect increasingly larger binocular disparities. Prefiltering of the cortical information into separate spatial frequency and orientation channels not only contributes to solving the correspondence problem — knowing what to combine from one eye or at one moment with what from the other eye or at the next moment — but also allows for the perception of transparency. Often, there are overlapping objects in a given region at slightly different depths and/or moving in different directions, such as a deer seen through grass in a field. The presence of multiple motion and stereoscopic systems, each selective for a
different spatial frequency range, aids the visual system in dealing with transparency. For black–white vision, we would need only a single set of these processing elements. The great majority of humans (∼99%), however, have some form of trichromatic color vision. This requires three sets of such mapping systems, for we can discriminate the spatial frequency, the orientation, the motion direction, and the depth of patterns along each of three color axes, for example, red–green, yellow–blue, and black–white. Many V1 cells are color specific, and these cells also often have spatial frequency and orientational selectivity, indicating that local color-spatial maps — perhaps even more than three — are produced at this level (18) CORTICAL PROCESSING Processing Levels in V1 Simple cells. The two main cell types found in the striate cortex, termed simple and complex cells, reflect two successive levels of processing (16). The RF of a simple cell might consist of an elongated region responsive to light increments (white) and flanking regions responsive to black. Such a cell is most sensitive to a black–white grating of a particular spatial frequency and orientation in such a position that a white bar falls on the RF center and black bars fall on the flanking sidebands. If black and white are reversed (if the position of the grating is shifted 180° ), the cell is inhibited, and a 90° shift in either direction produces a null response. Thus, simple cells are spatial-phase-specific. The cell described before would respond best to a black–white grating of some particular spatial frequency and orientation in cosine phase with respect to the RF center. The optimum phase for simple cells varies from cell to cell. Another cell, for example, might have just one region to the left of the RF center that is responsive to black and a region to the right of center that is responsive to white, and thus responds best to a pattern in sine phase with respect to the RF center. A significant number of simple cells are color-coded and have, for instance, separate RF regions responsive to red and green. Such a cell responds best to a red–green grating of a particular spatial frequency and orientation. Although neurons earlier in the visual pathway show increases and decreases in firing from some maintained level to excitatory and inhibitory stimuli, cortical simple cells have little if any maintained discharge in the absence of stimulation. Thus they show either an increase in firing or no response to a grating pattern presented at different spatial phases, so that their output is effectively half-wave rectified. Simple cells also have an expansive nonlinearity with an exponent of about 2, so that their responses are both half-wave rectified and squared. Complex Cells. Complex cells are first found in the striate cortex and are the predominant cell type in later cortical regions. Complex cells within a cortical column have much the same orientation and spatial frequency tuning as simple cells in the same column, but they lack phase selectivity. A complex cell responds to the
black–white grating of a particular orientation and spatial frequency, but unlike a simple cell, it does so regardless of the spatial phase of the grating. Thus, a complex cell fires to either a white or a black bar (or often to a red or green bar) in any position within its RF (16). This behavior can be understood if the complex cell were summing the input from several simple cells, all of which had the same spatial frequency and orientation tuning but which differed in their spatial phase tuning. Thus, they maintain the 2-D spatial frequency selectivity of the simple cells from which they get their inputs but lose the spatial phase selectivity. One of many important roles it is believed that complex cells play in spatial vision is in texture perception. Although some objects are roughly spatially uniform over their extent, like clouds, most objects and many backgrounds are textured to various extents. It is important for the visual system to characterize the spatial spectrum of such textured regions to differentiate an object from its background or to distinguish one object or background from another. To do this, however, it is not necessary to have the phase information carried by simple cells that would be required to locate all the edges and discontinuities in a scene. For instance, the system need not specify all of the millions of edges in a patch of sand to distinguish it from a patch of grass. Which ones of an array of complex cells tuned to different 2-D spatial frequency ranges that are most active in a region would provide a very economical specification of the texture in that part of the scene (1). Local Versus Global Processing Each bipolar and ganglion cell in the retina processes information extremely locally; its RF covers just a few minutes of arc in cells in the fovea and fairly small regions even for those in the far periphery. In the striate cortex, RFs are larger and more varied in size. In the foveal representation in V1, RFs are of various sizes and form the substrate for multiple spatial frequency and orientation channels. The largest of these RFs can extend a degree or more, even in the foveal representation. Although an area 1° in diameter in the fovea encompasses several thousand cones, that area is still a very small region compared with the sizes of many visual objects that the system must identify. For instance, a person’s face at ordinary conversational distance may subtend 10° or more. Visual processing beyond V1 is more global; the RFs of cells are larger and larger at successive processing stages past the striate cortex. Many cells in the parietal and temporal lobes have RFs that encompass half or more of the whole visual field. With reference to the earlier discussion of retinotopic versus functional organizations, the clear trend for successive synapses through the visual path is a progressive decrease in a spatial, retinotopic representation and an increase in functional representations. Processing that is more global is required in later cortical areas not only to encompass large objects but also to resolve ambiguities in local signals. One example is that of separating luminant variations due to the reflectance of objects from those due to the illuminant. As discussed earlier, the primary assumption that the
visual system uses in accomplishing that is that light from the illuminant is uniform across space. Attenuating the signal from spatially uniform regions begins in the retina, but the larger RFs of cortical neurons allow the system to discount the illuminant across larger areas than was possible in the retina. Another example of stimulus ambiguity that can be resolved only at higher cortical levels is the overall direction of motion of a complex object. As an object moves, contours in various local regions within the object may be moving in many different directions. A more global analysis is required to determine the overall direction of pattern movement (23). Segmentation of Image into Objects and Background When we look at a natural visual scene, we perceive that is it made up of various objects, in a rural scene perhaps of a group of trees, a cow or two in a grassy field, a barn, and a stream. It is so easy and natural for us to see the world as made up of separate objects such as these that it is hard to realize that such a segmentation of the image into separate objects against a background is not automatically given by the visual input to the eye but is a product of visual processing, a conclusion reached by our visual system after an elaborate analysis of the visual input. The vast array of numbers put out by the receptors is transformed by the processing up through the striate cortex, as we have been discussing, into the local 2-D spatial frequency domain. Although this domain has certain advantages for representing spatial information, it is clear that the cells at this level ‘‘know’’ virtually nothing about objects, that is to say, they respond to a pattern of a particular orientation and spatial frequency in their local region regardless of whether that pattern is part of one object or of another or is just part of the background. A particular striate cortex cell might respond virtually identically to a vertical trunk of a tree, a vertical stripe on a zebra, or the vertical shadow of a branch, as long as each of these patterns had the same 2-D spatial frequency content within the local region covered by the cell’s RF. To parse a visual scene into objects requires a higher level, more global analysis than occurs in the striate cortex. There is increasing evidence that the RFs of V1 cells are more complex than initially thought and the responses to patterns within what has been come to be known as the ‘‘classical RF’’ are influenced to some extent by other stimuli in the neighborhood. There is also evidence that the response properties of V1 cells are modulated to some extent by feedback from later processing levels. However, the influences of the ‘‘nonclassical’’ RFs of striate cortex cells and feedback still appear to be far from accounting for image segmentation. The problem of how the visual system groups local image components into separate visual objects and segregates them from the background was first recognized as a problem and extensively studied in perceptual experiments by Gestalt psychologists at the beginning of the twentieth century (24). They explicitly raised the question of the principles by which the visual system puts together some, but not other, local components to form a single object. Among the principles they identified were those of proximity (nearby components are more likely to
be grouped together and seen as parts of one object than components further apart), similarity (components similar in color or orientation, and so forth, tend to be perceptually grouped together as all parts of the same object), common fate (grouping can be based on common motion direction), and good continuation (a smooth contour is more likely than one that has an abrupt change in curvature to be seen as the contour of a single object). The relative depths of different parts of a scene, based on stereopsis or other kinds of depth information, also provide powerful cues for breaking a scene into different objects and separating objects from the background. After a considerable hiatus, investigations of grouping principles have recently been revived, particularly in studies of motion and texture. However, we know little about the way different grouping principles are combined in image segmentation, of what happens when the different grouping cues conflict, or of how the computations are carried out physiologically. The early stages of neural processing provide the necessary initial analyses for object segregation, using cells in V1 selective for spatial frequency, orientation, motion, color, and stereopsis. We can distinguish thousands of different colors and brightnesses, and this allows us to distinguish the contours of a chair of almost any color, for instance, on a background of almost any color. However, it would be totally unfeasible to have separate object segregation and form recognition systems for chairs of each of these thousands of different colors against each of thousands of different backgrounds. Furthermore, upholstered chairs may differ from their backgrounds in thousands of different textures, as well as in color. What is clearly required for scene segmentation and object recognition is a system that uses the information from the selectivities of V1 cells but then generalizes across some or all of various dimensions and combines like information from neighboring regions. Complex cells in V1 carry out such generalization across dimensions to a limited extent; they respond to a pattern of a particular spatial frequency and orientation, regardless of color or spatial phase. This type of processing is further extended in the second visual area (V2), where cells are found that have an orientation and direction tuning to contours, regardless of whether the contour is formed by luminant or textural variations (25). How such V2 cells accomplish this has not yet been established, but a plausible mechanism would be that these V2 cells sum the outputs of those cells in extended regions of V1 that have the same orientation and/or direction tuning but different spatial frequency and color tuning. Objects in the natural environment are often partially occluded by other objects in the field. So, for instance, a deer in a forest might have part of its body obscured by a nearby branch. Such occlusion changes the shape of the image in that region, thus making object recognition more difficult, and it may also break the image of a single object into separate pieces, making it difficult for the system to treat all of the separate elements as part of a single object. Cells in visual area V2 have been found that appear to be dealing with the problem of occlusion across limited regions (26). These cells respond to two collinear borders,
even if there is a gap in the middle such as would be produced by a nearby occluding object. Two Processing Streams Some 25 or more visual cortical areas have been identified anatomically, V1 and V2 being the largest. Some of these areas are labeled on the basis of a presumed sequential order of visual processing (V1, V2, V3, V4, V5) and others on the basis of anatomical location with respect to various other anatomical landmarks, such as sulci (in the macaque monkey). A diagram of the interconnectivity among these regions gives a map resembling something that might have been produced by a drunken spider. Almost every visual area is connected to every other one. However, consideration of just those paths that have the largest number of fibers reveals some evidence for sequential processing and for two main parallel processing streams that lead from V1 forward in the brain (27). One of the streams is from cells in layer IVB of V1 to V5, and from there to various regions in the parietal lobe. The other path is from cells in layers II–III of V1 to V2, then V4, and from there to various regions in the temporal lobe. Cells in the temporal stream appear to be involved principally in identifying objects, processing information about form and color. Cells in the parietal stream, on the other hand, appear to be involved in identifying the location and change of location (motion) of objects with respect to each other and to the body. This requires input from eye–muscle commands and from the vestibular system because location in the world is a joint function of retinal location and of eye and head position. However, the division of labor between temporal and parietal streams should not be taken too literally. Numerous anatomical connections are to be seen between cells in these two paths, and it is also clear from functional considerations that the two systems cannot be totally independent. For instance, motion is one of the strongest cues for the segregating and identifying of form, so form analysis cannot be completely independent of the motion system. Nonetheless there is considerable evidence for some degree of specialization of function among the various prestriate visual regions and for this rough division between ‘‘what’’ (temporal) and ‘‘where’’ (parietal) systems. Lesions (brain damage due to stroke, injury, or disease) in the temporal lobe of humans often lead to loss of the ability to recognize certain classes of objects or of certain very specific characteristics of objects. For instance, lesions in a particular temporal lobe region may lead to loss of the ability to recognize the colors of objects but not their shapes. One of the most common results of strokes that affect certain parts of the temporal lobe is that a person can no longer identify individual faces. Faces are of great importance to us, both for recognizing different individuals and for identifying a person’s emotions in social interactions. To distinguish one person from another based on the very slight differences in facial features among individuals is surely among the most complex of visual tasks. It is perhaps not too surprising, then, that numerous cells in the temporal lobe seem to be involved specifically in processing the images of faces. Unfortunately, there are
no viable theories (and certainly no data) on the nature of the circuitry that accomplishes the transformations involved in going from cells in V2 and V4 (which appear to be only marginally different from striate cortex cells, except for their larger RFs) to the networks of neurons in the temporal lobe (which have such complex tuning properties). In the early and middle stages in visual processing (through the striate cortex), the visual system processes information from the whole visual field simultaneously and in parallel. As we have discussed, such massive parallel processing allows the very slow neural elements to carry out the huge number of computations required for vision in real time. It is apparent from psychophysical experiments and from the results of brain lesion studies, supported by recordings from cells in various cortical regions, that in later stages, the system may do only detailed processing of selective portions of the scene at any one time–that part to which one is attending. There is, in fact, some indication from patients who have Balint’s syndrome that object recognition may operate on only one object at a time (28). Lesions in the parietal lobe produce visual losses of quite different types from those produced by temporal lobe lesions. Loss of a particular occipitoparietal region that presumably corresponds to area V5 in macaque monkeys can lead to an inability to see motion but no loss in the ability to recognize objects on the basis of their shape. Lesions in other parietal regions (presumably later in the processing stream) can leave the ability to recognize whatever object is attended to, but the individual is totally unable to tell where that object is located in the visual field. Furthermore, such an individual may be quite unaware of the presence of other, non-attended-to objects, including one which had been attended to and recognized just a moment earlier. This suggests that the temporal lobe systems identify objects one at a time and feed this information to the parietal lobe regions that then fit them into their appropriate relative locations in the visual field, like a collage of separate photographs, to construct our unified perceptual world. One reason that we have evolved multiple visual areas is presumably the need we have discussed for analyzing across large parts of the visual field to accomplish object segregation and recognition of large objects. Cells in V1 carry all of the information required for visual perception because the whole path goes through this area. A typical visual object may extend over several degrees, and the neurons in V1 that respond to different portions of it are separated by tens of millions of other cells. For V1 cells to interact with each other across such a vast array would require an enormous set of interconnections. If, however, the cells from all parts of V1 that process a certain type of information (motion, for example) were brought together in a separate region, they would constitute a sufficiently small subpopulation to permit interactions across greater distances in visual space without requiring an impossible number of interconnections. The visual system may also have evolved separate parallel processing areas in the prestriate cortex to organize those cells that carry out a particular analysis
in optimal ways for their particular functions (29). In the striate cortex, neighboring columns of cells are systematically organized by orientation and spatial frequency. Within a motion area, however, an organization based on motion direction, as in fact occurs in area V5, is presumably more useful. In a color area, there may be an organization by chromaticity.

ABBREVIATIONS AND ACRONYMS

MTF              Modulation Transfer Function
CSF              Contrast Sensitivity Function
RF               Receptive Field
V1               first cortical visual processing area
V2, V3, V4, V5   subsequent cortical visual processing areas
L, M, S          L(ong), M(iddle) and S(hort) wavelength cones
Mc               Magnocellular path and cells
Pc               Parvocellular path and cells
Kc               Koniocellular path and cells
BIBLIOGRAPHY

1. R. L. De Valois and K. K. De Valois, Spatial Vision, Oxford University Press, New York, 1988.
2. F. W. Campbell and J. G. Robson, J. Physiol. (London) 197, 551–566 (1968).
3. C. A. Curcio, K. R. Sloan, R. E. Kalina, and A. E. Hendrickson, J. Comp. Neurol. 292, 497–523 (1990).
4. D. Baylor, Soc. Gen. Physiologists Ser. 47, 151–174 (1992).
5. M. Neitz, J. Neitz, and G. H. Jacobs, Vision Res. 33, 117–122 (1993).
6. R. L. De Valois and K. K. De Valois, Vision Res. 33, 1,053–1,065 (1993).
7. E. N. Pugh and T. D. Lamb, Vision Res. 30, 1,923–1,948 (1990).
8. V. A. Casagrande, Trends Neurosci. 17, 305–310 (1994).
9. H. Wässle, U. Grünert, P. R. Martin, and B. B. Boycott, Vision Res. 34, 561–579 (1994).
10. R. L. De Valois, Cold Spring Harbor Symp. Quant. Biol. 30, 567–579 (1965).
11. R. L. De Valois, I. Abramov, and G. H. Jacobs, J. Opt. Soc. Am. 56, 966–977 (1966).
12. A. M. Derrington, J. Krauskopf, and P. Lennie, J. Physiol. (London) 357, 241–265 (1984).
13. R. B. H. Tootell, M. S. Silverman, E. Switkes, and R. L. De Valois, Science 218, 902–904 (1982).
14. D. J. Field, Neural Computation 6, 559–601 (1994).
15. J. C. Horton, L. R. Dagi, E. P. McCrane, and F. M. de Monasterio, Arch. Ophthalmol. 108, 1,025–1,031 (1990).
16. D. H. Hubel and T. N. Wiesel, J. Physiol. (London) 160, 106–154 (1962).
17. R. L. De Valois, D. G. Albrecht, and L. G. Thorell, Vision Res. 22, 545–559 (1982).
18. L. G. Thorell, R. L. De Valois, and D. G. Albrecht, Vision Res. 24, 751–769 (1984).
19. C. Blakemore and F. W. Campbell, J. Physiol. (London) 203, 237–260 (1969).
20. C. Blakemore and P. Sutton, Science 166, 245–247 (1969).
21. G. E. Legge and J. M. Foley, J. Opt. Soc. Am. 70, 1,458–1,471 (1980).
22. H. B. Barlow, C. Blakemore, and J. D. Pettigrew, J. Physiol. (London) 193, 327–342 (1967).
23. J. A. Movshon, E. H. Adelson, M. S. Gizzi, and W. H. Newsome, in C. Chagas, R. Gatass, and C. Gross, eds., Pattern Recognition Mechanisms, Springer Verlag, NY, 1985, pp. 117–151.
24. W. Köhler, Gestalt Psychology, H. Liveright, NY, 1929.
25. H. C. Nothdurft, Ciba Foundation Symposium 184, 245–268 (1994).
26. R. von der Heydt, E. Peterhans, and G. Baumgartner, Science 224, 1,260–1,262 (1984).
27. L. G. Ungerleider and M. Mishkin, in D. J. Ingle, M. A. Goodale, and R. J. W. Mansfield, eds., Analysis of Visual Behavior, MIT Press, Cambridge, MA, 1982, pp. 549–586.
28. R. Balint (translated by M. Harvey), Cognitive Neuropsych. 12, 265–281 (1995).
29. H. B. Barlow, Vision Res. 26, 81–90 (1986).
I

IMAGE FORMATION
JOSEPH P. HORNAK
Rochester Institute of Technology
Rochester, NY

INTRODUCTION

An imaging system consists of ten loosely defined components or subsystems: the imaged object, probing radiation, measurable property, image formation, detected radiation, detector, processing, storage, display, and end user subsystems. The image formation subsystem maps spectroscopic information from the imaged object, person, or phenomenon onto a detector so that the processing subsystem can create an image. Figure 1 presents a schematic representation of this concept. The image formation (IF) component maps a spectroscopic property R of the imaged object spatially onto the detector so that R(x, y) is mapped as P(x′, y′) on the detector. The lines between R and P have been intentionally curved to convey that a mapping process does not have to occur along straight lines. This concept is easiest to see in an optically based image formation system. In one of the simplest optical systems, a camera, the image formation component is a lens, the spectroscopic property is the reflected light from the imaged object, and the detector is a sheet of photographic film. In this example, the mapping by the lens is along straight lines between R and P. There are seven general classes of image formation techniques: optical, tomographic, holographic, scanning, triangulation, projection, and hybrid. Optical techniques are the most common, perhaps because they are based on the first imaging system, the animal eye. Current optical imaging systems use lenses, mirrors, and waveguides to map R onto P. Tomographic techniques produce images of a thin slice through an object. They are used for clinical and nondestructive analysis. Magnetic resonance imaging (MRI), ultrasound imaging, and computed tomographic imaging are a few examples of systems that use tomographic image formation techniques. Holographic techniques produce three-dimensional images based on the interference patterns between light waves from the imaged object and a reference. Optical, tomographic, and holographic techniques are discussed in separate articles in this encyclopedia. This article focuses on image formation techniques based on scanning, triangulation, projection, and combinations of the others (hybrid).
PROJECTION TECHNIQUES Image formation based on projection is one of the simplest image formation techniques. Anyone who has seen the shadow on the ground of an airplane flying overhead on a sunny day, or the shadow silhouettes of individuals on a wall as they walk in front of a light bulb, has experienced image formation based on projection. Therefore, projection based image formation is shadow imaging. Energy or particles from a point source P are directed at the imaged object, O. The imaged object absorbs some of the energy or particles and creates a projection image on a flat surface S, behind the imaged object (Fig. 2). The size of the imaged object in the projection is magnified by d2 /d1 . Projection image formation is most commonly used in flat-film X-ray imaging. Here, a point source of X rays is directed at the imaged object, which could be a part of human anatomy or an airplane propeller, for example. A detector, such as a scintillation plate and sheet of photographic film or CCD array, is placed behind the imaged object. When the interaction of radiation with the imaged object or spectroscopy includes the possibility of scattering and diffraction of the radiation, an additional component is needed in the image formation system. This component is placed between the imaged object and detector and allows only radiation that travels in a straight line from the source to the detector to strike the detector. In flat-film
Figure 1. A schematic representation of image formation where R(x, y) is mapped by image formation device IF as P(x′, y′) on a detector.
Figure 2. A schematic representation of projection image formation on screen S of object O by point source of energy P. The magnification of the image is d2/d1.
X-ray imaging, this device is called a Bucky diaphragm (1) as depicted in Fig. 3. A projection image formation technique has one big advantage over other techniques: its simplicity. Unfortunately, this advantage is also its greatest disadvantage. This disadvantage is best illustrated by the following example. You may remember from your childhood seeing a shadow puppeteer making the images of animals on a screen by placing his hands in a special way in front of a light source. Was there a bird or dog in the hands of the puppeteer? No, and that is precisely the shortcoming of projection image formation techniques. It is possible to have overlapping objects that absorb the probing radiation and give a false image. Does this mean that flat-film X-ray images are useless? No, as long as the end user of the information realizes the shortcomings of the technique. In fact, radiologists, the end users of flat-film X-ray systems in clinical settings, will often take projections at multiple angles through the object to minimize any uncertainties. In the extreme case, X-ray computed tomography (CT), the number of projections is very large and an inverse Radon transform (2) is performed to reconstruct an image of the object. See the article on TOMOGRAPHIC IMAGE FORMATION TECHNIQUES in this section for more detail on CT imaging.
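The shadow-image geometry behind Fig. 2 reduces to similar triangles, which a few lines of code can make concrete. The sketch below is only an illustration, not part of the original article; the distances d1 and d2 and the function name are assumptions chosen for the example.

```python
# A minimal numerical sketch of point-source projection: a point source at the
# origin, an object plane at distance d1, and a screen at distance d2 behind it.
# Rays travel in straight lines, so an object point at height y projects to
# height y * d2 / d1, i.e., the image is magnified by d2 / d1.

def project_point(y_object, d1, d2):
    """Height on the screen of an object point at height y_object."""
    return y_object * d2 / d1

if __name__ == "__main__":
    d1, d2 = 0.5, 2.0          # assumed source-to-object and source-to-screen distances (m)
    for y in (0.00, 0.01, 0.02):
        print(f"object at {y:.2f} m -> shadow at {project_point(y, d1, d2):.2f} m")
    print("magnification =", d2 / d1)   # 4.0 for these assumed distances
```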
TRIANGULATION TECHNIQUES

Triangulation techniques are those that form an image based on knowledge of the distances and angles between a source and a detector. Triangulation techniques are typically used where the location of a source is being mapped. An example of an imaging system that uses a triangulation technique is the lightning locator or lightning strike mapper. Lightning locators are used by pilots in the cockpits of commercial airlines for storm avoidance and by meteorologists for storm tracking and prediction. (See the LIGHTNING LOCATORS article in this encyclopedia for more information.) This system produces a two-dimensional map of lightning strikes based on information about either the distance from multiple detectors to a lightning strike (the source), the angle from multiple detectors to the source, or an angle and distance from a detector to the source. The presentation of triangulation image formation techniques in this article will use this same classification: (1) distance and angle from a single detector, (2) angles from multiple detectors, and (3) distances from multiple detectors. The explanation of the three techniques will first assume a two-dimensional space before a three-dimensional description is presented.

Single Detector that has Distance and Angle Information
The first class of triangulation image formation technique uses a single detector and the distance r from the detector to the imaged phenomenon located at P, as well as the angle θ between a reference point, the detector, and the phenomenon (Fig. 4). The resolution of systems that use this image formation technique decreases as r increases, due to uncertainties in θ. A crossed detector coil lightning locator is an example of an imaging system that uses this type of image formation. (See the article on LIGHTNING LOCATORS.) Locating P in three dimensions requires knowledge of a colatitude, or angle, between a line perpendicular to the plane in which the detector resides and P.
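A small sketch may make this first class concrete. The following illustration is not from the article; the ranges, bearings, and function names are assumptions. It converts a measured distance and angle into map coordinates and shows why a fixed angular uncertainty costs more position accuracy as r grows.

```python
import math

# Sketch: a single detector at the origin measures range r and bearing theta
# (measured from a reference direction). The phenomenon is mapped by the
# ordinary polar-to-Cartesian conversion of (r, theta).

def locate(r, theta_deg):
    th = math.radians(theta_deg)
    return (r * math.cos(th), r * math.sin(th))

# A fixed bearing uncertainty produces a cross-range position error that grows
# linearly with r, which is why resolution degrades at long range.
def cross_range_error(r, dtheta_deg):
    return r * math.radians(dtheta_deg)

if __name__ == "__main__":
    print(locate(10.0, 30.0))                 # approximately (8.66, 5.0)
    for r in (1.0, 10.0, 100.0):              # assumed ranges, arbitrary units
        print(r, cross_range_error(r, 1.0))   # error for a 1 degree uncertainty
```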
Multiple Detectors and Multiple Angles
The second class uses multiple detectors and multiple angles. Typically, three detectors (D1, D2, and D3) and three angles (θ1, θ2, and θ3) from a reference to the phenomenon P are used (Fig. 5). Distances r1, r2, and r3 are not known. The location of the imaged phenomenon
Figure 3. A schematic representation of (a) a projection imaging system using a Bucky diaphragm and (b) an enlarged section of the Bucky diaphragm (dark lines) that permits only those rays that travel in a straight line from the source (Rays 1, 2, and 3) to pass. Scattered rays (4 and 5) are absorbed by the nonreflecting diaphragm and do not strike the detector.
Figure 4. A graphic representation of the point detector D that can determine the distance r and angle θ to a point P.
is at the intersection of r1, r2, and r3. Accuracy decreases as the differences among the three angles become smaller. Note that provided θ1 and θ2 are neither 0 nor 180°, detector D3 is not necessary. Some lightning locator systems use this form of image formation. In three-dimensional space, the three detectors could define only the distance of P to a line perpendicular to the original planar space. A fourth detector that can provide colatitude, or three detectors that can provide longitude and colatitude, would allow imaging in three dimensions.

Multiple Detectors and Multiple Distances

This third class is similar to the second in that multiple detectors are used, but different in that distances rather than angles are now used. Typically, three detectors (D1, D2, and D3) are used. Distances r1, r2, and r3 are known, and angles θ1, θ2, and θ3 are unknown (Fig. 5). Each detector alone can only place P on a circle of radius r centered on that detector. Two detectors limit the possible locations to at most two points located at the intersections of the circles. Three detectors place P at the common intersection of the three circles. The accuracy with which P can be mapped decreases as r1, r2, and r3 become very large compared to the distances between the detectors. Reflection seismology uses this form of image formation. Locating P in three dimensions requires only a change in the processing algorithm and the restriction that P must be located in or above the plane in which D1, D2, and D3 reside. The data are no longer limited to solutions in the plane, as indicated in Fig. 5, but extend to three-dimensional space. Now, the location of P is calculated as the intersection of three spheres of radii r1, r2, and r3 centered on D1, D2, and D3.
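The distance-only case lends itself to a compact worked example. The sketch below treats only the two-dimensional case, and the detector coordinates, the test point, and the function names are arbitrary assumptions, not values from the article. It intersects the three circles by subtracting their equations pairwise, which leaves two linear equations in the unknown coordinates of P.

```python
import math

# Each range r_i places P on a circle around detector D_i; subtracting the
# circle equations pairwise removes the quadratic terms and gives two linear
# equations whose solution is P (the detectors must not be collinear).

def trilaterate(d1, r1, d2, r2, d3, r3):
    (x1, y1), (x2, y2), (x3, y3) = d1, d2, d3
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1          # zero only if the detectors are collinear
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

if __name__ == "__main__":
    P = (3.0, 4.0)                              # true (normally unknown) location
    D = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]  # assumed detector positions
    R = [math.dist(P, d) for d in D]            # the measured ranges
    print(trilaterate(D[0], R[0], D[1], R[1], D[2], R[2]))  # ~(3.0, 4.0)
```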
Detectors

It is worth examining how θ and r are determined in triangulation image formation. The distance r from the detector to the imaged phenomenon can be determined from the intensity of the emission from the source. For example, assume that all imaged phenomena emit radiation at a fixed intensity. The signal intensity measured at the detector drops off as 1/r². Therefore, knowing the intensity allows calculating r. Alternatively, time can be used to determine distance. If the velocity of signal propagation from the source to the detector is known, the distance can be calculated from the time it takes the signal to reach the detector. This technique is useful only when the time the phenomenon occurred is known or when multiple detectors are used. In the latter case, n detectors are used. The time for the signal to reach a reference detector is t, and t + tn is the time to reach detector n. The time t, as well as the distances from the source to each detector, is unknown. When n is large and the relative locations of the detectors are known, dn can be determined. An angle θ from a source to a detector to a reference point can be determined by a direction-sensitive receiver. For example, a set of orthogonal pickup coils can be used to determine this angle. The ratio of the signals from the two coils is equal to the tangent of θ.
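Both relations can be written down directly. The functions below are only a hedged illustration, not code from the article; the source intensity and the two coil signals are assumed inputs. They restate the inverse-square range estimate and the two-coil bearing estimate.

```python
import math

# Under the stated assumption that every imaged phenomenon emits at the same
# known source intensity, the measured intensity fixes the range; two
# orthogonal pickup coils fix the bearing through their signal ratio.

def range_from_intensity(i_measured, i_source):
    # Inverse-square law: i_measured = i_source / r**2  ->  r = sqrt(i_source / i_measured)
    return math.sqrt(i_source / i_measured)

def bearing_from_coils(signal_a, signal_b):
    # The ratio of the two coil signals is tan(theta); atan2 keeps the correct
    # quadrant and tolerates a zero signal on one coil.
    return math.degrees(math.atan2(signal_a, signal_b))

if __name__ == "__main__":
    print(range_from_intensity(0.01, 100.0))   # r = 100 in these arbitrary units
    print(bearing_from_coils(1.0, 1.0))        # 45 degrees
```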
SCANNING TECHNIQUES

One of the more common image formation techniques is the scanning technique. The scanning image formation technique is used in scanning probe and electron microscopy, magnetic resonance imaging, radar, ultrasound imaging, and many other imaging systems. Scanning image formation techniques scan the imaged object or space and send the spectral information from the imaged object to a detector, one point at a time. This definition includes multidimensional space and all coordinate systems for mapping the space.

Scanning Patterns
The simplest scanning pattern is the x, y scan pattern generally used in probe microscopies. A two-dimensional space is scanned, one row at a time, as depicted in Fig. 6. The spectroscopic property is sampled at the x, y values indicated by the points. The two-dimensional space mapped in Fig. 6 can also be mapped on a polar grid, as depicted in Fig. 7. In this example, referred to as an r, θ scan pattern, image information is recorded along a radius, starting at the center and proceeding outward. Information from each radius is recorded starting at θ = 0° and proceeding to θ = 360° . Because points are usually recorded at fixed θ and r intervals, the sampled
Figure 5. A graphic representation of three point detectors (D1, D2, and D3) that can determine angles (θ1, θ2, and θ3) to point P.
Figure 6. An example of an x, y scan pattern. Image information is recorded in order, one point at a time, from the location of the dots. The arrow indicates the order of data collection.
Figure 7. An example of an r, θ scan pattern. Image information is recorded along a radius, starting at the center and proceeding outward. Information from each radius is recorded starting at θ = 0° and proceeding to θ = 360°.
points from the r, θ scan pattern will not match the locations of the points from the x, y scan. Many additional scanning patterns are possible. In addition to the x, y and r, θ scans already presented, Archimedean spirals (r = aθ), logarithmic spirals (r = e^(aθ)), and rose patterns (r = a sin nθ) have been used. Different scan patterns are used because they sample the imaged space at different densities. For example, in air traffic control (ATC) near an airport, it is often necessary to sample the airspace near the airport at a higher density than the space farther away. The x, y scan samples space at a constant density, whereas a logarithmic spiral and an r, θ scan sample points closer to r = 0 at a higher density. The direction of the scan may also be important, especially when the imaged object is moving.

k-Space Scans

Scan-based image formation techniques need not be limited to mapping the image space of an object. Scan techniques can map the k space or Fourier transform of an imaged object. Magnetic resonance imaging (MRI) (3) records its image information from the k space of the imaged object. (For more information on MRI, see the article on MAGNETIC RESONANCE IMAGING or Ref. 3). This information is Fourier transformed to produce the image. MRI maps k space by using a raster pattern. The most routinely used imaging sequence in MRI is called the spin-echo imaging sequence. In the spin-echo imaging sequence, data are recorded while k space is being traversed using an x, y raster that follows the pattern described in Fig. 6. The points in k space are defined as ki,j, where the row index i goes from +m to −m and the column index j goes from +n to −n. The raster starts at the center of k space, k0,0, and proceeds to km,−n. From this point, the raster proceeds to km,n and then back to k0,0. This process is repeated for values of j down to −n. Data are collected on traversing a row. Many other pulse sequences are available in MRI. A fast spin-echo sequence traverses k space in a slightly different manner due to the rapid data collection scheme. Lines 1, n, 2n, . . . are scanned, followed by 2, n + 1, 2n + 1, . . ., and then 3, n + 2, 2n + 2, . . ., etc. (3). Here, n is the number of lines in k space divided by the number of echoes, typically 4, 8, 16, or 32. Many echo-planar imaging sequences traverse k space similarly to that described in Fig. 1 (3,4). Many fast angiography sequences use spirals (5,6) and rose patterns (7).
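The scan patterns named above are easy to generate numerically. The sketch below is an illustration only; the pattern sizes and function names are assumptions. It produces the sample locations for an x, y raster, an r, θ scan, and an Archimedean spiral. Because a scan trajectory, whether in image space or in k space, is just an ordered list of sample points, the same kind of function can describe either.

```python
import math

# The r-theta pattern records points along successive radii, so samples crowd
# together near r = 0, whereas the x, y raster samples the plane at a constant
# density, matching the density argument made in the text.

def xy_raster(nx, ny, step=1.0):
    return [(ix * step, iy * step) for iy in range(ny) for ix in range(nx)]

def r_theta_scan(n_radii, n_points, r_max=1.0):
    pts = []
    for k in range(n_radii):                      # one radius per angle, 0 to 360 degrees
        theta = 2 * math.pi * k / n_radii
        for j in range(n_points):                 # center outward along the radius
            r = r_max * j / (n_points - 1)
            pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts

def archimedean_spiral(a, n_points, turns=3.0):
    ts = (2 * math.pi * turns * k / (n_points - 1) for k in range(n_points))
    return [((a * t) * math.cos(t), (a * t) * math.sin(t)) for t in ts]

if __name__ == "__main__":
    print(len(xy_raster(8, 8)), len(r_theta_scan(16, 8)), len(archimedean_spiral(0.1, 64)))
```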
HYBRID TECHNIQUES

Hybrid scan techniques use combinations of optical, holographic, projection, scan, and triangulation techniques. Hybrid techniques are generally used to reduce the imaging time. Two common examples of imaging systems that use a hybrid image formation system are an electrophotographic or Xerox copier and a flatbed document scanner. These systems use a hybrid optical-scan system in which the first dimension of the two-dimensional imaged document is mapped optically onto the detector while the second direction is scanned. The amount of time necessary to form the image in this hybrid technique is equal to the square root of the time necessary to scan a document with a two-dimensional x, y scan. The interested reader is directed to the article on ELECTROPHOTOGRAPHY for more information.
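The speed advantage can be illustrated by counting acquisition steps, under the simplifying assumption (not stated in the article) that every step takes the same unit time: a full x, y point scan of an N × N page needs N² steps, while capturing one whole line optically at each scan position needs only N, the square root of N².

```python
# Illustration only: step counts for an N x N document, each step assumed to
# take the same time, so the counts are directly proportional to imaging time.

def point_scan_steps(n):
    return n * n          # every pixel visited individually by an x, y scan

def hybrid_line_scan_steps(n):
    return n              # one optically captured line per scan position

if __name__ == "__main__":
    n = 1024
    full, hybrid = point_scan_steps(n), hybrid_line_scan_steps(n)
    print(full, hybrid, hybrid == int(full ** 0.5))   # 1048576 1024 True
```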
ABBREVIATIONS AND ACRONYMS

ATC   air traffic control
CCD   charge coupled device
CT    computed tomography
d     distance
D     detector location
IF    Image Formation
MRI   magnetic resonance imaging
n     integer
O     object
P     phenomenon location
r     distance
S     screen
t     time
θ     angle
BIBLIOGRAPHY

1. J. P. Hornak, in Kirk-Othmer Encyclopedia of Chemical Technology, John Wiley & Sons, Inc., New York, NY, 1995.
2. S. R. Deans and S. Roderick, The Radon Transform and Some of its Applications, John Wiley & Sons, Inc., New York, NY, 1983.
3. J. P. Hornak, The Basics of MRI, http://www.cis.rit.edu/htbooks/mri/, 2001.
4. C. H. Meyer and A. Macovski, Magn. Resonance Imaging 5, 519–520 (1987).
5. C. H. Meyer, B. S. Hu, D. G. Nishimura, and A. Macovski, Magn. Resonance Med. 28, 202–213 (1992).
6. K. S. Nayak et al., Magn. Resonance Med. 43, 251–256 (2000).
7. G. E. Sarty, Magn. Resonance Med. 44, 129–136 (2000).
IMAGE PROCESSING TECHNIQUES
JOHN C. RUSS
North Carolina State University
Raleigh, NC
Image processing includes those methods that operate on arrays of pixels to produce a modified image in which some of the information is enhanced in visibility and necessarily some other information is reduced. This is done presumably because the user can determine from the intended use of the image which information is currently important, and in some other situation different processing might be used to extract other information from the image. There are several different ways to organize image processing methods to aid in understanding and to provide a logical structure for explanation. One is by purpose: to correct defects in images that arise from imperfect cameras, illumination, etc.; to enhance the visibility of particular structures within images; to recognize and locate features; and so on. But in many cases, the same tools can be used for several of these different objectives. Another method is to distinguish between operations that work on single pixels, within small neighborhoods, and globally on the entire image. That is the organization used here.
SINGLE-PIXEL OPERATIONS For example, the brightness and contrast in an image can be altered by substituting a new brightness (or color) into each pixel based simply on the previous value that was there, unrelated to the contents of any other pixel. A transfer function is typically constructed that relates initial to final values and is applied to every pixel in the image. Most image processing routines allow this to be done manually (usually interactively) to increase the contrast in bright or dark regions, and many can construct such functions based on the content of the image. This is done by constructing the image histogram, a plot of the number of pixels that have each possible brightness value. Redistributing the brightness values to cover the entire dynamic range of the display uniformly (called histogram equalization) generally increases the contrast and at the same time expands brightness gradients so that the details can be more easily seen. Figure 1 shows an example (Note — all of the images shown here are taken from The Image Processing and Analysis Cookbook, a tutorial that is part of The Image Processing Toolkit CD by C. Russ and J. Russ, Reindeer Games Inc., 1999). Transfer functions are very quick to apply, often by using a ‘‘lookup table’’ (LUT) with the hardware of the computer display for essentially instantaneous results. The transfer function can expand contrast selectively in bright or dark regions (compressing or clipping the values in the opposite portions), reverse
Figure 1. Expansion of image contrast using histogram equalization: (a) original; (b) equalized. The histograms, plots of the number of pixels that have each brightness value and the cumulative sum, are shown for the original and final images. After equalization, the values are spread to cover the full dynamic range of the display, and the cumulative plot is linear.
contrast, apply nonlinear functions (gamma), and apply false color. The latter technique replaces each brightness value in a gray-scale image with a unique color and takes advantage of the fact that human vision can distinguish only 20–30 brightness values under typical viewing conditions, but hundreds of colors. This method often results in making the overall content of images more difficult to recognize and should be used sparingly to highlight specific details. Figure 2 shows an example using several different color lookup tables (CLUTs). A second kind of single-pixel operation leaves the pixel values (brightness or color) alone but moves the pixels around in the image. This so-called image warping is commonly used in television advertising to ‘‘morph’’ one image into another, but it has other uses such as rectifying images that were taken of tilted or nonplanar surfaces, so that they can be measured. Figure 3 shows an example.
It is usually necessary to know either the exact geometry of the image acquisition (for example, in satellite imaging of a planetary surface) or to place some fiducial marks on the subject that can be used to determine the necessary transformational equations.
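As a hedged illustration of deriving the transformational equations from fiducial marks, the sketch below fits an affine transform to three or more mark correspondences by least squares. This is a simplification: a tilted or nonplanar surface generally calls for a full perspective or higher-order warp, and the coordinates and function names here are assumptions made for the example, not values from the article.

```python
import numpy as np

# Estimate an affine transform x' = a*x + b*y + c, y' = d*x + e*y + f from
# fiducial marks whose true positions are known. Three non-collinear
# correspondences determine the six coefficients; extra marks are handled by
# least squares. A full projective correction has 8 unknowns and needs at
# least four marks, but the fitting idea is the same.

def fit_affine(src_pts, dst_pts):
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # rows: [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return coeffs.T                                # 2 x 3 matrix [[a, b, c], [d, e, f]]

def apply_affine(M, pt):
    x, y = pt
    return (M[0, 0] * x + M[0, 1] * y + M[0, 2],
            M[1, 0] * x + M[1, 1] * y + M[1, 2])

if __name__ == "__main__":
    measured = [(10, 12), (95, 18), (20, 90), (100, 96)]   # marks found in the tilted image
    true_pos = [(0, 0), (100, 0), (0, 100), (100, 100)]    # where they should map
    M = fit_affine(measured, true_pos)
    print(np.round(M, 3))
    print(apply_affine(M, (10, 12)))                       # close to (0, 0)
```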
NEIGHBORHOOD OPERATIONS Single-pixel operations specifically ignore the neighborhood around each pixel, so two pixels that have the same initial value will end up with the same final value, regardless of their surroundings. It is often useful to derive new values that do depend on the surroundings. There are two primary ways to accomplish this. In the spatial domain of the pixel array, a neighborhood region can be defined in which the pixels are combined in a variety of mathematical ways to calculate a new value. Alternatively, the entire image can be processed in the frequency domain, so that all pixels contribute on the basis of their relative spacing. These methods are discussed separately later.
Figure 2. Application of color lookup tables to an image: (a) original gray scale; (b) heat or color temperature scale; (c) rapidly varying cycles. Notice that small variations become more visible due to rapidly varying cycles, but the features in the image become harder to recognize. See color insert for (b) and (c).
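The lookup-table mechanism behind Figures 1 and 2 is compact enough to sketch before going further. The code below is an illustration, not taken from the article; the image sizes and function names are assumptions. Histogram equalization is just a 256-entry LUT built from the cumulative histogram, and a false-color CLUT is a LUT whose entries are RGB triples instead of gray values.

```python
import numpy as np

# Build an equalization lookup table from the cumulative histogram and apply
# it with one table lookup per pixel, the same mechanism a display LUT uses.

def equalization_lut(image):
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())   # normalize to 0..1
    return np.round(255 * cdf).astype(np.uint8)         # new value for each old value

def apply_lut(image, lut):
    return lut[image]                                   # one lookup per pixel

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    img = rng.integers(90, 130, size=(128, 128), dtype=np.uint8)  # low-contrast test image
    out = apply_lut(img, equalization_lut(img))
    print(img.min(), img.max(), "->", out.min(), out.max())       # spread to 0..255
```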
In the spatial or pixel domain, the simplest kind of neighborhood operation combines the pixel values within a region by using a simple formula, often represented as a set of weight values that are multiplied by the corresponding pixel values and added to form a sum. This resulting value is then stored in a single pixel location at the center of the neighborhood to build up a new image, the neighborhood is shifted by one pixel, and the operation repeated. The hallmark of most image processing systems is the application of some neighborhood operation to
(c)
Figure 2. (Continued)
every pixel in the image, which is an inherently parallel operation. Some specialized image processing computers use parallel processing, array processors, or other hardware to achieve real-time speeds, but the basic algorithms can be implemented quite easily in a conventional single-processor computer to achieve the same final result. Spatial domain neighborhoods are often implemented as square regions that simplify pixel addressing, but ideally they should be circular to produce isotropic results on the image; this means that all orientations of features within the image are treated uniformly. For a set of weight values that are multiplied by a pixel and its neighbors, a square array with zero values in the corners produces a circular effect. The size of
the neighborhood, usually specified as the diameter but sometimes the radius, is an important parameter in these operations. The simplest of all spatial domain operators, implemented in every system, is smoothing. The matrix of weight values multiplied by the pixel brightness values ideally has a Gaussian cross section (it has been shown that this produces optimum results, although some systems that originated in the era of less computing power simply average the pixels), and a radial standard deviation is chosen to average out random noise fluctuations in the pixel values while doing the least amount of blurring to the real features in the image. Figure 4 shows an example with two different Gaussian filters (Table 1). Notice that, as the standard deviation increases, the size of the matrix of numbers (often called a ‘‘kernel’’) increases (the radius is typically 3–4 times the standard deviation). As an implementation detail, large kernels require using extended precision to prevent information loss. Many systems implement kernel operations using only integers rather than real numbers (again, a legacy of less powerful computers), which produces only an approximation of the ideal values. In this case, the integral weights are multiplied by the pixel values, and the sum is then divided by the sum of the weights. Kernels, or arrays of multiplicative weight values, are not limited to symmetrical or positive values as used in smoothing. Some of the examples shown later have negative values (often a zero sum). Kernel operations are generally referred to as ‘‘linear operations’’ because the linear equation that sums up the product includes some contribution from each pixel in the neighborhood. All linear operations can also be accomplished by using frequency domain processing; for large kernels this is often a more computationally efficient method for carrying out the procedure, but the results are identical, and most people find the ideas easier to understand with the pixel-domain representation. The smoothing operation described here in the pixel
domain will be called a ‘‘low-pass filter’’ in the frequency domain, meaning that it keeps (passes) gradually varying brightness changes but removes rapidly varying changes. The most common approach for color images is to convert the stored red, green, and blue (RGB) values to hue, saturation, and intensity (HSI, discussed in another article), to process the intensity values while leaving the color information unchanged, and then to convert the results back to RGB, so that the computer can display the image.

Figure 3. Rectification of an image by morphing: (a) original picture; (b) rotated and perspective correction applied.

Figure 4. Gaussian smoothing applied to a noisy image (a scanning electron microscope picture of gold deposited on silicon): (a) original; (b) smoothed with a standard deviation of 0.5 pixels; (c) smoothed with a standard deviation of 2 pixels. The kernels of weights are shown in Table 1. The image has been enlarged to show individual pixels better. Note that increased reduction in the visibility of noise is accompanied by greater blurring of boundaries.

Table 1. Weight Kernels Used to Perform the Gaussian Smoothing in Fig. 4. The Upper Left Value is the Center of the Symmetrical Array; Only the Bottom Right Quadrant of the Kernel is Shown. The Sum of All of the Terms in the Kernel is Unity

Standard Deviation = 0.5 pixels (5 × 5 array):
0.0796  0.0293  0.0015
0.0293  0.0108  0.0005
0.0015  0.0005  0.0000

Standard Deviation = 2.0 pixels (15 × 15 array):
0.0199  0.0187  0.0155  0.0113  0.0073  0.0042  0.0021  0.0009
0.0187  0.0176  0.0146  0.0106  0.0069  0.0039  0.0020  0.0009
0.0155  0.0146  0.0121  0.0088  0.0057  0.0032  0.0016  0.0007
0.0113  0.0106  0.0088  0.0065  0.0042  0.0024  0.0012  0.0005
0.0073  0.0069  0.0057  0.0042  0.0027  0.0015  0.0008  0.0003
0.0042  0.0039  0.0032  0.0024  0.0015  0.0009  0.0004  0.0002
0.0021  0.0020  0.0016  0.0012  0.0008  0.0004  0.0002  0.0001
0.0009  0.0009  0.0007  0.0005  0.0003  0.0002  0.0001  0.0000
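A minimal sketch of this spatial-domain smoothing, assuming NumPy and SciPy are available: it builds a Gaussian kernel of weights that sum to unity (compare Table 1) and convolves it with the image.

    import numpy as np
    from scipy import ndimage

    def gaussian_kernel(sigma):
        """Gaussian weight kernel; radius roughly 3-4x the standard deviation, weights sum to 1."""
        r = int(np.ceil(3.5 * sigma))
        y, x = np.mgrid[-r:r + 1, -r:r + 1]
        k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
        return k / k.sum()

    def smooth(image, sigma=0.5):
        """Low-pass (smoothing) filter by convolution with the Gaussian kernel of weights."""
        return ndimage.convolve(image.astype(float), gaussian_kernel(sigma), mode='nearest')

    # ndimage.gaussian_filter(image, sigma) produces essentially the same result more efficiently.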
RANKING OPERATIONS

A different (nonlinear) use of neighborhoods in the spatial domain uses ranking of pixel values. One of the drawbacks of the smoothing operation illustrated before is that although it reduces visible noise in the image, it also blurs edge sharpness. The nonlinear or ranking-based alternative is a median filter that ranks the pixel values, selects the median value (the one that has an equal number of brighter and darker values in the list), and keeps that value for the central pixel. Again, like all
neighborhood operations, this process is then repeated for all pixels in the image, always using the original pixel values and saving the modified pixel values in a new image. As shown in Fig. 5, the median filter is a good noise remover, and it does not blur or shift edges as smoothing does. The size of the neighborhood controls what is defined as ‘‘noise’’ and hence removed. If a line (e.g., a scratch) or cluster of pixels (e.g., a dust particle) covers less than half the neighborhood, those pixel values will never contribute to the median value and will be removed.

Figure 5. Median filter applied to the same image as Fig. 4: (a) original; (b) 1.5-pixel radius neighborhood; (c) 3.5-pixel radius neighborhood. Note that the noise is reduced without reducing contrast or blurring the edges.
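A median (ranking) filter of the kind just described can be sketched as follows, assuming SciPy is available; the roughly circular footprint keeps the result isotropic:

    import numpy as np
    from scipy import ndimage

    def circular_footprint(radius):
        """Approximately circular neighborhood, so that all feature orientations are treated alike."""
        r = int(np.ceil(radius))
        y, x = np.mgrid[-r:r + 1, -r:r + 1]
        return (x**2 + y**2) <= radius**2

    def median_smooth(image, radius=1.5):
        """Ranking (median) filter: replace each pixel by the median of its neighborhood."""
        return ndimage.median_filter(image, footprint=circular_footprint(radius))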
Like all image processing operations, selection of the proper neighborhood size for a particular image depends on the user’s knowledge of the way the image was obtained and which features are of interest. The median for color images may be obtained by converting from RGB to HSI and using only intensity values. However, it is also possible to define a median in color space. For each pixel in the neighborhood, plot the color values in color space. The median pixel is that for which the sum of distances to the other pixel coordinates is least. Figure 6 shows an example of a color median applied to a noisy image. As for the kernels of pixel weights, which are all positive and symmetrically arranged for smoothing operations but can be tailored to other purposes (discussed later) as well, the ranking operation has more general application than
just the median filter. Keeping the brightest or darkest value in the region can remove features and replace them with nearby background. This permits removing the background by subtraction (or division, depending on whether the original image acquisition was linear or logarithmic) to correct for nonuniform illumination, varying specimen density, or curved surfaces. Figure 7 shows an example. Ranking in neighborhoods of different sizes and keeping the difference between the brightest (or darkest) values in each can also be used to isolate features that are locally brighter (or darker) than a varying background. By varying the shape of the neighborhoods, this ‘‘top hat’’ filter can be adapted to spots or lines in an image.
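The background-leveling ranking operation of Fig. 7 might be sketched as follows (a minimal example assuming SciPy and an image whose features are brighter than the background; the function names and the 8-pixel radius are illustrative):

    import numpy as np
    from scipy import ndimage

    def level_background(image, radius=8):
        """Estimate the background by keeping the darkest value in a circular neighborhood
        (which removes bright features smaller than the neighborhood), then subtract it."""
        r = int(radius)
        y, x = np.mgrid[-r:r + 1, -r:r + 1]
        footprint = (x**2 + y**2) <= r**2
        background = ndimage.minimum_filter(image, footprint=footprint)
        return image.astype(float) - background

    # A "top hat" for isolating bright spots follows the same idea with a gray-scale opening:
    # tophat = image - ndimage.grey_opening(image, footprint=footprint)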
Figure 6. Median filter applied to a noisy color image (a light microscope image of a chemically etched aluminum-silicon alloy): (a) original; (b) median filter applied using only intensity values; (c) color median filter applied. See color insert.

EDGE ENHANCEMENT
Edges of features in images are important for several reasons. The abrupt brightness or color change at the edge is detected by human vision, so that any processing that enhances that change will make boundaries more evident. And for measurement purposes, the location of boundaries is also clearly important. Edge and line detection is a major topic in image processing, and a variety of algorithms are employed. One of the simplest enhancement methods is the opposite of the smoothing or low-pass filter described before: a sharpening or high-pass filter (i.e., it keeps rapidly varying brightness changes but eliminates gradually varying signals). In mathematical terms, the Laplacian is a second derivative of brightness, which is isotropic (ignores the direction in which the brightness gradient is oriented). A very simple Laplacian subtracts the average of the neighboring pixels from the central pixel, so that any change regardless of direction is detected.
Figure 7. Use of neighborhood ranking to level background illumination: (a) original image (macro image of rice grains on a copy stand with nonuniform illumination); (b) application of rank filter to keep darkest pixel in a neighborhood of 8-pixel radius; (c) result of subtracting image b from image a, producing uniform brightness.
Just as the smoothing operation is best performed using a Gaussian kernel of weights, so the best Laplacian is obtained by subtracting using a kernel of weights that has a Gaussian shape. This is often called an ‘‘unsharp mask’’ by analogy to darkroom photo processing methods. It can be done by subtracting (pixel by pixel) a smoothed copy of the image from the original, where the degree of smoothing is just adequate to blur the edges of whatever features are important in the picture. The difference between the original and the smoothed or blurred copy is precisely the edges that are of interest. Figure 8 illustrates this. Notice that the result increases the local contrast at edges (any location where there is some initial change in brightness).
It is common to add the Laplacian back to the original image so as to retain the overall contrast as an aid to visual recognition. This is shown in Fig. 9. The drawback to the Laplacian or unsharp mask is that it is sensitive to local pixel variations, usually called noise. One solution to this is to use a difference of Gaussians, where one copy of the original image is blurred just enough to remove the noise, the second copy is blurred more to smooth the edges of features, and then the difference is obtained. As shown in Fig. 10, this reduces the noise while increasing edge contrast.
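Both the unsharp mask and the difference of Gaussians can be sketched in a few lines of Python, assuming SciPy; the standard deviations shown are illustrative choices:

    import numpy as np
    from scipy import ndimage

    def unsharp(image, sigma=1.0, add_back=True):
        """Unsharp mask: subtract a Gaussian-smoothed copy; optionally add the result
        back to the original to retain overall contrast (cf. Figs. 8 and 9)."""
        img = image.astype(float)
        detail = img - ndimage.gaussian_filter(img, sigma)
        return img + detail if add_back else detail

    def difference_of_gaussians(image, sigma_noise=0.5, sigma_edges=2.0):
        """Difference of Gaussians: a lightly blurred copy minus a more heavily blurred copy,
        which suppresses pixel noise while enhancing edge contrast (cf. Fig. 10)."""
        img = image.astype(float)
        return ndimage.gaussian_filter(img, sigma_noise) - ndimage.gaussian_filter(img, sigma_edges)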
Figure 8. Unsharp mask applied to a low-power light microscope image of an insect: (a) original image; (b) result of subtracting a copy smoothed with a Gaussian filter that has a standard deviation of 1.0 pixels. Note the visibility of detail and the suppression of overall contrast.
A different approach to edge enhancement is to use first derivatives. A kernel of weights with positive values on one side and negative values opposite (Fig. 11) produces a first derivative of brightness with respect to some chosen direction. Obtaining two such values for each pixel with orthogonal directional derivatives allows combining the results as vectors and constructing a new image in which each pixel has a value that is proportional to the magnitude of the local brightness gradient. This ‘‘Sobel’’ operator generally delineates edges and boundaries well and is quite noise resistant. It is also possible to determine the direction of the brightness gradient at each pixel, represented in Fig. 11 as the color of the pixels along the boundaries. This information provides an easy way to determine boundary orientation and hence to measure the anisotropy of alignments or to select boundaries that lie at a particular angle. There are many other edge-enhancement and edge-finding techniques. A variety of first-derivative methods (Canny, Kirsch, etc.) differ primarily in their computational efficiency. These approaches are generally less noise sensitive than the second-derivative methods. The Canny technique combines a smoothing operation with a first derivative and has optimal edge-locating behavior for some classes of images. Another approach matches many templates with the local neighborhood of pixels (e.g., the Frei and Chen operator shown in Fig. 12, which is less sensitive to the absolute value of pixel brightnesses). A different approach uses statistical calculations of the pixels in the neighborhood, such as calculating the variance of pixels and using that value to construct a new image, also shown in Fig. 12. The techniques discussed try to enhance or extract edges and boundaries from images and suppress the differences in color or brightness between different
regions. Another class of operations tries to narrow boundaries, so that the intermediate pixel values between regions are replaced by the region values. Figure 13 illustrates one of these, the Kuwahara filter, which compares the variance of pixel values within subregions and keeps whichever value has the least local variation.
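A sketch of the Sobel gradient magnitude and direction described above, assuming SciPy is available:

    import numpy as np
    from scipy import ndimage

    def sobel_edges(image):
        """Sobel operator: two orthogonal first derivatives combined as a vector, giving
        the gradient magnitude (edge strength) and direction (boundary orientation)."""
        img = image.astype(float)
        gx = ndimage.sobel(img, axis=1)    # derivative across columns
        gy = ndimage.sobel(img, axis=0)    # derivative across rows
        magnitude = np.hypot(gx, gy)
        direction = np.arctan2(gy, gx)     # radians; can be encoded as hue, as in Fig. 11
        return magnitude, direction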
Figure 9. Adding the Laplacian of an image to the original: (a) original (telescopic picture of Saturn); (b) Laplacian; (c) sum of images a and b.
Figure 10. Difference of Gaussians applied to the same image used in Figs. 4 and 5: (a) original; (b) unsharp mask applied (note increased noise); (c) difference of Gaussians applied (notice enhanced edge contrast and suppression of noise).

Figure 11. Sobel operator applied to the same image used in Fig. 8: (a) magnitude of the brightness gradient outlines edges; (b) encoding the direction of the gradient vector in the hue labels the edges with their orientation. See color insert.

Figure 12. Other edge-finding procedures (same image used in Figs. 8 and 11): (a) the Frei and Chen operator; (b) the variance operator.

Figure 13. Sharpening edges (light microscope image of an etched metal): (a) original, enlarged to show individual pixels (note that the pixels on the boundaries between phases have intermediate gray values); (b) Kuwahara filter applied, assigning boundary pixels to the phase regions and making the boundaries abrupt.

TEXTURE AND OTHER NEIGHBORHOOD PROCESSING

Human vision can often distinguish quite subtle information in images to delineate structures that are not directly selectable by the computer. One example of this is shown in Fig. 14. The brightness values of the pixels in the curds in this cheese are not unique because pixels that have the same values are present in the surrounding protein. Human vision identifies the curds readily because their texture is different from the protein. The curds are visually ‘‘smooth,’’ meaning that pixel values are similar to those of their neighbors, whereas in the highly textured regions, the local pixel values vary. The ranking operator discussed before can capture this characteristic. Forming a new image from the difference between the brightest and darkest pixels in the moving neighborhood converts the original image to one in which the smooth and textured areas have different average brightnesses and can be readily separated. Other texture operators calculate statistical properties of the moving region or calculate a local fractal dimension.

Human vision generally relies on local rather than global comparisons. For example, it is very difficult to notice the gradual variation in brightness along the walls of a room, but it is easy to see the local sudden change of brightness at a corner. The local changes can be emphasized while diminishing gradual ones by combining the idea of neighborhood processing with histogram modification. Figure 15 shows two examples in which histogram equalization has been applied within a neighborhood and the result applied only to the central pixel; a new image is constructed by moving the neighborhood to every location in the image. The visibility of details is considerably enhanced. The technique can also be applied to color images (Fig. 16) by processing just the intensity data and leaving the color information (hue and saturation) unchanged, as mentioned earlier.

THRESHOLDING AND BINARY IMAGES
The enhancement of images by removing brightness or color variations, sharpening edges, adjusting contrast, etc., is useful for visual examination of images and as a precursor to printing. For some other purposes, such as feature identification and measurement, thresholding is used to identify the pixels that comprise the features as distinct from the surroundings or background. Image processing prepares the image for thresholding, so that a unique range of color or brightness values corresponds to the features of interest, and then the image is converted to a black-and-white ‘‘binary’’ image. The thresholding operation is rarely perfect. Variations or noise in the original image often result in misclassifying individual pixels within or on the boundaries of features. Features also may touch each other in the original image so that they are not distinguished in the binary image as separate objects. The binary images can be processed to correct these problems. The most widely used tools are morphological erosion and dilation. In its simplest form, erosion removes a pixel that was thresholded to be part of a feature if it touches any background pixel, and dilation adds a pixel to a feature
if it touches the feature, as shown in Fig. 17. Both of these operations alter the shape of the feature (hence the name morphological) and also change its size. The change in size is generally dealt with by using the two primitive tools in combination. Erosion followed by dilation (an ‘‘opening’’) restores the feature size but eliminates fuzzy irregularities around the boundaries and isolated pixels, and dilation
followed by erosion (a ‘‘closing’’) restores feature size and fills in small holes. Unfortunately, these classical erosion and dilation operations tend to bias feature shapes because they are carried out on an array of square pixels.
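The classical binary operations of Fig. 17 can be sketched as follows, assuming SciPy; here binary stands for a Boolean image produced by thresholding (True marks feature pixels):

    from scipy import ndimage

    def clean_binary(binary, iterations=1):
        """Classical morphological operations on a thresholded (Boolean) image."""
        eroded  = ndimage.binary_erosion(binary, iterations=iterations)
        dilated = ndimage.binary_dilation(binary, iterations=iterations)
        opened  = ndimage.binary_opening(binary, iterations=iterations)   # removes fine lines and isolated pixels
        closed  = ndimage.binary_closing(binary, iterations=iterations)   # fills small gaps and holes
        return eroded, dilated, opened, closed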
Figure 14. Texture extraction (microscope image of curds and protein in a cheese): (a) original (note side-to-side shading due to varying sample thickness); (b) variance calculated in a neighborhood of 2.5-pixel radius, showing the smooth areas (curds) as dark and the textured areas (protein) as light.
Figure 15. Local histogram equalization: (a) original image of fingerprint on a magazine cover; (b) processed to enhance local contrast and to reveal the entire fingerprint; (c) original image of dried paint in the SEM, where a wide dynamic range hides details; (d) processed to enhance local contrast and reveal details in both bright and dark areas.
More generalized approaches to erosion and dilation use larger neighborhoods and allow specifying particular patterns or numbers of neighboring pixels that are required to add or remove a pixel from a feature. Even more isotropic results can be achieved by using the Euclidean distance map. This routine measures the distance of each pixel within a feature from the nearest
background pixel (in any direction) and produces a gray-scale image with that result. Thresholding this image accomplishes erosion quickly and exactly. Performing a similar operation on the background allows dilation.
Figure 16. Local equalization in color (portion of a photographic image): (a) original, showing high dynamic range where features are hidden in shadow areas; (b) intensity plane processed to enhance local contrast, showing details in both bright and dark areas (hue and saturation are unchanged). See color insert.
The procedure is generally faster than classical erosion and dilation and, as shown in Fig. 18, produces much better results. The Euclidean distance map also provides the basis for separating touching features. Watershed segmentation uses the distance map to find lines of inflection, where convex features meet, to remove them. As shown in Fig. 19, this can permit measuring individual objects that
are adjacent to each other, but it requires that they be convex, and it cannot deal with piles of objects in which all or parts of some features are hidden behind others.
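A sketch of erosion via the Euclidean distance map and of watershed separation of touching features, assuming SciPy and scikit-image are available; the marker-finding step (local maxima of the distance map within a 9-pixel window) is one simple choice among several, not part of the original text:

    import numpy as np
    from scipy import ndimage
    from skimage.segmentation import watershed

    def edm_erode(binary, distance=2.0):
        """Isotropic erosion: threshold the Euclidean distance map of the features."""
        return ndimage.distance_transform_edt(binary) > distance

    def separate_touching(binary):
        """Watershed segmentation of touching convex features using the distance map (cf. Fig. 19)."""
        edm = ndimage.distance_transform_edt(binary)
        # Markers: one seed per local maximum of the distance map (roughly one per convex feature).
        peaks = (edm == ndimage.maximum_filter(edm, size=9)) & binary
        markers, _ = ndimage.label(peaks)
        return watershed(-edm, markers, mask=binary)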
Figure 17. Classical morphological operations applied to a binary image: (a) original; (b) erosion; (c) dilation; (d) opening (note removal of fine lines); (e) closing (note filling of fine gaps).
Figure 18. Comparison of erosion/dilation by classical and Euclidean distance map operators: (a) original image (light microscope image of etched steel sample); (b) thresholded to delineate the iron carbide; (c) classical closing and opening to delineate the pearlite (lamellar) regions (note the blocky appearance of boundaries); (d) closing and opening with the Euclidean distance map (note the improved fidelity of boundaries).
Specialized erosion rules (that can be implemented either with classical pixel-by-pixel operations or with the Euclidean distance map) are also useful to reduce features to their skeletons. These are the pixels that remain after all exterior pixels are removed; the rule is that to remove any more pixels would separate the line of pixels into multiple disconnected parts. The skeleton (Fig. 20) preserves important shape information about features and makes such basic topological properties as the number of arms, loops, and branch points easily accessible. As a simple example, a tangle of overlapping and crossed fibers can be counted by detecting the end points in the skeleton and dividing by two because each fiber has two ends. Similarly, the individual branches in a structure (e.g., a
plant root) can be separated for individual measurement by removing the branch points.
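The fiber-counting idea can be sketched as follows, assuming scikit-image for the skeleton and SciPy for the neighbor count; names are illustrative:

    import numpy as np
    from scipy import ndimage
    from skimage.morphology import skeletonize

    def count_fibers(binary):
        """Count overlapping fibers by skeletonizing, finding skeleton end points
        (skeleton pixels with exactly one skeleton neighbor), and dividing by two."""
        skel = skeletonize(binary).astype(bool)
        # Count the 8 neighbors of each pixel (convolution counts the center, so subtract it).
        neighbors = ndimage.convolve(skel.astype(int), np.ones((3, 3), int), mode='constant') - skel
        endpoints = skel & (neighbors == 1)
        return endpoints.sum() / 2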
IMAGE COMBINATIONS

Another class of operations that takes place in the pixel domain involves more than one image of the same area.
Figure 19. Watershed segmentation: (a) original (SEM image of touching sand grains); (b) thresholded to distinguish sand grains from substrate; (c) watershed segmentation using the Euclidean distance map, separating the image into individual convex objects; (d) outlines of the individual grains superimposed on the original image. See color insert.
These may be multiple gray-scale images obtained with different color filters (including satellite imagery outside the visible wavelength range), elemental distribution maps obtained using X rays in the electron microscope, and so on. They may also be images derived from the same original image by using different image processing methods, such as highlighting edges or converting texture to gray scale as described earlier, or by thresholding one copy of an image. The combinations are arithmetic and logical operations such as addition, subtraction, etc., comparisons such as keeping the brighter or darker pixel at a location, and Boolean operations such as AND and OR. When arithmetic operators are used, it is generally the practice to rescale the results to fit into the dynamic range of the display.
The simple way to do this is based on the maximum possible range of the results; for example, when adding two images, the result is divided by 2. Alternatively, the actual minimum and maximum results may be used to rescale the data. Figure 7 showed one example in which an estimate of the variation in illumination was obtained by processing and then subtracted from the original image to level the contrast. Figures 8 and 9 used subtraction and addition to accomplish sharpening.
Figure 20. Skeletonization: (a) an irregular binary object; (b) the skeleton, whose end points and branch points are highlighted in green and red, respectively. See color insert.
Figure 21. Feature selection using Boolean logic: (a) original image (red blood cells on a slide); (b) binary image resulting from thresholding, filling of holes, removal of small features and watershed segmentation; (c) combination of image a with image b to eliminate background and leave original gray pixel values within the cells (by keeping whichever pixel value at each location is lighter).
Figure 21 illustrates another use of such combinations, in which features are shown with their original gray-scale values, but the background is removed. This permits using the gray-scale values for densitometry or other measurements. Figure 22 shows another example in which several different elemental X-ray maps are combined to select regions of a specified composition.
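The combination used in Fig. 21 (keeping the lighter pixel of a gray-scale image and a mask) and the rescaling of an arithmetic sum can be sketched as follows, assuming 8-bit images; the names and the white background value are illustrative choices:

    import numpy as np

    def keep_lighter(gray, binary_mask, background_value=255):
        """Keep the lighter pixel at each location: features retain their gray values,
        the background becomes white (cf. Fig. 21)."""
        mask_image = np.where(binary_mask, 0, background_value)   # features dark, background white
        return np.maximum(gray, mask_image)

    def add_and_rescale(a, b):
        """Add two images and rescale by the maximum possible range of the result."""
        return ((a.astype(float) + b.astype(float)) / 2).astype(a.dtype)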
THE FREQUENCY DOMAIN

The Fourier transform converts a time-varying or spatially varying signal to a frequency spectrum.
Figure 22. Logical combinations of multiple images (SEM X-ray maps from a mica specimen): (a) silicon intensity; (b) aluminum intensity; (c) thresholded area rich in silicon; (d) thresholded area rich in aluminum; (e) the area rich in silicon but not aluminum.
Figure 23. Fourier transform: (a) superposition of three sinusoids that have different wavelengths (spacing) and orientations; (b) the Fourier power spectrum showing the three spikes (and their symmetrical pairs) that correspond to the frequency (radius from the center) and orientation of each of the sinusoids.
This method is commonly used in engineering and signal processing applications and the well-known mathematics are not reviewed here. When applied to an image, the result is another image in which the amplitude of the various frequencies and orientations of the sinusoidal waves that make up the original image are shown. Figure 23 illustrates this for a very simple case of three sinusoids (note that the conventional power spectrum display is symmetrical about the central point, so that each ‘‘spike’’ appears twice). For a more complex image, the power spectrum is less easy to interpret but very useful for general linear filtering operations. A ‘‘high-pass’’ filter that reduces or eliminates the low frequencies represented by points
near the center of the power spectrum produces edge enhancement just like the Laplacian operator. Likewise, a ‘‘low-pass’’ filter that reduces or eliminates the high frequencies represented by points far from the center of the power spectrum produces blurring just like a Gaussian smoothing operator. Figure 24 shows a representative example. It is also possible to select intermediate frequencies (a ‘‘bandpass’’ filter) to remove noise and sharpen edges. The most powerful use of frequency domain processing is for images that contain significant contributions from one or a few specific frequencies.
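A minimal NumPy sketch of this frequency-domain filtering: transform the image, multiply by a radial mask about the center of the (shifted) spectrum, and transform back. The sharp cutoff is for illustration only; practical filters taper the mask to avoid ringing.

    import numpy as np

    def fourier_filter(image, cutoff=0.1, low_pass=True):
        """Low- or high-pass filter applied in the frequency domain."""
        F = np.fft.fftshift(np.fft.fft2(image))
        rows, cols = image.shape
        y, x = np.ogrid[-rows // 2:rows - rows // 2, -cols // 2:cols - cols // 2]
        radius = np.hypot(y / (rows / 2.0), x / (cols / 2.0))   # 0 at the center, ~1 at the edge
        mask = radius <= cutoff if low_pass else radius > cutoff
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))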
Figure 24. High- and low-pass filters: (a) original image (X-ray film of a hand); (b) FT power spectrum; (c) low-pass filter that reduces high frequencies and produces blurring; (d) high-pass filter that reduces low frequencies and produces sharpening.
This may be noise introduced by electronic or vibrational sources, or it can arise from processes such as halftone printing. In Fig. 25, a newspaper halftone image is shown with its power spectrum. The ‘‘spikes’’ arise from the regularly spaced dots on the page. Removing just these selected frequencies can dramatically improve the image appearance, as shown in the example. Spatial domain operations such as smoothing would also eliminate many other high frequencies and would blur the fine lines and edges in the original image. In some cases, the periodic component of the image is the signal rather than the ‘‘noise.’’ A common example is
high-resolution electron microscope images such as shown in Fig. 26. The power spectrum is identical to the electron diffraction pattern. Selecting just the bright spots in this pattern and eliminating the random noise results in a clear picture of the atomic lattice that is equivalent to averaging all of the separate images of the repetitive structure.
CONVOLUTION AND CORRELATION
Figure 25. Removal of periodic noise: (a) original halftone printed image from a newspaper; (b) FT power spectrum of image a; (c) removal of spikes and reduction of amplitude of high frequencies; (d) resulting image retransformed from frequency space to pixel space (note that fine lines are retained).
The application of a high-pass or low-pass filter to an image is called convolution, and as noted before, is equivalent to, but more efficient than, performing a linear operation in the spatial domain using a large kernel of weight values. However, there are no frequency space analogs to the neighborhood ranking operations discussed earlier. Convolution consists of multiplying the frequency space representation of the image, point by point, by the frequency space representation of the filter. This superimposes the blur (for instance) onto the image. Frequency space representation of an image is also useful for removing the artifacts in images that can
be introduced by the imaging system itself. One of the best known examples is the error in mirror curvature that blurred images in the original Hubble telescope deployment. Dividing the frequency domain representation of a single star, which should have been imaged as a point, into the frequency domain representation of a more complicated out-of-focus image, followed by retransforming to the spatial domain, sharpens the picture, as shown in Fig. 27.
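A simple regularized (Wiener-style) deconvolution can be sketched as follows with NumPy; the constant k that limits noise amplification is an assumed parameter, and the point-spread function is taken as given:

    import numpy as np

    def deconvolve(blurred, psf, k=0.01):
        """Frequency-domain deconvolution: divide the transform of the blurred image by the
        transform of the point-spread function, with a small constant k added to the
        denominator to keep noise amplification under control."""
        H = np.fft.fft2(psf, s=blurred.shape)
        G = np.fft.fft2(blurred)
        F = G * np.conj(H) / (np.abs(H) ** 2 + k)
        # The result may be circularly shifted depending on how the PSF is centered.
        return np.real(np.fft.ifft2(F))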
Figure 26. Removal of random noise: (a) original image (high-resolution transmission electron microscope image of the atomic lattice in a ceramic); (b) FT power spectrum; (c) image produced by retaining only the spikes and producing an averaged and noise-free lattice image.
This same method (‘‘deconvolution’’) is used in light microscopy to improve the resolution of images blurred by passing the light through the thickness of the specimen. The greatest difficulty in most circumstances is determining the blur function (also called the ‘‘point-spread function’’) of the optical path, which must usually be estimated iteratively. In addition, deconvolution tends to amplify noise in the image and thus depends on obtaining high-quality images in the first place. Correlation is closely related to convolution in the mathematical sense. Both involve multiplication of the frequency space representation of the image; in
correlation, the phase of the Fourier transform of the second image is reversed (its complex conjugate is used). In correlation, the second image is a target object that is to be located in the original image. Figure 28 shows an example. The original image is a forest of trees that are to be counted. Such a task is difficult for both human and computer. Correlating the image with that of a single tree produces a bright spot at the tip of each one, which can be thresholded and counted. Similar procedures are used for reconnaissance of camouflaged objects and for detecting defects in industrial production.
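Cross-correlation by this complex-conjugate (phase-reversed) multiplication can be sketched as follows with NumPy; the thresholding line at the end is only a crude illustration of the counting idea:

    import numpy as np

    def correlate(image, target):
        """Cross-correlate an image with a target by multiplying the image transform
        by the conjugate of the target transform; bright peaks mark target locations."""
        F = np.fft.fft2(image)
        T = np.fft.fft2(target, s=image.shape)
        return np.real(np.fft.ifft2(F * np.conj(T)))

    # c = correlate(forest, single_tree)
    # count = (c > 0.8 * c.max()).sum()   # crude; a local-maximum search is usually better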
Figure 27. Deconvolution: (a) original image (Hubble space telescopic picture); (b) image produced by deconvolution, dividing the Fourier transform of image a by the transform of the point spread function measured from the image of a single star. See color insert.
Figure 28. Correlation: (a) original image, an oblique aerial photograph of a pine forest; (b) cross-correlational image showing the individual trees for counting that is produced by multiplying the Fourier transform of image a by the transform of the image of a single tree. See color insert.
BIBLIOGRAPHY
A large number of textbooks and reference books are available on image processing. Many deal with the underlying mathematics and computer algorithms for implementation, while others concentrate on the selection of methods for particular applications and the relationship between the algorithms and the processes of human vision that they often emulate. A few recommendations follow.

Classical Emphasis on Mathematical Procedures

A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, Englewood Cliffs, NJ, 1989.
W. K. Pratt, Digital Image Processing, John Wiley & Sons, Inc., NY, 1991.
R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1993.
Application-Oriented Illustrations Comparing Techniques

J. C. Russ, The Image Processing Handbook, CRC Press, Boca Raton, FL, 1998.

Computer Algorithms and Code

J. R. Parker, Algorithms for Image Processing and Computer Vision, John Wiley & Sons, Inc., NY, 1996.
G. X. Ritter and J. N. Wilson, Handbook of Computer Vision Algorithms in Image Algebra, CRC Press, Boca Raton, FL, 1996.
IMAGE QUALITY METRICS

NORMAN BURNINGHAM
Hewlett-Packard Company
Boise, ID

ZYGMUNT PIZLO
Purdue University
West Lafayette, IN

JAN P. ALLEBACH
Purdue University
West Lafayette, IN
INTRODUCTION

Images produced on monitors, printers, copiers, and films contain a variety of elements, including text, graphics, and pictorial segments. As viewers evaluate the quality of images, they may notice that various image elements are not well formed or do not communicate the desired intent. Examples of some of the most significant deficiencies seen in images that have pictorial segments include lack of clarity in details, noise in areas expected to be smooth, loss of information in highlights or darker areas, and inaccurate representation of color. These characteristics form the basis for defining sharpness, graininess, tone scale, and color rendition, respectively. These four attributes are significant factors that viewers consider when judging image quality. In images containing text and graphics, perception of quality is critically affected by roughness of edges or lines that should be smooth, blurred edges, or nonuniform density. Figure 1 illustrates these defects in image rendering. In addition to the imperfections described before, artifacts are often introduced into the images because of mechanical or material problems during imaging. For example, mechanical noise in the imaging chain of a printer may produce uneven development of density, called banding. Other failures of the equipment may produce a variety of artifacts such as streaks or white or black spots. Such image defects are usually produced by equipment failures and are labeled artifacts, which are distinguished from the characteristics of the image that result from normal operation of the imaging device. Figure 2 illustrates three such artifacts — banding, repetitive patterns, and streaking. Throughout this article, we have used a set of high-quality original images
to demonstrate the effects of various image quality factors. Many of these images are from the widely available Kodak PhotoCD sampler. All artifacts and defects have been simulated using image processing algorithms that have been designed to mimic closely, in a carefully controlled way, the actual effect that is being illustrated. The perceived quality of an image for which the original is not available to the observer is governed by factors such as how ‘‘pleasing’’ the overall rendered image is and how well it conforms to the observer’s expectation of what it should look like. These aspects of quality are related to a number of visual attributes of the image. For example, perceived quality generally increases for increasing degree of sharpness and decreasing levels of graininess. The presence of artifacts at any visible level tends to decrease perceived image quality. However, if an original is available to the observer, image quality may be defined by how perceptually similar the copy is to the original. We refer to this particular aspect of image quality as image fidelity. Such an evaluation may be complicated if the copy can correct some flaws that are in the original. There are also scenes in which the preferred rendering may differ from an exact copy of the original. In photography, for example, the saturation of blue sky and green grass is often exaggerated beyond those observed in nature. The complexity of the issues outlined leads us to propose the following definition for image quality: the weighted combination of all of the visually significant attributes of the image, when considered in its intended marketplace or application. The quality of images is of concern to (i) industry, which manufactures devices that produce images, such as monitors, printers, copiers, cameras, and films; and (ii) consumers, who use the devices that produce images in art, entertainment, medicine, science, education, law enforcement, and business applications. There is a marked difference in the influence of these two communities. Consumers are the ultimate evaluators of image quality, but it is the industry that conducts systematic studies of image quality to design and manufacture equipment. Image quality is in the top several attributes of value to consumers. This attitude extends across a wide range of imaging technologies including lithography, photography, electrophotographic, and ink-jet printing. If the image quality is unacceptable for a user’s application, other characteristics of the system, including the cost or its features list, are of relatively little importance. When image quality is acceptable for the intended application, other features and characteristics of the system also become discriminators on which the potential user will base a purchase decision. Therefore, image quality is appropriately the focus of much attention in the design, manufacturing, and marketing of both color and monochrome imaging systems. The quality Q of an image is determined by a number of visual attributes (parameters) represented by a vector v (examples of visual attributes include graininess, sharpness, tone rendition, and color — see
Fig. 1).

Figure 1. Four major types of image defects. Each pair contains an original image and the same image that has the defect: (a) lack of clarity in details, (b) noise in areas expected to be smooth, (c) loss of information in highlights and darker areas, and (d) inaccurate representation of color. See color insert.

Figure 2. Three examples of image artifacts. Each pair contains an original image and the same image that has the artifact: (a) banding due to variation in the velocity at which paper advances through printer, (b) repetitive artifacts due to damage to the drum or roller in the print mechanism, and (c) streaking due to variation in the properties of the imaging process across the paper. See color insert.

Figure 3. Trade-off between sharpness and graininess: (a) original image, (b) sequence of images ranging from maximum blur and no noise to no blur and maximum noise (left to right and top to bottom). See color insert.

Consumers are usually interested in a global (synthetic) indicator of image quality. They often seek a single number measure (Q) that would allow them to classify devices. Examples of measures that are commonly used in this manner include the number of bits per pixel that a scanner provides or the number of dots per inch (dpi) that characterizes the addressability of a printer. However, these numbers cannot by themselves define the level of image quality for the associated devices. In fact, obtaining a single number indicator for image quality is a very difficult task. It presupposes knowledge of the relative importance of individual perceptual attributes (color gamut vs. sharpness vs. the presence of artifacts, like banding). For example, Fig. 3 illustrates a tradeoff between sharpness and graininess that may be encountered in designing an imaging system. Choosing
the image from the sequence shown that represents the best compromise between these two perceptual attributes is a difficult task. It may be even further confounded by the effect of image content. It is, however, possible that the consumer would prefer separate measures of quality for the individual visual attributes. Having this information, the consumer would be in the best position to decide which attributes are of greatest importance for any specific application. The task of the designers and manufacturers of imaging systems is to manipulate the design variables d so that the image quality is maximal (examples of design parameters include particle size, halftone screening, and exposure). Let us assume that the consumer expects a single number (metric) Q to evaluate the overall quality of an image. Then, the task of optimizing image quality is to find the
values of the design parameters such that the quality Q[v(d)] is maximal. This optimization problem involves constraints. The constraints are related to cost, size, functionality, reliability, and serviceability. The tightness of these constraints is likely to affect the solution of
this problem and thus differentiate the performance of available products of a given class. The design variables d do not directly characterize image quality. A well-known example of a design variable that is often used by printer vendors as a measure of image
quality is the number of dots per inch (dpi). However, dpi is usually a specification of addressability, the frequency at which an exposure device can address the image sensor. This could be the frequency at which an exposing laser is turned on and off or the spacing of an LED exposing bar. Addressability speaks only of exposure and does not consider the remaining elements of an imaging system, so it is not a good descriptor of overall image quality. Even if addressability in exposure were translated into resolution in the image, it would represent only one of the visual attributes necessary to describe overall image quality. Obviously, if the dpi are very low, image quality will be low as well, regardless of the values of other variables. The potential benefits of increasing dpi can be realized only if dot size is decreased and all other imaging elements of the system are close to an optimal level, which assures high quality. If dpi is very high, increasing it even more is not likely to improve quality. This discussion of dpi illustrates that material specifications and engineering design variables are poor candidates for an image quality metric. Image quality Q involves a subjective metric. It is important to realize that ‘‘subjective’’ does not mean vague, variable, or unspecific. ‘‘Subjective’’ refers to the fact that image quality involves properties of the observer’s percept, rather than physical properties p of the image itself (examples of physical properties of the image include reflectance and contrast). The relations between the physical properties p of the image and visual properties v and Q are established by using methods of psychophysics. Consider a simple example: two images, each depicting a sinusoidal change of reflectance (albedo). Let the physical contrasts in the images be different. The difference p between the two contrasts will give rise to a perceived difference v of contrast. The two images are shown to a given observer a number of times. Note that whereas p is constant in this experiment, v is a random variable (1). Specifically, the perceived difference between two contrasts varies from trial to trial. As a result, the perceived difference is characterized by a probability density function. The properties of this distribution are inferred by using psychophysical methods. These methods will be described and illustrated later. We begin by presenting ‘‘objective’’ methods, that define and measure the physical and geometric properties p of the image and their dependence on the design parameters d. Even though knowledge of the physical properties of the image does not directly lead to an adequate measure of image quality, their characterization is essential in designing high-quality devices. After the relations p(d) are established, one proceeds to identify the relations between the physical properties p of the image and the perceptual properties v. This is done by using psychophysical methods such as scaling and discrimination. Next, the perceptual properties v are combined and lead to an overall image quality Q. This is done by using psychophysical methods such as scaling. Finally, the relation between Q and p is identified and represented by models of image quality. This classification closely resembles the image quality circle of Engeldrum (2). He organized a logical framework to relate the various processes of image quality characterization.
He takes four milestones, technology variables, physical image parameters, customer perceptions, and customer quality preference, and relates them through various types of models. This work forms a hierarchical framework for understanding an overall perspective of image quality.

OBJECTIVE MEASUREMENTS OF ATTRIBUTES OF IMAGES

Objective metrics are measures that quantify specific physical attributes of the imaging process. Such measures are useful tools for understanding how the design parameters of the imaging process impact image reproduction. It may also be possible to establish a correspondence between such measures and the results of subjective evaluation of image quality. However, these measures generally cannot form the basis for a summary judgment of image quality because they consider only a subset of the factors that impact perceived image quality. In this section, we will first discuss the objective measures themselves. We will then look at how these measures may be used to predict image quality. The set of potential objective measures that are related to image quality is unlimited, and indeed a large number of such measures have been proposed over the years. In this section, we will restrict our attention to a set of measures that have been developed by an International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) working group (3). These measures are specifically directed to assessing the quality of hard-copy output from an imaging system, print quality. The motivation for choosing this set is that it has undergone a process of refinement and winnowing out of redundant metrics or those that do not correlate well with perceived image quality. It also represents to some extent the consensus of a diverse group of experts who work in the field of image quality. Although this set was developed for monochrome devices that have binary output capability, it in fact has broader applicability and will give the reader a general sense of the way objective metrics are chosen. These standards activities and the definition of a comprehensive set of image quality metrics are ongoing efforts. For further information, the reader may consult (4) and (5). In the past, objective metrics were based exclusively on measurements made by using specialized and costly instrumentation. The instrumentation was costly because of the required precision and the low volume of sales of such devices. More recently, the blossoming of digital imaging has, as a by-product, made computation of these metrics much less costly and consequently more broadly available to users of imaging systems. Specifically, low-cost flatbed scanners or more expensive drum scanners can now be used to digitize hard-copy output. Once the page data are in digital form, computation of the metrics is simply a digital image processing task. Reference (6) describes one approach to creating such an integrated tool for print quality assessment. It is important to note, however, that the use of a scanner as the image capture device is not a panacea for image quality assessment. The problem is that the scanner is an imaging device in its own right and has its
own image quality limitations. Thus, before using such a device as the basis for computing objective metrics, it is necessary to assess whether or not its performance is sufficient for the intended task. The first issue that needs to be addressed is that of the sampling rate (dots/in). It has been found that some objective metrics are strongly dependent on sampling rate (6). Once the sampling rate is fixed, then it should be verified that the scanner actually has sufficient optical and electromechanical resolution to yield quality data at this sampling rate. This can be done by measuring the modulation transfer function as detailed in (6). In addition, the scanner must be calibrated so that its output can be represented in standard radiometric or photometric units. In this section, we will use primarily units of reflectance, which represent the fraction of incident light flux that is reflected from the paper, and thus lie between 0 (black) and 1 (white). Some of the metrics are expressed in units of density D, which is related to reflectance r by

D = −log10(r).    (1)

To calibrate the scanner, we print a set of test patches ranging in reflectance from 0 to 1. Then we scan these patches using the scanner and compute an average reflectance for each patch. We also measure the reflectance of the patches using an instrument such as a Gretag SPM-50 (7). Figure 4 shows the resulting data points for reflectance as a function of the scanner output of a Howtek D4000 drum scanner (8). Note that the relationship is not linear. We use the second-order polynomial

G = c2 · H^2 + c1 · H + c0,    (2)

and coefficients c0 = 2.650 · 10^−3, c1 = 1.242 · 10^−1, and c2 = −2.526, to relate the Gretag data G to the Howtek data H. This particular fit was obtained using the MATLAB ‘‘polyfit’’ function (9). The resulting polynomial will allow us to convert any raw scanner output value to a reflectance value. Note that the result of the calibration
process may depend on the colorant and media and should ideally be redone whenever either is changed. To develop a scanner-based solution for computing a particular objective metric, we need to design in concert a test pattern and an image processing algorithm, according to a specification of how the metric is to be computed. The test pattern may consist of lines, tint fills, or other structures that exhibit the specific features to be measured. Generally, each test pattern may serve as the basis for computing several different objective metrics. Repeating this process for the complete set of metrics of interest results in a full test page and a suite of image processing algorithms. The test page is embodied in the form of a digital file that can be printed by the target device, whose quality is to be assessed. Once the scanner has been calibrated and evaluated and the test page and processing algorithms have been developed, we are ready to use the system to assess image quality according to the steps shown in Fig. 5. We first print the test page, then scan it, and finally compute the metrics. The metrics developed by the ISO/IEC working group may be divided into four groups: line metrics, solid-fill metrics, tint solid metrics, and background field metrics, as shown in Table 1. We will next discuss each of these groups of metrics. A suitable test pattern for measuring line metrics consists of two sets of parallel lines of varying widths. One set is oriented horizontally, and the other set is oriented vertically. Both are needed because the image reproduction properties of digital printers differ in the scan (orthogonal to paper movement) and process (parallel to paper movement) directions. To describe the line metrics, we need to define several quantities that are illustrated in Fig. 6.
Figure 4. Raw reflectance data and fitted curve for calibration of Howtek D4000 drum scanner. The reflectance values are scaled to lie between 1 and 256. The outlier was not included in the polynomial fit.

Figure 5. Overview of procedure for computing objective metrics: the test target (file) is printed, the printed target is scanned, and the metric values are computed on a workstation or PC.
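The calibration fit of Eq. (2) can be reproduced with numpy.polyfit, the NumPy analog of the MATLAB function mentioned above; the patch readings below are illustrative numbers, not the article's data:

    import numpy as np

    # Hypothetical measurements: raw scanner values H and reference (Gretag) readings G
    # for the same set of gray patches.
    H = np.array([12.0, 40.0, 85.0, 130.0, 175.0, 220.0, 250.0])
    G = np.array([4.0, 22.0, 61.0, 105.0, 150.0, 196.0, 228.0])

    # Second-order fit G = c2*H**2 + c1*H + c0 as in Eq. (2); numpy.polyfit returns
    # the coefficients in descending order (c2, c1, c0).
    c2, c1, c0 = np.polyfit(H, G, 2)

    def calibrate(raw):
        """Convert raw scanner output to calibrated (reference-scale) values."""
        return c2 * raw**2 + c1 * raw + c0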
Table 1. Summary of Objective Metrics

Line Metrics: Blurriness; Stroke width; Raggedness; Contrast; Fill; Darkness; Extraneous marks; Background haze
Solid-Fill Metrics: Overall darkness; Mottle; Large area density variation (LADV); Voids
Tint Solid Metrics: Overall darkness; Large area density variation (LADV); Mottle; Granularity
Background Field Metrics: Extraneous marks; Background uniformity
Figure 6. Visual representation of quantities used in the ISO/IEC metric definitions: average reflectance plotted against distance (µm) across a scanned line, showing Rmax, Rmin, the T10, T60, and T90 thresholds, and the character field on either side of the line.
This figure shows the reflectance cross section of a scanned line. At the left and right edges of the plot, we observe the reflectance Rmax of the paper substrate, which is close to one. As we move toward the center of the plot, we enter the region where the printer has deposited colorant, and the reflectance dips to its minimum value Rmin. The thresholds T10, T60, and T90 indicate the points at which the reflectance of the line profile has decreased by the corresponding percentages (10, 60, or 90) of the difference between Rmax and Rmin. The character field is defined as the region on either side of the line and within 500 µm of the T10 threshold, excluding the region between the two T60 crossing points, which contains the line itself. Now, we are ready to discuss the line metrics that are listed in Table 1. The first four metrics quantify the edge properties and geometry of the line. Blurriness quantifies the sharpness of the transition across the line edge from the background region, where no colorant is present, to the interior of the line where colorant is present. It is measured as the mean physical distance between the T10 and T90 boundaries for the edge profile, measured on both sides of the line. This metric has also been called the normal edge profile and edge transition width. The stroke width is the mean width of the line measured as the distance between the T60 boundaries on either side of it. The blurriness and stroke width metrics describe the profile of the line normal to its edge. Raggedness describes the properties of the line tangential to its edge. A straight
line is fitted in a least squares sense to the T60 boundaries of the printed line. The raggedness metric is calculated as the standard deviation of the residuals of the actual T60 contour to the fitted line. This metric is also called the tangential edge profile. Contrast describes the difference between the reflectance of the line and the substrate. It is calculated as C = (Rmax − Rmin)/Rmax. The remaining line metrics quantify the uniformity of the line and the region surrounding it. Fill describes the homogeneity of the reflectance of the line itself. It is measured as the ratio of the number of pixels within the T90 boundaries of the line whose reflectance is less than the T75 threshold to the total number of pixels within the T90 boundaries. Darkness is the mean optical density within the T90 boundaries of the line. Extraneous marks refers to the presence of stray marks near the line. It is measured as the ratio of the number of pixels in the character field whose reflectance is less than the T20 threshold to the total number of pixels in the character field. Finally, background haze describes the presence of colorant near the line, not discernible as discrete marks. It is defined as the ratio of the mean reflectance of the character field, excluding stray marks, to the mean reflectance of the background (substrate) outside the character field.

The solid-fill metrics refer to the attributes of an area of solid colorant. Overall darkness is the mean optical density within the T90 boundaries of a solid-fill area. Mottle is the standard deviation of optical density of small (2 × 2 mm2) regions within the T90 boundaries of a solid-fill area. It is computed by convolving the optical density of this region by using a 2 × 2 mm2 averaging window, and then computing the standard deviation of the result. Large area density variation (LADV) is similar to mottle, but it is based on a larger (4 × 4 mm2) region, and is calculated as the difference between the maximum and minimum density values obtained after convolution by using a 4 × 4 mm2 averaging window. The final solid-fill metric is voids, which is the percentage of a solid-fill area devoid of colorant. It is calculated as the ratio of the number of pixels within the T90 boundaries of a solid-fill area whose reflectance is greater than the T40 threshold to the total number of pixels in this region.

Tint solid metrics refer to the attributes of a tint solid (halftone) area. These metrics include overall darkness, mottle, LADV, and granularity. The first three of these metrics are computed in essentially the same way as for solid-fill regions. The procedure for measuring the fourth metric is not completely specified in (3). Granularity is intended as an objective measure of graininess, which is a subjective measure of the kind of noise illustrated in Fig. 1b. References (10) and (11) suggest computing it according to

G = e^(−1.8D) { Σ_(u,v) W(u, v) [1 − Ŵht(u, v)] V(u, v)^2 }^(1/2),    (3)
where (u, v) represent discrete spatial frequencies, V(u, v) is the normalized contrast sensitivity function of the human visual system (HVS), W(u, v) is the power spectrum ˆ ht (u, v) is the normalized power of the tint solid area, W spectrum of the halftone pattern in the area, and D is
the mean optical density of the solid tint area. The power spectrum Sf (u, v) of an M × M point digital image f [x, y] can be computed (12) as the magnitude squared of the discrete Fourier transform:
S_f(u, v) = \left| \sum_{x=0}^{M-1} \sum_{y=0}^{M-1} f[x, y]\, e^{-j 2\pi (ux + vy)/M} \right|^{2}. \qquad (4)
The contrast sensitivity at a spatial frequency u is defined as the ratio of the background or average luminance to the amplitude of a sine wave grating of frequency u that is just detectable. Thus, it is a threshold measurement. A number of models for the HVS contrast sensitivity function have been reported in the literature (13,14). Barten's work (13) is of particular interest because it incorporates a physiologically based model for the human eye and contains parameters that allow the model to fit measured contrast sensitivity data from several different studies. Figure 7 shows two contrast sensitivity curves that have been used (10,11) to compute granularity. They assume a viewing distance of 350 mm. Subjective testing by Hino et al. (15) revealed high correlation between graininess and granularity for either choice of contrast sensitivity function in Fig. 7. Looking at Eq. (3), we see that granularity is the root-mean-squared noise power of the tint solid area modified by the contrast sensitivity function of the human viewer. The term [1 − Ŵht(u, v)] eliminates the spectral power of the periodic halftone pattern in the solid tint area from this computation. The exponential term in D compensates for the effect that darker areas appear less grainy than lighter ones.
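As an illustration of how Eqs. (3) and (4) fit together, the sketch below evaluates the granularity of a scanned tint patch. It assumes the patch has already been converted to optical density and that the normalized CSF V(u, v) and the normalized halftone power spectrum Ŵht(u, v) have been sampled on the same discrete frequency grid as the FFT of the patch; removing the mean (DC term) before the transform is a practical choice, not something Eq. (4) prescribes.

```python
import numpy as np

def granularity(density_patch, csf, halftone_power_norm):
    """Evaluate Eq. (3) for an M x M patch of optical density values.

    csf                 -- V(u, v), normalized contrast sensitivity function
    halftone_power_norm -- normalized halftone power spectrum W_ht(u, v)
    Both arrays must match the shape of the patch's FFT.  Illustrative only.
    """
    mean_density = density_patch.mean()
    # Power spectrum per Eq. (4); the mean (DC term) is removed so that it
    # does not dominate the noise-power sum.
    W = np.abs(np.fft.fft2(density_patch - mean_density)) ** 2
    # CSF-weighted noise power with the periodic halftone component suppressed.
    noise_power = np.sum(W * (1.0 - halftone_power_norm) * csf ** 2)
    # RMS noise power, scaled by the darkness-dependent exponential factor.
    return np.exp(-1.8 * mean_density) * np.sqrt(noise_power)
```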
Figure 7. Two models for the normalized contrast sensitivity function of the human visual system (Bouk and Burningham; Dooley and Shaw), plotted as normalized CSF versus spatial frequency in cycles/mm.
Background field metrics refer to the attributes of an area of the substrate (paper), where no colorant should appear. Extraneous marks is the proportion of the background field containing colorant. It is computed in exactly the same way as the extraneous marks line metric, except that the measure area is not constrained to be near a line. Finally, background uniformity is the standard deviation of the reflectance of the pixels in a background field area. We conclude this section by showing a sample application of some of these metrics. Figure 8 shows part of a test target printed by four different combinations of printer technology and media. Figure 9 shows four of the line metrics computed for each of these four cases. The metrics show that the ink jet on special paper yields the darkest lines, whereas the remaining three combinations
Figure 8. Four combinations of printer technology and media scanned at 4000 dpi: (a) laser printer using coated paper, (b) laser printer using cotton bond, (c) ink-jet printer using standard paper, and (d) ink-jet printer using special ink-jet paper.
Figure 9. Line metrics for four combinations of printer technology and media. Each metric is an average over ten replications of the test target which were separately printed and scanned: (a) Darkness, (b) fill, (c) raggedness, and (d) stroke width.
have similar darkness. The ink jet on standard paper has the largest value for the fill metric (percentage of pixels without colorant). Again, the remaining three combinations have similar values for fill. The laser on coated paper has the greatest edge raggedness, and the ink jet on special paper has the least. The stroke width is similar for all four combinations, ranking in the same order as raggedness. Visual inspection of Fig. 8 confirms these results. At first glance, one might conclude that the laser on cotton has greater raggedness than the laser on coated paper. However, this impression is due to the fact that the laser on cotton has more scattered toner than on coated paper. By restricting attention to the immediate vicinity of the line edges, we see that the coated paper does indeed yield a more ragged edge. As mentioned at the beginning of this section, we have sampled only a narrow slice of the work that has been done in the area of objective metrics for image quality. In particular, we have not discussed objective methods that provide a summary judgment of image quality, such as Barten's square root integral (SQRI) method (16), which is based on a single metric, or methods that combine several different metrics to yield a single value, such as the recently reported work by Keelan (17).
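To make the line metrics defined earlier more concrete, the following sketch computes stroke width and raggedness for a roughly vertical line in a scanned reflectance image. The T60 threshold is assumed to have been derived from Rmax and Rmin as described above; the array layout, the pixel-pitch argument, and the least-squares fit are illustrative choices rather than the exact procedure of (3).

```python
import numpy as np

def line_metrics(reflectance, t60, pixel_pitch_um):
    """Stroke width and raggedness for a roughly vertical dark line.

    reflectance    -- 2-D array; each row is a scan across the line
    t60            -- T60 reflectance threshold (assumed precomputed)
    pixel_pitch_um -- scan sampling pitch in micrometers
    """
    left_edges, right_edges = [], []
    for row in reflectance:
        dark = np.where(row < t60)[0]          # pixels inside the T60 boundary
        if dark.size:
            left_edges.append(dark[0])
            right_edges.append(dark[-1])
    left = np.asarray(left_edges, dtype=float)
    right = np.asarray(right_edges, dtype=float)

    # Stroke width: mean distance between the two T60 boundaries.
    stroke_width = (right - left).mean() * pixel_pitch_um

    # Raggedness: standard deviation of the residuals about a least-squares
    # straight line fitted to each T60 boundary, averaged over the two edges.
    def residual_std(edge):
        y = np.arange(edge.size)
        slope, intercept = np.polyfit(y, edge, 1)
        return np.std(edge - (slope * y + intercept))

    raggedness = 0.5 * (residual_std(left) + residual_std(right)) * pixel_pitch_um
    return stroke_width, raggedness
```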
SUBJECTIVE METHODS There are two general approaches to measuring perceived quality, which differ with respect to whether the relevant physical properties (dimensions) are known to the experimenter. As an example of the first type, one may want to compare several imaging devices with respect to the quality of color reproduction of a complex original image. In this case, it is not immediately obvious which physical variable governs this judgment. In fact, there are a number of variables, but the experimenter does not exactly know what they are. Therefore, the only option that the experimenter has is to use methods that can reconstruct the perceptual scale of ‘‘color reproduction’’ along which the images can be arranged. Depending on the method, the scale could merely represent the order, or it can also represent perceptual distances or even ratios among the images (18). If, on the other hand, the task is to verify whether a test image and a reference (ideal) image lead to the same perceived contrast, the relevant physical dimension is physical contrast. In this case, one can measure the relation between the probability of detecting the difference and the contrast. These methods allow measuring detection
and discrimination thresholds, and they do not require identifying the perceptual scale. The first group of methods (psychophysical scaling) provides direct information to the researcher about perceived image quality and is frequently used to scale complex images. Benchmark comparison of several printers or evaluation of unidimensional visual metrics, such as the magnitude of graininess in a set of images, are examples. The second group of methods, detection and discrimination, may be used to determine the threshold level of visible banding or to identify whether a change in a process has led to an improvement in a visual characteristic. An important reason for considering the latter is that these methods do not confound the percept with the response bias (whereas scaling methods do). As a result, they produce more accurate and reliable results.

Scaling Methods

Rank Ordering Example

Color printing processes are subject to errors in the placement of each of the color records as they are combined into the full color image. This register error is particularly noticeable along sharp edges, as in lines, text, or distinct boundaries in pictorial images. The control of the image through color record development and transfer is a crucial task in designing printing systems. Establishing design limits of register error is a necessary step in equipment development. The image type of greatest practical significance and also of highest visual sensitivity to register errors is text. Graphic images have a similar sensitivity to these errors. Images that have pictorial content are less challenging than text and graphics. Therefore, an experiment using text and graphic images will provide a conservative test for register error. The following test images were used in the experiment:
1. 10 point Times Roman text; magenta and black
2. 4 point Times Roman text; magenta and black
3. A bar chart graphic image; magenta, cyan, and black

The two font sizes represent both the very common 10 point and the small size 4 point text. Each of the images was produced using a range of variation in register error. Figure 10 shows misregistered 10 point text that is similar to that used in the experiment. Because creating images that have all possible combinations of register errors would be cumbersome, it was decided to use the visually most sensitive cases of misregistration. The registration errors were propagated in the horizontal, cross-track direction. A set of nine register errors ranging from just perceptible to very large (10.0 to 150 µm) was selected. There was also some register error in the in-track direction, but this error was small. The subjects were asked to rank order the images for the quality of color registration. The nine samples of each image type were viewed by 33 subjects who created a rank order scale of perceived quality based only on the register errors. From such a scale, one can estimate an interval scale of quality. This estimate is based on the assumption that to order the test images, the observer must compare each stimulus with all of the others in the set. The relative frequency of assigning an ordinal rank to a given sample allows computing the corresponding standard normal variable, which is taken as an estimate of the interval scale for the percept. The details of this procedure are beyond the scope of this article but can be found in (19). In addition to the rank ordering, the observers were asked to select samples that they felt were acceptable for a specific office use. This acceptability function is a useful tool in assessing the value of making technological changes. The interval scales of registration quality for the three different types of image content are presented in
Figure 10. Misregistered blue 10 pt. text that has varying degrees of displacement between the cyan and magenta planes: (a) 0-µm displacement, (b) 40-µm displacement, (c) 80-µm displacement, and (d) 120-µm displacement. See color insert.
Figure 11. Perceived quality as a function of misregistration for the graphics, 10 pt. text, and 4 pt. text targets.
Figure 12. Acceptability as a function of misregistration for the graphics, 10 pt. text, and 4 pt. text targets.
Fig. 11, and the acceptability curves are shown in Fig. 12. First, note the linear relationship between the perceptual scale representing quality and physical misregistration. Next note a sharp decrease in acceptability across a narrow range of register error (10–70 µm) for all three of the image content types. Finally, note the similarity of the three curves in both graphs. Before the data were collected, it was expected that there would be substantial differences in sensitivity to register errors for the different target types. Indeed this expectation formed the basis for the experimental design. It is seen that if the goal is to achieve an acceptance rating greater than 90%, the registration error should be less than 55–60 µm for all of the common image types included in this experiment. There are certainly other image types that would have lower sensitivity to register error that were not included in this experiment.
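A minimal sketch of the standard-normal-variable construction mentioned above is given below. The data layout (a table of rank counts per sample, with rank 0 taken as the best) and the use of cumulative rank proportions are simplifying assumptions made for illustration; the full procedure is described in (19).

```python
import numpy as np
from scipy.stats import norm

def interval_scale_from_ranks(rank_counts):
    """Estimate an interval scale from rank-order frequencies.

    rank_counts[i, r] -- number of observers who assigned rank r
                         (0 = best) to sample i.
    Returns one scale value per sample; larger values correspond to
    samples that were consistently ranked nearer the top.
    """
    n_samples, _ = rank_counts.shape
    scale = np.zeros(n_samples)
    for i in range(n_samples):
        p = rank_counts[i] / rank_counts[i].sum()        # relative rank frequencies
        cumulative = np.clip(np.cumsum(p)[:-1], 1e-3, 1 - 1e-3)
        # The mean of the corresponding standard normal deviates serves as
        # a simple estimate of the sample's position on the interval scale.
        scale[i] = norm.ppf(cumulative).mean()
    return scale
```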
Category Scaling Example

A printer manufacturer must frequently compare the capabilities of a printer under development with those of printers made by key external competitors. One way to accomplish this comparison is to generate images from all printers of interest and conduct a visual evaluation. However, because perceived quality is a function of image content, a large number of image types would be necessary to accomplish the comparison accurately. One way to simplify this task is to use images that contain the image elements that the manufacturer feels define the marketplace use of a particular printer. In this way, a relatively small number of images can approximate the overall image quality that a customer might see. Overall evaluation of image quality requires an observer in an assessment experiment to combine the various parts of the test images into an overall impression. Some attributes of a printer may be better and some worse than those of a competitor. The weighting of this combination is personal for each observer. However, there is an implicit assumption that an underlying unidimensional metric of image quality exists. Consider the following experiment that involved five printers, as well as the reference printer from a given manufacturer. These printers varied significantly in addressability and other fundamental specifications. Five different test images were selected (see Fig. 13). Rating techniques are a convenient tool for evaluating a large number of images (20). There are essentially three forms of rating scales: numerical, adjectival, and graphical. The numerical rating scale consists of a range of numbers bounded by two anchors. The rater identifies the position that represents the appropriate proportion of the attribute in the test sample relative to the two reference points. It is a relative assessment. The adjectival rating scale does the same thing, except that it uses adjectives (e.g., low, average, high) which, if properly chosen, will imply a series of equal intervals. On a graphical scale, the two extreme anchors are connected by a continuous line. The observer's task is to mark the place along the line that represents the relationship of the test sample to the anchors. There are a number of experimental problems, including a central-tendency effect, which refers to the fact that observers are reluctant to use the ends of the scale. However, rating scale experiments tend to be relatively rapid and are appropriate for experiments that have large numbers of stimuli. In the present experiment, a nine-point rating scale using imaginary anchors was selected. The imaginary anchors are Best Imaginable Image Quality, Category 1, and Worst Imaginable Image Quality, Category 9. Thirty-four observers who had some experience in evaluating images were shown the thirty samples of the test set and given the task of evaluating overall image quality. In a brief pilot experiment, the observers rated four images. This pilot experiment familiarized the observers with the procedure of the experiment and the range of the quality of the images. The samples were placed on a table face up under the appropriate category designator. As succeeding samples were located, the observer had the option of moving images that were evaluated earlier. The experiment was conducted under a D50 illuminant and used a Munsell 5 gray surround. The results are shown in Fig. 14. It is clear that image quality is content-dependent because various elements of the image interact with strengths or weaknesses of the imaging technology. For example, Tech 5 handles text and graphics very well but suffers in images requiring better tone reproduction.
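A sketch of how the raw ratings from such an experiment might be summarized is shown below. The data layout (observers × technologies × test images) is hypothetical, and the simple mean is only one of several ways to place nine-point category data on a quality axis; the values plotted in Fig. 14 may well have been derived with a more formal categorical-scaling procedure.

```python
import numpy as np

def summarize_category_ratings(ratings):
    """Summarize a nine-point category-scaling experiment.

    ratings[o, t, i] -- category (1 = best imaginable ... 9 = worst imaginable)
                        assigned by observer o to technology t rendered with
                        test image i.  Hypothetical layout for illustration.
    """
    ratings = np.asarray(ratings, dtype=float)
    per_image = ratings.mean(axis=0)          # mean category per (technology, image)
    per_technology = per_image.mean(axis=1)   # collapse further over test images
    return per_image, per_technology
```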
Figure 13. Test images used to assess the impact of technology on print quality: (a) news, (b) DNA, (c) vacation, (d) bicycle, and (e) musicians. See color insert.
Measuring Thresholds

Consider measuring the difference threshold DL.1 The difference threshold is the smallest difference between two stimuli that can be reliably detected. For example, an experimenter may be interested in the value of DL for physical contrast. This value may then be used to predict the probability that the observer will detect the contrast difference between the rendered and the reference images. The method of constant stimuli is commonly used to measure a difference threshold. In this method, a number of levels of the test stimulus are chosen around the value of the reference stimulus and presented to the observer in a random order. The lowest level is chosen so that the observer will almost always correctly perceive it as lower than the reference stimulus, and the highest level so that the observer will almost always correctly perceive that it is higher. Each test stimulus is shown a number of times. The proportion of trials in which the test stimulus was perceived as higher than the reference is plotted against the value of the test stimulus. This relation is then approximated by a theoretical curve (e.g., a Gaussian distribution function). The mean value is called the point of subjective equality (PSE), and the standard deviation is a measure of the difference threshold DL. This curve is called a psychometric function. Difference thresholds for contrast as a function of spatial frequency and reference contrast are shown in Fig. 15. When the reference contrast was zero, the discrimination task was equivalent to contrast detection. In this case, the threshold is strongly affected by the spatial frequency when the frequency is greater than 8 cyc/deg. For larger reference contrasts (16 and 64%), the threshold is not affected much (if at all) by spatial frequency.
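The sketch below fits such a psychometric function to constant-stimuli data, taking the mean of a cumulative Gaussian as the PSE and its standard deviation as DL. The use of SciPy's curve_fit, the starting values, and the absence of lapse-rate handling are illustrative assumptions, and the example data are made up.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def fit_psychometric(stimulus_levels, proportion_higher):
    """Fit a cumulative-Gaussian psychometric function.

    stimulus_levels   -- test-stimulus values used in the constant-stimuli run
    proportion_higher -- proportion of trials judged 'higher' than the reference
    Returns (PSE, DL): the fitted mean and standard deviation.
    """
    def cdf(x, mu, sigma):
        return norm.cdf(x, loc=mu, scale=sigma)

    p0 = [np.mean(stimulus_levels), np.std(stimulus_levels)]
    (mu, sigma), _ = curve_fit(cdf, stimulus_levels, proportion_higher, p0=p0)
    return mu, sigma

# Hypothetical data: contrasts (%) around a 16% reference contrast.
levels = np.array([12.0, 14.0, 15.0, 16.0, 17.0, 18.0, 20.0])
p_high = np.array([0.05, 0.18, 0.36, 0.52, 0.69, 0.83, 0.97])
pse, dl = fit_psychometric(levels, p_high)
```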
Figure 15. Contrast sensitivity functions for reference contrasts of 0.0, 0.5, 2.0, 16.0 and 64.0% for Gabor patches that have an average luminance of 40 cd/m2 .
Figure 14. Perceived quality as a function of printer technology (Ref, Tech 1–Tech 5), as judged in a rating scale experiment.
1 The letter "D" in DL stands for "difference"; the letter "L" stands for "limen," which means "threshold."
Note that the psychometric function measured in the discrimination experiment does not directly allow predicting the probability that two images are perceived as different. To estimate this probability, one has to design the experiment differently. The levels of the test stimuli should vary between the magnitude of the reference stimulus and the value that is almost always judged as greater (or different) than the reference. The trials where the test stimulus is equal to the reference stimulus are called "catch trials." They allow the experimenter to verify whether the response bias of the observer was not too extreme. Ideally, the proportion of responses "greater" on catch trials should be close (but not equal) to zero. The relation between the proportion of responses greater (different) and the magnitude of the test stimulus is again approximated by a theoretical curve. This curve can be used directly to predict the probability that two images will be perceived as different. The two types of experiments described and the parameters of the psychometric functions are related to one another (1). The method of constant stimuli involves a large number of trials. As a result, the experiments are time-consuming. Adaptive psychophysical methods reduce the number of trials by eliminating the levels of the test stimulus that lead to proportions of correct responses close to zero or one (21).

MODELS FOR IMAGE QUALITY METRICS

Up to this point, we reviewed methods for measuring and evaluating the physical properties p of an image, as well as the relation of these properties to the design parameters d. We then reviewed methods for measuring the relation between the visual properties v (as well as the quality Q) and the physical properties p of the image. In principle, the results of all of these measurements should be sufficient to maximize the perceived quality. Note, however, that finding the set of optimal values of the design parameters would require an enormous number of psychophysical experiments, so that the objective function Q would be known for every set of values of the design parameters. Such an endeavor would be too expensive and time-consuming. A more realistic approach is to develop a computational model that allows predicting the visual parameters v and Q based on the values of the physical parameters p of the image and then to use the model to determine the set of optimal values of the parameters p. Once these values are known, it is the task of an engineer to determine the values of the design parameters d that would assure the desired values of the parameters p and hence the high quality of the image. This section concentrates on formulating a computational model representing the relation between v and p. In view of the generality and complexity of our earlier definition, we restrict our consideration to the simpler concept of image fidelity, which, as mentioned earlier, is one aspect of image quality. Consider the task of evaluating perceived image fidelity: given two images, an original O, which is assumed to be of very high quality and completely free of artifacts, and a rendering R, the model has to estimate the probability that a human observer will detect a difference between the two images. If this probability is close to zero, the rendered image is perceptually identical
to the original image, and thus its quality Q is assumed to be high. We begin by using a simple version of this task, that is, we assume that the images are monochromatic and that the model has to assign the probability of detecting a difference at all points of the image. Thus, the model will generate a spatial map of probabilities instead of a single probability for the entire image. We assume that the two images have identical physical sizes and that the correct alignment (registration) of the images is known. Clearly, if the images are misaligned, the model is likely to compare noncorresponding parts of the two images, which would give rise to spuriously high probabilities of visible differences. An image can be considered an array of points (pixels). If reflectances (albedos) of all points in two images are identical, then the two images should be perceived as identical. Therefore, it is reasonable to begin by attempting to understand (model) the percept of albedo of individual points (or regions) of the image. How can an observer obtain accurate information about the albedo of a surface? This problem is known in perceptual literature as a lightness constancy problem (lightness is the percept of albedo). The observer’s eye receives light reflected from the surface. The intensity of the reflected light is the product of the intensity of the incident light and the albedo of the surface. Thus, the observer receives the information about the albedo confounded by the intensity of the illuminant. The task for the visual system is to evaluate (reconstruct) the albedo by separating the effects of the illuminant and the albedo itself. Obviously, the lightness constancy problem is simplified if the two images are viewed under identical conditions. Then, the observer may simply compare the intensities of the light on the retina. These intensities are equal if and only if the reflectances are equal. This simplified case is potentially relevant to the issue of assessing image fidelity because an experimenter can always adjust the intensities of the light sources so that they are equal. However, it cannot be assumed that the human visual system uses simplified perceptual mechanisms in conditions that are physically simple. In fact, it is known that in this case it does not do so. Therefore, to understand the perceptual mechanisms that underlie lightness perception, it is necessary to consider a general case in which the viewing conditions (intensity of the illuminants) are not necessarily identical. Consider two surfaces whose albedos a1 and a2 are very similar. Let the intensity of the incident light be I1 . The intensities of light reflected from the two surfaces and stimulating the observer’s eye are I1 a1 and I1 a2 . Assume that the two stimuli are just barely discriminable. Specifically, assume that the human observer can detect the difference between I1 a1 and I1 a2 with a probability of 84%. Using the terminology of psychophysics, the difference threshold DL1 = I1 (a2 − a1 ). Now, assume that the same two surfaces are viewed under a different illuminant of intensity I2 . If the percept of the albedo of the two surfaces remains the same, despite the change in the intensity of the illuminating light, then the two stimuli I2 a1 and I2 a2 should be just barely discriminable. Specifically, the difference threshold (defined by the 84% level) should be equal to the difference between
the intensities of the lights entering the eye: DL2 = I2(a2 − a1). It is clear that in this case DL1 ≠ DL2; that is, lightness constancy implies non-constancy of the difference threshold. What is constant in the case of lightness constancy is the ratio DL/(I a1):

DL1/(I1 a1) = (a2 − a1)/a1 = DL2/(I2 a1).     (5)
This ratio is called a Weber fraction, and its constancy (called Weber’s law) represents one of the fundamental properties of human vision. We showed that this constancy is a necessary (although not sufficient) condition for lightness constancy. It follows that the ratio of albedos is more relevant to a perception of the albedo than the albedo itself. This seemingly paradoxical statement is a consequence of the fact that the observer does not have direct access to the albedo. Now, consider a more general case where two surfaces have different albedos and the difference is not necessarily small. Then, the percept follows a rule analogous to that expressed in Eq. (5). Specifically, Wallach (22) showed that the percept of the albedo of a test surface is strongly related to the ratio of the albedos of a test and a reference surface (surround). For a simple periodic pattern of amplitude A and average luminance M, this ratio is called a contrast: C = A/M = (amax − amin )/(amax + amin ),
(6)
where amax and amin are the maximum and minimum albedos in the stimulus. If the numerator and denominator of the right-hand side of Eq. (6) are multiplied by a constant I (intensity of the illuminant), one obtains a ratio of the intensities of the reflected light, and the value of this ratio is still equal to contrast C. This means that contrast C can be computed from the intensities of the reflected light, rather than from the albedos. Furthermore, the contrast computed this way is insensitive to the intensity of the illuminant. If an image is more complex than a single periodic pattern or a set of two patches that have different albedos, the percept of the albedo of a test patch is affected by the entire context (23–25). Thus, the simple ratio rule expressed by Eq. (6) no longer adequately describes the percept. Note, however, that in image fidelity, the two images have the same content, and they are similar in all aspects. Therefore, one can conjecture that the ratio rule (contrast) does provide an adequate measure of the perceived albedo. This conjecture is used by all current models of image quality: the models compare the original and rendered images for contrasts, rather than absolute albedos. It goes without saying, nevertheless, that this conjecture would need to be examined in some detail, and it is quite possible that future models of image fidelity will have to be more general. So far, we discussed the case of one test patch in the context of its surround. How do we generalize this case to complex images, in which some patches are small and other patches are large (i.e., they are of different spatial scales)? It is commonly accepted now that the human visual system processes visual information on many levels
of scale and resolution. Specifically, the visual system can be adequately represented by an exponential pyramidal architecture (26–28). In this architecture, the information is processed within small regions (receptive fields) of high spatial resolution on the lower layers of the pyramid. The regions get larger and the resolution gets lower as one progresses up the pyramid. It follows that models of image fidelity must involve pyramidal architecture and compute contrasts locally on several levels of scale and resolution [e.g., (29,30)]. Figure 16 provides a schematic illustration of a pyramidal algorithm that creates representations of the image on several levels of scale and resolution. The image is presented to the bottom layer of the pyramid. The higher layers represent the image at lower resolution. Specifically, the input image is convolved (blurred) with a Gaussian filter. The size of the filter increases by a factor of 2 between layers, although other factors can also be used (31,32). After the image is blurred, it is compressed (decimated, subsampled) by the same factor. If the size of the input image is N × N, then the height of the pyramid is at most log2(N) + 1. The next operation is computing the contrast at each "pixel" of every layer for a set of orientations. There are several methods for computing contrast. Taylor et al.'s (33–35) model computes contrast using a definition similar to the conventional Michelson definition of Eq. (6). At each pixel, a window is used whose center is at that pixel and has a given orientation f. The size of the window, in combination with the low-pass nature of the filter that was used to create the representation on that layer, serves as a band-pass filter, which reveals only a small range of spatial frequencies. Specifically, the low-pass filter removes high spatial frequencies, and a window of limited size does not "see" high contrasts at low spatial frequencies. Using a Hamming window, the model averages the intensities in a direction orthogonal to the orientation f. These averages are then used to compute the contrast. Other methods of computing contrast involve dividing the pixel value in a given layer by the baseband mean (29) or taking the ratio of the band-pass-filtered image at a given frequency to the low-pass image filtered to an octave below that frequency (30,36).
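A minimal sketch of the blur-and-decimate pyramid illustrated in Fig. 16 is given below; the use of SciPy's Gaussian filter and the particular value of sigma are illustrative assumptions rather than the filters used in any specific published model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lowpass_pyramid(image, levels, sigma=1.0):
    """Build a low-pass pyramid by repeated blurring and decimation.

    Each level is produced by blurring the level below with a Gaussian
    filter and then subsampling by a factor of 2 in each direction.
    """
    pyramid = [np.asarray(image, dtype=float)]
    for _ in range(levels - 1):
        blurred = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(blurred[::2, ::2])      # decimate by 2
    return pyramid
```

For an N × N input, levels can be at most log2(N) + 1, as noted above.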
Figure 16. An illustration of the low-pass pyramidal decomposition. Successive levels of the pyramid are generated by low-pass filtering and decimating the image from the level below.
All of these methods estimate contrast at individual points of the image at a number of orientations and spatial frequencies. The next step is to compare the contrasts between the original and rendered image. The difference is expressed in units of DL (difference threshold). If the difference is much lower than DL, then the model assigns a low probability of detecting the difference. If the difference is higher than the threshold, the probability of detecting this difference is high. In fact, image fidelity models use the entire psychometric function to map the difference in physical contrast to the probability of a visible difference. In Taylor et al.'s model, the contrasts from the two images are compared to the difference thresholds that were measured in psychophysical experiments. In other models, the difference between the contrasts is compared to the detection threshold after a correction for the magnitude of the reference contrast. This correction is called masking. It refers to the fact that it is usually more difficult to detect the difference between two contrasts when both contrasts are high, compared to when they are low. The difference thresholds are collected in psychophysical experiments (as described earlier) and then stored in the model's look-up table (LUT). Obviously, it is important that the psychophysical experiments and the model use the same definition of contrast (not all models of image fidelity enforce this requirement). At this point, the model has local contrasts at all locations in all layers of the pyramid and at a number of orientations. This information has to be reduced. The first step is to compute the probability of detecting a difference between the two images at each point of the image (across all spatial frequencies and all orientations). One way is to assume that the individual channels are independent and to use the standard rule of probability summation. Another is to combine the magnitudes of the perceived differences (as expressed by standard normal variables) by using one of the Minkowski metrics (unfortunately, little is known at present about the rules used by the human visual system). The resulting probabilities are usually plotted as a spatial map of the probability that an observer would detect the differences. Here, lighter means a higher
predicted probability of perceiving a difference. Figure 17 shows the architecture of a typical model of image fidelity, and Fig. 18 shows the results of the prediction. The second step in reducing the amount of information provided by the model is to compute the overall probability of detecting differences between the original and the rendered image. Again, little is known about the rule that is used by human observers. In fact, it is quite possible that the observers may use more than one rule, depending on the application. Including color information is a natural generalization of the models of image fidelity. The human visual system has two opponent channels for processing the chromatic properties of a visual stimulus: red–green and blue–yellow (37). Because these channels can be activated in both the positive and negative directions, one cannot use any simple analog of contrast to characterize the two channels [if Eq. (6) is used, the denominator may sometimes be zero]. One possibility is to use chromatic differences (38,39). Then, one can measure the difference thresholds and then build two ‘‘replicas’’ of the monochromatic model of image fidelity. The resulting color image fidelity assessor (CIFA) model (Fig. 19) produces three probability maps characterizing visible differences in the luminance, red–green, and blue–yellow channels (Fig. 20). Here, lighter regions mean a higher probability of detecting a difference between the original and the reproduction in that particular channel. Despite its complexity, the CIFA model accounts only for the low-level processes of the human visual system. The viewer’s perception of color image fidelity, or more broadly, image quality, is also mediated by higher level processes that depend on recognizing image content. For example, Yendrikhovskij et al. (40–42) investigated the role of memory colors, such as the colors of skin, grass, and sky in viewers’ judgments of ‘‘naturalness.’’ They observed that viewers are much more capable of judging the fidelity with which memory colors are reproduced than other colors, and they proposed a model for identifying memory colors, based on such similarity judgments.
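The two pooling rules mentioned above can be sketched as follows; the channel-stacked array layout and the value of the Minkowski exponent are assumptions made for illustration.

```python
import numpy as np

def probability_summation(channel_probs):
    """Combine per-channel detection probabilities at each pixel.

    channel_probs -- array of shape (n_channels, H, W) holding the probability
                     of a visible difference in each frequency/orientation
                     channel; the channels are assumed to be independent.
    """
    return 1.0 - np.prod(1.0 - channel_probs, axis=0)

def minkowski_pooling(channel_magnitudes, beta=3.0):
    """Pool perceived-difference magnitudes with a Minkowski metric.

    channel_magnitudes -- per-channel difference magnitudes (e.g., expressed
                          as standard normal variables); beta is a free
                          parameter typically fitted to data.
    """
    return np.sum(np.abs(channel_magnitudes) ** beta, axis=0) ** (1.0 / beta)
```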
Figure 17. Block diagram of the Image Fidelity Assessor (IFA). The ideal and rendered images each pass through a low-pass pyramid and contrast decomposition; their difference drives a channel response predictor with a psychometric LUT, and probability summation yields an image map of predicted visible differences.
Figure 18. (a) The original parrot image and (b) the parrot image distorted by Adobe Photoshop's "crystallize" algorithm. The image in (c) is the output of the IFA. Lighter regions mean a higher probability of detecting a difference.
Figure 19. Block diagram of the Color Image Fidelity Assessor (CIFA). The luminance and the red–green and blue–yellow opponent channels of the ideal and rendered images feed an achromatic IFA and two chromatic IFAs, producing probability maps for the luminance, red–green, and blue–yellow responses.
Figure 20. CIFA predictions for hue change. (c)–(e) are the CIFA probability maps for (a) the image of parrots compared with (b) the image of parrots distorted by changing the hue. The probability maps (c), (d), and (e) correspond to the probability of visible differences in luminance, R–G, and B–Y channels, respectively. Lighter regions represent a higher probability of detecting a difference in the respective channel. The viewing distance in the CIFA was 15 times image height. See color insert.
It is important to realize that before one uses a given model to obtain predictions of visible differences, the model should be validated in a psychophysical experiment. Although the structure and functioning of the model were motivated by the anatomy and physiology of the human visual system and some free parameters (thresholds) were directly measured in psychophysical experiments, the model may or may not be psychologically plausible. One obvious reason is that the thresholds were measured by using very simple stimuli, but the model is used to
make predictions about complex stimuli. The fact that a complex stimulus can be represented as a sum of simple stimuli (Fourier components or wavelets) does not imply that the percept of a complex stimulus is a sum of simple percepts [e.g., (43)]. There have not been many attempts to design and conduct psychophysical validation of image fidelity models. One approach is to ask the observer to compare two images (original and rendered) in a number of places marked by a square grid (Fig. 21). The observer assigns ratings of the visible difference in all blocks.
Figure 21. Snapshot of a trial for the validation experiment.
Figure 22. Validation experiment: (a) results of observer ratings of original and distorted parrot images shown in Figs. 18a,b for albedo; (b) output of IFA for these images (Fig. 18c) after reformatting to match the resolution of the visual representation of the results of the validation experiment.
By pressing a button, observers can temporarily remove the grid, so it will not interfere with their judgments. The rating scale involves two or more levels. Figure 22 shows the spatial map of the observer's ratings. The map was obtained from the ratings by upsampling the array of numbers and then low-pass filtering it. It is seen that the spatial map of visible differences produced by the observer is similar to the spatial map of visible differences predicted by the model. To obtain a quantitative evaluation of the relation between these two maps, one can perform direct statistical tests. For example, one can test the hypothesis that the two maps are statistically independent. If this hypothesis is rejected, one may conclude that the model's predictions are related to the perceived differences. The modeling approach described so far involved the assumption of perfect registration (alignment) of the two images. This assumption may be acceptable only for digital images. Analog images, like printouts, are likely to violate this assumption. As a result, the images have to be aligned before the image fidelity model is applied.
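The sketch below turns block-wise ratings into such a spatial map. Nearest-neighbor upsampling followed by Gaussian smoothing is an assumption made for illustration; the exact low-pass filter used in the experiment is not specified here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def rating_map(block_ratings, block_size, sigma=None):
    """Upsample block-wise visible-difference ratings to a smooth spatial map.

    block_ratings -- 2-D array with one rating per grid block
    block_size    -- size of each block in pixels
    """
    upsampled = np.kron(block_ratings, np.ones((block_size, block_size)))
    if sigma is None:
        sigma = block_size / 2.0               # smoothing width: an arbitrary choice
    return gaussian_filter(upsampled, sigma)
```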
This can be done on the basis of the image content. Specifically, the model would first have to perform image segmentation to reconstruct contours. Then, the contours in the two images can be compared and aligned. In fact, once the images are segmented, one can use the contours and the regions to improve the prediction of the fidelity of the rendered image. It is known that contours play an important role in human visual perception [see (44)]. Consider, for example, two center-surround patterns that have reflectances S1, S2 in one pattern and S1 + d, S2 in the other. A human observer will perceive the central region S1 as darker, compared to the other central region S1 + d. However, existing models of image fidelity would predict that the visible differences between these two patterns exist only along the contours between S1 and S2. This failure is but one example illustrating the limitations of using contrast as an adequate measure of the percept of the albedo. The models described in this section use image fidelity as a tool for evaluating image quality. But the quality of an image may be evaluated without any reference (ideal) image. In such a case, image quality may be defined by the absence of visible artifacts. A model of image quality would have to be equipped with descriptions (models) of artifacts. These descriptions would be used to detect the artifacts and then to evaluate their visibility. Note that some problems that are inherent in using image fidelity models, such as registration of images, would not exist in this case. Formulating metrics of image quality based on detecting artifacts is, however, much more complex than formulating metrics based on image fidelity.

SUMMARY

In this article, we discussed physical and perceptual aspects of image quality. We reviewed objective methods, which relate the design parameters to the physical parameters of the image. Then, we reviewed psychophysical methods that are used to characterize the percept in terms of the underlying perceptual or physical scale. Finally, we discussed the problem of formulating and validating computational models of image fidelity.

Acknowledgments

The authors thank Yousun Bang for her assistance in preparing the figures.
BIBLIOGRAPHY

1. N. A. Macmillan and C. D. Creelman, Detection Theory: A User's Guide, Cambridge University Press, Cambridge, 1991.
2. P. G. Engeldrum, Psychometric Scaling: A Toolkit for Imaging Systems Development, Imcotek Press, Winchester, MA, 2000.
3. ''Document B123: 7th Working Draft,'' International Standards Organization (ISO)/International Electrotechnical Commission (IEC), Geneva, Switzerland, 1995.
4. M. Yuasa and P. Spencer, Proc. IS&T's 1999 PICS Conf., Springfield, VA, 1999, p. 270.
36. E. Peli, J. Opt. Soc. Am. A 7(10), 2,032–2,040 (1990).
5. N. Burningham and E. Dalal, Proc. IS &T’s 2000 PICS Conf., Springfield, VA, p. 121, 2000.
37. L. M. Hurvich and D. Jameson, Science 114, 199–202 (1951).
6. J. Grice and J. P. Allebach, J. Imaging Sci. Technol. 43, 187–199 (1999).
38. W. Wu, Ph.D. Dissertation, Purdue University, West Lafayette, IN, 2000.
7. Gretag SPM-50 Spectroradiometer: GretagMacbeth LLC, New Windsor, NY 12553-6148.
10. R. P. Dooley and R. Shaw, J. Appl. Photogr. Eng. 5, 190 (1979).
39. W. Wu, Z. Pizlo, and J. P. Allebach, Proc. 2001 IS &T Image Process. Image Quality Image Capture Syst. Conf., Quebec City, Quebec, Canada, 21–25, April 2001. 40. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, Human Vision Electron. Imaging III, SPIE vol. 3299, 1998, pp. 274–281.
11. T. Bouk and N. Burningham, Proc. 8th Int. Congr. Adv. NonImpact Printing Technol., Springfield, VA, 1992.
41. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, Color Res. Appl. 24(1), 52–67 (1999).
12. J. G. Proakis and D. G. Manolakis, Digital Signal Processing Principles, Algorithms, and Applications, 3e, Prentice-Hall, Englewood Cliffs, NJ, pp. 896–958.
42. S. N. Yendrikhovskij, F. J. J. Blommaert, and H. de Ridder, Color Res. Appl. 24(6), 393–410 (1999).
8. Howtek Scanmaster D4000: Howtek, Inc., Hudson, NH 03051. 9. MATLAB: The MathWorks, Inc., Natick, MA 01760-2098.
13. P. G. J. Barten, Human Vision, Visual Process. Digital Display III, SPIE vol. 1666, 1992, pp. 57–72. 14. S. H. Kim and J. P. Allebach, in Color Imaging: Device Independent Color, Color Hardcopy, and Graphic Arts VI, San Jose, CA, SPIE vol. 4300, January 2001, 23–26. 15. M. Hino, K. Kagitani, and M. Kojima, Proc. 11th Int. Congr. Adv. Non-Impact Printing Technol., Springfield, VA, 1995.
20. W. S. Torgerson, Theory and Methods of Scaling, John Wiley & Sons, Inc., New York, NY, 1958. 21. B. Treutwein, Vision Res. 35, 2,503–2,522 (1995). 22. H. Wallach, J. Exp. Psychol. 38, 310–324 (1948). 23. D. Jameson and L. M. Hurvich, Science 133, 174–179 (1961). 24. A. Gilchrist et al., Psychol. Rev. 106, 795–834 (1999). 25. C. S. Barnes, J. Wei, and S. K. Shevell, Vision Res. 39, 3,561–3,574 (1999). 26. A. Rosenfeld and M. Thurtson, IEEE Trans. Comput. C-20, 562–569 (1971). 27. Z. Pizlo, A. Rosenfeld, and J. Epelboim, Vision Res. 35, 1,089–1,107 (1995). 28. Z. Pizlo, M. Salach-Golyska, and A. Rosenfeld, Vision Res. 37, 1,217–1,241 (1997). 29. S. Daly, in A. B. Watson, ed., Digital Images and Human Vision, MIT Press, Cambridge, MA, 1993, pp. 179–205. 30. J. Lubin, in E. Peli, ed., Vision Models for Target Detection and Recognition, World Scientific, Singapore, 1995, pp. 245–283. 31. P. Burt, Comput. Graphics Image Process. 16, 20–51 (1981). 32. S. M. Graham, A. Joshi, and Z. Pizlo, Memory and Cognition 28, 1,191–1,204 (2000). 33. C. Taylor, Z Pizlo, J. P. Allebach, and C. A. Bouman, Proc. 1997 IS&T/SPIE Int. Symp. Electronic Imaging Sci. Technol., San Jose, CA, 8–14, February 1997, vol. 3016, pp. 58–69. 34. C. C. Taylor, J. P. Allebach, and Z. Pizlo, Proc. 1998 IS &T Image Process., Image Quality Image Capture Syst. Conf., Portland, OR, 17–20, May 1998, pp. 37–42. 35. C. C. Taylor, Ph. D. Dissertation, Purdue University, West Lafayette, IN, 1998.
44. T. N. Cornsweet, Visual Perception, Academic Press, NY, 1970.
IMAGE SEARCH AND RETRIEVAL STRATEGIES
Y. L. MURPHEY
The University of Michigan-Dearborn
Dearborn, MI
17. B. W. Keelan, Proc. IS &T’s 2000 PICS Conference, Springfield, VA, 2000.
19. C. J. Bartleson and F. Grum, in Visual Measurements, vol. 5, Academic Press, Orlando, FL, 1984, p. 451.
43. K. Koffka, Principles of Gestalt Psychology, Harcourt Brace, NY, 1935.
16. P. G. J. Barten, Hum. Vision Visual Process. Digital Display, SPIE vol. 1077, 73–82, 1989.
18. G. A. Gescheider, Psychophysics: Method, Theory, and Application, Lawrence Erlbaum, Hillsdale, NJ, 1985.
INTRODUCTION The proliferation of computer technology and digital image-acquisition hardware has led to the widespread use of image data in a variety of applications, including astronomy, art, natural resources, engineering design, military, business operation, medicine, and education (1–6). Major research activities in digital image databases surged after the U.S. government’s Digital Library Initiative (DLI) from 1994 to 1998. The initiative, funded by three U.S. government funding agencies, NSF, DARPA, and NASA, supported six major projects to explore a broad range of fundamental research issues in digital information, including digital image libraries (3). The digital image library research was conducted by the University of California at Berkeley and partly by the University of California at Santa Barbara. The DLI research team at the University of California at Berkeley developed a work-centered digital information system that contains 450 digital text documents, 200 air photos, and more than 11,000 ground photographs (7). The system allowed a user or a working group to access its own collections of varying data types and to generate new materials to be added to the collection. The DLI project at the University of California(UC) at Santa Barbara focused on developing digital databases that contain geographically referenced materials such as maps and aerial photos. The research team developed a number of tools for browsing and retrieving map images at multiple resolution (8). Two other major efforts in developing digital image databases were at The National Library of Medicine (NLM), a component of the National Institutes of Health (NIH), and at Time Warner. NLM funded an image database project called the Visible Human Project that attempted
to support the most comprehensive online image repository of the male and female human body (4). Time Warner decided to have its 20-million photograph collection scanned and catalogued by hand, and detailed textual descriptions were appended to each image (5). More recently, research scientists at the University of California at Berkeley and the Fine Arts Museums of San Francisco successfully launched on the Internet the largest art image database in the world, the Thinker ImageBase (9). Automatic image indexing and retrieval technology is fundamental in digital image databases. There are two general strategies in image retrieval: browsing and searching. Browsing is an information retrieval strategy in which a user navigates through an ordered arrangement of images by selecting from the progressive levels of a hierarchy into which the available images have been logically grouped. Image browsing relies on the user's cognitive abilities to recognize images of interest without formulating a specific query. Searching is an information retrieval strategy in which the user communicates with an information retrieval system via an interface that requires the user to input a search query. Images can be indexed and retrieved by text-tag information and/or by image content. Conventionally, images are indexed using text-tag information, such as title, keywords, date of the work, artist, author, photographer, legend, and captions (10–15). Images are often subject to a wide range of interpretations, and textual descriptions can only begin to capture the richness and complexity of the semantic content of a visual image. In many applications, the complexity of the information embedded in image content, such as the types of objects, object attributes, and spatial relationships of objects, cannot be synthesized in a few key words. It has been reported that users who query an image collection tend to be much more specific in their requests and information needs than when querying a text database (16). Furthermore, text indexing requires a large human effort in creating the meta-data that enables visual queries and is language dependent. Large databases that contain thousands or millions of digital images, occupying gigabytes of space, make manual indexing and searching almost impossible. The demand for systems that use pictorial information combined with textual description in image retrieval is growing, and automatic indexing and retrieval based on image content have become the most promising techniques for large image databases and digital image libraries (2,17–21). This chapter describes the techniques developed for content-based image retrieval. Readers interested in text-based image retrieval may find these two papers interesting (22,23).

CONTENT-BASED IMAGE INDEXING AND RETRIEVAL

An image database that supports content-based image retrieval must address the following issues: representation and extraction of image features, image searching
and indexing, and image query formats and user interfaces. Image features used in describing image content can be divided into three levels. At the lowest level, image features include color, texture, and shape. These are the most popular features used in content-based image retrieval systems. At the middle level, image content is described by image regions represented by combinations of color, texture, shape, and spatial relationship. Image features at the middle level allow more powerful queries and more precise retrieval of images. High-level queries are usually associated with human cognizance of image content, for example, finding images that contain people, a horizon, flags, or firemen at work (6,24,25). High-level queries are generic, unpredictable, and difficult to implement because a high-level query often implies a semantic description that is nearly impossible to formalize. Guibas and Tomasi gave a good discussion of a high-level query, "firemen at work" (24). They showed that images that contain the semantics of "firemen at work" might or might not contain fire, smoldering buildings, water hoses, helmets, or fire trucks. Most of the image feature extraction and object representation techniques are derived from the technology developed within the computer vision community. However, content-based image retrieval differs from traditional computer vision in the following two aspects.

• Instead of automatic recognition, in image retrieval, a user can guide or assist an image search process.
• Instead of correctly recognizing objects, an image retrieval system identifies similar looking objects.

The common practice in an image retrieval system is that, for a given query, algorithms retrieve images from the database that meet the general description of the query, and the user then visually scans the restricted set of retrieved images. By this practice, it is possible to have a fraction of false positives and false negatives within the retrieved images. Because the execution of a query is on-line, a large database system cannot keep a user waiting while it goes through even a cursory description of each image. For large image databases, an exhaustive search of an entire image database to find and order matches with a query is not feasible. According to Pentland, Picard, and Sclaroff (26), a search that requires minutes per image is simply not useful in a database that has millions of images. Many practical image database systems adopt the following guiding principle in system design: image database search time should be linear or sublinear in the number of images in the database (24). The response time to a query depends on the complexity of the query, the image features used to implement the query, the similarity measurement, and the image indexing mechanisms. The similarity measurement between an image query and an image is important because images are retrieved in the order of the similarity measure. A similarity measure must correlate with human judgments; otherwise, the
images found by a database system will not be what the human user wants. Many existing image libraries or database systems allow interactive search; a user can recursively refine a search by selecting examples from the currently retrieved images and using them to initiate a new select-sort-display cycle. The interactive search mechanism gives users opportunities to quickly "zero in" on what they are looking for (26). Other approaches explore effective indexing techniques to assist in image searching. Image indexing refers to encoding the feature vectors that represent an image and storing the structure of the feature vectors in an image database. Because image indexing algorithms run off-line, it is acceptable to implement rather sophisticated algorithms to index images for efficiency at the search stage. A good indexing structure can minimize the on-line search time. Readers who are interested in image database construction and image indexing may find the following list of references useful (19,22,25,27–36). An image database system should provide a user-friendly interface for composing and submitting queries and displaying search results. A user needs an easy-to-use interface to ascribe the visual properties of desired images to the elements of the query. The user interface of an image database depends very much on the image features it supports and on the educational and technical background of the user. In addition, ethics and aesthetics should also be considered (37). The most popular interfaces that support content-based image retrieval are of two types: pictorial and sketch (38). A pictorial query, also called query-by-example, is presented in an image form. This type of query requires users to submit example images that are close to what they are looking for. In implementation, a database system usually provides an interface to allow a user to browse images and then select any one displayed on the screen as a query. Figure 1 shows an interface of query-by-example supported by IMAGE SEEK, an image search engine developed at the University of Michigan-Dearborn (39). In this figure, the browse button was selected, and eight
Figure 1. IMAGE SEEKER’s interface. See color insert.
images randomly drawn from a selected database were displayed. The first image, shown with a highlighted image frame, was the selected query image. The search engine then conducted a search in its database for the images most "similar" to the query image based on the image features supported by the system. Many image databases, for example, IBM QBIC™ (40,41) and the Virage image search engine (20,42), support this type of query. Query-by-sketch is a method that retrieves images based on a rough sketch drawn by a user. Systems that support sketch-based queries must provide tools to allow the user to draw image features such as shape, size, color, and texture to form queries. VisualSEEk, a content-based image/video retrieval system developed at Columbia University (43,44), provides extensive tools to support query-by-sketch. Sketch-based queries can address all three levels of image content, although it can be time-consuming to compose a complicated sketch. The following sections will introduce important image search and retrieval strategies that use different levels of image content. One special section will be devoted to the discussion of evaluative criteria for image databases, and the last section will discuss the future technology in image search and retrieval.

IMAGE RETRIEVAL USING COLOR

Color is the most popular feature used in image retrieval. Color is often described in terms of intensity, brightness, luminance, lightness, hue, and saturation. Intensity is a measure of the flow of power over some interval of the electromagnetic spectrum that is radiated from a surface. Intensity is what is called a linear light measure, expressed in units such as watts per square meter. Brightness is defined by the Commission Internationale de l'Éclairage (CIE) as the attribute of a visual sensation according to which an area appears to emit more or less light. Because brightness perception is very complex, the CIE defined a more tractable quantity, luminance. The luminous efficiency function of the Standard Observer is defined numerically; it is everywhere positive and peaks at about 555 nm. When a spectral power distribution (SPD) is integrated using this curve as a weighting function, the result is CIE luminance, denoted as Y. The magnitude of luminance is proportional to physical power; in that sense it is similar to intensity. But the spectral composition of luminance is related to the brightness sensitivity of human vision. The perceptual response to luminance is called lightness, denoted as L*. It is defined by the CIE as the modified cube root of luminance. Hue is the attribute of a visual sensation according to which an area appears to be similar to one of the perceived colors. Roughly speaking, if the dominant wavelength of an SPD shifts, the hue of the associated color will shift. Saturation of an area is judged in proportion to its brightness. Saturation runs from neutral gray through pastel to saturated colors. Roughly speaking, the more an SPD is concentrated at one wavelength, the more saturated the associated color will be. A color can be desaturated by adding light that contains power at all wavelengths.
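Since lightness is defined by the CIE as a modified cube root of luminance, a small sketch of the usual formulation may be helpful. The constants below follow the common CIE 1976 definition, and the reference-white normalization is an assumption about how the luminance values are expressed.

```python
def cie_lightness(Y, Y_white=100.0):
    """CIE 1976 lightness L* from luminance Y relative to a reference white.

    Uses the familiar 'modified cube root' law: a cube root over most of the
    range, with a linear segment very near black.
    """
    t = Y / Y_white
    if t > (6.0 / 29.0) ** 3:
        f = t ** (1.0 / 3.0)
    else:
        f = t / (3.0 * (6.0 / 29.0) ** 2) + 4.0 / 29.0
    return 116.0 * f - 16.0

# Example: a surface at 18% of the reference-white luminance has L* of about 49.5.
```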
In image retrieval, a color is defined in a color space. The most popular color spaces are RGB (red, green, blue), HSV (hue, saturation, value), HLS (hue, lightness, saturation), XYZ (CIE tristimulus values), CIELAB, and CIELUV. Conversions among these color spaces can be found in (45–47). A digital color image in a computer is represented either in a 24-bit RGB space or by an 8-bit color palette. In a 24-bit color image, sometimes referred to as a true-color image, each pixel has three additive primary color components, Red (R), Green (G), and Blue (B), and can therefore represent one of $2^{24}$ colors. Every pixel in an 8-bit color image represents one of $2^8$ (or 256) possible colors or shades of gray (approximately the quality of NTSC broadcast television). In color image retrieval, RGB is commonly used to preserve the rich color information in the original image. Some image retrieval systems choose to use a color space that exhibits perceptual uniformity, such as HSV, HLS, CIELAB, or CIELUV. The advantage of using color in RGB space for image indexing and retrieval is that there is no need to perform a color space transformation. However, the RGB color space is not perceptually uniform; in a perceptually uniform space, a small perturbation of a component value is approximately equally perceptible across the range of that value. In image retrieval, color viewing is a perceptual process and is closely related to human vision. Thus, perceptual uniformity should be an important and necessary attribute of the color space used in image retrieval algorithms. To be qualitatively and quantitatively useful, a color description system must relate to the human impression of color and have metric significance, which means that equal physical distances must correspond to equal perceptual differences. Most content-based image systems use the color histogram of an image as a color feature to index and retrieve images. Statistically, a color histogram denotes the joint probability of the intensities of the three color components in a color space. When only a single image feature such as color, shape, or texture is used in image retrieval, it has been shown that the color histogram feature outperforms the other features in efficiency and robustness (21,48). Given a discrete color space, the color histogram of an image is obtained by discretizing the color components in the color space and counting the number of times each discrete color occurs in the image. The color histogram of an image represents the color distribution in the image, and each histogram bin represents a color in the given color space. Figure 2 is an example of the color histogram (on the right) generated from an image (on the left) in the RGB space. The x, y, and z axes represent R, B, and G, respectively. In this example, most image pixels were within bins that had high blue values. Color histograms are invariant to translation and to rotation about the view axis. They change with scale variation and occlusion and change slowly as the view angle changes (48). Extensive discussion on computing the color histogram can be found in (48,49). For image databases that use color histograms as features, images are indexed by their color histograms.
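As a concrete illustration of how such a histogram can be built, the sketch below (assuming NumPy; the function name, the uniform quantization, and the choice of eight levels per channel are illustrative only) discretizes each RGB channel and counts the occurrences of each discrete color:

```python
import numpy as np

def color_histogram(rgb_image, bins_per_channel=8):
    """Coarse RGB color histogram: quantize each channel, then count how often
    each discrete (R, G, B) color occurs.

    rgb_image: uint8 array of shape (height, width, 3).
    Returns a normalized histogram with bins_per_channel**3 bins.
    (A sketch; real systems choose the color space and binning per references 48,49.)
    """
    # Map 0..255 to 0..bins_per_channel-1 for every pixel and channel.
    quantized = (rgb_image.astype(np.uint32) * bins_per_channel) // 256
    # Encode each pixel's (r, g, b) bin triple as a single bin index.
    index = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
        * bins_per_channel + quantized[..., 2]
    hist = np.bincount(index.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()        # normalize so the bins sum to 1

if __name__ == "__main__":
    img = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
    h = color_histogram(img)
    print(h.shape, h.sum())         # (512,) 1.0
```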
Figure 2. An example of a color histogram in the RGB space. See color insert.
During the retrieval stage, a query is first transformed into the corresponding color histogram representation, and then the similarity between the query's histogram and the histogram of each image in the image database is measured. The images that have higher similarity measures are returned before those that have lower similarity measures. There are a number of different similarity measures between two histograms. Let the histogram of a query image be $H^q$, the histogram of the mth color image in the database be $H^m$, and let each histogram contain N color bins. The most popular method that measures the similarity between $H^q$ and $H^m$ is to calculate their color distance (50). Mathematically, the similarity of the two histograms can be measured by the following distance function:
$$\mathrm{DIST}(H^q, H^m) = (H^q - H^m)^t A (H^q - H^m),$$
where $A = [a_{ij}]$ is a weight matrix, $a_{ij}$ denotes the similarity between color bins i and j, $0 \le a_{ij} \le 1$, and $a_{ii} = 1$. When A is the identity matrix, DIST becomes the standard squared Euclidean distance function. By assigning proper weights to $[a_{ij}]$, the distance measure can provide results more agreeable to human perception. For example, let us assume that a histogram is composed of only three colors: RED, ORANGE, and BLUE. If a user would like to consider RED more similar to ORANGE than to BLUE, the similarity coefficient between RED and ORANGE can be assigned as 0.8, and the coefficient between RED and BLUE as 0. The resulting similarity weight matrix is
$$A = \begin{bmatrix} 1.0 & 0.8 & 0 \\ 0.8 & 1.0 & 0 \\ 0 & 0 & 1.0 \end{bmatrix}.$$
Let a color histogram, $X = [1.0, 0, 0]^T$, represent a pure RED image, and let another color histogram, $Y = [0, 1.0, 0]^T$, represent an ORANGE image. The weighted histogram distance and the squared Euclidean distance of the two color histograms are 0.4 and 2, respectively. Now consider a pure BLUE image, $Z = [0, 0, 1.0]^T$; the weighted distance between Z and X becomes 2, and the squared Euclidean distance remains 2. The weighted distance measure indicates that the ORANGE image is more similar to the RED image than the BLUE image is. This distance measure is more consistent with human perception.
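The arithmetic of this example can be verified directly. The following sketch (assuming NumPy; illustrative only) evaluates the quadratic-form distance $(H^q - H^m)^t A (H^q - H^m)$ for the three-color histograms above:

```python
import numpy as np

def weighted_distance(hq, hm, a):
    """Weighted (quadratic-form) histogram distance DIST(Hq, Hm) = d^t A d."""
    d = np.asarray(hq, dtype=float) - np.asarray(hm, dtype=float)
    return float(d @ np.asarray(a, dtype=float) @ d)

A = np.array([[1.0, 0.8, 0.0],     # similarity weights between RED, ORANGE, BLUE bins
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

red    = [1.0, 0.0, 0.0]
orange = [0.0, 1.0, 0.0]
blue   = [0.0, 0.0, 1.0]

print(round(weighted_distance(red, orange, A), 3))           # 0.4  (weighted)
print(round(weighted_distance(red, orange, np.eye(3)), 3))   # 2.0  (squared Euclidean)
print(round(weighted_distance(red, blue, A), 3))             # 2.0
```

Replacing A by the identity matrix reproduces the squared Euclidean values quoted above.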
One drawback of the weighted histogram distance function is that it requires more computational time because a color histogram is typically a high-dimensional distribution. A naïve implementation of DIST requires a quadratic form of $O(N^2)$ operations, where N is the number of color bins in the histogram. Hafner et al. (50) suggested using some precomputations (e.g., diagonalizing the quadratic form) to improve the computation to O(N). Because a computation is done for every image in the database, the color similarity measure would not be feasible for on-line image retrieval unless the dimensionality is controlled. The general conclusion is that one needs to generate low-dimensional indices so that using standard database indexing methods in image retrieval involves only O(log M) computation, where M is the size of the database. Many image databases use fewer than 256 color bins. Much work has been done in reducing dimensionality without sacrificing completeness of color representation (50,51). The similarity between two image histograms $H^q$ and $H^m$ can also be computed by using the histogram intersection method (48). Mathematically, the normalized intersection function INTERSECT is defined as
$$\mathrm{INTERSECT}(H^q, H^m) = \frac{\sum_{j=1}^{N} \min(H_j^q, H_j^m)}{\sum_{j=1}^{N} H_j^q}.$$
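This measure translates directly into code. The sketch below (plain Python; the function name is illustrative) computes the normalized intersection of two histograms given as sequences of bin counts; ranking an entire database is then one such evaluation per stored histogram, consistent with the O(NM) complexity noted below.

```python
def intersect(hq, hm):
    """Normalized histogram intersection of a query histogram hq and a
    database histogram hm (both sequences of N non-negative bin counts).
    A direct transcription of the formula above; larger values mean a better match.
    """
    overlap = sum(min(q, m) for q, m in zip(hq, hm))
    return overlap / sum(hq)

# Toy example with three bins: the first database histogram overlaps the query more.
query = [4, 2, 0]
print(round(intersect(query, [3, 1, 2]), 2))   # 0.67
print(round(intersect(query, [0, 1, 5]), 2))   # 0.17
```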
If $\mathrm{INTERSECT}(H^q, H^m) > \mathrm{INTERSECT}(H^q, H^p)$, the mth image in the database is a better match to the query than the pth image. Swain and Ballard (48) showed that the complexity of the histogram intersection method depends linearly on the product of the size of the histogram and the size of the database. Specifically, the complexity of the algorithm is O(NM), where N is the number of bins in the histogram and M is the number of images in the database. Color histograms can be computed over subimages to combine the color feature with spatial relations (19,52). Figure 3 illustrates the advantage of this scheme. In this example, Fig. 3(a) is the query image, and (b) and (c) are two images in the database that contain one elliptical object that has the same color and shape as
Figure 3. An example of using four subimage features in image retrieval. Assume that (a) is the query image and (b) and (c) are two images in a database. Both images (b) and (c) will match image (a) based on the global image histogram.
that in the queried image but differs in spatial location. If the global color histogram were used in image retrieval, Fig. 3(b) and (c) would have the same similarity measure. If we used the four color histograms generated from the subimages shown in the figure, Fig. 3(b) would measure as more similar than (c) to the image in (a). In general, an image can be divided into K different subimages, and the color feature of an image can be represented by K color histograms, each derived from one of the subimages. Formally, the histogram of subimage i can be denoted $H_i^q$ in the query image and $H_i^m$ in the mth image in the database. The similarity measure over the entire image is obtained by summing the similarity measures between $H_i^q$ and $H_i^m$ over all i. More sophisticated approaches involve image segmentation. Smith and Chang (43) segmented an image into regions of salient color features by Color Set Back-projection and then stored the position and Color Set feature of each region to support image queries. Rickman and Stonham proposed a color-tuple histogram approach (53). First, they constructed a codebook that described every possible combination of coarsely quantized color hues that might be encountered within local regions in an image. A histogram based on quantized hues was constructed as the local color feature. Stricker and Orengo (54) extracted the first three color moments from five predefined partially overlapping fuzzy regions. Pass et al. classified each pixel of a particular color as either coherent or incoherent based on whether it is part of a large similarly colored region (55). This approach improved the representation of local color features by distinguishing widely scattered pixels from clustered pixels. Huang et al. developed a color correlogram-based color layout representation (56). This approach used the autocorrelogram and correlogram as similarity measures. Their experimental results showed that this approach was more robust than the conventional color histogram approach in retrieval accuracy. The color feature has been used in all three types of image database interfaces: query-by-example, text query, and query-by-sketch.

Color Image Retrieval Using Query-by-Example

In this interface, a user submits an image as a query. The query image is usually selected from a set of images provided by a database system. The search engine of the database system searches its database for images whose color histograms match the color histogram of the queried image. The search result is displayed on the screen in descending order of similarity measures. A good example of a query-by-example interface is one of IBM's QBIC demonstrations. When a user first enters the system, twelve randomly selected images are displayed. The user can continue random browsing until an image is found that is suitable for use as a query. In each image frame, there are five icons labeled I, C, L, T, and S. If ''I'' is clicked, the full-sized image and the associated keyword list are displayed. If ''C'' is clicked, images whose color percentages are similar to the query image are retrieved. If ''L'' is clicked, images whose color layout (sensitive to color position) is similar to the query image are retrieved. If ''T'' is clicked, images of global texture similar to the
Figure 4. The query-by-color interface of IMAGE SEEKER. See color insert.
query image are retrieved; if ''S'' is clicked, images similar to the query image with respect to the special hybrid color are retrieved. Query-by-example is a very popular technique; other successful implementations can be found in other versions of IBM QBIC (19,40,41), Virage (20), and VisualSEEk (43,44,57).
Image Retrieval Using Query-by-Color The problem with query-by-example is that a user must submit an image as a query to retrieve the desired images from a database. Sometimes, the user can get frustrated in the attempt to find an appropriate query image. Image queries using color directly can circumvent this problem. There are a number of different interfaces that implement this type of retrieval. IMAGE SEEKER allows a user to select a color as a query from an array of color icons provided by the system (see Fig. 4). In implementation, the system uses a color descriptor (CD) set, CD = {color1 , color2 , . . . , color15 }. For each color, the center of the color and its spectrum in a color space are defined by using the 1976 CIE-L∗ u∗ v∗ diagram (58). For each image, IMAGE SEEKER computes the histogram of the fifteen color descriptors over the entire image, and the color descriptor with the largest value in the histogram is called the dominant color of the image. For a given query, the images whose dominant color matches the query color are retrieved. The ranking of these images is based on the percentage of the dominant color in the individual images. The larger percentage of the dominant color an image has, the closer the image is to the color query. Figure 5 shows an example. The images shown were retrieved from a satellite image database in response to a query descriptor GREEN, and the images were ranked according to the percentage of ‘‘GREEN’’ pixels in the corresponding images. The advantage of query by color name is that it is simple to use and is intuitive to common users. The major problem is that the dominant color of an image may not be the color of the object to which the query refers. To overcome this problem, some systems provide tools for a user to specify multiple color distributions. IBM’s QBIC system implements query-by-color through an interface of a color wheel or color percentage specification (19). To compose a color query, a user can ‘‘paint’’ a graphical representation of the desired pattern of colors and specify the approximate percentages. To specify a percentage, one can click in the painting area to block off a
Figure 5. Four images were retrieved in response to the query ‘‘GREEN.’’ The percentage of green color in each image was (a) 50.00%, (b) 47.00%, (c) 36.00%, (d) 32.00%. See color insert.
rectangle of the desired size (percentage of the full area). All new regions are painted white, initially. To select a color from the color picker to paint a rectangle, the user can either click in a color region or enter the desired RGB values in the text areas. The procedure of specifying the desired color and percentage can be repeated up to five color/percentage pairs. Once the system receives a query, namely, the specified percentages of RGB, it looks for images whose color histograms best match the query. The advantage of this approach is that the system still uses the same color histograms as those used in query-by-sample. The disadvantage is that the composition of a query can be tedious. The interface of VisualSEEk consists of several functional areas (44). The elemental unit of an image query is point of denotation (POD). A POD is meant to correspond to a spatial region within an image that has some visual properties of note. PODs are not isomorphic with physical objects depicted in the photos, nor do PODs correspond to the areas produced by a segmentation of the images. PODs merely refer to the regions within the images that may possess properties of color, texture, and shape. The VisualSEEk interface supports queries that
include up to three PODs. In other words, a user might indicate that up to three PODs or regions have visual properties to be used to query the image database. The PODs can be assigned some or all of the properties of color, texture, shape, and motion. A user is also allowed to position the PODs in any particular spatial layout. To specify a color feature of a POD, a user can simply activate the POD and then select a color from a ColorWell. The ColorWell contains several wheels stacked on top of each other that form a well of color wheels. The different color wheels have different hue saturation values. Chabot, developed at the University of California at Berkeley, supports query-by-color (59) through an interface of text description. Chabot quantizes the RGB color space to 20 bins, that is, 20 different colors. The color selection tool is a pull-down menu that contains a list of color descriptions such as ‘‘MostlyRed, MostlyBlue SomeYellow, MostlyPink, WhiteStuff. For example, if a user specifies ‘‘SomeYellow(2)’’ Chabot would search for images whose image histograms contain at least two bins that have yellow components. For well-categorized image databases, the color feature can be quite effective in retrieving images. However, for a general image database, retrieving images based on the color feature alone is often not sufficient. For example, the images that contain red apple, red cars or flags may have similar color histograms but completely different image contents. ‘‘Image Retrieval Using Combined Features’’ will show that a color feature becomes more powerful when it is combined with other image features. IMAGE RETRIEVAL USING TEXTURAL FEATURES Texture refers to the visual patterns whose properties of homogeneity do not result from the presence of only a single color or intensity (60). Texture is an innate property of virtually all surfaces of objects, including bricks, fabric, beans, clouds, trees, and hair, that contain unique visual patterns or spatial arrangements of pixels that surfaces’ gray level or color alone may not sufficiently describe. Texture is an important element of human vision. It has been reported that a ‘‘preattentive visual system’’ exists in human vision that can almost instantaneously identify ‘‘textons,’’ the preattentive elements of texture (61,62). Typically, it has been found that textures have statistical and/or structural properties such as contrast, coarseness, and directionality. Texture is a useful image feature for describing image content such as clouds, water, crop fields, trees, bricks, and fabric. To support image retrieval using textural features, techniques that automatically segment images by using textural features need to be developed. Textural feature extraction and segmentation have been studied extensively in computer vision research. Many approaches have been proposed to model various types of texture (63–72). Textural feature extraction and segmentation methods can be loosely divided into two categories, statistical and structural. The statistical methods involve calculating properties based on the color or gray tones of the image specimens. First-order statistics involve computing and extracting features such as mean, variance,
standard deviation, skewness, and kurtosis from the histogram of an image (70). Higher order statistics include the co-occurrence matrix approach for calculating various features (65,73). Gotlieb and Kreyszig found experimentally that contrast, inverse difference moment, and entropy had the largest discriminatory power (74). Different textures can be discriminated by comparing the statistics computed over different subimages. Structural textural models try to determine the primitives of texture and compose textural features from the primitives. Extracted primitives and their placement rules can be used to recognize texture and, in image retrieval, to synthesize a query image with textures. Motivated by the psychological studies of human visual perception of texture, Tamura et al. explored textural representation from a structural point of view (75) and developed computational approximations of the visual textural properties found to be important in psychological studies. The six visual textural properties were coarseness, contrast, directionality, linelikeness, regularity, and roughness. These textural representations are visually meaningful; therefore, they are very attractive in content-based image retrieval because they can provide a friendlier user interface. These textural features were implemented by the QBIC (76) and MARS systems (77,78). Methods of statistical and structural textural analysis are complementary and therefore are often used together in image segmentation. For example, Karu, Jain, and Bolle (72) used a structural method to compute the coarseness of an image and then used a method of statistical textural analysis to segment the detected textured regions. Wavelet subband features have been used for texture-based image retrieval in a number of image database systems, including the digital image library developed at the University of California at Santa Barbara (8) and VisualSEEk (79). Wavelet analysis consists of decomposing a signal or an image into a hierarchical set of approximations and details. The levels in the hierarchy often correspond to those in a dyadic scale. Wavelets have a broad range of applications, including digital signal analysis, image analysis, and image compression (66,68,79,80). Fast algorithms exist for computing forward and inverse wavelet transforms and reconstructing desired intermediate levels. The transformed images (wavelet coefficients) map naturally into hierarchical storage structures that are very useful in image databases. Wavelet decomposition has been used in image retrieval to compute textural features of image blocks. The features are computed from the mean absolute value and variance measures on the subbands produced by k iterations of wavelet decomposition. Smith and Chang (60,81) used a quadrature mirror filter (QMF) wavelet decomposition where k = 3 and achieved 93% correct classification of the complete set of 112 Brodatz textures (82). The wavelet transform was also combined with other techniques to achieve better performance. Gross et al. combined the wavelet transform with KL expansion and Kohonen maps to analyze texture (83). Other researchers combined the wavelet transform with a co-occurrence matrix to take advantage of both statistics-based and transform-based textural analysis (68,84).
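To make the subband feature computation concrete, the following sketch uses the PyWavelets package (an assumption; the systems cited above use QMF or Gabor filter banks, and the ''db2'' wavelet and three decomposition levels are chosen only for illustration) to compute the mean absolute value and variance of every subband of a gray-scale image block:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_texture_features(gray_block, wavelet="db2", levels=3):
    """Texture feature vector for an image block: mean absolute value and
    variance of each subband from `levels` iterations of 2-D wavelet decomposition.
    """
    coeffs = pywt.wavedec2(np.asarray(gray_block, dtype=float), wavelet, level=levels)
    features = []
    # coeffs[0] is the final approximation; coeffs[1:] are (cH, cV, cD) detail tuples.
    for band in [coeffs[0]] + [b for detail in coeffs[1:] for b in detail]:
        features.append(np.mean(np.abs(band)))
        features.append(np.var(band))
    return np.array(features)

if __name__ == "__main__":
    block = np.random.rand(64, 64)                  # stand-in for a gray-scale block
    print(wavelet_texture_features(block).shape)    # (20,) = 2 features x 10 subbands
```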
Ma and Manjunath (85,86) evaluated textural image annotation by various wavelet transform representations, including orthogonal and biorthogonal wavelet transforms, tree-structured wavelet transforms, and Gabor wavelet transforms. They found that the Gabor transform was the best among the candidates tested. One advantage of these wavelet spatial-frequency approaches is that they satisfy the minimum combined uncertainty in both the spatial and spatial-frequency domains (66). The digital image library at the University of California-Santa Barbara also used banks of Gabor (modulated Gaussian) filters to extract textural features (8). Simple statistical features such as mean and standard deviation were computed from the filtered outputs to index and match images. There have been comparative studies on textural features of images. Weszka et al. compared the textural classification performance of the Fourier power spectrum, second-order gray-level statistics, and first-order statistics of gray-level differences (87). They tested the three methods on two sets of terrain samples and concluded that the Fourier method was inferior, whereas the other two were comparable. Ohanian and Dubes compared and evaluated four types of textural representations: Markov random field representation, multichannel filtering representation, fractal-based representation, and co-occurrence representation (69). They tested the four textural representations on four test sets, two synthetic and two real images. They found that the co-occurrence matrix representation performed best. Computational efficiency in image retrieval is a major concern in selecting a texture algorithm for implementation. Pixel-based textural features are computed in a neighborhood surrounding each pixel in an image. When neighboring pixels are considered similar by a certain textural measure, a textured region is formed. In (72), Karu et al. described a method based on local row and column extrema. A pixel was labeled textured if the local extrema density in the window centered at the pixel was between the lower and upper bounds. They showed good results in detecting the existence of texture by using the density of local extrema. The challenge of this type of approach is to classify the pixels that border several textural regions. Fortunately, in image database applications, the precision of border pixel classification is not important in overall block textural matching. Block-based approaches divide an image into a number of overlapping image regions and then compute the spatial blocks of homogeneous texture (79) using one of the textural measuring methods mentioned before. Guibas and Tomasi (24) used a query window of 5 × 5 blocks, where each block specified one textural pattern. During the image search, the system took a query textural vector for one of the 25 rectangles and looked for some of its nearest neighbors in the database. Images returned by the 25 separate queries, one for each rectangle in the query window, were combined into the answer for the user. A scoring scheme was used to account for both the size of a region with a given texture and the similarity between the query texture and the image texture. Scores were added over all 25 blocks for each of the images returned, and a list of images, sorted by decreasing scores, was returned to the user. The nearest neighbor searches
in a large Euclidean space are computationally explosive, so various algorithms have been developed to optimize the computation of block-based texture (88). The IBM QBIC system extracts image texture using modified versions of coarseness, contrast, and directionality (40). For color images, these textural features were computed on the luminance band generated from the three color bands. Coarseness, measured on the scale of the texture, was calculated using moving windows of different sizes. Contrast that described the vividness of the pattern was a function of the variance of a gray-level histogram. Directionality was used to detect whether the image had a favored direction (like grass), or whether it was isotropic (like a smooth object). It was measured by the distribution of gradient directions in the image. More details about these methods can be found in (89). Photobook, an image retrieval system developed by the Media lab at MIT, implemented a new textural model based on the Wold decomposition for regular stationary stochastic processes in 2-D images (26). The Wold decomposition is based on a 1938 theorem by H. Wold for 1-D random processes. This theorem states that any random process can be written as the sum of two processes, one that can be predicted by a linear filter with zero mean-square error (deterministic), one regular (90), and that these two processes are mutually orthogonal. If an image is considered a homogeneous 2-D discrete random field, then the 2-D Wold-like decomposition is a sum of three mutually orthogonal components: a harmonic field, a generalized evanescent field, and a purely indeterministic field. Qualitatively, these components appear as periodicity, directionality, and randomness. Photobook uses three stages to implement the Wold model. The first stage is to determine whether there is a strong periodic (or nearly periodic) structure. Although highly structured textures can contain all three Wold Components, their harmonic components are usually prominent and provide good features for comparison. Additionally, they are also the quickest to compute. The second stage is used for periodic images. An algorithm is implemented to estimate first the location of large local maxima and then extract the fundamental frequencies of all harmonic peaks. The direction of the harmonic frequency that is closest to the origin is regarded as the main orientation angle of the texture. Rotations and other transformations can be applied to the peaks to align them into a sort of ‘‘generic view’’ image before further comparison. The third stage is applied when an image is not highly structured. This stage approximates the finding of the two less salient dimensions, the directional and complexity components. The number of dominant orientations is estimated via steerable filtering, and a decision process is developed based on the thresholding orientation histograms. Only the textures that possess the same number of main orientations are subsequently compared with the Wold complexity component. The complexity component is modeled by using a multiscale simultaneous autoregressive (SAR) model. The SAR model characterizes spatial interactions among neighboring pixels and computes textural features for a second-order simultaneous autoregressive model (on
a gray-scale component) over three scales (resolution levels 2, 3, 4) on overlapping 25 × 25 subwindows (18). The SAR parameters of different textures are compared using the Mahalanobis distance measure. In addition to its significance in random field theory, the Wold decomposition produces compact textural descriptions that preserve most of a texture's perceptual attributes. The harmonic, evanescent, and indeterministic components in the Wold decomposition have close relationships to the three dimensions, repetitiveness, directionality, and complexity, commonly used in the psychophysical study of perceptual texture similarity (91). The Blobworld system developed at the University of California at Berkeley uses a quite novel approach in computing textures (92). This approach uses a scale selection method that is based on edge/bar polarity stabilization. The texture descriptors arise from the windowed second-moment matrix. Let $\nabla I$ be the gradient of the image intensity. The second-moment matrix for the vectors within a Gaussian window is approximated by using
$$M_\sigma(x, y) = G_\sigma(x, y) \ast (\nabla I)(\nabla I)^T,$$
where $G_\sigma(x, y)$ is a separable binomial approximation of a Gaussian smoothing kernel whose variance is $\sigma^2$. At each pixel location, $M_\sigma(x, y)$ is a 2 × 2 symmetric, positive, semidefinite matrix; thus it provides three pieces of information about each pixel. Consider a fixed scale and pixel location. Let $\lambda_1$ and $\lambda_2$ denote the eigenvalues of $M_\sigma$ at that location, and let $\phi$ denote the argument of the principal eigenvector of $M_\sigma$. When $\lambda_1$ is large compared to $\lambda_2$, the local neighborhood possesses a dominant orientation, as specified by $\phi$. When the eigenvalues are comparable, there is no preferred orientation, and when both eigenvalues are negligible, the local neighborhood is approximately constant. The scale selection is based on a local image property known as polarity, measured by the extent to which the gradient vectors in a certain neighborhood all point in the same direction. Formally, the polarity is defined as
$$p = \frac{|E_+ - E_-|}{E_+ + E_-},$$
where
$$E_+ = \sum_{(x,y)\in\Omega} G_\sigma(x, y)\,[\nabla I \cdot \hat{n}]_+$$
and
$$E_- = \sum_{(x,y)\in\Omega} G_\sigma(x, y)\,[\nabla I \cdot \hat{n}]_-.$$
$[\cdot]_+$ and $[\cdot]_-$ are the rectified positive and negative parts of their argument, $\hat{n}$ is a unit vector perpendicular to $\phi$, and $\Omega$ represents the neighborhood under consideration. In this case, $E_+$ and $E_-$ are measured as the number of gradient vectors in $\Omega$ on the positive and negative sides of the dominant orientation, respectively. The process of selecting a scale is as follows. The first step is to compute the polarity at every pixel in the image for $\sigma_k = k/2$, $k = 0, 1, 2, \ldots, 7$, thus producing a stack of polarity images across scales.
In the second step, for each k, the polarity image computed at scale $\sigma_k$ is convolved with a Gaussian, whose standard deviation is $2\sigma_k$, to yield a smoothed polarity image $\tilde{p}_{\sigma_k}$. Step 3 is to select the scale for the pixel as the first value of $\sigma_k$ for which the difference between successive values of polarity $(\tilde{p}_{\sigma_k} - \tilde{p}_{\sigma_{k-1}})$ is less than 2%. Once a scale, $\sigma^*$, is selected for a pixel, that pixel is assigned three texture descriptors: the polarity $p_{\sigma^*}$, the anisotropy $a = 1 - \lambda_2/\lambda_1$, and the normalized texture contrast $c = 2\sqrt{\lambda_1 + \lambda_2}$. The combination of this texture feature and color features used in image retrieval will be discussed later. Once textural features are extracted from an image and quantified in feature vectors, a similarity function should be used in image search. A distance function such as the Euclidean or Mahalanobis distance function is often used to measure the similarity between textural feature vectors. The distance values are used to rank the similarity of images in the database to the query image. Texture similarity can also be measured by a discriminant function such as Fisher discriminant analysis (93). This requires a training stage, when the image database is constructed, to build a discriminant function general enough to discriminate between new and unknown textures. The common interface to support query-by-texture is either query-by-sample texture or query-by-example image. To implement query-by-sample texture, an image database system usually provides a textural easel or textural patches (19,44). A user can select any one pattern of texture as a query to submit to the system. The database system then looks for images that contain the submitted textural patch. The images that contain a higher frequency of occurrence of the textural pattern are considered more similar to the query. One such example is VisualSEEk's TextureSpread interface (44,81). Initially, none of the textures is loaded, and the name of each texture appears in its place. When the user double-clicks the mouse on a name, the textural image is loaded and displayed for viewing. A user can submit any textural patch on display as a query. Because a textural patch contains a homogeneous textural feature, the similarity between a textural query and an image in the database can be measured by using the occupancy ratio of the textural feature within the image. In query-by-example, once a user selects an image, the search engine of the database system matches the textural feature vectors of the query image with those of the images in the database by using a similarity measure. The search result is displayed on the screen in descending order of the similarity measure. In implementation, the textural similarity in an n-dimensional textural space can be computed as the weighted sum of n distance functions, each of which measures the similarity of two textural feature vectors. Assume that a database has n different textural features. The distance between a query image Q and an image X in the image database is
$$D_{qx} = \sum_{i=1}^{n} w_i D_i(T_q^i, T_X^i),$$
where $T_q^i$ and $T_X^i$ are the ith textural feature vectors computed from the query image Q and a database image X, respectively, and $D_i$ is the function that measures the distance between the two textural feature vectors $T_q^i$ and $T_X^i$. The most common weighting (or normalization) factors are the inverse of the variances for each textural component that are computed over all the samples in the image database (40). Normalization brings all components into a comparable range, so they have approximately the same influence on the overall textural distance. For example, in the IBM QBIC system, the textural distance function between two images Q and X is
$$D_{qx} = \frac{(O_q - O_x)^2}{\sigma_O^2} + \frac{(C_q - C_x)^2}{\sigma_C^2} + \frac{(D_q - D_x)^2}{\sigma_D^2},$$
where O, C, and D represent coarseness, contrast, and directionality, respectively, and σo2 , σC2 , σD2 are the variances of the three textural features. Most images contain multiple texture patterns, so textural features are often used in combination with spatial information. The most popular data structure to support a combination of textural features with spatial information is a quad-tree structure (79,94,95). A quadtree structure allows a user to specify multiple textures and position them relatively in a query. Smith and Chang (79) extracted textural features from spatial blocks in a hierarchy of scales in each image by using a quad tree. All feature sets were computed directly from the QMF wavelet representation of the images. Each quad-tree node points to a block of image data, and textural features are extracted on the spatial blocks pointed to by each child node. Based on the values of the textural features, all, some, or none of the children are spawned. Child nodes are merged when a discriminant function indicates that the child blocks contain sufficiently similar textures. The textural key provided by the user is examined for information on the scale of the textural process. By performing quad-tree-based textural segmentation on the key, ideally, a single quad-tree node could result, indicating that one texture is present. The blocks in the database that are smaller than the minimum block size obtained in the textural key are not considered in the match. IMAGE RETRIEVAL USING OBJECT-SHAPED FEATURES Shape is an important datum in many image database applications, for example, meteorology, medicine, manufacturing, entertainment, and defense. In some image databases such as those that contain trademark images and MRI images (96–98), shape is the most useful feature for describing the image content. The design of a shape-based image retrieval system often involves the following issues: object segmentation, shape representation, and similarity function. Object segmentation refers to the process of partitioning an image into regions that have approximately homogeneous features such as luminance or color; this has been proven to be a difficult problem in computer vision (99,100). There is abundant literature on image segmentation (101–103), and therefore this
topic will not be discussed in this article. Shape extraction and representation involve techniques for describing various shape properties and attributes and extracting these properties automatically or semiautomatically from images (34). In general, image regions can be represented by either nonnumeric or numeric descriptions. A nonnumeric description can be a graph or semisymbolic description of objects such as adjacency graphs of regions with lists of attributes. Numeric description refers to the methods that result in a shape descriptor for each region. A numeric descriptor is often referred to as a feature vector that uniquely characterizes the shape of a region. In image retrieval, numeric descriptors, that is, feature vectors, are often used to represent shape. Methods of shape analysis used in image retrieval can be classified into two categories, global and local shape features. Global features of an object are descriptions of the attributes of an entire object shape. Examples of global features are area, perimeter, circularity, chain code, Fourier descriptors, medial axis transform (MAT), compactness, direction, and eccentricity (101,104,105). Most global shape features are invariant to shift and rotation. Some of them such as moments are invariant to affine transformation. Because the entire shape of an object is required to compute these properties, global featurebased representations do not perform well on partially visible, overlapping, or touching objects. Local features are computed from an object’s local regions such as boundary segments and points of maximal curvature change. Local feature-based methods are suitable for representing partially visible, overlapping, or touching objects. One important shape representation is the boundary of an object. The boundary of an object can be represented directly by a sequence of coordinates of boundary points or indirectly by a sequence of coded points, a functional approximation, or a sequence of interesting points of the boundary. Chain codes have been popular in representing a boundary by a connected sequence of straight-line segments of specified length and direction. A chain-code representation can be in either four or eight connectivity of the segments. The direction of each segment is coded by using a number scheme. Usually, digital images are acquired and processed in a grid format with equal spacing in the x and y directions, so a chain code could be generated by following a boundary in, say, a clockwise direction and assigning a direction to the segments that connects every pair of pixels (101). This method is generally unacceptable for two principal reasons: (1) the resulting chain of codes is usually quite long, and (2) any small disturbances along the boundary due to noise or imperfect segmentation cause changes in the code that may not necessarily be related to the shape of the boundary. Various algorithms have been proposed to overcome these problems. For example, Freeman extracted the critical points of an object boundary from the chain-code description of a boundary and then generated a shape description based on the critical points (106). Parui and Majumder (107) used a modified chain code to perform a symmetry analysis by using a hierarchical structure. The Fourier descriptor has been popular in representing shapes in various applications (105,108). The Fourier
descriptor of a boundary of an object is obtained through discrete Fourier transforms (109,110). Rui et al. described a modified Fourier descriptor that takes into account digitization noise in the image domain. The method was both robust to noise and invariant to geometric transformations (108). Lu et al. conducted an in-depth study of a number of Fourier descriptors. Interesting features of various Fourier descriptors can be found in (105). The boundary shape of an object can also be represented by an edge histogram (21,96,97). Edges of an image are pixels that are local maxima in gradients and can be used to describe the boundary shape of an object. An edge histogram is the histogram of edge directions in an image. The edges and the edge histogram of each image in an image database are generated off-line. The match between two images is measured by a histogram intersection technique similar to the color histogram matching described earlier. This method captures the general shape of an object and can be implemented efficiently as an on-line retrieval scheme. However, its accuracy depends very much on the accuracy of the underlying edge operator. In addition, edge histograms have the following characteristics (21). • The histogram of the edge directions is invariant to translation in the image. Thus, the positions of the objects in an image have no effect on edge directions. However, two totally different images may have the same histograms of edge directions. • The use of edge directions is inherently not scale-invariant. This is easy to verify. Two objects of identical shape but different sizes yield two different numbers of edge points and, therefore, two different histograms of edge directions. To circumvent this problem, one may normalize the histograms with respect to the number of edge points in the image. A drawback of a normalized histogram is its inability to match individual parts contained in the images. • A histogram of edge directions is sensitive to rotation. A shift of the histogram bins during matching partially takes into account rotations of the image. A rotation of an image shifts each of the edge directions by the amount of rotation. For example, if a quantization of 45° (bin size = 45°) is used, then two edge points with directions of 30° and 40° will fall into the same bin. However, if the same image is rotated by 10°, then these two points will fall into adjacent bins. Various schemes were proposed to overcome the effect of rotation. One of them was to smooth the histogram of edge directions as a 1-D discrete signal (21) using the following formula:
$$I_s[i] = \frac{\sum_{j=i-k}^{i+k} I[j]}{2k + 1},$$
where Is is the smoothed histogram, I is the original histogram, and the parameter k determines the degree of smoothing. • The effectiveness of matching two histograms depends on the bin size of the histogram. Matching
speed is increased by choosing a larger bin size. However, this also reduces the accuracy in an arbitrary rotation of the image. A rotation causes a shift in the histogram bins and also affects the bin membership. Use of a very small bin requires more matching time (because the number of bins increases), and it also demands a very high degree of accuracy in the edge directions. An object shape is often represented by the significant points of its boundary such as the maximum local curvature of boundary points or the vertices of the shape boundary’s polygonal approximation. Thus, a shape boundary is coded as an ordered sequence of interest points. Mehrotra and Gary (34) used the vertices of the shape boundary’s polygonal approximation in a shape retrieval system called Feature Index-Based SimilarShape Retrieval (FIBSSR), in which the shape of an object was represented by an ordered set of boundary segments and each boundary segment was represented by the adjacent interest points. Mehrotra and Gary showed that significant points could be encoded to be invariant to scale, rotation, and translation. Grosky and Mehrotra described a technique that represents a shape in terms of its boundary’s local structural features in a property vector (32). A string edit-distance-based similarity measure was employed to match two shapes. The edit-distance between two strings was the number of changes required to transform one string to the other. An m-way search tree-based index structure was used to organize the boundary features of the database shapes. Grosky and Mehrotra showed that this technique could handle images of occluded and touching objects. However, the index structure was quite complex. Moment-based methods use object regions to compute moments that are invariant to various transformations. Hu (111) identified seven moments that have the desirable property of being invariant under shift, scaling, and rotation. More recently a number of variations of Hu’s moments were developed (112–115). Yang and Albregtsen developed a fast method of computing moments in binary images based on the discrete version of Green’s theorem (112). Based on the fact that most useful invariants were found by extensive experience and trial and error, Kapur et al. (113) developed algorithms to generate and search systematically for a given geometry’s invariants. The advantage of moments is that they are mathematically concise, and the disadvantage is that it is difficult to correlate high-order moments with shape features. Because similar moments do not guarantee similar shapes, it is better to combine moments with other perceptual shape measures such as curvature and turning angles. Moments have been used as shape features in a number of image retrieval databases (40,97,116). Research has been performed to compare region-based with boundary-based shape representation methods. Li and Ma showed that the geometric moments method and the Fourier descriptor were related by a simple linear transformation (117). Mehtre et al. compared the performance of boundary-based representation including chain-code, Fourier descriptor, UNL Fourier descriptor, with region-based representations including moment
invariants, Zernike moments, pseudo-Zernike moments, and combined representations such as moment invariants with the Fourier descriptor (122). Their experiments showed that the combined representations outperform the simple representations. For shape matching, chamfer matching attracted much research attention (155). A chamfer-matching technique compares two collections of shape fragments at a cost proportional to the linear dimension rather than the area (119). Borgefors developed a hierarchical chamfer-matching algorithm to speed up the matching process (120), and the matching was performed at multiple resolutions from coarse to fine. Choosing a shape representation scheme is important in developing a shape-based image retrieval system and, generally, it is driven by the application. For example, some applications may want to choose a representation that is insensitive to variations such as changes in scale, translation, and rotation, and others may require differentiating these variations. Many existing image databases use multiple shape features. IBM QBIC uses a combination of shape features: area, circularity, eccentricity, major axis orientation, and a set of algebraic moments (40). It is assumed that all shapes are nonoccluded planar shapes. The area of an object is computed as the number of pixels appearing in the object in a binary image. Circularity is computed as perimeter²/area. The second-order covariance matrix is computed using the boundary pixels, and, from this covariance matrix, the major axis orientation is taken as the direction of the eigenvector that has the largest eigenvalue. The algebraic moment invariants are computed from the first m central moments of the boundary pixels and are given as the eigenvalues of predefined matrices. A user can select any subset of these features. The shape feature was used in query-by-image as well as in query-by-sketch. Photobook takes quite a different approach. To account for the differences between objects of the same type that are caused by a change in view geometry or by physical deformation, Pentland et al. modeled the physical ''interconnectedness'' of the shape (26). The interconnectedness is computed by the well-known finite-element method (FEM). This method produces a positive-definite, symmetric matrix, called the stiffness matrix, that describes how each point on the object is connected to every other point. The semantics-preserving code for shape is derived from the eigenvectors of the stiffness matrix that contain deformations relative to some base or average shape. All of the shapes are first mapped into this space, and similarity is then computed based on the eigenvalues. The shape similarity is computed by comparing the amplitudes of the eigenvectors. Chu et al. described a medical image retrieval system that represents object shapes by using boundary features. Due to the complexity of object shapes in medical images, Chu et al. used a landmark-based decomposition technique to decompose complex-shaped contours into shape-descriptive substructures. Because knowledge about the shapes of the human anatomy is available, model-based feature extraction is used to extract shape features. More details of this approach can be found in (98).
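A simplified version of such global shape features can be computed directly from a binary object mask. The sketch below (assuming NumPy; illustrative only, and using all object pixels rather than only boundary pixels for the covariance) computes the area, a crude circularity, and the major-axis orientation as the direction of the eigenvector that has the largest eigenvalue:

```python
import numpy as np

def global_shape_features(mask):
    """Simple global shape features for a binary object mask (True = object)."""
    mask = np.asarray(mask, dtype=bool)
    area = mask.sum()

    # Crude perimeter: object pixels that touch at least one background pixel.
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    circularity = perimeter ** 2 / area if area else 0.0

    # Major-axis orientation from the covariance of object pixel coordinates.
    ys, xs = np.nonzero(mask)
    cov = np.cov(np.stack([xs, ys]))
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    major = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    orientation = np.degrees(np.arctan2(major[1], major[0]))

    return {"area": int(area), "circularity": float(circularity),
            "orientation_deg": float(orientation)}

if __name__ == "__main__":
    # A filled 10 x 30 rectangle: orientation should be horizontal (0 or 180 degrees).
    m = np.zeros((50, 50), dtype=bool)
    m[20:30, 10:40] = True
    print(global_shape_features(m))
```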
A shape query in image databases is typically a sketch (drawing), a shape icon, or an image. When a user submits a shape query, images that contain the object or objects similar to the query are retrieved and ranked according to a similarity measure. In general, query-by-sketch can support a combination of image features such as color, shape, texture, and spatial relations. A number of image database systems, including VisualSEEk and IBM's QBIC (40,57,121,122), support query-by-sketch for retrieving images using shape features. To support query-by-sketch, a database system must derive a similarity measure that closely conforms to the common human perception of shape similarity. This similarity measure must also take into account the elastic deformation of user sketches. Bimbo and Pala (123) described a technique that uses global shape properties to represent a shape and an elastic method to match sketched templates over the shapes in the images. The similarity is measured by the match of the deformed sketch to the imaged objects and by the elastic deformation energy of the sketch. A one-dimensional template is modeled by a second-order spline $\tau = (\tau_x, \tau_y): \mathbb{R} \rightarrow \mathbb{R}^2$ that is assumed to be parameterized with respect to arc length and normalized to unit length. For a given image I, its edge image $I_E$ is searched for a contour whose shape is similar to that of τ. A deformed template is given by $\phi(s) = \tau(s) + \theta(s)$. The match between the deformed template and the edge image $I_E$ is measured by
$$M = \int_0^1 I_E(\phi(s))\, ds.$$
If $I_E$ is normalized to [0,1], then $M \in [0, 1]$. When M = 1, the template lies entirely in image areas where the gradient is maximum, and when M = 0, the template lies entirely in areas where the gradient is null. The elastic deformation energy for the template is given by
$$\varepsilon = S + B = \alpha \int \left[ \left(\frac{d\theta_x}{ds}\right)^2 + \left(\frac{d\theta_y}{ds}\right)^2 \right] ds + \beta \int \left[ \left(\frac{d^2\theta_x}{ds^2}\right)^2 + \left(\frac{d^2\theta_y}{ds^2}\right)^2 \right] ds.$$
The quantity S, depending on the first derivatives, is a rough measure of how the template τ is strained by the deformation θ, and the quantity B, depending on the second derivatives, is an approximate measure of the energy spent to bend the template. Two additional components, N and C, are used in matching. The complexity of the template, N, is measured by the number of zeros of the curvature functions associated with its contour, and C is a qualitative measure of the correlation between the curvature function associated with the original template and that associated with the deformed one. All five of these parameters, S, B, M, N, and C, are classified by a back-propagation neural network subjected to appropriate training. For each input vector, the neural classifier gives one output value ranging from 0 to 1 that represents the similarity between the shape in the image and the shape of the template.
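A discrete approximation of these quantities is straightforward to sketch. The fragment below (assuming NumPy; illustrative only, with the template given as sampled points, nearest-pixel sampling in place of the integral, and finite differences in place of the derivatives) computes the match measure M and the deformation energy ε for a deformed template over an edge map:

```python
import numpy as np

def elastic_match_score(edge_image, template_xy, theta_xy, alpha=1.0, beta=1.0):
    """Discrete sketch of the elastic matching quantities above.

    edge_image:  2-D array with values in [0, 1] (edge strength).
    template_xy: (n, 2) array of template points (x, y), assumed arc-length sampled.
    theta_xy:    (n, 2) array of deformations; the deformed template is phi = tau + theta.
    Returns (M, epsilon).
    """
    tau = np.asarray(template_xy, dtype=float)
    theta = np.asarray(theta_xy, dtype=float)
    phi = tau + theta

    # M ~ mean of the edge values sampled at the deformed template points.
    cols = np.clip(np.round(phi[:, 0]).astype(int), 0, edge_image.shape[1] - 1)
    rows = np.clip(np.round(phi[:, 1]).astype(int), 0, edge_image.shape[0] - 1)
    match = float(edge_image[rows, cols].mean())

    # Strain S from first differences and bending B from second differences of theta.
    d1 = np.diff(theta, axis=0)
    d2 = np.diff(theta, n=2, axis=0)
    energy = alpha * float((d1 ** 2).sum()) + beta * float((d2 ** 2).sum())

    return match, energy

if __name__ == "__main__":
    edges = np.zeros((40, 40)); edges[20, :] = 1.0          # a horizontal edge contour
    line = np.stack([np.arange(5, 35, dtype=float), np.full(30, 20.0)], axis=1)
    print(elastic_match_score(edges, line, np.zeros_like(line)))   # (1.0, 0.0)
```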
IBM's QBIC system implements query-by-sketch as follows (40). First, the system converts a sketch to a single-band luminance image. Second, it computes the edges in the image using a Canny edge operator and reduces the image to size 64 × 64. During the matching process, the user-sketched image is partitioned into 8 × 8 sets of blocks; each of them has 8 × 8 pixels, and each image in the database is divided into searching areas of size 16 × 16, each correlated with a block of the sketch. A logical binary correlation is implemented with specific values given on a pixel-by-pixel basis for edge to edge matches, edge to no-edge matches, and no-edge to no-edge matches between the user sketch and the database image. The matching score is computed as the sum of the correlation scores of each local block. To support the query-by-example interface for shape features, every image in the database must be represented by a set of selected shape features, and an algorithm must be implemented to match the shape features of the two images. A popular matching method is to use a weighted Euclidean distance similar to that discussed in the textural retrieval section.

IMAGE RETRIEVAL USING SPATIAL RELATIONSHIPS

The spatial relationships among the objects in an image are an image content feature commonly used in image retrieval that has not yet been discussed. Spatial relationships within image content span a broad spectrum ranging from directional relationships to adjacency, overlap, and containment, involving a pair of objects or multiple objects. Image retrieval using spatial relationships needs to address three issues: representation of spatial knowledge, spatial relation inference, and similarity retrieval. The data structures commonly used to represent spatial relationships for image retrieval include a family of 2-D strings, spatial-orientation graphs, quad trees, etc. A database system should select a data structure that can be used efficiently for indexing images and, more importantly, for matching images with a query. The family of 2-D strings is quite popular for representing spatial relationships of objects in image databases. Chang et al. first proposed a 2-D string representation based on orthogonal projections as a spatial knowledge structure in 1987 (124). The objects of an image were projected along the x and y axes to form two strings that represent the relative positions of objects. The formal description is as follows. Let $S = \{O_1, O_2, \ldots, O_n\}$ be a set of symbols, where $O_i$ represents a pictorial object in an image, for $i = 1, 2, \ldots, n$. Chang et al. defined a set of spatial operators, $T = \{=, <, :\}$: ''a = b'' denotes that objects a and b are at the same spatial location, ''a < b'' denotes that object a is at the left of b or a is below b, and ''a : b'' denotes that a is in the same set as b (124). A 2-D string over $S \cup T$ is defined as
$$(O_1\, r_1^x\, O_2\, r_2^x \cdots r_{n-1}^x\, O_n,\ O_{p(1)}\, r_1^y\, O_{p(2)}\, r_2^y \cdots r_{n-1}^y\, O_{p(n)}),$$
where p denotes a permutation function from $\{1, 2, \ldots, n\}$ to itself and $r_1^x, r_2^x, \ldots, r_{n-1}^x, r_1^y, r_2^y, \ldots, r_{n-1}^y$ are spatial relationship operators in T. The first substring represents the spatial relations along the x axis and the second along the y axis.
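A much simplified construction of such strings can be sketched as follows (plain Python; the function name is hypothetical, the '':'' operator and the permutation bookkeeping of the full formalism are omitted, and objects are reduced to named centroids):

```python
def two_d_string(objects):
    """Build a simplified 2-D string (x-string, y-string) from named objects with
    centroid coordinates, using only the '<' and '=' operators defined above.
    `objects` is a dict like {"A": (x, y), ...}.
    """
    def project(axis):
        ranked = sorted(objects.items(), key=lambda item: item[1][axis])
        parts = [ranked[0][0]]
        for (_, prev), (name, cur) in zip(ranked, ranked[1:]):
            parts.append("=" if cur[axis] == prev[axis] else "<")
            parts.append(name)
        return "".join(parts)

    return project(0), project(1)   # x-string, y-string

if __name__ == "__main__":
    # Three symbolic objects: B is to the right of A; C is above A at the same x.
    print(two_d_string({"A": (1, 1), "B": (3, 1), "C": (1, 4)}))
    # -> ('A=C<B', 'A=B<C')
```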
Examples of 2-D string representations can be found in (124,125). Chang et al. (126) described an image retrieval approach based on the 2-D string representation. A symbolic picture is represented by a set of ordered triples, $(O_i, O_j, r_{ij})$, each of which represents a pairwise spatial relationship between $O_i$ and $O_j$, where $r_{ij} \in T$. Thus, the spatial match problem becomes a subset matching problem from the set of all of the ordered triples of a picture query to the sets of all of the ordered triples of symbolic pictures, and the problem can be solved by using an algorithm for 2-D subsequence matching. Instead of using the longest common subsequence (LCS) search method (127), Chang used a hash function to map the ordered triples into indexes in a picture table (126), in which each index corresponds to an ordered triple. A linked list of image names or IDs is associated with each index, and the entries at each index are the IDs of all of those images that have an ordered triple mapped to that index. The image retrieval is performed as follows: For a query image, the set of ordered triples is computed, every ordered triple in this set is mapped into an index in the picture table, and all of the image IDs associated with the index are retrieved. The intersection of the sets containing the IDs of the images is the retrieved result. This approach is an improvement over LCS in terms of computational speed; however, it makes adding new images to the database difficult because all of the spatial relationships of all images are indexed through a hash table. Jungert, Chang, and Li (128,129) extended 2-D strings to a new structure called 2-D G-strings along with several new spatial operators to represent more relative position relationships among the objects of a picture. The problem with 2-D G-strings was overlapping objects, because the number of segmented subparts of an overlapped object depended on the number of bounding lines of other objects that overlap with it. To solve this problem, Lee and Hsu introduced a 2-D C-string representation that uses a more efficient and economical cutting mechanism to capture the spatial relationships among all objects of an image in the x and y directions. Its formal definition and string generation algorithm can be found in (130,131). The 2-D C-string representation requires shorter strings and still preserves thirteen types of spatial relationships among the objects in one dimension. All of the 2-D string representations discussed so far are sensitive to the orientation of the image. Thus, the spatial relationships derived among objects can differ significantly between an image and its rotated and/or shifted versions. To solve this problem, Petraglia et al. (132,133) proposed a representation called the 2-D R-string. A 2-D R-string is obtained from a mechanism that cuts concentric circles along the ''ring direction'' and radial segments in the ''sector direction'' based on a rotational center that is defined as the origin of a polar coordinate system. The string generated along the ring direction is called a C-string, and that generated along the sector direction is called an S-string. A 2-D R-string is an ordered pair of C- and S-strings. Therefore, a 2-D R-string of a picture is rotationally invariant with respect to its rotational center but variant to shift. A serious problem with 2-D R-strings is that the rotational center is always missing from a 2-D R-string representation.
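The triple-based index can be sketched in a few lines (plain Python; function names are hypothetical, only an x-axis ''<''/''='' relation is used, and a dictionary stands in for the hash-based picture table):

```python
from collections import defaultdict
from itertools import combinations

def triples(objects):
    """Ordered triples (Oi, Oj, rij) for a symbolic picture, using only the
    x-axis '<'/'=' relation (a simplification of the scheme described above)."""
    out = set()
    for (a, (xa, _)), (b, (xb, _)) in combinations(sorted(objects.items()), 2):
        if xa == xb:
            out.add((a, b, "="))
        elif xa < xb:
            out.add((a, b, "<"))
        else:
            out.add((b, a, "<"))
    return out

def build_picture_table(database):
    """Index every image ID under each of its triples (the 'picture table')."""
    table = defaultdict(set)
    for image_id, objects in database.items():
        for t in triples(objects):
            table[t].add(image_id)
    return table

def retrieve(table, query_objects):
    """Images whose triple sets contain every triple of the query (set intersection)."""
    ids = [table.get(t, set()) for t in triples(query_objects)]
    return set.intersection(*ids) if ids else set()

if __name__ == "__main__":
    db = {"img1": {"sun": (5, 9), "tree": (2, 3)},
          "img2": {"sun": (1, 9), "tree": (6, 3)}}
    table = build_picture_table(db)
    print(retrieve(table, {"sun": (8, 7), "tree": (4, 2)}))   # {'img1'}
```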
information about the rotational center, two dissimilar pictures can be misjudged as similar. To overcome this problem, Huang and Jean proposed a 2-D RS-string representation (30). An RS-string is an ordered pair of an R-string and an S-string, where the R-string is a compact representation of the spatial relationships among objects in the ring direction and the S-string in the sector direction. The origin of the polar coordinate system that defines the ring and sector directions is taken to be the centroid of one of the objects in the segmented image. The algorithm for generating an RS-string for a given image can be found in (30). An RS-string supports thirteen spatial relationships between two objects along the ring or sector directions. Once the binary spatial relationships between any two objects in both directions are obtained, one can determine a two-dimensional spatial relationship such as disjoint, join, partial overlap, containment, and belong. Rotational invariance is achieved through a similarity retrieval algorithm that generates all possible RS-strings for the query image, based on all possible rotational center objects, and then matches each representation with the database images. Gudivada and Raghavan described a different representation of spatial relationships called a spatial-orientation graph (134). A spatial-orientation graph of an image is a weighted graph in which each vertex is associated with one object in the image, each edge connects two objects in the image, and the weight associated with an edge is the slope of that edge. The collection of all such possible edges of an image constitutes the edge list of the image. If an image has n objects, the number of edges in the list is n(n − 1)/2. Gudivada and Raghavan suggested that, to speed up the image search, it is better to sort the vertices of the spatial-orientation graph lexicographically and pair the names of vertices so that the resulting edge names in the list remain in lexicographically sorted order. For each edge in the list, the objects connected by the edge and the slope of the edge are stored in the database. The slope values of the edges are used to compute spatial similarity. The similarity of two images is measured by an algorithm, SIM, that takes the edge lists of the two images as arguments and produces a real number indicating their similarity. Formally, SIM : {(E1, E2)} → R, where E1 and E2 represent the edge lists of two images. Let the edge lists for a query image Sqr and a database image Sdb be Eqr and Edb, respectively, and let n1 be the number of objects in Sqr. If the slopes of the corresponding edges in the two images are equal, the similarity between the two images can be measured by

$$\mathrm{SIM}(E_{qr}, E_{db}) = \frac{N_{\mathrm{com}}}{n_1(n_1 - 1)/2},$$
where N_com is the number of edges common to Eqr and Edb. If all edges in the query image exist in the database image, the similarity measure is one; if there are no common edges, the measure is zero. If the edges common to Eqr and Edb do not have the same slope or orientation, the similarity between the two images can be measured by

$$\mathrm{SIM}(E_{qr}, E_{db}) = \frac{\displaystyle\sum_{e_i \in E_{qr}} \bigl[1 + \cos(\theta_i)\bigr]/2}{n_1(n_1 - 1)/2},$$
where ei is the ith edge in Eqr and θi is the angle between ei and its corresponding edge in Edb. The previous similarity measure is, in fact, a special case of this one in which θi = 0 for all i. Gudivada and Raghavan showed that, with careful implementation, the time complexity of the similarity measure between two images can be of the order of O(|Eqr| + |Edb|). The quad tree is another popular structure for representing spatial relationships in image retrieval. In a top-down quad-tree decomposition, a full quad-tree data structure is formed by splitting a parent node into two, three, or four child nodes. The decomposition of every node continues until a minimum bound is reached. In general, an image can be divided into many different subimages, and each division can be represented by a quad-tree structure. For example, if an image is divided into four subimages, then the quad tree consists of two levels; the top level is the root that represents the entire image, and the second level has four nodes that represent the four subimages. The nodes in the tree can represent image features such as statistical and spatial information (95). As a subimage is divided further, the leaf nodes of the tree become parent nodes whose branches represent the new levels. As the number of levels in a quad tree increases, the spatial relationships between the subimages may become more accurate; however, the computational time for image matching increases, and so does the possibility that an object will be segmented into pieces. Many algorithms exist for efficient construction and search of quad trees, and good references can be found in (94,135). The advantages of a quad tree include its compactness of representation, its ability to operate at different levels, and the ease of executing set operations. The disadvantages include rotational and shift variance and poor shape analysis in pattern recognition. Quad trees have been used in a number of image retrieval systems. Lu et al. decomposed an entire image into a quad tree and used a color histogram at each tree branch to describe the color content (136). VisualSEEk uses a quad tree to represent spatial relationships in image retrieval. A top-down quad tree is generated by recursively splitting every node into four child nodes until some minimum bound is reached (79). After the decomposition, a postprocessing step is employed to adjoin spatially adjacent and similar nodes that have different parents. A final block-grouping stage is added to merge all similar blocks to obtain arbitrarily shaped regions. The quad-tree structure is used to combine spatial relations with image textural features in image retrieval. The details of this method will be discussed in the next section.
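The following minimal sketch illustrates the top-down quad-tree decomposition just described. The homogeneity test (gray-level standard deviation) and the stopping bounds are illustrative assumptions, not the criteria used by VisualSEEk or any other particular system.

```python
import numpy as np

def quad_tree(image, x=0, y=0, w=None, h=None, min_size=16, max_std=12.0):
    """Recursively decompose `image` into a quad tree.  Each node is a dict
    holding the block position, its mean gray level, and its children."""
    if w is None:
        h, w = image.shape
    block = image[y:y + h, x:x + w]
    node = {"x": x, "y": y, "w": w, "h": h,
            "mean": float(block.mean()), "children": []}
    # Stop splitting when the block is small or nearly uniform.
    if min(w, h) <= min_size or block.std() <= max_std:
        return node
    hw, hh = w // 2, h // 2
    for (cx, cy, cw, ch) in [(x, y, hw, hh), (x + hw, y, w - hw, hh),
                             (x, y + hh, hw, h - hh),
                             (x + hw, y + hh, w - hw, h - hh)]:
        node["children"].append(
            quad_tree(image, cx, cy, cw, ch, min_size, max_std))
    return node

# Toy 128 x 128 gray-level image.
tree = quad_tree(np.random.randint(0, 256, (128, 128)).astype(float))
```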
The solutions to the problem of image retrieval using spatial relationships can be partitioned into two categories (134). One type of solution requires retrieving all of those database images that precisely satisfy all of the spatial relationships specified in the query. Another type requires retrieving all database images that satisfy as many as possible of the desired spatial relationships indicated in the query. For the first type, an algorithm is required to provide a yes/no type of response for every database image for a given query. Most image database systems adopt the second type. These systems usually implement a function or an algorithm to compute the spatial similarity between two images. A spatial-similarity function assesses the degree to which the spatial relationships in a database image conform to those specified in the query. A ranking of database images with respect to a query image can be obtained by applying the spatial-similarity function to the query image and to each database image in turn (134,137). To reduce the computational complexity of a database search, many existing image databases allow only a small number of objects to be specified by spatial relationships. Two types of queries support spatial relationships, query-by-example and query-by-sketch. When implementing the first type of query, the query image is segmented first, and then the spatial relationships among the regions are extracted along with the other image features that are to be used to match the database images. In this case, the database images are processed and represented by a set of explicitly defined spatial relationships. To support query-by-sketch, a database system provides the necessary drawing tools to allow a user to specify the desired spatial relationships. For example, a user can be provided with graphical icons that can be placed in any desired positions in a sketch pad window, and the user is also allowed to specify other image features for each graphical icon. A good example of such an image database system is VisualSEEk. The VisualSEEk interface supports queries that include up to three PODs. In other words, a user can indicate up to three regions whose visual properties can be used to query the image database. A user can specify the spatial relationships among the regions by positioning the corresponding PODs in a desired spatial layout. There are three types of spatial matches: exact match, best match, and none. When exact match is selected, only the images that contain similar PODs in the same spatial positions as specified in the query are retrieved. When best match is selected, the spatial score for each POD is computed as follows: if the matched region is in the same spatial region as the query POD, the distance is zero; if the regions overlap but the match is not fully encapsulated by the query POD, the distance is one; and if the regions are disjoint, the distance is two. These scores for all PODs contribute to the image matching score. If a user selects ‘‘none,’’ then the spatial match score is ignored, and images are retrieved on the basis of other image features. In image retrieval, spatial relationships of regions are always used in combination with other image features, such as color, texture, and shape, to obtain a more accurate description of image contents.
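A minimal sketch of the best-match spatial scoring just described, assuming for simplicity that each query POD and each matched region is an axis-aligned rectangle; the geometry tests are simplified stand-ins rather than the actual VisualSEEk implementation.

```python
def pod_spatial_distance(query_pod, matched_region):
    """Return 0 if the matched region lies inside the query POD, 1 if the
    two rectangles overlap only partially, and 2 if they are disjoint.
    Rectangles are (x0, y0, x1, y1) tuples."""
    qx0, qy0, qx1, qy1 = query_pod
    mx0, my0, mx1, my1 = matched_region
    disjoint = mx1 < qx0 or mx0 > qx1 or my1 < qy0 or my0 > qy1
    if disjoint:
        return 2
    contained = mx0 >= qx0 and my0 >= qy0 and mx1 <= qx1 and my1 <= qy1
    return 0 if contained else 1

# The per-POD distances are summed into the overall spatial score.
score = sum(pod_spatial_distance(q, m)
            for q, m in [((0, 0, 50, 50), (10, 10, 40, 40)),
                         ((60, 0, 100, 40), (90, 30, 130, 70))])
print(score)  # 0 + 1 = 1
```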
IMAGE RETRIEVAL USING COMBINED FEATURES

Image retrieval using single features such as color, shape, texture, and spatial relationships alone is not sufficient in many applications. A number of existing image database systems support combinations of these basic image features in image retrieval (18,42,40,44,92,98,138). VisualSEEk, developed by Smith and Chang at Columbia University, supports queries composed of image features such as spatial relationship, color, texture, and shape (121). As discussed earlier, it uses a POD to represent a spatial region within an image. A user can specify the visual features of the POD from any or all of color, shape, and texture to represent the desired image region using the interfaces ColorWell, ShapeSketcher, and TextureSpread, respectively. In a single query, a user can specify up to three PODs and position the PODs in a desired spatial layout to represent the spatial relationships of the corresponding image regions. VisualSEEk also allows the user to specify the distribution of matching weights among the features used in a query. IBM's QBIC image retrieval system supports queries specifying individual image features such as keywords, color, texture, shape, and location, as well as combinations of these features. The queries have two different interfaces, query-by-image and query-by-sketch. IBM provides a demo package of QBIC for downloading directly from its home page, and QBIC OEM licensing is also available for image database developers. QBIC has been implemented on several image databases: the trademark database, the stamp collection database, the Art and Art History QBIC project at UC Davis, Imagebase at the Fine Arts Museums of San Francisco, and image collections from the French Ministry of Culture. Blobworld (25,92) is a system developed at the University of California at Berkeley for image retrieval based on finding coherent image regions that correspond roughly to objects. The image feature used in retrieval is a combination of color and texture. Each image is treated as an ensemble of a few ‘‘blobs’’ representing image regions that are roughly homogeneous in color and texture. A blob is described by its color distribution and mean textural features. The image feature extraction and segmentation have two major computational steps: grouping pixels into regions and generating regional descriptions. To group pixels into regions, each pixel is assigned a vector that consists of color, textural, and positional features. The color feature consists of the coordinates in the L∗a∗b∗ space (see ‘‘Image Retrieval Using Color’’), and the three textural features are contrast, anisotropy, and polarity, extracted on an automatically selected scale (see ‘‘Image Retrieval Using Textural Features’’). The positional features are simply the (x, y) coordinates of the pixel. The distribution of pixels in this 8-D space is modeled by using mixtures of two to five Gaussians. The expectation–maximization algorithm is employed to fit the mixture-of-Gaussians model to the data. The minimum description length (MDL) principle is used to choose the number of Gaussians that best suits the natural number of groups in the image. Once a model is selected, connected pixels that belong to the same color/texture cluster are spatially grouped. A regional description consists of descriptions of color and textural features. The color feature of a region is described by its color histogram in the L∗a∗b∗ space. The histogram has five bins in the L∗ dimension and 10 bins in each of the a∗ and b∗ dimensions. Although the total number of bins is 500, only 218 bins fall in the gamut corresponding to 0 ≤ {R, G, B} ≤ 1.
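A rough sketch of the pixel-grouping step described above, using scikit-learn's GaussianMixture as a stand-in for the expectation-maximization fit. The MDL model selection, the texture and scale estimation, and the spatial grouping of connected pixels are omitted, and the toy data and the three-component choice are arbitrary; this is not Blobworld's actual implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def group_pixels(lab, texture, n_components=3):
    """Cluster the pixels of an image from an 8-D feature vector per pixel:
    3 color features (L*a*b*), 3 texture features, and the (x, y) position.
    `lab` and `texture` are both (H, W, 3) arrays."""
    h, w, _ = lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    features = np.concatenate(
        [lab.reshape(-1, 3),
         texture.reshape(-1, 3),
         np.stack([xs.ravel(), ys.ravel()], axis=1)], axis=1)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full")
    labels = gmm.fit_predict(features)          # EM fit + hard assignment
    return labels.reshape(h, w)                 # per-pixel region label

# Toy example with random "color" and "texture" planes.
labels = group_pixels(np.random.rand(32, 32, 3), np.random.rand(32, 32, 3))
```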
To match the color of two regions, a quadratic distance is used, $d = (x - y)^{T} A (x - y)$, where x and y are the histograms of the two regions and A is a symmetrical matrix of weights. The elements of A have values between 0 and 1 that represent the similarity between bins i and j based on the distance between the bin centers, and adjacent bins have a weight of 0.5. Carson et al. pointed out (25) that this distance measure gives a high score to two regions that have similar colors, even if the colors fall in different histogram bins. As for the textural features of a region, the mean texture contrast and anisotropy are used. The distance between two texture descriptors is defined as the Euclidean distance between their respective values of contrast and anisotropy × contrast. The images in Blobworld are indexed in R∗-trees. A user can compose a query by following three steps: (1) submitting an image to see its Blobworld representation, (2) selecting the relevant blobs to match, and (3) specifying the relative importance of the blob features. Two types of queries are supported, atomic queries and compound queries. An atomic query is one that specifies a particular blob to match. A compound query is defined as either an atomic query or a conjunction of compound queries. The score µi for each atomic query with feature vector vi is calculated as follows. For each blob bj in the database image with feature vector vj, the system (1) computes the distance between vi and vj, $d_{ij} = (v_i - v_j)^{T}\,\Sigma\,(v_i - v_j)$; (2) measures the similarity between bi and bj using $\mu_{ij} = e^{-d_{ij}/2}$, which gives one if the blobs are identical in all relevant features and decreases as the match becomes less perfect; and (3) sets $\mu_i = \max_j \mu_{ij}$. The matrix $\Sigma$ is block diagonal. The block corresponding to the textural features is an identity matrix, weighted by the textural weight set by the user. The block that corresponds to the color features is the A matrix used in finding the quadratic distance, weighted by the color weight set by the user. The user can also specify a weight for each atomic query. The compound query score for a database image is calculated by using fuzzy logic operations on the scores and weights of the atomic queries. The retrieved images are ranked according to the overall score, and the best matches are returned, indicating for each image what set of blobs provides the highest score. After reviewing the query results, the user can change the weights of the blob features or specify new blobs to match in a new query.
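A sketch of the two distance computations above, the quadratic histogram distance and the blob similarity score. The toy 4-bin weight matrix giving adjacent bins a weight of 0.5 follows the description in the text; the histogram values themselves are arbitrary.

```python
import numpy as np

def quadratic_distance(x, y, A):
    """Quadratic histogram distance d = (x - y)^T A (x - y)."""
    diff = x - y
    return float(diff @ A @ diff)

def blob_similarity(v_i, v_j, sigma):
    """Blobworld-style atomic score: d_ij = (v_i - v_j)^T Sigma (v_i - v_j),
    mu_ij = exp(-d_ij / 2)."""
    d = quadratic_distance(v_i, v_j, sigma)
    return np.exp(-d / 2.0)

# Toy 4-bin histograms; A weights identical bins 1.0 and adjacent bins 0.5.
A = np.eye(4) + 0.5 * (np.eye(4, k=1) + np.eye(4, k=-1))
x = np.array([0.7, 0.3, 0.0, 0.0])
y = np.array([0.0, 0.6, 0.4, 0.0])
print(quadratic_distance(x, y, A), blob_similarity(x, y, A))
```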
When multiple image features in content-based image retrieval are applied to large image collections, two major challenges surface (155): the high dimensionality of the feature vectors and the need for multiple similarity measurements. The dimensionality of the feature vectors is normally of the order of 10², but the embedded dimension is much lower (139). Studies have shown that most visual feature vectors can be reduced considerably in dimension without significant degradation in retrieval accuracy (139–142). The Karhunen–Loeve transform (KLT) and clustering are typical methods for reducing feature dimensions. Feature reduction using clustering has been used in a number of areas, including pattern recognition, signal analysis, and information retrieval (143–150). However, Rui et al. pointed out that blind dimension reduction can be dangerous because information can be lost if the dimension is reduced below the embedded dimension (151). Multidimensional indexing techniques are needed when multiple similarity measurements are used in image retrieval. A Euclidean measure may not effectively capture human perception of a certain visual content; therefore, various other similarity measures, such as histogram intersection, cosine, and correlation, need to be supported in an image database system. The existing popular multidimensional indexing techniques include the bucketing algorithm, the k-d tree, the priority k-d tree, the quad tree, the K-D-B tree, the hB-tree, and the R-tree and its variants, the R+-tree and the R∗-tree (33,139,151–155). Ng and Sedighian proposed a three-step procedure for selecting an image indexing strategy: dimension reduction, evaluation of existing indexing approaches, and customization of the selected indexing approach (156). Ng and Sedighian used an eigenimage approach to reduce feature dimensions and then used the following three characteristics to select good existing indexing algorithms:
• The new dimension components are ranked by decreasing variance.
• The dynamic ranges of the dimensions are known.
• The dimensionality is still fairly high.
Based on their test data sets, they found that the BA-KD-tree gives the best performance. As we mentioned earlier, image similarity may not be measured using Euclidean distances and sometimes may even be nonmetric (e.g., histogram intersection). In these cases, there are two promising indexing methods, clustering and neural networks. In (157), Charikar et al. proposed an incremental clustering technique for dynamic information retrieval. This technique can index high-dimensional data and allows the use of non-Euclidean similarity measures. Zhang and Zhong (158) used Self-Organization Map (SOM) neural networks as a tool for constructing a tree-indexing structure for image retrieval. The advantages of using SOM include its unsupervised learning capability, dynamic clustering, and the potential for supporting arbitrary similarity measures.
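As an illustration of the KLT-based dimension reduction mentioned above, the following minimal sketch projects feature vectors onto their leading principal components (an SVD of the centered feature matrix). The feature dimensions and the target dimension used in the example are arbitrary.

```python
import numpy as np

def klt_reduce(features, k):
    """Project feature vectors (one per row) onto their first k principal
    components (Karhunen-Loeve transform / PCA)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Right singular vectors give the principal directions,
    # ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T, vt[:k], mean

# Reduce 500 hypothetical 128-D feature vectors to 16 dimensions.
reduced, basis, mean = klt_reduce(np.random.rand(500, 128), 16)
print(reduced.shape)  # (500, 16)
```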
OTHER TYPES OF APPROACHES

Research and development on image search and retrieval technology has surged in the last 10 years. The previous sections introduced the most popular techniques used in image search and retrieval. Other techniques also demand attention. A number of image retrieval algorithms have been developed based on features extracted from compressed images, such as JPEG or MPEG data. Discrete cosine transform (DCT) coefficients of 8 × 8 blocks have been used as image features for image retrieval. Some algorithms use the zero-frequency (dc) components of the DCT (of the luminance component) as the image feature. In this representation, the feature vector of an image has a dimension of Nc, where Nc is the total number of DCT blocks in the image, and the ith element of the feature vector is the average of the dc coefficients within the ith block of the compressed image. For a high-resolution image such as 768 × 512 pixels, the number of 8 × 8 blocks is large: Nc = 6,144 blocks. The dimension of the feature vector can be reduced by grouping adjacent blocks and then computing the average of the dc coefficients over these blocks. More information on this type of image indexing and retrieval technique can be found in (159–161). Humans apply high-level concepts to perception in daily life. A number of efforts have been made to apply artificial intelligence techniques to image database systems for more efficient and accurate retrieval. The research activities can be divided into two areas: off-line learning and on-line learning. In off-line learning, neural networks, genetic algorithms, and clustering techniques have been used to learn domain-specific knowledge. Sheng et al. developed an intelligent patient-image retrieval system using neural network learning (6). A neural network was trained to learn the radiologist's behavior associated with patient-image retrieval. Sheng et al. showed that the resulting neural network can tolerate imperfect data and that its performance accuracy is comparable to that of a manually engineered knowledge base. PicHunter, developed by Cox et al., applied a Bayesian relevance feedback technique to image retrieval (162). The system estimates a user's goal image based on the entire history of the user's selections by using a probabilistic model of the user's behavior. The predictions of this model are combined with selections made during a search to estimate the probability associated with each image. These probabilities are then used to select images for display. Cox et al. demonstrated that, even with simple image features, PicHunter could locate randomly selected targets in a database of 4,542 images after displaying an average of only 55 groups of four images. Other related work can be found in (163–167). On-line learning involves incorporating a user's on-line feedback into image retrieval. One good example of on-line learning can be found in the MARS image retrieval system (168–170). MARS uses a relevance feedback architecture to allow the user and the computer to interact with each other to improve image retrieval performance. The explosive growth of the World Wide Web (www) has made a vast amount of information available to users. However, the benefit that users can derive from such a rich resource is limited by their ability to find specific information in the data. In response to this need, a number of text search engines have been developed and made publicly available on the web, but search mechanisms for finding images on the web have only started to appear, and it is predicted that they will grow rapidly in the next few years. Readers can find a few entry points to this literature in (171–174).
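The sketch below illustrates the compressed-domain dc-coefficient feature described at the beginning of this section: the zero-frequency DCT coefficient of each 8 × 8 luminance block, averaged over groups of adjacent blocks to shrink the vector. It is a generic illustration using SciPy's DCT, not the algorithm of any of the cited systems, and the 4 × 4 grouping factor is an arbitrary choice.

```python
import numpy as np
from scipy.fft import dctn

def dc_feature_vector(luma, group=4):
    """Average the dc (zero-frequency) DCT coefficients of 8x8 blocks over
    `group` x `group` neighborhoods of blocks to form a compact feature."""
    h, w = luma.shape
    bh, bw = h // 8, w // 8
    dc = np.empty((bh, bw))
    for i in range(bh):
        for j in range(bw):
            block = luma[i * 8:(i + 1) * 8, j * 8:(j + 1) * 8]
            dc[i, j] = dctn(block, norm="ortho")[0, 0]   # dc coefficient
    # Average dc values over group x group blocks and flatten.
    gh, gw = bh // group, bw // group
    dc = dc[:gh * group, :gw * group].reshape(gh, group, gw, group)
    return dc.mean(axis=(1, 3)).ravel()

feature = dc_feature_vector(np.random.rand(512, 768) * 255)
print(feature.shape)  # (384,): 6,144 blocks averaged in 4 x 4 groups
```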
EVALUATION OF IMAGE DATABASES

Evaluation of image database systems is important but difficult. Different applications may require different evaluative criteria and different priorities. The following summarizes the evaluative criteria suggested by Smith and Chang (79), Carson et al. (25), and Mostafa and Dillon (175).
1. Usability. The usability of an image database measures the ease of use or the level of proficiency required of the user. The designer of the image database must carefully evaluate the proficiency of its user group to provide an interface that is sufficient to support all of the queries and yet is clear and simple enough for a user to form various queries. The usability of an image database system relates directly to the interface that the database system provides to its users. The importance of usability cannot be overstated; an overly complicated system has low usability. Advances in computer graphics greatly aid the development of image database interfaces with high usability. Mouse clicks and drags are commonly encouraged in an image database interface because they are easily understood by a broad range of users, including people who are experienced with computers and those who have little experience.
2. Performance. Image database performance is determined by three factors: the time to perform an individual query, the quality of the results retrieved, and the ease of understanding and refining the query results. Performance criteria are usually considered the most important by many computer scientists. All three factors are driven by the algorithms developed to support the queries, namely, image retrieval and image indexing, and by the hardware system running as the server. It is difficult to satisfy all three factors. Good image retrieval results usually require extensive search time, yet a user usually cannot wait more than a few seconds for the results of a query.
3. Portability. Portability concerns the issues involved in porting an image database system to different computer hardware platforms and operating systems. An image database interface written in the Java language can be ported across most popular platforms with relative ease.
4. Learnability. An image database system should provide interfaces from which its user group can easily learn to compose all of its queries. Queries that are difficult to learn to compose are not likely to be used often.
5. Economy. The cost of developing an image database is an important factor in the success of an image database system. System developers should consider which queries are truly needed for an application and determine the most cost-effective way to implement them.
6. Supportability. It should be easy to make updates to the image database system, and the updates should be seamless, or as seamless as possible.
7. Maintainability. The image database system should be designed so that it is easy to correct bugs and make changes as needed.
8. Interoperability. The image database should work well with other related application programs. For example, it should accept common image formats.
9. Understandability. The language, illustrations, and structures used in the interface should be understandable to the user group. For example, if an image database system is developed for use by students in grades K–12, the developers, usually adults trained in computer vision and/or databases, should avoid using terms and illustrations that are obscure to most children in those grades.
10. Modularity. The application should allow expansion and improvement. It should be easy to add new features. If a system is well structured into modules, it is easy to add or remove components.
These criteria can be used to develop a quantitative evaluation of an image database system for a given set of image data. However, because image retrieval results are subjective, depending on an individual's perception and interpretation, quantitative results are not always meaningful. An equally important issue is establishing a well-balanced, large-scale test bed, because results obtained by systems that use different sets of test data are not comparable. A successful test bed should be large enough to allow testing scalability and should have balanced image content to test the effectiveness of the image features and the system's overall performance.

FUTURE DIRECTIONS AND CONCLUSION

Visual features of image content are extremely rich and useful in image search and retrieval. The current technology has not yet reached the point where it can be used efficiently to formulate intellectually subtle queries. The following are open research issues in content-based image indexing and retrieval. Visual features can model human perception from many different perspectives and can be represented by using many different techniques. The selection of visual features and their representations in image retrieval depends on the nature of the application. As Huang et al. pointed out, visual features should be associated with images and also invoked at the right place and right time, whenever they are needed to assist retrieval (155). The advent of the World Wide Web (www) and the rapidly decreasing cost of disk drives have made possible
storing and disseminating large numbers of documents over the Internet. It is estimated that the total number of documents on the World Wide Web is in the range of hundreds of millions and that 73% of the Web is devoted to images (176). Major search engines such as HotBot, AltaVista, and Infoseek are powerful in finding text documents on the Web but typically have few or no capabilities for finding visual media. Research efforts have been made in developing image search engines on the Web (177–181). A more notable result is a prototype system called ImageScape developed by researchers at Leiden University in The Netherlands (176). The system integrates technologies such as vector-quantization-based compression of the image database and k-d trees for fast searching over high-dimensional spaces. It provides queries for images using keywords, semantic icons, and user-drawn sketches. However, Web image search engines still need more technical breakthroughs to make them comparable to their text-based counterparts. The most difficult problem is that the contents of most Web image documents are not indexed. Even with indexed images, different image retrieval systems usually focus on different segments of users and content. As a result, the indexing features and the subject taxonomies are quite different, which raises the concern of interoperability. Several recent efforts in new paradigms for visual search, including searching by icons, sketches, and similar images, are underway (155,176,182–184). There will be continuing efforts to develop effective algorithms for extracting and representing image content at all levels. In particular, in the absence of a priori knowledge or an imposed model, it is difficult to retrieve images accurately and efficiently using merely low-level image features. Therefore, high-level queries and learnable image retrieval systems will receive more attention in both the research and commercial worlds in the future. As image databases grow in scale and the image features addressing image content become more diverse, a multidimensional indexing module becomes an indispensable part of image retrieval. This will require research effort from multiple disciplines: computational geometry, database management, and pattern recognition (155). Interactive image retrieval systems explore the synergy of humans and computers. In many areas, interactive systems (185,186), that is, systems with ‘‘humans in the loop,’’ are more effective than fully automated systems. A number of research projects have made considerable effort in the areas of interactive retrieval and relevance feedback (162,164,168–170,187). It has been realized that it is important to explore the possibility of integrating human perception of image content into image retrieval systems. Some research has been conducted in studying the psychophysical aspects of human perception and human perceptive subjectivity for image content (162,164,168,170,188,189). Specifically, semantic information, memory of previous input, visual cues, and relative versus absolute judgment of image similarities are important issues to be addressed in image retrieval systems.
ABBREVIATIONS AND ACRONYMS

CIE     Commission Internationale de l'Éclairage
DCT     discrete cosine transform
DLI     digital library initiative
FEM     finite element method
FIBSSR  feature index-based similar-shape retrieval
KLT     Karhunen–Loeve transform
LCS     longest common subsequence
MAT     medial axis transform
MDL     minimum description length
NIH     National Institutes of Health
NLM     National Library of Medicine
POD     point of denotation
QMF     quadrature mirror filter
SAR     simultaneous autoregressive
SOM     self-organization map
SPD     spectral power distribution
WWW     World Wide Web
BIBLIOGRAPHY
25. C. Carson et al., Third Int. Conf. on Visual Inf. Syst., June 1999. 26. A. Pentland, R. W. Picard, and S. Sclaroff, Int. J. Comput. Vision 18(3), 233–254 (1996). 27. S. Chang et al., Proc. IEEE Workshop on Picture Data Description and Manage., 1977. 28. N. Roussopoulos et al., IEEE Trans. Software Eng. SE-14 (1988). 29. C. H. Leung, J. Hibler, and N. Mwara, Comput. Graphics 28(1), 24–27 (1994). 30. P. W. Huang and Y. R. Jean, Pattern Recognition 29(12), 2,103–2,114 (1996). 31. Y. -I Chang and B. -Y. Yang, Pattern Recognition 30(10), 1,745–1,757 (1997). 32. W. I. Grosky and R. Mehrotra, Comput. Vision Graphics Image Process. 54(3), 416–436 (1990). 33. N. Beckmann, H. P. Kriegel, R. Schneider, and B. Seeger, in Proc. ACM-SIGMOD Int. Conf. Manage. Data, Atlantic City, NY, 1990, pp. 322–331. 34. R. Mehrotra and J. Gary, IEEE Comput. 28, 57–62 (1995).
1. B. Holt and L. Hartwick, Aslib Proc. 46(10), 243–248 (1994).
35. D. K. S. Berchtold and H. Kriegel, in Proc. 22nd VLDB Conf., New Orleans, LA, 1996, pp. 28–39.
2. U. M. Fayyad, S. G. Djorgovski, and N. Weir, AI Mag. 16, 51–66 (1996).
36. D. White and R. Jain, in Proc. 12th IEEE Int. Conf. Data Eng., New Orleans, LA, 1996, pp. 516–543.
3. S. Griffin, IEEE Comput. 32, 46 (1999).
37. P. Brown, Comput. Graphics 28(1), 28–30 (1994).
4. M. Corn, IEEE Comput. 32, 47 (1999). 5. L. Bielski, Adv. Imaging 10(10), 26–28; 91 (1995).
38. S. F. Chang, A. Eleftheriadis, and R. McClintock, IEEE Proc. 86(5), 602–615 (1998).
6. O. R. Liu Sheng, C. P. Wei, and P. J. H. Hu, IEEE Expert Systems 13, 49–57 (1998).
39. Y. Lu and H. Guo, Int. Conf. Image Anal. Process., Venice, Italy, Sept. 1999.
7. R. Wilensky, IEEE Comput. 29, 37–45 (1996). 8. T. R. Smith, IEEE Comput. 29, 54–60 (1996).
40. C. Faloustsos et al., J. Intelligent Inf. Syst. 3, San Jose, CA, (1994).
9. N. Talagala, S. Asami, and D. Patterson, IEEE Comput. 33, 22–28 (2000).
41. W. Niblack et al., Storage and Retrieval for Image and Video Databases, SPIE, February, 1993, pp. 173–187, 1908.
10. A. F. Smeaton and I. Quigley, Proc. 19th ACM SIGIR Conf. on R&D in Inf. Retrieval, Zurich, 1996, pp. 174–180.
42. J. R. Bach et al., SPIE Conf. Storage and Retrieval for Still Image and Video Databases IV, San Jose, CA, Feb. 1, 1996.
11. H. Chen et al., IEEE Trans. Pattern Anal. Mach. Intelligence 18(8), 771–782 (1996).
43. J. R. Smith and S. F. Chang, Int. Conf. Image Process. Washington, DC, October 1995.
12. W. I. Grosky, in A. K. Sood and A. H. Qureshi, eds., Database Machines, NATO ASI Series, vol. F24, 1988.
44. J. R. Smith and S. F. Chang, Symposium Electron. Imaging: Sci. Technol. — Storage & Retrieval for Image and Video Databases IV, vol. 2670, San Jose, CA, February 1996.
13. G. Nagy, Image Vision Comput. 3(3), 111–117 (1985). 14. C. O. Frost, R. Johnson, S. Lee, and Y. Lu, Fourth DELOS Workshop, San Miniato, August, 28–30, 1997. 15. R. K. Srihari, IEEE Comput. 28, 49–56 (1995). 16. P. G. B. Enser, J. Doc. Text Manage. 1(1), (1993). 17. R. Jain, SIGMOD Rec. 22(3), 57–75 (1993).
45. E. J. Giorgianni and T. E. Madden, Digital Color Management — Encoding Solutions, Addison-Wesley, Reading, 1997. 46. R. C. Carter and E. C. Carter, Color Res. Appl. 8, 254–553 (1983).
18. A. Pentland, R. W. Picard, and S. Sclaroff, Proc. Storage and Retrieval for Image and Video Databases II, vol. 2, 185, SPIE 1994, pp. 34–47.
47. CIE Standard on Colormetric Observers, CIE S002, 1986.
19. M. Flickner, H. Sawhney, W. Niblack et al., IEEE Comput. 28, 22–32 (1995).
49. M. Stricker and M. Orengo, in Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1995.
20. J. R. Bach et al., SPIE Conf. Storage and Retrieval for Still Image and Video Databases IV, San Jose, CA, Feb. 1, 1996. 21. A. K. Jain and A. Vailaya, Pattern Recognition 29(8), 1,233–1,244 (1996). 22. H. Tamura and N. Yokoya, Pattern Recognition 17, 29–43 (1984). 23. S. -K. Chang and A. Hsu, IEEE Trans. Knowledge Data Eng. 431–442 (1992). 24. L. J. Guibas and C. Tomasi, DARPA Image Understanding Workshop, Feb. 1996.
48. M. J. Swain and D. H. Ballard, Int. J. Comput. Vision 7(1), 11–32 (1991).
50. J. Hafner et al., IEEE Trans. PAMI 17(7), 729–736 (1995). 51. R. Agrawal, C. Faloutsos, and A. Swami, Fourth Int. Conf. Foundations Data Organization Algorithms, Chicago, IL, 1993. 52. T. S. Chua, K. -L. Tan, and B. C. Ooi, in Proc. IEEE Conf. Multimedia Comput. Syst., Los Alamitos, CA, 1997. 53. R. Rickman and J. Stonham, in Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1996.
54. M. Stricker and M. Orengo, in Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1996. 55. G. Pass, R. Zabih, and J. Miller, in Proc. ACM Conf. on Multimedia, Boston, MA, 1996. 56. J. Huang et al., in Proc. IEEE Conf. Comput. Vision Pattern Recognition, San Juan, PR, USA, 1997. 57. J. R. Smith and S. F. Chang, System Report and User’s Manual, NIA, 1997. 58. A. N. Netravali and B. G. Haskell, Digital Pictures, Plenum Press, NY, 1995. 59. V. Ogle and M. Stonebraker, IEEE Comput. 28(9), 40–48 (1995). 60. J. R. Smith and S. -F. Chang, in Proc. IEEE Int. Conf. Acoustic, Speech, Signal Process., Atlanta, GA, 1996. 61. B. Julesz and J. R. Bergen, Bell Syst. Tech. J. 62(6), pt 3, 1,619–1,645 (1983). 62. R. W. Picard and T. P. Minka, ACM Multimedia Syst., 3(1), 3–14 (1995). 63. F. Liu and R. W. Picard, IEEE Trans. Pattern Anal. Mach. Intelligence 18(7), 722–733 (1992). 64. G. C. Cross and A. K. Jain, IEEE Trans. Pattern Anal. Mach. Intelligence 5, 25–39 (1983). 65. R. Haralick, Proc. IEEE 61(5), 786–803 (1979). 66. A. K. Jain and F. Farrokhia, Pattern Recognition 24(12), 1,167–1,186 (1991). 67. J. Mao and A. K. Jain, Pattern Recognition 25(2), 173–188 (1992). 68. A. Kundu and J. -L. Chen, GVGIP: Graphical Models and Image Process. 54(5), 369–384 (1992). 69. P. P. Ohanian and R. C. Dubes, Pattern Recognition 25(8), 819–833 (1992). 70. T. R. Reed and J. M. Hans du Buf, CVGIP: Image Understanding 57(3), 359–372 (1993). 71. G. L. Gimel’Farb and A. K. Jain, Pattern Recognition 29(9), 1,461–1,483 (1996). 72. K. Karu, A. K. Jain, and R. M. Bolle, Pattern Recognition 29(9), 1,437–1,446 (1996). 73. R. M. Haralick, K. Shanmugam, and I. Dinstein, IEEE Trans. Syst. Man Cybernetics SMC-3(6), 610–621 (1973). 74. C. C. Gotlieb and H. E. Kreyszip, Comput. Vision Graphics Image Process. 51, 70–86 (1990). 75. H. Tamura, S. Mori, and T. Yamawaki, IEEE Trans. Syst. Man Cybernetics SMC-8(6), 460–473 (1978). 76. W. Equitz and W. Niblack, Technical Report RJ 9805, Computer Science, IBM Research Report, May 1994. 77. T. S. Huang, S. Mehrotra, and K. Ramchandran, in Proc. 33rd Annual Clinic on Library, Appl. Data Process. — Digital Image Access Retrieval, 1996. 78. M. Ortega et al., in Proc. ACM Conf. Multimedia, 1997. 79. J. R. Smith and S. -F. Chang, Proc. 2nd Annual ACM Multimedia conf., San Francisco, Oct. 1994. 80. T. H. Chang and C. -C. Kuo, IEEE Trans. Image Process. 2(4), 429–441 (1993). 81. J. R. Smith and S. -F. Chang, Proc. IEEE Int. Conf. Image Process. (ICIP), Austin, Texas, 1994. 82. A. C. Bovick, IEEE Trans. Signal Process. 39(9), 225–325 (1991). 83. M. H. Gross, R. Koch, L. Lippert, and A. Dreger, in Proc. IEEE Int. Conf. Image Process., Austin, TX, 1994. 84. K. S. Thyagarajan, T. Nguyen, and C. Persons, Proc. Int. Conf. Image Process., Austin, TX, 1994.
85. B. S. Manjunath and M. Y. Ma, Proc. SPIE Conf. Image Storage Archiving Syst., vol. 2606, San Jose, CA, 1995. 86. M. Y. Ma and B. S. Manjunath, Proc. Int. Conf. Image Process., Washington, DC, 1995. 87. J. Weszka, C. Dyer, and A. Rosenfield, IEEE Trans. Syst. Man Cybernetics SMC-6(4), 269–285 (1976). 88. S. Arya and D. Mount, Proc. 5th SIAM Symp. Discrete Algorithms (SODA), 1994, pp. 271–280. 89. W. Equitz, Research Report, IBM Almaden Research Center, San Jose, CA, 1993. 90. C. W. Therrien, Discrete Random Signals and Statistical Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1992. 91. A. R. Rao and G. L. Lohse, IEEE Conf. Visualization, San Jose, CA, 1993. 92. S. Belongie, C. Carson, H. Greenspan, and J. Malik, ICCV, Bombay, India, 1998. 93. W. R. Dillon and M. Goldstein, Multivariate Analysis, John Wiley & Sons, NY, 1984. 94. H. Samet, Comput. Surv. 16(2), 187–257 (1984). 95. M. Spann and R. Wilson, Pattern Recognition 18(3/4), 257–269 (1985). 96. A. Vailaya, Y. Zhong, and A. K. Jain, IEEE Proc. ICPR, San Francisco, CA, 1996, pp. 356–360. 97. B. Kroepelien, A. Vailaya, and A. K. Jain, Proc. IEEE ICPR, San Francisco, CA, 1996, pp. 370–374. 98. W. W. Chu, C. C. Hsu, I. T. Ieong, and R. K. Taira, in A. Sheth and W. Klas, eds.,Multimedia Data Management, McGraw-Hill, NY, 1998. 99. D. Mumford, IEEE First Int. Conf. Comput. Vision, London, England, 1987, pp. 602–606. 100. D. Mumford, SPIE Geometric Methods Comput. Vision 1570, 2–10 (1991). 101. R. M. Haralick and L. G. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, 1992. 102. A. Rosenfeld, Comput. Vision Graphics Image Process. 50, 188–240 (1990). 103. B. K. P. Horn, Robot Vision, MIT Press, Cambridge, MA, 1986. 104. T. Pavlidis, Comput. Graphics Image Process. 7, 243–258 (1978). 105. Y. Lu, S. Schlosser, and M. Janeczko, Mach. Vision Appl. 6(1), 25–34 (1993). 106. H. Freeman, IRE Trans. 10, 260–268 (1961). 107. S. Parui and D. Majumder, Pattern Recognition 16, 63–67 (1983). 108. Y. Rui, A. C. She, and T. S. Huang, in Proc. First Int. Workshop Image Databases and Multi Media Search, 1996. 109. E. Persoon and K. S. Fu, IEEE Trans. Syst. Man Cybernetics 7 no. 3, 170–179 (1977). 110. C. T. Zahn and R. Z. Roskies, IEEE Trans. Comput. (1972). 111. M. K. Hu, in Computer Methods in Image Analysis, IEEE Computer Society, Los Angeles, 1977. 112. L. Yang and F. Albregtsen, in Proc. IEEE Int. Conf. on Image Process, Jerusalem, Israel, 1994. 113. K. Kapur, Y. N. Lakshman, and T. Saxena, in Proc. IEEE Int. Conf. Image Process, Coral Gables, FL, 1995. 114. C. Cooper and Z. Lei, in M. Hebert, J. Ponce, T. Boult, and A. Gross, eds., Springer Lecture Notes in Computer Science Series, 3D Object Representation for Computer Vision, Springer, NY, 1995, pp. 139–153.
115. Z. Lei, D. Keren, and D. B. Cooper, in Proc. IEEE Int. Conf. Image Process., Washington, DC, 1995. 116. B. M. Mehtre, M. S. Kankanhall, and W. F. Lee, Inf. Process. Manage. 34(1), 109–120 (1998). 117. B. Li and S. D. Ma, in Proc. IEEE Int. Conf. Pattern Recognition, Jerusalem, Israel, 1994. 118. B. M. Mehtre, M. S. Kankanhall, and W. F. Lee, Inf. Process. Manage. 33(3), 319–337 (1997). 119. H. G. Barrow, in Proc. 5th Int. Joint Conf. Artificial Intelligence, 1977. 120. G. Borgefors, IEEE Trans. Pattern Recognition Mach. Intelligence 10(6), 849–865 (1988). 121. K. Hirata and T. Kato, Advances in Database Technology EDBT’92, Third Int. Conf. Extending Database Technol., Vienna, Austria, 1992. 122. T. Kato et al., Int. Conf. Multimedia Inf. Syst., MIS’91, 1991, pp. 109–120. 123. A. Del Bimbo and P. Pala, Proc. ICPR, 1996, San Francisco, CA, pp. 120–124. 124. S. K. Chang, Q. Y. Shi, and C. W. Yan, IEEE Trans. Pattern Anal. Mach. Intelligence 9, 413–428 (1987). 125. Y. Lu and T. Q. Chen, IEEE Int. Forum Multimedia Image Process., Anchorage, AK, 1998. 126. C. C. Chang, J. Inf. Sci. Eng. 7, 405–422 (1991). 127. S. Y. Lee, M. K. Shan, and W. P. Yang, Pattern Recognition 22, 675–682 (1989). 128. E. Jungert, 4th BPRA Conf. Pattern Recognition, Cambridge, March, 1988, pp. 343–351. 129. S. K. Chang, E. Jungert, and Y. Li, Technical Report, University of Pittsburgh, 1988. 130. S. Y. Lee and F. J. Hsu, Pattern Recognition 23, 1,077–1,087 (1990). 131. S. Y. Lee and F. J. Hsu, Pattern Recognition 25(3) (1992). 132. G. Petraglia, M. Sebillo, M. Tucci, and G. Tortora, Proc. 1993 IEEE Symp. Visual Languages, Bergen, Norway, 1993, pp. 392–394. 133. G. Petraglia, M. Sebillo, M. Tucci, and G. Tortora, in S. Impedovo, ed., Progress in Image Analysis and Processing III, World Scientific, 1994, pp. 271–278. 134. V. N. Gudivada and V. V. Raghavan, ACM Trans. Inf. Syst. 13(2), 115–144 (1995). 135. H. Samet, Applications of Spatial Data Structures, Addison-Wesley, Reading, 1990. 136. H. Lu, B. Ooi, and K. Tan, in Proc. 1994 Int. Conf. Appl. Databases, 1994. 137. A. Soffer and H. Samet, IEEE Proc. ICPR, 1996, pp. 114–119.
138. M. Y. Ma and B. S. Manjunath, Proc. Int. Conf. Image Process., Oct. 1997, vol. 1, pp. 568–571. 139. D. White and R. Jain, Algorithms and Strategies for Similarity Retrieval, in TR VCL-96-101. University of California, San Diego, 1996. 140. C. Faloutsos and K. -I. Lin, in Proc. SIGMOD, 1995, pp. 163–174. 141. D. White and R. Jain, in SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1996. 142. S. Chandrasekaran et al., Comput. Vision Graphics Image Process. 59 no. 5, 321–332 (1997). 143. Y. L. Murphey and H. Guo, Int. Conf. Pattern Recognition, Barcelona, Spain, September 3–8, 2000.
144. A. Vailaya, M. Figueriedo, A. Jain, and H. J. Zhang, IEEE Int. Conf. Multimedia Comput. Syst., 1999, Florence, Italy, vol. 1, pp. 518–523. 145. L. Rabiner and B. -H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, 1993, Chap. 5. 146. M. A. Penaloza and R. M. Welch, Remote Sense Environ. 81–100 (1995). 147. F. Ferri, P. Pudil, M. Hatef, and J. Kittler, in E. Gelsema and L. Kanal, eds., Pattern Recognition in Practice IV, Elsevier Science, 1994, pp. 403–413. 148. H. Almuallim and T. G. Dietterich, Proc. Ninth Natl. Conf. Artificial Intelligence. MIT Press, Cambridge, MA, 1991, pp. 547–552. 149. A. Jain and D. Zongker, IEEE Trans. Pattern Anal. Mach. Intelligence 19(2), p. 153–158, (1997). 150. R. Kohavi and D. Sommerfield, Proc. KDD’95, Montreal, Canada, 1995, pp. 192–197. 151. Y. Rui et al., Dynamic Clustering for Optimal Retrieval in High Dimensional Multimedia Databases, In TR-MARS-1097, University of Illinois, 1997. 152. A. Guttman, in Proc. ACM SIGMOD, Boston, MA, 1984. 153. T. Sellis, N. Roussopoulos, and C. Faloutsos, Proc. 12th Very Large Databases, College Park, MD, 1987. 154. D. Greene, Proc. ACM SIGMOD, Los Angeles, CA, 1989. 155. Y. Rui, T. Huang, and S. -F. Chang, J. Visual Commun. Image Representation 10, 1–23 (1999). 156. R. Ng and A. Sedighian, in Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1996. 157. M. Charikar, C. Chekur, T. Feder, and R. Motwani, in Proc. 29th Annual ACM Symp. on Theory of Comput., 1997, El Paso, TX, pp. 626–635. 158. H. J. Zhang and D. Zhong, in Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1995. 159. M. Shneier and M. Abdel-Mottaleb, IEEE Trans. Pattern Anal. Mach. Intelligence 18, no. 8, 849–853 (1996). 160. V. Kobla, D. Doermann, and K. Lin, Multimedia Storage and Archiving Syst., SPIE Proc., Boston, 1996, vol. 2916, pp. 78–89. 161. K. F. Lai, H. Zhou, and S. Chan, Asian Conf. Comput. Vision, Hong Kong, 1998, pp. 402–409. 162. I. J. Cox, M. L. Miller, S. M. Omohundro, and P. N. Yianios, IEEE Proc. ICPR, San Francisco, 1996, pp. 361–369. 163. M. Y. Ma and B. S. Manjunath, Proc. IEEE Conf. Comput. Vision Pattern Recognition, San Francisco, 1996, pp. 425–430. 164. T. P. Minka and R. W. Picard, Pattern Recognition 30(4), 565–581 (1997). 165. N. Vasconcelos and A. Lippman, Proc. IEEE Conf. Comput. Vision Pattern Recognition, Santa Barbara, CA, 1998, pp. 566–571. 166. A. L. Ratan, O. Maron, W. E. L. Grimson, and T. LozanoPerez, Proc. IEEE Conf. Comput. Vision Pattern Recognition, Fort Collins, CO, 1999, vol. 1, 23–25. 167. A. B. Torralba and A. Oliva, ‘‘Semantic organization of scenes using discriminant structural templates,’’ Proc. IEEE Conf. Comput. Vision, Kerkyra, Greece, 1999, pp. 1,253–1,258. 168. Y. Rui, T. S. Huang, and S. Mehrotra, Proc. IEEE Int. Conf. Image Process., Santa Barbara, CA, 1997. 169. Y. Rui, T. S. Huang, S. Mehrotra, and M. Ortega, Proc. 2nd Int. Conf. Visual Inf. Syst., 1997.
170. Y. Rui, T. S. Huang, M. Ortega, and S. Mehrotra, IEEE Trans. Circuits, Syst. Video Technol. 8(5), 644–655 (1998). 171. V. Athitsos, M. J. Swain, and C. Frankel, IEEE Workshop on Content-based Access of Image and Video Libraries, 1997. 172. C. Frankel, M. J. Swain, and V. Athitsos, WebSeer: An Image Search Engine for the World Wide Web, Technical Report 96-14, Department of Computer Science, University of Chicago, 1996. 173. S. Sclaroff, L. Taycher, and M. LaCascia, IEEE Workshop on Content-Based Access of Image and Video Libraries, 1997. 174. T. Gevers and A. W. M. Smeulders, IEEE Int. Conf. Multimedia Comput. Syst., Florence, Italy, 1999. 175. J. Mostafa and A. Dillon, in Proc. 59th Annual Conf. Am. Soc. Inf. Sci.: Global Complexity, Inf., Chaos, and Control, Baltimore, MD, 1996. 176. M. S. Lew, IEEE Comput. 33, no. 11, p. 46–53 (2000). 177. J. R. Smith and S. -F. Chang, IEEE Multimedia Mag. 4(3), 12–20 (1997). 178. C. Frankel, M. J. Swain, and V. Athitsos, Webseer: An Image Search Engine for the World Wide Web, Technical Rep TR-9614, Computer Science Department, University of Chicago, 1996. 179. V. Athitsos, M. J. Swain, and C. Frankel, in Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, 1997. 180. S. Sclaroff, L. Taycher, and M. la Cascia, in Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, 1997. 181. J. R. Smith and S. -F. Chang, Proc. SPIE Storage and Retrieval for Image and Video Databases, San Jose, CA, 1997. 182. MPEG-7 applications document, MPEG-7 Context and objectives, Third draft of MPEG-7, ISO/IEC JTC1/SC29/WG11 N1922, MPEG97, Oct. 1997. 183. S. Weibel and E. Miller, in the Internet Workshop, Digital Libr. Mag. (1997). 184. Y. Rui, T. Huang, and S. Mehrotra, ISO/IEC JTC1/SC29/WG11 M2290, MPEG97, July, 1997. 185. P. Wegner, Commun. ACM, 40, no. 5, p. 80–91 (1997). 186. R. Jain, A. Pentland, and D. Petkovic, NSF-ARPA Workshop on Visual Inf. Manage. Syst., Cambridge, MA, June, 1995. 187. Y. Rui, T. S. Huang, S. Mehrotra, and M. Ortega, Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries, in conjunction with IEEE CVPR’97, 1997. 188. T. V. Papthomas et al., in Proc. IS&T/SPIE Conf. Human Vision Electron. Imaging III, 1998. 189. B. E. Rogowitz et al., in Proc. IS&T/SPIE, Conf. Human Vision and Electron. Imaging III, 1998.
IMAGE THRESHOLD AND SEGMENTATION
JOHN C. RUSS
North Carolina State University, Raleigh, NC
The process of reducing a gray scale or color image to a binary (black-and-white) image in which distinct features are isolated (usually for measurement) is called thresholding or segmentation. The words refer to slightly different but overlapping sets of procedures.
Thresholding usually implies selecting a range of grayscale or color values that distinguish the features from the background. It is widely used and very fast because it operates globally on images, but it often does not provide perfect delineation of features because some pixels (particularly along feature edges) are misclassified and features may touch each other. Segmentation implies separating features from the background and each other, usually based on local comparison of pixel values along feature boundaries. These methods are often slow, may not find all of the features present, and sometimes require assumptions about the shape or size of the features. THRESHOLDING One very common and apparently simple class of images is produced by scanning documents, typically as a precursor to conversion to a text file by optical character recognition. These images ideally consist of pixels that belong to one of two well-defined and separated classes that have quite distinct brightness values. If the characters also lie in regularly spaced lines, as typical printed text does, that can also assist in the discrimination process. In practice, things are often more complicated. Figure 1 shows an example in which diagrams, equations, and text showing through from the back side of the page make the thresholding process somewhat more difficult (Note — all of the images shown here are taken from The Image Processing and Analysis Cookbook, a tutorial that is part of The Image Processing Toolkit CD by C. Russ and J. Russ, Reindeer Games Inc., 1999). Thresholding typically uses an image histogram to determine the correct discriminator setting for separating the foreground (text) from the background. Figure 1 shows the histogram, a plot of the number of pixels at each possible brightness level (the original image in this case is a typical 8-bit or 256 gray-value image from an optical scanner). The threshold setting (which produces the binary result) may either be set interactively by the user or determined by an automated method. Several algorithms are used to calculate the binary threshold setting. On an image like that in Fig. 1 they produce quite similar results. One class of methods uses statistical procedures to calculate the most likely setting; for example, if the brightness values for foreground and background are assumed to be Gaussian in their distribution, then a t-test can be used to calculate the probability that the two populations divided by the threshold setting are distinct and to vary the setting until that probability is maximized (this is the method used in the figure). If the assumption of a normal distribution is relaxed, nonparametric tests can be used. Another approach maximizes the entropy of the thresholded image (either the entire image or the pixels along the feature boundaries). A somewhat different approach uses the shape of the histogram to determine the threshold, identified as a valley between two peaks. The difficulty with this approach, as shown in the example of Fig. 1, is that there may not be two distinct peaks or an evident valley. Because the single threshold method for
[Figure 1, panels (a)–(c); threshold gray value 183]
Figure 1. Thresholding of a scanned document image: (a) original image (note the visibility of text from the back side of the page and the presence of lines, equations, and other irregularities that complicate the procedure); (b) the image histogram (the red line is the cumulative histogram) with a threshold setting; (c) the binary image that results. See color insert.
separating bright from dark pixels is a special case of the more general selection of a brightness or color range of pixels in the foreground, the various automatic methods will be compared in that context. One of the major problems with thresholding based on a histogram is that the decision is global: it applies to all of the pixels in the image. Figure 2 shows an example in which this will not work. The small dark particles (immunogold particles labeling biological tissue examined in a microscope) do not all have the same density; those on the dark organelles are darker than those on the lighter regions. A similar problem would arise in Fig. 1 if the illumination of the document were nonuniform or the surface were nonplanar. The most common and general approach to dealing with problems like this is to use image processing to modify the image so that it can be thresholded. In the example in the figure, a duplicate of the image is blurred to eliminate the small dark particles, and the ratio of the brightness in the original image to the background produced by the blurred image is calculated. This eliminates the background variation and allows thresholding as shown.
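A sketch of the background-leveling operation just described: divide the image by a heavily blurred copy of itself and rescale before thresholding. The Gaussian standard deviation of 6 pixels follows the value quoted in the Fig. 2 caption; the threshold level and the random test image are illustrative only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def level_background(image, sigma=6.0):
    """Divide an image by a smoothed copy of itself to remove gradual
    background variation, then rescale to the 0-255 range."""
    background = gaussian_filter(image.astype(float), sigma)
    ratio = image / np.maximum(background, 1e-6)
    ratio -= ratio.min()
    return 255.0 * ratio / max(ratio.max(), 1e-6)

def threshold(image, level):
    """Return a binary image: True where pixels are darker than `level`."""
    return image < level

img = np.random.rand(128, 128) * 255          # stand-in for a real image
binary = threshold(level_background(img), 100)
```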
SELECTING A GRAY-LEVEL RANGE

A problem more general than setting a single threshold level to discriminate dark from light pixels is selecting a range of gray-scale values that correspond to the foreground features. This requires two threshold values. Figure 3 shows an example in which there are three visually distinct phases that produce a histogram with three peaks. Selecting the middle peak (the intermediate gray values) produces a binary image that shows the location of one of the three phases. The example also illustrates many of the problems inherent in this type of thresholding. First, the locations of the settings on the histogram are not obvious. Most systems allow the user to adjust
Figure 2. Background leveling: (a) original image (microscopic image of immunogold-labeled particles in biological tissue); (b) background image produced by smoothing image (a) with a Gaussian kernel, standard deviation 6 pixels; (c) ratio of image (a) to (b); (d) thresholded result from image (c).
levels interactively until the result ‘‘looks right,’’ but this invites inconsistency and is also incompatible with automation. As for the single threshold level discussed before, several techniques can be used to calculate the levels automatically, but their use depends upon making some assumptions about the nature of the image. In addition, they produce some characteristic types of errors that must be corrected afterward. In the example, the discriminator setting between the central gray peak and the left (dark) peak lies in a valley between the peaks that does not completely resolve them. This implies that some pixels from each phase lie on
the ‘‘wrong’’ side of the threshold setting and hence will be misclassified. Indeed, finding the best position for the threshold is a problem because the valley in the histogram is poorly defined and the histogram counts are ‘‘noisy’’ because of counting statistics. If the two peaks do not have the same size (or shape), there is no reason to expect that the lowest point will produce the least error.
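As a concrete example of the statistical approach mentioned earlier, the following sketch chooses the level that maximizes the between-class variance of the two pixel populations separated by the threshold, in the spirit of Otsu's method. It is one common implementation of the idea, not necessarily the algorithm used to produce the figures here.

```python
import numpy as np

def otsu_threshold(image):
    """Choose the gray level that maximizes the between-class variance of
    the two populations separated by the threshold (Otsu's method)."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    hist = hist.astype(float)
    total = hist.sum()
    cum_count = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_sep = 0, -1.0
    for t in range(1, 256):
        w0 = cum_count[t - 1] / total          # weight of the dark class
        w1 = 1.0 - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mean[t - 1] / cum_count[t - 1]
        mu1 = (cum_mean[-1] - cum_mean[t - 1]) / (total - cum_count[t - 1])
        sep = w0 * w1 * (mu0 - mu1) ** 2       # between-class variance
        if sep > best_sep:
            best_t, best_sep = t, sep
    return best_t

level = otsu_threshold(np.random.randint(0, 256, (100, 100)))
```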
Figure 3. Thresholding of a three-phase structure: (a) original image (microscopic image of an etched aluminum–zinc metal alloy); (b) histogram, showing three peaks corresponding to the bright, intermediate, and dark gray phases, whose threshold settings are discussed in the text; (c) binary image resulting from the threshold levels in image (b) (note the lines along the boundaries between bright and dark phase regions); (d) result of applying an opening to image (c) (see text).
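One widely used automatic criterion for placing a single threshold, mentioned here only as an illustration of the statistical approaches alluded to in the text, is to choose the level that maximizes the between-class variance of the two pixel populations (Otsu's method). The sketch below assumes NumPy and a roughly bimodal gray-scale image; it returns one threshold and would have to be applied twice, or generalized, for the two-level case of Fig. 3.

import numpy as np

def otsu_threshold(img, n_bins=256):
    # Histogram of gray values, normalized to probabilities.
    counts, edges = np.histogram(img, bins=n_bins)
    p = counts / counts.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])

    w0 = np.cumsum(p)                      # weight of the dark class
    w1 = 1.0 - w0                          # weight of the bright class
    cum_mean = np.cumsum(p * centers)
    m0 = cum_mean / np.where(w0 > 0, w0, 1)                   # dark-class mean
    m1 = (cum_mean[-1] - cum_mean) / np.where(w1 > 0, w1, 1)  # bright-class mean

    # The candidate threshold that maximizes the between-class variance.
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]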
linear, logarithmic, or some other function, so that the midpoint has no intrinsic meaning. Where should the setting be placed? Likewise, the discriminator setting between the central gray peak and the right (bright) peak presents some difficulties. The peaks are well separated, and the threshold setting can be placed anywhere over a relatively broad range of flat background in the histogram. There is no minimum. Which setting should be used? Answering these questions requires knowing (or assuming) something about the sample and using more
information about the pixel values than the histogram provides. For example, if it is known that there are exactly three phase regions in the sample, then the same kind of statistical reasoning discussed before for the thresholding of images with two populations of pixels can be employed. If the pixels whose neighbors are quite different in brightness are excluded from the histogram (presumably they lie near or on boundaries), then the histogram may consist of separated, well-defined peaks that can be used to select the optimal threshold settings. If it is known that the boundaries of the features are smooth because of
the physical nature of the specimen (a good assumption for membrane-bound biological tissue and for solidified metals, as in the image of Fig. 3), then the smoothness of the boundaries produced by various threshold settings can be used (this was the method used in Fig. 3). Regardless of the way settings are selected, some pixels will generally be misclassified. In the example of Fig. 3, the pixels that straddle the boundary between the bright and dark phases have averaged brightness values that are the same gray as pixels within the gray phase and cannot be distinguished from them by thresholding. As shown in the figure, this produces lines in the thresholded binary image along the boundaries between light and dark phases. Further image processing can remove these lines on the assumption that any narrow lines in the image are not likely at the image magnification in use to represent real portions of the gray phase. An opening (erosion or removal of edge pixels, followed by dilation or addition of edge pixels) was used in Fig. 3 to remove the lines and produce a useful binary image that shows the location of the gray phase. Postthreshold image processing of this type is often required. COLOR IMAGES It might be imagined that a color image could be treated as three gray-scale images for thresholding because the image is normally acquired and stored in a computer as red, green, and blue (RGB) values. However, it turns out that this is rarely practical either for interactive or automatic setting procedures. Recognizing the intensity of red, green, and blue in various colored regions of an image is difficult, and RGB space is rarely used for image processing or thresholding. Instead, the image is generally converted to HSI or hue, saturation, intensity space. As shown in Fig. 4, the intensity is the brightness of the image, and it corresponds to the gray-scale image that would be obtained if the color information were discarded. Hue represents the color of the pixel as an angle on the RGB color wheel that progresses from red through orange, yellow, green, cyan, etc., back around to red. Saturation represents the amount of color present and distinguishes between a pastel (e.g., pink) and a fully saturated color (red). This color space (which exists in several generally similar forms called HSB, HSV, etc., whose differences are not too important here) is more useful than RGB for image processing and thresholding because it corresponds better to what people perceive in images and to what the colors represent in many applications. For example, Fig. 5 shows a specimen of biological tissue that has been stained with chemicals to introduce color. It is difficult to recognize the amount of red, green, and blue in each region (and furthermore the amounts vary from place to place because of variations in stain density, specimen thickness, and illumination intensity). However, converting the image to HSI space allows thresholding the hue image directly using the same logic introduced before for the document image, because it consists of just two regions that correspond to the two stain colors.
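A minimal sketch of hue-based thresholding of the kind described above is given below. It assumes the image is held as an RGB NumPy array scaled to the range 0–1 and uses Matplotlib's rgb_to_hsv conversion, which implements the closely related HSV variant of the color space rather than HSI proper; the hue limits are placeholders to be chosen from the hue histogram, not values from the figure.

import numpy as np
from matplotlib.colors import rgb_to_hsv

def threshold_by_hue(rgb, hue_lo, hue_hi):
    # Convert to hue, saturation, value planes (all scaled 0-1).
    hsv = rgb_to_hsv(rgb)
    hue = hsv[..., 0]
    if hue_lo <= hue_hi:
        return (hue >= hue_lo) & (hue <= hue_hi)
    # Hue is an angle, so a selected range may wrap around red (hue = 0).
    return (hue >= hue_lo) | (hue <= hue_hi)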
Figure 4. One representation of hue–saturation–intensity (HSI) space. Other versions treat it as a single cone, a hexagonal cone, a cylinder, or a sphere. See color insert.
It is sometimes useful to define ranges for hue, saturation, and intensity that specify the color of a particular phase or structure of interest. In Fig. 6, the color values (converted from RGB to HSI) were sampled in the magenta region and used to set ranges of ±10% for HSI, which produce a binary image of the phase. It can be helpful in making interactive threshold settings on color images to have a display like that in Fig. 7, where the relative amounts of pixel color values are shown. The right-hand graph is the usual linear histogram for intensity values, and the circular area presents the color wheel that shows hue/saturation values. This display can be used for interactive thresholding by selecting region(s) within the wheel with a mouse, and the user can also gain experience in judging hue and saturation ranges. Each color in the image is represented in this wheel as a cluster of dark values. FOLLOWING BOUNDARIES AND SELECTING FEATURES A human operator who is asked to delineate a feature in an image typically carries out a tracing operation that follows the edge or boundary of the feature. Several computer algorithms have been devised that attempt to duplicate or facilitate this process. Edge following begins at a user-designated point and tries to follow the edge in the same way that a human would. Pixels with a large brightness (or color) gradient are presumed to correspond to edges, and weighting is used to prefer continuing the boundary in the same direction or to force a degree of smoothness on the boundary. Finding a set of parameters that will accomplish successful edge following in any given instance is often more time-consuming and arbitrary than simply outlining the edge manually, because the edge-following process is inherently local and tends to follow paths away from the correct boundary. The idea of a ‘‘magic lasso’’ (more technically described as active snakes, contours, or spline fits) is somewhat more useful. The user draws an outline that encloses the feature of interest but is not precise. Then the program
Figure 5. Color thresholding using the hue data: (a) original image (microscopic image of stained mouse intestine); (b) histograms of the red, green, and blue values, which do not show two well-separated peaks for thresholding; (c) the hue values calculated for each pixel and represented as a gray scale; (d) binary image that results from thresholding of image (c). See color insert.
modifies the outline by discarding edge pixels from the region if they differ (in color or brightness) from the mean value of those contained within the boundary. Weighting may again be used to require greater similarity for points that cause large amounts of local curvature in the boundary. Figure 8 shows an example of this technique,
which is new enough to have little mention in most image processing textbooks (a few literature references are given later). Either the edge-following or boundary-hugging approach to feature delineation requires the user at least to start the process and select the features which are to be
Figure 6. Color thresholding using HSI ranges: (a) original image (microscopic image of an etched aluminum-silicon alloy); (b) binary image produced by setting 10% range limits around the colors sampled in the central magenta region (hue = 250/255; saturation = 163/255; intensity = 129/255).
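The range-based selection of Fig. 6 can be approximated as follows. The sampled values and the 10% tolerance follow the caption, but this sketch again substitutes the HSV variant for true HSI (so the ‘‘intensity’’ plane is really HSV value) and assumes the RGB data are scaled to 0–1; it is an illustration, not the procedure used to generate the figure.

import numpy as np
from matplotlib.colors import rgb_to_hsv

def select_by_hsi_range(rgb, sample=(250, 163, 129), tolerance=0.10):
    # sample is the (hue, saturation, intensity) triple on a 0-255 scale,
    # as quoted in the Fig. 6 caption; rgb is scaled 0-1.
    hsv = rgb_to_hsv(rgb)
    target = np.asarray(sample, dtype=float) / 255.0
    delta = np.abs(hsv - target)

    # Hue is periodic, so measure its difference around the color wheel.
    delta[..., 0] = np.minimum(delta[..., 0], 1.0 - delta[..., 0])

    # A pixel is selected only if all three components are within range.
    return np.all(delta <= tolerance, axis=-1)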
Figure 7. Interactive color thresholding using an HSI histogram: (a) color image (portion of a painting); (b) interactive histogram display of the hue–saturation wheel (left) in which dark clusters represent pixels that have various color values, and the intensity histogram (right) is a projection along the vertical axis in HSI space. See color insert.
outlined (and usually to monitor closely and modify the results as well). If all of the features in an image are to be delineated, this is an inherently slow operation. Region-growing (sometimes called the ‘‘magic wand’’ approach) is another method that requires some user interaction but accomplishes segmentation in a different
way. The user selects a location within a feature, and the pixels at that location are sampled to determine the mean brightness or color values. Then pixels that border on the selected point are examined, and if they are not ‘‘too’’ different from the mean initial value, they are added to the region. This process continues, adding more pixels
Figure 8. Active contouring or boundary modification: (a) original image (insect viewed in the light microscope), with a manually drawn outline around the feature of interest (dashed line); (b) region outline modified by the computer as a set of splines that hug the feature (note that the small details on the ends of the legs have been omitted).
Figure 9. Region growing: (a) confocal microscopic image of fluorescing stains in tissue; (b) selection of one region by clicking on a seed point within the region; (c) selection of all similar points by thresholding on the basis of color values within the first region. See color insert.
to the region, until it is halted by a boundary where the pixel values change. The tolerance for including new pixels may be determined on the basis of the mean and standard deviation of the region (updated as new pixels are added), or it may be based on a comparison only to the local pixels within the region. The latter method is better able to delineate structures that vary somewhat
from center to edge or from side to side. Figure 9 shows an example. The greatest problem with the region-growing method is that if the growing region ‘‘leaks out’’ from its true boundary by even one pixel, then it often spreads far across the image (Fig. 10). It is also usually necessary to add more regions, one by one, by selecting an interior seed
Figure 10. Escape of a growing region. In this image (bubbles in an epoxy resin), clicking within one bubble has initiated a growing region that leaked through the boundary at one location, where the edge was poorly defined, and spread into much of the surrounding area.
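A bare-bones version of the region-growing (‘‘magic wand’’) procedure described above might look like the following Python sketch. It uses 4-connected neighbors and compares each candidate pixel to the running mean of the region; the tolerance value is arbitrary, and a production implementation would add the leak-prevention and local-comparison refinements discussed in the text.

import numpy as np
from collections import deque

def grow_region(img, seed, tolerance=10.0):
    # img is a 2-D array of gray values; seed is a (row, col) tuple.
    rows, cols = img.shape
    region = np.zeros((rows, cols), dtype=bool)
    region[seed] = True
    total, count = float(img[seed]), 1
    frontier = deque([seed])

    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not region[nr, nc]:
                # Add the neighbor if it is close to the running region mean.
                if abs(img[nr, nc] - total / count) <= tolerance:
                    region[nr, nc] = True
                    total += float(img[nr, nc])
                    count += 1
                    frontier.append((nr, nc))
    return region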
point for region growing. However, a modification of this method is to use the brightness or color values from the selected region to perform a classical global thresholding operation, as discussed before (Fig. 9). SPLIT AND MERGE The split-and-merge approach to image segmentation is often referred to as a ‘‘quad-tree’’ method because of the
Figure 11. Diagram of four iterations of split-and-merge processing. The image is split into quadrants, each of which is then examined and split again if necessary. Sections that abut each other are tested for similarity and merged when appropriate. See color insert.
data structure usually employed to hold the information (although quad-tree data structures also have other uses and split-and-merge can be done in other ways). Split-and-merge is a top-down method, quite opposite in conception from the bottom-up approach represented by region growing. Instead of beginning with a single pixel and growing outward, split-and-merge starts with the entire image and works progressively toward smaller regions,
Figure 12. Example of split-and-merge: (a) original image (portion of a painting); (b) result of region segmentation and labeling into 10 classes based on average color. The original image was 512 pixels wide, so nine iterations were needed. See color insert.
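The splitting half of the split-and-merge procedure can be sketched as a short recursion over quadrants. The version below assumes a square, power-of-two gray-scale image and uses a simple standard-deviation uniformity test in place of a full histogram test; the merge step, which joins similar adjacent fragments, is omitted for brevity.

import numpy as np

def quadtree_split(img, max_std=10.0, min_size=4):
    # Returns a list of (row, col, size) blocks judged to be uniform.
    blocks = []

    def split(r, c, size):
        block = img[r:r + size, c:c + size]
        # Keep the block if it is uniform enough or cannot be split further.
        if size <= min_size or block.std() <= max_std:
            blocks.append((r, c, size))
            return
        half = size // 2
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
            split(r + dr, c + dc, half)

    split(0, 0, img.shape[0])   # assumes a square, power-of-two image
    return blocks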
eventually to single pixels. Each step in the iterative procedure examines the histogram of a region. If all of the pixels are (by some statistical test) considered part of the same region, then the region is maintained intact. If not, it is divided into four quadrants, and each is then subjected to the same test. At the same time, after each step, fragments of the image which were separated earlier in the stepwise process but are adjacent to each other are compared and are merged if similar. The result, as indicated graphically in Fig. 11, classifies all of the pixels in the image into several regions. Some of these may have the same average brightness but be separated spatially in the image. Figure 12 shows an example. Another kind of splitting operation takes place after thresholding. Assigning the pixels in the image as part of the foreground or background does not group them into discrete features that can be identified or measured. A so-called ‘‘blob’’ is a group of pixels that touch each other, and in many cases several objects in the original image may touch so that the computer will treat them as part of one larger feature. Touching features cannot be separated in the thresholding step if the pixel brightness or color values are not distinct, regardless of which of the thresholding approaches discussed here is used. Postprocessing using other criteria is required; the most widely used approach is watershed segmentation. This assumes that feature shapes are convex. The watershed segmentation procedure calculates the Euclidean distance map for all of the foreground pixels, as shown in Fig. 13. This assigns a gray value to each pixel on the basis of its distance from the nearest background pixel (in any direction). It is convenient to think that this
image represents an elevation, so that the darker values toward the center of features are ‘‘higher’’ and can be interpreted as mountains. If rain falls on these mountains, it runs downhill until it reaches the background (the sea). But it may not be able to run straight downhill; if it reaches a watershed line between two mountains, it must turn aside because water cannot run uphill. Thus, these watershed lines represent the boundaries between different peaks and hence between the different features present. The watershed segmentation procedure removes the pixels along the lines where the different features touch, so that they are separate and can be treated individually.
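Watershed separation of touching features (Fig. 13) is available in standard toolkits and can be sketched as follows, assuming SciPy and scikit-image are installed; the marker-finding step via local maxima of the distance map and the min_distance value are implementation choices of this sketch rather than part of the method as described above.

import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max        # assumed available
from skimage.segmentation import watershed        # assumed available

def watershed_separate(binary):
    # Distance of every foreground pixel from the nearest background pixel.
    distance = ndimage.distance_transform_edt(binary)

    # One marker per feature, taken at local maxima of the distance map.
    peaks = peak_local_max(distance, min_distance=5,
                           labels=ndimage.label(binary)[0])
    markers = np.zeros(binary.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)

    # Flood 'uphill' from the markers; watershed lines separate the features.
    return watershed(-distance, markers, mask=binary)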
BIBLIOGRAPHY
A large number of textbooks and reference books are available on image processing. Many deal with the underlying mathematics and computer algorithms for implementation, and others concentrate on selecting methods for particular applications and the relationship between the algorithms and the processes of human vision that they often emulate. A few recommendations follow:
Classical Emphasis on Mathematical Procedures
A. K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989.
W. K. Pratt, Digital Image Processing, John Wiley & Sons, Inc., New York, NY, 1991.
R. C. Gonzalez and R. E. Woods, Digital Image Processing, Addison-Wesley, Reading, MA, 1993.
Figure 13. Watershed segmentation: (a) original binary image of touching convex features (which would be interpreted by the computer as one single irregular feature); (b) the Euclidean distance map of image (a), in which pixel gray values represent their distance from the nearest background point; (c) the segmented features that result from removing the watershed boundaries.
References to Active Contours (snakes) for Segmentation
F. Leymarie and M. D. Levine, Tracking deformable objects in the plane using an active contour model, IEEE Trans. Pattern Analysis and Machine Intelligence 617–634 (1993).
S. Menet, P. Saint-Marc, and G. Medioni, Active contour models: Overview, implementation and applications, IEEE Conf. Systems, Man and Cybernetics 194–199 (1990).
Application-oriented Illustrations Comparing Techniques
J. C. Russ, The Image Processing Handbook, CRC Press, Boca Raton, FL, 1998.
Computer Algorithms and Code
J. R. Parker, Algorithms for Image Processing and Computer Vision, John Wiley & Sons, Inc., New York, NY, 1996.
G. X. Ritter and J. N. Wilson, Handbook of Computer Vision Algorithms in Image Algebra, CRC Press, Boca Raton, FL, 1996.

IMAGING APPLIED TO THE GEOLOGIC SCIENCES
RAYMOND E. ARVIDSON
EDWARD A. GUINNESS
THOMAS STEIN
CURT NIEBUR
KRISTOPHER LARSEN
Washington University, St. Louis, MO

INTRODUCTION
Geology is defined as the study of the nature and dynamics of planetary surfaces and interiors. This field has expanded greatly during the past few decades to include study not only of the earth but also of other objects in the solar system, focusing heavily on the earth’s moon, Venus, and Mars. Acquisition and analysis of imaging
data have been central to increased understanding of planetary surfaces. Imaging data of the planets and satellites have been fundamentally important because of the very limited opportunities for human exploration, return of samples, or even detailed in situ analyses. For the earth, analyses of image data have been central to several emerging fields of interest. The focus in this article is on key scientific objectives associated with the geology of the earth and how appropriate imaging data sets will help meet those objectives. This focus is not meant to downgrade the importance of imaging in robotic or human exploration of the solar system. Rather, it is meant to provide a coherent theme for the article and to delineate how imaging will help address a suite of fundamentally important objectives and challenges that will face us during the next few decades. Information on imaging requirements and results of planetary exploration can be found in publications generated by the Committee on Planetary Exploration of the Space Studies Board, National Research Council, and in the Journal of Geophysical Research. The article begins with an overview of remote sensing as applied to geology and focuses on imaging applications. Sections that follow provide summaries of objectives and tasks based in part on plans put together by the United States Geological Survey for the next decade (1). These plans focus on the key scientific areas of research within geology that have significant societal benefits and provide a coherent pathway for discussing uses of imaging in
geology. Specific examples of the major objectives are given to illustrate the use of imaging techniques for particular applications. REMOTE SENSING IN GEOLOGY Imaging in geology spans a very wide range of applications. Interpretation of panchromatic and color aerial photography is still a very useful method that leads to geomorphic, structural, and geologic maps based on photogeologic analyses and extraction of quantitative data from photogrammetric processing (2). Identification and mapping of individual minerals is also possible by using imaging spectrometers that operate in the visible and reflected infrared and thermal portions of the electromagnetic spectrum (Fig. 1–3). Synthetic aperture radar (SAR) systems have been used to map terrains hidden by clouds and to penetrate beneath dry sands and map buried stream channels (3). Interferometric SAR data acquired with dual antennas or during repeat passes have been used to map fine-scale topographic changes across large areas (4). The use of imaging for planetary sciences focuses on mapping surfaces globally and locally to understand tectonic and surface processes (5). Imaging for the earth sciences includes local to regional scale studies due to lack of access to affordable global-scale data with high enough spatial resolution and appropriate spectral coverage to identify and map landforms and rock types across large areas. Landsat Thematic Mapper (TM) data and data from
Figure 1. Advanced Visible Infrared Imaging Spectrometer (AVIRIS) false-color composite for portion of Pancake Mountains, Nevada. Color assignments are blue = 0.8485, green = 1.5245, red = 2.2026 micrometer wavelengths. These color assignments maximize color variations among major rock types. Pixel size is 15 m. UTM Projection. Based on spectra shown in Fig. 2 and field work, green areas correspond to outcrops of rhyolite ash and lava flows that have been altered to kaolinite, tan rocks are dolostones, and cyan outcrops are pegmatite veins with muscovite. K, D, and M show locations of spectra in Fig. 2, and the box shows an area for which linear unmixing was applied (Fig. 3). The lobate dark feature trending east to west south of the box is a basaltic lava flow. See color insert.
Figure 3. Portion of false-color composite in Fig. 1 is shown at top. Bottom represents areal proportions of kaolinite (red), muscovite (green), and dolomite (blue) dominated AVIRIS spectra. See color insert.
Figure 2. Spectra for the 2- to 2.4-micrometer portion of the AVIRIS data for (a) rhyolite, (b) dolostone, and (c) vein cutting dolostone. Spectra are compared with laboratory spectra for kaolinite, dolomite, and muscovite to show good fits to these minerals. Data have been normalized by continuum to emphasize absorption features. The good fit to kaolinite reinforces the extent to which the rhyolites have been altered. The muscovite in the veins is the spectrally active species present and thus dominates the fit. Other minerals in the veins include quartz and feldspar.
the SPOT High Resolution Visible (HRV) imaging systems are used extensively in geologic applications (2,6). The reason is that these data are acquired on a global basis and have sufficient spatial resolution (30 m/pixel for TM and 10–20 m/pixel for SPOT) to sample landforms and features in enough detail for geologic mapping. In addition, both systems have multispectral capabilities (0.45–2.35 micrometers for TM
with 6 bands and 0.54–0.89 micrometers for SPOT three-band multispectral mode) of use in mapping major rock types and degree of alteration by hydrothermal systems and weathering. Note that the spectral sampling intervals and number of bands are not sufficient for unique identification of specific absorption features associated with minerals (6). Thus, extensive field work is necessary to check maps and inferences derived from these data. Numerous geologic studies have also been done using orbital SAR data (Shuttle Imaging Radar Missions, European Radar Satellite, RADARSAT) (2). High-altitude aerial photography remains a prime data set used by most geologists because of the high spatial resolution, wide coverage, and relatively low cost to the customer. Most detailed geologic analyses of the earth that involve advanced imaging systems thus far have been restricted to airborne data sets, including the Airborne Visible and Infrared Imaging Spectrometer (AVIRIS) that operates from 0.4–2.5 micrometers (8). This system has the spatial resolution (approximately 20 m/pixel
or better) and spectral coverage (224 bands) to be of great use in mineral and rock mapping at scales of outcrops (Fig. 1–3). The Thermal Infrared Multispectral Scanner [TIMS; six bands, 8–14 micrometers atmospheric transparency window (9)] and Airborne and Topographic SAR (AIRSAR/TOPSAR) polarimetric and interferometric radar systems are also used extensively (10,11). A main point made in this article is the need to have these types of systems in orbit so that targeted areas can be covered in detail on a globally selectable basis. In the next sections, objectives and measurements for imaging in geology are given with examples that feed into this conclusion. ENERGY AND MINERAL RESOURCE ASSESSMENT Overview Continued economic growth of modern industrial society requires uninterrupted access to supplies of nonrenewable resources, including fuels in the form of oil, gas, coal, and uranium. Resources also include structural metals and building materials. Concrete, a major building material, is made from limestone, silica, gypsum, sands, and gravels. All of these materials are nonrenewable; they have been and will be removed from the earth at a rate much faster than they form by geologic processes. Discovery of new deposits fundamentally requires knowledge of the geologic setting in which the materials formed. Imaging observations and analyses have been and will continue to be important elements for defining these settings (12). Knowledge of the geology of the earth varies across an extraordinary range. For example, large portions of the Asian continent remain largely unmapped at the fine scale needed to define likely nonrenewable resources. Even in Egypt, where mining has been underway since Pharaonic times, use of Landsat Thematic Mapper data to match geologic features on either side of the Red Sea led to major advances in understanding the overall structural setting of the Arabian–Nubian Shield and associated mineralization zones (13). Geologic mapping and prospecting continue in the southwestern United States, particularly in Nevada, site of many large ore deposits found during the past 100 years and probably others that remain to be discovered, mapped, and mined. In the example that follows, it is shown that AVIRIS data are useful in direct mineral mapping from reflectance spectra that cover terrain in Nevada that has been ‘‘claimed’’ for mining and explored with test pits and sampling. Use of AVIRIS for Mineral Mapping in the Pancake Mountains, Nevada The Pancake Mountains in Nevada are located within the Basin and Range Province (Fig. 1). Uplift along normal faults that bound the mountains has produced a horst block that exposes a thick sequence of rhyolite ash flow tuffs deposited within the ancient Lunar Lake caldera (14). Landslides within the caldera juxtaposed the tuffs and extensive inclusions of dolostones and other preexisting rock types. In addition, basaltic lava flows were emplaced
toward the end of the uplift period, and resistant flow surfaces capped the terrain in some areas. The Pancake Mountains are located halfway between the mining towns of Tonopah and Ely. Tonopah was founded in 1900 on the strength of Nevada’s gold boom; it produced in excess of nine million dollars of gold per year before switching to mining silver ores (15). Ely also began as a gold town but switched to copper ore when the gold ores were exhausted by the early twentieth century (15). The geologic setting of the ores is one in which rhyolite deposits have been altered by hydrothermal fluids and consequent emplacement of late stage incompatible elements such as gold, silver, and copper. AVIRIS data that cover the southern portion of the Pancake Mountains were acquired on September 29, 1989 at 10:47 AM local time as part of NASA’s Geologic Remote Sensing Field Experiment (Fig. 1). The data were calibrated to relative reflectance by dividing the radiance for each band by the average of all bands on a pixel-by-pixel basis. Examination of the resulting normalized spectra and comparison to laboratory spectra led to identification of kaolinite, Al₂Si₂O₅(OH)₄, based on the 2.16- and 2.21-micrometer absorptions associated with Al–OH bend plus OH stretch combinations (Fig. 2) (16,17). Kaolinite in this setting is a common hydrothermal alteration product of feldspar. It is diagnostic of the extent of alteration and is often associated with ore minerals. Kaolinite is also a valuable mineral in its own right, and has varied uses as a filler in paper manufacturing and as the main component of ceramics and porcelain. AVIRIS data also allow unambiguous identification and mapping of muscovite and dolomite (Fig. 2). Muscovite was identified based on absorption features at 2.35 and 2.45 micrometers (16) and dolomite on the basis of the 2.32-micrometer feature (18). Linear unmixing was also applied to the calibrated AVIRIS data to show areal proportions of kaolinite, muscovite, and dolomite end members for each pixel (19). A fourth end member that represents the spectral reflectance of a dark surface, radiance that reached AVIRIS from the atmospheric scattering alone, was also included. Figure 3 shows a color composite that represents proportions of the three end members where shadowed areas and the dark surface end member are removed. Comparison with the geologic map produced by Quinlivan et al. (14) shows that the end members correspond closely to major rock types. In particular, high proportions of kaolinite are found on outcrops of rhyolite, dolomite on the dolostones, and high proportions of muscovite correspond to linear veins that are pegmatites that cut the dolostones. More detailed analyses also show that several other end members can be mapped from the AVIRIS data, including iron oxide associated with particular facies within the rhyolite units. Thus AVIRIS data and, more generally, imaging spectrometer data can be used for detailed mapping of alteration sites and identification of key areas for detailed ground-based assessment of mineral potential. Unfortunately, AVIRIS is an airborne instrument and thus data are restricted to only a few areas on the earth.
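The linear unmixing step can be illustrated with a least-squares sketch. The version below is unconstrained and therefore simpler than the analysis used for Fig. 3, which would ordinarily enforce non-negative fractions that sum to one; the array shapes and variable names are assumptions of this example, not those of the published processing.

import numpy as np

def linear_unmix(spectra, endmembers):
    # spectra: (n_pixels, n_bands) calibrated reflectance values
    # endmembers: (n_endmembers, n_bands), e.g. kaolinite, muscovite,
    # dolomite, and a dark-surface spectrum
    # Solve endmembers.T @ fractions = spectrum for every pixel at once.
    fractions, *_ = np.linalg.lstsq(endmembers.T, spectra.T, rcond=None)
    return fractions.T          # (n_pixels, n_endmembers) proportions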
GEOLOGIC HAZARD AND DISASTER ASSESSMENT Overview Geologic hazards include earthquakes, volcanism, and associated atmospheric and surface processes such as ash clouds, vog (volcanic fog with noxious fumes), ash flows, and lava flows. Hazards also include landslides and mudslides triggered by excess rainfall, earthquakes, or volcanism. Floods from fluvial systems, erosion of shorelines by storms, tsunamis from submarine earthquakes, and dust storms are other examples of geologic hazards. As the population of the earth increases and people populate areas prone to hazards, the extent to which these events will cause disasters will increase. Imaging of a variety of types has been and will continue to be used to map hazardous features and processes over time to assess the effects of disasters. A logical approach to assessing hazards and disasters is first to establish the nature of hazards and their probability of occurrence. This work naturally leads to assessments of probabilities of disasters, along with assessments of disasters as they are underway and postdisaster evaluation of their effects. Except for monitoring the spread and effects of ash clouds and vogs, analyses of geologic hazards and disasters require observations of high spatial resolution, in addition to periodic repeat coverage. For example, consider the assessment of hazards and disasters due to earthquakes. One approach is to survey areas of concern for past evidence of surface rupture and displacement. This requires image coverage at the scale of the ruptures and displacements, typically at spatial resolutions of 20–30 m/pixel or better. Another application is to map the topographic displacement, termed coseismic displacement, on either side of a fault line by using repeat pass SAR interferometry to identify centimeters’ worth of vertical offset from rupture and faulting (4). The work has proven valuable in delineating hazard probabilities and also in disaster assessment. Similar studies have been undertaken to assess volcanoes as hazards and to evaluate the extent of disasters from eruptions (20). For example, volcanoes typically inflate before eruptions, as magma rises to the surface. SAR interferometry has been used to map the inflation of Mt. Etna and other active volcanoes (21). Thermal emissions from volcanoes have also been monitored from Landsat Thematic Mapper band 6 (10.4- to 12.5-micrometer coverage) to assess the potential for eruptions (20). SAR is particularly useful for mapping during disasters that occur during bad weather because radar wavelengths penetrate clouds and precipitation (3). Landslides and mudflows associated with volcanic eruptions have had devastating consequences during the past two decades in areas such as the Andes Mountains. These earth movements are also triggered by particularly wet conditions in regions that have steep slopes and fairly loose materials and by accelerations due to nearby earthquakes. For example, in an extensive monitoring effort, Cannon et al. (22) surveyed areas in California susceptible to slides and flows and monitored the regions during the wet 1997–1998 El Niño years when storms
frequently intruded into California from the Pacific Ocean. Property losses during this period of time exceeded $550 million statewide. This study also targeted shoreline erosion from the storms. The primary product was a suite of maps that summarize susceptibility to mass failures and shoreline erosion. Generation of these products relied heavily on digital topographic data generated from stereophotogrammetric analyses of aerial photography. The Great Floods of the Midwest during 1993 were caused by continued rainfall throughout 1992 and into 1993 (23). Soils were saturated and runoff was maximized. The floods began in the early summer of 1993 and continued into the fall. Massive damage occurred along the Lower Missouri River floodplain from Sioux City, Iowa, to St. Louis, Missouri, and along the Upper Mississippi River floodplain all the way to Cape Girardeau, Missouri. Total damage has been estimated at several billion dollars (24). In the example that follows, Landsat Thematic Mapper and SPOT data are used to assess the effects of the Great Floods of 1993 on the floodplain of the Lower Missouri River. The example also explores the use of imaging to monitor how the river has reconnected with a series of new wetlands composed of areas judged to be too damaged or too susceptible to flooding for continued farming. These areas have been turned into a series of wetlands that have increased habitat diversity and reduced the magnitude of yearly spring floods. Great Floods of 1993 and the New Big Muddy Fish and Wildlife Refuge The 1993 floods along the Lower Missouri River produced discrete levee failures on a massive scale, extensive scour immediately downstream of the levee breaks, and deposition of sand splays as floodwaters spread from scour zones onto the floodplain (25,26) (Fig. 4). Repairs on a very large scale have been conducted on the floodplains since the 1993 floods, and practically the entire floodplain, except for approximately a dozen newly designated wetlands (a few percent of total area) has been put back into agricultural production (Fig. 4). These new wetlands fall under the ownership and/or management of the United States Fish and Wildlife Service (USFWS), the Missouri Department of Conservation (MDC), the United States Army Corps of Engineers (USACE), and private owners. One of the most heavily damaged portions of the Missouri River floodplain, located between Glasgow and Boonville, Missouri, is known as Arrow Rock Bottoms, Jameson Island, and Lisbon Bottoms (Fig. 5 and 6). These areas, now part of the United States Fish and Wildlife Service (USFWS) Big Muddy Fish and Wildlife Refuge (27), have been monitored by the authors since 1994, using a combination of Thematic Mapper, SPOT, and airborne observations. The study sites are locally known as loop bottoms, sections of the floodplain approximately equal in length and width and bounded by tight river bends (28). The river channel bounds loop bottoms on three sides, extending across the floodplain from bluff to bluff on both the upstream and downstream sides of the bottoms. Loop bottoms are different from long bottoms, which are portions of the floodplain bounded by relatively
Figure 4. Missouri place map where locations of Lisbon Bottoms and Jameson Island study areas are shown as an inset. False-color IR Landsat Thematic Mapper scene (August 26, 1993) acquired between two bluff-to-bluff flooding events in 1993 is shown for the study area and regions immediately upstream and downstream. Levees were breached in each bottom, and consequent scouring immediately downstream of the breaks (dark areas) and deposition of sand splays on the bottoms (bright areas) occurred. See color insert.
Figure 5. Time series showing study area changes before, during, and after the floods of 1993 (September 24, 1992; August 1, 1993; March 22, 1994; May 11, 1998). Note the agricultural fields in 1992, bluff-to-bluff flooding in 1993, scours and sands in 1994, and the extensive vegetation and through-going chutes in 1998. See color insert.
long, straight channel reaches. Loop and long bottoms are consequences of the underfit nature of the Lower Missouri River in which the characteristic meander radius of curvature exceeds the floodplain width (28). Comparison of the Lower Missouri River floodplain in 1879 and 1920 (based on ground surveys) shows that the loop bottoms study areas migrated downstream by approximately two kilometers (29). This migration was accomplished by outer bank erosion of the upstream portion of the bottoms and inner bank deposition on the downstream portions (28). Channelization of the river and associated emplacement of levees and wing dikes by the United States Army Corps of Engineers (USACE) and others precluded continued meander migration and fixed the locations of the study area bottoms as early as 1920. The land uses of the study area loop bottoms are illustrated in Thematic Mapper false-color infrared data acquired in 1992, 1994, and 1998 and a classification map generated from the data using a supervised maximum likelihood estimator algorithm (30) (Fig. 5 and 7). Spectral properties for a suite of training samples within the scene were used to define the classes. Six classes of land cover within the study area were identified, including standing water, sand deposits, xeric (dry) shrubs and grasses, sparse mesic (moist) vegetation, mesic willow (Salix spp.) and cottonwood (Populus deltoides) saplings, and riparian forest. In addition, an agricultural land class was included on the basis of training samples from the adjacent long bottoms, although no agricultural land was found within the study area by the classifier after 1992. Training
Figure 6. Sketch map showing major features produced in the study area as a consequence of the 1993 floods: 1993 levee breaks, intact levees, post-1993 levee repairs, scour zones, sand deposits, the river, and numbered field work sites.
samples were measured for each class on each of the three scenes to account for scene-to-scene differences due to growing season variations, maturity of the vegetation, and lighting conditions. Training sample sizes varied from 50 to 1000 picture elements. Finally, every location within the three scenes was classified. In the 1992 preflood scene in Fig. 5, actively growing vegetation appears red in the false-color image because of low reflectances in green and red wavelengths relative to values for the near-infrared wavelengths (31). Missouri River water surrounding the two study loop bottoms is delineated as cyan due to suspended sediment, rather than the dark blue color that is characteristic of clear, deep water. A patchwork pattern of agricultural fields defines the extent of the floodplain; approximately 85% of the area was under cultivation (mainly soybeans and corn) before the 1993 floods (26). The range of colors within the fields is due to a combination of actively growing crops and fallow areas. Forested areas are located mainly outside the floodplain on the bluffs and appear in the composite image as red regions. A few patches of riparian forest are found within the floodplain, including that covering the chute that separates Arrow Rock Bottoms and Jameson Island. A forest-covered area is also evident on the northeastern side of Lisbon Bottoms. Further, forest covers the outer levees along the river bank. Mature forests along the floodplain typically consist of hackberry (Celtis occidentalis), American elm (Ulmus americana), green ash (Fraxinus pennsylvanica), black walnut (Juglans nigra), and sycamore (Platanus occidentalis) (32).
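A supervised maximum-likelihood classifier of the kind cited above can be sketched as follows, assuming NumPy, equal prior probabilities, and multivariate Gaussian class statistics estimated from the training samples; the dictionary-based interface is an illustrative convenience of this sketch, not the software actually used to produce the maps in Fig. 7.

import numpy as np

def maximum_likelihood_classify(pixels, training):
    # pixels: (n_pixels, n_bands); training: dict of class name ->
    # (n_samples, n_bands) training spectra for that class.
    names, scores = [], []
    for name, samples in training.items():
        mean = samples.mean(axis=0)
        cov = np.cov(samples, rowvar=False)
        inv, logdet = np.linalg.inv(cov), np.linalg.slogdet(cov)[1]
        diff = pixels - mean
        # Gaussian log-likelihood (constant terms dropped); equal priors.
        score = -0.5 * (logdet + np.einsum('ij,jk,ik->i', diff, inv, diff))
        names.append(name)
        scores.append(score)
    # Assign each pixel to the class with the highest log-likelihood.
    return np.asarray(names)[np.argmax(scores, axis=0)]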
Figure 7. Classification maps of land cover in study areas before and after the floods of 1993 (September 1992, March 1994, May 1998). Note the growth of vegetation that is controlled by the proximity to chutes and channels, as opposed to drier sand splays. See color insert.
Thematic Mapper false-color infrared data acquired in August 1993 show the study area during the 1993 flood (Fig. 5). The data were acquired two days after the 1993 flood peak discharge of 20,400 m³/s was recorded at the Boonville gaging station. This station is approximately 18 km downstream from the study area and provides reasonable estimates of discharge within the study area because little discharge is added to the river between the study area and the station (29). River discharge at Boonville was approximately 17,000 m³/s on the day the Thematic Mapper data were acquired. The data show that water extended from bluff to bluff; only the tops of trees stood exposed. For reference, it is estimated that a discharge of about 8,500 m³/s would fill the river channel to the tops of the pre-1993 levees in the study area (29). Geomorphic and land cover changes due to the 1993 floods are evident in Thematic Mapper data acquired in March 1994, after floodwaters receded (Fig. 5–7). Levee breaks on all three bottoms are evident in the images as water-filled scour zones and gaps in the levee tree cover. There are major entrance levee breaks and associated scour zones and sand splays on each of the study area bottoms. Major exit breaks are also evident. The Lisbon Bottoms entrance breaks occurred at the beginning of a tight bend in the river. A long scour zone was produced on Lisbon Bottoms downstream of the entrance break and on an azimuth similar to the trend of the linear reach of the river before entering the bend just to the north of Lisbon Bottoms. This scour zone is the largest of several in the study area and extends into the floodplain for more than 1.5 km and is approximately 400 m wide (26).
Erosion removed up to 10 m of soil and alluvium within the scour zone. At Arrow Rock Bottoms, a detailed survey by Schalk and Jacobson (25) showed that the scour zone south of the levee break was approximately 400 m wide and nearly 1 km long. The maximum measured scour depth at Arrow Rock Bottoms was approximately 15 m (25). On Jameson Island, a section of levee almost 2 km long was removed at the northeastern corner of the bottoms, based on measurements using the Landsat data. A scour zone approximately 200 m wide was eroded that paralleled the river channel for the length of the levee break. Sand splay deposits, which appear as bright areas in the Thematic Mapper data, formed on both sides and to the south of the scour zone (Figs. 5–7). Sand thicknesses of 1 to 2 m were measured in the field for these deposits (25,26). Most of the nonwoody vegetation on the bottoms was removed during the flood. In areas outside of the sand deposits, several centimeters of roots were found exposed, indicating modest loss of topsoil (26). Sands converge toward the exit break on the southern side of Lisbon Bottoms. The study areas have been undergoing rapid geomorphic changes since the 1993 floods because yearly floods have reconnected the river with these bottoms and because most vegetative cover was removed during the floods. The river reached inundation levels nine times between August 1993 and January 1998 in at least portions of the study areas. The two largest post-1993 flood events occurred in May 1995 and May 1996; peak discharges at the Boonville gaging station were 10,025 and 8,212 m³/s, respectively. Thematic Mapper images acquired in May
1995 and May 1996 indicate that these two events resulted in bluff-to-bluff flooding within the study area. Significant flooding also occurred in the study area in February and April of 1997 and in April and October of 1998; peak discharges at Boonville were between 7,000 and 8,000 m³/s. Aerial photographs indicate that the 1997 floods inundated approximately one-half of the study area floodplain (29). SPOT false-color infrared image data (not shown) acquired on August 6, 1996 show significant changes relative to the 1994 Thematic Mapper scene. A continuous chute across Lisbon Bottoms that formed during the May 1996 flood is evident in the image. In 1997, before flow was restricted with riprap, the Lisbon Bottom chute carried approximately 30% of the Missouri River discharge. The northeastern corner of Jameson Island at the location of the original scour zone was further eroded by bank undercutting (33). The continued erosion of Jameson Island increased the width of the Missouri River channel at the tight bend. The Arrow Rock Bottoms scour zone extended further into the floodplain. A chute system of multiple channels developed to connect the Arrow Rock Bottoms scour zone and the exit levee break at the south end of the bottoms. One branch of this chute system was located close to the bluffs on the western edge of the bottoms, and the other followed the western edge of the riparian forest. Sand splays formed during the 1993 flood were draped by silt and clay-size sediment. Vegetation encroached on the sand splay deposits, particularly in areas where silt and clay were deposited on the sand by floods in 1995 and 1996 (33). The changes in the study area between 1996 and 1998 can be seen in the Thematic Mapper image acquired in May 1998 and in the map of land cover generated from this scene (Fig. 5 and 7). The Lisbon Bottoms chute further evolved by widening, migrating laterally, and creating several meanders (29). Erosion of the northeastern corner of Jameson Island and the development of the Arrow Rock Bottoms chutes continued. Vegetation surveys conducted by the authors in 1998 and 1999 found that much of the Arrow Rock Bottoms chute complex was sparsely to moderately covered by mesic grasses and forbs. In addition, it was found that a dense cover of willow and cottonwood saplings bordered the chutes by 1998. The original sand splay deposits are less evident in the Landsat scene (Fig. 7) because these sands are covered by xeric forbs and grasses. Willow and cottonwood saplings cover much of the floodplain outside the chute system and the sand splay deposits. The geomorphic and vegetative evolution of the study area after the 1993 flood demonstrates that biogeomorphic feedback mechanisms have operated. Willow saplings have grown on the lowest, wettest areas, including regions adjacent to the chutes. Cottonwood saplings have appeared on soils that were stripped during the 1993 flood and are consequently relatively low and wet, although not as wet as chutes and adjacent areas where the willows are growing. Xeric forbs (i.e., bushes) and herbaceous grasses have grown on the high, dry sand deposits that formed during the 1993 flood (33). The Arrow Rock Bottoms chute system, which has been active with flowing floodwater each year, provided an environment that is either under
water or very wet for most of the growing season. Thus, this chute system is free of willow and cottonwood saplings and instead is populated by mesic forbs and grasses. Willow and cottonwood saplings, however, have grown right up to the edge of the Arrow Rock Bottoms chute system. The increased roughness effects of the saplings have constrained floodwater dynamics within the chute system. This flow-focusing effect of the saplings, in turn, further accelerated the development of the chute system. In summary, the new wetlands along the Lower Missouri River are undergoing very rapid geomorphic and vegetative changes that have numerous feedback loops, as the river reconnects with its wetlands. GLOBAL CHANGE DYNAMICS Overview Changes in the earth’s climate due to greenhouse effects associated with burning fossil fuels and deforestation are expected to increase the average temperature of the earth modestly during the next few decades (34). Global circulation models show that this increase in temperature will change the distribution of climate zones, including a general movement of the latitudinally distributed zones toward the poles. Sea level may increase slightly, inundate the lowest areas, and allow storm surges to extend further inland. Changes in climate and storm systems will result in changes to fluvial systems, lacustrine systems, shorelines, and the nature and extent of desert terrains. Imaging data of modest resolution acquired periodically and over large areas will be important for monitoring changes in biomes. Imaging data of high spatial resolution acquired periodically will be important for mapping changes in landforms, such as sediment yields from drainage basins where decreasing vegetative cover leads to increased soil erosion (1). In the following example, the postulated desertification of the Sahel ecosystem in Africa is analyzed. The popular belief that removal of vegetation for fuel and by grazing led to increased albedo and thus less rainfall [the so-called albedo feedback mechanism (36,37)] is shown to be wrong. It is found instead that the vegetative cover is very sensitive to annual rainfall; noticeably less cover exists during drought years, as opposed to wet years (38). This example is particularly relevant because it is predicted that the area surrounding equatorial Africa will experience more variability in rainfall as greenhouse warming proceeds, and thus changes in landscapes and biomes are expected. This example demonstrates the importance of periodic imaging to monitor key areas and to allow testing of key models related to regional-scale changes in land cover and landscapes. Desertification in Equatorial Africa The Intertropical Convergence Zone (ITCZ) is the region centered on the equator where the tradewinds meet. It is an area of upwelling, adiabatic expansion, atmospheric cooling, and consequent precipitation that produces the tropical jungles in Africa. In Africa, the ITCZ is displaced to the north of the equator during the northern summer
and to the south of the equator during the northern winter. Thus, the belt of rainfall associated with the ITCZ shifts northward in the summer and southward in the winter. Further, the warm continent in the summer causes air to ascend, drawing moist air from the Atlantic and Indian Oceans and generating monsoonal rains. Thus, precipitation in the Sahel, the region north of the equator between the tropics and the Sahara Desert, is seasonal; maximal amounts occur during the summer. In years when the shifting ITCZ is not displaced too far to the north, the Sahel is subjected to drought as occurred in the late 1960s and early 1970s (37). In addition to effects of seasonal precipitation, some authors have postulated an anthropogenically related albedo feedback mechanism that has produced a permanent shift of the desert conditions within the Sahel toward the equator (35,36). Surface albedo in the visible and reflected infrared, it is hypothesized, increases as vegetation is removed by overgrazing and cutting wood for fuel. Increasing the albedo decreases the amount of sunlight absorbed, cooling the surface and lower troposphere. This enhances the ability of air to sink and compress adiabatically, making rainfall less likely compared to areas that have vegetative cover and darker, warmer surfaces. When grasslands and shrubs are removed, wind erosion of soil also increases, making it difficult for new vegetation to gain a foothold. These processes fall under the heading of desertification of semiarid lands. The question is whether desertification is a demonstrated process for the Sahel or whether the very dry conditions associated with regional-scale droughts during the 1960s and 1970s were responsible for the observed shift of the margin of the Sahel southward during the early 1970s. The test is to look at annual variations in vegetative cover and rainfall subsequent to the extended drought period and the extent to which the Sahel margin simply follows the rainfall. Precipitation and vegetative areal density for the Sahel were tracked by using satellite and ground data compiled by workers at the NASA Goddard Space Flight Center from 1986 to 1993. Normalized Difference Vegetation Index (NDVI) values were calculated from the NOAA Advanced Very High Resolution Radiometer (AVHRR) data. This instrument images in the visible and reflected IR wavelengths (0.58–0.68, 0.72–1.1 micrometers) and has a pixel size of approximately 1 kilometer. NDVI is the difference between the brightnesses of these two bands divided by their sum. High vegetative density shows as high NDVI (2,6). Precipitation records from ground stations in Africa were used to generate simulated images by interpolation between station locations and precipitation values. Results for the calendar year 1988 are shown in Fig. 8 as NDVI and precipitation images for January and August and in Fig. 9 as monthly time series for a portion of the Sahel. Clearly the vegetative cover closely follows precipitation and lags by approximately one month. The year 1988 was wet in the area and shows that the postulated propagating desert margin is, in fact, an artifact of looking at years when a regional-scale drought was in effect.
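The NDVI calculation described above reduces to one line of array arithmetic. The sketch below assumes the red and near-infrared AVHRR bands are available as NumPy arrays of matching shape; the small epsilon is an implementation detail of this sketch to avoid division by zero over water or shadow.

import numpy as np

def ndvi(red, nir):
    # red and nir are arrays of matching shape (brightness or reflectance).
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    # Vegetation reflects strongly in the near infrared and absorbs red,
    # so dense cover drives the index toward 1.
    return (nir - red) / (nir + red + 1e-9)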
Figure 8. Normalized difference vegetation indices (NDVI) for Africa for January and August 1988, generated from Advanced Very High Resolution Radiometer (AVHRR) data. Also shown is a simulated image of rainfall, generated by interpolation between ground stations. Note how the NDVI (and thus vegetative cover) follows precipitation as the Northern Hemisphere season changes from winter to summer. The white line is the equator. The small box shows the area in the Sahel for which the time series of precipitation and NDVI are plotted in Fig. 9.
GEOLOGIC FRAMEWORK FOR WASTE AND PATHOGEN DISPERSAL Overview Wastes produced by industrial societies have polluted the environment. For example, tailings piles that consist of dolostones and small concentrations of lead-zinc-copper sulfide ores dot the landscape in the Missouri lead belt. Lead levels are elevated in people who live near these tailings piles due to ingestion of dust blown from the hills. Smelters processing ores from mines in the lead belt have left traces of lead and cadmium in the environment for hundreds of kilometers downwind. Leaching of acid waters from mines in the southwest has been a major, although local, problem. Imaging at high spatial resolution has been and will be used as a tool for evaluating and monitoring cleanup of these types of wastes, which typically cover local areas (6). Another area of growing concern is the relationship between pathogen origin and dispersal, also known as pathogen vectors. These pathogens can be transported by surface and groundwater or in the air. The sources for the pathogens can be related to hydrogeologic conditions, as in the following example, which traces the mapping of Culex pipiens, a mosquito that breeds in wet soils and propagates the disease filariasis. Soil Moisture Mapping and Filariasis Prevalence, Nile Delta, Egypt Numerous studies have demonstrated the usefulness of image data in mapping environmental risk factors such
as soil moisture, diurnal temperature variations, and land use practices that control the distribution of tropical diseases, including schistosomiasis, trypanosomiasis, and malaria (38 and references therein). In this example, Thematic Mapper data are used to calculate available surface soil moisture as an environmental risk factor for the tropical disease Bancroftian filariasis (38). Bancroftian filariasis, a deforming illness transmitted by mosquitoes and caused by the parasite Wuchereria bancrofti, affects approximately 120 million people in tropical and subtropical regions worldwide (39). Filariasis is a progressive disease that slowly destroys the lymph system and causes elephantiasis in about 10% of the cases. The transmission of filariasis is strongly influenced by available surface moisture because the mosquitoes require standing water for breeding and humid atmospheric conditions for physical transmission of the disease. The southern Nile Delta of Egypt (Fig. 10), where filariasis is locally endemic and reemergent, provides the focus of this example because of available image coverage and filariasis prevalence data collected by the Egyptian Ministry of Health from 1985 to 1991 in 173 villages located in the southern Delta. In relatively dry regions like Egypt, moisture may be the single most important environmental factor that determines the distribution and spread of mosquito-borne diseases like filariasis. Mosquitoes require stagnant water for breeding and humidity for the prolonged survival (at least two weeks) necessary for maturation of filarial larvae to the infective stage. Humidity is also required for the infective larvae to survive on the skin until they can enter the human host. Thus, humidity is a crucial environmental factor in the transmission and regional distribution of filariasis. Image data can provide an efficient way to examine the regional-scale variability in moisture availability. Such data can be used to model available surface soil moisture as a proxy for humidity. In
Figure 9. Time series of NDVI and precipitation for the Sahel showing that the vegetative cover closely tracks rainfall and lags by one to several months.
this example, Thematic Mapper data acquired in August 1990 over the southern Nile Delta are used to estimate the available surface soil moisture. Average surface soil moisture is calculated from a model developed by Gillies and Carlson (40) that uses normalized radiant surface temperature and fractional vegetative cover to solve a fourth-order polynomial equation that yields available surface soil moisture; then this is used to create a soil moisture map. Landsat Thematic Mapper visible and near-infrared bands are converted to reflectance values, and data from the thermal band are converted to temperature. Reflectance data for the red and infrared bands are used to calculate the normalized difference vegetative index, which is then converted to fractional vegetative cover. The relationship between NDVI and surface radiant temperature data for the southern Nile Delta is shown in Fig. 11. Several physical parameters constrain the roughly triangular shape of the data swarm. Decreasing NDVI with increasing temperature along the high temperature side of the data swarm is called the warm edge. The warm edge defines the warmest and presumably the driest pixels over a range of NDVI values that correspond to different mixtures of vegetation and bare soil. The point of zero vegetative cover or bare soil is the point along the warm edge that has the lowest NDVI and the highest temperature. The point of 100% vegetative cover is the point above 0.6 NDVI where the warm edge begins to bend horizontally. Similarly, the cold edge defines the wettest pixels across a range of NDVI values. Fractional vegetation (derived from NDVI) and normalized temperature (normalized between the range of the maximum radiant temperature and the observed air temperature) are inverted through a Soil Vegetation Atmosphere Transfer (SVAT) model that yields surface soil
Figure 10. Soil moisture map for Nile Delta and surroundings generated from Thematic Mapper data (A). An example of a relatively wet village that has a high prevalence of filariasis is shown as a white circle in B, and a dry village that has a low prevalence is shown in C. See color insert.
moisture availability (Mo). The surface soil moisture availability in this model is defined as the ratio of soil water content to field capacity. The Mo isopleths form a triangle and converge at full vegetative cover due to the insensitivity of vegetative temperature to surface soil water content (Fig. 11). To examine the relationship between available surface soil moisture and filariasis infection rates, the average available surface soil moisture of a 5-kilometer border area surrounding each village was determined for villages whose infection rates are known. A 5-kilometer radius was chosen for the border area size to correspond to a conservative estimate of the flight range of adult female Culex pipiens mosquitoes. A scatter plot of the
filariasis prevalence rate versus average available surface soil moisture (Fig. 12) suggests that a critical interval of soil moisture between 0.2 and 0.5 is favorable for transmitting filariasis. Infection rates in villages whose average available soil moisture is less than 0.2 are negligible. Villages surrounded by areas of very high soil moisture availability (0.5 to 0.6) also have low rates of filariasis infection, suggesting that a critical window of soil moisture is necessary for moderate to high rates (>5% infection) of filariasis infection. The critical soil moisture window is between 0.2 and 0.5 average available surface soil moisture, and the highest incidence of disease falls between 0.3 and 0.44. It is possible that this critical
Figure 11. NDVI plotted against ground temperature for the Delta study area where soil moisture contours are overlaid. The parameter fr is the fractional area covered by vegetation, and Mo is the measure of available moisture.
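One common formulation of the scaling step described above converts NDVI to fractional vegetative cover by normalizing it between the bare-soil and full-cover values read from the warm edge and squaring the result, and normalizes radiant temperature between the air temperature and the warm-edge maximum. The quadratic form, clipping limits, and numbers below are assumptions offered only as a sketch of the Gillies and Carlson style of preprocessing, not their published code.

import numpy as np

def fractional_vegetation(ndvi, ndvi_bare, ndvi_full):
    """Scale NDVI between the bare-soil and full-cover values read from
    the warm edge of the NDVI-temperature scatter (Fig. 11)."""
    n_star = (ndvi - ndvi_bare) / (ndvi_full - ndvi_bare)
    n_star = np.clip(n_star, 0.0, 1.0)
    return n_star ** 2           # quadratic scaling is an assumption here

def normalized_temperature(t_surface, t_air, t_max):
    """Normalize radiant temperature between the observed air temperature
    and the maximum (warm-edge) radiant temperature."""
    return np.clip((t_surface - t_air) / (t_max - t_air), 0.0, 1.0)

# Hypothetical pixel values chosen from the ranges shown in Fig. 11.
print(fractional_vegetation(np.array([0.15, 0.35, 0.60]), 0.1, 0.6))
print(normalized_temperature(np.array([42.0, 33.0, 26.0]), 24.0, 48.0))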
Figure 12. Prevalence of filariasis plotted against soil moisture for 173 villages within the Nile delta study area.
available soil moisture interval reflects habitat conditions most favorable for breeding and survival of Culex pipiens mosquitoes. The prevalence rate is variable within the critical soil moisture interval, which suggests that other factors such as socioeconomic conditions, availability of pesticides to reduce mosquito populations, and availability of antifilarial drugs to treat infected humans modulate the prevalence rate. Low infection prevalence in villages whose soil moisture availability is less than 0.2 may also reflect conditions in which low humidity interferes with the transmission of infective larvae from the mosquito to the human host. IMAGING REQUIREMENTS FOR THE NEXT DECADE In this article, a number of application areas for imaging in geology are covered, including specific examples.
The material focuses on imaging requirements for geologic applications during the next decade for resource, hazard, and disaster assessments, monitoring effects of global change, and better understanding of relationships between pathogen dispersal and geologic controls. A key requirement for optimum use of imaging for these application areas is to continue to implement the planned program of earth-orbiting satellites. These include the Earth Observing System spacecraft that are designed to provide systematic and long-term monitoring of the earth’s atmosphere, surface, and oceans. TERRA, the first of these spacecraft, is beginning to make systematic observations after a period of instrument checkout and calibration. Key instruments on board include MODIS, an imaging spectrometer that covers the 0.41- to 14.4-micrometer wavelength region at a spatial resolution of 250–1000 m. MODIS data will be processed to a suite of standard
products that will be of great use in monitoring global change. A second instrument is ASTER, a multispectral imaging system that operates at wavelengths from 0.52–11.7 micrometers at spatial resolutions ranging from 15–30 m for the visible and reflected infrared and 90 m for the thermal infrared portions of the spectrum. ASTER data will be acquired in stereo so that detailed topographic maps can be generated. The data will be of use in a variety of studies because the standard products pertain to mineral and rock mapping, moisture mapping, and mapping vegetation on a larger scale than possible using MODIS. A second important NASA mission is LightSAR, a planned interferometric radar system that will map the topography of earth's landforms. These data will complement the recently acquired interferometric data obtained during the Shuttle Radar Topography Mission (SRTM). The Tropical Rainfall Measuring Mission (TRMM) is an Earth Probe mission that will map rainfall and be of major use in hydrogeologic studies. High-altitude aerial photography will continue to be acquired and will continue to be a very important component of many geologic applications. What is missing from the suite of planned missions is a platform and associated instrumentation for acquiring imaging spectrometric data at high spatial resolution, where pixel sizes are 20 m or smaller. The original Earth Observing System Program had such an instrument, termed HIRES, that was meant to be complementary to MODIS and focused on mapping for geologic applications. Acknowledgments We thank the NASA Solid Earth Sciences and Natural Hazards Program for Goddard Space Flight Center Grant NAG 5-7613 to Washington University.
ABBREVIATIONS AND ACRONYMS
AIRSAR: airborne synthetic aperture radar
ASTER: advanced spaceborne thermal emission and reflection radiometer
AVHRR: advanced very high resolution infrared radiometer
AVIRIS: airborne visible infrared imaging spectrometer
HIRES: high resolution imaging spectrometer
HRV: high resolution visible
IR: infrared
ITCZ: intertropical convergence zone
MDC: Missouri Department of Conservation
MODIS: moderate resolution imaging spectroradiometer
NASA: National Aeronautics and Space Administration
NDVI: normalized difference vegetation index
NOAA: National Oceanic and Atmospheric Administration
SAR: synthetic aperture radar
SPOT: Satellite Probatoire pour l'Observation de la Terre
SRTM: Shuttle Radar Topography Mission
SVAT: soil vegetation atmosphere transfer
TIMS: thermal infrared multispectral scanner
TM: Thematic Mapper
TOPSAR: topographic synthetic aperture radar
TRMM: Tropical Rainfall Measuring Mission
USACE: United States Army Corps of Engineers
USFWS: United States Fish and Wildlife Service
UTM: universal transverse Mercator
BIBLIOGRAPHY
1. United States Geological Survey, Geology for a Changing World: A Science Strategy for the Geologic Division of the U.S. Geological Survey, 2000–2010, USGS Circular 1172, 1998.
2. F. F. Sabins, Remote Sensing: Principles and Interpretation, 3rd ed., W. H. Freeman, NY, 1997.
3. F. M. Henderson and A. J. Lewis, eds., Principles and Applications of Imaging Radar, Manual of Remote Sensing, 3rd ed., John Wiley & Sons, Inc., NY, 1998.
4. D. Massonnet et al., Nature 364, 138–142 (1993).
5. M. H. Carr, The Surface of Mars, Yale University Press, New Haven, 1981.
6. R. B. Vincent, Fundamentals of Geological and Environmental Remote Sensing, Prentice-Hall, Englewood Cliffs, NJ, 1997.
7. R. E. Arvidson et al., in C. Pieters and P. Englert, eds., Remote Geochemical Analysis: Elemental and Mineralogic Composition, Cambridge University Press, NY, 1993, pp. 247–282.
8. G. Vane, J. E. Duvall, and J. B. Wellman, in C. Pieters and P. Englert, eds., Remote Geochemical Analysis: Elemental and Mineralogic Composition, Cambridge University Press, NY, 1993, pp. 121–143.
9. A. B. Kahle and A. F. H. Goetz, Science 222, 24–27 (1983).
10. F. T. Ulaby and C. Elachi, eds., Radar Polarimetry for Geoscience Applications, Artech House, Norwood, MA, 1990.
11. H. A. Zebker et al., IEEE Trans. Geosci. Remote Sensing GE-30, 933–940 (1992).
12. Z. Berger, Satellite Hydrocarbon Exploration, Springer-Verlag, Berlin, 1994.
13. M. Sultan et al., Tectonics 12, 1,303–1,319 (1993).
14. W. D. Quinlivan, C. L. Rogers, and H. W. Dodge Jr., Geologic Map of the Portuguese Mountain Quadrangle, Nye County, Nevada, USGS Misc. Inves. Series Map I-804, 1974.
15. R. R. Elliot, Nevada's Twentieth-Century Mining Boom, University of Nevada Press, Reno, 1966.
16. R. N. Clark, T. V. King, M. Klejwa, and G. A. Swayze, J. Geophys. Res. 95, 12,653–12,680 (1990).
17. R. L. Frost and U. Johansson, Clays and Clay Minerals 46, 466–477 (1998).
18. S. J. Gaffey, American Mineralogist 71, 151–162 (1986).
19. J. B. Adams, M. O. Smith, and A. R. Gillespie, in C. Pieters and P. Englert, eds., Remote Geochemical Analysis: Elemental and Mineralogic Composition, Cambridge University Press, NY, 1993, pp. 145–166.
20. D. A. Rothery, Geology Today 5, 128–132 (1989).
21. D. Massonnet, M. Didier, P. Briole, and A. Arnaud, Nature 375, 567–570 (1995).
22. S. H. Cannon et al., GSA Today 8, 1–6 (1998).
23. D. R. Rodenhuis, in S. A. Changnon, ed., The Great Flood of 1993: Causes, Impacts, and Responses, Westview Press, Boulder, 1996, pp. 29–51.
24. Scientific Assessment and Strategy Team (SAST), Sharing the Challenge: Floodplain Management into the 21st Century: Report to the Administration Floodplain Management Task Force, part V, U.S. Government Printing Office, Washington, DC, 1994.
25. G. K. Schalk and R. B. Jacobson, USGS Water-Resources Investigations Report 97-4110, 1997.
26. N. Izenberg et al., JGR-Planets, SIR-C Special Issue, 101, 23,149–23,167 (1996).
27. U.S. Fish and Wildlife Service (USFWS), Proposed Big Muddy National Fish and Wildlife Refuge, Jameson Island and Lisbon Bottoms Units, Final Environmental Assessment, Puxico, MO, 1994.
28. T. H. Schmudde, Ann. Assoc. Am. Geogr. 53, 60–73 (1963).
29. R. B. Jacobson et al., in V. J. Burke and D. D. Humburg, eds., Initial Biotic Survey of Lisbon Bottom, Big Muddy National Fish and Wildlife Refuge, U.S. Geological Survey, Biological Resources Division, Biological Science Report USGS/BRD/BSR–20000-0001, 1999.
30. J. R. Jensen, Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice-Hall, Upper Saddle River, NJ, 1996.
31. A. F. H. Goetz, B. N. Rock, and L. C. Rowan, Econ. Geol. 78, 573–590 (1983).
32. D. L. Galat et al., Bioscience 48, 721–733 (1998).
33. G. L. Miller, Monitoring Interannual Changes in the Arrow Rock Bottoms/Jameson Island/Lisbon Bottoms Missouri River Floodplain as a Result of the Floods of 1993, 1995, 1996, and 1997, M.A. Thesis, Washington University, St. Louis, MO, 1997.
34. T. E. Graedel and P. J. Crutzen, Atmosphere, Climate, and Change, Scientific American Library, New York, 1995.
35. J. Charney, P. H. Stone, and W. J. Quirk, Science 187, 434–435 (1975).
36. J. Otterman, Science 186, 531–533 (1975).
37. C. J. Tucker, H. E. Dregne, and W. W. Newcomb, Science 253, 299–301 (1991).
38. M. K. Crombie et al., Photogrammetric Eng. Remote Sensing 65, 1,401–1,409 (1999).
39. E. A. Ottesen and C. P. Ramachandran, Parasitology Today 11, 129–131 (1995).
40. R. R. Gillies and T. N. Carlson, J. Appl. Met. 34, 745–756 (1995).
IMAGING SCIENCE IN ART CONSERVATION
J. S. ARNEY
L. E. ARNEY
Rochester Institute of Technology
Rochester, NY
Museum professionals regularly apply a variety of imaging technologies to artistic and historic objects. The particular imaging technique applied is determined in part by the intended use of the image. Ordinary photography, for example, is used to illustrate the approximate appearance of objects in a museum collection, and optical microscopy is commonly used to identify pigments in an oil painting. The choice of an imaging technique is also influenced by the anatomic structure of the object. For example, traditional drying oils used as vehicles and varnishes in oil paintings fluoresce more strongly as they age. Thus, a photograph taken under ultraviolet illumination is often used to reveal and document discontinuities between original and
restored or altered regions of a painting. On occasion, imaging techniques of this kind have been used to uncover fakes and forgeries, but such applications of imaging to museum studies are rare. By far, the most important use of imaging technologies is to archive information about the object and to extract information about the structure of the object. Archiving preserves information and makes information more readily available to scholars, and structural analysis provides guidance to museum professionals in conserving and restoring objects. This article reviews imaging techniques intended for these two purposes. ARCHIVAL IMAGING TECHNIQUES Chemical Imaging Systems The primary objective of an archive is to preserve information. Making that information readily available is a secondary objective. Often this means carefully maintaining the original book, newspaper, or other object that contains the information. Careful maintenance of objects often means limiting access to the information. Limiting access is particularly important if the original object is fragile or is subject to rapid environmental degradation. Moreover, a single copy of an original document is vulnerable to catastrophic loss, as demonstrated by the conflagration of the library at Alexandria in 641 A.D. Thus, preservation of information often requires that the archivist make a copy of the original information, a process often called reformatting by archivists, or backing up by computer technologists. The most important properties of imaging technologies for archival copying are permanence, accuracy, and retrievability (1,2). The development of photographic technology during the past century has provided several techniques to the archivist for archival photocopying. A camera using ordinary monochrome film, processed by archival techniques (3), can make a very stable negative image of a document, for example, and multiple prints on photographic paper provide convenient access to the information and cause no additional stress on the original object. The convenience and relative stability of photographic processes has led to their widespread use by archivists. Moreover, the ability to compress the data in a document by optically reducing the size of the copy image has led to the development and extensive use of microfilm technology. Silver Halide Microfilm. Microfilms used for capturing and archiving images directly from the original object are called ‘‘source-document’’ or ‘‘first-generation’’ microfilms. They are silver halide films that have a single emulsion layer (4) and are exposed, developed, and fixed by using conventional photographic reagents. The international standard ISO 9848 describes sensitometric procedures for characterizing films used for first-generation microfilming. The standard defines the D–Log(H) curve in terms of diffuse visual density D and exposure H, defined as the time integral of luminance from a black body radiator at 2650 ± 100 K. An index of sensitivity S and an index
of gamma, called gradient G, are defined by Eqs. (1) and (2):

S = 45/Hm,    (1)

G = (Dm − Dn)/[log(Hm) − log(Hn)].    (2)
The density values Dm and Dn are defined as 1.20 and 0.10 above base plus fog, and the exposures Hm and Hn are the exposures required to produce Dm and Dn , respectively. Microfilms generally have high contrast for maximum image resolution. This requires multiple exposure testing when critical exposures are to be made. The high contrast of microfilm imposes practical problems in manuscript imaging. Fragile old books and manuscripts, particularly from rare book collections, cannot be flattened easily for photographic copying. This requires illumination from almost directly overhead and makes eliminating specular reflections more difficult. Moreover, nonflattened originals require a greater depth of field of focus, and this dictates a higher level of illumination which is undesirable when dealing with fragile historic manuscripts. Secondary Microfilming Using Vesicular Film. Many source-document microfilms are read directly by using projection readers, but they are also used to print multiple copies of second-generation microfilms for general distribution and use. These secondary copies are often made by contact printing in a nonsilver imaging process called vesicular imaging that is based on the photoinduced decomposition of aryl diazonium salts (4). The diazonium salt is dispersed in a polymer binder and coated on a transparent substrate. Exposure is made using an ultraviolet source, and the aryl diazonium salt decomposes to generate a latent image of nitrogen microbubbles.
ArN2+ X−  +  hν  →  ArX  +  N2    (Ar denotes the aryl group)

The latent image is developed by heat. The heat softens the polymer and expands the gas. On cooling, the expanded voids in the polymer, called vesicles, remain and act as centers for light scattering. The final image is viewed in transmitted light by a projection microfilm reader. Exposed regions are highly scattering to light and appear dark. Unlike transparent materials based on absorption of light, vesicular materials have an image density that varies significantly with the aperture of the projection optical system. For this reason, measurements of visual density of vesicular images must include the optical aperture used for the measurement. The standard is f/4.5 for vesicular microfilms. The sensitometry of vesicular microfilms is based on a radiometric D–Log(H) curve, where H is in units of joules/meter², because exposure is by ultraviolet rather than visible radiation. The sensitivity and contrast are defined in ISO 9378 in terms of Eqs. (3) and (4):

S = 1000/Hm,    (3)

LER = log(Hm) − log(Hn).    (4)
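In practice, S, G, and LER are read from a measured D–log(H) curve. The short sketch below interpolates hypothetical sensitometric samples to find Hn and Hm and then applies Eqs. (1)–(4); the numbers are invented, and the two speed constants (45 and 1000) belong to different film types and exposure units, as the standards cited above describe.

import numpy as np

def exposure_for_density(log_h, density, target):
    """Interpolate the D-log(H) curve to find the exposure that produces
    a target density above base plus fog; returns H, not log H."""
    return 10 ** np.interp(target, density, log_h)

# Hypothetical sensitometric samples (log10 exposure, density above fog).
log_h = np.array([-1.0, -0.7, -0.4, -0.1, 0.2])
density = np.array([0.02, 0.10, 0.45, 0.95, 1.40])

h_n = exposure_for_density(log_h, density, 0.10)
h_m = exposure_for_density(log_h, density, 1.20)

S_film = 45.0 / h_m                                  # Eq. (1), source-document film
G = (1.20 - 0.10) / (np.log10(h_m) - np.log10(h_n))  # Eq. (2)
LER = np.log10(h_m) - np.log10(h_n)                  # Eq. (4), vesicular film
print(S_film, G, LER)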
LER (log exposure ratio) is the practical range across which image formation occurs. In particular, Hm is the exposure to produce a visual density of 1.20 above Dmin (base plus fog density), where visual density is defined as an ISO standard visual density measured through an f /4.5 projection optical system. Similarly, Hn is the exposure to produce a density of 0.1 more than Dmin . Specialized Digital Imaging Techniques Digital electronic imaging is gradually replacing chemical imaging as an archiving tool. Digital cameras and scanners are the most common tools used to generate digital archives, and these devices are described in detail elsewhere in this encyclopedia. The imaging devices and techniques used for digital archiving are not unique, but digital archives present many new problems to the archivist in managing the archive. These problems include file security, provenance of a digital copy, copyright, and most important of all, the permanence of a digital file (5). The advantages of a digital format are driving significant development activity to solve these problems. Most museums maintain a catalog of photographs of the objects in the museum collection. Such pictorial archives serve a variety of needs such as teaching, preliminary art historical research, insurance documentation, and museum advertising. Often a photograph of an object assists art conservators in treating and restoring of the object. In addition, a copy image of an object can be manipulated to illustrate the way the object may have originally looked or to illustrate the probable appearance following proposed restorative treatment. Such surrogate restorations have been done by photographic techniques (6–8), but the advent of digital imaging has significantly improved the quality of such simulated restorations. For example, fading studies of commercial colorants used in early twentieth century color photographic processes have led to digital simulations of fading kinetics. The digital reversal of the kinetic curves resulted in surrogate restorations of faded images (6,9,10). The ease and widespread availability of digitally archived images are a clear advances over conventional photographic archives. (1,2,5,11–14). Moreover, pictorial images and associated text may be archived to allow search engines to locate images by complex stylistic attributes. (17–19). Permanence and long-term retrievability of digitally archived information is still a practical concern, but the need for reliable and permanent archives of digital information in nearly every technical and economic field of endeavor will drive the development of permanent material formats and long-term standards for digital storage. Moreover, because the 1’s and 0’s of digitally archived information are intrinsically permanent, a digital archive is not subject to fading and degradation in appearance as are traditional photographic archives.
Thus digital archiving provides an accurate and reliable record of the condition and appearance of an object, and this in turn provides more accurate and reliable information for art historical scholarship and for conservation and restoration. The attributes of an imaging system that is appropriate for capturing pictorial images of museum objects will always be at the limit of practical achievability. (11). An ideal surrogate image of a museum object would have infinite spatial resolution, for example, and each pixel would contain the entire electromagnetic spectrum of that point on the image (20). Each pixel of an ideal pictorial copy of a museum object would contain all information about each point on the original object (gloss, elastic modulus, chemical composition, etc.). This is clearly not achievable, but there is no certain limit to how far one can go in developing approximations of the ideal. R&D efforts continue in the museum profession to develop techniques for pictorial archives, and often these efforts push the available imaging technology to the limit. Three examples are described following. The Charters of Freedom System. Beginning in the early 1980s, the National Archives and Records Administration of the United States began a $3 million project with Jet Propulsion Laboratories and Perkin-Elmer Company to develop the Charter of Freedom Monitoring System to monitor the condition of the U.S. Declaration of Independence, Constitution, and Bill of Rights, collectively called the ‘‘Charters of Freedom’’ (21). The system was completed in 1987 using the best CCD array available at the time. The CCD was mounted in a specially built camera and was thermoelectrically maintained at 18.1 ° C. Optics focused the CCD array onto a 3-cm field of view, thus providing a sampling frequency of 33 mm−1 . The camera was mounted on a high precision, mechanical mount, as illustrated in Fig. 1, and a highly reproducible illumination system was installed to achieve the highest possible radiometric reproducibility for capturing monochrome images. Each of the Charters of Freedom is mounted in a hermetically sealed glass case under a helium atmosphere. The glass cases are periodically removed from display at the National Archives and mounted under the camera for examination. Two illuminators project focused, rectangular slits of light on the document. The illuminators are 25° from opposite sides of the vertical camera. This geometry of illumination, coupled with extensive baffling within the camera barrel, was selected to minimize flare light from multiple reflections involving the several glass surfaces of the document case. The entire system was engineered to provide the highest reproducibility achievable at that time for mechanical, optical, radiometric, and electronic subsystems to monitor the Charters of Freedom and detect the possibility of minute changes in the documents. The VASARI Scanner. Visual Arts: System for Archiving and Retrieval of Images (VASARI) was a project begun in the late 1980s and is another example of a major effort to engineer an imaging system for digital archiving to
Figure 1. Approximate illustration of the U.S. National Archives Charters of Freedom monitoring system (21). (Labeled components: granite uprights, mechanical positioner, camera, glass document case, optics table, and pneumatic vibration dampers.)
Figure 2. Photo of VASARI Scanner (23).
the limit of technical capability (22,23). VASARI was led primarily by the National Gallery of London but involved a consortium of European museums. The VASARI scanner, shown in Fig. 2, was developed around a commercially available camera using a 500 × 290 CCD array and a
computer using a 386 processor. The camera was used in conjunction with an accurate mechanical stage to capture a series of subimages of regions of the original object. The subimages were then combined as a mosaic to produce a single digital image. This technique achieved a sampling frequency between 10 and 20 mm−1 across more than a square meter of the original object. The VASARI scanner also achieved very high radiometric accuracy. This was required to produce a seamless mosaic of the subimages, but it also was required to capture colorimetrically accurate data at each pixel location in the final image. Rather than using traditional trichromatic imaging in red, green, and blue, the VASARI system employed polychromatic imaging (23). Seven filters were used to control the power spectrum of the light illuminating the object during image capture. The seven filters were used in sequence to capture seven images (eight bits each) at each location of the CCD camera. The seven images provided data for an empirical but highly accurate calibration to standard CIE color coordinates. The resulting system reportedly achieved, pixel by pixel, colorimetric accuracy of ΔE*ab = 1.1 in CIE L*a*b* units. This is at the threshold of a color difference just noticeable in a side-by-side color comparison by the average observer under optimum viewing conditions. By comparison, conventional color photography cannot achieve better than ΔE*ab = 10 under the best of circumstances (24). The VASARI polychromatic technique was described as a spectral imaging technique (23), but it appears to be only a trichromatic imaging system. However, Tzeng and Berns (25) showed that as few as seven spectral bands of information can provide quantitative estimates of reflection spectra for broad-spectrum colorants typically encountered in museum paintings. The MARC Process. Methodology for Art Reproduction in Colour (MARC) was also developed at the National Gallery in London and used the same 500 × 290 CCD array used in VASARI (22). However, the MARC system was designed to be easier and faster to use without a significant loss in image information. The system focused a single image of the original object onto an image plane inside the camera. High spatial resolution was achieved by moving a metal mask that has microholes in front of the CCD array, as illustrated in Fig. 3. The mask placed one square hole in front of each pixel, so that only 1/48th of the area of each CCD detector was exposed. In this way, the CCD captured exposure information corresponding to the focused detail of the original image within the CCD. Piezoelectric positioners then shifted the metal screen to the left and/or down to place the hole in a different position in the optical image projected on the CCD. The image of this portion of the focused image was then captured. The mask was shifted, so that the holes were positioned at six horizontal and eight vertical positions on the CCD to increase the sampling frequency of the CCD array from 500 × 290 to 3,000 × 2,320. This technique is called micropositioning and is half of the procedure used in the MARC system. The second procedure used by MARC is called macropositioning. Following the 46 image captures in
Figure 3. The MARC technique of increasing sampling frequency. (The diagram shows the original painting, the lens, part of the metal mask, and the CCD array.)
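The micropositioning step can be pictured as interleaving the mask-shifted captures into one finer sampling grid. The sketch below (Python with NumPy) shows only that bookkeeping; the offset ordering and array names are assumptions, and this is not the MARC software itself.

import numpy as np

def interleave_micropositions(frames, nx=6, ny=8):
    """Combine mask-shifted captures into one finer grid. frames[j][i] is
    the full 290 x 500 image captured with the mask hole offset by
    (i, j) sub-pixel steps; the result is a 2320 x 3000 array."""
    rows, cols = frames[0][0].shape
    out = np.zeros((rows * ny, cols * nx), dtype=float)
    for j in range(ny):
        for i in range(nx):
            out[j::ny, i::nx] = frames[j][i]
    return out

# Synthetic demonstration with random data in place of real captures.
frames = [[np.random.rand(290, 500) for _ in range(6)] for _ in range(8)]
mosaic = interleave_micropositions(frames)
print(mosaic.shape)   # (2320, 3000)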
the micropositioning sequence, the entire CCD array and mask are shifted to a new region of the image plane within the camera, and 46 additional images are captured. The CCD array is moved to seven horizontal and nine vertical positions to record a total of 63 frames, each frame of 46 images. Using sufficient overlap to achieve a seamless mosaic, the final image is approximately 20,000 × 20,000 pixels. From the perspective of the operator, the system is much easier to use than VASARI because the camera remains in a single position and can be pointed in any direction like a typical tripod-mounted camera. However, an extremely stable mount is required to minimize loss of spatial resolution through motion artifacts. Color information is obtained from the MARC system by placing a movable RGB filter array over the CCD array, so that each image is captured in red, green, and blue. The system is calibrated against a Macbeth Color Checker Chart, and a set of polynomial coefficients is determined to convert from the camera RGB values to estimated values of CIE XYZ. Accuracy of ΔE*ab = 3.5 was claimed for the system. Archival Object Imaging. In addition to images of paintings, three-dimensional objects such as sculpture and ethnographic objects are included in pictorial archives in museums. Multiple photographs are often required to archive such objects. A three-dimensional, full color imaging process developed by the Canadian National Research Council (CNRC), Autonomous System Lab, can provide quantitative three-dimensional documentation of a colored object (15,16). Through a laser scanning technique, the three spatial dimensions of an object are measured, and by scanning using red, green, and blue laser beams, trichromatic color information is obtained at each point on the object. The technique, called Optical Ranging, has been adopted and used extensively by the Canadian Conservation Institute. The system is illustrated schematically in Fig. 4. The CNRC range scanner uses an RGB, He–Cd laser to allow recording color information. Range z and height y are horizontal and vertical in Fig. 4. The x dimension is determined by rotating the object around the vertical axis
examining conservator searches for discontinuities in the object as indicators of its physical condition. Film cameras have long been used to document these examinations, but now digital imaging significantly enhances the utility of examination techniques as illustrated following.
Figure 4. The NRC depth color scanner (15,16).
or translating the object in the direction perpendicular to y, z. A single scan in the y, z plane involves moving mirror A through one scan of angle θ . Mirror A is not a beam splitter but is fully silvered on both sides. The laser beam is reflected by mirror A and by the fixed mirror B and comes to a focus on the object. The laser point is reflected diffusely from the object. An image of the laser spot is reflected through fixed mirror C and then movable mirror A and is brought to focus on the linear CCD array E. The position of the spot image r is correlated with the mirror angle θ . The position vector (y, z) maps uniquely onto the instrument vector (θ , r). By geometric calculation or empirical calibration, the mapping from (θ , r) to (y, z) can be obtained. The scan is carried out at 20 kHz as the object is translated or rotated through the y, z plane. Beam splitting into RGB signals then provides the color information, and the result is a three-dimensional color image of the object.
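The text notes that the mapping from instrument coordinates (θ, r) to object coordinates (y, z) can be obtained by geometric calculation or by empirical calibration. The sketch below illustrates the empirical route with an ordinary least-squares polynomial fit; the function names, polynomial degree, and calibration values are assumptions.

import numpy as np

def design_matrix(theta, r, degree=2):
    """Low-order polynomial terms in theta and r."""
    cols = [theta**i * r**j for i in range(degree + 1)
                            for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def fit_mapping(theta, r, y, z, degree=2):
    """Fit surfaces y(theta, r) and z(theta, r) from calibration points
    measured on a target of known geometry."""
    A = design_matrix(theta, r, degree)
    coeff_y, *_ = np.linalg.lstsq(A, y, rcond=None)
    coeff_z, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeff_y, coeff_z

def apply_mapping(theta, r, coeffs, degree=2):
    return design_matrix(theta, r, degree) @ coeffs

# Hypothetical calibration data: scanner readings and known positions.
theta = np.linspace(0.1, 0.6, 30)
r = np.linspace(5.0, 25.0, 30)
y = 100.0 * np.sin(theta) + 0.02 * r
z = 80.0 * np.cos(theta) - 0.05 * r
cy, cz = fit_mapping(theta, r, y, z)
print(apply_mapping(theta[:3], r[:3], cy))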
Illumination Angle. Most museum objects are meant to be viewed in diffuse conditions of illumination, and under such conditions the object appears most pleasing esthetically. However, by examining the object under alternative geometries of viewing and illumination, clinically significant information can be obtained. The term raking illumination is used to describe the examination of objects illuminated at a low angle relative to the horizon. As illustrated in Fig. 5, raking illumination can reveal both topographic and gloss variations that are not easily seen under ordinary diffuse illumination. In this example, a damaged region that had been repaired by early restoration is seen in region (a), but raking-angle examination shows that the damage is more widely distributed and cracks and buckles are as far down as region (b). The topographic feature revealed at (c) suggests yet another small region that was restored and in-painted, leaving a slightly raised discontinuity in the paint layer.
DIAGNOSTIC IMAGING TECHNIQUES The analysis of art and ethnographic objects can be divided into two distinct categories. The first is esthetic analysis, the traditional domain of curators and art historians. The second category is diagnostic analysis of the object, often called scientific analysis in the museum profession (26). Diagnostic techniques such as chemical analysis, viscoelastic analysis, and optical analysis provide objective information to museum technologists about the physical condition of the object. This information is used in diagnosing and treating physical and chemical conditions that threaten the permanence of museum objects. In addition, diagnostic analysis can provide information about the provenance of an object and the methodology of the artist. This review describes imaging systems and techniques used in museum diagnostic imaging.
Examination by Visible Light The technical analysis of a museum object begins with visual inspection by an art conservator. This is analogous to a clinical examination carried out by a physician. Pictorial documentation of the examination, using film or electronic cameras, is common practice among art conservators. Thus, the most straightforward technology for museum analytical imaging involves the methodologies for visual inspection. In general, the
Figure 5. Detail of oil painting on leather (a) in diffuse light and (b) in raking specular light.
Histogram Analysis. The ability to capture digital images offers a means of documenting a museum object and also of monitoring its condition and of making quantitative measurements. An early example of a histogram segmentation analysis was reported by the Folger Shakespeare Library in Washington, D.C. (27). Fig. 6 is an example analysis. The histogram segmentation shown in Fig. 6 was developed by Folger Library as a quantitative tool for measuring irregular areas of loss in printed illustrations from old manuscripts. The void area can be repaired by a technique called leaf casting in which new pulp is used to fill the voids and restore mechanical integrity. The key to a successful leaf casting treatment is to use a volume
of pulp as close as possible to that required to fill the losses exactly. Too much or too little pulp results in a less pleasing appearance and lower mechanical strength. The problem, then, is to measure the area of the loss and the thickness of the remaining leaf. The technique shown in Fig. 6 involves capturing the image of the damaged leaf using a black background that shows through the irregular void. Then a simple histogram segmentation provides a reliably accurate measure of the areas of the irregular voids. Calibration is done easily by using a black square of known area. Two-Dimensional Histogram Analysis. Two-dimensional histogram analysis has been used to measure more complex forms of degradation in daguerreotypes. This is illustrated in Fig. 7, which shows a daguerreotype illuminated diffusely by using a black background (a) and at an angle equal and opposite to the angle of viewing (b) (28). The daguerreotype photographic process involves the formation of a highly scattering, white material in regions exposed to light. The substrate for the process is a polished silver mirror. Thus, specular illumination produces a negative image (Fig. 7b) and diffuse illumination produces a positive image (Fig. 7a). This positive/negative behavior is characteristic of a healthy daguerreotype, but as shown in Fig. 8, regions of the daguerreotype that have progressively more chemical tarnish vary from this positive/negative behavior. Segmentation analysis of the two-dimensional histogram, therefore, provides a quantitative index of the degree of tarnishing suffered by the object. Moreover, the tarnished regions can be mapped back from the histogram to show the regions of the object most damaged by tarnish. Similar techniques have been applied to extract quantitative topographic information from works of art on paper and to analyze watermarks in historic papers (29,30).
Figure 6. Intaglio print where the loss region is shown over a black background. The gray level histogram shows that the black region is 12% of the image area.
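The area measurement described above reduces to counting pixels below a threshold chosen from the gray-level histogram and converting the count to physical area with the black reference square. The sketch below is a simplified stand-in for the Folger procedure; the threshold, the synthetic image, and the calibration-square values are assumptions.

import numpy as np

def dark_fraction(gray, threshold):
    """Fraction of pixels darker than the threshold (the black backing
    showing through losses, plus any calibration square)."""
    return float(np.mean(gray < threshold))

def loss_area(gray, threshold, square_pixels, square_area_cm2):
    """Convert the dark-pixel count to physical area using a black
    reference square of known size imaged alongside the leaf."""
    dark_pixels = int(np.sum(gray < threshold))
    cm2_per_pixel = square_area_cm2 / square_pixels
    return (dark_pixels - square_pixels) * cm2_per_pixel

# Synthetic 8-bit image: mostly light paper, one dark void, a 50 x 50 square.
gray = np.full((500, 500), 200, dtype=np.uint8)
gray[100:160, 100:200] = 10          # loss region (here simply a rectangle)
gray[400:450, 400:450] = 5           # calibration square of known area
print(dark_fraction(gray, 50))
print(loss_area(gray, 50, square_pixels=2500, square_area_cm2=4.0))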
Microscopy
Optical microscopy has long been used for materials identification and continues to be a major tool for investigating the materials in museum objects (31). A stereomicroscope that has a magnification between 10× and 100× is useful for preliminary examination of the surface characteristics of museum objects (32) and for taking microsamples for closer examination under a polarized light microscope (33–35). Samples smaller than 100 µm in diameter can generally be taken from a painting or other museum object without impacting the visual appearance of the object at all. A skilled polarized light microscopist who has museum experience can generally identify a pigment sample after a few minutes of inspection. Often a pigment, a binder, or a fiber can be identified immediately by its shape and microstructure. In addition, quantitative measurements can be made of the number of sides of a crystalline material and the angles of intersections of sides and edges. Examination by crossed polarizers often adds significantly to the observed microstructure of crystalline pigments and fiber materials, and a heated stage provides a direct measurement of melting and decomposition points. If identification is still uncertain, the
Figure 8. Two-dimensional histogram of the images in Fig. 7 showing reflectance of the positive image Rpos versus reflectance of the negative image Rneg. The histogram population is displayed in a gray scale; dark is for high population, and white for zero.
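A two-dimensional histogram of the kind shown in Figure 8 can be built directly from registered positive- and negative-illumination images. The sketch below uses a deliberately simplified rule, flagging pixels whose two reflectances fail to sum to roughly one, as a stand-in for the published segmentation (28); the threshold and the synthetic data are assumptions.

import numpy as np

def tarnish_map(r_pos, r_neg, bins=64):
    """Build the joint histogram of positive- and negative-image
    reflectances and flag pixels that fall off the healthy
    positive/negative anticorrelation."""
    hist, _, _ = np.histogram2d(r_pos.ravel(), r_neg.ravel(),
                                bins=bins, range=[[0, 1], [0, 1]])
    # Healthy regions: bright in one image implies dark in the other.
    residual = r_pos + r_neg - 1.0
    tarnished = np.abs(residual) > 0.25      # threshold is an assumption
    return hist, tarnished

# Synthetic reflectance pairs: mostly complementary, plus a tarnished corner.
r_pos = np.random.rand(200, 200)
r_neg = 1.0 - r_pos + 0.05 * np.random.randn(200, 200)
r_neg[150:, 150:] = 0.5                      # tarnish breaks the relation
hist, mask = tarnish_map(r_pos, np.clip(r_neg, 0, 1))
print(mask.mean())                           # fraction of flagged pixels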
Figure 7. Daguerreotype histogram analysis: (a) image in diffuse light; (b) image in specular light.
Figure 9. Photomicrographs of a cross section of paint from an oil painting. Images are in visible light and in several infrared wavelengths (38a).
literature on microscopic analysis is filled with a battery of selective staining reagents, refractive index oils, and microchemical spot tests. Electron microscopy is a technique capable of much higher magnification than optical microscopy and provides additional information about the structure of materials on a submicron scale. Still higher magnification, approaching molecular dimensions, is achievable by atomic force microscopy. In addition, supplemental microprobes capable of chemical or elemental analysis extend the utility of microscopy for materials analysis. Optical microscopes, for example, can be combined with Fourier transform infrared
(FTIR) microprobes to analyze organic compounds (36,37). Scanning electron microscopes are often configured with an energy dispersive, X-ray fluorescent microprobe for elemental analysis (36,38). These techniques then can provide a map of the microdistribution of a material in a museum object. For example, Fig. 9 illustrates digital photomicrographs of a cross section of a paint sample taken from an oil painting. The images were captured in visible light and at a series of infrared wavelengths, and principal component analysis can be applied to extract spatially resolved information about the materials and their changes during aging. A major advantage of microscopy as an analytical tool is its general applicability. Microscopy can be used to identify inorganic as well as organic pigments, fibers,
resins, metals and corrosion products, ceramics, glass, etc. However, a major drawback of optical microscopy is the need for a trained microscopist. Several years of training and practice are required to become even a moderately skilled microscopist, and often sophisticated instrumentation is more available to art conservators than a trained microscopist.
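The principal component analysis mentioned above for stacks of registered photomicrographs can be sketched in a few lines. The band count, image size, and synthetic data below are placeholders, and in real work the images would first be carefully registered to one another.

import numpy as np

def principal_components(stack):
    """stack: (n_bands, rows, cols) photomicrographs registered to each
    other. Returns component images ordered by explained variance."""
    n_bands, rows, cols = stack.shape
    X = stack.reshape(n_bands, -1).astype(float)
    X -= X.mean(axis=1, keepdims=True)
    cov = np.cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1]
    components = eigvecs[:, order].T @ X
    return components.reshape(n_bands, rows, cols), eigvals[order]

# Synthetic five-band stack standing in for visible + IR photomicrographs.
stack = np.random.rand(5, 64, 64)
pcs, variance = principal_components(stack)
print(pcs.shape, variance[:2])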
Examination Using Ultraviolet and Infrared Energy Ultraviolet Illumination and Fluorescent Discontinuities. Museum objects often have a complex material composition, and examination of fluorescent patterns while illuminating an object by a mineral light is a common technique used by art conservators (39,40). Some adhesive materials fluoresce, revealing otherwise undetectable restorations in ceramic objects (41). A signature on a wooden panel, for example, may be enhanced and identified by ultraviolet examination. Drying oils, commonly used as pigment vehicles and as varnishes, dry by concurrent chemical oxidation that often produces fluorescent chromophores. Thus, oil paintings generally become more fluorescent as they age, and regions of a painting that have been damaged and repaired often appear much darker by UV examination, as illustrated in Fig. 10 (52). The most common technique is to use a color film or digital color camera to record the fluorescent image. A UV blocking filter is placed over the camera lens to confine exposure of the film or digital sensor to visible fluorescence (42,43). A technique called ultraviolet fluorescence microscopy has been used to identify binders in different paint layers in an oil painting (44). The technique involves the application of fluorescent staining reagents to the paint cross sections. Figure 11 illustrates a paint cross section that has a canvas support, an acrylic gesso ground, zinc white in linseed oil, and egg white. Image (a) was captured while the sample was under UV illumination. Other samples were prepared using fluorescent reagents known to bind specifically to various kinds of binders. Image (b) illustrates a sample from the same painting stained with rhodamine B. From this and other stained samples, the overall composition of the painting was interpreted, as shown in drawing (c) (44). Spectral Characteristics of Pigments in the NearInfrared. Imaging using near-infrared radiation was introduced into museum laboratories in the 1930s when commercial, IR sensitive film became available. The limited sensitivity of early black-and-white IR films required illumination by a strong incandescent source. Nevertheless, the different near-IR spectral characteristics of the pigments in many paintings proved very useful. For example, ultramarine blue and azurite can appear very similar under visible light. However, azurite absorbs near IR strongly and thus appears very dark on positive prints made from IR-sensitive film. Ultramarine blue, however, is a poor absorber of near IR and produces much lighter images on the same positive prints. In the 1950s, the Eastman Kodak company developed false color, near-IR film for use by the military to detect camouflage (45). This film has three emulsion layers,
Figure 10. Oil painting on a wooden panel from the workshop of Rogier van der Weyden, Netherlandish. Image (a) is by ordinary light and (b) is by UV illumination (52). The Dream of Pope Sergius, Workshop of Rogier van der Weyden (Netherlandish, 1399/1400–1464), Number 72.PB.20, The J. Paul Getty Museum, Los Angeles, CA.
each sensitive to a different part of the spectrum. Green light exposes a layer that controls yellow dye, red light controls magenta dye, and IR radiation from 700 to 900 nm controls cyan dye. The film was made commercially available in the 1960s and had immediate use as a tool for nondestructive identification of pigments in art. The film was much more sensitive than older black-and-white film, thus requiring much less light to capture an image, an important consideration when handling light-sensitive
Figure 11. Photomicrograph illustrating the use of material-specific stains in examining the construction of an oil painting. Images (a) and (b) are by ultraviolet radiation. Image (b) was stained by rhodamine B. Image (c) was drawn based on the interpretation of fluorescent patterns in these and other stained samples (44).
museum objects. Figure 12 illustrates the use of IR false color film in examining an oil painting, a detail of a polyptych by Simone Sartini, ca. 1321 (46). The images are both in the red channel of digital RGB copies of color images originally captured on film. Image (a) was photographed using traditional color film, and image (b) was captured on false IR film. The red channel illustrates primarily the difference in the behavior of the pigments in the near IR, and a damaged region is easily observed in the upper left in image (b). Examination revealed that the region of damage had undergone an early restoration and been in-painted using a pigment to match the visual appearance of the painting. The pigment used in the in-painting, however, does not match the original in the near-infrared. The difference can be seen easily as a large hue difference between color prints made from color film and false color film (47). Near-Infrared Radiation and Substructures of Oil Paintings. The optical scattering power of many materials decreases as wavelength increases. This is the reason that the sky is blue, and it is also the reason that near-IR radiation often penetrates deeper into paint layers than visible light (48).
Figure 12. Red channel images of a detail of a polyptych by Simone Sartini. (a) Image captured on traditional color film. (b) Image captured on color IR film (46).
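For digital captures, the rendering of false-color IR film can be approximated by channel reassignment: the near-IR record drives the red display channel, red drives green, and green drives blue. The band arrays and the simple scaling below are illustrative assumptions, not a calibrated reproduction of the film's dye response.

import numpy as np

def false_color_ir(nir, red, green):
    """Mimic the layer mapping of false-color IR film: IR to the red
    channel, red to green, and green to blue."""
    rgb = np.stack([nir, red, green], axis=-1).astype(float)
    lo, hi = rgb.min(), rgb.max()
    return (rgb - lo) / (hi - lo + 1e-12)     # scale to [0, 1] for display

# Synthetic bands standing in for digitized IR, red, and green records.
nir = np.random.rand(100, 100)
red = np.random.rand(100, 100)
green = np.random.rand(100, 100)
composite = false_color_ir(nir, red, green)
print(composite.shape)    # (100, 100, 3)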
The lower scattering power of paint layers in the IR reveals an underlying structure in the painting that is not visible in the 400–700 nm range. This phenomenon is described quantitatively by the Kubelka–Munk model (49) of paint reflectance, illustrated in Fig. 13. In this model, I is the
spatial design R(y, z) is controlled by artists by varying the amount of pigment, c(y, z), and the type of pigment, ε(y, z), in the paint layer. The artist may be guided by an underdrawing Rg(y, z) formed from carbon black applied to the top surface of the substrate. In this case, the artist intends to hide the underdrawing underneath the paint layer. If the scattering coefficient S of the paint goes to zero, then Eq. (7) converges to the Beer–Lambert equation for a transparent absorbing layer on a reflective substrate. However, if the paint has a sufficiently high scattering coefficient, Eq. (7) converges to the "complete hiding" limit of Eq. (9):

R = a − b,    (9)

where a ≡ 1 + K/S and b ≡ √(a^2 − 1). The absorption coefficient K is related to the concentration c of the chromophore in the paint layer and to the molar extinction coefficient ε of the chromophore by Eq. (8):

K = 2.303 ε c X.    (8)
The terms a and b are functions only of K and S, as defined in Eq. (8), so a high scattering paint has a reflectance that is independent of the underlying substrate Rg . Scattering materials are added to formulate a paint for complete hiding. The scattering material must have a high index of refraction compared to the surrounding medium and be present in sufficient concentration to make the overall scattering coefficient S high enough to be completely hiding in a thin paint layer. Some commonly used scattering materials found in paint are rutile (Ti O2 ), flake white (PbO2 ), and gypsum (CaSO4 ). To formulate the paint economically and to maintain good working properties, a minimum amount of scattering material is used that is sufficient to achieve complete hiding in visible light. In the near IR, however, S is lower, and the paint layer often no longer completely hides. Moreover, many paints absorb less strongly in the infrared. These two effects often combine to make the underdrawing, R(y, z), a very significant part of the observed reflectance of the system in the near-IR, so infrared imaging of paintings has become a major tool for examining painting substructures. The types of substructures in oil paintings examined by near-IR imaging techniques include underdrawing, underpainting, pentimenti, and original paint that has been masked by the accumulation of soot and grime (26,50). Underdrawings often were executed using a charcoal pigment, which is highly IR absorptive, applied to an IR reflective, white gesso ground. The underdrawing served as a guide to the artist in executing the painting. Examination of underdrawings by art historians has led to insights into the painting styles of artists. An underpainting is a paint layer underneath another paint layer. Sometimes an artist would paint over another artist’s work either as an economy measure to recycle the support or to hide work not considered appropriate. Art patrons have on occasion hired artists to overpaint objectionable features of paintings. During the Victorian period, many a bare breast of Eve was covered in this way. A pentimento, on the other hand, is a painted detail that was overpainted by the original artist as an intentional design change during the creation of the painting. Analysis of these substructures can provide insights to art historians into the artist’s working process and can provide guidance to art conservators in the appropriate treatment for a painting.
The reflectance R, of any point in the painting is a function of S, ε, c, X, and Rg . In many cases the
Near-Infrared Radiation Imaging Techniques. Near-IR photography, also known as IR reflectography, has been
Figure 13. The Kubelka–Munk model of paint reflection.
flux of light falling onto a paint surface, and J is the flux of light that returns to the surface as reflected light. The reflectance of the paint system is R = J/I and is a function of the layer thickness X, an absorption coefficient K, and a scattering coefficient S. Within a differential layer dx of a paint at any distance x from the surface, there is a flux of light I in the downward direction and a flux J in the upward direction. These fluxes are modeled as first-order absorptive and scattering processes according to Eqs. (5) and (6):

−dI/dx = KI + SI − SJ,    (5)

−dJ/dx = KJ + SJ − SI.    (6)
The sign convention in Eqs. (5) and (6) is such that the distances x and X are positive, the flux values I and J are positive, and the constants K and S are positive first-order constants for absorption and scattering, respectively. These expressions show a first-order decrease in both I and J as distances increase as a result of both absorption K and scattering S. Note that scattering changes some of the downward flux I into upward flux J, as shown in the right-hand terms of these equations. Equations (5) and (6) can be solved using the boundary condition that at x = X the ratio of the flux values equals the reflectance of the substrate material, J/I = Rg. This leads to Eq. (7) for the system reflectance R = J/I:

R = [1 − Rg (a − b coth(bSX))] / [a − Rg + b coth(bSX)],    (7)
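Equation (7) is straightforward to evaluate numerically. The sketch below assumes the hyperbolic-cotangent form given above and uses purely illustrative values of K, S, X, and Rg to show how a paint layer can hide a dark underdrawing in visible light yet reveal it in the near IR, where both S and K are lower.

import numpy as np

def km_reflectance(K, S, X, Rg):
    """Kubelka-Munk reflectance of a paint layer of thickness X with
    absorption K and scattering S over a substrate of reflectance Rg,
    using the hyperbolic form of Eq. (7)."""
    a = 1.0 + K / S
    b = np.sqrt(a * a - 1.0)
    coth = 1.0 / np.tanh(b * S * X)
    return (1.0 - Rg * (a - b * coth)) / (a - Rg + b * coth)

# Illustrative values only: a paint that hides in visible light (high S)
# but not in the near IR (lower S and K), over a dark underdrawing line
# (Rg = 0.05) and over bare white ground (Rg = 0.90).
for label, K, S in [("visible", 2.0, 60.0), ("near IR", 0.5, 8.0)]:
    r_line = km_reflectance(K, S, 0.2, 0.05)
    r_ground = km_reflectance(K, S, 0.2, 0.90)
    print(label, round(float(r_line), 3), round(float(r_ground), 3))
# In the visible case the two reflectances are nearly equal (complete
# hiding, R -> a - b); in the near-IR case the underdrawing shows through.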
Figure 14. Spectral sensitivity of (a) typical silicon CCD camera, (b) Hamamatsu model N2606-A lead sulfide vidicon, (c) EG&G SO#16471 germanium scanning detector, and (d) Kodak model 310-21-X, PtSi thermal Schottky barrier camera (50).
used only to a limited degree in analyzing substructures of oil paintings. But with the introduction of commercially available vidicon cameras in the early 1960s, a technique called "infrared reflectography" became widespread (39,40,50,51). Several systems have been used extensively in museum studies, and a review of four systems has been reported (50,51). Figure 14 shows the spectral sensitivities of these systems. The simplest system to use is an ordinary silicon CCD camera that has spectral sensitivity to about 0.9 µm. Most inexpensive CCD cameras are manufactured for
(c)
671
use in the visible wavelengths by placing IR blocking filters over the CCD array. CCD video cameras are also readily available without IR blocking filters and can provide useful sensitivity out to 1 µm. Generally these cameras use an IR band-pass filter such as a Wratten 87A to eliminate visible light. The spectral sensitivity of CCD cameras is similar to that of IR film, but the added versatility of real-time video, the versatility of filters and illumination techniques, and the ease of digital processing make CCD video a very useful and inexpensive tool for museum applications. A lead sulfide vidicon system provides sensitivities to 1.7 µm, and in this range, many paints found on museum objects scatter radiation significantly less than visible light. Thus, the lead sulfide vidicon is used extensively as a tool for examining the understructures of paintings. Most infrared reflectography is carried out by using CCD or vidicon cameras used in real time with an ordinary video monitor. Individual images are captured from the video stream by conventional video frame grabbers. However, underdrawings often have relatively fine detail, so multiple close-up images are usually captured to document a large area. These close-up images are then combined in a mosaic to form a high resolution image of the entire region under examination, as illustrated in Fig. 15 (52). Examination of an enlarged detail in (b)
(d)
Figure 15. Infrared reflectographic examination of ‘‘Portrait of a Man,’’ a panel painting by a master of the 1540s. Courtesy of the Fogg Art Museum, Harvard University Art Museums, The Kate, Maurice R. and Melvin R. Seiden Special Purchase Fund in honor of Ron Spronk. (a) In visible light, (b) individual IR images, (c) mosaic constructed of the individual IR images, and (d) enlarged detail (52).
Fig. 15(d) reveals underdrawing marks by the artist. In this illustration, the underdrawing appears to be a dry medium such as black chalk. Other infrared systems are also in limited use in museum studies. The Kodak model 310-21-X, PtSi thermal Schottky barrier camera, for example, is sensitive from 2.3 to 5 µm. Ambient thermal noise becomes much more significant the farther beyond 2 µm one goes, thus requiring a cooled detector. The Kodak system uses liquid nitrogen and thus is less convenient for use in museums. Other systems using single detectors and a scanning mode of image capture have been explored (50,53). Such systems offer greater versatility in the selection of detectors, but the experimental complexity of scanning systems compared to the ease of use of video systems limits scanning systems to major museum laboratories.

Beta (β) Radiography. β Rays are energetic electrons, but they have lower energy than X rays and γ rays. Because they are charged particles, β rays do not penetrate far into materials before they are captured. β Rays are typically produced by placing a thin sheet of lead over the object to be irradiated and then irradiating the lead with hard X rays. The irradiated lead produces β rays that enter the object. Only very thin objects such as paper (stamps, prints, money, etc.) are examined this way. One example of the utility of β radiography is imaging watermarks in papers. Although watermarks may often be observed in transmitted visible light, they are often obscured by dark inks applied by the printer or artist. However, a β ray is absorbed in direct proportion to the total mass of material it encounters. Inks add negligible mass to paper, but a watermark significantly varies the mass of the paper. Thus, β radiography produces an image only of the watermark and the formation of the paper.
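Because β-ray absorption scales with the mass traversed, the watermark argument can be made numerical. A minimal sketch; the areal densities and the attenuation coefficient are invented, illustrative numbers, not measurements from the article:

```python
import math

# Invented areal densities (g/cm^2) and an invented beta mass-attenuation
# coefficient (cm^2/g); the values only illustrate the relative-contrast point.
MU_BETA = 15.0
paper = 0.008        # plain paper
watermark = 0.006    # thinner paper where the watermark was impressed
ink = 0.0003         # ink adds almost no mass

def transmission(areal_density):
    """Fraction of the beta flux transmitted through a thin layer."""
    return math.exp(-MU_BETA * areal_density)

print(f"plain paper: {transmission(paper):.3f}")
print(f"inked paper: {transmission(paper + ink):.3f}")   # nearly unchanged
print(f"watermark:   {transmission(watermark):.3f}")     # clearly different
# The ink barely changes the transmitted flux, while the watermark does,
# which is why the beta radiograph records only the watermark and the
# formation of the paper.
```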
Examination Using High Energy Radiation

Several imaging techniques based on high-energy radiation have been used extensively in analyzing museum objects. Some of these techniques, such as X radiography, are well known in medical imaging, and museums often prevail upon local hospitals or dentist offices for help in carrying out X-ray examinations of objects. Children's
Figure 17. ‘‘The Virgin and Child’’ from the workshop of Dirck Bouts, Netherlandish, c. 1460, oil on wood. Courtesy of the Fogg Art Museum, Harvard University Art Museums, Gift of Mrs. Jesse Isidor Straus in memory of her husband, Jesse Isidor Straus, class of 1893. (a) Visible light photograph and (b) X radiograph of the Virgin’s face (52).
Figure 16. Typical arrangement for X-radiographic analysis of museum objects.
Hospital in San Diego, for example, is well known for performing computed tomographic (CT) X-ray imaging of ethnographic and paleontological objects (54). In addition, because museum objects are not harmed by high-energy radiation to the same extent as humans, there are several X-ray and other high-energy methods of imaging that
Figure 18. Illustration of an X radiographic examination of a painting on a wood panel. (a) A black-and-white image in visible light of ‘‘The Deposition’’, attributed to Jan Erasmus Quellinus (Los Angeles County Art Museum). (b) X-radiographic image of ‘‘The Deposition’’ (55).
are useful for materials analysis and imaging of museum objects that would not be applicable to medical imaging. These include imaging using high doses of X rays, very high energy X rays, gamma rays, neutrons, and beta rays.

X Radiography of Paintings. The arrangement of museum X radiographic systems is similar to that used in medical and industrial X radiography. X rays pass through the object under study to form a shadow image on a photographic plate, as shown in Fig. 16. Unlike medical X radiography, museum X radiography does not require using a fluorescent screen to expose the film. The increased dose of X rays required to expose the film directly is not harmful to the painting, and increased resolution is obtained. Figure 17 illustrates an X radiograph captured by using the arrangement shown in Fig. 16 (52). Flake white is a lead oxide used extensively as a white pigment in European and American paintings. In the example shown in Fig. 17, lead white was applied locally in the undermodeling of the Virgin's face. The ridge of the nose, the upper lip, and the eye sockets appear as highlights in the X radiograph because the lead white strongly absorbed X rays. Other pigments and cracks in the painting absorbed X rays to a lesser extent and thus appear dark on the X radiograph. Other pigments containing heavy metals, such as lead-tin yellow and vermilion, which contains mercury, absorb X rays in a manner similar to lead white. Table 1 lists the major pigments found in oil paintings and their X radiographic characteristics.
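The brightness differences in an X radiograph follow from exponential attenuation in each paint passage. The sketch below is illustrative only; the attenuation coefficients are invented stand-ins chosen to mimic the qualitative ranking of Table 1, not measured values:

```python
import math

# Invented linear attenuation coefficients (mm^-1) at a typical museum
# X-ray energy, chosen only to reflect Table 1's qualitative ranking.
MU = {"lead white": 150.0, "vermilion": 120.0, "iron oxide": 10.0, "organic lake": 0.5}

def transmitted_fraction(pigment, thickness_mm):
    """Relative X-ray flux reaching the film beneath a pigment layer."""
    return math.exp(-MU[pigment] * thickness_mm)

layer = 0.03  # mm of paint
for pigment in MU:
    print(f"{pigment:13s} transmitted fraction = {transmitted_fraction(pigment, layer):.3f}")
# Lead white passes almost nothing, so those passages leave the film
# unexposed and print as highlights on the radiograph, as in Fig. 17.
```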
Figure 18 illustrates the utility of X radiography in determining the history and structure of a painting (55). The image on the left is a black-and-white photograph of an oil painting of the crucifixion on a wood panel. The image on the right is an X radiograph of the panel painting. Both images show the same area of the painting, but the right image appears to be of a sitting room scene. Further examination indicated that the sitting room scene had been painted on the wood panel; at a later date, the crucifixion was painted on paper and adhered to the first painting. Cases like this are not uncommon and present the art conservator with the problem of deciding which painting to attempt to conserve and restore. In this illustration, both paintings were recovered and restored.

Magnification Radiography. A technique called magnification radiography (57) provides some added versatility to conventional X radiography. As illustrated in Fig. 19 and Eq. (10), a magnification factor M is obtained by placing the painting some fraction of the way between the film cassette and the X-ray source. Magnification radiography has been used to examine the early stages of adhesive separation of paint layers in oil paintings (57). Adhesive loss can lead to curling and loss of paint flakes. Art conservators treat these conditions by techniques ranging from gentle heating and pressure, to application of new adhesive, to complete relining and readhesion of the paint layer onto a new support.
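The geometry of Fig. 19 gives the magnification factor directly; a short sketch, taking d1 as the source-to-painting distance and d2 as the painting-to-film distance implied by Eq. (10) (the example distances are arbitrary):

```python
def magnification(d1_source_to_painting, d2_painting_to_film):
    """Magnification factor of Fig. 19 / Eq. (10): M = (d1 + d2) / d1."""
    return (d1_source_to_painting + d2_painting_to_film) / d1_source_to_painting

# Example: painting 80 cm from the source, film 20 cm behind the painting.
print(magnification(80.0, 20.0))  # 1.25x enlargement of paint-layer detail
```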
Table 1. Color, Chemical Composition, and X-Ray Absorption of Artists' Pigments (a)

Color              Name                           Chemical Composition          X-Ray Absorption
White              Silver white                   Silver-lead carbonate         Very high
                   Zinc white                                                   Very high
                   Flake white                                                  High
                   Chinese white                                                High
Yellow and orange  Chrome yellow                  Lead chromate                 Very high
                   Cadmium yellow                 Cadmium sulfide               High
                   Zinc yellow                    Zinc chromate                 High
                   Aurora yellow                  Cadmium sulfide               High
                   Yellow ochre                   Iron oxide, alumina           Medium–high
                   Gamboge                        Organic                       Low
                   Naples yellow                  Lead antimonide               Very high
                   Mars yellow                    Iron oxide                    Medium
                   Yellow lake                    Organic                       Low
Red                Red lead                       Lead oxide                    Very high
                   Vermilion-cinnabar             Mercury sulfide               Very high
                   Vermilion red                  Iron oxide                    Medium
                   Carmine lake                   Organic                       Low
                   Madder (rose, brown, purple)   Organic                       Low
Brown              Florence brown                 Copper cyanide                High
                   Mars brown                     Iron oxide                    Medium
                   Prussian brown                 Iron cyanide                  Medium
                   Sepia                          Organic                       Low
Blue               Cerulean blue                  Cobalt stannate               High
                   Cobalt blue                    Cobalt aluminate              Medium
                   Light ultramarine              Sodium sulfide                Medium
                   Prussian blue                  Iron cyanide                  Medium–high
                   Indigo                         Organic                       Low
Violet             Cobalt violet                  Cobalt phosphate              Medium
                   Mars violet                    Iron oxide                    Medium–high
                   Mineral violet                 Manganese phosphate           Medium
Green              Emerald green                  Copper arsenate               High
                   Chrome green                   Chrome oxide                  Medium
                   Cobalt green                   Zinc, cobalt oxide            High
                   Green lake                     Organic                       Low
Gray and black     Ivory black                    Calcium phosphate & organic   Medium
                   Iron black                     Iron oxide                    Medium–high
                   Blue black                     Organic                       Low
                   Lamp black                     Organic                       Low
                   Carbon black                   Organic                       Low
All colors         Acrylic                        Organic                       Low

(a) Ref. 56.
Detailed examination by magnification radiography can guide the conservator in the choice of treatment options.

The Sensitometry of X-Ray Imaging. Medical X rays are generally in the 20 to 90 kV range (58), but museum applications involve everything from 5 to 1,000 kV, depending on the type and thickness of the material under analysis (56). For example, very soft X rays of 5 to 10 kV border the ultraviolet region of the electromagnetic spectrum and penetrate only limited depths of materials. Soft X rays are useful for analyzing thin objects such as prints on paper and money. The 10 to 40 kV range is more useful for analyzing paintings on canvas or wood that have thick gesso grounds. Energies of 80 to 250 kV are used for very thick wood pieces, mummies in heavy coffins, metal sculptures, and other metal objects.
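The tube-voltage ranges quoted in this subsection (together with the 250 to 1,000 kV industrial range mentioned just below) map naturally onto a small lookup; a minimal sketch that simply encodes those stated ranges, with informal category names:

```python
# Tube-voltage ranges (kV) for museum radiography, as quoted in the text.
KV_RANGES = {
    "paper, prints, money": (5, 10),
    "paintings on canvas or wood with gesso grounds": (10, 40),
    "thick wood, mummies in coffins, metal objects": (80, 250),
    "very thick stone and metal sculpture (industrial)": (250, 1000),
}

def suggest_kv(object_type):
    """Return the quoted kV range for a museum object category."""
    lo, hi = KV_RANGES[object_type]
    return f"{lo}-{hi} kV"

print(suggest_kv("paintings on canvas or wood with gesso grounds"))  # 10-40 kV
```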
Figure 19. Schematic illustration of magnification X radiography. The magnification factor is M = (d1 + d2)/d1 (10), where d1 is the distance from the X-ray source to the painting and d2 is the distance from the painting to the film cassette.
Industrial X rays of 250 to 1,000 kV are used for very thick stone and metal sculptures. The characteristic curve of an X-radiological system may be measured by placing a step wedge of metal on top of the film or cassette. The metal step tablet is machined so that it has a linear increase in metal thickness, as illustrated in Fig. 20. Exposure of the film through this step tablet produces image density values in the developed film that decrease as the thickness of the metal increases. Figure 21 illustrates the resulting sensitometric curves for soft and for hard X rays. It is evident from these curves that hard X rays are more penetrating and that they also provide a broader dynamic range and lower image contrast on film. One of the factors that decreases the contrast in hard X-ray radiographs is secondary radiation emitted by the irradiated object under study. Secondary radiation is lower in energy than the primary radiation from the X-ray source. Primary radiation can be scattered from
Figure 20. Illustration of an aluminum step tablet that has steps of 1 through 8 mm thickness.
the object in random directions. The amount of scattered primary radiation and of secondary radiation increases as both the thickness of the object and the energy of the primary radiation increase. Thus, archeological and paleontological objects that require hard X rays are subject to much more secondary radiation than oil paintings. Higher contrast film and processing techniques are generally used to compensate for the resultant decrease in contrast. The electrophotographic process, which has an intrinsically high gamma and a peaked MTF for edge enhancement, significantly overcomes much of the loss of contrast caused by secondary radiation (59,60). Now, however, the ability to digitize radiographic images provides many additional options for contrast compensation and significantly reduces the need for intrinsically high contrast detection of X rays.

Another device used to decrease the effects of secondary radiation and to increase image contrast is the antidiffusion grid (56), as illustrated in Fig. 22. The grid lines are made of alternating strips of lead and a low-absorbing material such as paper or aluminum. The grid lines are placed at an angle so that the direct radiation from the X-ray source falls on the film and is absorbed minimally by the grid lines. However, the grid lines significantly increase the absorption of scattered secondary radiation and diffusely reflected primary radiation. The antidiffusion grid is characterized by (a) the focal distance f, as illustrated in Fig. 22, and (b) the grid ratio, defined as the grid height divided by the spacing interval. A grid ratio of 10, corresponding to a grid height of 2 mm and a spacing of 0.20 mm, is commonly used for large museum objects.

Reflex Radiography. Before the advent of the office copy machine, office copies were often made on silver halide print paper using the reflex copy technique illustrated in Fig. 23. The reflex process involves exposing through the copy emulsion onto the original document. The light is absorbed in dark regions of the original document. However, in white regions of the original document, the light is reflected back and forth between the document and
Figure 21. Developed density D of an X-ray film versus log(L), where L is millimeters of aluminum. Data are for hard (x) and soft (O) X rays using the aluminum step tablet technique shown in Fig. 20.
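A measured characteristic curve of this kind can be summarized by a single contrast figure. The sketch below uses hypothetical density readings for an eight-step aluminum tablet (not the data plotted in Fig. 21) and estimates an effective contrast as the slope of density versus the logarithm of aluminum thickness:

```python
import math

# Hypothetical developed densities behind a 1-8 mm aluminum step tablet
# (invented numbers, not the measurements behind Fig. 21).
thickness_mm = [1, 2, 3, 4, 5, 6, 7, 8]
density      = [2.1, 1.8, 1.5, 1.2, 0.9, 0.7, 0.5, 0.4]

# Effective contrast taken as the slope magnitude of D versus log10(L)
# over the roughly linear central portion of the curve.
x = [math.log10(t) for t in thickness_mm[1:7]]
y = density[1:7]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
den = sum((xi - mean_x) ** 2 for xi in x)
print(f"approximate contrast: {abs(num / den):.2f}")
# A softer (lower kV) exposure yields a steeper curve (higher contrast) and
# a narrower dynamic range than the hard X-ray curve, as in Fig. 21.
```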
Figure 22. Antidiffusion grid (56).
the emulsion. This significantly increases the probability of image formation by the emulsion. Thus, a significant exposure differential is achieved between the black and white regions of the original document. The same technique can be used to take an X-ray image of a very thick object, as illustrated in Fig. 24 (61). The artistic design on the surface of the column varies in X-ray transparency, thus modulating the X-ray dose received by the column material. This in turn modulates the amount of secondary radiation returned to the surface of the column. Then the artistic design again modulates the absorption of the secondary radiation reaching the film. The resulting radiographic image contains information about both the structure of the paint layers and the substructure of the column itself. Such information is useful for planning the restoration and conservation of architectural art.

Nonsilver Output for X Radiography. The fluoroscope was a popular X-radiographic technique used in routine medical practice through the early 1950s, until the harmful effects of X-ray exposure were fully realized. The fluoroscope was used for real-time X-ray examination, as illustrated in Fig. 25. The system uses a zinc-cadmium sulfide screen that fluoresces in visible light when struck by X rays. The fluoroscope is no longer used because X-ray exposure is excessive and also because the image viewed on the fluorescent screen has very low visual contrast and definition (56). To eliminate the poor luminosity and low contrast of traditional fluoroscopy, image intensifiers have been developed to increase luminosity by three to four orders of magnitude (56). Figure 26 is a schematic representation of the image intensifier. When X rays strike the fluorescent screen, visible light is emitted. The light then stimulates the
Figure 23. The reflex copy technique: incident light passes through the print paper (AgX emulsion) onto the original document, which is printed on both sides.
Figure 24. Schematic illustration of reflex radiography, also called X-ray emissiography, applied to an artistically decorated architectural column.
Figure 25. Schematic illustration of an X-ray fluoroscope.
Figure 26. Schematic representation of the image intensifier: X rays strike an input fluorescent screen, the emitted light releases electrons from a photocathode, and a 30-kV focusing field accelerates the electrons onto an anode coated with a second fluorescent screen.
photocathode. A high voltage amplifies the signal as an electron current. The electron current is focused imagewise on the anode, which then functions much like a CRT screen. The light output from the anode screen is brighter by several orders of magnitude than the light from the input fluorescent screen (56). The output from the image intensifier is generally captured by a video camera for real-time display on a monitor. Video capture then produces a digital radiographic image, and this allows the application of a vast variety of digital enhancement tools. Higher spatial resolution can be achieved by drum scanning a film radiograph, but the convenience of real-time X-radiographic examination makes the video system a very versatile tool for museum analysis.

X-Ray Fluorescent Imaging. The diffuse secondary radiation that causes decreased contrast in radiographic images (vide supra) is actually the basis of an important technique for elemental analysis. The secondary radiation is a fluorescent emission from metal ions in the irradiated material, and the energy of the fluorescent X rays is a function of the atomic number of the metal atom. X rays can be formed in situ by using a scanning electron microscope. This is the principle of the SEM microprobe technique of analysis, called X-ray fluorescence (XRF), illustrated in Fig. 27 (62). X-ray fluorescent analysis does not have to be carried out on a microscale, nor does it need to be done in a vacuum. Commercial instruments are available that can be rolled up to a painting in a museum. An X-ray source irradiates a point on the painting. Then fluorescent X rays are detected by an energy dispersive X-ray spectrophotometer. The
Figure 27. X-ray fluorescent (XRF) probe in an electron microscope.

Figure 29. Schematic drawing of an XRF spectral imaging system that has a scanning X-ray source and an X-ray spectrometer detector.
Figure 28. Schematic illustration from data reported on an XRF analysis of a drawing by Rembrandt; the spectrum shows relative intensity versus X-ray energy (keV), with peaks of Si, K, Ti, Fe, and Cu (63).
result is a spectrum of intensity versus X-ray energy whose peaks correspond to specific elements. Figure 28 is an illustration from data reported on a study of the elemental composition of inks used in drawings by Rembrandt; the primary X irradiation was done at 20 kV (63). Art conservation scientists at the Hermitage Laboratory in Russia developed a scanning XRF system (64). An X-ray tube that uses electron scanning produces an X-ray beam that can be raster scanned across a 60 × 60 mm area of a painting, as illustrated in Fig. 29. This system reportedly can produce hyperspectral images that cover the X-ray energy range from 0 to 50 keV at a spatial resolution of 0.5 mm.

Gamma (γ) Radiography. Gamma (γ) radiation is electromagnetic radiation that has higher energy than X radiation. γ Radiation has been used on occasion to image very large objects opaque even to hard X rays. A γ-ray source such as iridium-192 (300 to 600 keV), cesium-137 (660 keV), or cobalt-60 (1.3 to 1.7 MeV) is contained in a capsule that has a mechanical aperture and shutter. The γ-ray source is used much like an X-ray source to expose silver halide film. Figure 30 is an example of a γ radiograph of ''Crouching Aphrodite'' from the Greek, Etruscan, and Roman antiquities department of the Louvre (56). The underlying mechanical structure of the statue is clearly visible.
Figure 30. Visible light image and γ radiograph of ‘‘Crouching Aphrodite’’ from the Greek, Etruscan, and Roman antiquities department in the Louvre (56).
Neutron Radiography. Neutron radiography is an imaging process that is in only limited use in the analysis of oil paintings, although it can provide far more information than conventional X radiography (65a,b,c). The reason for the limited use of this technique is that it requires a nuclear reactor to provide a source of cold neutrons. The Ward Center for Nuclear Sciences at Cornell University, Ithaca, NY, provides neutron radiographic services for art conservation (65d). However, as illustrated in Fig. 31, the configuration of the nuclear reactor places constraints on the size and physical integrity of the painting. The Cold Neutron Source facility of the 10-MW research reactor BER II was constructed using a neutron guide tube designed to accommodate oil paintings up to 120 × 120 cm (65c). Transmission neutron radiography can be performed similarly to traditional X radiography by placing a film and a suitable fluorescent screen behind the painting. The neutrons passing through the painting activate the screen to expose the film. Neutron radiography offers a unique advantage over X radiography because neutron absorption occurs by a mechanism very different from the absorption of X rays. Neutrons are not well absorbed by high atomic number materials such as lead but are absorbed strongly by low atomic number atoms such as carbon, hydrogen, and oxygen. Thus transmission neutron
radiography provides a complement to X radiography by imaging organic material (65c). A more frequently used neutron technique is neutron activation autoradiography (NAAR) (65a,b). This technique involves irradiating a painting with cold neutrons to a degree sufficient to produce radioactive isotopes of atoms used in pigments. The isotopes then decay, and beta and gamma radiation are emitted. Beta rays can be used to excite a fluorescent screen, which, in turn, can expose a radiographic film. The relative intensity of beta-ray emission from different metals depends on the relative abundance of the metal, the neutron capture efficiency, and the half-life. The half-lives of isotopes formed from atoms in pigments can range from minutes (27Al and 60mCo) to hours (56Mn) to days (203Hg). Many atoms form multiple isotopes whose half-lives also range from minutes to days. Thus, the relative beta-ray intensity of different isotopes varies significantly over time as the painting loses its radioactivity, and radiographic images formed on film show very significant differences, depending on the delay time between activation and exposure of the film. An excellent didactic illustration of NAAR analysis of an oil painting was published by the Ward Center for Nuclear Sciences at Cornell University, Ithaca, NY (65d). An oil painting measuring 0.229 × 0.178 m was constructed as
Figure 31. Placing a painting in the TRIGA Nuclear Reactor at the Ward Laboratory, Cornell University, Ithaca, NY (65d).
Figure 32. Construction of a test oil painting to illustrate NAAR imaging (65d).
a test sample using known materials. A sheet of 1/8 Lucite was coated with a commercial, flake white, oil paint (containing both white lead and zinc) as a ground layer. Next, a head was painted on the ground layer using raw umber containing manganese, as illustrated in Fig. 32. Then grid lines were drawn over the head using cobalt blue, and the grid squares were painted in with cobalt blue to cover the head image. A cadmium red paint composed of both selenium and cadmium was then used to paint a mask over both the head and the grid. Once dried, the painting was inserted into the vertical aluminum pipe of the reactor, as illustrated in Fig. 31, where it was exposed to a thermal neutron environment of 10^9 n s−1 cm−2 for 20 minutes. Following the activation, the radioactive painting was removed and a sequence of autoradiographs was made by placing it in direct contact with Polaroid type AR positive transparency film. Four films were exposed at increasing delay times after neutron activation, as shown in Table 2. Note that as time passes, the radioactivity decays and requires much longer exposure times to capture an image. The four autoradiographs are shown in Fig. 33. The 15-minute image shows a predominance of cobalt in the

Table 2. Delay Times and Film Exposure Times

Time after Activation    Film Exposure Time
15 minutes               10 minutes
2 hours                  2 hours
4 days                   2 days
36 days                  21 days
Figure 33. Autoradiographs of the test painting in Fig. 32 made (a) 15 minutes, (b) 2 hours, (c) 4 days, and (d) 36 days after neutron activation.
Figure 34. Gamma-ray spectra (relative signal strength versus relative gamma-ray energy) of the painting of Fig. 32 taken at delay times of 15 minutes, 2 hours, 4 days, and 36 days; the labeled peaks include isotopes of Co, Al, Mn, Zn, Cd, and Se.
grid and blocks. Manganese in the head and cadmium in the mask are visible but less pronounced. The presence of the cadmium is also shown by its attenuation of beta rays from the cobalt. Gamma radiation emitted by the decay of isotopes provides another indication of the composition of the painting. The energy of a gamma ray from the decay of a particular isotope is diagnostic of the isotope. Immediately before exposing film to make an autoradiograph, the test painting in Fig. 32 was examined using a gamma-ray spectrometer. The results are shown in Fig. 34. The short-lived 60mCo in the grid and blocks, isotopes of Mn in the head, and isotopes of Cd in the mask are all evident. In addition, an isotope of Al is present. Aluminum stearate, present in most of the paints tested, is commonly used as a stabilizing agent in manufacturing modern oil paints.
Figure 33(b), after the 2-hour delay time, shows the disappearance of the grid/checkerboard image. The short-lived 60mCo is no longer active, as shown in the gamma-ray spectrum, but isotopes of manganese, cadmium, and zinc now predominate. The zinc was present in the white ground layer and produces a general fogging of most of the painting's surface. Figure 33(c) shows the radiograph after a 4-day delay, where the mask is the dominant feature of the image. As shown in the corresponding spectrum in Fig. 34, both cobalt and manganese have decayed, leaving the cadmium isotopes in the mask as the dominant species forming the radiographic image. Zinc and cobalt are indicated in the spectrum but are not manifested in the radiographic image. After a delay of 36 days, the gamma spectrum shows complete decay of the Cd isotopes and the emergence of isotopes of selenium, which is a component of the cadmium red pigment. Thus, the radiograph in Figure 33(d) shows features of the grid, blocks, and mask, but the head is not visible. The versatility of NAAR imaging, coupled with gamma spectroscopy, is clearly evident.
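The changing appearance of the four autoradiographs is governed by simple exponential decay with very different half-lives. The sketch below is illustrative: the half-lives are approximate published values for isotopes named in the text, while the relative starting activities are invented numbers chosen only to reproduce the qualitative ordering.

```python
import math

# (approximate half-life in hours, invented relative activity at t = 0)
isotopes = {
    "60mCo (grid and blocks)": (10.5 / 60, 300.0),
    "56Mn (head)":             (2.58,       40.0),
    "Cd isotopes (mask)":      (53.5,       20.0),   # 115Cd as representative
    "75Se (mask)":             (120 * 24,    3.0),
    "65Zn (ground)":           (244 * 24,    2.0),
}

def activity(a0, half_life_h, t_h):
    """Activity remaining after t hours of simple exponential decay."""
    return a0 * math.exp(-math.log(2) * t_h / half_life_h)

for delay_h in (0.25, 2, 4 * 24, 36 * 24):   # the delay times of Table 2
    dominant = max(isotopes.items(),
                   key=lambda kv: activity(kv[1][1], kv[1][0], delay_h))[0]
    print(f"delay {delay_h:7.2f} h -> dominant emitter: {dominant}")
# Cobalt dominates at 15 minutes, manganese at 2 hours, cadmium at 4 days,
# and selenium (with zinc) at 36 days, in line with Figs. 33 and 34.
```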
Examination by Digital Image Processing
Digital image processing has had as great an impact on museum imaging as it has had on other imaging applications. The ability to enhance contrast, segment histograms, apply convolution kernels, perform Fourier filtering and morphological analysis, and carry out multi-image processing has significantly improved the utility of all of the analytical imaging technologies used in the study of museum objects (66–72,74). One example demonstrates the utility of image processing for replacing the inconvenience of β radiography by simple techniques of digital video capture and optical analysis (29). Figure 35 shows images of a nineteenth-century Spanish ledger captured (a) in transmitted light and (b) in reflected light. The images were calibrated so that the pixel values correspond to reflectance values R and transmittance values T. The transmittance of image (a) is governed by the transmittance Tp of the paper and by the absorption coefficient K of the ink [T = f(Tp, K)]. Similarly, the reflectance of image (b) is a function of these same two variables [R = g(Tp, K)]. Kubelka–Munk theory provides the two functions f and g, so inversion allows processing the two images to generate an image showing only Tp, the transmittance of the paper. This reveals the watermark, with the visually obscuring ink digitally removed, shown as (c) in Fig. 35.

A unique and somewhat controversial application of digital image processing is in the stylistic analysis and authentication of works of art. Expert art historians can examine a painting visually to determine the artistic style of the work and thus identify the artist. Technical analysis often provides essential confirmation, but stylistic analysis by experts remains of key importance in museums. Some attempts have been made to develop quantitative computer analytical techniques for stylistic analysis to supplement the work of the experts. For example, optical character recognition (OCR) has been applied successfully as a supplemental tool used by
Figure 35. Watermark analysis (29): (a) Back lit; (b) Front lit; (c) Processed.
scribes to authenticate Hebrew manuscripts for which accurate laws of calligraphy have been established (72). However, other types of hand script, executed using less rigorously governed laws of calligraphy, are much more difficult to recognize by OCR software. One would also anticipate difficulties in attempts to quantify the stylistic features of artists. Nevertheless, several publications have made the attempt. Techniques applied to the analysis of painting style have included histogram analysis (1,74), geometric analysis and element juxtaposition (75–77), and statistical regression (78–81). Although a computer substitute for the expert art historian has not been achieved, studies of this kind, coupled with analytical imaging techniques, have provided insights into the working styles of many artists.

BIBLIOGRAPHY

1. See, for example, the journal, A. Hamber, J. Miles, and W. Vaughn, eds., Computers and the History of Art, Mansell, London and New York.
2. H. Besser, J. Am. Soc. Inf. Sci. 42, 589 (1991).
IMAGING SCIENCE IN ART CONSERVATION 3. C. Bard, in C. N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., IS&T, Springfield, VA, 1999, p. 431. 4. T. Suga, in C. N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., IS&T, Springfield, VA, 1999, pp. 586–587. 5. D. J. Waters, Report to The Commission on Preservation and Access, Washington, D.C., 1991. 6. J. Wallace, J. Imaging Technol. 17, 107 (1991). 7. Eastman Kodak Company, White Light Printing Methods, Kodak Technical Publication CIS-22, Eastman Kodak, Rochester, NY, 1979. 8. Eastman Kodak Company, Color Corrected Duplicates from Faded Color Transparencies Using Copy Negatives of Kodak Vericolor Internegative Films 4112 & 6011, Kodak Technical Publication CIS-28, Eastman Kodak, Rochester, NY, 1981. 9. R. Gschwind and F. Frey, J. Imag. Sci. Technol. 38, 513–519 (1994). 10. H. Besser, J. Am. Soc. Inf. Sci. 42, 589–596 (1991). 11. M. Ester, Visual Resourc. 7, 327–352 (1991). 12. H. Besser, Mus. Stud. J. 74–81 (Fall/Winter 1987). 13. L. MacDonald, Adv. Imaging 24–27 (Sept 1990). 14. A. Hamber, J. Miles, and W. Vaughan, eds., Comput. Hist. Art, Mansell, London and New York, 1989. 15. R. Baribeau, M. Rioux, and G. Godin, IEEE Trans. Pattern Anal. Mach. Intelligence 14, 263–269 (1992) and references therein. 16. J. M. Taylor and I. N. M. Wainwright, in Museums and Information, Manitoba Museum of Man and Nature, Winnipeg, 1990, pp. 151–154. 17. J. Trant, Inf. Serv. Use 15, 353–364 (1995). 18. A. E. Cawkell, Inf. Serv. Use 12, 301–325 (1992). 19. M. W. Smith, INSPEL 26, 212 (1992). 20. R. Thibadeau, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 21 April 1999, http://www.ecom. cmu.edu/preservation. 21. A. R. Calmes and E. A. Miller, SPIE 901, 61–64 (1988). 22. D. Saunders, Comput. Humanities 31, 153–167 (1998). 23. A. Hamber, Comput. Hist. Art 1, 3–19 (1991). 24. N. Ohta, J. Imaging Sci. Technol. 36, 63–72 (1992). 25. D. Y. Tzeng and R. S. Berns, Proc. 6th IS&T/SID Color Imaging Conf., 1998, p. 112. 26. C. Lahanier, Mikrochimica Acta 2, 245–254 (1991). 27. J. F. Mowery, Restaurator 12, 110–115 (1991). 28. J. S. Arney, J. M. Mauer, J. Imaging Sci. Technol. 38, 145–153 (1994). 29. J. S. Arney and D. Stewart, J. Imaging Sci. Technol. 37, 504–509 (1993). 30. J. S. Arney, D. Stewart, and R. A. Scharf, J. Imag. Sci. Technol. 39, 261–267 (1995). 31. B. Ford, I. MacLeod, and P. Haydock, Stud. Conserv. 39, 57–69 (1994). 32. R. Freere, 8th Triennial Meet. ICOM Comm. Conserv. Getty Conservation Institute, Los Angeles, 1987. 33. W. C. McCrone, J. Am. Inst. Conserv. 33, 101–104 (1994). 34. M. J. D. Low and N. S. Baer, Stud. Art Conserv. 22, 116–128 (1977). 35. W. C. McCrone, J. Int. Inst. Conserv.-Can. Group 7, 11–34 (1982). 36. J. -S. Tsang and R. H. Cunningham, J. Am. Inst. Conserv. 30, 163–177 (1991).
37. M. R. Derreck, E. F. Doehne, A. E. Parker, and D. C. Stulik, J. Am. Inst. Conserv. 33, 171–84 (1994). 38. M. H. McCormick-Goodhart, Working Group 8 of ICOM Committee for Conservation, Photogr. Rec. 1, 262–267 (1990) (a) Images provided by MOLART Research Committee, NWO, P.O. Box 93138, 2509 AC Den Haag, The Netherlands. 39. J. R. J. Van Asperen de Boer, Sci. Technol. Eur. Cult., Proc. Symp. 1989, 1991, pp. 278–83. 40. J. R. J. Van Asperen de Boer, Infrared Reflectography, Central Research Laboratory for Objects of Art and Science, Amsterdam, 1970. 41. J. J. Rorimer, Ultraviolet Rays in the Examination of Art, The Metropolitian Museum of Art, NY, 1931. 42. A. M. De Wild, The Scientific Examination of Pictures, G. Bell and Sons, London, 1929. 43. Eastman Kodak Company, Infrared and Ultraviolet Photography, Kodak Publication No. M-3, Rochester, NY, 1953. 44. J. M. Messinger II, J. Am. Inst. Conserv. 31, 267–274 (1992). 45. Eastman Kodak Company, Applied Infrared Photography, Kodak Publication No. M-28, Rochester, NY, p. 36 ff. 46. C. Hoeniger, J. Am. Inst. Conserv. 30, 115–124 (1991). 47. C. H. Olin and T. G. Carter, in IIC-American Group Technical Papers from 1968 through 1970, International Institute for Conservation-American Group, 1970, pp. 83–88. 48. R. M. Johnson and R. L. Feller, in Application of Science in Examination of Works of Art, Museum of Fine Arts, Boston, 1965, pp. 86–95. 49. P. Kubelka and F. Munk, Z. Tech. Physik. 12, 593 (1931). 50. E. Walmsley, C. Metzger, C. Fletcher, and J. K. Delaney, Stud. Conserv. 39, 217–231 (1994). 51. E. Walmsley, C. Fletcher, and J. Delaney, Stud. Conserv. 37, 120–131 (1992). 52. Images from ‘‘Looking at Paintings: A Guide to Technical Terms’’, by Dawson W. Carr and Mark Leonard, The J. Paul Getty Museum in association with British Museum Press, 1992, pp. 74–75. Technical images courtesy of the Straus Center for Conservation, Harvard University Art Museums. 53. D. Bertani et al., Stud. Conserv. 35, 113–116 (1990). 54. B. Manz, Adv. Imaging 44–47 (Oct. 1992). 55. J. R. Druzik, D. L. Glackin, D. L. Lynn, and R. Quiros, J. Am. Inst. for Conserv. 22, 49–56 (1982). 56. Images provided by DAMRI, CEA Saclay 91193 Gif-surYvette Cedex, France, Tel: 33 (0)1 69 08 27 09, Fax: 33 (0)1 69 08 95 29, [email protected] 57. Eastman Kodak Company, Medical Radiography and Photography, Kodak Publication, vol. 63-1, Rochester, NY, 1987. 58. A. A. Moss, Handbook for Museum Curators, B-4, Museums Association, London, 1954. 59. R. E. Alexander and R. H. Johnston, Int. Conf. Nondestructive Testing Conserv. Works Art I/3, 1–17, 1983. 60. M. E. Scharfe, D. M. Pai and R. L. Gruber, ‘‘Electrophotography’’, Imaging Processes and Materials, J. Sturg, V. Walworth and Allan Shepp ed., 8th Ed., Ch. 5 Published by Van Nostrand Reinhold, NY, 1989, pp. 135–180. 61. S. Miura, in 8th Triennial Meet. ICOM Comm. Conserv., Getty Conservation Institute, Los Angeles, 1987. 62. I. M. Watt, Electron Microscopy, Cambridge University Press, 1997, p. 288. 63. Joice Zucker, J. Am. Inst. Conserv. 38(3) (1999). 64. A. J. Kossolapov, 9th Triennial Meet. ICOM Comm. Conserv., Getty Conservation Institute, Los Angeles, Aug. 1990, pp. 44–46.
65 (a) E. V. Sayre and H. N. Lechtman, Stud. Conserv. 13, 161–185 (1968); (b) M. W. Ainsworth et al., Art and Autoradiography: Insights into the Genesis of Paintings by Rembrandt, Van Dyck, and Vermeer, The Metropolitan Museum of Art, NY, 1982; (c) C. O. Fischer et al., Nucl. Instrum. Methods A 424, 258–262 (1999); (d) Images and supporting text provided by Ward Laboratory, Cornell University, Ithaca, NY 14853-7701. 66. D. L. Glackin and E. P. Korsmo, Jet Propulsion Laboratory, Final Report 83-75, JPL Publications, Pasadena, 1983. 67. J. R. Druzik, D. Glackin, D. Lynn, and R. Quiros, 10th Annu. Meet. Am. Inst. Conserv., 1982, pp. 71–72. 68. E. J. Wood, Textile Res. J. 60, 212–220 (1990). 69. F. Heitz, H. Maitre, and C. DeCouessin, IEEE Trans. Acous., Speech Signal Process. 38, 695–704 (1990). 70. J. Sobus, B. Pourdeyhimi, B. Xu, and Y. Ulcay, Textile Res. J. 62, 26–39 (1992). 71. K. Knox, R. Johnston, and R. L. Easton Jr., Opt. Photonics News 8, 30–34 (1997). 72. L. Likforman-Sulem, H. Maitre, and C. Sirat, Pattern Recognition 24, 121–137 (1991). 73. E. Lang, and D. Watkinson, Conserv. News 47, 37–39 (1992). 74. F. Su, OE Reports SPIE 99, 1,8–(1992). 75. J. L. Kirsch and R. A. Kirsch, Leonardo 21, 437–444 (1988). 76. J. F. Asmus, Opt. Eng. 28, 800–804 (1989). 77. R. Sablatnig, P. Kammerer, and E. Zolda, Proc. 14th Int. Conf Pattern Recognition, 1998, pp. 172–174. 78. L. R. Doyle, J. J. Lorre, and E. B. Doyle, Stud. Conserv. 31, 1–6 (1986). 79. J. Asmus, Byte Magazine March (1987). 80. P. Clogg, M. Diaz-Andreu, and B. Larkman, J. Archaeological Sci. 27, 837 (2000). 81. P. Clogg and C. Caple, Imaging the Past, British Museum Occasional Paper, London, 1996, p. 114.
IMAGING SCIENCE IN ASTRONOMY

JOEL H. KASTNER
Rochester Institute of Technology
Rochester, NY
INTRODUCTION

The vast majority of information about the universe is collected via electromagnetic radiation. This radiation is emitted by matter distributed across tremendous ranges in temperature, density, and chemical composition. Thus, more than any other science, astronomy depends on innovative methods to extend image taking to new, unexplored regions of the electromagnetic spectrum. To bring sufficient breadth and depth to their studies, astronomers also require imaging capability across a vast range in spatial resolution and sensitivity, with emphasis on achieving the highest possible resolution and signal gain in a given wavelength regime. This simultaneous quest for better wavelength coverage and ever higher spatial resolution and sensitivity represents the driving
force for innovation and discovery in astronomical imaging.

Classical astronomy — for example, the search for new solar system objects and the classification of stars — is still largely conducted in the optical wavelength regime (400–700 nm). This has been the case, of course, since humans first imagined the constellations, noted the appearance of ''wandering stars'' (planets), and recorded the appearance of transient phenomena such as comets and novae. During the latter half of the twentieth century, however, a revolution in astronomical imaging took place (1). This relatively brief period in recorded history saw the development and rapid refinement of techniques for collecting and detecting electromagnetic radiation across a far broader wavelength range, from the radio through γ rays. Just as these techniques have reached maturation, astronomers have also developed the means to surmount apparently fundamental physical barriers placed on image quality, such as the distorting effects of refraction by Earth's atmosphere and diffraction by a single telescope of finite aperture. The accelerating pace of these innovations has resulted in deeper understanding of, and heightened appreciation for, both the rich diversity of astrophysical phenomena and the fundamental, unsolved mysteries of the cosmos.

OPENING THE WINDOWS: MULTIWAVELENGTH IMAGING

For most of us, our eyes provide our first, fundamental contact with the universe. It is interesting to ponder how humans would conceive of the universe if we had nothing more in the way of imaging apparatus at our disposal, as was the case for astronomers before Galileo. In contrast to the complex cosmologies currently pondered in modern physics, most of which involve an expanding universe shadowed by the afterglow of the Big Bang, the ''first contact'' provided by our eyes produces a model of the universe that is entirely limited to the Sun, Moon, and planets, the nearby stars, and the faint glow of the collective background of stars in our own Milky Way galaxy and a handful of other, nearby galaxies. From this simple thought experiment, it is clear that the bulk of the visible radiation arriving at Earth is emitted by stars. But the apparent predominance of visible light from the Sun and nearby stars is in fact merely an accident of our particular position in the universe, combined with the evolutionary adaptation that gave our eyes maximal sensitivity at wavelengths of electromagnetic radiation that are near the maximum of the Sun's energy output. The Sun provides by far the majority of the visible radiation arriving at Earth strictly by virtue of its proximity. The brightest star in the night sky, Sirius (in the constellation Canis Major), actually has an intrinsic luminosity about 50 times larger than that of the Sun, but is about 8.6 light years distant (a light year is the distance traveled by light in one year, 9 × 10^12 km; the Sun is about 8 light minutes from Earth). In turn, Sirius is only about one-ten-thousandth as luminous as the star Rigel (in the neighboring constellation Orion),
but Sirius appears several times brighter than Rigel because it is about 50 times closer to us. Like the Sun, which has a surface temperature of about 6,000 K, most of the brightest stars have surfaces within the range of temperatures across which hot objects radiate very efficiently (if not predominantly) in the visible region. Representative stellar surface temperatures are 3,000 K for reddish Betelgeuse, a red supergiant in Orion; 10,000 K for Sirius; and 15,000 K for the blue supergiant Rigel (Fig. 1).

Thermal Continuum Emission

The tendency of objects at the temperatures of the Sun and stars to emit in the visible can be understood to first order via Planck's Law, which describes the wavelength dependence of radiation emitted by a perfect blackbody. The peak of the Planck function lies within the visible regime for an object at a temperature of 6,000 K. This same fundamental physical principle tells us that objects much hotter or cooler than the Sun should radiate predominantly at wavelengths much shorter or longer than visible, respectively. Indeed, for a perfect blackbody, the peak wavelength of radiation is given by Wien's displacement law (2),

λ(cm) ∼ 0.51/T(K).    (1)

This relationship between the temperatures of objects and the wavelengths of their emergent radiation allows us to understand why Betelgeuse appears reddish and Rigel appears blue (Fig. 2).
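Equation (1) makes the color difference quantitative; a minimal sketch that simply evaluates the article's form of Wien's law at the quoted surface temperatures (the conversion to nanometers is the only added step):

```python
# Peak wavelength from Eq. (1): lambda(cm) ~ 0.51 / T(K), converted to nm.
def peak_wavelength_nm(T_kelvin):
    return 0.51 / T_kelvin * 1e7   # 1 cm = 1e7 nm

for star, T in [("Betelgeuse", 3000), ("Sun", 6000), ("Sirius", 10000), ("Rigel", 15000)]:
    print(f"{star:10s} T = {T:6d} K -> peak near {peak_wavelength_nm(T):5.0f} nm")
# Betelgeuse peaks in the near-IR (about 1,700 nm), the Sun near 850 nm
# (just beyond the red end of the visible by this approximate formula),
# and Rigel in the ultraviolet (about 340 nm), which is why Betelgeuse
# looks reddish and Rigel blue.
```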
Figure 2. Wide-field photograph of Orion, illustrating the difference in color between the relatively cool star Betelgeuse (upper left) and the hot star Rigel (lower right). The large, red object at the lower center of the image, just below Orion’s belt, is the Orion Nebula (see Fig. 7). (Photo credit: Till Credner, AlltheSky.com) See color insert.
The same, simple relationship also provides powerful insight into astrophysical processes that occur across a very wide range of energy regimes (Fig. 3). The lowest energies and hence longest (radio) wavelengths reveal ‘‘cold’’ phenomena, such as emission from dust and gas in optically opaque clouds distributed throughout interstellar space in our galaxy. At the highest energies and hence shortest wavelengths (characteristic of X rays and γ rays), astronomers probe the ‘‘hottest’’ objects, such as the explosions of supermassive stars or the last vestiges of superheated material that is about to spiral into a black hole.
Figure 1. The Hertzsprung–Russell diagram (intrinsic brightness relative to the Sun versus stellar surface temperature in K). The diagram shows the main sequence (Sun-like stars that are fusing hydrogen to helium in their cores), red giants, supergiants, and white dwarfs. In addition, the positions of the Sun, the twelve brightest stars visible from the Northern Hemisphere, and the white dwarf companions of Sirius and Procyon are indicated [Source: NASA (http://observe.ivv.nasa.gov/nasa/core.shtml.html)]. See color insert.
Nonthermal Continuum Emission

Certain radiative phenomena in astrophysics do not strongly depend on the precise temperature of the material and are instead sensitive probes of material density and/or chemical composition (3,4). For example, the emission from ''jets'' ejected from supermassive black holes at the centers of certain galaxies (Fig. 4) is said to be
Figure 3. Schematic diagram showing various regimes of the electromagnetic spectrum in terms of temperatures corresponding to emission in that regime. The diagram also illustrates the wavelength ''niches'' of NASA's four orbiting ''Great Observatories.'' [Source: NASA/Chandra X-Ray Center (http://chandra.harvard.edu)]. See color insert.

''nonthermal'' because its source is high-velocity electrons that orbit around magnetic field lines. Other, similar examples are the emission from filaments of ionized gas located near the center of our own galaxy and from the chaotic remnant of the explosion of a massive star in 1054 A.D. (the ''Crab Nebula''). Such so-called ''synchrotron radiation'' often dominates radiation emitted in the radio wavelength regime (Fig. 5). Indeed, if human eyes were sensitive to radio rather than to visible wavelengths, the early mariners probably would have navigated by the Galactic Center and the Crab because they appear from Earth as the brightest stationary radio continuum sources in the northern sky. The synchrotron emission from the Crab is particularly noteworthy; it can be detected across a very broad wavelength range from radio through X ray (Fig. 6).

Monochromatic (''Line'') Emission and Absorption

Deducing Chemical Compositions. Astronomers use electronic transitions of atoms (as well as electronic, vibrational, and rotational transitions of molecules) as Rosetta stones to understand the chemical makeup of gas in a wide variety of astrophysical environments. Because each element or molecule radiates (and absorbs radiation) at a discrete and generally well-determined set of wavelengths — specified by that element's particular subatomic structure — detection of an excess (or deficit) of
emission at one of these specific wavelengths1 is both necessary and sufficient to determine the presence of that element or molecule. Hence, our knowledge of the origin and evolution of the elements that make up the universe is derived from astronomical spectroscopy (which might also be considered multiband, one-dimensional imaging). Spectra obtained by disparate means across a very broad range of wavelengths can be used to ascertain both chemical compositions and physical conditions (i.e., temperatures and densities) of astronomical sources because the emissive characteristics of a given element depend on the physical conditions of the gas or dust in which it resides. For example, cold (100 K), largely neutral hydrogen gas emits strongly in the radio at 21 cm, whereas hot (10,000 K), largely ionized hydrogen gas emits at a series of optical wavelengths (known as the Balmer series). The former conditions are typical of the gas that permeates interstellar space in our own galaxy and in external galaxies, and the latter conditions are typical of gas in the proximity of very hot stars, which are sources of ionizing ultraviolet light. Such ionized gas also tends to glow brightly in the emission lines of heavier elements such as oxygen, nitrogen, sulfur, and iron (Fig. 7).

1 Such spectral features are called ''lines'' because they appeared as dark lines in early spectra of the Sun.
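Because each species emits at a well-determined set of rest wavelengths, identification amounts to matching an observed wavelength against a table of known transitions. The toy sketch below uses a few well-known lines with approximate laboratory wavelengths; the tolerance and the observed value are arbitrary illustrative choices:

```python
# A few well-known rest wavelengths, in nanometres (approximate values).
KNOWN_LINES = {
    656.3: "H-alpha (Balmer series)",
    500.7: "[O III] (doubly ionized oxygen)",
    486.1: "H-beta (Balmer series)",
    21.106e7: "H I 21 cm (cold neutral hydrogen)",   # 21.106 cm expressed in nm
}

def identify(observed_nm, tolerance_nm=0.5):
    """Return candidate identifications for an observed emission line."""
    return [name for rest, name in KNOWN_LINES.items()
            if abs(observed_nm - rest) <= tolerance_nm]

print(identify(656.4))   # ['H-alpha (Balmer series)']
```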
Figure 4. At a distance of 11 million light years, Centaurus A is the nearest example of a so-called ‘‘active galaxy.’’ This radio image shows opposing ‘‘jets’’ of high energy particles blasting out from its center [Source: National Radio Astronomy Observatory (NRAO)]. See color insert.
Figure 5. The Crab Nebula is the remnant of a supernova explosion that was seen from the earth in 1054 A.D. It is 6,000 light years from Earth. This radio image shows the complex arrangement of gas filaments left in the wake of the explosion (Source: NRAO). See color insert.
Deducing Radial Velocities from Spectral Lines. Atomic and molecular emission lines also serve as probes of bulk motion. If a given source has a component of velocity along our line of sight, then its emission lines will be
Figure 6. X-ray image of the innermost region of the Crab Nebula. This image covers a field of view about one-quarter that of the radio image in the previous figure. The image shows tilted rings or waves of high-energy particles that appear to have been flung outward across a distance of a light year from the central star (Source: Chandra X-Ray Center). See color insert.
Figure 7. Color mosaic of the central part of the Great Nebula in Orion, obtained by the Hubble Space Telescope. Light emitted by ionized oxygen is shown as blue, ionized hydrogen emission is shown as green, and ionized nitrogen emission as red. The sources of ionization of the nebula are the hot, blue-white stars of the young Trapezium cluster, which is embedded in nebulosity just left of center in the image (Source: NASA and C.R. O’Dell and S.K. Wong). See color insert.
Doppler shifted away from the rest wavelength. The absorption or emission lines of sources that approach
Figure 8. Plot of recession velocity vs. distance [in megaparsecs (Mpc); 1 Mpc ≈ 3 × 10^19 km] for a sample of galaxies. This figure illustrates that, to very high accuracy, the recession velocity of a distant galaxy, as measured from its redshift, is directly proportional to its distance. This correlation was first established in 1929 by Edwin Hubble and underpins the Big Bang model for the origin of the Universe (Figure courtesy Edward L. Wright, 1996).
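The proportionality shown in Fig. 8 can be inverted to turn a measured redshift into a rough distance. A minimal sketch; the Hubble-constant value is an assumption of roughly 70 km/s per Mpc (this article does not quote a value), and the observed wavelength is an invented example:

```python
C_KM_S = 2.998e5   # speed of light, km/s
H0 = 70.0          # assumed Hubble constant, km/s per Mpc (not from the article)

def recession_velocity_km_s(observed_nm, rest_nm):
    """Low-redshift Doppler approximation: v = c * (observed - rest) / rest."""
    return C_KM_S * (observed_nm - rest_nm) / rest_nm

def hubble_distance_mpc(velocity_km_s):
    return velocity_km_s / H0

# Invented example: H-alpha (rest 656.3 nm) observed at 667.0 nm in a galaxy.
v = recession_velocity_km_s(667.0, 656.3)
print(f"v ~ {v:.0f} km/s, distance ~ {hubble_distance_mpc(v):.0f} Mpc")
```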
us are shifted to shorter wavelengths and are said to be ''blueshifted,'' whereas the lines of sources moving away from us are shifted to longer wavelengths and are said to be ''redshifted.'' The observation by Hubble in 1929 that emission lines of distant galaxies are uniformly redshifted and that these redshifts increase monotonically as the distances of the galaxies increase underpins modern theories of the expansion of the universe2 (Fig. 8). Images obtained at multiple wavelengths that span the rest wavelength of a bright spectral line can allow astronomers to deduce the spatial variation of line-of-sight velocity for a source whose velocity gradients are large. Such velocity mapping, which is presently feasible at wavelengths from the radio through the optical, helps elucidate the three-dimensional structure of sources (Fig. 9).

Multiwavelength Astronomical Imaging: An Example

Planetary nebulae represent the last stages of dying, Sun-like stars. These highly photogenic nebulae are formed after the nuclear fuel at the core of a Sun-like star has been spent, that is, the bulk of the core hydrogen has been converted to helium. The exhaustion of core hydrogen and the subsequent nuclear fusion, in concentric shells, of hydrogen into helium and helium into carbon around the
2 In practice, all astrophysical sources that emit line radiation — even those within our solar system — will appear Doppler shifted, due, for example, to the Earth's motion around the Sun. Hence it is necessary to account properly for ''local'' sources of Doppler shifts when deducing the line-of-sight velocity component of interest.
spent core causes the atmosphere of the star to expand, forming a red giant. Although the extended atmospheres of red giants are ''cool'' enough (∼3,000 K) for dust grains to condense out of the stellar gas, red giant luminosities can be huge (more than 10,000 times that of the Sun). This radiant energy pushes dust away from the outer atmosphere of the star at speeds of 10–20 km s−1. The outflowing dust then collides with and accelerates the gas away from the star, as well. Eventually enough of the atmosphere is removed so that the hot, inert stellar core is revealed. This hot core is destined to become a fossil remnant of the original star: a white dwarf. But before the ejected atmosphere departs the scene entirely, it is ionized by the intense ultraviolet light from the emerging white dwarf, which has cooled from core nuclear fusion temperatures (10^7 to 10^8 K) to a ''mere'' 10^5 K or so. The ionizing radiation from the white dwarf causes the ejected gas to fluoresce, thereby producing a planetary nebula. Because the varied conditions that characterize the evolution of planetary nebulae result in a wide variety of phenomena in any given nebula, such objects demand a multiwavelength approach to imaging. A case in point is the young planetary nebula BD +30° 3639 (Fig. 10). This planetary nebula emits strongly at wavelengths ranging from radio through X ray. The Chandra X-ray image shows a region of X-ray emission that seems to fit perfectly inside the shell of ionized and molecular gas seen in Hubble Space Telescope images and in other high-resolution images obtained from the ground. The optical and X-ray emitting regions of BD +30° 3639, which lies about 5,000 light years away, are roughly 1 million times the volume of our solar system. The X-ray emission apparently originates in thin gas that is heated by collisions between the ''new'' wind blown by the white dwarf, which is seen at the center of the optical and infrared images, and the ''old,'' photoionized red giant wind, which appears as a shell of ∼10,000 K gas surrounding the ''hot bubble'' of X-ray emission.

REQUIREMENTS AND LIMITATIONS

To understand the requirements placed on spatial resolution and sensitivity in astronomical imaging, we must consider the angular sizes and energy fluxes of astronomical objects and phenomena of interest. In turn, there are three fundamental sources of limitation on the resolution and limiting sensitivity (and hence quality) of astronomical images: the atmosphere, the telescope, and the detector.

Spatial Resolution

Requirements: Angular Size Scales of Astronomical Sources. Figure 11 shows schematically typical scales of physical size and distance from Earth for representative objects and phenomena studied by astronomers. Most of the objects of intrinsically small size, like the Sun, Moon, and the planets in our solar system, lie at small distances; we can study these small objects in detail only because they are relatively close, such that their angular sizes are substantial.
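The angular scales just described follow from the small-angle relation θ = (physical size)/(distance). A short sketch with round, illustrative numbers (not values taken from the article), comparing a Sun-sized star to a planetary nebula:

```python
ARCSEC_PER_RADIAN = 206265.0
LY_KM = 9.46e12            # kilometres per light year

def angular_size_arcsec(diameter_km, distance_ly):
    """Small-angle approximation: theta = diameter / distance."""
    return diameter_km / (distance_ly * LY_KM) * ARCSEC_PER_RADIAN

# A Sun-sized star (diameter ~1.4e6 km) at 10 light years versus a
# planetary nebula roughly a light year across at 5,000 light years.
print(f"star:   {angular_size_arcsec(1.4e6, 10):.5f} arcsec")
print(f"nebula: {angular_size_arcsec(1.0 * LY_KM, 5000):.1f} arcsec")
# The star subtends a few thousandths of an arcsecond, far below the
# roughly 1-arcsecond blurring imposed by the atmosphere (discussed
# below), while the nebula spans tens of arcseconds and is easily resolved.
```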
Figure 9. Radio maps of the Egg Nebula, a dying star in the constellation Cygnus, showing emission from the carbon monoxide molecule. At the lower left is shown blueshifted CO emission, and at the lower right redshifted emission; the upper right panel shows the total intensity of CO emission from the source. One interpretation for the localized appearance of the blueshifted and redshifted CO emission is that the Egg Nebula is the source of a complex system of ‘‘molecular jets,’’ shown schematically in the top left panel. Such jets may be quite common during the dying stages of Sun-like stars [Source: Lucas et al. 2000 (5)]. See color insert.
Within our own Milky Way galaxy, we observe objects that span a great range of angular size scales. The angular size of a Sun-like star at even a modest distance makes such stars a challenge to resolve spatially, even with the best available techniques. On the other hand, many structures of interest in our own Milky Way galaxy, such as star-forming molecular clouds and the expelled remnants of dying or expired stars, are sufficiently large that their angular sizes are quite large.3 Certain giant molecular clouds, planetary nebulae, and supernova remnants subtend solid angles similar to that of the Moon. Just as for stars, the angular sizes of external galaxies span a very wide range. The Magellanic Clouds, which are the nearest members of the Local Group of galaxies (of which the Milky Way is the most massive and luminous
3 The ejected envelopes of certain dying, sun-like stars were long ago dubbed ‘‘planetary nebulae’’ because their angular sizes and round shapes resembled the planets Jupiter and Saturn.
member), are detectable and resolvable by the naked eye, whereas the Andromeda galaxy (a Local Group member that is a near-twin to the Milky Way) is detectable and resolvable with the aid of binoculars. The angular sizes of intrinsically similar galaxies in more distant galaxy clusters span a range similar to that of the planets in our solar system. The luminous cores of certain distant galaxies (‘‘quasars’’) — which can outshine their host galaxies — likely have sizes only on the order of that of our solar system; yet these are some of the most distant objects known, and hence quasars are exceedingly small in angular size. Galaxy clusters themselves are of relatively large angular size, simply by virtue of their enormous size scales; indeed, such clusters (and larger scale structures that consist of clusters of such clusters) probably represent the largest gravitationally bound structures in the universe. At still larger size scales lies the cosmic background radiation, the radiative remnant of the Big Bang itself. This radiation encompasses 4π steradians and has only very subtle variations in intensity with position across the sky.
Figure 10. Optical (left), infrared (center), and X-ray (right) images of the planetary nebula BD +30° 3639 [Source: Kastner et al. 2000 (6)]. The optical image was obtained by the Wide Field/Planetary Camera 2 aboard the Hubble Space Telescope in the light of doubly ionized sulfur at a wavelength of 9,532 Å. The infrared image was obtained by the 8-meter Gemini North telescope at a wavelength of 2.2 µm (also referred to as the infrared K band). The X-ray image was obtained by the Advanced CCD Imaging Spectrometer aboard the Chandra X-Ray Observatory and covers the wavelength range from ∼7 Å to ∼30 Å. Images are presented at the same spatial scale. See color insert.
Of course, even within our solar system, there are sources of great interest (e.g., the primordial, comet-like bodies of the Kuiper Belt) that are sufficiently small that they are unresolvable by present imaging techniques. Sources of large angular sizes (such as molecular clouds, planetary nebulae, supernova remnants, and galaxy clusters) typically show a great wealth of structural detail when imaged at high spatial resolution. Thus, our knowledge of objects at all size and distance scales improves with any increase in spatial resolving power at a given wavelength.

Limitations
Atmosphere. Time- and position-dependent refraction by turbulent cells in the atmosphere causes astronomical point sources, such as stars, to ‘‘scintillate’’; i.e., stars twinkle. Scintillation occurs when previously plane-parallel wave fronts from very distant sources encounter atmospheric cells and become distorted. Astronomers use the term ‘‘seeing’’ to characterize such atmospheric image distortion; the ‘‘seeing disk’’ represents the diameter of an unresolved (point) source that has been smeared by atmospheric distortion. Seeing varies widely from site to site, but optical seeing disks at visual wavelengths are typically not smaller than (that is, the seeing is not better than) ∼1 arcsecond at most mountaintop observatories.

Telescope. The diameter of a telescope places a fundamental limitation on the angular resolution at a given wavelength. Specifically, the limiting angular resolution (in radians) is given by

θ ≈ 1.2 λ/d,    (2)
where θ is the angle subtended by a resolution element, λ is the wavelength of interest, and d is the telescope diameter. This relationship follows from consideration of simple interference effects of wave fronts incident
on a circular aperture, in direct analogy to plane-parallel waves of wavelength λ incident on a single slit of size d. The resulting intensity distribution for a point source (known as the ‘‘point-spread function’’) is in fact a classical diffraction pattern, a central disk (the ‘‘Airy disk’’) surrounded by alternating bright and dark annuli. In ground-based optical astronomy using large telescopes, atmospheric scintillation usually dominates over telescope diffraction (that is, the ‘‘seeing disk’’ is much larger than the ‘‘Airy disk’’), and such a diffraction pattern is not observed. However, in space-based optical astronomy or in ground-based infrared and radio astronomy, diffraction represents the fundamental limitation on spatial resolution.
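As a quick numerical check of Eq. (2), the sketch below compares the diffraction limit of a few apertures with the ∼1 arcsecond seeing disk quoted above. The code is not from the original article, and the telescope diameters and wavelengths are illustrative assumptions.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0

def diffraction_limit_arcsec(wavelength_m, diameter_m):
    """Limiting angular resolution from Eq. (2): theta ~ 1.2 * lambda / d (radians), in arcseconds."""
    return 1.2 * wavelength_m / diameter_m * ARCSEC_PER_RAD

# Illustrative cases (assumed values):
print(diffraction_limit_arcsec(550e-9, 8.0))   # 8-m optical mirror      -> ~0.017 arcsec
print(diffraction_limit_arcsec(550e-9, 2.4))   # 2.4-m space telescope   -> ~0.06 arcsec
print(diffraction_limit_arcsec(0.06, 25.0))    # 6-cm radio wave, 25-m dish -> ~590 arcsec
```

Because 0.017 arcsecond is far smaller than typical ∼1 arcsecond seeing, a ground-based 8-meter telescope at visual wavelengths is seeing-limited rather than diffraction-limited, exactly as stated above.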
Detector. Charge-coupled devices (CCDs) have been actively used in optical astronomy for more than two decades. During this period, CCD pixel sizes have steadily decreased, and array formats have steadily grown. As a result, CCD pixels have remained small enough to sample images finely while the arrays still maintain good spatial coverage. Detector array development at other wavelength regimes lags behind the optical, to various degrees, in number and spacing of pixels. However, almost all regimes, from X ray to radio, now employ some form of detector array. Sizes range from the suite of ten 1,024 × 1,024 X-ray-sensitive CCDs aboard the orbiting Chandra X-Ray Observatory to the 37- and 91-element bolometer arrays used for submillimeter-wave imaging by the James Clerk Maxwell Telescope on Mauna Kea. These devices have a common goal of achieving a balance between optimal (Nyquist) sampling of the point-spread function and maximal image (field) size.

Sensitivity

Requirements: Energy Fluxes of Astronomical Sources. Astronomical sources span an enormous range of intrinsic luminosity. Figure 12 readily shows that the least luminous objects known tend to be close to Earth (e.g.,
Figure 11. Physical radii vs. distances (from Earth) for representative astronomical sources (7). One astronomical unit (AU) is the Earth–Sun distance (1.5 × 10⁸ km). A light year is the distance traveled by light in one year (9 × 10¹² km). Represented in the figure are objects within our own solar system, the nearby Sun-like star α Cen, the red supergiant Betelgeuse, the pulsar at the center of the Crab Nebula supernova remnant, a typical circumstellar debris disk (‘‘CS disk’’), a typical planetary nebula (the Ring Nebula), the supernova remnant Cas A, the galactic giant molecular cloud located in the direction of the constellation Cygnus (‘‘Cygnus GMC’’), the nearby Andromeda galaxy (M31), the quasar 3C 273, and the Virgo cluster of galaxies. Diagonal lines represent lines of constant angular size, and angular size decreases from upper left to lower right.
small asteroids in the inner solar system), and the most luminous sources known (e.g., the central engines of active galaxies or the primordial cosmic background radiation) are also the most distant. This tendency to detect intrinsically more luminous sources at greater distances follows directly from the expression for energy flux received at Earth,

F = L/(4πD²),    (3)

where F is the flux, L is luminosity, and D is distance. Thus an astronomical imaging system that has a limiting sensitivity F ≥ Fl penetrates to a limiting distance,

Dl ≤ √[L/(4πFl)],    (4)
for sources of uniform luminosity L. Real samples (of, e.g., stars or galaxies), of course, may include a wide range of intrinsic luminosities. As a result, there tends to be strong selection bias in astronomy, such that the number and/or significance of intrinsically faint objects tends to be underestimated in any sample of sources selected on the basis of minimum flux. For this reason in particular, astronomers require increasingly sensitive imaging systems. To calibrate detected fluxes properly, such systems must still retain good dynamic range, so that the intensities of faint sources can be accurately referenced to the intensities of bright, well-calibrated sources. In addition, because a given source of extended emission may display a wide variation in surface brightness, a combination of high sensitivity and good dynamic range frequently is required to characterize
Figure 12. Intrinsic luminosities vs. distances (from Earth) for representative astronomical sources; symbols are the same as in the previous figure. Luminosities are expressed in solar units, where the solar luminosity is 4 × 10³³ erg s⁻¹. Diagonal lines represent lines of constant apparent brightness, and apparent brightness decreases from upper left to lower right.
source morphology adequately and, hence, deduce intrinsic source structure.
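Equations (3) and (4) are straightforward to evaluate. The following sketch is not from the original article; the luminosities and the survey sensitivity are invented round numbers in arbitrary but consistent units, chosen only to show that the limiting distance grows as the square root of the source luminosity, which is the origin of the selection bias described above.

```python
import math

def flux(luminosity, distance):
    """Eq. (3): energy flux received at distance D from a source of luminosity L."""
    return luminosity / (4.0 * math.pi * distance ** 2)

def limiting_distance(luminosity, limiting_flux):
    """Eq. (4): maximum distance at which a source of luminosity L is still detected."""
    return math.sqrt(luminosity / (4.0 * math.pi * limiting_flux))

F_lim = 1.0e-12                      # assumed survey sensitivity (arbitrary units)
for L in (1.0, 1.0e2, 1.0e4):        # sources spanning four decades in luminosity
    print(f"L = {L:8.1e}  ->  D_lim = {limiting_distance(L, F_lim):10.3e}")
```

A source 10,000 times more luminous is detectable only 100 times farther away, so any flux-limited sample becomes increasingly dominated by luminous objects at large distances.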
Limitations

Atmosphere. The Earth’s atmosphere attenuates the signals of most astronomical sources. Signal attenuation is a function of both the path length through the atmosphere between the source and telescope and the atmosphere’s intrinsic opacity at the wavelength of interest. Atmospheric attenuation tends to be smallest at optical and longer radio wavelengths, at which the atmosphere is essentially transparent. Attenuation is largest at very short (γ ray, X ray and UV) wavelengths, where the atmosphere is essentially opaque; attenuation is also large in the infrared. In the infrared regime especially, atmospheric transparency depends strongly on wavelength because the main source of opacity is absorption by molecules (in particular, water vapor). The atmosphere also is a source of ‘‘background’’ radiation at most wavelengths, particularly in the thermal infrared and far-infrared (2 µm ≤ λ ≤ 1 mm), at which
most of the blackbody radiation of the atmosphere emerges. This background radiation tends to limit the signal-to-noise ratio of infrared observations for which other noise sources (such as detector noise) are minimal. Elimination of thermal radiation from the atmosphere provides a primary motivation for the forthcoming Space Infrared Telescope Facility (SIRTF), the last in NASA’s line of Great Observatories.
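Why the atmospheric background peaks in the thermal infrared can be seen from Wien's displacement law, λ_max ≈ 2898 µm·K / T. This law is not quoted in the article; it is standard blackbody physics, and the roughly 280 K atmospheric temperature used below is an assumed representative value.

```python
WIEN_CONSTANT_UM_K = 2898.0   # Wien displacement constant, micrometer-kelvins

def peak_wavelength_um(temperature_k):
    """Wavelength (micrometers) at which a blackbody of the given temperature emits most strongly."""
    return WIEN_CONSTANT_UM_K / temperature_k

print(peak_wavelength_um(280.0))   # ~10 micrometers: the atmosphere's own thermal emission
print(peak_wavelength_um(5800.0))  # ~0.5 micrometers: a Sun-like star, for comparison
```

A ∼10 µm peak falls squarely within the 2 µm–1 mm band cited above, which is why ground-based thermal-infrared imaging must contend with a bright, variable sky.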
Telescope. Sensitivity (or image signal-to-noise ratio) is directly proportional to the collecting area and efficiency of the telescope optical surfaces (‘‘efficiency’’ here refers to the fraction of photons incident on the telescope optical surface that are transmitted to the camera or detector).4 Reflecting telescopes supplanted refracting telescopes at the beginning of the twentieth century because large primary mirrors could be supported more easily than large
4 The product of telescope collecting area and efficiency is referred to as the effective area of the telescope.
objective lenses and the aluminized surface of a mirror provides nearly 100% efficiency at optical wavelengths. Furthermore, unlike lenses, paraboloid mirrors provide images that are free of spherical or chromatic aberrations. These same mirrors provide excellent efficiency and image quality in the near-infrared, as well. Parabolic reflectors are also used as the primary radiation collecting surfaces in the radio regime, where the requirements of mirror figure are less stringent (due to the relatively large wavelengths of interest).
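Footnote 4 defines the effective area as the product of collecting area and efficiency. The minimal sketch below is not from the article; the mirror diameters and efficiencies are assumed illustrative values.

```python
import math

def effective_area_m2(diameter_m, efficiency):
    """Effective area = geometric collecting area x throughput efficiency (see footnote 4)."""
    return math.pi * (diameter_m / 2.0) ** 2 * efficiency

# Assumed illustrative numbers:
small_scope = effective_area_m2(1.0, 0.90)   # 1-m mirror, 90% efficient
large_scope = effective_area_m2(8.0, 0.75)   # 8-m mirror with more optical surfaces in the path

print(small_scope, large_scope, large_scope / small_scope)
```

Since image signal-to-noise ratio scales directly with effective area, the hypothetical 8-meter telescope here collects roughly 50 times more useful light than the 1-meter one, despite its assumed lower efficiency.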
Detector. The photon counting efficiency of a detector and sources of noise within the detector also dictate the image signal-to-noise ratio. Photon counting efficiency is usually referred to as detector quantum efficiency (QE). Detector QEs at or higher than 80% are now feasible in many wavelength regimes; however, such high QE often comes at the price of the introduction of noise. Typical image noise sources are read noise, the inherent uncertainty in the signal readout of the detector, and dark signal, the signal registered by the detector in the absence of exposure to photons from an external source.

Surmounting the Obstacles

Beating the Limitations of the Atmosphere: Adaptive Optics and Space-Based Imaging. Adaptive optics techniques have been developed to mitigate the effects of atmospheric scintillation. In such systems, the image of a fiducial point source — either a bright star or a laser-generated artificial ‘‘star’’ — is continuously monitored, and these data are used to drive a quasi-real-time image correction system (typically a deformable or steerable mirror). Naturally — as has been demonstrated by the spectacular success of the refurbished Hubble Space Telescope — placement of the telescope above the Earth’s atmosphere provides the most robust remedy for the effects of atmospheric image distortion.

Beating the Limitations of Aperture: Interferometry. The diffraction limit of a single telescope can be surmounted by using two or more telescopes in tandem. This technique is referred to as ‘‘interferometry’’ because it uses the interference patterns produced by combination of light waves from multiple sources. Therefore, the angular resolution of such a multiple telescope system, at least in one dimension, is determined by the longest separation between telescopes, rather than by the aperture of a single telescope. However, it is generally not possible to ‘‘fill in’’ the gaps between two telescopes at large separation by using many telescopes at smaller separation. As a result, interferometry is generally limited to relatively bright sources, and interferometric image reconstruction techniques necessarily sacrifice information at low spatial frequencies (i.e., large-scale structure) in favor of recovering information at high spatial frequency (fine spatial structure). Interferometry has long been employed at radio wavelengths because recombination of signals from multiple apertures is relatively easy at long wavelengths. Indeed, the angular resolution achieved routinely at centimeter wavelengths by NRAO’s Very Large Array in New Mexico rivals or
exceeds that of optical imaging by the Hubble Space Telescope. Recently, however, several optical and infrared interferometers have been developed and successfully deployed; examples include the Navy Prototype Optical Interferometer at Anderson Mesa and the Infrared Optical Telescope Array on Mt. Hopkins, both in Arizona, and the optical interferometer operated at Mt. Wilson, California, by the Center for High Angular Resolution Astronomy.

Beating the Limitations of Materials: Mirror Fabrication. The sheer weight of monolithic, precision-ground mirrors and the difficulty of maintaining the requisite precise figures render them impractical for constructing telescope apertures larger than about 8 meters in diameter. Hence, during the late 1980s and early 1990s, two competing large mirror fabrication technologies emerged: spin-cast and segmented mirrors (Fig. 13). Both methods have yielded large mirrors whose apertures are far lighter and more flexible than previously feasible. The former method has yielded the 8-meter-class mirrors for facilities such as the twin Gemini telescopes, and the latter method has yielded the largest mirrors thus far, for the twin 10-meter Keck telescopes on Mauna Kea. It is not clear, however, that either technique can yield optical-quality mirrors larger than about 15 meters in diameter. An entirely different mirror fabrication approach is required at high energies because, for example, X rays are readily absorbed (rather than reflected) by aluminized glass mirrors when such mirrors are used at near-normal incidence. The collection and focusing of X-ray photons instead requires grazing incidence geometry to optimize efficiency and nested mirrors to optimize collecting surface (Fig. 14). The challenge now faced by high-energy astronomers is to continue to increase the effective area of such optical systems while meeting the strict weight requirements imposed by space-based observing platforms. It is not clear that facilities larger than the present Chandra and XMM-Newton observatories are practical given present fabrication technologies; indeed, Chandra was the heaviest payload ever launched aboard a NASA Space Shuttle.

THE SHAPE OF THINGS TO COME

Projects in Progress

At this time, several major new astronomical facilities are partially or fully funded and are either in design or under construction. All are expected to accelerate further the steady progress in our understanding of the universe. A comprehensive list is beyond the scope of this article; however, we mention a few facilities of note.
• The Space Infrared Telescope Facility (SIRTF): SIRTF is a modest-aperture (0.8 m) telescope equipped with instruments of extraordinary sensitivity for observations in the 3 to 170 µm wavelength regime. SIRTF features a powerful combination of sensitive, wide-field imaging and spectroscopy at low to moderate resolution over this wavelength range. It is well equipped to study (among many other things) primordial galaxies, newborn
Figure 13. Photo of the segmented primary mirror of the 10-meter Keck telescope (Photo credit: Andrew Perala and W.M. Keck Observatory). See color insert.

[Figure 14 schematic labels: X rays, doubly reflected by four nested paraboloids and four nested hyperboloids, converge on the focal surface 10 meters away; mirror elements are 0.8 m long and from 0.6 m to 1.2 m in diameter; field of view 5°.]
Figure 14. Geometry of the nested mirrors aboard the orbiting Chandra X-Ray Observatory [Source: NASA/Chandra X-Ray Center (http://chandra.harvard.edu)]. See color insert.
stars and planets, and dying stars because all of these phenomena emit strongly in the mid- to far-infrared. SIRTF has a projected 5-year lifetime and is expected to be deployed into its Earth-trailing orbit in 2002.
• The Stratospheric Observatory for Infrared Astronomy (SOFIA): SOFIA will consist of a 2.5-meter telescope and associated cameras and spectrometers installed aboard a Boeing 747 aircraft. SOFIA will be the largest airborne telescope in the world. Due to
its ability to surmount most of Earth’s atmosphere, SOFIA will make infrared observations that are impossible for even the largest and highest groundbased telescopes. The observatory is being developed and operated for NASA by a consortium led by the Universities Space Research Association (USRA). SOFIA will be based at NASA’s Ames Research Center at Moffett Federal Airfield near Mountain View, California. It is expected to begin flying in the year
2004 and will remain operational for two decades. Like SIRTF, SOFIA is part of NASA’s Origins Program, and hence its science goals are similar and complementary to those of SIRTF.
• The Atacama Large Millimeter Array (ALMA): ALMA will be a large array of radio telescopes optimized for observations in the millimeter wavelength regime and situated high in the Atacama desert in the Chilean Andes. Using a collecting area of up to 10,000 square meters, ALMA will feature roughly 10 times the collecting area of today’s largest millimeter-wave telescope arrays. Its telescope-to-telescope baselines will extend to 10 km, providing angular resolution equivalent to that of a diffraction-limited optical telescope whose diameter is 4 meters. ALMA observations will focus on emission from molecules and dust from very compact sources, such as galaxies at very high redshift and solar systems in formation.

Recommendations of the Year 2000 Decadal Review

The National Research Council, the principal operating arm of the National Academy of Sciences and the National Academy of Engineering, has mapped out priorities for investments in astronomical research during the next decade (8). The NRC study should not be used as the sole (or perhaps even primary) means to assess future directions in astronomy, but this study, which was funded by NASA, the National Science Foundation, and the Keck Foundation, does offer insight into some potential groundbreaking developments in multiwavelength astronomical imaging. Highest priority in the NRC study was given to the Next Generation Space Telescope (NGST). This 8-meter-class, infrared-optimized telescope will represent a major improvement on the Hubble Space Telescope in both sensitivity and spatial resolution and will extend space-based infrared imaging into the largely untapped 2–5 µm wavelength regime. This regime is optimal for studying the earliest stages of star and galaxy formation. NGST presently is scheduled for launch in 2007. Several other major initiatives were also deemed crucial to progress in astronomy by the NRC report. Development of the ground-based Giant Segmented Mirror Telescope was given particularly high priority. This instrument has as its primary scientific goal the study of the evolution of galaxies and the intergalactic medium. Other projects singled out by the NRC report include
• Constellation-X Observatory, a next-generation X-ray telescope designed to study the origin and properties of black holes;
• a major expansion of the Very Large Array radio telescope in New Mexico, designed to improve on its already unique contributions to the study of distant galaxies and the disk-shaped regions around stars where planets form;
• a large ground-based survey telescope, designed to perform repeated imaging of wide fields to search for both variable sources and faint solar-system
objects (including near-Earth asteroids and some of the most distant, undiscovered objects in the solar system); and
• the Terrestrial Planet Finder, a NASA mission designed to discover and study Earth-like planets around other stars.

BIBLIOGRAPHY

1. A. Sandage, Ann. Rev. Astron. Astrophys. 37, 445–486 (1999).
2. K. R. Lang, Astrophysical Formulae, 3rd ed., Springer-Verlag, Berlin, 1999.
3. G. B. Rybicki and A. P. Lightman, Radiative Processes in Astrophysics, John Wiley & Sons, Inc., NY, 1979.
4. D. Osterbrock, Astrophysics of Gaseous Nebulae and Active Galactic Nuclei, University Science Books, Mill Valley, 1989.
5. R. Lucas, P. Cox, and P. J. Huggins, in J. H. Kastner, N. Soker, and S. Rappaport, eds., Asymmetrical Planetary Nebulae II: From Origins to Microstructures, vol. 199, Astron. Soc. Pac. Conf. Ser., 2000, p. 285.
6. J. H. Kastner, N. Soker, S. Vrtilek, and R. Dgani, Astrophys. J. (Lett.) 545, 57–59 (2000).
7. C. W. Allen and A. N. Cox, Astrophysical Quantities, 4th ed., Springer-Verlag, Berlin, 2000.
8. C. McKee et al., Astronomy and Astrophysics in the New Millennium, National Academy Press, Washington, 2001.
IMAGING SCIENCE IN BIOCHEMISTRY
NICOLAS GUEX
TORSTEN SCHWEDE
MANUEL C. PEITSCH
Glaxo SmithKline Research & Development SA
Geneva, Switzerland
INTRODUCTION

Research in biology, aimed at understanding the fundamental processes of life, is both an experimental and an observational science. During the last century, all classes of biomolecules relevant to life have been discovered and defined. Consequently, biology progressed from cataloging species and their life styles to analyzing their underlying molecular mechanisms. Among the molecules required by life, proteins represent certainly the most fascinating class because they are the actual ‘‘working molecules’’ involved in both the processes of life and the structure of living beings. Proteins carry out diverse functions, including signaling and chemical communication (for example kinases and hormones), structure (keratin and collagen), transport of metabolites (hemoglobin), and transformation of metabolites (enzymes). In contrast to modeling and simulation, observation and analysis are the main approaches used in biology. Early biology dealt only with the observation of macroscopic phenomena, which could be seen by the naked eye. The development of microscopes permitted observation of the smaller members of the living kingdom and hence of the cells and the organelles they contain (Fig. 1). Observing
Figure 1. Relative sizes (indicative only) of atoms, molecules, and organisms, shown on a log scale of lengths from meter to angstrom with the corresponding wavelength of electromagnetic radiation. Imaging an object requires electromagnetic radiation of wavelength equal to or smaller than the object.
Figure 2. Representation of gene expression to produce a protein. A gene (top left) is transcribed into a corresponding mRNA (not shown). The latter is processed by a ribosome (not shown) which links amino acids together to form a polypeptidic chain (top right). The polypeptidic chain (bottom right) folds into a compact functional protein (bottom left).
macromolecules on an atomic level could not be achieved by visible light or electron microscopy, until recent advances allowed imaging the outer shapes of large molecular entities at low resolution (1,2). Consequently, indirect methods that reveal molecular organization in a crystal (X-ray crystallography) or interactions between atoms (NMR) have been developed and are routinely used. Thus, bear in mind that all images are not direct observations. Images in this chapter were carefully selected to give the reader a broad view of the various modes of representation used by scientists to help them unravel protein structure and function. Several programs to generate such images are available (see Software), and most of them can display or generate many of the different representations shown in this article. For a general introduction into principles of protein structure, see Ref. (3). Proteins are linear polypeptides built from 20 different building blocks called amino acids. The information needed to make a protein is encoded by DNA, which is, for eucaryotes, located in the cell’s nucleus. A segment of a DNA molecule, called a gene, is transcribed into an intermediate (mRNA), which is then processed by ribosomes to produce a protein (for details about this process, refer to a biology (4) or biochemistry textbook (5–7)). During this process, individual L-α-amino acids are linked together by peptide (amide) bonds to form a polypeptide (a continuous chain of amino acid residues, in which the carbonyl carbon atom of an amino acid is covalently linked to the nitrogen atom of the next amino acid). Then, this polypeptide folds into its final 3-D conformation to assume a specific function (Fig. 2).
There is no freedom of rotation around the peptidic bond [ω = 180° (trans) or 0° (cis)]; all of the conformational flexibility of the protein backbone is due to two rotatable bonds for each amino acid residue (the dihedral angles φ and ψ) (Fig. 3a). Most of the 20 natural amino acids have additional rotatable bonds in their sidechains. These dihedral angles are named χ1–χ5. Figure 3a is an abstraction in which each atom is represented by its chemical symbol (N, C, O, . . .) connected by lines symbolizing covalent bonds. This flat representation of structure, frequently seen in textbooks, does not reflect the spatial arrangement of atoms. Therefore, more realistic representations are used to reflect the three-dimensional nature of the compound. In Fig. 3b the tripeptide segment of Fig. 3a is drawn in ball-and-stick style, where positions of atoms are marked by small balls connected by sticks symbolizing the chemical bond. This reveals the geometry (distances and angles), but it does not reflect the space occupied by the atoms accurately and gives the erroneous impression that the atoms of molecules are not tightly packed. Plotting dotted spheres using the van der Waals radii of the respective atom can remedy this shortcoming (Fig. 3b). On the other hand, space-filling models (CPK, Corey–Pauling–Koltun) give a reasonable impression of the overall shape, even though the underlying chemical structure cannot be recognized easily (Fig. 3c). In all cases, those images are static projections and provide only a limited representation of the object from a single viewpoint. Therefore, the possibility of manipulating models of 3-D objects in real time proved essential for the development of the field. Indeed, the overall shape and orientation of an object are much easier to grasp when it is smoothly
Figure 3. Three residues (ala-lys-ala) in a polypeptide chain. (a) Schematic drawing of the chemical structure. ω dihedral angle values are restricted to 180° or 0°, but conformational flexibility is possible through the freely rotatable bonds around dihedral angles φ, ψ, and χ1–χ5. (b) Ball-and-stick model with the van der Waals radii of atoms shown as dotted spheres. (c) Space-filling model. Atoms are colored by chemical element (nitrogen, N, blue; carbon, C, gray; oxygen, O, red). See color insert.
rotated in real time or rendered illuminated with lights casting shadows (Fig. 4), than when it is only drawn in thin lines. However, those representations cannot replace the realism achieved through stereoscopic vision (Figs. 5 and 6). Various computer display methods and tools have been developed to ease the manipulation of 3-D objects. Indeed, it is always difficult to work with 2-D projections because they lack the ‘‘feeling’’ obtained by observing a
Figure 4. A cube presented in a wire frame does not allow the viewer to determine the sphere location precisely. It could be on a face or at the center (left). The same cube properly lit and rendered with shadows (right) gives a much clearer picture of the relative positions of both sphere and cube.
3-D environment (for example, it is easier to evaluate a distance or the slope of a mountain in a real environment than by looking at a map). Because proteins and molecules are 3-D objects, their representation directly benefits from these technologies. Common practice is to draw stereo pairs, where the left and right images are rotated by −3° and +3°, respectively, and are separated by 6 cm (about the distance between human eyes) (Fig. 6). This allows parallel viewing of small images (left eye image on left side), as in journal articles. For larger images, such as on computer displays or projection screens, images are placed for cross-eyed viewing (right image on the left side). In either case, the brain reconstructs a 3-D image. This helps immensely to capture the orientation of compounds in an active site or the overall topology of a protein. The disadvantage of this technique is that many people need some time to adapt before they can ‘‘switch’’ to stereo viewing. Indeed, the angle of view is naturally linked to the focusing distance. However, stereo perception requires decoupling the angle of view from the focusing distance, that is, for parallel viewing, ‘‘look at infinity’’ while focusing on the screen or paper plane (literally seeing ‘‘through’’ the paper or screen plane). Hence, specialized hardware has been developed that allows for true stereo display. The basic principle is always the same, because two images taken at slightly
Figure 5. How stereo perception is achieved: an object (top) is displayed from two different points of view (middle), as it would be perceived by each eye. Various means are available to present the left image to the left eye and the right image to the right eye. For instance, both images can be alternately presented at the same screen position, provided that the left eye is obstructed when the right image is displayed, and conversely. This is achieved by using special LCD shutter glasses synchronized with the screen. Alternatively, images can be colored (in green for the left one and in red for the right one). Then, glasses tinted with the complementary colors (green for the left eye, red for the right eye) can be used to filter out the view that does not correspond to the proper eye. See color insert.
different points of view (to mimic the human stereoscopic vision) are generated and displayed alternately on a screen at high frequencies (Fig. 5). Stereo glasses equipped with LCD shutters (the left lens is obstructed when the right image is displayed, and conversely) are synchronized with the screen, allowing the viewer to see in 3D without effort. Other systems in which LCD screens are built so that the left eye sees pixels different from the right one also allow for 3-D perception when appropriate images are presented to each eye. Those 3-D systems are especially useful during experimental structure elucidation by X-ray crystallography.

PROTEIN STRUCTURE DETERMINATION BY X-RAY CRYSTALLOGRAPHY

Understanding the microscopic details of life processes is a long-standing task for scientists. Unfortunately, the resolution of optical devices is limited approximately to the wavelength of the electromagnetic radiation. Distances between bonded atoms in molecules are on the order of 0.15 nm or 1.5 Å (the unit Å is commonly used in crystallographic literature instead of the SI units nm or pm; 1 Å corresponds to 10⁻¹⁰ m) (Fig. 1). Therefore, exploration on a molecular or even atomic level cannot
Figure 6. Some stereo images. (a): same cube as in Fig. 4; (b): ATP, an energy transfer agent in all organisms. (c): a protein [retinoic acid transporter protein; PDB entry 1CBQ (40)]. To learn to view in stereo, start with the cube because it is easier to see in 3D. (The trick is to view the left image with the left eye and the right image with the right eye; this gives the impression of ‘‘moving’’ the two spheres until they superpose. Sometimes it helps to place a postcard between the two pictures.) For detailed instructions in stereo viewing, see http://www.usm.maine.edu/∼rhodes/0Help/StereoView.html.
be achieved with microscopes using visible light of wavelengths 400–700 nm. Currently, two experimental methods allow resolving details of large molecules on the level of individual atoms. The most commonly used method, single-crystal X-ray crystallography, can be used for very large macromolecules but is limited by the availability of protein crystals. As of June 2001, about 11,630 structures determined by this method have been deposited in the Protein Data Bank, the repository for 3-D macromolecular structure data (8) (http://www.rcsb.org/pdb/). About 1,918 structures of small macromolecules (e.g., proteins up to a molecular weight of 40 kDa) have been solved in solution rather than in the crystalline state by NMR methods (see Refs. 9, 10, and 11 for details on NMR methods). Neither experimental method produces a picture of the molecule directly; rather, both provide data that allow computing a molecular model that is consistent with the experimental observations. This article will schematically describe structure solution
by single-crystal X-ray crystallography. For a more complete introduction to protein crystallography, see Ref. 11.

X-Ray Diffraction by Protein Crystals

Electromagnetic radiation in the range of 0.1–100 Å is called X rays. X rays in the useful range for crystallography (0.5–3 Å) can be produced either by X-ray tubes (in which accelerated electrons strike a metal target) or from synchrotron radiation, a by-product in particle accelerators. In both cases, the resulting primary radiation must be filtered by monochromators to produce X rays of a single wavelength. X-ray tubes are limited to a fixed wavelength that is defined by the emission spectrum of the target metal (e.g., λ = 1.54 Å for the commonly used CuKα L → K transition). Modern synchrotron radiation facilities provide X-ray sources that have much higher brilliance and allow varying the selected wavelength. The availability of these powerful radiation sources contributed vitally to the success of structural determination projects in recent years (12,13). The electrons of the individual atoms of a molecule diffract X rays. Whereas for optical light, a focused image of the object can be produced from the diffracted light using optical lenses, this is not possible for diffracted X rays. On one hand, there are no lenses available that can be used to focus X rays, and on the other, the interaction between X rays and a molecule is weak, so that only a tiny fraction of the primary beam is diffracted. Therefore the diffracted radiation from a single molecule is too weak to be detected. The experimental solution for this dilemma is to examine the diffraction of a crystal, which contains a large number of highly ordered molecules in the same or a small number of orientations. The diffracted radiation from all these molecules sums to strong, detectable X-ray beams. Crystallographers can compute the image of the molecule from the directions, intensities, and phases of these diffracted X-ray beams, thereby simulating an X-ray lens. Crystals (Fig. 7) are arrays of so-called unit cells (in simple cases they contain one molecule) that build a three-dimensional crystal by periodic repetition in three dimensions. Only part of the crystal volume is occupied by protein molecules. Indeed, protein crystals contain 30–70% water (14,15). All handling of protein crystals has to be done in a humid atmosphere to preserve a nearly native environment of the protein. Modern X-ray
Figure 7. Photo of a protein crystal of recombinant histidine ammonia lyase from Pseudomonas putida using polarized light. The size of the crystal is ca. 250 × 250 × 700 µm3 .
diffraction facilities use nitrogen gas at about 100 K to flash-cool the protein crystals and preserve them during diffraction data collection. X-ray beams diffracted by a crystal form a highly regular pattern (Fig. 8), because for a given orientation of the crystal relative to the primary beam, constructive interference of the diffracted radiation is only observed at distinct angles. A simple geometric explanation for this relation is given by Bragg’s law [Eq. (1), Fig. 9a],

nλ = 2d sin θ,    (1)
where diffraction by a crystal is described as reflection of the beam by a stack of parallel planes with an interplanar spacing of d (Fig. 9b). In this picture, constructive interference is observed only when the difference in the path length of rays reflected by successive planes is equal to an integral number (n) of wavelengths (λ) of the impinging X rays. Each unit cell can be described by three cell edges (a, b, c) and the angles between them (α, β, γ ). The symmetry and geometry of the observed diffraction pattern are related to the symmetry and unit-cell dimensions of the diffracting crystal. Each of the measured reflections corresponds to a set of parallel planes and can be indexed by three small integral numbers h, k, and l, called Miller indexes, that indicate the number of parts into which this set of planes cuts the edges (a, b, c) of each unit cell (Fig. 9b). In the Bragg model, (h, k, l) can be interpreted as a vector perpendicular to the set of planes that is in reflecting position. The goal of crystallographic data collection is to measure the diffracted radiation for all possible orientations between the crystal and the impinging beam.
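As a brief numerical illustration of Eq. (1) (this code is not part of the original article), the first-order (n = 1) Bragg angle follows directly from θ = arcsin(nλ/2d); the default wavelength below is the Cu Kα value of 1.54 Å mentioned in the text, and the d spacings are illustrative.

```python
import math

def bragg_angle_deg(d_spacing_angstrom, wavelength_angstrom=1.54, order=1):
    """Bragg angle (degrees) from Eq. (1): n*lambda = 2*d*sin(theta)."""
    sin_theta = order * wavelength_angstrom / (2.0 * d_spacing_angstrom)
    if sin_theta > 1.0:
        raise ValueError("No diffraction: d is too small for this wavelength and order")
    return math.degrees(math.asin(sin_theta))

# Planes of 2.0 A spacing (the 'resolution' limit discussed later) diffract near 23 degrees:
print(bragg_angle_deg(2.0))    # ~22.6 degrees
print(bragg_angle_deg(10.0))   # ~4.4 degrees for widely spaced planes
```

Finer spacings (higher resolution data) thus appear at larger diffraction angles, which is why the highest resolution reflections lie toward the edge of a pattern such as the one in Fig. 8.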
Figure 8. X-ray diffraction pattern of the same protein as in Fig. 7. The positions of the reflections show a highly regular pattern, but the intensities vary significantly.
Figure 9. (a) Bragg’s law is a simple geometric model that describes X-ray diffraction as a reflection of a beam by a stack of parallel planes that have an interplanar spacing of d. (b) Different stacks of equivalent parallel planes of atoms in a two-dimensional section of a crystal lattice. Dots symbolize identical positions in the crystal (e.g., the position of a molecule). A set of parallel planes is identified by three indices hkl, indicating the number of parts into which this set of planes cuts the edges a, b, c of each unit cell.

Electron Density as a Fourier Series

The symmetry of a diffracting crystal is responsible for the symmetry of the observed diffraction pattern, but the content of the unit cell determines the intensities of the diffracted radiation. X-ray diffraction takes place by interaction with the electrons of individual atoms. The electron density distribution within the unit cell (reflecting the position of the atoms and therefore the structure of the molecule) can be described by a Fourier series:

ρel(x, y, z) = (1/V) Σh Σk Σl F(hkl) · e^[−2πi(hx + ky + lz)],    (2)

where ρel(x, y, z) represents the electron density at a position x, y, z in the unit cell, V is the volume of the unit cell, and h, k, l are the reflection indexes mentioned before. Equation (2) connects the observed diffraction pattern with the electron density, which is the molecular image of the molecule in the form of a three-dimensional contour map. F(hkl), called the ‘‘structure factor,’’ represents the diffracted X-ray beam and is a periodic function that consists of wavelength, amplitude, and phase:

F(hkl) = |F(hkl)| · e^(iϕhkl).    (3)

The wavelength is that of the X-ray source; the amplitude |F(hkl)| is amenable to experiment as the square root of the intensity of the diffracted radiation. Unfortunately, the phase ϕ cannot be measured directly but has to be determined for each reflection h, k, l. This is called the ‘‘phase problem’’ of crystallography. Due to the complexity of the subject, only a short overview of phase determination methods can be given in this chapter. For a more detailed discussion, see Refs. 11, 16, and 17. In crystallography of small molecules, phases can be determined computationally from the measured amplitudes directly, but these ‘‘direct methods’’ are not applicable to large protein molecules due to the high number of atoms (18,19). Three methods have been widely used to determine phases in protein crystallography.

The Heavy Atom Method (Multiple Isomorphous Replacement, MIR)

Each atom in the unit cell contributes to all observed reflective intensities. The basic principle of the MIR method is collecting diffraction data of several (multiple) crystals of the same protein that share the same crystal properties (isomorphous) but differ in a small number of heavy atoms. The experimental approach is normally to soak protein crystals with dilute solutions of heavy metal compounds (e.g., mercury or platinum derivatives) that often bind specifically to certain protein residues. These additional atoms cause a slight perturbation of the diffraction intensities. To achieve a perturbation large enough to be measured correctly, the added atoms must diffract strongly; so elements with a high number of electrons (heavy atoms) are used (diffraction intensity increases with the atomic number). The differences in the reflective intensities can be used to locate the positions of the heavy atoms within the unit cell, which allows a first estimate of phases.

Anomalous Scattering (MAD, Multiple Wavelength Anomalous Dispersion)

The MAD method is based on the capacity of heavy atoms to absorb X rays of a specific wavelength. Near the characteristic absorption wavelength of the heavy atoms, the diffraction intensities of symmetry-related reflections (Friedel pairs, h, k, l and −h, −k, −l) are no longer equal. This effect is called anomalous scattering. The characteristic absorption wavelengths of typical protein atoms (N, C, O) are not in the range of the X rays used in protein crystallography and therefore do not contribute to anomalous scattering. However, the use of synchrotron X-ray sources with adjustable wavelengths allows collecting diffraction data under conditions where heavy atoms exhibit strong anomalous scattering. In practice, several diffraction data sets are collected from the same protein crystal at different wavelengths. The locations of the heavy atoms can be determined from the small differences between the Friedel pairs, and the initial phases of the native data can be estimated.
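To make the role of the phases in Eq. (2) concrete, the short sketch below (not from the original article; it assumes a toy one-dimensional ‘‘crystal’’ with invented structure factors) synthesizes a density profile from structure factors and then repeats the synthesis with the phases deliberately discarded, illustrating why the phase problem matters.

```python
import cmath

def density_1d(structure_factors, n_points=12):
    """1-D analogue of Eq. (2): rho(x) = (1/V) * sum_h F(h) * exp(-2*pi*i*h*x)."""
    V = 1.0  # 'volume' (length) of the toy unit cell
    rho = []
    for j in range(n_points):
        x = j / n_points
        total = sum(F * cmath.exp(-2j * cmath.pi * h * x)
                    for h, F in structure_factors.items())
        rho.append((total / V).real)
    return rho

# Toy structure factors F(h) = |F| * exp(i*phi) for a few reflections (assumed values):
true_F = {0: 4.0, 1: 2.0 * cmath.exp(1j * 0.8), -1: 2.0 * cmath.exp(-1j * 0.8),
          2: 1.0 * cmath.exp(1j * 2.4), -2: 1.0 * cmath.exp(-1j * 2.4)}
# Same amplitudes, phases thrown away (all set to zero) -- what the measurement alone provides:
no_phase_F = {h: abs(F) for h, F in true_F.items()}

print([round(v, 2) for v in density_1d(true_F)])
print([round(v, 2) for v in density_1d(no_phase_F)])
```

Both syntheses use identical amplitudes, yet the resulting density profiles differ markedly; recovering the lost phases, whether by MIR, MAD, or molecular replacement, is what turns measured intensities into an interpretable map.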
Molecular Replacement

In some cases, it is known that the structure to be determined is very similar to another structure that has
already been solved experimentally. For example, this could be a homologous protein from another organism or a mutant of this protein. Then, the phases computed from the known protein structure (phasing model) can be used as initial estimates of the phases of the unknown protein.

Model Building and Refinement

As mentioned earlier, the primary experimental result of a crystallographic structure determination is a three-dimensional electron density map, computed (Eq. 2) from the measured reflection intensities and the initial phases. Unfortunately, in most cases this initial three-dimensional image of the molecule is not detailed and accurate enough to identify atomic details. If the initial phase estimates are sufficiently good, the map will show gross features of the molecules, like continuous chains of electron density along the protein backbone. Then, knowing the chemical structure of the molecule (i.e., the sequence of amino acid residues), the three-dimensional conformation has to be modeled so that it accords with the observed electron density. Interactive computer programs that support hardware stereo display (see earlier) are used to build the first molecular model by placing the parts of the molecule manually into the observed map (map fitting) (Fig. 10a). This initial model represents only a rough cartoon of the real molecular structure and explains the observed diffraction intensities only partially. Computational methods are used to modify the atomic parameters of the model to fit the observed diffraction data better. The process of improving the atomic model to reflect the observed data is called crystallographic structural refinement. Besides the observed experimental data, most refinement programs use a set of idealized chemical parameters derived from structures solved experimentally at very high resolution (20). By restricting the deviations of the modeled bonds and angles from these idealized values, a stereochemically and conformationally reasonable model is ensured. This is especially important when only a small amount of diffraction data is available and the number of observations (reflections) is small compared to the number of parameters (coordinates of atoms) to be refined. The progress of the refinement procedure is monitored by comparing the measured structure factor amplitudes |Fobs| with amplitudes computed from the current model |Fcalc|. When converging to the correct molecular model, the measured F’s and computed F’s should also converge. The primary measure for this is the residual index, or R factor:

R = Σ ||Fobs| − |Fcalc|| / Σ |Fobs|.    (4)

R factors range from zero for a perfect match between model and diffraction data to 0.6 for a diffraction data set compared to a set of random amplitudes. Commonly, R factors of refined protein structures are in the range of 0.2 for data at 2.5 Å resolution, but R factors below 0.1 can occasionally be achieved for very high resolution data. The term ‘‘resolution’’ in X-ray crystallography, unlike in microscopy, refers only to the amount of data used during structural determination. A resolution of 2 Å means that diffraction data from planes whose interplanar spacing is as low as d = 2 Å (Fig. 9) has
been included in the refinement procedure. Actually, for protein structures refined at 2 Å resolution, the precision of atom locations lies in the range of 0.15–0.40 Å; in comparison, a carbon–carbon bond is approximately 1.5 Å long. However, this is only an estimate for the average positional error resulting from the limits of data accuracy. Note that a crystallographic model represents the average of all molecules that diffracted X rays during the data collection. There are also two major physical reasons for uncertainty in crystal structures: dynamic disorder, or thermal vibrations of atoms around their equilibrium positions, and static disorder, in which equivalent atoms do not occupy the same positions in different unit cells (Fig. 11). Therefore, the model description contains, for each atom position j, an occupancy value nj (ranging from 0.0–1.0) describing static disorder, and a temperature factor Bj reflecting the dynamic disorder or thermal motion of atom j. B factors are provided in Å² and are related to the mean square displacement ⟨uj²⟩ of the atom from its rest position:

Bj = 8π²⟨uj²⟩.    (5)

Thermal factors in protein structures usually vary from 5–35 Å² for main-chain atoms and from 5–60 Å² for side-chain atoms. B values greater than 60 Å² may indicate disorder or errors in the protein structure.

Structure Databases

Most experimentally solved macromolecular structures are deposited in the Protein Data Bank [(PDB); URL: http://www.rcsb.org] or the Nucleic Acid Database NDB (URL: http://ndbserver.rutgers.edu), and can be retrieved from their Internet sites. Structures of small molecules are usually collected at the Cambridge Crystallographic Data Centre (URL: http://www.ccdc.cam.ac.uk/). As of 26 June 2001, the PDB contained 13,836 different database entries that consist of 11,630 structures solved by diffraction, 1,918 by NMR, and 288 theoretical models. However, bear in mind that numerous proteins have several very similar entries in the PDB, such as structures elucidated at different resolutions, in the presence or absence of ligands, in mutated forms, and so forth. Although these data on similar structures provide useful information about protein families or mode of function, they do not contribute to the diversity of elucidated proteins. Consequently, the number of distinct protein structures is approximately one-fourth of the number of entries.

REPRESENTATION OF BIOLOGICAL MACROMOLECULES

As explained in the previous section, proteins are not directly observable. Therefore, it should be emphasized that all pictures are abstractions based on the 3-D coordinates of the model and are not direct images of the molecule. Those images also give a static view of the molecule. From comparative analysis of protein structures in different states (for example, with or without bound substrate) and from NMR experiments, it is known that certain proteins can undergo significant conformational changes. See Ref. 21 for a discussion of protein motions.
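The refinement statistics defined in Eqs. (4) and (5) reduce to a few lines of arithmetic. The sketch below is not part of the original article; the amplitude lists are invented toy values used only to show the computation of an R factor and the conversion of a B factor to an RMS displacement.

```python
import math

def r_factor(f_obs, f_calc):
    """Residual index of Eq. (4): sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def rms_displacement(b_factor_A2):
    """Invert Eq. (5), B = 8*pi^2*<u^2>, to obtain the RMS displacement in angstroms."""
    return math.sqrt(b_factor_A2 / (8.0 * math.pi ** 2))

# Toy amplitude lists (assumed, arbitrary units):
f_obs = [120.0, 85.0, 40.0, 210.0, 15.0]
f_calc = [110.0, 90.0, 35.0, 205.0, 22.0]

print(round(r_factor(f_obs, f_calc), 3))   # ~0.07 for this well-fitting toy data set
print(round(rms_displacement(20.0), 2))    # a B factor of 20 A^2 corresponds to ~0.5 A RMS motion
```

On this scale, the 5–60 Å² range of B factors quoted above corresponds to RMS displacements of roughly 0.25–0.9 Å.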
Figure 10. Stereo pictures of small sections of three-dimensional electron density maps of the same protein structure [PDB entry 1B8F (41)] at different resolutions. The term ‘‘resolution’’ in crystallography, unlike in microscopy, refers only to the amount of data used during structural determination. Note that these maps are the only experimentally derived images of the molecule. All further representations are based on models derived from these direct observations. (a) Initial experimental map derived by multiple isomorphous replacement (MIR) at 3.3-Å resolution. (b) Refined protein model surrounded by a density map at 2.0-Å resolution and (c) at 1.1-Å resolution.
Backbone Representations

The first level of information about a protein is encoded in its amino acid sequence, also referred to as its primary structure. The amino acid sequence is directly dictated by the sequence of the gene coding for the protein (Figs. 2
and 3). The backbones of all amino acids (except for proline) have alternating hydrogen donor and acceptor capability. This allows a second level of organization, called the secondary structure (Fig. 12), which consists of a repetition of a basic motif where the backbone forms a regular pattern of hydrogen bonds and the dihedral angles
φ and ψ (see Fig. 3a) adopt similar values for successive residues. It is noteworthy that those structures were predicted as possible stable folding units by Linus Pauling long before they were actually observed (22). The first
Figure 11. Two-dimensional section of a crystal lattice of histidine ammonia lyase from Pseudomonas putida [PDB identifier 1B8F (41)]. For simplification, only the backbone of the protein chain is represented as a so-called Cα plot, where the positions of the Cα atoms of individual amino acid residues are connected by a pseudobond. To give a better idea of the repetitive unit, one of the molecules is highlighted in dark gray. The unit cell of the crystal is indicated as a rectangle.
protein whose structure was elucidated (23) [myoglobin; PDB entry 1MBN (24)] was indeed built from an assemblage of α helices; the backbone of 70% of its residues adopted an α conformation.
Figure 12. Secondary structural elements extracted from an immunoglobulin binding protein [PDB entry 1IGD (42); see also Fig. 13]. (a) α helix. This structure is stabilized by hydrogen bonds between the amide nitrogen group of each residue (i) and the amide carbonyl group of the residue (i + 4). (b) β sheet composed of three strands. The top two strands are parallel because their C-terminal ends point in the same direction, whereas the bottom two strands are antiparallel because their C-terminal ends point in opposite directions. As in the α helix, the β-sheet structure is stabilized by a network of hydrogen bonds between amide nitrogen and carbonyl groups.
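Secondary structure assignment of the kind discussed below ultimately rests on backbone geometry. The following sketch is not from the article; the φ/ψ windows are rough textbook ranges chosen for illustration, and real assignment programs also use the hydrogen-bond patterns shown in Fig. 12.

```python
def crude_secondary_structure(phi, psi):
    """Very rough phi/psi-window assignment (degrees); real assignments also consider H bonds."""
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -5.0:
        return "alpha helix"
    if -180.0 <= phi <= -60.0 and 90.0 <= psi <= 180.0:
        return "beta strand"
    return "other"

# Illustrative backbone angles (assumed values, in degrees):
for phi, psi in [(-60.0, -45.0), (-120.0, 130.0), (-75.0, 150.0), (60.0, 40.0)]:
    print(phi, psi, "->", crude_secondary_structure(phi, psi))
```

Because the windows are arbitrary and real helices and strands are distorted, as the text notes, different programs draw these boundaries differently and can disagree for the same residue.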
On average, secondary structure accounts for about 60% of protein structure. Although some of the remaining residues may be disordered and adopt flexible conformations, the majority of residues remain in a fixed conformation for which the backbone dihedral angles φ and ψ do not adopt similar values for successive residues (see Ref. 25 for a detailed discussion of secondary structures). Because secondary structure elements are essentially repetitive, they allow a simplified representation that defines the overall fold and topology of the protein. In its crudest form, this can be an α-carbon trace, where pseudobonds that link α carbons of successive residues are drawn instead of the backbone (Fig. 11). More elaborate representations, called ribbons, do not plot any backbone atoms, but rather use them to guide a spline curve that illustrates the overall topology more smoothly (26). Solid representations of this curve (Fig. 13a) that have arrows at the end of each secondary structure element allow the viewer to identify the topology of a protein. However, even those simplified drawings tend to be difficult to follow for very large proteins, and even more abstract representations of the protein topology are useful. An example is given in Fig. 13b, where each secondary structure element is reduced to a circle or a triangle. Assigning a unique secondary structure to a stretch of residues is somewhat arbitrary because the boundaries between secondary structures depend largely on interpretation. Hence, different individuals or programs may assign different secondary structures to the same residue. A further complication is that secondary structures (especially β strands) are frequently (if not always) distorted compared to a ‘‘canonical’’ (idealized) structure. Therefore, assigning a secondary structure for a residue does not capture the subtle variations (tilt, bend, torsion)
Figure 13. Two different representations of the topology of an immunoglobulin binding protein [PDB entry 1IGD (42)]. (a) Ribbon representation, in which each secondary structure element is colored differently using a color gradient from blue (N terminal) to red (C terminal). (b) Further simplification of the topology, where each secondary structural element is represented by symbols connected by a line representing the polypeptidic chain. Circles symbolize α helices and triangles β sheets. Lines to the centers of figures connect in front; lines to the edges of figures connect in back. See color insert.
that are visible when secondary structure elements are compared. Of course, this has important implications for secondary structure prediction from the primary sequence alone.

Domains and Fold Families

The arrangement of all secondary structure elements of a protein, which is essentially the way the polypeptidic chain is folded, is named the tertiary structure, or fold. Proteins elucidated so far adopt a limited number of folds. Several fold classifications based on different methods coexist [SCOP (27), CATH (28), Vast/MMDB (29), Dali/FFSP (30)]. They all capture a slightly different aspect of the folding space. Therefore, it is difficult to estimate the total number of different folds that account for the proteome (the set of all expressed proteins for a given organism or tissue), and although estimates vary from hundreds to a few thousand (31,32), a recent analysis predicts that about 650 folds are enough to account for the total proteome (33). However, our current view of the fold space is likely to be biased because of an overrepresentation of certain common folds in the database, possibly those for which crystals could be obtained. Nevertheless, the total number of distinct folds in an organism is much lower than the total number
of distinct proteins. This has important implications for protein structure prediction: it is expected that the computational prediction of protein folding can be replaced by fold recognition. Indeed, it was recognized early that proteins whose primary structures are similar also share the same fold (34). Consequently, similarities at the sequence level generally indicate high-level functional similarities, and hence fold recognition can be used to assist the functional assignment of newly discovered proteins. However, this is not always the case because folds are not always linked to protein function. Examples of proteins that have the same function but different folds are known, as well as examples of proteins that have different functions but share the same fold (Fig. 14). Most protein folds can be further subdivided into domains (Fig. 15), which are independently folding units that often provide different functions (such as regulatory or binding units). One can look at domains as building blocks of larger proteins. As with folds, there are frequently alternative ways of splitting proteins into domains or recurrent secondary structure elements (also commonly referred to as supersecondary structures) (35). It is noteworthy that proteins frequently assemble into larger functional units called quaternary structures. Dimers, trimers, or even larger oligomers of identical (homo) or different (hetero) protein chains frequently occur. These functional units are distinct from the unit cell observed during protein crystallization, which may represent only part of the biologically relevant structure or multiple copies of the structure (36).

Displaying Atomic Interactions

Many proteins are enzymes, that is, proteins that catalyze reactions as diverse as the digestion of starch or the modification of the activity of other proteins through phosphorylation. Therefore, it is not surprising that scientists are interested in the specific region of the protein responsible for the catalysis, the so-called active site. Indeed, the analysis of interactions between an active site and its natural substrate enables computational chemists to design small molecules that could alter enzyme activity. Such compounds could then be further refined and turned into medicines to cure disease.

Figure 14. Illustration that protein fold and function are not correlated. (a) Soybean (Glycine max) beta amylase [EC 3.2.1.2; PDB entry 1BYA (43)] adopts a TIM-barrel fold (left); Aspergillus awamori 1,4-α glucosidase [EC 3.2.1.3; PDB entry 1DOG (44)] adopts an α-α toroid fold (right). Both proteins hydrolyze glucoside linkages. (b) Trypsin inhibitor [1TIE (45)] and interleukin-1 receptor antagonist [1ILR (46)] both belong to the β-trefoil fold family. Nevertheless, their biological functions are different. (c) Proteins sometimes evolve new functions: the avian eye lens protein δ2-crystallin [right, 1DCN (47)] is closely related to the enzyme argininosuccinate lyase [left, 1AOS (48)], which is involved in the urea cycle. Even though δ-crystallin has no enzymatic function in the eye lens, it still can bind the former substrate argininosuccinate in vitro.

Figure 15. Porcine fibroblast (interstitial) collagenase [PDB entry 1FBL (49)] consists of two domains (blue for the catalytic N-terminal domain, red for the C-terminal domain). The fragment of the peptide chain colored in green can be considered a connector between the two domains. Because the ribbon representation can mislead the reader into thinking that the two domains are widely separated, a transparent molecular surface (see section ‘‘Derived Properties’’) has been added to demonstrate that the two domains are actually in close contact. Note that, for clarity, only one peptide chain is represented here, although the biological functional unit is a homodimer. See color insert.
Furthermore, it is expected that one can design enzyme variants to improve on a given property such as stability under extreme conditions (washing powders), faster catalysis (higher yields in industrial production processes), or even the capacity to catalyze new reactions. These applications frequently require comparing different active sites, for which various representations or abstractions are helpful. Because several amino acids of the protein interact with several atoms of the compound being analyzed, even stereoscopic representations become cumbersome (Fig. 16a). Therefore, schematic representations have been developed to help capture at a glance the interactions between a protein and its bound ligand (Fig. 16b).

Derived Properties

Proteins frequently bind to other large macromolecules such as DNA (e.g., to regulate gene expression), RNA (in the ribosome), or other proteins (receptors). Therefore, scientists are interested in predicting which proteins or DNA sequences can interact with a specific protein and also where the putative interaction zone is located. Prediction of binding commonly involves computing protein surfaces because it helps to grasp the overall shape of a protein. Although van der Waals surfaces (CPK atom representation) (Fig. 3c) give a crude idea
of the protein shape, they do not really represent the actual shape complementarity that is possible between two molecules. Indeed, taking the inverse of this representation (Fig. 17a) does not reflect the space actually available for atoms of other molecules but only the spatial location where an infinitely small atom or water molecule could be placed. Because water molecules have an approximate radius of 1.4 Å, some parts of the molecular surface are actually out of reach of the solvent (in narrow crevices, for instance). Hence, two other surface computational methods that address this problem have been developed (37,38): the solvent-accessible surface and the molecular surface (Fig. 17b). Analytical methods and numerical approximations (using rectangular grids) are available for both. The accessible surface is generated by adding the average radius of the solvent molecule to the radius of each atom and by plotting each point that is not inside another such atom that has an increased radius. This surface represents the limit beyond which the center of a solvent molecule cannot be placed without penetrating into the protein; such a representation can be useful for interactive docking of compounds.
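Numerical approximations of the accessible surface follow directly from the construction just described: inflate each atomic radius by the probe radius and keep only those sample points that are not buried inside any other inflated atom. The Python sketch below illustrates the idea with made-up atom coordinates and radii; it is not the algorithm used by any of the packages cited later in this article, and the point count and demonstration values are illustrative assumptions only.

import math

PROBE_RADIUS = 1.4  # approximate radius of a water molecule, in angstroms

def sphere_points(n=200):
    """Roughly uniform points on a unit sphere (golden-spiral construction)."""
    pts = []
    golden = math.pi * (3.0 - math.sqrt(5.0))
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        theta = golden * i
        pts.append((r * math.cos(theta), r * math.sin(theta), z))
    return pts

def accessible_area(atoms, n_points=200):
    """Approximate solvent-accessible surface area per atom.
    atoms: list of (x, y, z, van_der_waals_radius) tuples; areas in square angstroms."""
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        ri_inflated = ri + PROBE_RADIUS
        exposed = 0
        for (ux, uy, uz) in unit:
            # Candidate point on the inflated sphere of atom i.
            px, py, pz = xi + ri_inflated * ux, yi + ri_inflated * uy, zi + ri_inflated * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                rj_inflated = rj + PROBE_RADIUS
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < rj_inflated ** 2:
                    buried = True
                    break
            if not buried:
                exposed += 1
        sphere_area = 4.0 * math.pi * ri_inflated ** 2
        areas.append(sphere_area * exposed / n_points)
    return areas

# Two overlapping "carbon-like" atoms: each shields part of the other's surface.
demo_atoms = [(0.0, 0.0, 0.0, 1.7), (2.0, 0.0, 0.0, 1.7)]
print([round(a, 1) for a in accessible_area(demo_atoms)])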
Figure 16. Two representations of a serine protease active site [PDB entry 8GCH (50)]. (a) Stereoscopic view allowing a stereochemically correct representation of the atoms in the active site. Hydrogen bonds between the enzyme and the substrate are represented by green dotted lines. (b) Schematic representation of the same active site. See color insert.
Computing a molecular surface is equivalent to rolling a solvent molecule over the protein atoms; the regions that are not accessible to the solvent are automatically considered to still belong to the protein, and the resulting three-dimensional surface patch can be considered a boundary between the protein interior and the solvent. Both surfaces can be rendered transparently so that identification of structural elements beneath the surface is possible (Fig. 17c).

Electrostatic Potentials

All of the properties one wishes to observe or compute (39) are ultimately linked to the atomic coordinates deduced from X-ray analysis, NMR experiments, or even theoretical models because they depend on the overall arrangement of atoms in the molecule. However, such properties do not need to be projected directly on top of atomic coordinates or bonds, as was illustrated in the previous section for the molecular surface. Another example of a property not mapped directly onto atoms, but still calculated from their positions, is the electrostatic potential of a protein. Although coloring amino acids or surface patches by amino acid charge (positive in blue, negative in red) can already give an indication of the protein's polarity, this does not allow one to draw any conclusions regarding the fine location and intensity of the electrostatic potential generated by the protein. The ability to generate 3D isopotential contours around a protein permits the
viewer to visualize how a potential extends away from a protein (Fig. 18) to influence the approach of substrates or other ligands. Properties such as the electrostatic potential can also be mapped on the molecular surface to help explain how and why substrates interact in an active site (Fig. 19). Comparing potentials between enzymes of the same family can also give useful insights regarding their activity.

Other Color-Coded Properties

It was mentioned earlier that amino acid properties could be color coded (Fig. 19), for example, to locate clusters of charges quickly at the protein surface or to verify that no charged residue is in direct contact with the hydrophobic core during protein modeling. The principle of coloring objects by a property of interest is quite general. It is applied, for example, quite extensively to geographic maps, where the basic properties of countries (borders) are retained and a third parameter, such as density of population, is color coded and captured on the map. The same method can, of course, be applied to proteins. In this case, the basic property that is kept is the atomic coordinates (topology), and a fourth dimension (color) is used to represent additional parameters. Such parameters can be the similarity
between two proteins (rms deviation in distances between corresponding atoms), the crystallographic B factor, the standard deviation between similar atoms in multiple NMR solutions, or the degree of amino acid conservation for similar proteins. The last proves particularly useful in locating potential active sites, binding pockets, or regions involved in protein–protein interactions. Indeed, when several proteins are compared, residues conserved among the whole family usually indicate what had to be conserved throughout evolution to maintain a specific function.

Figure 17. (a) Schematic representation of the van der Waals surface. Left: the space actually occupied by the atoms of a molecule; right: the space theoretically available for solvent atoms, although some crevices are actually too narrow to accommodate solvent atoms. (b) Schematic definition of the molecular surface and of the accessible surface (see text). (c) Practical illustration of the difference between the accessible surface (dotted) and the molecular surface (transparent) of a small peptide (wire frame).

CONCLUSION

Protein function depends on the three-dimensional conformation that amino acid residues adopt in a folded protein. This explains why considerable effort goes into elucidating protein structures. Several methods to extract, derive, and simplify information from protein structures have been developed. Proteins have been analyzed in terms of oligomeric interactions, surface properties, domains, secondary structure elements, backbone conformations, and side-chain properties. In summary, various levels of visualizing a protein are necessary, depending on the research application. However, bear in mind that these images are generated (as opposed to observed) to represent the specific theoretical properties of a molecule. Due to the advent of large-scale sequencing projects, far more data are available on protein sequences than on protein structures (crystallographic or NMR data). Protein structure prediction methods are essential to bridge this gap. These methods must rely on the information already known, and therefore rely strongly on the visualization methods mentioned in this chapter.

Software
Numerous software packages (53–64) currently available can be used to produce the kind of images described in this chapter. The following list is by no means complete, but as a starting point, it provides some programs that are freely available to the scientific community and that were used to prepare images for this article.
• GRASP (53) is a molecular visualization and analysis program. It is particularly useful for displaying and manipulating the surfaces of molecules and their electrostatic properties. URL: http://transfer.bioc.columbia.edu/grasp/
• Ligplot (54) automatically generates schematic diagrams of protein–ligand interactions from the 3-D coordinates. It can also generate plots of residue interactions across a dimer or domain–domain interface. URL: http://www.biochem.ucl.ac.uk/bsm/ligplot/ligplot.html
• MOLMOL (55) is a molecular graphics program for displaying, analyzing, and manipulating the three-dimensional structure of biological macromolecules, with special emphasis on the study of protein or DNA structures determined by NMR. URL: http://www.mol.biol.ethz.ch/wuthrich/software/molmol/
• MolScript (56) is a program for displaying molecular 3-D structures, such as proteins, in both schematic and detailed representations. URL: http://www.avatar.se/molscript/
• MSMS (57) is Michel Sanner's molecular surface package. URL: http://www.scripps.edu/pub/olson-web/people/sanner/
Figure 18. Electrostatic potential of superoxide dismutase [PDB entry 2SOD (51)] contoured at +1 kBT/e (blue) and −1 kBT/e (red), respectively. Because the functional unit is a dimer, two active sites (located at the positive electrostatic potential) are present. The isocontoured surfaces are rendered transparently so that the protein backbone, which is represented as a worm, remains visible.
Figure 19. Electrostatic potential of a protein kinase [PDB entry 1YDR (52)] mapped onto its molecular surface. Negative and positive surface patches are colored in red and blue, respectively. A peptide inhibitor (wire frame) is bound to the protein. Amino acids that bear a negative or positive charge are colored in red or blue, respectively. See color insert.
• MSP is Michael Connolly's Molecular Surface Package (see references 37 and 38). URL: http://www.biohedron.com/
• O (58) is a general purpose macromolecular modeling environment developed by Alwyn Jones and Morten Kjeldgaard. The program is designed for scientists who need to model, build, and display macromolecules. O is used mainly in the field of protein crystallography. URL: http://origo.imsb.au.dk/∼mok/o/
• POV-Ray, the Persistence of Vision Raytracer, is a high-quality tool for creating three-dimensional graphics. It can be used to render pictures composed in other programs such as Swiss-PdbViewer (DeepView). URL: http://www.povray.org/
• RasMol (59) is a free scriptable program written by Roger Sayle for viewing molecular structures. RasMol stimulated broad interest in molecular visualization for structural biology and education. URL: http://www.umass.edu/microbio/rasmol/
• Raster3D (60) is a set of tools for generating high-quality raster images of proteins or other molecules. It can also be used to render pictures composed in other programs, such as MolScript, in 3D with highlights, shadowing, etc.
URL: http://www.bmsc.washington.edu/raster3d/
• Swiss-PdbViewer (DeepView) (61,62) provides a user-friendly interface to display and analyze several proteins simultaneously. It also provides a scripting language and various tools for structural analysis and model building. URL: http://www.expasy.org/spdbv/
• TOPS (63) computes two-dimensional schematic representations of protein folds, so-called protein topology cartoons. URL: http://www3.ebi.ac.uk/tops/
• WHAT IF (64) is a versatile protein structural analysis program that can be used for mutant prediction, structure verification, molecular graphics, etc. URL: http://www.cmbi.kun.nl/whatif/

Acknowledgments

The authors thank Roman Laskowski for providing the LigPlot output, Mathias Baedeker for the diffraction pattern and electron density maps, and the reviewers for carefully reading the manuscript and for their excellent suggestions.
ABBREVIATIONS AND ACRONYMS

CPK    Corey-Pauling-Koltun (a way of representing molecules)
DNA    deoxyribonucleic acid
LCD    liquid crystal display
MAD    multiple wavelength anomalous dispersion
MIR    multiple isomorphous replacement
NMR    nuclear magnetic resonance
rmsd   root mean square deviation
RNA    ribonucleic acid
PDB    Protein Data Bank
BIBLIOGRAPHY 1. T. Walz, T. Hirai, K. Murata, J. B. Heymann, K. Mitsuoka, Y. Fujiyoshi, B. L. Smith, P. Agre, and A. Engel, Nature 387, 624–627 (1997). 2. M. H. Stowell, A. Miyazawa, and N. Unwin, Curr. Opinion Struct. Biol. 8, 595–600 (1998). 3. I. Branden and J. Tooze, Introduction to Protein Structure, Garland publishing, Inc., New York & London, 1998. 4. B. Alberts, D. Bray, J. Lewis, M. Raff, K. Roberts, and J. D. Watson, Molecular Biology of the Cell, Garland publishing, Inc., New York & London, 1989. 5. L. Lehninger, D. L. Nelson, and M. M. Cox, Principles of Biochemistry, W. H. Freeman and Company, New York, 1993. 6. L. Stryer, Biochemistry, W. H. Freeman and Company New York, 1995. 7. D. Voet, J. G. Voet, and C. A. Pratt, Fundamentals of Biochemistry, John Wiley & Sons, Inc., NY, 1998. 8. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, and P.E. Bourne, Nucleic Acids Res. 28, 235–242 (2000). 9. D. G. Reid, ed., in Methods in Molecular Biology 60, Humana Pr. ISBN 0-89603-309-0, 1997. 10. H. N. Moseley and G. T. Montelione, Curr. Opinion Struct. Biol. 9, 635–642 (1999). 11. G. Rhodes, Crystallography Made Crystal Clear, Academic Press, San Diego, 1993, 2000.
12. M. A. Walsh, G. Evans, R. Sanishvili, I. Dementieva, and A. Joachimiak, Acta Crystallogr. D Biol. Crystallogr. 55, 1,726–1,732 (1999). 13. M. A. Walsh, I. Dementieva, G. Evans, R. Sanishvili, and A. Joachimiak, Acta Crystallogr. D Biol. Crystallogr. 55, 1,168–1,173 (1999). 14. B. W. Mathews, J. Mol. Biol. 33, 491–497 (1968). 15. B. W. Mathews, in H. Neurath and R. L. Hill, eds., The Proteins, 3rd ed., Academic Press, NY, 1977, pp. 404–590. 16. C. Giacovazzo, H. L. Monaco, D. Viterbo, F. Scordani, G. Gilli, G. Zanotti, and M.Catti, Fundamentals of Crystallography, International Union of Crystallography, Oxford University Press, Oxford, UK, 1992. 17. J. Drenth, Principles of Protein Crystallography, Springer, New York, 1994. 18. G. M. Sheldrick, J. Mol. Struct. 130, 9–16 (1985). 19. G. M. Sheldrick and T. R. Schneider, R. M. Sweet and C. W. Carter Jr., eds., SHELXL: High Resolution Refinement, Methods in Enzymology, Academic Press, Orlando, FL, 277, 1997, pp. 319–343. 20. R. A. Engh and R. Huber, Acta Crystallogr. A47, 392–400 (1991). 21. M. Gerstein and W. Krebs, Nucleic Acids Res. 26, 4,280–4,290 (1998). 22. L. N. Pauling, The Nature of the Chemical Bond, Cornell University, Ithaca, NY, 1939. 23. G. C. Kendrew, The Three-Dimensional Structure of a Protein Molecule, Sci. Am. 205, 96 (1962). 24. H. C. Watson, Prog. Stereochem. 4, 299 (1969). 25. J. S. Richardson, Adv. Protein Chem. 34, 167–339 (1981). 26. M. Carson, J. Mol. Graphics. 5, 103–106 (1987). 27. A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia, J. Mol. Biol. 247, 536–540 (1995). 28. C. A. Orengo, A. D. Michie, D. T. Jones, M. B. Swindells, and J. M. Thornton, Structure 5, 1,093–1,108 (1997). 29. J. F. Gibrat, T. Madej and S. H. Bryant, Curr. Opinion Struct. Biol. 6, 377–385 (1996). 30. L. Holm and C. Sander, Nucleic Acids Res. 26, 316–319 (1998). 31. C. Chothia, Nature 357, 543–544 (1992). 32. C. A. Orengo, D. T. Jones, and J. M. Thornton, Nature 372, 631–634 (1994). 33. Z. X. Wang, Protein Eng. 11, 621–626 (1998). 34. C. Chothia and A. M. Lesk, EMBO J 5, 823–826 (1986). 35. I. N. Shindyalov and P. E. Bourne, Proteins: Struct. Function Genet. 38, 247–260 (2000). 36. K. Henrick and J. M. Thornton, Trends Biochem. Sci. 23, 358–361 (1998). 37. M. L. Connolly, Science 221, 709–713 (1983). 38. M. L. Connolly, J. Appl. Crystallogr. 16, 548–558 (1983). 39. A. R. Leach, Molecular Modelling, Principles and Applications, Addison-Wesley, Harlow, UK, 1996. 40. G. J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T. A. Jones, Structure 2, 1,241–1,258 (1994). 41. T. F. Schwede, J. Retey, and G. E. Schulz, Biochemistry 38, 5,355–5,361 (1999). 42. J. P. Derrick and D. B. Wigley, J. Mol. Biol. 243, 906–918 (1994). 43. B. Mikami, M. Degano, E. J. Hehre, and J. C. Sacchettini, Biochemistry 33, 7,779–7,787 (1994).
44. E. M. Harris, A. E. Aleshin, L. M. Firsov, and R. B. Honzatko, Biochemistry 32, 1,618–1,626 (1993). 45. S. Onesti, P. Brick, and D. M. Blow, J. Mol. Biol. 217, 153–176 (1991). 46. H. A. Schreuder, J. M. Rondeau, C. Tardif, A. Soffientini, E. Sarubbi, A. Akeson, T. L. Bowlin, S. Yanofsky, and R. W. Barrett, Eur. J. Biochem. 227, 838–847 (1995). 47. F. Vallee, M. A. Turner, P. L. Lindley, and P. L. Howell, Biochemistry 38, 2,425–2,434 (1999). 48. M. A. Turner, A. Simpson, R. R. McInnes, and P. L. Howell, Proc. Natl. Acad. Sci. USA 94, 9,063–9,068 (1997). 49. J. Li, P. Brick, M. C. O'Hare, T. Skarzynski, L. F. Lloyd, V. A. Curry, I. M. Clark, H. F. Bigg, B. L. Hazleman, T. E. Cawston et al., Structure 3, 541–549 (1995). 50. M. Harel, C. T. Su, F. Frolow, I. Silman, and J. L. Sussman, Biochemistry 30, 5,217–5,225 (1991). 51. J. A. Tainer, E. D. Getzoff, K. M. Beem, J. S. Richardson, and D. C. Richardson, J. Mol. Biol. 160, 181–217 (1982). 52. R. A. Engh, A. Girod, V. Kinzel, R. Huber, and D. Bossemeyer, J. Biol. Chem. 271, 26,157–26,164 (1996). 53. A. Nicholls, K. Sharp, and B. Honig, Proteins: Struct. Function Genet. 11, 281 (1991). 54. A. C. Wallace, R. A. Laskowski, and J. M. Thornton, Protein Eng. 8, 127–134 (1995). 55. R. Koradi, M. Billeter, and K. Wüthrich, J. Mol. Graphics 14, 51–55 (1996). 56. P. J. Kraulis, J. Appl. Crystallogr. 24, 946–950 (1991). 57. M. F. Sanner, A. J. Olson, and J. C. Spehner, Biopolymers 38, 305–320 (1996). 58. T. A. Jones, J. Y. Zou, S. W. Cowan, and M. Kjeldgaard, Acta Crystallogr. A 47, 110–119 (1991). 59. R. A. Sayle and E. J. Milner-White, Trends Biochem. Sci. 20, 374–376 (1995). 60. E. A. Merritt and D. J. Bacon, Methods Enzymol. 277, 505–524 (1997). 61. N. Guex and M. C. Peitsch, Electrophoresis 18, 2,714–2,723 (1997). 62. N. Guex, A. Diemand, and M. C. Peitsch, Trends Biochem. Sci. 24, 364–367 (1999). 63. D. R. Westhead, D. C. Hutton, and J. M. Thornton, Trends Biochem. Sci. 23, 35–36 (1998). 64. G. Vriend, J. Mol. Graphics 8, 52–56 (1990).
IMAGING SCIENCES IN FORENSICS AND CRIMINOLOGY

RICHARD W. VORDER BRUEGGE
Federal Bureau of Investigation
Washington, D.C.
INTRODUCTION

Imaging and imaging technologies play an important role in forensics and criminology. Photographs and other visual recordings represent an efficient means for documenting the condition of a subject at an instant in time. The subjects of such images may include accident and crime scenes, items of evidence in a laboratory, victims, or suspects. Such visual records are easily preserved for later use by investigators, eyewitnesses, forensic scientists,
court officials, and juries. The thorough and complete photographic recording of a scene, person, or item of evidence also enables investigators and forensic scientists who were not present when the images were recorded to develop meaningful information long after the person, scene, or item was photographed and may have lost all traces of a prior condition. The recording of this prior condition can be critical in many investigations due to the transient nature of the subjects or scenes. For instance, victims’ wounds can heal, leaving no visible manifestation of the violence done to them. Similarly, a highway that is the scene of an automobile accident cannot be closed indefinitely to preserve the evidence but must be reopened in a timely fashion to permit the flow of traffic. Likewise, a footprint left in snow at a crime scene will not remain once the snow has melted. Photographs may also help eyewitnesses remember relevant facts they might otherwise discount or forget. In other instances, imaging technologies may provide the only means for revealing or recording evidence. Surveillance imagery depicting the robbery of a bank may be used to document the actions of the robbers, which might not otherwise have been witnessed. In the laboratory, forensic imaging techniques may be used to produce a visible record of writings that are not apparent to the naked eye through the selective use of lighting, specialized photographic papers, and filters. Image processing may be used to extract critical details or reduce extraneous noise from an image, thus helping clarify the picture. Finally, images may also be used as the subject of forensic analysis to derive information above and beyond that immediately apparent to the untrained eye. Such analyses may include comparing photographs of a latent impression with fingerprints of a known suspect or determining the height of a robber depicted in a surveillance image. The range of imaging technologies used in law enforcement and forensic applications encompasses a broad field from traditional silver-based film to analog video, and now includes digital still and digital video imaging. Of particular importance to law enforcement today are technologies such as scanners and digitizers (or ‘‘frame grabbers’’) that convert traditional film and analog video imagery to a digital format. Once in a digital format, such images may be analyzed using digital processing techniques. The range of situations in forensics and criminology in which imaging technologies play a role is wide. For simplicity, these situations may be divided into several general categories: field-based photography and video, laboratory-based photography, forensic image processing, and laboratory image analysis. These categories are rarely independent of one another, however, because images acquired in the field or in a laboratory frequently become the subject of forensic image processing or image analysis. FORENSIC FIELD-BASED PHOTOGRAPHY AND VIDEO Images convey information that is not easily recreated through words alone. Photography provides a way of
recording complex information that is easily shared with others who were not firsthand witnesses of a scene or object. Crime scenes, accident scenes, and the victims of such incidents are among the most commonly documented subjects in forensic and criminological imaging. Basic documentation is critical to provide those who were not at the scene with a picture of the end result of the events that are under investigation. The category of ‘‘forensic field-based photography and video’’ is defined herein as that category of forensic imaging in which the subject of the photography is perishable or otherwise unstable in a relative sense. Most of the time, the forensic photographer in the field will have only a short time to capture the scene before the conditions of the scene or the subject of the photography change and alter its fundamental visual characteristics. For example, if a crime or accident occurs in a public place, authorities will have only a limited amount of time to restrict public access to the area before they must remove victims and evidence and allow public access to the scene. In such cases, the photographer has only one chance to document the subject thoroughly. Once the victims and evidence have been removed, they are not easily returned for further photographic documentation.

Basic Crime Scene Photography

Forensic photographers encounter a wide variety of situations including accident scenes, murders, fires, bombings, property crimes, and mass disasters. In every case, investigators are interested in finding out what happened, how it happened, and who was responsible. The specific means for photographing a crime scene may vary depending upon the nature of the crime but for the most part involves the same process. One of the primary goals of crime scene photography is to document the scene so that it is as accessible to later viewers as it was to those on the scene. Therefore, forensic photographers attempt to photograph the scene to approximate how an on-site investigator might view the scene. This typically involves a three-step process in which the viewer is provided with a sequence of images that progress from a relatively broad view of an entire scene down to a narrow focus on specific items of interest. In every case, attempts should be made to complete the photography before any of the evidence or victims are moved so that the photographs reflect a crime scene that is as close to pristine as possible. The first step involves relatively wide field of view shots of a scene, sometimes referred to as ‘‘establishing views.’’ An entire scene is captured within a single frame so that the viewer can observe the relationship of the scene to the immediate neighborhood. In some cases, for example, if the crime scene is located inside a house, overall shots will be taken from multiple angles or sides of the house to document possible paths of entrance and egress. If the crime scene is an automobile, photographs of each side of the vehicle will be taken before moving to the interior of the vehicle. In a continuation of the wide-shot process, the approach to the crime scene within a building may include a progression of photographs documenting doorways, staircases, and hallways leading to the room or rooms
in which the main crime scene is located. Once at the specific room in which the crime took place or in which the evidence is located, the process of documenting the scene by overall views continues from photographs of the entire room. Such wide-angle views are generally taken from every corner of the room to provide as thorough documentation as possible. Following the wide shots, medium-range shots are taken to help viewers identify the spatial relationships between individual items of evidence. Thus, these are sometimes called ‘‘relational views.’’ This process frequently involves interaction between the photographer and the personnel at a scene who are responsible for investigating the crime. The investigative personnel may indicate to the photographer specific items of evidence in a scene that are suspected of being important to an investigation, such as a possible murder weapon or wounds on a deceased victim. The photographer documents the relationship between the specific item of interest and its surroundings by focusing on the object of interest while also including recognizable objects in the immediate vicinity to help viewers relate the object in question to the overall scene. Once the position of an object of evidence is documented relative to its surroundings, the photographer can proceed to the third level of documentation in which the object of interest is the sole subject of the photographs. Such photographs are called ‘‘close-up views.’’ This process typically involves two steps, photographing the subject as it was found first, then photographing the subject with an identifier and/or scale inserted into the field of view. Exquisite care must be taken in placing the scale relative to the item of evidence as well as in placing and directing the lighting used to illuminate the object of interest. This is due to the fact that in many cases, the photographs taken at this time will be subjected to later forensic analysis to extract additional information that is not immediately apparent to those on the scene. This is especially important for fingerprints, footwear impressions, and tire tread impressions, as discussed later. Macrophotography is frequently used in this third step of the process. In macrophotography, the size of the image on the recording medium is approximately the same size as the object itself. This permits anyone viewing the photograph of the object to detect as much detail as they would if they were viewing the object itself. It also simplifies the process of measuring an object by using the photographs instead of measuring the object itself directly. Equipment for Forensic Field-Based Photography. Unlike many other forensic fields, there are no formally mandated minimum standards for basic crime scene photography. This may be due, in part, to the extreme range of conditions in which crime scene photographers find themselves, as well as the requirements or restrictions that many agencies place on their photographers. Although it was not uncommon for police photographers to use 4 × 5 black-and-white film cameras several decades ago, the most frequently used cameras for major crime scene documentation today are 35-mm single-lens reflex cameras, although some agencies continue to use larger
format 2 1/4 film cameras. For lesser crimes, perhaps involving only minor property damage or limited physical harm to a victim, many agencies will limit their photographic documentation to using point-and-shoot or ‘‘instant’’ film cameras or digital still cameras. Both black-and-white and color films may be used; black-and-white films are preferred for close-up photography in which maximizing the visibility of small details is most critical. A variety of wide-angle, macro-, and zoom lenses should also be available to provide flexibility to the photographer in capturing the area of interest. Finally, a separable, or ‘‘off-camera,’’ flash unit is also frequently included in a field photographer's kit because it is often necessary to provide lighting from a direction other than that of the camera. Many agencies use video documentation to augment their film documentation of crime scenes. However, the limited resolution of video relative to film media makes video alone insufficient for completely documenting major crime scenes and providing the type of forensic photographic evidence that can be most useful in successfully investigating a crime. High-definition video offers a chance to improve on the quality of crime scene video, but higher costs and the current lack of an industry standard (similar to the analog VHS standard) prevent its widespread use in law enforcement. Nevertheless, video documentation can be very useful in providing ‘‘walk-through’’ views and giving the viewer a sense of the spatial relationships between one part of a scene and other widely separated parts of a scene. In some cases, investigators will use video to document the paths individuals took when walking or driving from one location to another, such as the suspects' paths from their homes or places of work to the crime scene. Among recent developments now being marketed to law enforcement agencies are systems in which 360° views are generated from a single position by using multiple cameras. The separate views generated by each camera are related to one another in real time through computer software, making it possible for viewers to generate their own ‘‘virtual walk-throughs’’ of a scene. As with high-definition video systems, the high cost of such systems prevents their widespread distribution as yet, but it is anticipated that they, too, will become a useful tool for law enforcement in years to come. In recent years, as digital still photography has become more widely available to the consumer market, a number of law enforcement agencies have become interested in using this technology to replace traditional film technologies. Unfortunately, the perception among many is that because digital technology is newer than film, it must be better. This is not yet the case. For forensic applications, the single most important factor besides actually producing a photographic record of a subject is the resolution, or the amount of detail one may detect in an image. Digital camera systems comparable in cost to 35-mm film camera systems are not yet available that can match the resolution possible using traditional 35-mm film cameras under optimal conditions. A simple calculation demonstrates this. A single frame of 35-mm film measures 36 × 24 mm. The technical specifications for a major film
manufacturer's ISO 100 black-and-white film states a resolution of between 63 and 200 lines per millimeter, depending upon the contrast exhibited by the subject. Using the simplifying assumption that one line may be equated to one-and-a-half pixels, for the worst case film scenario (resolution = 63 lines per millimeter), this translates into an image size of 3,402 × 2,268 pixels, a total of 7.7 million pixels. This is more than two and a half times the size of the detectors used in 3-megapixel consumer digital cameras (approximately 2,000 × 1,500 pixels), where one megapixel equals approximately one million pixels. Under the best possible conditions for this film (resolution = 200 lines per millimeter), the resulting image size would be 10,800 × 7,200 pixels, a total of 77.8 million pixels, more than 25 times the number of pixels available in a 3-megapixel digital camera. Some researchers state that the assumption of one and a half pixels per line is too low and that a value of two pixels per line would be more appropriate. Using this value, a frame of the 35-mm film discussed before would have a size ranging from 4,536 × 3,024 pixels (a total of 13.7 million pixels) to 14,400 × 9,600 pixels (a total of 138.2 million pixels). Thus, the film would contain 4 to 40 times more pixels than the 3-megapixel digital camera. Regardless of whether one selects a conversion factor of 1.5 or 2 pixels per line, the 35-mm film can record far more detail in a scene than most digital cameras. Larger format film cameras, such as 2 1/4 cameras, use the same types of film that have the same high resolutions, but capture an even larger frame, resulting in even more pixels per picture. Another factor that favors traditional film cameras over current digital cameras is the homogeneous distribution of the light-sensitive silver grains within film compared with the rigid arrangement of the detectors into horizontal and vertical lines in digital cameras. This fixed alignment of detectors within digital cameras reduces the accuracy with which straight lines or edges that are oriented at an angle may be rendered in the final image. Straight lines on an angle will be rendered with jagged edges in a digital image, whereas the silver grains within film can create a more accurate, smooth edge for such lines because they are not restricted to a horizontal or vertical grid position. A primary goal of forensic imaging is to provide the most accurate recording possible, so film would again be favored over digital imaging. Although based on some very broad assumptions, these conclusions support the underlying concept that 35-mm film cameras remain the most effective means of capturing forensic images in the field.
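The comparison is easy to reproduce. The short Python sketch below recomputes the pixel equivalents from the same assumptions used above, a 36 × 24 mm frame, 63 to 200 lines per millimeter, and 1.5 or 2 pixels per line; the figures describe those stated assumptions, not any particular film or camera.

FRAME_W_MM, FRAME_H_MM = 36, 24   # dimensions of one 35-mm film frame

def film_pixels(lines_per_mm, pixels_per_line):
    """Equivalent pixel dimensions and megapixel count for a film frame."""
    w = round(FRAME_W_MM * lines_per_mm * pixels_per_line)
    h = round(FRAME_H_MM * lines_per_mm * pixels_per_line)
    return w, h, w * h / 1e6

for lpm in (63, 200):
    for ppl in (1.5, 2):
        w, h, mp = film_pixels(lpm, ppl)
        print(f"{lpm} lines/mm at {ppl} px/line: {w} x {h} = {mp:.1f} megapixels")

# Output ranges from 3,402 x 2,268 (about 7.7 MP) to 14,400 x 9,600 (about 138.2 MP),
# compared with roughly 3 MP (2,000 x 1,500) for the consumer digital cameras discussed.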
Besides ignoring the actual contrast sensitivity of digital camera detectors (a consideration that favors digital cameras in the preceding calculations), a number of other factors may lead one to favor 35-mm (and larger format) cameras for forensic field applications. These factors include the greater availability of interchangeable lenses for SLR film cameras, as well as the ability to use off-camera flash units. Such accessories are critical for applications such as close-up photography and latent impression photography. Although such accessories are becoming more widely available for professional digital cameras, they are not yet generally available for the consumer-grade digital cameras that many agencies are now considering for use in law enforcement. Despite the current limitations described before, digital still imaging is being used by a number of law enforcement agencies to augment film documentation of crime scenes. These digital images can be examined and used immediately by investigators on the scene, which is not possible when using traditional 35-mm film images. They may also be transmitted back to one's office or to other agencies for rapid dissemination to other investigators. Most of the time, these digital images are not intended for use in later forensic analyses but are, instead, used as ‘‘recognition-type’’ photographs, as discussed below.

Specialized Crime Scene Photography

The photographs whose acquisition is described in the preceding section are primarily intended to document the condition of a scene as it was found, so that an accurate representation of the scene and the items within it may be presented to later observers. In many cases, however, specialized photographic techniques are used to acquire evidence for further forensic investigation. The subject of such photographs typically includes latent fingerprints or other impressions left on objects or victims at the scene. The use of fingerprint, tire tread, and footwear impression evidence is described later. For such photographic evidence to have forensic value, it must be acquired in a manner that preserves its utility as a meaningful piece of evidence. To an extent, the difference between basic crime scene photography and specialized crime scene photography comes down to a question of recognition versus identification. Recognition is a subjective process through which individuals perceive that an object or person is the same object or person that was known to them beforehand. Identification involves an objective process through which an object or person is uniquely distinguished from all other objects or individuals based on specific, observable characteristics. A primary purpose of basic crime scene photographs is to enable those examining them to recognize the places, objects, and people depicted therein, as well as their physical locations within the scene and relative to one another. In most cases, the establishing views and relational views are intended for use in recognizing the physical evidence collected from a scene. Close-up views may also aid in recognition, but in many instances, they serve a more critical purpose linked to the forensic identification process. Latent fingerprint impressions, tire tread impressions, and footwear impressions left at crime scenes provide strong physical evidence that can be used to link suspects to crimes. This is due to the fact that the objects that create these impressions can be individualized to a single source based on unique, observable characteristics. The characteristics that make an individual fingerprint unique involve the arrangement of friction ridges, including features such as bifurcations, ridge endings, and islands (short ridges that are unconnected to other ridges on the skin). These features appear repeatedly on the fingers of
all persons and are arranged in different, unique patterns on different fingers. In tire treads and footwear, differences in the tread design may enable one to differentiate tires or footwear made by different manufacturers, but the key to identifying individual shoes or tires depends upon individual tiny nicks, gouges, or embedded particles that may have been created during the manufacturing process or through normal wear and tear. These individual identifying features result from random processes, thereby making them useful in differentiating one object from another. When a latent impression is made by a finger, shoe, or tire, the pattern of ridges, nicks, gouges, and embedded particles on the object is transferred to the receiving surface, leaving a record of those individual identifying characteristics. When the forensic photographer encounters such evidence, the critical task is to capture those details photographically, so that those characteristics can be meaningfully compared with the object or objects suspected of having left the impression. Such details are typically small and may not be immediately apparent to the untrained eye, so a great deal of care must be exercised to capture them. The first key to capturing such detail is to use a medium that can record sufficient resolution across the entire field of interest so that the features of interest may be detected, identified, and compared against known samples. Latent fingerprints recovered from crime scenes are compared against known fingerprint samples taken from individuals who have been arrested in the past. These known samples usually consist of inked ‘‘ten-print cards’’ that record an individual's fingerprint for each digit. These ten-print cards define a standard against which latent print photographs may be compared. The National Institute of Standards and Technology (NIST), working with the Federal Bureau of Investigation (FBI), has developed a standard for exchanging fingerprint data via electronic media in which ten-print cards (those that depict inked fingerprints of a suspect's ten digits) are scanned at a resolution of 500 samples per inch, or approximately 19.7 samples per millimeter. Because Nyquist sampling theory holds that one must sample a feature at a rate of at least two pixels per feature length to depict that feature accurately, the NIST standard permits one to resolve features that are approximately one-tenth of a millimeter across. Note, however, that if one defines the feature of interest as a paired set of dark and light lines (or a fingerprint ridge and its adjacent furrow), then one must use four pixels, two each for the dark and light lines, and the resolvable feature would be only one-fifth of a millimeter across. NIST recommends capturing latent impressions (those impressions from an unknown subject that are recovered at a crime scene) at a resolution of 1,000 samples or pixels per inch (ppi). This higher sampling rate is intended to ensure that the features visible at 500 samples per inch in the ten-print card can be detected in the latent impression.
Figure 1. Photograph depicting part of an average adult male’s hand. Shading indicates the area covered by a camera with 6 megapixels (3,000 × 2,000 pixels) when the resolution is fixed at 1,000 pixels per inch (ppi), the resolution recommended for acquiring latent impressions. See color insert. (FBI Laboratory Photograph)
If one were attempting to photograph the latent impression left by an average adult male's hand, which is approximately 7 inches long by 5 inches wide, one would need 7,000 × 5,000 pixels. Such an image size can be achieved by using 35-mm film under good conditions as described earlier, but is beyond the detector size of most digital cameras today. Figure 1 further demonstrates this by superimposing the area encompassed by a 6-megapixel camera (3,000 × 2,000 pixels) on an average adult male's hand at a resolution of 1,000 ppi. It would be possible to photograph the entire hand depicted in Fig. 1 by using the 6-megapixel digital camera at 1,000 ppi, but it would require taking multiple photographs and then stitching them together to create a single image of the hand. In fact, to cover fully a rectangular area 7 inches long by 5 inches wide at a resolution of 1,000 ppi would require eight photographs using the 6-megapixel camera. For 3-megapixel cameras, which are far more common in law enforcement than 6-megapixel cameras, the number of photographs needed to capture the same area is almost double. The size of the features needed for positive identification of footwear and tire tread impressions is not as well defined as for fingerprints, because such features consist of random nicks or gouges that can originate from a variety of sources and may be as small as a pinprick. For purposes of discussion, however, one can consider that the smallest feature is comparable to those needed for fingerprint analysis, or a tenth of a millimeter across. Across a distance of 1 foot, a reasonable assumption for the length of most footwear impressions and the width of most tire tread impressions, this translates into a total of 6,000 pixels in the long dimension. This number of pixels is achievable within a single frame when using 35-mm high-resolution film under the best conditions but, as for the hand print, is not possible using commercially available digital cameras, whose maximum pixel count is no more than 3,000 in the long dimension. As in other situations, larger film formats, including 2 1/4-inch films and 4 × 5-inch films, provide an even larger area over which high-resolution imagery may be recorded.
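The pixel counts quoted in the preceding paragraphs reduce to simple arithmetic, sketched below in Python. The 1,000 ppi recommendation, the 7 × 5 inch hand, and the 6- and 3-megapixel detector sizes come from the text; the helper names are illustrative, and the frame count assumes that the camera may be rotated to whichever orientation tiles the area in fewer exposures.

import math

def pixels_needed(width_in, height_in, ppi=1000):
    """Pixels required to cover a subject at a given sampling resolution."""
    return round(width_in * ppi), round(height_in * ppi)

def frames_required(width_in, height_in, cam_w, cam_h, ppi=1000):
    """Photographs needed to tile the subject with a fixed-size detector."""
    need_w, need_h = pixels_needed(width_in, height_in, ppi)
    landscape = math.ceil(need_w / cam_w) * math.ceil(need_h / cam_h)
    portrait = math.ceil(need_w / cam_h) * math.ceil(need_h / cam_w)
    return min(landscape, portrait)

print(pixels_needed(7, 5))                 # (7000, 5000)
print(frames_required(7, 5, 3000, 2000))   # 8 frames with a 6-megapixel camera
print(frames_required(7, 5, 2000, 1500))   # 15 frames with a 3-megapixel camera, almost double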
After resolution, the next most significant factor to consider when acquiring forensic images of latent impressions is proper lighting and exposure. In black-and-white photography, one objective is to acquire as wide a range of density or brightness within the scene as possible, from darkest black to brightest white. When photographing impression evidence, however, the type and direction of the light source can be even more important, especially when the subject is a footwear or tire tread impression in soil or other material that has resulted in a marked indentation. Tire tread and footwear sole patterns create impressions that consist of both raised and sunken features that can be highlighted by placing a light source low to the ground and across the impression. This permits identifying both raised and sunken features by creating highlights on those features that face the light source and shadows on those features that face away from the light source. If the light source is placed only directly over the features of interest, they may be less apparent and therefore less useful as a tool of forensic identification. Other steps that must be taken to ensure acquiring the best possible impression photographs include ensuring that the film plane is parallel to the impression so that distortions are not created in the dimensions of the impression. Inclusion of a scale or ruler is also of critical importance to ensure that accurate comparisons of the impression can take place. When including a scale, it should be placed as close to the same height (or depth) as the impression, so that it correctly represents the scale of features within the impression. As a final consideration, care must be taken to ensure that the depth of field is selected so that the entire impression is in focus.

Autopsy and Victim Photography

One specialized type of forensic photography that falls between field-based and laboratory-based photography is autopsy and victim photography. The conditions under which an autopsy photographer works may be somewhat more controlled from an environmental standpoint, in that one need not worry about weather conditions and the lighting should be completely within the control of the photographer, but once the photographs have been taken and each step of the autopsy has been completed, there is little or no chance to go back and take additional photographs. Therefore, autopsy photographs must be taken correctly the first time. When a living victim is being photographed, it is best to photograph the wounds as soon as possible because their outward appearance will change as they heal. As in photography of crime scenes, autopsy and victim photography should begin by taking shots depicting the entire body, moving in to medium-range, relational views to provide the context of specific wounds or features, followed by close-up, wound-specific photographs. The close-up photographs should include frames depicting the wound alone, followed by frames depicting the wound with a ruler or scale in close proximity. It is best to place the ruler immediately adjacent to the wound, so that the most accurate measurements of the wound may be acquired from the photographs, if necessary. In most cases, it is
preferable to measure the wounds directly at the time of autopsy and record the measurements then; the photographs merely document these measurements. Close-up photographs of wounds may be important in an investigation by showing such characteristics as powder stippling from a gunshot wound or a jagged-edged wound created by a serrated knife. Likewise, photographs of gunshot wounds can be used to document whether they represent entrance or exit wounds. In some cases, autopsy or victim photographs can be of special importance in identifying the implement, device, or object that was responsible for creating the wound. Ropes or cords used as ligatures may leave striping or lineations in the skin of the victim, which may then be compared later with items suspected of having been their source. Likewise, burns from hot objects such as lighters or irons may be documented photographically. Because such wounds may heal unevenly, causing a part of the pattern to disappear earlier than the rest of the wound, it is especially important to photograph the wounds when fresh to ensure the most accurate documentation. A final consideration in autopsy and victim photography is the importance of color. The color or change of color in some wounds may be critically important to the investigating pathologist, so the photographs used to depict these injuries must be capable of documenting those observations. Therefore, the use of color films is appropriate for most autopsy photography. Care must be taken in such cases, however, to take into account the available lighting conditions and to correct for them when necessary. Fluorescent lighting can generate a green tint in photographs taken under it, so the use of a flash unit or other light source is necessary in such situations. Including a color chart along with a scale in such photographs is always useful for calibrating the colors at a later date.

SURVEILLANCE IMAGING

Surveillance imaging is any activity designed to record photographically the activities of individuals engaged in, or suspected of, illegal or improper activities. Surveillance may be active or passive and may be conducted by law enforcement agencies in ongoing investigations or as part of an organization's security activities. The events being documented through surveillance imaging are typically one-time events that will not be replicated. There is only one opportunity to obtain the images, so this type of photography falls within the field-based category. Law enforcement agencies must often engage in active surveillance to document the movements or actions of individuals who are suspected of illegal activities. In such circumstances, surveillance is usually covert, so that the suspects are unaware of the surveillance and will not alter their activities. Most frequently, still images may be used to document the fact that a suspect or vehicle was observed in a specific location or that the suspect met with a specific individual or was carrying a specific object. Traditionally, 35-mm film cameras that have telephoto lenses are used for still imagery. It is not unusual to use lenses whose focal lengths are 300 mm or more, and zoom lenses are
frequently used as well, especially if the range to the subject will vary during the surveillance. Black-and-white film is usually selected as the recording medium in covert surveillance for the higher resolution it offers over color films, digital cameras, and video cameras. In addition, the lighting conditions in many surveillances are such that a high-speed film, rated at ISO 800 or higher, is necessary to obtain properly exposed images of sufficient quality that they may be useful to the investigation. Surveillance images are of little use if they are blurry or too poorly exposed to permit one to recognize the individuals, vehicles, or objects depicted therein. Video cameras are used in covert situations in which it is desirable to record the complete actions of individuals under surveillance. Analog video systems remain the most widely used systems in law enforcement, but digital video systems are beginning to be used more frequently. The most widespread use of video cameras in surveillance involves their use in security systems. Banks and other commercial operations such as convenience stores and fast food restaurants, as well as schools and other public facilities, use video camera systems that provide some of the most important imaging evidence used in law enforcement today. Imaging technologies developed specifically for video surveillance include time-lapse recorders and multiplexers. ‘‘Time-lapse’’ refers to video systems that can record video imagery at a rate of less than 60 fields per second. (For simplicity, the discussion of video in this section and throughout this entry focuses on National Television System Committee (NTSC) video, the standard for video systems used primarily in the Western Hemisphere and Japan. For the most part, this discussion is also valid for video imagery recorded in the Phase Alternation by Line (PAL) and Séquentiel Couleur à Mémoire (SECAM) formats, which predominate throughout the rest of the world and record fewer images per second, but have a larger image size than that of NTSC.) The utility of time-lapse systems lies in their ability to record imagery during a much longer period of time than would be possible if the recording were made at a normal rate. Whereas a normal, commercial-grade video cassette tape will record approximately two (2) hours of NTSC video at a rate of 60 fields per second, time-lapse systems permit one to record images during periods of time up to 240 hours in some cases. The means for accomplishing this is simple. The rate at which images are recorded is simply reduced from 60 fields per second to as few as 1 every two seconds, which permits recording images during a period of 240 hours. In such instances, the cameras are usually pointed at locations in which the subject will be present for a period of time in excess of the period between sequential images. For example, if a typical bank transaction takes 1 minute to occur, then a single camera pointed at a teller station on a time-lapse setting of 240 hours would record 30 images during a transaction. This is usually more than enough time to obtain a good view of the subject.
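The arithmetic behind these time-lapse figures is equally simple; a minimal Python sketch, assuming NTSC's 60 fields per second, a tape that holds two hours of video at the normal rate, and the one-minute transaction used above, follows. The function names are illustrative only.

FIELDS_PER_TAPE = 2 * 3600 * 60   # fields stored on a standard two-hour tape at 60 fields/s

def recording_hours(fields_per_second):
    """How long one tape lasts when recording at a reduced field rate."""
    return FIELDS_PER_TAPE / fields_per_second / 3600

def images_of_event(event_seconds, fields_per_second):
    """How many images of an event are captured at a given field rate."""
    return event_seconds * fields_per_second

print(recording_hours(60))        # 2.0 hours at the normal rate
print(recording_hours(0.5))       # 240.0 hours at one field every two seconds
print(images_of_event(60, 0.5))   # 30.0 images of a one-minute transaction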
input signals from multiple sources and generates a single output signal. In video surveillance systems, multiple cameras enable an organization to monitor and record images of more than one location at a time. Such systems are particularly useful to banks that choose to record transactions at multiple teller stations at one time and to retail stores that wish to monitor multiple aisles or locations in a store. The rate at which a video multiplexer switches from one input camera signal to the next can vary to as little as one field per camera before switching to the next camera. Multiplexers can be combined with time-lapse recorders to create a multitude of recording options in surveillance systems, and, in fact, some surveillance time-lapse video recorders have built-in multiplexers. Another functionality included in some multiplexers is the ability to display the inputs from multiple cameras simultaneously. The most common configuration used in this scenario displays the incoming signals from four cameras at once, although systems that use sixteen cameras at once are also frequently encountered. In a four-camera system, the output image generated by the multiplexer is divided into four quadrants. This is commonly referred to as ‘‘quad video.’’ To display four signals simultaneously on a regular video monitor, however, it is necessary to compress the incoming signals so that, together, they make a single video image. The resulting output image has only one-half the width and one-half the height of the input image, so this reduces the amount of information available in any one camera’s image to one-quarter of what it was before it was routed through the multiplexer. Finally, a conversion is underway in the surveillance industry in which video will no longer be stored in analog format, but rather in digital format. The analog video signal is captured, then converted to a digital signal, which can then be stored locally on a computer hard drive or transmitted to a remote server for storage. Conversion of the video to a digital format also makes it possible to enhance the capabilities of the surveillance system. In some security applications, the video signals are linked to systems that can detect motion, track people or objects, and even compare the faces of individuals depicted in the video with faces contained in a database. If someone who is not supposed to be present is detected, the system can be configured to alert security personnel. Another benefit of these systems is that by storing the images in a digital format, it is far easier to process the images after the fact using nonlinear editing techniques, as discussed later. From a forensic perspective, the only major drawback to such digital video systems is the frequent use of compression to permit faster transmission and storage of the video images. As discussed later, excessive compression can severely reduce the forensic utility of imagery, so care must be taken to minimize its deleterious effects. Although video now represents the most common imaging technology used in private surveillance systems, some banks and other institutions still use black-and-white film cameras in conjunction with time-lapse video systems in some of their security systems. The video systems run continuously in time-lapse mode, but the film
systems usually do not. Instead, they may be triggered to start taking pictures when an alarm is set off or when a set of ‘‘bait’’ bills is pulled during a robbery. Once activated, these cameras take one or two pictures per second until deactivated or until a set period of time elapses. Usually, such cameras are located over the main entrance/exit of the bank to ensure that any robbers exiting the bank are photographed as they leave. Several different film formats are used in bank surveillance cameras, including 16-mm, 35-mm, 35-mm half-frame, and 70-mm, as well as some panoramic films, among others. Most commonly, the film used in these cameras is rated at approximately ISO 400 and has a resolving power far superior to that of video — anywhere from 50 to 125 lines per millimeter. This translates to an effective resolution of between 3,600 × 2,400 pixels and 9,000 × 6,000 pixels (assuming a full 36 × 24-mm frame and two pixels per resolvable line pair), versus a standard NTSC video resolution of approximately 640 × 480 pixels. Although the resolution of ISO 400 film is less than that of film rated at ISO 100, its faster speed permits higher shutter speeds so that blurring due to subject motion can be reduced while maintaining suitable exposure. Subjects depicted in bank robbery footage often move at high velocities relative to individuals in banks at other times, so a higher shutter speed is useful.
MUG SHOTS
In 1841, a few years after the invention of photography by Niépce, the police in Paris began collecting a rogues’ gallery — the first mug shots. In 1874, the use of photographs to identify people was permitted in court for the first time. The use of identification photographs in law enforcement has grown ever since. The greatest boost in taking mug shots came in the 1880s when the Bertillon method of identification was established. This method quickly became the preferred means of personal identification in law enforcement until supplanted by fingerprint identification in the early 1900s. The Bertillon method is a biometric system that relies on careful measurements of the craniofacial characteristics of individuals. To assist the practitioners of this method in their work, mug shots were taken to depict full-facial and profile views of individuals. The general practice of mug shot photography has not changed much since that time, except for the technologies used. As in other areas of forensic imaging, the current tendency is to migrate toward digital or electronic imaging technologies to acquire and store these images. Although the quality of these digital images lags behind that of 35-mm film in identification, they are suitable for recognition. The digital format also simplifies transmitting them via telecommunications channels. Today, the greatest benefit of digital imaging of mug shots is the promise it holds for allowing automated searches of mug shot databases. A number of automated facial recognition technologies have advanced to high capability, although there is still room for improvement. Perhaps the greatest challenge facing those who seek to develop national and international criminal mug shot databases is the lack of standards in acquiring mug shots.
Until a standard is established, many law enforcement agencies are likely to continue to shoot mug shots on film.
LABORATORY BASED PHOTOGRAPHY
Once a crime scene has been photographed, it is frequently necessary to photograph items of evidence removed from the scene in a controlled, laboratory setting. Such photography may be undertaken merely for additional documentation, such as documenting the condition of the evidence before it is altered by destructive testing. Alternatively, laboratory photography may be undertaken to support forensic laboratory examinations, where it is easier for the examiner to use photographs of the evidentiary object, rather than the object itself. Likewise, documentary photographs can be easier for investigators, court officials, and juries to handle than the actual item. Laboratory photography may also be used to extract further information of forensic value from the evidence.
Evidence Documentation
A fundamental advantage of laboratory-based evidence documentation is the opportunity it affords the photographer to select the specific lighting conditions and angle of view that will be used to depict the object. Furthermore, it enables the photographer to focus solely on the object under consideration, without having to worry about other objects that may fall within the field of view and distract from the subject. Unless the object being photographed is particularly fragile or delicate, the photographer also has far more time for photographing the subject than in the field. A common purpose of laboratory photography is to document the condition of an item of evidence before it is modified for further forensic testing. For example, an item of clothing bearing blood stains or other bodily fluids will be photographed before and after swatches of the stained cloth are cut out and removed for serological testing. This is done to document the presence of the fluid on the item, as well as the location from which the swatches were removed. In this way, the integrity of the evidence and the resulting forensic analyses may be ensured. Microscopic evidence is one type of evidence that requires specialized laboratory photographic documentation. Hair, fiber, glass, or other trace evidence recovered at a crime scene can provide useful information in the investigation if it can be associated with a known individual or source. Likewise, bullets and cartridge cases can provide strong evidence in a case if they can be associated with a particular weapon. In each of these instances, the item of evidence can be distinguished from other similar items based on the characteristics visible under microscopic examination. Scientists who examine trace evidence and ballistics rely on comparison microscopes for conducting side-by-side comparisons of evidence recovered from a scene with items from a known source. A comparison microscope consists of two separate microscopes joined by an optical bridge. By using a comparison microscope, characteristics of hairs and fibers such as color and diameter may be compared
Figure 2. Side by side comparison of hair. See color insert. (FBI Laboratory Photograph)
directly. Figure 2 depicts a side by side comparison of a hair recovered from a crime scene with a hair obtained from a known source. By aligning the known hair with the questioned hair, similarities in the dimensions and optical characteristics may be demonstrated. In bullet or cartridge case comparisons, impressions or striations imparted to the bullet by the barrel rifling or to the cartridge case by the firing pin can be compared directly with those of known weapons by using bullets and cases generated from test firings of the weapon. Once the visual comparison has been performed by the examiner, it can be documented by mounting a camera on the microscope to recreate the examiner’s view. In addition to the one-to-one comparisons that a firearms expert may conduct using the comparison microscope, the images of recovered bullets and shell casings can also be digitally scanned and compared with images of bullets and casings recovered across the United States by using several available image databases. It is expected that by the year 2003, the major ballistics image databases in the United States will be combined into a single national database, known as the National Integrated Ballistics Information Network (NIBIN). Because weapons are frequently used in more than one jurisdiction, this system will enhance the ability of law enforcement agencies to connect
previously unconnected crimes that occur across the country.
Advanced Laboratory Techniques
Although much of forensic photography consists of documenting items and characteristics visible to the naked eye, one of the greatest contributions of forensic photography to law enforcement is its ability to document evidence that cannot normally be seen. Forensic photographers take advantage of lighting and light sources, filters, and films not normally used by recreational photographers to document otherwise invisible evidence. Some of these techniques rely simply on the use of filters to block out unwanted portions of the visible spectrum, and others depend on the reflectance or emission of ultraviolet and infrared light by the object under scrutiny. Filtration and Nonvisible Reflectance. One of the simplest techniques in laboratory forensic photography involves using filters to block out certain wavelengths of visible light, so that the subject of interest may be seen more clearly. Figure 3a depicts an aluminum can bearing a dark fingerprint that extends from the white portion of the can to the red portion. The low contrast between the dark fingerprint dust and the red background makes it difficult to distinguish the details of the print. However, by applying filtration that allows only red light to pass to the film, it is possible to lighten any areas that are red (Fig. 3b). In this filtered view, the red portions appear brightest, and the blue and green portions are dark. The fingerprint, which contains blue and green components
as well as red, remains dark against the previously red background, permitting detection of details of the print. A similar technique may be used when attempting to decipher writing that has been obliterated through the application of additional ink. In much the same way that inks come in different colors, differences in chemistry may affect their properties in the infrared and ultraviolet wavelengths. Two inks that appear identical in visible light may be very different when viewed in infrared or ultraviolet light. Figure 4 depicts an example in which the infrared reflectance of two inks is found to differ. Figure 4a depicts an obliteration seen in visible light. When photographed so that only the reflected infrared is recorded (Fig. 4b), the signature underneath the obliteration is clearly seen, indicating that the obliterating ink is transparent to infrared light, whereas the underlying ink is not. The same holds true for the ultraviolet region seen in Fig. 5. Figure 5a depicts two pages from a passport under normal viewing conditions. The passport is suspected of being counterfeit, and one way counterfeit passports are produced is to make previously canceled passports appear valid by staining the pages to obliterate the ‘‘CANCELED’’ stamps. Figure 5b depicts the same pages photographed in reflected ultraviolet light, revealing that the passport is actually a counterfeit that has been treated to obscure the ‘‘CANCELED’’ stamp. In some cases, the application of reflected infrared or ultraviolet lighting may be combined with high-contrast film to clarify latent impressions further. Figure 6a depicts a blood-stained sheet of paper recovered from a crime scene that was photographed in visible light. The right side of the page appears blank. However, when photographed using reflected ultraviolet and high-contrast film, a pair of footwear impressions is revealed in the previously blank area (Fig. 6b). Luminescence. The use of infrared and ultraviolet reflectance to decipher obliterated writing, as described before, depends on the tendency of some inks to be transparent to these wavelengths whereas others are
Figure 3. The use of filtration to drop red portions of the soda can, so that the latent print is more easily detected. (a) View of can with no filtration. (b) View after filtration is inserted to pass only the red component of light, making the red portion lighter. See color insert. (FBI Laboratory Photograph)
Figure 4. Example of infrared reflectance used to reveal obliterated signature. (a) Visible light view of obliteration. (b) View of signature photographed in reflected infrared. (FBI Laboratory Photograph)
Figure 5. Counterfeit passport revealed via reflected ultraviolet. (a) Visible light view of passport. (b) View of passport photographed in reflected ultraviolet. (FBI Laboratory Photograph)
opaque. Another property of some inks that can be used to great effect in forensic photography is their tendency to exhibit luminescence when exposed to various light sources. Figure 7a depicts a portion of a check suspected of having been altered from $9 to $99. Figure 7b depicts the same check after being photographed by infrared luminescence. The glowing writing in Fig. 7b represents the original writing on the check in an ink different from
Figure 6. Footwear impressions revealed via reflected ultraviolet and high contrast film. (a) Visible light view of page. (b) Page photographed in reflected ultraviolet and recorded on high-contrast film. See color insert. (FBI Laboratory Photograph)
Figure 7. Check alteration revealed via infrared luminescence. (a) Visible light view of check. (b) Check photographed in infrared reveals original writing. (FBI Laboratory Photograph)
that used to alter the check. Close examination reveals that the individual who altered the check attempted to hide this fact by overwriting all of the original writing as well as adding the ‘‘Ninety’’ and ‘‘9’’ to the dollar value. One can also note in Fig. 7b that the effect of the fluorescence is so strong that fluorescence from the ink stamp on the reverse side of the check is visible to the left of the signature in this image.
Figure 8. Alteration of postal meter stamp detected through ultraviolet luminescence. (a) Visible light view of stamp. (b) Stamp viewed in ultraviolet reveals original meter number. (FBI Laboratory Photograph)
Ultraviolet luminescence likewise can be used to reveal secret writing and to detect alterations, such as the change in the postal meter number depicted in Fig. 8. In each of the images that reveals the alteration, the object in question is being illuminated by an ultraviolet source, but the reflected ultraviolet is filtered out. In some cases, a combination of filtration, contrast, and luminescence must be used to ensure that the maximum amount of information is extracted. Figure 9 provides one such example. Figure 9a depicts the skin removed from the palm of a murder victim. Visual inspection of the hand revealed what appeared to be some writing on the hand. When photographed using a red filter and high-contrast film, the image in Fig. 9b was produced. The writing was discovered to be a phone number. The palm was next photographed using ultraviolet fluorescence techniques, resulting in the photograph in Fig. 9c, which appears to depict a name written above the phone number. Both the phone number and the name proved to be useful investigative leads. Polarization and Directional Lighting. Whereas the previous examples demonstrated the selection or removal of specific wavelengths of light, differences in the polarization of light reflected from a surface may also be used to enhance the visibility of details on a surface. Figure 10a depicts the appearance of a jacket under natural light. The surface of the jacket appears unremarkable. However, upon the application of cross-polarized light, combined with a polarizing filter and a high-contrast film, the dusty impression of a shoe print on the jacket is recorded (Fig. 10b). Finally, sometimes the proper application of light is all that is needed to make latent evidence more apparent. Indented writing is one such type of evidence. If an individual writes a holdup note or some other writing related to a crime on a piece of paper that is the top sheet of a stack of papers, then the pressure of the writing
implement is transmitted through the top sheet, leaving indentations in the sheets below. Figure 11a depicts a page torn from a desk calendar upon which a name and number have been written. Figure 11b depicts the page immediately below that in Fig. 11a, photographed under ambient lighting. This photograph does not reveal any indentations in this second sheet. However, when the second sheet is photographed in light striking the page at a very low angle (‘‘side lighting’’), the indentations from the original writing are revealed (Fig. 11c).
FORENSIC IMAGE PROCESSING
‘‘Image processing’’ is defined by the Scientific Working Group on Imaging Technologies (SWGIT) as any activity that transforms an input image into an output image. By this definition, image processing encompasses a range of activities from simple format conversions to complex image enhancements. The increased availability of low-cost digital image processing hardware and software has accelerated the use of these technologies in law enforcement and forensic applications. In some cases, image processing simply involves converting an image to a form that will make it easier for the end user to view it or otherwise use it. An example of this type of processing is the chemical development of a roll of film after it has been removed from a camera to create a fixed roll of negatives. Those negatives can then be processed further using standard darkroom techniques to create prints. For film used in forensic applications, these development and printing processes are, for the most part, identical to those used for commercial and consumer photography. The processes used for forensic video analysis, however, are more specific to law enforcement applications than those for commercial or consumer uses.
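In the digital domain, the simplest counterpart of such a conversion is re-saving an image file in a different format, which changes the container but not the picture content. A minimal sketch using the Pillow library follows; the file names are hypothetical.

```python
# Minimal format conversion: read an image in one file format and write it in
# another. The picture content is not altered, only its container.
from PIL import Image

scan = Image.open("evidence_scan.tif")   # hypothetical input file
scan.save("evidence_scan.png")           # same image, different file format
```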
Figure 9. Use of filtration and ultraviolet luminescence to detect writing on palm of hand. (a) Visible light view of palm. (b) View of palm using red filter and high-contrast film enhances phone number. (c) View of palm under ultraviolet luminescence reveals the presence of a name. See color insert. (FBI Laboratory Photograph)
Video Format Conversions
The consumer-grade players for video evidence that are most commonly available to investigators and court officials use the video home system (VHS) format. In many instances, however, recordings taped by crime scene personnel, witnesses, suspects, or victims are made using consumer camcorders that use smaller, more compact formats such as VHS-C or 8-mm. VHS-C tapes may be played in a standard VHS machine by using an adapter, but recordings made on 8-mm tapes must be copied onto
Figure 10. Use of polarization to detect footwear impression on jacket. (a) Visible light view of jacket. (b) View of jacket under polarized light enhances visibility of footwear impression. (FBI Laboratory Photograph)
VHS tapes to be played in a VHS player. Such simple processing can be considered the video equivalent of making photographic prints from film negatives. A more involved type of format conversion more specific to forensic applications involves multiplexed and time-lapse video recording systems. As described before, the use of time-lapse systems and multiplexers is common in video surveillance. Such systems permit users to record images from multiple cameras and over a longer period of time than possible using consumer-grade video recorders. However, the output of time-lapse and multiplexed surveillance recorders is usually difficult to view on consumer-grade video players because a recording made in time-lapse mode (less than 60 fields per second) that is played in ‘‘real-time’’ mode (approximately 60 fields per second) appears at an accelerated rate. If multiple camera views were also recorded on such a time-lapse tape (i.e., multiplexed), then viewing the video on consumer-grade equipment is further complicated. The simplest operation to improve the utility of such surveillance tapes is to reduce the playback rate of the tape, preferably using the same type of time-lapse recorder that was used to record the tape. This reduced-speed playback can then be used to record a copy of the tape at a rate that is viewable on a consumer-brand video player. If the time period of interest depicted in the time-lapse
Figure 11. Use of side lighting to reveal indented writing. (a) Page from desk calendar with writing. (b) Page from desk calendar immediately beneath page in (a) viewed in ambient light. (c) Same page as in (b) illuminated via sidelighting reveals indented writing from upper page. (FBI Laboratory Photograph)
tape exceeds two hours, it may be necessary to record the reduced-speed copy on more than one tape.
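The arithmetic behind these reduced-speed copies is straightforward. The following back-of-the-envelope sketch (Python; the function names and the two-hour tape figure are our own illustrative assumptions based on the numbers quoted earlier) estimates how many standard tapes such a copy requires and how many images the original time-lapse setting captured.

```python
import math

TAPE_HOURS = 2.0  # approximate capacity of a standard VHS tape at normal speed

def tapes_for_reduced_speed_copy(hours_of_interest):
    # Played back at reduced speed so that events unfold at their real-world
    # pace, the copy runs as long as the period of interest itself.
    return math.ceil(hours_of_interest / TAPE_HOURS)

def images_recorded(hours_of_interest, images_per_second):
    # Total images captured over the period at the chosen time-lapse rate.
    return int(hours_of_interest * 3600 * images_per_second)

# Example: a 6-hour period recorded at 1 image every 2 seconds (the 240-hour
# setting described earlier) needs 3 tapes and contains 10,800 images.
print(tapes_for_reduced_speed_copy(6), images_recorded(6, 0.5))
```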
In some multiplexed recordings, however, only a single camera view is of investigative value, making it desirable to remove all but the view of interest from the reduced-speed copy. Using analog editing systems, such a procedure can be accomplished only after a time-consuming operation in which an operator reviews the segment of interest to find the camera view to be copied. The operator then locates each individual field or frame of interest, copies it, and rewrites it out to a new analog tape depicting only the single view. In some cases, rather than write out the individual images directly to another analog tape before copying, the operator will generate a digital file of each image of interest, which is then stored on a computer hard drive or other digital storage medium. The device needed to convert the analog video images to digital image files is an analog-to-digital converter (ADC), commonly referred to as a digitizer or ‘‘frame grabber.’’ Once all of the images of interest have been converted to digital files, the individual digital image files may be read out in sequence back to analog videotape by digital-to-analog conversion (DAC) to create the final copy for investigative use. Further discussion of digitizers and frame grabbers is provided later. Like the fully analog method described before, the method of converting each frame of interest to a digital image file can be very time-consuming if the operator must select each image to be converted and perform the analog-to-digital (A-to-D) conversion separately on each one. A far less time-consuming means of accomplishing this task is now available to the forensic imaging community in the form of nonlinear digital editing systems. Such systems begin by automatically converting the entire video sequence to a digital file that consists of individual fields or frames that can then be treated as individual images. In many cases, such digitizers use compression to reduce the size of the digital file created through this process. As discussed later, the application of such compression must be given special consideration in forensic applications because it can result in a loss or modification of image data unless care is taken in selecting system settings. Once the video sequence has been converted to a set of digital image files, it is a relatively simple procedure to extract the camera view of interest through automated procedures. The simplest means of accomplishing this is to select or extract every Nth image, where N equals the number of camera views depicted in the multiplexed sequence. This process may not always succeed for several reasons. First, the process by which the analog video sequence is played back and read out is not always perfect due to differences in the devices that record and play back the tapes. Some fields or frames on the original tape may not be played back on a device other than the one that made the recording, due to differences in the alignment of the playback heads or due to a physical process in which the tape being played back does not remain completely in physical contact with the playback heads. Such events can occur randomly, so that one only need rewind the tape and play it back again over the area of interest to retrieve the missing field or frame. In some cases, however, it may be
necessary to select a different playback device to acquire all of the fields of interest. In a worst case scenario, one may need to play back the tape on the same device that was used to make the original recording to achieve successful playback. An additional challenge that may arise in using automated processes to select a single camera view from a multiplexed recording is that some surveillance systems will alter the sequence by which the camera views are recorded, based on predetermined cues that are tied to other parts of a surveillance system. For example, if a motion detector is set off by movement in an area that is under the observation of a particular surveillance camera, the multiplexer may be programmed to select that camera’s view over those of the other cameras, as long as motion is detected in that area. Another cue frequently used in multiplexed surveillance systems is the activation of a cash register. Such a signal may induce the multiplexer to select only those views that depict the area of the cash register. Finally, some surveillance system multiplexers are programmed to alter the rate at which time-lapse images are recorded when an alarm is triggered; they typically increase the rate to the standard ‘‘real-time’’ mode of 60 fields per second, while the rate at which the multiplexer switches between different camera views remains unchanged. When such events prevent one from using a simple time-based technique to extract the relevant images of interest, there are other, more complex automated techniques that can be applied. The most common of these is one in which pattern recognition algorithms are used to permit automatic selection of the camera views depicting the area of interest. Such processes rely on the operator to select fixed objects or areas depicted within the field of view of the camera of interest. The nonlinear system may then be programmed so that anytime the predetermined objects or areas are detected within an image, that image will be extracted from the sequence and saved. In this way, all of the images that depict a single camera’s view may be extracted. One caution regarding this technique, however, is that care must be taken to avoid selecting objects that may become obstructed from view by the movement of people, vehicles, or objects within the scene. Furthermore, although such techniques are technically feasible at present, they require advanced processing systems that are expensive and are unlikely to be widely used by the general forensic imaging community in the near future. Once the entire set of images depicting views from the camera of interest has been copied to a digital format, it is a straightforward process to write them out in sequence, either in real-time (actual speed) or reduced-speed (slow-motion) mode. Perhaps more importantly, though, once these images have been converted to digital format, it becomes possible to use digital image enhancement operations more easily to improve their forensic value further.
Image Enhancement and Restoration
The power of images in forensic science and criminology has been greatly increased in recent years due to progress
in computer technology that now permits individuals to perform complicated image enhancement at their desktops. The SWGIT defines image enhancement as any process intended to improve the visual appearance of an image. Traditional photographic darkroom techniques used to enhance images include brightness (exposure) and contrast adjustments, dodging and burning (or localized exposure variations), color balancing, and cropping. These enhancements can increase the visibility of image details previously obscured by shadows or highlights in an image. Such enhancements are now routinely conducted using digital image processing, which has the benefit of enabling the user to perform enhancements more quickly than in a traditional darkroom without the negative environmental impact created by the chemicals and excess paper used in a traditional darkroom. In addition, digital image processing also permits the use of enhancement and ‘‘image restoration’’ operations not available in the traditional darkroom — operations that originated in signal processing theory. Conversion to Digital Imagery. Digital images used in forensic applications are captured in a number of ways. The quickest method is direct capture using a digital camera. Digital cameras are used in a number of law enforcement and forensic applications to document minor traffic accidents, victim wounds in nonfatal assaults, and gang member tattoos, among other subjects. In some forensic laboratories, digital cameras attached to microscopes are used to document trace evidence such as hairs and fibers and soil samples. Rarely, however, are digital cameras used as the primary means to document major investigations or to generate images that will be subject to detailed forensic analysis due to the current limitations on digital camera technology discussed before. Instead, in many forensic applications, images first captured on film are converted to digital format by film or print scanners. Both devices scan the original image, using either a linear array charge-coupled device (CCD) imager or an area array CCD imager. By scanning an image, rather than simply projecting the entire image onto a fixed area CCD, it is possible to achieve higher resolution and record finer detail from the image. Film scanners that can achieve an optical resolution of 1,200 pixels per inch or greater and flatbed (print) scanners that can achieve an optical resolution of 1,000 pixels per inch or greater are used in many forensic applications. Some film scanners in widespread use can achieve optical resolutions in excess of 3,000 pixels per inch, which is sufficient to resolve individual grains of silver in some black-and-white film images. Videotapes are another source of digital images used in forensic applications. As described before, video signals are recorded electronically in an analog format, so the conversion to digital format is accomplished by an analog-to-digital converter (ADC) or ‘‘frame grabber’’ that converts each line of analog image data to a line of pixels. Video signals in the United States follow the RS-170 standard, in which each frame contains approximately 486 usable visible lines of data. To maintain the proper width-to-height aspect ratio of 4 : 3, a digitized video image that
is 486 lines (or pixels) high is sampled at least 648 times per line to create a digitized video image 486 pixels high by 648 pixels wide. Some frame grabbers permit a user to sample each video line more than 648 times and may also sample additional lines above and below the usable visible area. To maintain the proper aspect ratio in such cases, additional horizontal lines must be added to the digital image, either by duplicating entire lines or by interpolating between alternate lines. A related factor that complicates the processing of video images is that each video frame actually consists of two interlaced fields scanned sequentially at a rate of 60 fields per second. By interlacing these fields, each line is scanned approximately 1/60th of a second before or after the lines immediately above and below it, which provides the appearance of continuous motion. A drawback of this system occurs when time-lapse video systems are used. Interlacing can result in extreme discontinuities between subsequent fields, for example, when an object or person moves across the field of view in the one second or more that it may take to record sequential fields. In such cases, a technique known as deinterlacing is used to eliminate one field while maintaining the image height of 486 lines or pixels. This is accomplished either by duplicating every other line or by interpolating between alternate lines and is demonstrated later. Histogram Analysis. Once an image has been rendered in a digital format, it becomes possible to analyze the image using numerical methods because a digital image simply represents a two-dimensional grid of equally spaced locations, each having a specific number corresponding to the brightness or luminance assigned to it. In color images, a set of three numbers is typically used; each number designates the brightness of the red, green, or blue component. The brightness values range from a minimum of zero (0), corresponding to black, up to a maximum of 255, corresponding to white, a total of 256 brightness values. In most cases, 256 values are sufficient to produce the appearance of a continuum within a single scene, and this range has become the de facto standard in digital imaging. This range of values is represented by 8 bits of information (2⁸ = 256 values). Color values consist of three sets of 8-bit data (one each for red, green, and blue), so color images are usually said to contain 24-bit data. Once the image has been converted into a purely numerical format, a variety of mathematical operations can be performed to analyze the content of the image and adjust the specific values of individual pixels to enhance the visibility of some features (such as small details) or reduce the visibility of other features (such as noise). Histograms provide a simple way of analyzing the content of a digital image that permits one to assess what steps should be taken to adjust the pixel values of an image to improve its overall quality. A histogram represents the brightness content of an image in a format that is similar to a bar chart. It is a plot of the number of pixels that have a given luminance value versus luminance value. Figure 12a shows the original image of a sidewalk scene, and Fig. 12b shows the histogram of this image. The horizontal axis is labeled
‘‘DN’’ for ‘‘digital number,’’ which corresponds to the brightness or luminance values ranging from 0 to 255. The vertical axis represents the number of pixels that have a given DN value. Several observations can be made from analysis of Fig. 12. First, the overall darkness of the image is reflected in the concentration of pixel values at the left end of the histogram. Second, a number of peaks are present in the histogram. Such peaks usually represent large portions of an image that correspond to individual objects or areas depicted in the image. The brightest parts of the image in Fig. 12a, the left faces of the planters in the foreground, are represented by the histogram peak in the vicinity of DN = 130. The second peak around DN = 90 corresponds to the sidewalk upon which the planters rest, and the third peak in the vicinity of DN = 40 corresponds to the right sides of the planters. The less prominent peak below DN = 40 corresponds to the remaining dark parts of the image. Using histogram analysis, one can determine objectively, without inspecting the image itself, whether an image is too bright, too dark, or has too little contrast. The concentration of pixels at the low end of the histogram in Fig. 12b reflects the fact that the image is dark and has few bright, nearly white areas. Once one has assessed the condition of an image’s brightness and contrast variations, one can determine what steps should be taken to improve the overall appearance of the image through brightness and contrast adjustments. Brightness and Contrast Adjustments. Brightness and contrast adjustments represent the simplest processing operations that can be applied to images in any setting, forensic or otherwise. Identifying the proper exposure setting when taking photographs or shooting video is the first consideration for the photographer or videographer. Aperture and shutter speed settings must be selected to ensure that a sufficient amount of light strikes the detector — whether it is film or a CCD — to create an adequate exposure. Most cameras used in forensic (and other) applications use sensors that can detect the exposure within a scene and automatically set the aperture and shutter speed to create a satisfactory exposure for the scene as a whole. Likewise, film processing machines used to generate photographic prints also use automated settings to generate output photographs that will create a satisfactory exposure for the entire image. Such automated processes are geared to producing a pleasing outcome to the casual viewer, and they are based on the assumption that all parts of the scene in the image (including all objects and people throughout the image) are of equal interest to the viewer. In forensic situations, however, the opposite frequently holds true in that a single person or object within a larger scene may be the subject of interest, and care must be taken to ensure the proper exposure of that subject, regardless of the effect on the surrounding scene. Properly trained forensic photographers and videographers are aware of these factors and take steps to ensure that the proper exposure and contrast are achieved by using different shutter speeds and aperture settings, as well as different films, photographic papers, and light sources, as necessary.
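The kind of objective histogram assessment described above can be sketched in a few lines of Python with NumPy. The thresholds used below to flag a dark, bright, or low-contrast image are arbitrary illustrative choices, not values drawn from this article.

```python
import numpy as np

def histogram_report(image):
    """image: 2-D array of 8-bit DN values (0 = black, 255 = white)."""
    # Count how many pixels carry each DN value from 0 to 255.
    counts, _ = np.histogram(image, bins=256, range=(0, 256))

    # Simple objective checks: a histogram concentrated at the low end marks a
    # dark image, one at the high end a bright image, and a narrow spread of
    # values marks low contrast.
    mean_dn = image.mean()
    spread = np.percentile(image, 95) - np.percentile(image, 5)

    tone = "too dark" if mean_dn < 85 else "too bright" if mean_dn > 170 else "acceptable"
    contrast = "low contrast" if spread < 64 else "adequate contrast"
    return counts, tone, contrast

# Synthetic stand-in for a dark scene such as Fig. 12a:
dark = np.random.normal(60, 20, (480, 640)).clip(0, 255).astype(np.uint8)
print(histogram_report(dark)[1:])   # expected to report "too dark"
```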
Figure 12. (a) Sidewalk scene and (b) the histogram of this image; the horizontal axis is the DN value (brightness) and the vertical axis is the number of pixels. (FBI Laboratory Photograph)
However, many images examined in a forensic setting are not acquired in a manner that ensures proper exposure and contrast of the subject when the images are taken. Instead, images under forensic examination frequently contain only a small portion that is of interest to the investigator. For instance, a bank robbery surveillance film image will depict the interior of the bank, mostly the lobby region, and only a small portion will depict the bank robbers themselves. Using standard automated processing techniques, the overall exposure of the print will be selected to make the details of the lobby most apparent, and the exposure of the robber will be either too great or too little. Because the details of the bank robbers (their faces, clothing, and footwear) are of most interest, adjustments geared to improve the visibility of details on the robbers’ persons are appropriate. Traditional Darkroom Techniques. In a traditional photographic darkroom, the brightness of an image or part of an image is adjusted through changes in the exposure. The overall brightness of an image is increased by decreasing the amount of light that is allowed to strike the photographic paper used to print the image. Darker prints
are produced by increasing the amount of light striking the paper. Such changes in exposure are made either by changing the exposure time or by changing the aperture of the lens between the film and the paper. Dodging and burning are traditional darkroom techniques used to decrease or increase the exposure of a particular portion of an image. If a portion of an image appears too dark in one exposure (such as a shadow region), then the photographer will reduce the amount of light striking the paper in that area, making it lighter. If the area is too light (such as a region of highlights), then the area will be subjected to an increased amount of light. Simple masks made of paper or cardboard are frequently used to block the excess light or allow the passage of more light to the areas that are being dodged or burned. Contrast is adjusted in a traditional darkroom in one of two ways. In the first, differences in the chemical makeup of the photographic paper can result in differences in the contrast generated in the images. Silver bromide is the primary reactant in photographic papers. However, silver chloride also reacts to light in a well-controlled manner and may be mixed in different quantities with silver bromide to create a range of light-sensitive surfaces. Pure silver
chloride produces a higher contrast image than pure silver bromide, and by combining the silver chloride with silver bromide in varying amounts, it is possible to produce a range of contrasts in different papers. The second way by which traditional darkrooms achieve higher contrast images involves a single type of paper but uses different filters to adjust the wavelengths of light striking the paper during exposure. These papers typically contain both high- and low-contrast emulsions; the high-contrast emulsions are sensitive to blue wavelengths, and the low-contrast emulsions are sensitive to green wavelengths. By adjusting the filtration to increase or decrease the relative amounts of blue and green light, the photographer can achieve variations in contrast. Digital Processing Techniques. Brightness and contrast adjustments made with digital image processing are accomplished in the same way as other digital operations: through numerical operations that act on the digital number (DN) value of each pixel. In the sidewalk scene depicted in Fig. 12a, an increase in brightness could be useful in improving the visibility of details behind the planters, in the street, or on the far sidewalk. By increasing the overall brightness of this image, as in Fig. 13, the visibility of details such as the vehicles is markedly improved. Such improvements do not come without a cost, however. The brightness-adjusted scene depicted in Fig. 13 reveals a characteristic of image processing activities that is frequently discounted. Specifically, when changes are made to improve the visibility of features in one part of an image, the visibility of other features in the image may be degraded. In this case, the details of the planter face have become somewhat washed out by this simple operation. To maintain the ability to detect these differences, one must also adjust the contrast of the image. Contrast adjustments in digital images typically permit easy differentiation of areas that exhibit relatively similar brightness values. Contrast enhancements are intended to provide the viewer a clearer distinction between two values. From the standpoint of histogram analysis, such operations work by increasing the difference between
Figure 13. Sidewalk scene from Fig. 12 after overall brightness of image has been increased. Note loss of detail in planter faces. (FBI Laboratory Photograph)
adjacent brightness values and driving them toward the extremes in the image — toward darkest black and lightest white. Equalization. Although brightness and contrast adjustments can each be very effective on their own in improving the forensic utility of an image, they are especially effective when used together. Equalization is a particularly effective digital image processing operation that combines these two adjustments: the distribution of pixel DN values is adjusted so that it is relatively uniform across the entire range from black to white. The result is a histogram whose peaks are spread out more widely than in the original image. Such a histogram reflects an image that exhibits a wide range of brightness values from black to white, as well as good contrast across the image. Figure 14a depicts the sidewalk scene from Fig. 12 and its histogram (Fig. 14b) after histogram equalization. As in other digital operations, histogram equalization does not come without some trade-off. Instead of a smoothly varying histogram, as depicted in Fig. 12b, the equalized image in Fig. 14 has a histogram made up of relatively widely spaced spikes. Whereas this image provides a good range of brightness values from black to white, it does not permit one to detect fine variations between areas that have slight differences in brightness. Unsharp Mask. When brightness and contrast adjustments are applied to an entire image, they are referred to as global operations because they act on all of the pixel values in the image rather than on specific, local areas. These global operators adjust the DN values of individual pixels based solely on the overall statistics of the entire image. Other image processing operations that can improve the overall forensic utility of images are more focused in their application and rely on analysis of the pixels within a given region for their adjustments. Among the most valuable of these operations is the unsharp mask or Laplacian operation. The unsharp mask operation is used to increase the visibility of fine detail or edges in an image. It accomplishes this by increasing the contrast of edges: the dark side of an edge is made darker and the bright side brighter. The degree to which the edge contrast is increased, as well as the distance from the edge across which the operation acts, may be adjusted, depending on the effect desired. This operation can be performed by using traditional darkroom techniques, but it requires a labor-intensive process where an out-of-focus copy negative is combined with a high-contrast copy negative to produce the desired effect (hence the name ‘‘unsharp mask’’). Digital image processing greatly simplifies this operation through automation. An example of the unsharp mask operation at work is shown in Fig. 15, where the camouflage pattern becomes more clearly defined when the unsharp mask is applied. Deinterlacing. One of the most persistent problems in improving the overall quality of forensic images obtained by using video cameras derives from the interlaced nature
Figure 14. (a) Sidewalk scene from Fig. 12 after histogram equalization. (b) Histogram of this image; the horizontal axis is the DN value (brightness) and the vertical axis is the number of pixels. Detail on planter face remains visible. (FBI Laboratory Photograph)
of RS-170 video. Although each video frame consists of approximately 486 useful lines of image data, these 486 lines actually consist of two sets of 243 lines contained in each field. The fields are read out sequentially so that the lines from one field are displayed alternately between the lines from the previous field. By reading out 60 fields per second, a frame rate of 30 frames per second is achieved. When video is recorded and played back at a speed of 60 fields per second, the human eye is unaware of the interlacing effect because changes in the scene between subsequent fields occur too quickly to be detected and give the appearance of continuous motion. However, when time-lapse systems are used to record video images, the movement of objects and individuals in the scene may be sufficient to display detectable variations from field to field. Thus, when a frame grabber is used to convert time-lapse analog video images to digital stills, the result may be that two fields that have detectable variations between them are read out and produce a blurred image. Under other circumstances, a frame grabber actually acquires only a single field of video but creates an entire frame by duplicating each line. This results in an image in which any straight lines that are
oriented on a diagonal exhibit a jagged stair-step pattern, instead of a smoothly varying one. This is demonstrated in Fig. 16a. In either case, an operation called deinterlacing can be used to improve the image quality. Deinterlacing reduces the effect of interlacing by replacing every other line with interpolated values based on the values of the two lines above and below it. In some cases, the interpolated value for each pixel in a line may consist merely of the average of the pixel values immediately above and below the interpolated pixel. In other cases, the interpolation may incorporate weighted values of all of the pixels immediately adjacent. For images resulting from a single field, deinterlacing creates a relatively smooth transition from one line to the next and reduces the stair-step pattern exhibited by diagonal lines. When the interlaced images consist of two separate fields, it becomes possible to generate two separate images by deinterlacing to remove either the odd lines or the even lines. Figure 16b provides the result when the image in Fig. 16a is deinterlaced by replacing every other line with the average of the lines above and below. Note the relative smoothness of the diagonal lines in these images.
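The line-interpolation form of deinterlacing just described can be sketched as follows (Python with NumPy; the function is an illustration under the stated assumptions, not a reconstruction of any particular frame grabber's behavior).

```python
import numpy as np

def deinterlace(frame, keep="even"):
    """Replace one field of an interlaced frame with interpolated lines.

    frame: 2-D array of DN values; keep: which field ('even' or 'odd' rows)
    is retained. The discarded field's rows are replaced by the average of
    the rows immediately above and below, as described in the text.
    """
    out = frame.astype(float).copy()
    start = 1 if keep == "even" else 2     # first row to overwrite
    for row in range(start, frame.shape[0] - 1, 2):
        out[row] = (out[row - 1] + out[row + 1]) / 2.0
    return out.round().astype(np.uint8)

# Example: a 486-line frame whose two fields were captured far apart in time,
# as can happen with time-lapse recording.
frame = np.random.randint(0, 256, (486, 648), dtype=np.uint8)
smoothed = deinterlace(frame, keep="even")
```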
Figure 15. Unsharp mask operator. (a) Image of bank robber jacket before application of unsharp mask and (b) after. (FBI Laboratory Photograph)
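In digital form, the unsharp mask operation illustrated in Fig. 15 amounts to adding back a scaled difference between an image and a blurred copy of itself. A sketch using SciPy follows; the radius and amount parameters are illustrative values, not settings taken from this article.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, radius=2.0, amount=1.0):
    img = image.astype(float)
    blurred = gaussian_filter(img, sigma=radius)   # the out-of-focus "unsharp" copy
    # Adding the difference back darkens the dark side of each edge and
    # brightens the bright side, increasing edge contrast.
    sharpened = img + amount * (img - blurred)
    return np.clip(sharpened, 0, 255).astype(np.uint8)
```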
Noise Reduction. Noise in film and video images represents one of the most persistent problems in forensic image analysis. In film images, the film grain itself is frequently referred to as noise because graininess can interfere with the observer’s ability to detect details in an image. Noise in video images may arise from a variety of sources. Noise that has an immediate effect on the output of an imaging sensor includes dark current, thermal noise, and readout noise. Once the video signal
Figure 16. Deinterlacing applied to bank robber image. (a) Before deinterlacing and (b) after deinterlacing. (FBI Laboratory Photograph)
leaves the sensor, it is also degraded by interactions with associated electronic components to which it is exposed. The electronics of the camera itself, including clock signals, can generate interference noise, as can the wiring from the camera to the recorder. Once the signal arrives at the recorder, it is subjected to interference from the recorder’s electronics and any other nearby electronics. Dust, dirt, and mechanical problems in the recording device can also have a deleterious effect on the recorded signal.
In surveillance applications, the problems of noise may be amplified by the environment in which the cameras and recording devices are placed. In contrast to broadcast-quality recording environments, surveillance cameras and recorders may be placed in locations that lack adequate ventilation and that are not electronically shielded from one another. Such conditions can test the thermal limits of these devices and result in images that have excessive noise. If an analog video signal is subsequently converted to a digital format, then further noise is generated during the recorder readout and in the A-to-D conversion. As for the other components in this process, the electronics of the frame grabber and other hardware used to implement A-to-D conversion also generate interference that degrades the signal. Noise reduces the overall quality of the image. Although horizontal and vertical banding from a repeating noise signature frequently occurs in video images, the most common type of noise in forensic images is the random pattern introduced by film grain or by electronic noise in video images. This random noise generates a ‘‘speckled’’ or ‘‘salt and pepper’’ appearance in these images. When these images have been converted to digital format, the noise
manifests itself most prominently as marked variations in the digital number (DN) value of individual pixels or groups of pixels within regions that should have uniform values. Such variations can have a great effect on the ability to distinguish the small, individual characteristics used to identify a person or object — either by masking such characteristics or by making a characteristic appear on an object depicted in an image when, in fact, it is not there. When the noise in an image is random speckling, the most straightforward way of minimizing its effect is to set each pixel value to an average of that pixel’s value and its neighbors’ values. This has the drawback, however, of reducing the overall sharpness of the image by averaging together those areas where large pixel value variations represent actual physical boundaries such as edges. This results in an image that appears less sharp or more blurred compared to the original, thus reducing the effective resolution of the image. Some filtering techniques compensate for this problem by adjusting the values of adjacent pixels by varying weights when averaging (e.g., neighborhood ranking or median filtering), but these techniques still tend to reduce the effective resolution of the image.
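A digital sketch of these single-image smoothing approaches, neighborhood averaging and median filtering, is given below (Python with SciPy; the kernel size is an illustrative choice, and this is not a reconstruction of any laboratory's actual procedure).

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter

def smooth_speckle(image, size=3, method="median"):
    img = image.astype(float)
    if method == "average":
        # Replace each pixel with the mean of its size x size neighborhood;
        # effective against speckle, but real edges are blurred as well.
        out = uniform_filter(img, size=size)
    else:
        # The median of the neighborhood is less destructive to edges but, as
        # noted above, still tends to reduce the effective resolution.
        out = median_filter(img, size=size)
    return np.clip(out, 0, 255).astype(np.uint8)
```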
Figure 17. Four images from video surveillance tape to be averaged together to reduce the visibility of noise. (FBI Laboratory Photograph)
Another approach used to minimize the effect of noise is to take multiple images of the same scene or object at different times and average them together on a pixel-by-pixel basis. Such pixel-by-pixel averaging can be conducted if the target object is perfectly aligned or ‘‘registered’’ from one image to the next. This results in no loss of spatial resolution and eliminates the blurring generated in single-image neighborhood averaging. When the object–camera geometry is fixed, for example, when the camera and target object are both stationary, then the object should be in registration in every image and multiple images may be averaged together on a pixel-by-pixel basis without any need to align the images. Real-time video is an ideal medium for generating multiple images from a fixed camera–object geometry if the camera and object are stationary, as is the case in the following example. In this example, multiple images of a minivan were recorded during a surveillance operation. Upon examining the surveillance images, it was observed that the side of the minivan exhibited a number of irregular features that appeared to represent scrapes, scratches, dents, and other marks that could have resulted from normal wear and tear. Such marks, if matched to those on a known vehicle, would be sufficient to individualize the known vehicle as the one in the surveillance video. In this way, it might be possible to associate the owner of the minivan with the activities being monitored by the surveillance operation. Four images from the surveillance video tape depicting the questioned vehicle are included in Fig. 17. It is possible to recognize scrape marks, scratches, and dents in any one of the images shown in Fig. 17. Close inspection, though, reveals a grainy texture (or speckle) from camera/recorder noise that overwhelms the fine detail, particularly when various contrast enhancement operations are performed. To reduce this noise, the four digitized images were averaged together using commercially available image processing software. The images were averaged together in pairs to ensure that each image is given equal weighting. Then the two averaged images were themselves averaged, resulting in the image shown in Fig. 18. Because the surveillance
Figure 18. Image that results from the average of four images in Fig. 17. (FBI Laboratory Photograph)
Because the surveillance camera and target object (the minivan) remained fixed relative to one another in all four images, no image resizing or image shifts were necessary to bring the minivan images into registration. The averaging operation has a significant effect in reducing the amount of speckle and in improving the quality of the image. The improvement in image quality generated by the averaging operation is demonstrated in Fig. 19: Fig. 19a shows a portion of one image from Fig. 17, and Fig. 19b shows the same portion of the averaged image from Fig. 18.
Figure 19. (a) Comparison of images before averaging and (b) after averaging reveals a reduction in the level of noise visible. (FBI Laboratory Photograph)
Both images in Fig. 19 have undergone histogram equalization to increase the visibility of fine details. The speckle that dominates image (a) is greatly reduced in the averaged image (b), indicating a successful overall reduction in noise.
Furthermore, close inspection of the window area in the averaged image reveals that it is now possible to detect the presence of the window frame/support strut on the far side of the vehicle. The pattern of the rear wheel rim and cap is also more sharply defined in the averaged image. This detail is not visible in image (a).
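The pixel-by-pixel frame averaging just described can be sketched in a few lines of Python with NumPy (an assumption of this illustration; the article does not name any particular software). The arrays below are synthetic stand-ins for four registered frames, not actual surveillance data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for four registered surveillance frames of a stationary scene:
# the same underlying scene plus independent random noise in each frame.
scene = rng.integers(40, 200, size=(240, 320)).astype(float)
frames = [scene + rng.normal(0.0, 25.0, scene.shape) for _ in range(4)]

# Average in pairs, then average the two pair averages, so that every
# frame carries equal weight in the final result, as described in the text.
pair_a = (frames[0] + frames[1]) / 2.0
pair_b = (frames[2] + frames[3]) / 2.0
averaged = (pair_a + pair_b) / 2.0

# Independent, zero-mean random noise is reduced roughly by sqrt(N) for N frames.
print("single-frame noise:", (frames[0] - scene).std())
print("4-frame average   :", (averaged - scene).std())
```

For N frames containing independent, zero-mean random noise, averaging reduces the noise standard deviation by roughly the square root of N, which is why even four registered frames give a noticeably cleaner result.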
Figure 20. Difference image demonstrates transient noise removed from surveillance image by the averaging operation. (a) Unprocessed difference image. (b) Histogram of difference image, offset to gray at DN = 128 (horizontal axis: DN value; vertical axis: number of pixels), shows concentration of values in the middle-gray region. (c) Difference image after histogram equalization. (FBI Laboratory Photograph)
As a way of assessing the amount of noise removed from the unprocessed images, a difference operation was performed using the averaged image of Fig. 18 and one of the images from Fig. 17. The difference operation compares the digital number (DN) values at a single pixel location in the two images and assigns a new DN value to that pixel location in an output image. The new DN value is determined by the difference between the value in the first image and that in the second image. If the value in the first image is 10 DN greater than that in the second image, then the output value is +10. If the first value is 5 DN less than the second value, then the output value is −5. To ensure that the output image has a range of DN values from 0 to 255 (and will therefore display a range from black to white), the difference values are offset by a fixed DN level of +127. Thus a difference value of −10 results in a DN value of 117 (= 127 − 10), and a difference value of 0 (i.e., the same value in both images) results in a DN value of 127, or middle gray. Extreme differences are rendered as black or white. Figure 20a shows the result of the difference operation, and Fig. 20b shows the histogram of this difference image. Histogram analysis shows that 95% of the pixels in the difference image have DN values in the range between 106 and 142, which corresponds to middle gray; thus, Fig. 20a is dominated by middle-gray values. Figure 20c shows the image that results when the brightness and contrast of Fig. 20a are enhanced to emphasize the details in this difference image. Two primary observations can be made about the noise signature in Fig. 20c. First, the difference image contains a high degree of speckle, consistent with random noise in the single image that has been removed from the averaged image. Second, a regular pattern of noise in the form of horizontal bars is also present. This noise is likely to come from clocking signals or other harmonic signals generated in the surveillance system or recorder. Based on this difference image (Fig. 20c), it is possible to conclude that the averaging operation was useful in removing both random and harmonic noise from the original images.
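A minimal sketch of the difference operation described above, written in Python with NumPy (assumed available); the two tiny arrays are hypothetical examples, not values from the case.

```python
import numpy as np

def difference_image(img_a, img_b, offset=127):
    """Pixel-by-pixel difference of two registered 8-bit images,
    offset so that 'no difference' maps to middle gray."""
    diff = img_a.astype(np.int16) - img_b.astype(np.int16) + offset
    return np.clip(diff, 0, 255).astype(np.uint8)

# Example: equal values map to 127 (middle gray); a +10 difference maps
# to 137; a -10 difference maps to 117, matching the convention above.
a = np.array([[50, 60, 70]], dtype=np.uint8)
b = np.array([[50, 50, 80]], dtype=np.uint8)
print(difference_image(a, b))   # [[127 137 117]]
```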
Fourier Analysis
Another image processing technique that has been of great use in forensic applications is Fourier analysis, which is based on theories developed in signal processing. The fundamental basis of Fourier analysis is that any signal or waveform may be expressed as a combination of a series of sinusoidal signals. Furthermore, once a signal has been broken down into a series of sinusoids, it is possible to extract individual sinusoids from the signal. An example of a one-dimensional signal that results from combining two separate signals is provided in Fig. 21. In this case, the signal represents a variation in voltage, but it could just as easily represent a variation in brightness, such as the variation in brightness across a single line of video. Fourier analysis is not restricted to one-dimensional signals. Through Fourier analysis, it is possible to examine simultaneously the variations in brightness in both the horizontal and vertical directions in an image and to identify any repeating patterns in those brightness variations. The most common technique used in Fourier analysis of forensic images is the Fast Fourier Transform (FFT). Through application of the FFT, the contents of a two-dimensional image are converted into a frequency space
Figure 21. Combination of two one-dimensional signals to create a third signal. See color insert. (FBI Laboratory Figure)
representation in which repeating brightness patterns can be more easily detected and deleted. The method for converting two-dimensional images to frequency space is demonstrated in Fig. 22. In a frequency-domain representation, the brightness of a feature corresponds to the strength of a given component, and the location corresponds to the signal wavelength. The longer wavelength components are represented closer to the center of the frequency-domain representation, and shorter wavelength components are located farther from the center. An additional characteristic of frequency-domain representations is that repeating features are displayed in frequency space at an angle that is 90° away from their orientation in the actual image. This is reflected in Fig. 22 where the vertical bars in Fig. 22a are represented in frequency space by the horizontal line in Fig. 22b. An example of using FFTs to improve the clarity of a surveillance video image is shown in Fig. 23. The original video image (Fig. 23a) reveals an unknown suspect whose shirt pattern is obstructed by electronic interference (Fig. 23b). The signature of this interference consists of repeating patterns oriented vertically and on diagonals from upper left to lower right and upper right to lower left. Figure 23c depicts the frequency space representation of Fig. 23b. Among the dominant characteristics of this frequency representation are the bright vertical bars to the left and right of the center (Fig. 23c). The bars in the upper left and lower right quadrants correspond to the diagonal noise patterns that extend from the upper right to lower left in Fig. 23b. By suppressing these noise patterns, as shown in Fig. 24a and then converting this image back to an image space representation (Fig. 24b), one may observe that the upper right to lower left diagonals have been removed. Further noise suppression and the resulting image are shown in Figs. 24c,d. Although the poor resolution of the original video image prevents one from clearly identifying the pattern in the shirt, the resulting image allows the viewer to differentiate this shirt more easily from others that might be recovered. As in the example of noise reduction provided before, it is possible to analyze the harmonic noise removed by applying the FFT in this case by using a difference image.
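The frequency-space editing described above can be illustrated with a short Python/NumPy sketch (an assumption of this illustration). The interference pattern, its location in frequency space, and the size of the suppressed region are synthetic choices made so that the example is self-contained; in casework, the bright off-center features would be identified visually in the frequency-space display and edited interactively.

```python
import numpy as np

# Synthetic image: a smooth scene plus a strong diagonal sinusoidal
# interference pattern, a stand-in for electronic or harmonic noise.
y, x = np.mgrid[0:128, 0:128]
scene = 100 + 40 * np.exp(-((x - 64) ** 2 + (y - 64) ** 2) / 800.0)
interference = 30 * np.sin(2 * np.pi * (8 * x + 8 * y) / 128.0)
noisy = scene + interference

# Forward FFT; shift so that long wavelengths sit at the center, as in the
# frequency-space displays described above.
spectrum = np.fft.fftshift(np.fft.fft2(noisy))

# Suppress the bright off-center peaks produced by the repeating pattern.
# Here the peak locations are known from the synthetic pattern.
mask = np.ones_like(spectrum)
for dy, dx in [(8, 8), (-8, -8)]:
    mask[64 + dy - 2:64 + dy + 3, 64 + dx - 2:64 + dx + 3] = 0.0

filtered = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))

print("RMS error vs. clean scene, before:", np.sqrt(((noisy - scene) ** 2).mean()))
print("RMS error vs. clean scene, after :", np.sqrt(((filtered - scene) ** 2).mean()))
```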
Figure 22. Demonstration of fast Fourier transform. (a) Image depicting a repeating wave pattern. (b) Frequency space representation of image in (a). (FBI Laboratory Photograph)
Fig. 24e documents the parts of the original image that have been removed. The example provided demonstrates the utility of FFTs in eliminating electronic noise signatures; Fourier analysis has also been particularly useful in separating latent fingerprint or friction ridge patterns from background weave patterns in fabrics. In some cases it is not necessary to remove the repeating pattern completely. Simply reducing the intensity of some background patterns may be sufficient to permit detecting the patterns of interest.

LABORATORY IMAGE ANALYSIS
Although a majority of the images used in forensic applications are taken for documentation, in many cases, the images themselves become the subject of scrutiny to advance an investigation or to implicate or exonerate individuals suspected of crimes. Three major categories of forensic analysis to which images may be subjected are forensic photographic comparisons, forensic photogrammetry, and forensic image authentication.

Forensic Photographic Comparisons
Whenever an item recovered from a suspect can be associated with a crime, it can provide important
circumstantial evidence in prosecuting that suspect. Photographs and videotape images taken by using surveillance cameras during the commission of crimes are one source of circumstantial evidence that has become widely available during the last 30 years. It was envisioned that such images would be used to compare the faces of the individuals depicted in the surveillance film or video with known suspects. Unfortunately, such facial comparisons cannot be conducted when a felon’s face is obscured by a mask or hood. In such cases, it is frequently possible to make an association between the mask or hood (or other item of apparel) worn by the felon and one recovered from the suspect’s home or vehicle based on the visible characteristics of the item. Other such items may include trousers, shoes, hats, firearms, luggage, and vehicles. Photographic comparisons are based upon the ‘‘principle of individualization’’ (sometimes called the ‘‘principle of identification’’), which states:

When any two items contain a combination of corresponding or similar and specifically oriented characteristics of such number and significance as to preclude the possibility of their occurrence by mere coincidence, and there are no unaccounted for differences, it may be concluded that they are the same, or their characteristics attributed to the same cause (1).

Figure 23. FFT applied to the video image of a robber’s shirt. (a) Original video image. (b) Close-up of robber’s shirt. (c) Frequency space representation of image (b). (FBI Laboratory Photograph)
Two types of characteristics are examined in side-by-side comparisons: class and individual identifying characteristics. Class characteristics are those that identify a person or item as belonging to a specific group or set of objects. The make and model of an automobile or the color of an individual’s hair are examples of class characteristics. Individual identifying characteristics are those characteristics used to differentiate a person or object from others within a class. Individual identifying characteristics frequently used to identify people include freckle patterns, moles, scars, chipped teeth, and ear patterns. Individual identifying characteristics frequently used in identifying clothing and other objects include wear marks, rips, nicks, stains, and other damage that can arise from wear and tear resulting from the regular use of an item. Clothing can also exhibit individual identifying characteristics generated during the manufacturing process, such as the unique alignment of patterns across seams. Clothing manufactured from materials that bear plaid or camouflage patterns exhibits unique characteristics that are frequently highly visible in surveillance images due to the high-contrast features of these patterns. For automobiles, individual identifying characteristics may include dents, scrapes, rust spots, and even bumper stickers. License plate numbers and vehicle identification numbers represent unique identifying characteristics applied during the manufacturing process. The mere presence of such features alone is not usually sufficient for individualization, however. Consideration must also be given to the location and orientation of these characteristics, as well as their placement relative to one another. As an example, many individuals exhibit moles or scars, but the location of these features varies from person to person. Likewise, many automobiles have dents and scratches, but their frequency and placement vary from vehicle to vehicle. Even when a person or object has individual identifying characteristics, a more critical factor in forensic photographic comparisons is often the quality of the surveillance images. To make a positive identification, one must be able
Figure 24. Application of FFT to a robber’s shirt. (a) Frequency space representation of robber’s shirt edited to remove diagonals that are oriented from upper right to lower left. (b) Image that results from editing in frequency space (a). (c) Frequency space representation edited to remove additional repeating noise in image. (d) Image that results from editing in (c). (e) Difference image reflects the noise removed from image in Fig. 23b to generate the image in Fig. 24d. (FBI Laboratory Photograph)
to distinguish a set of individual identifying characteristics of such number or significance that it separates the item or person from all others. In many cases involving video surveillance images, the quality of the unprocessed images is too poor to allow an examiner to distinguish even a single individual identifying characteristic. Two major factors contribute to the relatively poor quality of most video surveillance imagery. The first is that the inherent spatial resolution of video is far less than that of film, making it more difficult to see fine detail in a single video image than in a single film image. The second major factor that reduces the
quality of video images is noise. Reducing the effect of noise in video images is one of the primary tasks in forensic image enhancement, as discussed earlier. However, despite the poor quality of most surveillance images, quite frequently the quality is sufficient to provide valuable evidence.

Examples of Forensic Photographic Comparisons
Figure 25 provides an example of a facial comparison. The photographs (Fig. 25a,b) are arrest photographs taken several years apart in two different jurisdictions in the United States. The photographs reportedly depict
Figure 25. Facial comparison. Two mug shots (a,b) thought to depict the same individual. (c) Comparison of region above eyes. (d) Comparison of right ear region. (FBI Laboratory Photograph)
two different individuals, but authorities suspected that they actually depicted the same person. Fingerprint records corresponding to the first photograph were not available, so a facial comparison was conducted to assess whether the photographs depict the same individual. Despite differences in the lighting conditions and quality of the images, multiple similarities were noted between the two images, particularly in the region above the eyes (Fig. 25c) and in the right ear (Fig. 25d). Based on the correspondence of these multiple characteristics, it was concluded that the two photographs depict a single individual. Tattoos provide another means of identifying individuals. The surveillance images depicted in Fig. 26a,b were obtained as part of a covert surveillance. The images of the individual’s face did not show individual identifying characteristics such as moles, scars, or ear patterns that would be sufficient to identify the suspect. However, the images in Fig. 26a,b depict what appear to be tattoos on the inside of the left arm of the individual. Law enforcement authorities identified a suspect and obtained photographs depicting the tattoos on the suspect’s arms. As shown in Fig. 26c,d, the tattoos exhibited by the suspect correspond directly with the markings on the individual depicted in the surveillance images. Based on these characteristics, it was possible to identify the suspect as the individual depicted in the surveillance images.
The techniques of photographic comparison may also be applied to comparisons of clothing worn by individuals. As an example, a comparison was conducted between the camouflage jacket worn by the bank robber depicted in Figs. 15 and 16 and a camouflage jacket that was recovered during the investigation of a suspect in the crime (Fig. 27b). The individual pieces of material that make up the camouflage jacket are cut at random from continuous rolls of material upon which the camouflage pattern has already been printed. To reduce manufacturing costs, the individual pieces are cut to maximize the amount of material used per roll. No attempt is made during the manufacturing process to align patterns from one jacket to the next because that would require additional time and effort from the individuals constructing the garments. As a result, each seam can be expected to exhibit a different juxtaposition of individual elements of the camouflage pattern. In the camouflage jacket in Fig. 27, more than a half dozen separate pieces are pictured including the right sleeve, the right front panel, the left front panel, the left sleeve, the back panel, the right front breast pocket, and the right front breast pocket flap. The arrows in Figs. 27a,b indicate some of the similarities identified between the jacket worn by the bank robber and the suspect’s jacket. The correspondence of these multiple
Figure 26. Personal identification by tattoos. (a,b) Images from surveillance video depicting tattoos on subject’s left arm. (c,d) Tattoos on suspect’s left arm. (FBI Laboratory Photograph)
Figure 27. Photographic comparison of camouflage jacket. (a) Surveillance image depicting jacket worn by robber. (b) Jacket recovered from suspect. Arrows indicate some of the points of similarity. (FBI Laboratory Photograph)
similarities is sufficient to identify the suspect’s jacket as the one worn in the bank. Other objects such as vehicles may also be identified through photographic comparison. The minivan in Figs. 17 through 19 that was the subject of noise reduction was compared with the minivan owned by an individual suspected of involvement in the case (Fig. 28). Recall that the noise was reduced in the video images to clarify small characteristics, such as scratches and dents, on the side of the minivan. Figure 28a documents similarities in the class characteristics of the questioned and known vehicles, and Fig. 28b documents similarities in the individual identifying characteristics, including multiple scratches and dents. Based on these similarities, the suspect’s vehicle could be identified as the one in the surveillance video.
Figure 28. Comparison of questioned vehicle (left images) with suspect’s vehicle. (a) Corresponding class characteristics (1, overall configuration). (b) Corresponding individual identifying characteristics. (FBI Laboratory Photograph)
Forensic Photogrammetry
Photogrammetry, another primary type of forensic image analysis, involves extracting dimensional information from images. A frequent purpose of forensic photogrammetric examinations is to determine the approximate height of bank robbers depicted in surveillance imagery. This type of analysis provides circumstantial evidence against a known suspect if it is determined that the bank robber is approximately the same height as the suspect. In some cases, however, this analysis can provide a crucial means of eliminating suspects or differentiating among several suspects of different heights. It can be especially useful in corroborating the testimony of a cooperating witness who took part in a robbery committed by multiple bank robbers. Finally, this type of analysis may also generate investigative leads to be used in the profile describing an unknown subject. In the latter instance, it can be particularly useful when eyewitnesses offer different estimates of an unknown subject’s height. Many engineering and scientific activities use multi-image photogrammetry to derive accurate measurements of a scene or location. In bank surveillance imagery, however, photogrammetry most frequently involves single-image analysis because there may be only one camera within a bank or, if there is more than one camera, the cameras are not synchronized to capture simultaneous images of a bank robbery. Therefore, single-image photogrammetric techniques are the most commonly used in forensic applications. One technique often used in forensic photogrammetry involves perspective analysis. Generally, when perspective is present in an image, lines that are parallel in the original scene (such as the two sides of a door frame, the top and bottom of a service counter, or the outline of tiles that make up the floor of a bank lobby) will appear to converge, or vanish, at a single point in the image. Such points are called ‘‘vanishing points.’’ The perspective properties of an image can be defined by the two-dimensional coordinate locations of three vanishing points that correspond to three mutually orthogonal sets of parallel lines in the scene. These three points define the ‘‘perspective triangle.’’ Once the perspective triangle is defined for an image and a single true linear
dimension is known within the scene, then it becomes possible to measure any linear dimension within that scene. The bank robbery surveillance image in Fig. 29a is an example. The floor tiles upon which the robber is standing provide a known orthogonal grid pattern on the floor of the bank. The teller counter and the customer service counters in the lobby are also aligned with these tiles in an orthogonal arrangement. By extending the edges of the tiles and counters to the left and right, it is possible to locate two of these vanishing points (Fig. 29b).
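As a purely illustrative sketch (Python with NumPy assumed; the pixel coordinates are hypothetical), a vanishing point can be computed numerically as the intersection of two image lines that are parallel in the scene, using homogeneous coordinates:

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points given as (x, y)."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def intersection(l1, l2):
    """Intersection of two homogeneous lines; returns (x, y)."""
    x, y, w = np.cross(l1, l2)
    return x / w, y / w

# Hypothetical pixel coordinates of two floor-tile edges that are parallel
# in the bank lobby but converge in the image because of perspective.
edge_1 = line_through((100.0, 700.0), (620.0, 430.0))
edge_2 = line_through((150.0, 820.0), (700.0, 460.0))

vx, vy = intersection(edge_1, edge_2)
print(f"vanishing point at image coordinates ({vx:.1f}, {vy:.1f})")
```

If more than two parallel edges are available, their pairwise intersections can be compared as a rough check on how precisely the vanishing point is located.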
Figure 29. Photogrammetric analysis to determine bank robber’s height. (a) Bank robbery surveillance image. (b) Vanishing point determination as part of photogrammetric analysis. (c) Projection of bank robber height and counter height to measuring line. (FBI Laboratory Photograph)
Likewise, it is possible to determine the third vanishing point in the downward direction by using the vertical edges of the counter face panels and the edges of door frames and wall corners, thus completing the perspective triangle. Once a known scale dimension is located within the scene, it is possible to use the perspective triangle to project that dimension and the robber’s height to a common location, or ‘‘measuring line,’’ as shown in Fig. 29c. Once these dimensions have been projected to the measuring line, direct scaling may be used to calculate the robber’s height. The dimensions of fixed objects within the scene are also calculated by using this technique to assess the accuracy of the solution. In this case, the calculated height of the robber was approximately 69.75 inches, and the accuracy was approximately 1%, or about 3/4 inch. The true height of the robber, it was later found, was 70 inches, within the calculated accuracy. (Note: English units are used in many photogrammetric analyses in the United States because American law enforcement officials, court officials, and juries are most familiar with these units. The use of, or conversion to, metric units is done whenever necessary.) Reverse projection is a second technique frequently used in forensic photogrammetry. In many cases, traditional analytical single-image techniques cannot be used in examining surveillance images because of a lack of photogrammetric control either within the scene or for the camera. When this is so, reverse projection can frequently be used. Reverse projection can be used to locate the position of the camera from which an image was taken. Once this camera station has been determined, it is possible from that position to ‘‘project’’ the location of people or objects depicted in a photograph. The camera station is determined through an iterative process in which a photograph is first examined to determine the approximate location depicted in the scene and the relationship of the camera to that scene. Once the approximate camera location has been determined, the type of camera and focal length of the lens used are estimated, and an attempt is made to locate the exact position of the camera at the time the photograph was taken. This is accomplished by adjusting the camera position and focal length so that the field of view depicted in the image is replicated in the viewfinder of the camera during an on-site examination. Various fixed objects within the scene are used as points of reference to align the camera’s field of view until it exactly replicates that of the image. When the images are film, a fine-grain positive of the image may be placed in the viewfinder to aid in the alignment. In video cases, a mixing board is used along with video ‘‘wipes’’ and ‘‘dissolves’’ to compare the live and recorded images. A video wipe is a process for replacing one part of the overlying video image with the corresponding part of the underlying image. Wipes are typically applied to images horizontally or vertically, allowing the viewer to track the alignment of horizontal and vertical features of the scene in the two images. A video dissolve is a process for reducing the opacity of the overlying image and increasing that of the underlying image so that the first image appears to ‘‘dissolve’’ into the second image.
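The arithmetic behind a video dissolve is simple weighted blending. The following sketch (Python with NumPy assumed; the constant-valued frames are hypothetical stand-ins) shows the mix of an overlying and an underlying frame:

```python
import numpy as np

def dissolve(over, under, opacity):
    """Blend two registered 8-bit frames: opacity = 1.0 shows only the
    overlying frame (e.g., the recorded image), 0.0 only the underlying
    frame (e.g., the live view); intermediate values mix the two."""
    mixed = opacity * over.astype(float) + (1.0 - opacity) * under.astype(float)
    return np.clip(mixed, 0, 255).astype(np.uint8)

# Hypothetical stand-ins for a recorded frame and a live view of the scene.
recorded = np.full((4, 4), 200, dtype=np.uint8)
live = np.full((4, 4), 80, dtype=np.uint8)

print(dissolve(recorded, live, 0.5)[0, 0])   # 140: halfway through the dissolve
```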
Figure 30. Bank robber height determination through reverse projection photogrammetry. (a) Image of bank robber in lobby. (b) Reconstruction of scene with height chart at bank robber position. (FBI Laboratory Photograph)
After the alignment is achieved, the crime scene is reconstructed by determining the position at which the individual or object was located within the scene and then placing a height chart or other scale device at that location. Once the scale device is in place, additional photographs depicting the scale device are taken, and the individual or object may be measured through direct comparison with the scale. Figure 30 depicts a bank robber and a height chart located at the robber’s position, from which the individual’s height could be determined, as indicated in the figure.

Forensic Image Authentication
A third type of forensic image analysis involves situations in which the veracity or authenticity of an image may be questioned. Photographs have been manipulated to change the content or the appearance of people and objects within an image practically since the invention of photography. In many cases, alterations are made for artistic purposes; in others, they have been used for political purposes. Many high-ranking individuals in the Nazi and Soviet regimes of the mid-twentieth century disappeared from photographs after they fell out of favor with the leaders of their nations. Today, given the widespread availability of digital image processing software, just about anyone can alter a digital image file or ‘‘manipulate’’ it. Images originally recorded on film may even be digitized, altered, then
transferred back to film to create an original film image that cannot be identified as a manipulated image. The implications of such capabilities are quite worrisome to members of the criminal justice system. If the veracity of images cannot be trusted, then their value as evidence is greatly reduced or eliminated altogether. Fortunately, if the veracity of an image or set of images is questioned, techniques may be used to analyze the images to detect artifacts left over from a poorly performed manipulation. Given a sufficient amount of time and sufficient resources, it should be possible for a well-trained individual to create an undetectable image manipulation. The key to detecting manipulations, therefore, rests on the fact that few individuals have the time, training, and resources needed to make a perfect manipulation. A number of factors can be examined to detect a manipulation. The content of an image is the first feature examined. If objects or people depicted in the questioned image could not possibly have been present at the time and location shown, then one can conclude that an alteration has taken place. Likewise, it may be possible through photogrammetric analysis to determine that the size and shape of objects depicted in an image are different from their true dimensions, in which case an alteration is also indicated. Other characteristics that are routinely examined include the lighting and shadowing, contrast, color, sharpness, grain structure or resolution, and focus of the various
Figure 31. Image manipulation detection. (a) Manipulated image. (b) Original image from which manipulated image was derived. Irregularities in image content, focus, and resolution in the manipulated image can be observed as described in the text. (FBI Laboratory Photograph)
objects and people in an image. Any sudden, unexplained changes in any one of these characteristics can indicate a manipulation. For example, if two people are depicted in a face-on view standing side by side and one of the individuals appears out of focus whereas the other is in sharp focus, then it is possible one of them has been inserted into the image. Likewise, if a shadow is present underneath the nose of one of the individuals and the other’s nose casts no shadow, then a manipulation is also indicated. Figure 31 depicts a manipulated image (Fig. 31a) and the original image from which it was derived (Fig. 31b). The image content provides the first indication of manipulation. The fish depicted in the image is a bass, but the size of the one depicted in the manipulated image is far in excess of the largest bass on record. Furthermore, the individual holding the bass does not appear to be straining in any way to support the weight of the fish, as indicated by his right biceps. In addition, although the fisherman’s hands and leg are sharply in focus, the fish is out of focus or blurred. In contrast, the outline of the fish is quite sharply defined. If the interior of an object is out of focus, the edges of the object should also be out of focus. All of these observations indicate a manipulation.

Video Authentication
In addition to the authentication of single images discussed before, the authenticity
of videotapes is often questioned. In addition to the techniques of single-image analysis used earlier, analog videotapes may be authenticated through electronic signal analysis. Such analysis can be used to detect record stops, starts, and erasures, because each of these activities leaves electronic signals and physical signal traces on the magnetic tape that may be detected. These analyses are not restricted to videotapes but are also commonly used to authenticate audiotapes. In some cases, it is possible to record an image of the physical appearance of these traces on the magnetic tape. However, the details of this type of analysis are beyond the scope of this article. For further information, the reader is directed to Koenig (2).
ADDITIONAL FACTORS IN FORENSIC IMAGING APPLICATIONS
The increasing use of digital imaging media in forensic applications presents a challenge to the law enforcement community. The integrity of images acquired and stored using electronic media is considered by many to be more open to suspicion than traditional film due to the ease of altering digital images. Likewise, the issue of image compression represents a challenge that remains to be adequately addressed by law enforcement.
Image Integrity
Images offered in a court of law can be subjected to the same rigorous standards as living witnesses. If an image cannot be verified as a ‘‘true and accurate representation’’ of what it purports to represent, then it may not be accepted by the court. An image that is shown to have been altered may also be suppressed. Even if allowed into the court proceedings, an image whose integrity is challenged may have reduced value to the jury that must consider it. Therefore, the issue of image integrity is critical for law enforcement agencies. Before the advent of digital imaging in forensic applications, the veracity of film images was rarely challenged. It was recognized that film images could be altered, but the means for generating fakes was considered beyond that of the lay person. ‘‘Chain of custody’’ procedures, in which police follow strict processes to ensure the proper handling of film evidence, helped ensure its integrity by treating film the same as other pieces of physical evidence. Some fear that digital imaging and the widespread availability of commercial image processing software may change the perceived integrity of forensic images. As a result, many in law enforcement are seeking ‘‘foolproof’’ ways of ensuring the integrity of their digital images using technological fixes. As yet, however, there are no standard technologies in widespread use throughout the law enforcement community for dealing with this issue. One reason for the lack of a standard technology for digital image protection is that, in much the same way that images stored on film negatives can be copied, altered, and rephotographed, images that are stored in ‘‘secure’’ digital formats can be copied, altered, and then stored once again in ‘‘secure’’ formats to defeat these technologies. One common ‘‘secure’’ technology now being offered to law enforcement agencies is digital watermarking. Such systems are designed to reveal any alteration of an image by incorporating a unique image signature within the content of the image. A critical drawback of watermarking systems is that superimposing the watermark itself alters the image by replacing some of the original image content with the watermark. Another drawback is that the watermarking system could be used to place a watermark on a manipulated image, after which the manipulated image would appear to be authentic. Given the current lack of rigorously tested systems to maintain the integrity of digital images, the Scientific Working Group on Imaging Technologies (SWGIT) has recommended that agencies protect their images in the same way that they protect any other evidence: by saving images on a physical medium such as film or CDROM and using traditional physical security measures to protect that physical evidence. As long as a sound chain of custody is maintained from the acquisition of an image through its presentation in court, the chances that an image has been altered are severely limited.

Compression
A second major challenge facing law enforcement agencies that would use digital imaging is compression. Compression reduces the size of digital files so that they take up
less room on storage devices, permitting more images to be stored on a single storage medium. This reduction in size also enables one to transmit an image more quickly over available communications channels. Compression may be either ‘‘lossy’’ or ‘‘lossless.’’ In lossless compression, only redundant information is eliminated from an image, which permits accurate re-creation of the original, precompressed content of the image. ‘‘Lossy’’ compression is less benign; some of the information contained in an image, usually the least significant bits, is removed from the image. When the image is decompressed, the missing data are filled in using estimated values. The removal of data from forensic images through lossy compression is a major challenge because some of the most critical data may reside in the parts of an image that are removed by compression. Another drawback of lossy compression is that, in some cases, the act of compressing an image introduces visible artifacts that were not in the original image. One such artifact is the ‘‘blocking’’ artifact frequently seen in images that have been compressed by using the JPEG compression standard. JPEG compression breaks an image into 8 × 8-pixel regions, which are then individually compressed. This can create relatively large changes in luminance and color from one 8 × 8 block to the next, which can be exaggerated when enhancement operations such as unsharp masking are used. Likewise, lossy compression can alter colors in an image. This could be particularly dangerous in victim wound photography, where color shifts might reduce the visibility of existing bruises or, alternatively, create the appearance of bruises where none exist. In the absence of further guidelines for using compression appropriately in law enforcement applications, the SWGIT strongly recommends avoiding the use of lossy compression unless sufficient cause is given by those who compress the images.
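The practical difference between lossless and lossy storage can be demonstrated with a short sketch in Python using NumPy and the Pillow imaging library (both assumed available; the file names and the JPEG quality setting are arbitrary choices made for illustration):

```python
import numpy as np
from PIL import Image

rng = np.random.default_rng(3)

# Synthetic 8-bit grayscale image containing fine random detail.
original = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)
img = Image.fromarray(original)

img.save("evidence_lossless.png")              # lossless: exact re-creation
img.save("evidence_lossy.jpg", quality=10)     # lossy: data discarded

png_back = np.asarray(Image.open("evidence_lossless.png"))
jpg_back = np.asarray(Image.open("evidence_lossy.jpg"))

print("PNG pixels changed :", int((png_back != original).sum()))   # expect 0
print("JPEG pixels changed:", int((jpg_back != original).sum()))   # expect many
print("JPEG max DN error  :",
      int(np.abs(jpg_back.astype(int) - original.astype(int)).max()))
```

Reloading the PNG returns every original DN value exactly, whereas the aggressively compressed JPEG returns altered values for most pixels; this is the kind of data loss and potential artifact source that concerns the law enforcement community.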
ABBREVIATIONS AND ACRONYMS
ADC  Analog to Digital Converter (or Conversion)
A-to-D  Analog to Digital
CCD  Charge-Coupled Device
CDROM  Compact Disk Read Only Memory
DAC  Digital to Analog Converter (or Conversion)
DN  Digital Number
FBI  Federal Bureau of Investigation
FFT  Fast Fourier Transform
ISO  International Standards Organization
JPEG  Joint Photographic Experts Group
mm  millimeters
NIBIN  National Integrated Ballistics Information Network
NIST  National Institute for Standards and Technology
NTSC  National Television Systems Committee
PAL  Phase Alternation by Line
ppi  pixels per inch
SECAM  Sequentiel Couleur a Memoire
SLR  Single Lens Reflex
SWGIT  Scientific Working Group on Imaging Technologies
VHS  Video Home Systems
VHS-C  Video Home System - Compact
BIBLIOGRAPHY (BY CATEGORY)
1. H. Tuthill, Individualization: Principles and Procedures in Criminalistics, Lightning Powder Company, Salem, Oregon, 1994.
2. B. E. Koenig, J. Audio Eng. Soc. 38(1/2), 3–33 (1990).
Forensic Field-Based Photography and Video
American National Standards Institute (ANSI), American National Standard for Information Systems — Data Format for the Interchange of Fingerprint, Facial, & Scar Mark & Tattoo (SMT) Information, ANSI/NIST-ITL 1-2000, 2000.
Association for Information and Image Management (AIIM), Technical Report for Information and Image Management — Resolution as it Relates to Photographic and Electronic Imaging, AIIM TR26-1993, 1993.
E. C. Berg, Forensic Sci. Commun. 2(4) (2000). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/oct2000/berg.htm).
H. Blitzer, Law Enforcement Technol. 27(2), 16–19 (2000).
H. Blitzer, C. Garcia, and A. Leitch, Law Enforcement Technol. 27(6), 52–55 (2000).
H. Blitzer, Law Enforcement Technol. 27(6), 58–62 (2000).
W. J. Bodziak, Footwear Impression Evidence, 2nd ed., CRC Press, Boca Raton, 2000.
J. E. Duckworth, Forensic Photography, Charles C. Thomas, Springfield, 1983.
D. R. Redsicker, The Practical Methodology of Forensic Photography, 2nd ed., CRC Press, Boca Raton, 2001.
Scientific Working Group on Imaging Technologies (SWGIT), Forensic Sci. Commun. 3(1) (2001). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/july2001/swgitltr.htm).
Scientific Working Group on Imaging Technologies (SWGIT), Forensic Sci. Commun. 2(1) (2000). (http://www.fbi.gov/programs/hq/lab/fsc/backissu/jan2000/swigit.htm).
Laboratory Based Photography
R. K. Bunting, The Chemistry of Photography, Photoglass Press, Normal, IL, 1987.
W. R. Harrison, Suspect Documents, Nelson-Hall, Chicago, 1981.
G. M. Mokrzycki, Forensic Sci. Commun. 1(3) (1999). (http://www.fbi.gov/programs/lab/fsc/backissu/oct1999/mokrzyck.htm).
S. A. Schehl, Forensic Sci. Commun. 2(2) (2000). (http://www.fbi.gov/programs/lab/fsc/backissu/april2000/schehl1.htm).
L. Stroebel and R. Zakia, eds., The Focal Encyclopedia of Photography, 3rd ed., Focal Press, London, 1993.
Forensic Image Processing
G. A. Baxes, Digital Image Processing: Principles and Applications, John Wiley & Sons, Inc., New York, 1994.
A. Bovik, ed., Handbook of Image and Video Processing, Academic Press, San Diego, 2000.
K. R. Castleman, Digital Image Processing, Prentice Hall, Upper Saddle River, 1996.
J. C. Russ, Forensic Uses of Digital Imaging, CRC Press, Boca Raton, 2001.
J. C. Russ, The Image Processing Handbook, 3rd ed., CRC Press, Boca Raton, 1998.
R. W. Vorder Bruegge, Proc. SPIE 3,576, 185–194 (1999).
P. Warrick, J. Forensic Identification 50(1), 20–32 (2000).

Laboratory Image Analysis
D. A. Brugioni, Photo Fakery, Brassey's, Dulles, VA, 1999.
J. Cadigan and W. J. Stokes, AFTE J. 25(4), 242–245 (1993).
E. A. Craig and W. M. Bass, J. Forensic Identification 43(1), 27–34 (1993).
A. Criminisi et al., Proc. SPIE 3,576, 227–238 (1999).
C. Fields, H. C. Falls, C. P. Warren, and M. Zimberoff, Obstet. Gynecol. 16(1), 98–102 (1960).
C. N. Frechette, J. Forensic Identification 44(4), 410–429 (1994).
M. Houts, ed., Photographic Misrepresentation, Matthew Bender, New York, 1969.
R. A. Huber, Criminal Law Q. 2, 276–296 (1959–1960).
A. Iannarelli, Ear Identification, Paramount, Fremont, CA, 1989.
M. Y. Iscan and R. P. Helmer, eds., Forensic Analysis of the Skull, John Wiley & Sons, Inc., NY, 1993.
A. Jaubert, Making People Disappear, Pergamon-Brassey's, Washington, 1986.
D. King, The Commissar Vanishes, Metropolitan Books, Henry Holt, NY, 1997.
W. J. Mitchell, Sci. Am. 270(2), 68–73 (1994).
F. H. Moffitt and E. M. Mikhail, eds., Photogrammetry, 3rd ed., Harper & Row, NY, 1980.
E. Muybridge, The Human Figure in Motion, Dover, NY, 1955.
E. Muybridge, The Male and Female Figure in Motion, Dover, NY, 1984.
J. K. Nickell, J. Forensic Identification 46(6), 702–714 (1996).
C. C. Slama, ed., Manual of Photogrammetry, 4th ed., American Society of Photogrammetry, Falls Church, VA, 1980.
W. J. Stokes and J. R. Williamson, Annu. Proc. Photogrammetry Remote Sensing, Falls Church, 2, August 1992, pp. 8.
C. Van der Lugt, Int. Assoc. Identification Annu. Educ. Conf., Milwaukee, WI, July 11–17, 1999, pp. 30.
P. Vanezis and C. Brierley, Sci. Justice 36(1), 27–33 (1996).
R. W. Vorder Bruegge and T. M. Musheno, Proc. Am. Defense Preparedness Assoc. 12th Annu. Joint Gov.-Ind. Security Technol. Symp. Exhibition, Williamsburg, VI, 1996, pp. 8.
R. W. Vorder Bruegge, MAAFS Newsl. 23(2), 5–6 (1995).
R. W. Vorder Bruegge, J. Forensic Sci. 44(3), 613–622 (1999).
J. Whitnall and K. Millen-Playter, Photomethods, 36–39, November 1985.
J. Whitnall, K. Millen-Playter, and F. H. Moffitt, Functional Photography, 32–38, January/February 1988.
J. Whitnall and F. H. Moffitt, in Non-Topographic Photogrammetry, 2nd ed., American Society for Photogrammetry and Remote Sensing, 1989, pp. 389–393.
J. R. Williamson and M. H. Brill, Dimensional Analysis Through Perspective: A Reference Manual, American Society for Photogrammetry and Remote Sensing, 1990, pp. 233.

IMAGING SCIENCE IN MEDICINE
WILLIAM R. HENDEE
Medical College of Wisconsin, Milwaukee, WI

INTRODUCTION
Natural science is the search for ‘‘truth’’ about the natural world. In this definition, truth is defined by principles and laws that have evolved from observations and measurements about the natural world that are reproducible through procedures that follow universal rules of scientific experimentation. These observations reveal properties of objects and processes in the natural world that are assumed to exist independently of the measurement technique and of our sensory perceptions of the natural world. The purpose of science is to use these observations to characterize the static and dynamic properties of objects, preferably in quantitative terms, and to integrate these properties into principles and, ultimately, laws and theories that provide a logical framework for understanding the world and our place in it. As a part of natural science, human medicine is the quest for understanding one particular object, the human body, and its structure and function under all conditions of health, illness, and injury. This quest has yielded models of human health and illness that are immensely useful in preventing disease and disability, detecting and diagnosing conditions of illness and injury, and designing therapies to alleviate pain and suffering and restore the body to a state of wellness or, at least, structural and functional capacity. The success of these efforts depends on our depth of understanding of the human body and on the delineation of effective ways to intervene successfully in the progression of disease and the effects of injuries. Progress in understanding the body and intervening successfully in human disease and injury has been so remarkable that the average life span of humans in developed countries is almost twice that expected a century ago. Greater understanding has occurred at all levels from the atomic through molecular, cellular, and tissue to the whole body and the social influences on disease patterns. At present, a massive research effort is focused on acquiring knowledge about genetic coding (the Human Genome Project) and the role of genetic coding in human health and disease. This effort is progressing at an astounding rate and gives rise to the belief among many medical scientists that genetics and bioinformatics (mathematical modeling of biological information, including genetic information) are the major research frontiers of medical science for the next decade or longer.
The human body is an incredibly complex system. Acquiring data about its static and dynamic properties yields massive amounts of information. One of the major challenges to researchers and clinicians is the question of how to acquire, process, and display vast quantities of information about the body, so that the information can be assimilated, interpreted, and used to yield more useful diagnostic methods and therapeutic procedures. In many cases, the presentation of information as images is the most efficient approach to this challenge. As humans, we understand this efficiency; from our earliest years we rely more heavily on sight than on any other perceptual skill in relating to the world around us. Physicians also increasingly rely on images to understand the human body and intervene in the processes of human illness and injury. The use of images to manage and interpret information about biological and medical processes is certain to continue to expand in clinical medicine and also in the biomedical research enterprise that supports it. Images of a complex object such as the human body reveal characteristics of the object such as its transmissivity, opacity, emissivity, reflectivity, conductivity, and magnetizability, and changes in these characteristics with time. Images that delineate one or more of these characteristics can be analyzed to yield information about underlying properties of the object, as depicted in Table 1. For example, images (shadowgraphs) created by X rays transmitted through a region of the body reveal intrinsic properties of the region such as its effective atomic number Z, physical density (grams/cm3 ) and electron density (electrons/cm3 ). Nuclear medicine images, including emission computed tomography (ECT) where pharmaceuticals release positrons [positron emission tomography (PET)] and single photons [single photon emission computed tomography (SPECT)], reveal the spatial and temporal distribution of target-specific pharmaceuticals in the human body. Depending on the application, these data can be interpreted to yield information about physiological processes such as glucose metabolism, blood volume, flow and perfusion, tissue and organ uptake, receptor binding, and oxygen utilization. In ultrasonography, images are produced by capturing energy reflected from interfaces in the body that separate tissues that have different acoustic impedances, where the acoustic impedance is the product
Table 1. Energy Sources and Tissue Properties Employed in Medical Imaging
Image Sources: X rays • γ rays • Visible light • Ultraviolet light • Annihilation radiation • Electric fields • Magnetic fields • Infrared • Ultrasound • Applied voltage
Image Influences: Mass density • Electron density • Proton density • Atomic number • Velocity • Pharmaceutical location • Current flow • Relaxation • Blood volume/flow • Oxygenation level of blood • Temperature • Chemical state
Image Properties: Transmissivity • Opacity • Emissivity • Reflectivity • Conductivity • Magnetizability • Resonance absorption
of the physical density and the velocity of ultrasound in the tissue. Magnetic resonance imaging (MRI) of relaxation characteristics following magnetization of tissues can be translated into information about the concentration, mobility, and chemical bonding of hydrogen and, less frequently, other elements present in biological tissues. Maps of the electrical field (electroencephalography) and the magnetic field (magnetoencephalography) at the surface of the skull can be analyzed to identify areas of intense electrical activity in the brain. These and other techniques that use the energy sources listed in Table 1 provide an array of imaging methods useful for displaying structural and functional information about the body that is essential to improving human health by detecting and diagnosing illness and injury.

The intrinsic properties of biological tissues that are accessible by acquiring and interpreting images vary spatially and temporally in response to structural and functional changes in the body. Analysis of these variations yields information about static and dynamic processes in the human body. These processes may be changed by disease and disability, and identification of the changes through imaging often permits detecting and delineating the disease or disability. Medical images are pictures of tissue characteristics that influence the way energy is emitted, transmitted, reflected, etc. by the human body. These characteristics are related to, but not the same as, the actual structure (anatomy), composition (biology and chemistry), and function (physiology and metabolism) of the body. Part of the art of interpreting medical images is to bridge the gap between imaging characteristics and clinically relevant properties that aid in diagnosing and treating disease and disability.

ADVANCES IN MEDICAL IMAGING
Advances in medical imaging have been driven historically by the ‘‘technology push’’ principle. Especially influential have been imaging developments in other areas, particularly in the defense and military sectors, that have been imported into medicine because of their potential applications in detecting and diagnosing human illness and injury. Examples include ultrasound developed initially for submarine detection (sonar), scintillation detectors and reactor-produced isotopes (including 131I and 60Co) that emerged from the Manhattan Project (the United States World War II effort to develop the atomic bomb), rare-earth fluorescent compounds synthesized initially in defense and space research laboratories, electrical conductivity detectors for detecting rapid blood loss on the battlefield, and the evolution of microelectronics and computer industries from research funded initially for security and surveillance, defense, and military purposes. Basic research laboratories have also provided several imaging technologies that have migrated successfully into clinical medicine. Examples include reconstruction mathematics for computed tomographic imaging and nuclear magnetic resonance techniques that evolved into magnetic resonance imaging and spectroscopy. The migration of technologies from other arenas into medicine has not always been successful. For example, infrared detection devices
developed for night vision in military operations have so far not proven useful in medicine, despite early enthusiasm for infrared thermography as an imaging method for early detection of breast cancer.

Today the emphasis in medical imaging is shifting from a ‘‘technology push’’ approach toward the concept of ‘‘biological/clinical pull.’’ This shift in emphasis reflects a deeper understanding of the biology underlying human health and disease and a growing demand for accountability and proven usefulness of technologies before they are introduced into clinical medicine. Increasingly, unresolved biological questions important in diagnosing and treating human disease and disability are used as an incentive for developing new imaging methods. For example, the function of the human brain and the causes and mechanisms of various mental disorders such as dementia, depression, and schizophrenia are among the greatest biological enigmas that confront biomedical scientists and clinicians. A particularly fruitful method for penetrating this conundrum is the technique of functional imaging that employs tools such as ECT and MRI. Functional magnetic resonance imaging (fMRI) is especially promising as an approach to unraveling some of the mysteries of human brain function in health and in various conditions of disease and disability. Another example is the use of X-ray computed tomography and magnetic resonance imaging as feedback mechanisms to shape and guide the optimized deployment of radiation beams for cancer treatment. The growing use of imaging techniques in radiation oncology reveals an interesting and rather recent development. Until about three decades ago, the diagnostic and therapeutic applications of ionizing radiation were practiced by a single medical specialty. In the late 1960s, these applications began to separate into distinct medical specialties, diagnostic radiology and radiation oncology, that have separate training programs and clinical practices. Today, imaging is used extensively in radiation oncology to characterize the cancers to be treated, design the plans of treatment, guide the delivery of radiation, monitor the response of patients to treatment, and follow patients over the long term to assess the success of therapy, the occurrence of complications, and the frequency of recurrence. The process of accommodating this development in the training and practice of radiation oncology is encouraging a closer working relationship between radiation oncologists and diagnostic radiologists.

EVOLUTIONARY DEVELOPMENTS IN IMAGING
Six major developments are converging today to raise imaging to a more prominent role in biological and medical research and in the clinical practice of medicine (1):
• The ever-increasing sophistication of the biological questions that can be addressed as knowledge expands and understanding grows about the complexity of the human body and its static and dynamic properties.
• The ongoing evolution of imaging technologies and the increasing breadth and depth of the questions that these technologies can address at ever more fundamental levels.
• The accelerating advances in computer technology and information networking that support imaging advances such as three- and four-dimensional representations, superposition of images from different devices, creation of virtual reality environments, and transportation of images to remote sites in real time.
• The growth of massive amounts of information about patients that can best be compressed and expressed by using images.
• The entry into research and clinical medicine of young persons who are amazingly facile with computer technologies and comfortable with images as the principal pathway to acquiring and displaying information.
• The growing importance of images as an effective means to convey information in visually oriented developed cultures.
A major challenge confronting medical imaging today is the need to exploit this convergence of evolutionary developments efficiently to accelerate biological and medical imaging toward the realization of its true potential. Images are our principal sensory pathway to knowledge about the natural world. To convey this knowledge to others, we rely on verbal communications that follow accepted rules of human language, of which there are thousands of varieties and dialects. In the distant past, the acts of knowing through images and communicating through languages were separate and distinct processes. Every technological advance that brought images and words closer, even to the point of convergence in a single medium, has had a major cultural and educational impact. Examples of such advances include the printing press, photography, motion pictures, television, video games, computers, and information networking. Each of these technologies has enhanced the shift from using words to communicate information toward a more efficient synthesis of images to provide insights and words to explain and enrich insights (2). Today, this synthesis is evolving at a faster rate than ever before, as evidenced, for example, by the popularity of television news and documentaries and the growing use of multimedia approaches to education and training. A two-way interchange of information is required to inform and educate individuals. In addition, flexible means are needed for mixing images and words and their rate and sequence of presentation to capture and retain the attention, interest, and motivation of persons engaged in the educational process. Computers and information networks provide this capability. In medicine, their use in association with imaging technologies greatly enhances the potential contribution of medical imaging to resolving patient problems in the clinical setting. At the beginning of the twenty-first century, the six evolutionary developments listed before provide the framework for major advances in medical imaging and its contributions to improvements in the health and well-being of people worldwide.
Molecular Medicine

Medical imaging has traditionally focused on acquiring structural (anatomic) and, to a lesser degree, functional (physiological) information about patients at the organ and tissue levels. This focus has nurtured the correlation of imaging findings with pathological conditions and led to enhanced detection and diagnosis of human disease and injury. At times, however, detection and diagnosis occur at a stage in the disease or injury where radical intervention is required and the effectiveness of treatment is compromised. In many cases, detection and diagnosis at an earlier stage in the progression of disease and injury are required to improve the effectiveness of treatment and enhance the well-being of patients. This objective demands that medical imaging refocus its efforts from the organ and tissue levels to the cellular and molecular levels of human disease and injury. Many scientists believe that medical imaging is well positioned today to experience this refocusing as a benefit of knowledge gained at the research frontiers of molecular biology and genetics. This benefit is often characterized as the entry of medical imaging into the era of molecular medicine. Examples include the use of magnetic resonance to characterize the chemical composition of cancers, emission computed tomography to display the perfusion of blood in the myocardium, and microfocal X-ray computed tomography to reveal the microvasculature of the lung.

Contrast agents are widely employed in X ray, ultrasound, and magnetic resonance imaging techniques to enhance the visualization of properties correlated with patient anatomy and physiology. Agents in wide use today localize in tissues either by administration into specific anatomic compartments such as the gastrointestinal or vascular systems or by reliance on nonspecific changes in tissues such as increased capillary permeability or alterations in the extracellular fluid space. These localization mechanisms frequently do not yield a sufficient concentration differential of the agent to reveal subtle tissue differences associated with the presence of an abnormal condition. New contrast agents are needed that exploit growing knowledge about biochemical receptor systems, metabolic pathways, and ''antisense'' (variant DNA) molecular technologies to yield concentration differentials sufficient to reveal subtle variations among various tissues that may reflect the presence of pathological conditions.

Another important imaging application of molecular medicine is using imaging methods to study cellular, molecular, and genetic processes. For example, cells may be genetically altered to attract metal ions that (1) alter the magnetic susceptibility, thereby permitting their identification by magnetic resonance imaging techniques; or (2) are radioactive and therefore can be visualized by nuclear imaging methods. Another possibility is to transfect cells with genetic material that causes expression of cell surface receptors that can bind radioactive compounds (3). Conceivably this technique could be used to tag affected cells and monitor the progress of gene therapy.

Advances in molecular biology and genetics are yielding new knowledge at an astonishing rate about the molecular and genetic infrastructure that underlies the static and
dynamic processes of human anatomy and physiology. This new knowledge is likely to yield increasingly specific approaches to using imaging methods to visualize normal and abnormal tissue structure and function at increasingly microscopic levels. These methods will in all likelihood lead to further advances in molecular medicine.

Human Vision

Images are the product of the interaction of the human visual system with its environment. Any analysis of images, including medical images, must include at least a cursory review of the process of human vision. This process is outlined here; a more detailed treatment of the characteristics of the ''end user'' of images is provided in later sections of this Encyclopedia.

Anatomy and Physiology of the Eye

The human eye, diagrammed in Fig. 1, is an approximate sphere that contains four principal features: the cornea, iris, lens, and retina. The retina contains photoreceptors that translate light energy into electrical signals that serve as nerve impulses to the brain. The other three components serve as focusing and filtering mechanisms to transmit a sharp, well-defined light image to the retina.
Tunics. The wall of the eye consists of three layers (tunics) that are discontinuous in the posterior portion where the optic nerve enters the eye. The outermost tunic is a fibrous layer of dense connective tissue that includes the cornea and the sclera. The cornea comprises the front curved surface of the eye, contains an array of collagen fibers and no blood vessels, and is transparent to visible light. The cornea serves as a coarse focusing element to project light onto the observer’s retina. The sclera, or white of the eye, is an opaque and resilient sheath to which the eye muscles are attached. The second layer of the wall is a vascular tunic termed the uvea. It contains the choroid,
ciliary body, and iris. The choroid contains a dense array of capillaries that supply blood to all of the tunics. Pigments in the choroid reduce internal light reflection that would otherwise blur the images. The ciliary body contains the muscles that support and focus the lens. It also contains capillaries that secrete fluid into the anterior segment of the eyeball. The iris is the colored part of the eye that has a central aperture termed the pupil. The diameter of the aperture can be altered by the action of muscles in the iris to control the amount of light that enters the posterior cavity of the eye. The aperture can vary from about 1.5–8 mm.
Chambers and Lens. The anterior and posterior chambers of the eye are filled with fluid. The anterior chamber contains aqueous humor, a clear plasma-like fluid that is continually drained and replaced. The posterior cavity is filled with vitreous humor, a clear viscous fluid that is not replenished. The cornea, aqueous and vitreous humors, and the lens serve collectively as the refractive media of the eye. The lens of the eye provides the fine focusing of incident light onto the retina. It is a convex lens whose thickness can be changed by action of the ciliary muscles. The index of refraction of the lens is close to that of the surrounding fluids in which it is suspended, so it serves as a fine-focusing adjustment to the coarse focusing function of the cornea. The process of accommodation by which near objects are brought into focus is achieved by contraction of the ciliary muscles. This contraction causes the elastic lens to bow forward into the aqueous humor, thereby increasing its thickness. Accommodation is accompanied by constriction of the pupil, which increases the depth of field of the eye. As the lens loses its flexibility from aging, it becomes unable to accommodate, so that near objects can no longer be focused onto the retina. This is the condition of presbyopia in which reading glasses are needed to supplement the focusing ability of the lens. Clouding of the lens by aging results in diminution of the amount of light that reaches the retina. This condition is known as a lens cataract; when severe enough it makes the individual a candidate for surgical replacement of the lens, often with an artificial lens.
Figure 1. Horizontal section through the human eye (from Ref. 4, with permission).
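The range of pupil diameters quoted above implies a large range of admitted light through the aperture area alone. A minimal Python sketch of the arithmetic (diameters taken from the text; the calculation is only illustrative):

import math

# The iris aperture described above varies from about 1.5 to 8 mm in diameter.
d_min_mm, d_max_mm = 1.5, 8.0
area = lambda d: math.pi * (d / 2.0) ** 2
ratio = area(d_max_mm) / area(d_min_mm)
print(f"max/min pupil area ratio ~ {ratio:.0f}x")   # roughly a 28-fold change in admitted light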
Retina. The innermost layer of the eye is the retina, which is composed of two components, an outer monolayer of pigmented cells and an inner neural layer of photoreceptors. Because considerable processing of visual information occurs in the retina, it often is thought of more as a remote part of the brain rather than as simply another component of the eye. There are no photoreceptors where the optic nerve enters the eye, creating a blind spot. Near the blind spot is the macula lutea, an area of about 3 mm² over which the retina is especially thin. Within the macula lutea is the fovea centralis, a slight depression about 0.4 mm in diameter. The fovea is on the optical axis of the eye and is the area where the visual cones are concentrated to yield the greatest visual acuity.

The retina contains two types of photoreceptors, termed rods and cones. Rods are distributed over the entire retina, except in the blind spot and the fovea centralis. The retina contains about 125 million rods, or about 10⁵/mm². Active elements in the rods (and in the cones as well) are replenished throughout an individual's lifetime. Rods have a low but variable threshold to light and respond to very low intensities of incident light. Vision under low illumination levels (e.g., night vision) is attributable almost entirely to rods. Rods contain the light-sensitive pigment rhodopsin (visual purple) which undergoes chemical reactions (the rhodopsin cycle) when exposed to visible light. Rhodopsin consists of a lipoprotein called opsin and a chromophore (a light-absorbing chemical compound called 11-cis-retinal) (5). The chemical reaction begins with the breakdown of rhodopsin and ends with the recombination of the breakdown products into rhodopsin. The recovery process takes 20–30 minutes, which is the time required to accommodate to low levels of illumination (dark adaptation). The process of viewing with rods is known as ''scotopic'' vision. The rods are maximally sensitive to light of about 510 nm in the blue–green region of the visible spectrum. Rods have no mechanisms to discriminate different wavelengths of light, and vision under low illumination conditions is essentially ''colorblind.'' More than 100 rods are connected to each ganglion cell, and the brain has no way of discriminating among these photoreceptors to identify the origin of an action potential transmitted along the ganglion. Hence, rod vision is associated with relatively low visual acuity in combination with high sensitivity to low levels of ambient light.

The retina contains about 7 million cones that are packed tightly in the fovea and diminish rapidly across the macula lutea. The density of cones in the fovea is about 140,000/mm². Cones are maximally sensitive to light of about 550 nm in the yellow–green portion of the visible spectrum. Cones are much (1/10⁴) less sensitive to light than rods, but in the fovea there is a 1 : 1 correspondence between cones and ganglions, so that visual acuity is very high. Cones are responsible for color vision through mechanisms that are imperfectly understood at present. One popular theory of color vision proposes that three types of cones exist; each has a different photosensitive pigment that responds maximally to a different wavelength (450 nm for ''blue'' cones, 525 nm for ''green'' cones, and 555 nm for ''red'' cones). The three cone pigments share the same chromophore as the rods; their different spectral sensitivities result from differences in the opsin component.

Properties of Vision

For two objects to be distinguished on the retina, light rays from the objects must define at least a minimum angle as they pass through the optical center of the eye. The minimum angle is defined as the visual angle. The visual angle, expressed in units of minutes of arc, determines the visual acuity of the eye. A rather crude measure of visual acuity is provided by the Snellen chart that consists of rows of letters that diminish in size from top to bottom. When viewed from a distance of 20 feet, a person who has normal vision can just distinguish the letters in the eighth row. This person is said to have 20 : 20 vision (i.e., the person can see from 20 feet what a normal person can
see at the same distance). At this distance, the letters on the eighth row form a visual angle of 1 minute of arc. An individual who has excellent vision and who is functioning in ideal viewing conditions can achieve a visual angle of about 0.5 minutes of arc, which is close to the theoretical minimum defined by the packing density of cones on the retina (6). A person who has 20 : 100 vision can see at 20 feet what a normal person can see at 100 feet. This individual is considered to have impaired visual acuity. Other more exact tests administered under conditions of uniform illumination are used for actual clinical diagnosis of visual defects. If the lettering of a Snellen chart is reversed (i.e., white letters on a black chart, rather than black letters on a white chart), the ability of observers to recognize the letters from a distance is greatly impaired. The eye is extremely sensitive to small amounts of light. Although the cones do not respond at illumination levels below a threshold of about 0.001 cd/m2 , rods are much more sensitive and respond to just a few photons. For example, as few as 10 photons can generate a visual stimulus in an area of the retina where rods are at high concentration (7). Differences in signal intensity that can just be detected by the human observer are known as just noticeable differences (JND). This concept applies to any type of signal, including light, that can be sensed by the observer. The smallest difference in signal that can be detected depends on the magnitude of the signal. For example, we may be able to discern the brightness difference between one and two candles, but we probably cannot distinguish the difference between 100 and 101 candles. This observation was quantified by the work of Weber who demonstrated that the JND is directly proportional to the intensity of the signal. This finding was quantified by Fechner as
dS = k (dI/I),   (1)

where I is the intensity of stimulus, dS is an increment of perception (termed a limen), and k is a scaling factor. The integral form of this expression is known as the Weber–Fechner law:

S = k(log I) + C,   (2)

or, by setting C = −k(log I₀),

S = k log(I/I₀).   (3)

This expression states that the perceived signal S varies with the logarithm of the relative intensity. The Weber–Fechner law is similar to the equation for expressing the intensity of sound in decibels and provides a connection between the objective measurement of sound intensity and the subjective impression of loudness.

A modification of the Weber–Fechner law is known as the power law (6). In this expression, the relationship between a stimulus and the perceived signal can be stated as

dS/S = n (dI/I),   (4)

which, when integrated, yields

log S = n(log I) + K   (5)

and, when K is written as −n[log I₀],

S = (I/I₀)^n,   (6)

where I₀ is a reference intensity. The last expression, known as the power law, states that the perceived signal S varies with the relative intensity raised to the power n. The value of the exponent n has been determined by Stevens for a variety of sensations, as shown in Table 2.

Table 2. Exponents in the Power Law for a Variety of Psychophysical Responses (from Ref. 7)

Perceived Quantity    Exponent    Stimulus
Loudness              0.6         Binaural
Brightness            0.5         Point source
Smell                 0.55        Coffee
Taste                 1.3         Salt
Temperature           1.0         Cold on arm
Vibration             0.95        60 Hz on finger
Duration              1.1         White noise stimulus
Pressure              1.1         Static force on palm
Heaviness             1.45        Lifted weight
Electric shock        3.5         60 Hz through fingers
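The behavior of Eqs. (3) and (6) can be illustrated numerically. The short Python sketch below is only illustrative: the scaling factor and the stimulus intensities are assumed, and the exponents are taken from Table 2.

import math

I0 = 1.0                              # reference intensity (arbitrary units)
k = 1.0                               # Weber-Fechner scaling factor, assumed

def weber_fechner(I: float) -> float:
    return k * math.log10(I / I0)     # Eq. (3): S = k log(I/I0)

def stevens_power_law(I: float, n: float) -> float:
    return (I / I0) ** n              # Eq. (6): S = (I/I0)^n

for I in (10.0, 100.0, 1000.0):
    print(f"I = {I:6.0f}: Weber-Fechner S = {weber_fechner(I):.1f}, "
          f"brightness (n = 0.5) = {stevens_power_law(I, 0.5):.1f}, "
          f"electric shock (n = 3.5) = {stevens_power_law(I, 3.5):.2e}")

The slowly growing logarithmic and low-exponent responses contrast sharply with the steep response for electric shock, which is the point of Table 2.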
Image Quality

The production of medical images relies on intercepting some form of radiation that is transmitted, scattered, or emitted by the body. The device responsible for intercepting the radiation is termed an image receptor (or radiation detector). The purpose of the image receptor is to generate a measurable signal as a result of energy deposited in it by the intercepted radiation. The signal is often, but not always, an electrical signal that can be measured as an electrical current or voltage pulse. Various image receptors and their uses in specific imaging applications are described in the following sections.

In describing the properties of a medical image, it is useful to define certain image characteristics. These characteristics and their definitions change slightly from one type of imaging process to another, so a model is needed to present them conceptually. X-ray projection imaging is the preferred model because this process accounts for more imaging procedures than any other imaging method used in medicine. In X-ray imaging, photons transmitted through the body are intercepted by an image receptor on the side of the body opposite from the X-ray source. The probability of interaction of a photon of energy E in the detector is termed the quantum detection efficiency η (8). This parameter is defined as

η = 1 − e^(−µ(E)t),   (7)

where µ(E) is the linear attenuation coefficient of the detector material that intercepts X rays incident on the image receptor and t is the thickness of the material. The quantum detection efficiency can be increased by making the detector thicker or by using materials that absorb X rays more readily (i.e., have a greater attenuation coefficient µ(E) because they have a higher mass density or atomic number). In general, η is greater at lower X-ray energies and decreases gradually with increasing energy. If the absorbing material has an absorption edge in the energy range of the incident X rays, however, the value of η increases dramatically for X-ray energies slightly above the absorption edge. Absorption edges are depicted in Fig. 2 for three detectors (Gd₂O₂S, YTaO₄, and CaWO₄) used in X-ray imaging.

Figure 2. Attenuation curves for three materials used in X-ray intensifying screens (from Ref. 9, with permission).

Image Noise

Noise may be defined generically as uncertainty in a signal due to random fluctuations in the signal. Noise is present in all images. It is a result primarily of forming the image from a limited amount of radiation (photons). This contribution to image noise, referred to as quantum mottle, can be reduced by using more radiation to form the image. However, this approach also increases the exposure of the patient to radiation. Other influences on image noise include the intrinsically variable properties of the tissues represented in the image, the type of receptor chosen to acquire the image, the image receptor, processing and display electronics, and the amount of
scattered radiation that contributes to the image. In most instances, quantum mottle is the dominant influence on image noise. In an image receptor exposed to N₀ photons, the image is formed with ηN₀ photons, and the photon image noise σ can be estimated as σ = (ηN₀)^(1/2). The signal-to-noise ratio (SNR) is

SNR = ηN₀/(ηN₀)^(1/2) = (ηN₀)^(1/2).   (8)

A reduction in either the quantum detection efficiency η of the receptor or the number of photons N₀ used to form the image yields a lower signal-to-noise ratio and produces a noisier image. This effect is illustrated in Fig. 3. A complete analysis of signal and noise propagation in an imaging system must include consideration of the
spatial-frequency dependence of both the signal and the noise. The propagation of the signal is characterized by the modulation transfer function (MTF), and the propagation of noise is described by the Wiener (noise power) spectrum W(f). A useful quantity for characterizing the overall performance of an imaging system is its detective quantum efficiency DQE(f), where (f) reveals that DQE depends on the frequency of the signal. This quantity describes the efficiency with which an imaging system transfers the signal-to-noise ratio of the radiative pattern that emerges from the patient into an image to be viewed by an observer. An ideal imaging system has a DQE(f) = η at all spatial frequencies. In actuality, DQE(f) is invariably less than η, and the difference between DQE(f) and η becomes greater at higher spatial frequencies. If DQE = 0.1η at a particular frequency, then the imaging system performs at that spatial frequency as if the number of photons were reduced to 1/10. Hence, the noise would increase by 10^(1/2) at that particular frequency.
Figure 3. Illustration of quantum mottle. As the illumination of the image increases, quantum mottle decreases and the clarity of the image improves, as depicted in these classic photographs (from Ref. 10).
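The dependence of quantum mottle on the number of detected photons can also be illustrated numerically. In the Python sketch below the attenuation coefficient, phosphor thickness, and photon count are assumed illustrative values, not measured screen data; the sketch simply evaluates Eqs. (7) and (8) and the square-root penalty implied by a reduced DQE.

import math

def quantum_detection_efficiency(mu_per_cm: float, thickness_cm: float) -> float:
    """Eq. (7): eta = 1 - exp(-mu*t) for a detector of thickness t."""
    return 1.0 - math.exp(-mu_per_cm * thickness_cm)

def photon_snr(n0: float, eta: float) -> float:
    """Eq. (8): SNR = sqrt(eta * N0) for Poisson-limited (quantum mottle) noise."""
    return math.sqrt(eta * n0)

# Illustrative (assumed) numbers:
mu = 25.0        # linear attenuation coefficient of the phosphor, cm^-1 (assumed)
t = 0.02         # phosphor thickness, cm (assumed)
n0 = 1.0e5       # photons incident per image element (assumed)

eta = quantum_detection_efficiency(mu, t)
print(f"eta = {eta:.3f}, SNR = {photon_snr(n0, eta):.1f}")

# A system whose DQE at some spatial frequency is only 0.1*eta behaves as if
# one-tenth as many photons had been used, so the SNR drops by sqrt(10):
print(f"SNR with one-tenth the photons: {photon_snr(0.1 * n0, eta):.1f}")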
Spatial Resolution
The spatial resolution of an image is a measure of the smallest visible interval in an object that can be seen in an image of the object. Greater spatial resolution means that smaller intervals can be visualized in the image, that is, greater spatial resolution yields an image that is sharper. Spatial resolution can be measured and expressed in two ways: (1) by a test object that contains structures separated by various intervals, and (2) by a more formal procedure that employs the modulation transfer function (MTF).

A simple but often impractical way to describe the spatial resolution of an imaging system is by measuring its point-spread function (PSF; Fig. 4). The PSF(x, y) is the acquired image of an object that consists of an infinitesimal point located at the origin, that is, for an object defined by the coordinates (0,0). The PSF is the function that operates on what would otherwise be a perfect image to yield an unsharp (blurred) image. If the extent of unsharpness is the same at all locations in the image, then the PSF has the property of being ''spatially invariant,'' and the relationship of the image to the object (or perfect image) is

Image(x, y) = PSF(x, y) ⊗ object(x, y),   (9)

where the ''⊗'' indicates a mathematical operation referred to as ''convolution'' between the two functions. This operation can be stated as

Image(x, y) = ∫∫ PSF(x − u, y − v) object(u, v) du dv.   (10)

The convolution effectively smears each value of the object by the PSF to yield the image. The convolution (blurring) operation can be expressed by a functional operator, S[. . .], such that

PSF(x, y) = S[point(x, y)],   (11)

where S[. . .] represents the blurring operator, referred to as the linear transform of the system. The modulation transfer function MTF(m, n) is obtained from the PSF(x, y) by using the two-dimensional Fourier transform F:

MTF(m, n) = F[PSF(x, y)],   (12)

where (m, n) are the conjugate spatial frequency variables for the spatial coordinates (x, y). This expression of MTF is not exactly correct in a technical sense. The Fourier transform of the PSF is actually termed the system transfer function, and the MTF is the normalized absolute value of the magnitude of this function. When the PSF is real and symmetrical about the x and y axes, the absolute value of the Fourier transform of the PSF yields the MTF directly (11).
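A brief numerical sketch of Eqs. (9)–(12), with an assumed Gaussian PSF and an arbitrary grid size, shows how the blurring and the MTF are computed in practice:

import numpy as np

n = 256                      # image size in pixels, assumed
x = np.arange(n) - n // 2
X, Y = np.meshgrid(x, x)

sigma = 2.0                  # PSF width in pixels, assumed
psf = np.exp(-(X**2 + Y**2) / (2 * sigma**2))
psf /= psf.sum()             # normalize so the PSF preserves total intensity

# A simple test object: a bright square on a dark background
obj = np.zeros((n, n))
obj[n//2 - 10:n//2 + 10, n//2 - 10:n//2 + 10] = 1.0

# Eq. (10): convolution, evaluated here with FFTs (circular convolution)
image = np.real(np.fft.ifft2(np.fft.fft2(obj) * np.fft.fft2(np.fft.ifftshift(psf))))

# Eq. (12): MTF = normalized magnitude of the Fourier transform of the PSF
mtf = np.abs(np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(psf))))
mtf /= mtf.max()

print("total intensity preserved:", np.isclose(image.sum(), obj.sum()))
print("MTF at zero frequency:", mtf[n//2, n//2])   # 1.0 by normalization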
Figure 4. The point-spread function PSF(x, y).

Figure 5. Point-spread (top) and modulation-transfer (bottom) functions for fast and medium-speed CaWO₄ intensifying screens (from Ref. 9, with permission).

MTFs for representative X-ray imaging systems (intensifying screen and film) are shown in Fig. 5. The PSF and the MTF, which is in effect the representation of the PSF in frequency space, are important descriptors of spatial resolution in a theoretical sense. From a practical point of view, however, the PSF is not very helpful because it can be generated and analyzed only approximately. The difficulty with the PSF is that the source must be essentially a singular point (e.g., a tiny aperture in a lead plate exposed to X rays or a minute source of radioactivity positioned at some distance from a receptor). This condition allows only a few photons (i.e., a very small amount of radiation) to strike the receptor, and very long exposure times are required to acquire an
image without excessive noise. In addition, measuring and characterizing the PSF present difficult challenges. One approach to overcoming the limitations of the PSF is to measure the line-spread function (LSF). In this approach, the source is represented as a long line of infinitesimal width (e.g., a slit in an otherwise opaque plate or a line source of radioactivity). The LSF can be measured by a microdensitometer that is scanned across the slit in a direction perpendicular to its length. As for the pointspread function, the width of the line must be so narrow that it does not contribute to the width of the image. If this condition is met, the width of the image is due entirely to unsharpness contributed by the imaging system. The slit (or line source) is defined mathematically as
line(x) = ∫_{−∞}^{+∞} point(x, y) dy.   (13)

Figure 6. The line-spread function is the image of an ideal line object, where S represents the linear transform of the imaging system (from Ref. 11, with permission).

The line-spread function LSF(x) results from the blurring operator for the imaging system operating on a line source (Fig. 6), that is,

LSF(x) = S[line(x)] = S[∫_{−∞}^{+∞} point(x, y) dy].   (14)

This function can also be written as

LSF(x) = ∫_{−∞}^{+∞} S[point(x, y)] dy = ∫_{−∞}^{+∞} PSF(x, y) dy,   (15)

that is, the line-spread function LSF is the point-spread function PSF integrated over the y dimension. The MTF of an imaging system can be obtained from the Fourier transform of the LSF:

F[LSF(x)] = ∫_{−∞}^{+∞} LSF(x) exp(−2πimx) dx
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} PSF(x, y) exp(−2πimx) dy dx
          = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} PSF(x, y) exp[−2πi(mx + ny)] dy dx, with n = 0
          = F[PSF(x, y)] evaluated at n = 0 = MTF(m, 0),   (16)

that is, the Fourier transform of the line-spread function is the MTF evaluated in one dimension. If the MTF is circularly symmetrical, then this expression describes the MTF completely in the two-dimensional frequency plane.

One final method of characterizing the spatial resolution of an imaging system is by using the edge-response function STEP(x, y). In this approach, the imaging system is presented with a source that transmits radiation on one side of an edge and attenuates it completely on the other side. The transmission is defined as

STEP(x, y) = 1 if x > 0, and 0 if x < 0,   (17)

or, equivalently,

STEP(x, y) = ∫_{−∞}^{x} line(x) dx.   (18)

The edge-spread function ESF(x) can be computed as

ESF(x) = S[STEP(x, y)] = S[∫_{−∞}^{x} line(x) dx] = ∫_{−∞}^{x} S[line(x)] dx = ∫_{−∞}^{x} LSF(x) dx.   (19)

Figure 7. The edge-spread function is derived from the image of an ideal step function, where S represents the linear transform of the imaging system (from Ref. 11, with permission).

This relationship, illustrated in Fig. 7, shows that the LSF(x) is the derivative of the edge-response function ESF(x):

LSF(x) = d[ESF(x)]/dx.   (20)
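These relationships are easy to check numerically. In the Python sketch below the Gaussian PSF width and the sampling interval are assumed; the chain PSF → LSF → ESF is followed, and the LSF and the one-dimensional MTF are recovered as Eqs. (15)–(20) describe.

import numpy as np

dx = 0.01                                  # sample spacing in mm, assumed
x = np.arange(-5, 5, dx)
X, Y = np.meshgrid(x, x)

sigma = 0.1                                # PSF width in mm, assumed
psf = np.exp(-(X**2 + Y**2) / (2 * sigma**2))

lsf = psf.sum(axis=0) * dx                 # Eq. (15): LSF(x) = integral of PSF over y
lsf /= lsf.sum() * dx                      # normalize the LSF to unit area

esf = np.cumsum(lsf) * dx                  # Eq. (19): ESF is the running integral of the LSF
lsf_check = np.gradient(esf, dx)           # Eq. (20): LSF is the derivative of the ESF

mtf = np.abs(np.fft.rfft(lsf))             # Eq. (16): MTF(m, 0) from the Fourier transform of the LSF
mtf /= mtf[0]
freqs = np.fft.rfftfreq(len(lsf), d=dx)    # spatial frequencies in cycles/mm

print("max |LSF - d(ESF)/dx| =", np.max(np.abs(lsf - lsf_check)))  # small (finite-difference error)
print("MTF at 1 cycle/mm     =", np.interp(1.0, freqs, mtf))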
The relationship between the ESF and the LSF is useful because one can obtain a microdensitometric scan of the edge to yield an edge-spread function. The derivative of the ESF yields the LSF, and the Fourier transform of the LSF provides the MTF in one dimension.

Contrast

Image contrast refers conceptually to the difference in brightness or darkness between a structure of interest and the surrounding background in an image. Usually, information in a medical image is presented in shades of gray (levels of ''grayness''). Differences in gray shades are used to distinguish various types of tissue, analyze structural relationships, and sometimes quantify physiological function. Contrast in an image is a product of both the physical characteristics of the object being studied and the properties of the image receptor used to form the image. In some cases, contrast can be altered by the exposure conditions chosen for the examination [for example, selection of the photon energy (kVp) and use of a contrast agent in X-ray imaging]. Image contrast is also influenced by perturbing factors such as scattered radiation and the presence of extraneous light in the detection and viewing systems. An example of the same image at different levels of contrast is shown in Fig. 8.

In most medical images, contrast is a consequence of the types of tissue represented in the image. In X-ray imaging, for example, image contrast reveals differences in the attenuation of X rays among various regions of the body, modified to some degree by other factors such as the properties of the image receptor, exposure technique, and the presence of extraneous (scattered) radiation. A simplified model of the human body consists of three different body tissues: fat, muscle, and bone. Air is also present in the lungs, sinuses, and gastrointestinal tract, and a contrast agent may have been used to accentuate the attenuation of X rays in a particular region. The chemical composition of the three body tissues, together with their percentage mass composition, is shown in Table 3. Selected physical properties of the tissues are included in Table 4, and the mass attenuation coefficients for different tissues as a function of photon energy are shown in Fig. 9.
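Before turning to the tabulated data, a small numerical sketch indicates how differential attenuation produces subject contrast between two tissues. The mass attenuation coefficients and densities used below are approximate values assumed only for illustration; they are not taken from the tables in this article.

import math

mu_over_rho = {
    "fat":    {20: 0.56, 60: 0.20},   # keV -> cm^2/g (assumed illustrative values)
    "muscle": {20: 0.82, 60: 0.21},   # keV -> cm^2/g (assumed illustrative values)
}
rho = {"fat": 0.91, "muscle": 1.00}   # densities in g/cm^3, assumed (water = 1)

thickness_cm = 5.0                    # path length through tissue, assumed

def transmission(tissue: str, energy_kev: int) -> float:
    mu = mu_over_rho[tissue][energy_kev] * rho[tissue]   # linear attenuation, cm^-1
    return math.exp(-mu * thickness_cm)

for e in (20, 60):
    t_fat = transmission("fat", e)
    t_mus = transmission("muscle", e)
    contrast = (t_fat - t_mus) / t_fat   # relative difference in transmitted intensity
    print(f"{e} keV: fat/muscle subject contrast ~ {contrast:.2f}")

With these assumed coefficients the fat/muscle difference is several times larger at the lower energy, which is the behavior described in the text for mammography.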
Table 3. Elemental Composition of Tissue Constituents (% composition by mass; from Ref. 11, with permission)

Element       Adipose Tissue   Muscle (Striated)   Water   Bone (Femur)
Hydrogen      11.2             10.2                11.2    8.4
Carbon        57.3             12.3                        27.6
Nitrogen      1.1              3.5                         2.7
Oxygen        30.3             72.9                88.8    41.0
Sodium                         0.08
Magnesium                      0.02                        0.2
Phosphorus                     0.2                         7.0
Sulfur        0.06             0.5                         0.2
Potassium                      0.3
Calcium                        0.007                       14.7

Table 4. Properties of Tissue Constituents of the Human Body (from Ref. 11, with permission)

Material   Effective Atomic Number   Density (kg/m³)   Electron Density (electrons/kg)
Air        7.6                       1.29              3.01 × 10²⁶
Water      7.4                       1.00              3.34 × 10²⁶
Muscle     7.4                       1.00              3.36 × 10²⁶
Fat        5.9–6.3                   0.91              3.34–3.48 × 10²⁶
Bone       11.6–13.8                 1.65–1.85         3.00–3.10 × 10²⁶
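The electron densities in Table 4 follow directly from elemental compositions such as those in Table 3. A minimal Python check for water (the atomic numbers and atomic weights are supplied here and are not part of the original tables):

N_A = 6.022e23                      # Avogadro's number, 1/mol

composition_water = {               # mass fractions from Table 3
    "H": (0.112, 1, 1.008),         # (fraction, Z, A in g/mol)
    "O": (0.888, 8, 16.00),
}

# electrons per kg = N_A * (1000 g/kg) * sum of w_i * Z_i / A_i
electrons_per_kg = 1000 * N_A * sum(
    w * Z / A for (w, Z, A) in composition_water.values()
)
print(f"water: {electrons_per_kg:.3e} electrons/kg")   # ~3.34e26, as listed in Table 4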
In Table 4, the data for muscle are also approximately correct for other soft tissues such as collagen, internal organs (e.g., liver and kidney), ligaments, blood, and cerebrospinal fluid. These data are very close to the data for water, because soft tissues, including muscle, are approximately 75% water, and body fluids are 85% to nearly 100% water. The similarity of these tissues suggests that conventional X-ray imaging yields poor discrimination among them, unless a contrast agent is used to accentuate the differences in X-ray attenuation. Because of the presence of low atomic number (low Z) elements, especially hydrogen, fat has a lower density and effective atomic number compared with muscle and other soft tissues. At less than 35 keV, X rays interact in fat and
Figure 8. Different levels of contrast in an image (from Ref. 12, with permission)
Figure 9. Mass attenuation coefficient of tissues.
soft tissues predominantly by photoelectric interactions that vary with Z³ of the tissue. This dependence provides higher image contrast among tissues of slightly different composition (e.g., fat and muscle) when low energy X rays are used, compared with that obtained from higher energy X rays that interact primarily by Compton interactions that do not depend on atomic number. Low energy X rays are used to accentuate subtle differences in soft tissues (e.g., fat and other soft tissues) in applications such as breast imaging (mammography), where the structures within the object (the breast) provide little intrinsic contrast. When images are desired of structures of high intrinsic contrast (e.g., the chest where bone, soft tissue, and air are present), higher energy X rays are used to suppress X-ray attenuation in bone which otherwise would create shadows in the image that could hide underlying soft-tissue pathology.

In some accessible regions of the body, contrast agents can be used to accentuate tissue contrast. For example, iodine-containing contrast agents are often injected into the circulatory system during angiographic imaging of blood vessels. The iodinated agent is water-soluble and mixes with the blood to increase its attenuation compared with surrounding soft tissues. In this manner, blood vessels can be seen that are invisible in X-ray images without a contrast agent. Barium is another element that is used to enhance contrast, usually in studies of the gastrointestinal (GI) tract. A thick solution of a barium-containing compound is introduced into the GI tract by swallowing or enema, and the solution outlines the borders of the GI tract to permit visualization of ulcers, polyps, ruptures, and other abnormalities. Contrast agents have also been developed for use in ultrasound (solutions that contain microscopic gas bubbles that reflect sound energy) and magnetic resonance imaging (solutions that contain gadolinium that affects the relaxation constants of tissues).

Integration of Image Noise, Resolution and Contrast — The Rose Model. The interpretation of images requires analyzing all of the image's features, including noise, spatial resolution, and contrast. In trying to understand
the interpretive process, the analysis must also include the characteristics of the human observer. Collectively, the interpretive process is referred to as ''visual perception.'' The study of visual perception has captured the attention of physicists, psychologists, and physicians for more than a century — and of philosophers for several centuries. A seminal investigation of visual perception, performed by Albert Rose in the 1940s and 1950s, yielded the Rose model of human visual perception (13). This model is fundamentally a probabilistic analysis of detection thresholds in low-contrast images. Rose's theory states that an observer can distinguish two regions of an image, called ''target'' and ''background,'' only if there is enough information in the image to permit making the distinction. If the signal is assumed to be the difference in the number of photons used to define each region and the noise is the statistical uncertainty associated with the number of photons in each region, then the observer needs a certain signal-to-noise ratio to distinguish the regions. Rose suggested that this ratio is between 5 and 7.

The Rose model can be quantified by a simple example (11) that assumes that the numbers of photons used to image the target and background are Poisson distributed and that the target and background yield a low-contrast image in which

N = number of photons that define the target ∼ number of photons that define the background
ΔN = signal = difference in the number of photons that define target and background
A = area of the target = area of background region
C = contrast of the signal compared with background

The contrast between target and background is related to the number of detected photons N and the difference ΔN between the number of photons that define the target and the background:

C = ΔN/N,   (21)

Signal = ΔN = CN.   (22)

For Poisson-distributed events, noise = (N)^(1/2), and the signal-to-noise ratio (SNR) is

SNR = signal/noise = CN/(N)^(1/2) = C(N)^(1/2) = C(ΦA)^(1/2),   (23)

where Φ is the photon fluence (number of photons detected per unit area) and A is the area of the target or background. Using the experimental data of his predecessor Blackwell, Rose found that the SNR has a threshold in the range of 5–7 for differentiating a target from its background. The Rose model is depicted in Fig. 3.
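A short Python sketch of Eq. (23), using an assumed photon fluence, shows how the Rose criterion links contrast and object size:

import math

phi = 1.0e4          # detected photons per mm^2 (assumed)
threshold_snr = 5.0  # lower end of the Rose threshold range (5-7)

def rose_snr(contrast: float, diameter_mm: float) -> float:
    """Eq. (23): SNR = C * sqrt(phi * A) for a circular target of the given diameter."""
    area = math.pi * (diameter_mm / 2.0) ** 2
    return contrast * math.sqrt(phi * area)

# Smallest detectable disk diameter at a given contrast:
for c in (0.01, 0.05, 0.20):
    d = 0.05
    while rose_snr(c, d) < threshold_snr:
        d += 0.05
    print(f"contrast {c:.2f}: detectable above ~{d:.2f} mm diameter")

Lower contrast demands a larger target before the threshold SNR is reached, which is exactly the behavior summarized by the contrast-detail curves discussed next.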
Figure 10. Results of tests using a contrast-detail phantom for high-noise and low-noise cases. Each dashed line indicates combinations of size and contrast of objects that are just barely visible above the background noise (from Ref. 9, with permission).

A second approach to integrating resolution, contrast, and noise in image perception involves using contrast-detail curves. This method reveals the threshold contrast needed to perceive an object as a function of its diameter. Contrast-detail curves are shown in Fig. 10 for two sets
of images; one was acquired at a relatively high signal-to-noise ratio (SNR), and the other at a lower SNR. The curves illustrate the intuitively obvious conclusion that images of large objects can be seen at relatively low contrast, whereas smaller objects require greater contrast to be visualized. The threshold contrast curves begin in the upper left corner of the graph (large objects [coarse detail], low contrast) and end in the lower right corner (small objects [fine detail], high contrast). Contrast-detail curves can be used to compare the performance of two imaging systems or the same system under different operating conditions. When the performance data are plotted, as shown in Fig. 10, the superior imaging system is one that encompasses the most visible targets or the greatest area under the curve.

Image Display/Processing

Conceptually, an image is a two- (or sometimes three-) dimensional continuous distribution of a variable such as intensity or brightness. Each point in the image is an intensity value; for a color image, it may be a vector of three values that represent the primary colors red, green, and blue. An image includes a maximum intensity and a minimum intensity and hence is bounded by finite intensity limits as well as by specific spatial limits. For many decades, medical images were captured on photographic film, which provided a virtually continuous image limited only by the discreteness of image noise and film granularity. Today, however, many if not most medical images are generated by computer-based methods that yield digital images composed of a finite number of numerical values of intensity. A two-dimensional medical image may be composed of J rows of K elements, where each element is referred to as a picture element or pixel, and a three-dimensional image may consist of L slices,
where each slice contains J rows of K elements. The three-dimensional image is made up of volume elements or voxels; each voxel has an area of one pixel and a depth equal to the slice thickness. The size of pixels is usually chosen to preserve the desired level of spatial resolution in the image. Pixels are almost invariably square, whereas the depth of a voxel may not correspond to its width and length. Interpolation is frequently used to adjust the depth of a voxel to its width and length. Often pixels and voxels are referred to collectively as image elements or elements.

Digital images are usually stored in a computer so that each element is assigned to a unique location in computer memory. The elements of the image are usually stored sequentially, starting with elements in a row, then elements in the next row, etc., until all of the elements in a slice are stored; then the process is repeated for elements in the next slice. There is a number for each element that represents the intensity or brightness at the corresponding point in the image. Usually, this number is constrained to lie within a specific range starting at 0 and increasing to 255 [8 bit (1 byte) number], 65,535 [16 bit (2 byte) number], or even 4,294,967,295 [32 bit (4 byte) number]. To conserve computer memory, many medical images employ an 8-bit intensity number. This decision may require scaling the intensity values, so that they are mapped over the available 256 (0–255) numbers within the 8-bit range. Scaling is achieved by multiplying the intensity value Iᵢ in each pixel by 255/Imax to yield an adjusted intensity value to be stored at the pixel location. As computer capacity has grown and memory costs have decreased, the need for scaling to decrease memory storage has become less important and is significant today only when large numbers of high-resolution images (e.g., X-ray planar images) must be stored.

To display a stored image, the intensity value for each pixel in computer memory is converted to a voltage that is used to control the brightness at the corresponding location on the display screen. The intensity may be linearly related to the voltage. However, the relationship between voltage and screen brightness is a function of the display system and usually is not linear. Further, it may be desirable to alter the voltage–brightness relationship to accentuate or suppress certain features in the image. This can be done by using a lookup table to adjust voltage values for the shades of brightness desired in the image, that is, voltages that correspond to intensity values in computer memory are adjusted by using a lookup table to other voltage values that yield the desired distribution of image brightness. If a number of lookup tables are available, the user can select the table desired to illustrate specific features in the image. Examples of brightness–voltage curves obtained from different lookup tables are shown in Fig. 11. The human eye can distinguish brightness differences of about 2%. Consequently, a greater absolute difference in brightness is required to distinguish bright areas in an image compared with dimmer areas. To compensate for this limitation, the voltage applied to the display screen may be modified by the factor e^(kV) to yield an adjusted brightness that increases with the voltage V. The constant k can be chosen to provide the desired contrast in the displayed images.
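The display chain just described can be sketched in a few lines of Python. The image values, the contrast constant k, and the exponential lookup table below are assumptions chosen only to illustrate the scaling and table-lookup steps; they are not prescribed by the text.

import numpy as np

raw = np.array([[120., 480., 950.],
                [ 60., 300., 1023.]])          # stored intensities, arbitrary units (assumed)

# Scale onto the 8-bit range 0-255 by multiplying by 255/Imax
scaled = np.round(raw * 255.0 / raw.max()).astype(np.uint8)

k = 2.0                                        # display contrast constant, assumed
# 256-entry lookup table standing in for an e^(kV)-type brightness adjustment
lut = (np.exp(k * np.linspace(0, 1, 256)) - 1) / (np.exp(k) - 1)
brightness = lut[scaled]                       # table lookup for every pixel

print(scaled)
print(np.round(brightness, 2))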
Figure 11. (a) A linear display mapping; (b) a nonlinear display to increase contrast (from Ref. 6, with permission).
Image Processing

Often pixel data are mapped onto a display system in the manner described before. Sometimes, however, it is desirable to distort the mapping to accentuate certain features of the image. This process, termed image processing, can be used to smooth images by reducing their noisiness, accentuate detail by sharpening edges in the image, and enhance contrast in selected regions to reveal features of interest. A few techniques are discussed here as examples of image processing.

In many images, the large majority of pixels have intensity values that are clumped closely together, as illustrated in Fig. 12a. Mapping these values onto the display, either directly or in modified form as described earlier, is inefficient because there are few bright or dark pixels to be displayed. The process of histogram equalization improves the efficiency by expanding the contrast range within which most pixels fall, and by compressing the range at the bright and dark ends where few pixels have intensity values. This method can make subtle differences in intensity values among pixels more visible. It is useful when the pixels at the upper and lower ends of the intensity range are not important. The process of histogram equalization is illustrated in Fig. 12b. Histogram equalization is also applicable when the pixels are clumped at the high or low end of intensity values.

All images contain noise as an intrinsic product of the imaging process. Features of an image can be obscured by noise, and reducing the noise is sometimes desired to make such features visible. Image noise can be reduced by image smoothing (summing or averaging intensity values) across adjacent pixels. The selection and weighting of pixels for averaging can be varied among several patterns;
Figure 12. (a) Representative image histogram; (b) intensity-equalized histogram.
representative techniques are included here as examples of image smoothing. In the portrayal of a pixel and its neighbors shown in Fig. 13, the intensity value of the pixel (j, k) can be replaced by the average intensity of the pixel and its nearest neighbors (6). This method is then repeated for each pixel in the image. The nearest neighbor approach is a ''filter'' to reduce noise and yield a smoothed image. The nearest neighbor approach is not confined to a set number of pixels to arrive at an average pixel value; for example, the array of pixels shown in Fig. 13 could be reduced from
j − 1, k + 1    j, k + 1    j + 1, k + 1
j − 1, k        j, k        j + 1, k
j − 1, k − 1    j, k − 1    j + 1, k − 1

Figure 13. The nearest neighbor approach to image smoothing.
9 to 5 pixels or increased from 9 to 25 pixels, in arriving at an average intensity value for the central pixel. An averaging of pixel values, in which all of the pixels are averaged by using the same weighting to yield a filtered image, is a simple approach to image smoothing. The averaging process can be modified so that the intensity values of some pixels are weighted more heavily than others. Weighted filters usually emphasize the intensity value of the central pixel (the one whose value will be replaced by the averaged value) and give reduced weight to the surrounding pixels in arriving at a weighted average.

An almost universal rule of image smoothing is that when smoothing is employed, noise decreases, but unsharpness increases (i.e., edges are blurred). In general, greater smoothing of an image to reduce noise leads to greater blurring of edges as a result of increased unsharpness. Work on image processing often is directed at achieving an optimum balance between increased image smoothing and increased image unsharpness to reveal features of interest in specific types of medical images.

The image filtering techniques described before are examples of linear filtering. Many other image-smoothing routines are available, including those that employ ''nonlinear'' methods to reduce noise. One example of a nonlinear filter is replacement of a pixel intensity by the median value, rather than the average intensity, in a surrounding array of pixels. This filter removes isolated noise spikes and speckle from an image and can help maintain sharp boundaries. Images can also be smoothed in frequency space rather than in real space, often at greater speed.

Image Restoration. Image restoration is a term that refers to techniques to remove or reduce image blurring, so that the image is ''restored'' to a sharper condition that is more representative of the object. This technique is performed in frequency space by using Fourier transforms for both the image and the point-spread function of the imaging system. The technique is expressed as

O(j, k) = I(j, k)/P(j, k),   (24)
where I (j,k) is the Fourier transform of the image, P (j,k) is the Fourier transform of the point-spread function, and O (j,k) is the Fourier transform of the object (in three-dimensional space, a third spatial dimension l would be involved). This method implies that the unsharpness (blurring) characteristics of the imaging device can be removed by image processing after the image has been formed. Although many investigators have pursued image restoration with considerable enthusiasm, interest has waned in recent years because two significant limitations of the method have surfaced (6). The first is that the Fourier transform P (j,k) can be zero for certain values of (j,k), leading to an undetermined value for O (j,k). The second is that image noise is amplified greatly by the restoration process and often so overwhelms the imaging data that the restored image is useless. Although methods have been developed to reduce these limitations,
the conclusion of most attempts to restore medical images is that it is preferable to collect medical images at high resolution, even if sensitivity is compromised, rather than to collect the images at higher sensitivity and lower resolution and then try to use image-restoration techniques to recover image resolution.

Image Enhancement. The human eye and brain act to interpret a visual image principally in terms of boundaries that are presented as steep gradients in image brightness between two adjacent regions. If an image is processed to enhance these boundary (edge) gradients, then image detail may be more visible to the observer. Edge-enhancement algorithms function by disproportionately increasing the high-frequency components of the image. This approach also tends to enhance image noise, so that edge-enhancement algorithms are often used together with an image-smoothing filter to suppress noise.

CONCLUSION

The use of images to detect, diagnose, and treat human illness and injury has been a collaboration among physicists, engineers, and physicians since the discovery of X rays in 1895 and the first applications of X-ray images to medicine before the turn of the twentieth century. The dramatic expansion of medical imaging during the past century and the ubiquitous character of imaging in all of medicine today have strengthened the linkage connecting physics, engineering, and medical imaging. This bond is sure to grow even stronger as imaging develops as a tool for probing the cellular, molecular, and genetic nature of disease and disability during the first few years of the twenty-first century. Medical imaging offers innumerable challenges and opportunities to young physicists and engineers interested in applying their knowledge and insight to improving the human condition.

ABBREVIATIONS AND ACRONYMS

DQE    detective quantum efficiency
ECT    emission computed tomography
ESF    edge-spread function
fMRI   functional magnetic resonance imaging
GI     gastrointestinal
JND    just noticeable differences
LSF    line-spread function
MRI    magnetic resonance imaging
MTF    modulation transfer function
PET    positron emission tomography
PSF    point-spread function
SNR    signal-to-noise ratio
SPECT  single photon emission computed tomography
BIBLIOGRAPHY

1. W. R. Hendee, Rev. Mod. Phys. 71(2), Centenary, S444–S450 (1999).
2. R. N. Beck, in W. Hendee and J. Trueblood, eds., Digital Imaging, Medical Physics, Madison, WI, 1993, pp. 643–665.
3. J. H. Thrall, Diagn. Imaging (Dec.), 23–27 (1997).
4. W. R. Hendee, in W. Hendee and J. Trueblood, eds., Digital Imaging, Medical Physics, Madison, WI, 1993, pp. 195–212.
5. P. F. Sharp and R. Philips, in W. R. Hendee and P. N. T. Wells, eds., The Perception of Visual Information, Springer, NY, 1997, pp. 1–32.
6. B. H. Brown et al., Medical Physics and Biomedical Engineering, Institute of Physics Publishing, Philadelphia, 1999.
7. S. S. Stevens, in W. A. Rosenblith, ed., Sensory Communication, MIT Press, Cambridge, MA, 1961, pp. 1–33.
8. J. A. Rowlands, in W. R. Hendee, ed., Biomedical Uses of Radiation, Wiley-VCH, Weinheim, Germany, 1999, pp. 97–173.
9. A. B. Wolbarst, Physics of Radiology, Appleton and Lange, Norwalk, CT, 1993.
10. A. Rose, Vision: Human and Electronic, Plenum Publishing, NY, 1973.
11. B. H. Hasegawa, The Physics of Medical X-Ray Imaging, 2nd ed., Medical Physics, Madison, WI, 1991, p. 127.
12. W. R. Hendee, in C. E. Putman and C. E. Ravin, eds., Textbook of Diagnostic Imaging, 2nd ed., W. B. Saunders, Philadelphia, 1994, pp. 1–97.
13. A. Rose, Proc. Inst. Radioengineers 30, 293–300 (1942).
IMAGING SCIENCE IN METEOROLOGY

ROBERT M. RAUBER
LARRY DI GIROLAMO
University of Illinois at Urbana-Champaign
Urbana, IL
INTRODUCTION

Meteorologists draw on a wealth of data from measurements and numerical weather prediction models to understand the atmosphere and forecast its evolution. A dramatic increase in the type and quantity of data available to meteorologists during the latter half of the twentieth century has required worldwide computing resources to manage efficient data processing, dissemination, and storage. Imaging and animation have become essential tools of meteorologists as they attempt to assess the current state of the atmosphere and forecast its future behavior. Meteorological measurements of atmospheric properties such as temperature, pressure, winds and moisture content are made by weather stations in towns and airports worldwide at regular time intervals, typically every 1–3 hours. Some of these stations use balloon-borne instruments to measure vertical profiles of atmospheric properties every 12 hours. Few observations exist in many parts of the world, particularly over oceans, deserts, and mountainous regions. Even in populated areas, the distances between stations are large enough that significant weather phenomena such as thunderstorms may not be reported. Therefore, satellite and radar imagery are key sources of information used by meteorologists to detect and analyze these weather phenomena. The high temporal and spatial resolution of these images permits meteorologists to monitor the progress of weather systems, analyze
the structure and evolution of storms, and determine where dangerous conditions are likely to occur. The data used to construct these images are also used to initialize numerical forecast models and to determine the quality of forecasts. The first successful meteorological satellite to acquire images of the earth was the Television and Infrared Observational Satellite 1 (TIROS 1). TIROS 1 was launched into a 48° inclination orbit on 1 April, 1960. The first image acquired by TIROS 1 is shown in Fig. 1. Although its 79-day lifetime was brief and the technology crude by today’s standards, the acquired images demonstrated the powerful utility that satellite imagery would bring to the field of meteorology. Many meteorological satellites followed, and on 7 December, 1966, a landmark event in satellite meteorology took place when the Applications Technology Satellite 1 (ATS 1) was launched. ATS 1 was the first meteorological satellite placed into geostationary orbit. From this vantage point, images of nearly an entire hemisphere can be acquired at high enough temporal resolution to make animation possible. For the first time, meteorologists could visually monitor large weather disturbances such as cyclones and hurricanes. Today, animations routinely appear on television weather broadcasts. By the end of the 1960s, information acquired by satellites was also used quantitatively in numerical weather prediction models for both initialization and verification. Radar has been used for meteorological studies since World War II. Between the 1940s and 1970s, meteorologists routinely examined radar data on cathoderay-tube displays where image brightness was a measure of echo strength and likely precipitation intensity. The
Figure 1. The first TIROS-1 image of earth (courtesy of the National Oceanic and Atmospheric Administration).
development of scanning Doppler radar in the early 1970s, which provides information on wind fields, was quickly followed by the development of color images of radar data. The first displays were developed in 1974 by the Air Force Geophysics Laboratory, the Raytheon Corporation, and the National Center for Atmospheric Research. The use of color by researchers progressed slowly in the 1970s, limited primarily by computing power, but accelerated in the 1980s as computer technology advanced. In 1979, several U.S. government agencies collaborated to establish a network of Doppler radars in the United States. The network was designed and tested in the 1980s and finally deployed in the 1990s. Rapid data processing and dissemination of data from this radar network via the World Wide Web now make it possible for meteorologists and the public to obtain radar images in near real time. Meteorologists require images of satellite and radar data at both fine and coarse resolution to examine the many scales of motion in the atmosphere. Consider, for example, the midlatitude cyclonic storm shown in Fig. 2. A cyclone’s cloud pattern typically takes the shape of a comma. Cyclones can influence weather across an area roughly two-thirds the size of the United States. In Fig. 2, a 1,500-kilometer long line of showers and thunderstorms is present along the ‘‘tail’’ of the comma cloud. Individual thunderstorms within this line are typically about 15 kilometers in diameter. One or more of these thunderstorms may be rotating, and the scale of rotation is typically about 1–5 kilometers. A tornado whose diameter ranges from 0.1–1 kilometer may develop within one of the rotating thunderstorms. A forecaster
(Annotations on the image mark the warm front near the middle cloud boundary, the surface cold front at the position of the low cloud boundary, and the upper-level front at the west edge of the high clouds.)
Figure 2. Infrared satellite image of a cyclone over the central United States on 10 November 1998. The shading is indicative of the temperature of the cloud tops and ground surfaces viewed by the satellite. The blue enhanced regions denote the coldest cloud tops, light gray shades are warm-topped clouds, and dark gray shades are the earth’s surface. See color insert.
responsible for issuing weather forecasts and severe weather warnings must quickly assimilate, interpret, and update information about all of these scales of motion as the cyclone evolves. To facilitate this process, meteorologists require that satellite and radar data be easily remapped into various projections, marked with geographical features such as state and county boundaries, overlaid with other meteorological data, and animated to observe the growth or decay of various features within the weather systems.

SATELLITE IMAGERY IN METEOROLOGY

Satellite Measurements

Meteorological satellites measure electromagnetic radiation, usually in several spectral bands. Virtually all of the radiation measured by satellites originates from the sun or the earth. About 95% of the radiative energy emitted by the sun lies between the wavelengths of 0.3 µm and 2.4 µm, which includes the ultraviolet, visible and near-infrared portion of the electromagnetic spectrum. The sun's peak radiative output occurs around 0.46 µm in the visible portion of the spectrum. The solar radiation that reaches earth can be scattered or absorbed by atmospheric constituents (air molecules, aerosols, and clouds) and the earth's surface. Some of the scattered solar radiation makes its way back to space, where it can be measured by satellites. The earth's surface and atmosphere also emit radiation, primarily in the infrared and microwave portion of the spectrum. The earth's peak radiative output occurs at around 10 µm in the infrared portion of the spectrum. Some of the radiation emitted by the earth makes its way to space where it can be measured by satellites. Satellite infrared imagery is often gray-scale inverted so that cold clouds, which emit smaller amounts of infrared radiation, appear white and warmer surface features, which emit larger amounts of radiation, appear dark.

To interpret satellite-measured radiation properly, basic knowledge of the transmission properties of the earth's atmosphere is required. Figure 3 shows the vertical transmission properties of the earth's cloud- and aerosol-free atmosphere as a function of the wavelength of the electromagnetic radiation. Note that there are several spectral regions where the transmittance is large. These regions are called ''windows'' because the atmosphere is nearly transparent to radiation at these wavelengths. Two window regions are commonly exploited by meteorologists. The first is the visible window region that lies between 0.4 µm and 0.7 µm. In this region, a small reduction in the transmittance from unity is caused primarily by scattering of visible light by atmospheric gases. On a clear day, most of the visible sunlight that reaches the top of the earth's atmosphere is transmitted to the earth's surface. Some of the transmitted sunlight is then scattered by the surface back to space, where it can be measured by satellites and analyzed by using ''visible-channel'' imagery. The second window region of interest lies in the infrared portion of the spectrum between 10 µm and 12 µm. In this spectral region, a small reduction of transmittance from unity is caused primarily by water
vapor absorption. The peak emission of the earth occurs at around 10 µm, so that on a clear day, a satellite that has an infrared channel in this window region can measure radiation primarily emitted by the earth’s surface. This measurement can be considered a coarse measure of the earth’s surface temperature because the intensity of radiation emitted by a surface is a function of temperature as described by Planck’s function. In practice, direct inversion of Planck’s function is not used to retrieve surface temperature because a correction for the small reduction in atmospheric transmittance and surface emittance is needed to retrieve the accurate surface temperatures required by meteorologists. One meteorological feature that truly stands out in satellite imagery is clouds. Clouds scatter radiation very efficiently in the visible portion of the spectrum, and only a small amount is absorbed. As a result, clouds appear bright in visible imagery. How bright a cloud appears depends largely on the amount, type, and spatial distribution of the particles that make up the cloud. As a rule of thumb, thick clouds appear brighter than thin clouds. Clouds become very efficient in absorbing radiation in the infrared window and not very efficient in scattering radiation. Thus, clouds absorb much of the infrared radiation emitted by the earth’s surface in the window region. However, according to Kirchoff’s law, good absorbers of radiation at a particular wavelength are also good emitters of radiation at that wavelength. The amount of radiation emitted decreases with decreasing temperature, according to the Planck function. Thus, in infrared imagery, clouds that are at a higher altitude emit less radiation to space than lower clouds because the average temperature in the lower atmosphere, where nearly all clouds occur, decreases with increasing altitude. As an example, Figs. 4a and b show visible (0.67 µm) and infrared (11 µm) images taken by the Advanced Very High Resolution Radiometer (AVHRR) onboard the National Oceanic and Atmospheric Administration (NOAA)-11 satellite on 6 June, 1992 over the Bering Strait. Several important features stand out in these images. Based on the infrared image, the landmasses of Alaska and the Chukchi Peninsula appear much warmer than the surrounding waters. In the visible image, ice covers much of the waters north of the Bering Strait (D) and in Kotzebue Sound (E). In the visible image, open waters appear dark (F) compared to ice-covered waters (D, E). However, in
Figure 3. The vertical transmittance of the earth’s atmosphere between 0.2 µm and 15 µm. The transmittance was calculated using a radiative transfer model at a spectral resolution of 5 cm−1 . The effects of aerosols and clouds have not been included in the calculations.
the infrared, points D, E, and F show very little contrast, indicating little difference in surface temperature between open and ice-covered waters around the strait. The waters to the south are warmer, which can be determined by contrasting point F and C. Snow-covered mountaintops stand out in both visible and infrared images (e.g., H). Roughly one-third of the AVHRR scene in Fig. 4 is covered with clouds. In the visible image, the clouds at points A and B are brighter than the waters at point C. More information can be ascertained in the infrared. The temperature of clouds at A is lower than that of clouds at B, indicating that the cloud tops at A are at a higher altitude than the cloud tops at B. Clouds are more difficult to identify over ice. There is little contrast in both the visible and infrared images between the low-altitude cloud at point G and the surrounding ice field. However, higher altitude clouds over the ice tend to be easier to identify in the infrared image. Overall, by studying infrared and visible imagery in tandem, meteorologists can characterize the scene better than by studying visible or infrared imagery alone. In addition to visual inspection of satellite spectral imagery, remote sensing techniques exist that use measurements from a number of spectral channels to infer quantitatively a wide variety of properties about the earth’s surface and atmosphere, including surface temperature, vertical temperature and humidity profiles, cloud liquid water content, and surface and cloud albedo. Operational Imaging Satellites Satellite imaging instruments used in the daily operations of weather services around the world are called operational satellites. They are found in two types of orbits: geostationary and low earth orbits. A geostationary orbit is a circular orbit whose angular velocity is the same as the earth’s and lies in the earth’s equatorial plane. Consequently, the satellite remains essentially motionless relative to a point on the earth’s equator. A geostationary orbit must be about 36,000 km above the earth’s surface for the satellite to maintain the same angular velocity as that of the earth. A low earth orbit lies between several hundred and several thousand kilometers in altitude. Many meteorological satellites in a low earth orbit are in a near-polar orbit that is sun-synchronous. A satellite in a sun-synchronous orbit crosses the equator at the same local time every day.
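The quoted altitude of about 36,000 km follows directly from Kepler's third law when the orbital period is set to one sidereal day. The short Python check below is illustrative only; the constants are standard values, and no operational orbit-determination procedure is implied.

```python
import math

MU_EARTH = 3.986004418e14      # standard gravitational parameter of the earth, m^3 s^-2
SIDEREAL_DAY = 86164.1         # s, one rotation of the earth relative to the stars
EQUATORIAL_RADIUS = 6.378e6    # m

# Kepler's third law: a^3 = mu * T^2 / (4 * pi^2)
semi_major_axis = (MU_EARTH * SIDEREAL_DAY**2 / (4.0 * math.pi**2)) ** (1.0 / 3.0)
altitude_km = (semi_major_axis - EQUATORIAL_RADIUS) / 1000.0
print(round(altitude_km))      # ~35,786 km, i.e., about 36,000 km above the equator
```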
Figure 4. (a) A visible (0.67 µm) image and (b) an infrared (11 µm) image of the Bering Strait between Alaska and Siberia taken from the AVHRR instrument on board the NOAA-11 satellite platform on 6 June 1992. The letter labels are used as reference in the text.
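The dependence of emitted infrared radiance on temperature that underlies this interpretation is Planck's function. The sketch below uses representative (not measured) temperatures to show why a cold cloud top at 11 µm appears much darker than the warm surface, and how a measured radiance can be inverted to an equivalent blackbody (brightness) temperature; it ignores the atmospheric and emittance corrections mentioned above.

```python
import math

H = 6.626e-34   # Planck constant, J s
C = 2.998e8     # speed of light, m/s
K = 1.381e-23   # Boltzmann constant, J/K

def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance B(lambda, T) in W m^-2 sr^-1 m^-1."""
    return (2.0 * H * C**2 / wavelength_m**5) / (
        math.exp(H * C / (wavelength_m * K * temp_k)) - 1.0)

def brightness_temperature(wavelength_m, radiance):
    """Invert Planck's function to get the equivalent blackbody temperature."""
    return (H * C / (wavelength_m * K)) / math.log(
        1.0 + 2.0 * H * C**2 / (wavelength_m**5 * radiance))

lam = 11e-6                               # 11-um infrared window channel
for t in (290.0, 220.0):                  # assumed warm surface vs. cold cloud top
    b = planck_radiance(lam, t)
    print(t, round(b * 1e-6, 1), round(brightness_temperature(lam, b), 1))
# The 220-K cloud top emits roughly a quarter of the 11-um radiance of the 290-K surface.
```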
Several countries around the globe maintain geostationary and low earth orbiting imaging meteorological satellites. The United States operates the Geostationary Operational Environmental Satellite (GOES) imager and the Advanced Very High Resolution Radiometer (AVHRR)
in the NOAA series of polar low earth orbiting satellites. These imagers are typical of satellite imagers maintained worldwide, such as the instruments on the Meteorological Satellite (METEOSAT), operated by a European consortium, and the Geostationary Meteorological Satellite (GMS) operated by Japan. The first of the GOES series of satellites was placed in orbit in 1976. Ten GOES satellites have been operational through the year 2000. The latest, GOES-10, carries five spectral channels centered on 0.65, 3.9, 6.7, 10.7, and 12 µm. Table 1 summarizes the GOES-10 imager characteristics. At any given time, the United States operates two GOES satellites, GOES-East (currently GOES-8) and GOES-West (currently GOES-10), located around 75° W and 135° W, respectively. Most of the Western Hemisphere can be observed from these two locations. For example, Figures 5a, b, and c show the 0.65, 6.7, and 10.7 µm full disk images from GOES-10 taken on 13 August 1999. These three spectral channels are the channels most commonly exploited in constructing meteorological satellite imagery. The 0.65-µm (visible) imagery gives a measure of the visible reflectivity of objects in the image. Ocean observations, outside of sun-glint, tend to have the lowest reflectivity, and clouds and snow-covered surfaces tend to have the highest reflectivity. The 10.7 µm window (infrared) imagery gives a measure of surface and cloud-top temperatures; clouds are normally colder than the earth's surface. The 6.7 µm (infrared) channel is in a spectral region of strong water vapor absorption (see Fig. 3). As a result, no surface features can be seen in the 6.7-µm imagery. Instead, the features observed are humidity and clouds, typically in the range of 2–18 km in altitude. Because water vapor is ubiquitous but variable in concentration in the earth's atmosphere, animation of water vapor imagery allows meteorologists to monitor the atmospheric circulation in both clear and cloudy conditions. The two infrared channels provide useful imagery during day and night, whereas visible imagery is useful only during the day. Figure 5 was taken around local noon of the subsatellite point, a time when the sun is illuminating the entire scene viewed by GOES-10. The images show two hurricanes and a tropical depression at about 20° N latitude. The westernmost hurricane is Eugene, a weakening hurricane. Directly to the east of Eugene is Hurricane Dora, near its peak intensity. GOES-10 can acquire an image of the full earth's disk viewed by the satellite every 25 minutes. Full disk imaging can be halted at any time to perform a rapid-scan image, whereby a small section of the earth can be imaged at higher temporal resolution. A full disk image from GOES is not a complete view of the hemisphere. Latitudes up to only 72.6° can be observed by GOES because the earth subtends a half angle of 17.4° at the geostationary altitude. Thus, polar orbiting instruments, like the AVHRR, are needed to monitor meteorological conditions at polar latitudes. The first AVHRR series of imagers was on board TIROS-N, launched in 1978. All of the NOAA series of satellites that followed carried AVHRR imagers. The characteristics of the most recent AVHRR on board NOAA-15 are shown in
Table 1. Specifications of the GOES-10 Imager

Instrument parameter                           Channel 1   Channel 2   Channel 3   Channel 4   Channel 5
Spectral channels (half-power points in µm)    0.55–0.75   3.8–4.0     6.5–7.0     10.2–11.2   11.5–12.5
Instantaneous field of view (µrad)             28          112         224         112         112
Ground instantaneous field of view at nadir    1 km        4 km        8 km        4 km        4 km

Scanning rate: 500 × 500-km region, 20 s; 3,000 × 3,000-km region, 3.1 min; full disk, 25 min
Instrument mass: 120 kg
Instrument size: sensor, 115 × 80 × 75 cm; electronics, 67 × 43 × 19 cm; power supply, 29 × 20 × 16 cm
Power consumption: 119 W
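As a quick check on the tabulated values, the ground instantaneous field of view at nadir is simply the angular field of view multiplied by the distance to the satellite. The small Python computation below, assuming the nominal geostationary altitude and a small-angle approximation, reproduces the 1-, 4-, and 8-km figures.

```python
GEO_ALTITUDE_M = 35.786e6   # nominal geostationary altitude above the equator, m

for channel, ifov_urad in ((1, 28), (2, 112), (3, 224), (4, 112), (5, 112)):
    # small-angle approximation: ground size = angle (rad) x distance
    gifov_km = ifov_urad * 1e-6 * GEO_ALTITUDE_M / 1000.0
    print(channel, round(gifov_km, 1))    # -> 1.0, 4.0, 8.0, 4.0, 4.0 km at nadir
```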
Table 2. Specifications of the AVHRR on Board NOAA-15

Instantaneous field of view: 1.3 mrad
Ground instantaneous field of view (850-km orbit): 1.1 km at nadir; 2.3 × 6.4 km at the swath edge
Number of samples across scan line: 2,048
Maximum scan angle: ±55.3° from nadir
Swath width: 3,000 km
Spectral channels (half-power points in µm): channel 1, 0.58–0.68; channel 2, 0.725–1.00; channel 3A, 1.58–1.64; channel 3B, 3.55–3.93; channel 4, 10.3–11.3; channel 5, 11.5–12.5
Instrument mass: 30 kg
Instrument size: 27 × 37 × 79 cm
Power consumption: 29 W
Table 2. AVHRR is a scanning radiometer that sweeps out a swath approximately 3,000 km wide. Thus, from a given satellite pass, the AVHRR views only a small fraction of the earth. Because NOAA-15 orbits the earth about 14 times per day, a different portion of the polar region is viewed about every 100 minutes as the earth rotates under the satellite’s orbit. Although this information is much less frequent and covers a much smaller spatial extent than information from GOES, the information is nonetheless crucial for monitoring the weather in polar regions. Examples of Meteorological Satellite Image Interpretation Cloud Classification. Meteorologists use a standard scheme for classifying clouds that has remained nearly unchanged for more than a century. In this classification,
clouds that have vertical development are called cumulus; layer clouds, stratus; high clouds, cirrus; and precipitating clouds, nimbus. The terms are combined to characterize particular cloud types, such as stratocumulus (a layer of clouds that have vertical development) and cumulonimbus (a thunderstorm), and further refined to take into account a cloud's altitude and shape. The satellite meteorologist attempts to identify cloud types because the clouds' type, location, and organization can be related to the type of weather that occurs in their vicinity. An experienced meteorologist can use satellite imagery to identify different cloud types. For example, Fig. 4 shows a wide variety of cloud types belonging to the high and low cloud categories. Point A is a cirrostratus cloud, and points B and G are stratus clouds. Stratocumulus clouds lie to the west of point C, and cumulus clouds tend to dominate the Alaskan landmass to the east and south of Norton Sound. Figure 5 shows several examples where the clouds' type and organization can be related to the type of weather occurring in their vicinity. The obvious example is Hurricane Dora, where cumulonimbus clouds make up the hurricane's center and the surrounding cumulus shows the hurricane's spiral flow. Figure 5 also shows a long, narrow band of clouds whose cirrus tops in the far southern Pacific are associated with a cold front. The marine stratocumulus clouds off the coast of Baja California indicate two regions of large-scale descent and ascent of air. This can be better seen in Figure 6, which is a higher resolution image of the marine stratocumulus region indicated in the boxed region on Fig. 5a. To the right of the image is a region of open-cell convection, characterized by cells whose center is clear and edges are cloudy. To the left of the image, closed-cell convection dominates. Clouds in this region fill the center of the cell, and the edges of the cell are clear. Open-cell convection occurs mainly in regions of large-scale descending motion, where heating from the surface drives the convection; closed-cell convection occurs primarily in regions of large-scale ascending motion, where infrared cooling of the cloud tops drives the convection.
Figure 5. GOES-10 full disk images for (a) the 0.65 µm channel, (b) the 6.7 µm channel, and (c) the 10.7 µm channel taken on 13 August 1999 at 14:00 Greenwich mean time. The boxed region in (a) is enlarged in Fig. 6 (courtesy of Mike Garay and Roger Davies, Department of Atmospheric Sciences, University of Arizona, Tucson, AZ).
Fronts. A front is the boundary between two air masses that have distinctly different thermal or moisture properties. When cold (typically dry) air is advancing into a region of warm (typically moist) air, the boundary between the two air masses is called a cold front. Similarly, when warm air is advancing into a region of colder air, the boundary between the two air masses is called a warm front. If the boundary between the two air masses is not moving, the front is called a stationary front. Lifting of air is the driving mechanism in cloud and precipitation formation along fronts. As cold air advances toward a warmer air mass, the warm air, which is less dense than the cold air, is forced upward. Similarly, when
warm air advances toward a cold air mass, the warm air will rise over the colder air. As the warm air rises, it cools by expansion. If the warm air is sufficiently moist and enough cooling takes place for the air to reach saturation, clouds form. The cloud types that form along fronts depend on the physical properties of the air masses and the amount of lifting that takes place. Fortunately, some generalizations of cloud types associated with warm and cold fronts can be made, which helps meteorologists to identify fronts on satellite images. Cold fronts tend to produce stronger lifting over a narrower distance than warm fronts. As a result, the clouds along a cold front tend to be cumuliform such as
Figure 6. GOES-10 high-resolution visible-channel image of the marine stratocumulus field off the coast of Baja California shown in the box in Fig. 5a.
Figure 7. GOES-8 visible-channel image of the northeast coast of North America taken on 23 February 1999.
cumulus, stratocumulus, and cumulonimbus. If the clouds grow tall enough, strong winds aloft may carry ice crystals at the top of the clouds ahead of the front, producing a broad shield of cirrus clouds. Cold fronts can sometimes be identified in satellite imagery by a long, narrow band of clouds, where the rear edge of the cloud band is very sharp. For example, Fig. 5 prominently shows a long, narrow band of clouds in the far southern Pacific that is associated with a cold front. Figure 2 shows a special kind of cold front associated with the ‘‘comma cloud’’ of the cyclone, a ‘‘split cold front.’’ A split cold front occurs when dry air aloft overruns the warm, moist air ahead of the surface cold front. Under these conditions, deep clouds develop at the leading edge of the advancing dry air aloft (the upper level cold front), and shallow clouds occur below the dry air and ahead of the surface cold front. Infrared satellite images can clearly distinguish the adjacent bands of clouds at the two altitudes. Severe thunderstorms are often found just ahead of the upper level front. Warm fronts tend to be associated with gentle lifting across a wide area. As a result, the clouds along a warm front tend to be stratiform. Far ahead of the warm front, often out to 1000 km, cirrus and cirrostratus clouds form. Closer to the warm front, deeper clouds typically produce a wide area of precipitation that is generally less intense than the precipitation along a cold front. Figure 2, which shows an example of a cloud shield north of the warm front, illustrates a basic difficulty in frontal analysis using satellite data. In this image, the position of the surface warm front is mostly hidden below a deck of cirrus clouds (indicated by the blue enhancement). The lower cloud decks associated with air rising over the warm front can be seen only across Pennsylvania and New York, east of the high cirrus cloud deck. Air-mass Modification. Air masses can be roughly categorized as polar (cold), tropical (warm), continental
(dry), and maritime (humid). Air masses can cover several million square kilometers because the source regions (the underlying continents and oceans) are very large. As an air mass moves away from its source region across a surface that has different thermal or moisture properties, airmass modification takes place. Figure 7 is an example of air-mass modification easily identified in a satellite image. A high-pressure center lies north of Lake Ontario, forcing continental polar air, which originated over northern Canada, southeastward toward the Atlantic coast and over the ocean. When this air mass moves over the warm waters of the Gulf Stream, the lowest layer of the air mass quickly warms, moistens, and destabilizes. Warm moist updrafts develop, producing the cloud bands seen in the image. Cold, dry downdrafts between the cloud bands produce gusty winds at the surface, making the seas choppy and potentially hazardous. Note how the air blowing off the coast of Maine and New Brunswick must cross the Bay of Fundy and Nova Scotia before reaching the Atlantic. As a result, the stability of the air is modified more than once enroute to the Atlantic, as can be seen in the image. Animation. Images of nearly an entire hemisphere can be acquired from geostationary orbit at a high enough temporal resolution to make animation possible. Animated images are a valuable tool in meteorology because the life cycle of weather systems, from small cumulus clouds to large-scale cyclones, can be monitored. Winds at cloud level can be estimated from cloud motions. The large-scale horizontal motion of air can be monitored from water vapor imagery. Animations allow meteorologists to study the progress of significant weather events and make more reliable forecasts. Nowhere is this more important than in forecasting the movement of hurricanes. Hurricanes cause more damage in the United States than any other natural disaster. In terms of lives lost, the hurricane that hit Galveston, Texas, in 1900 was the largest natural disaster in United States history and left more than 6,000 dead.
Today, such disasters are mitigated by advance hurricane warnings to the public, allowing early preparation and evacuation. Satellite imagery is an important tool used to issue these warnings. Once such warnings became available, the number of lives lost to hurricanes in the United States dropped dramatically, despite rapid expansion of the population in coastal areas affected by hurricanes.
RADAR IMAGERY IN METEOROLOGY

Radar Measurements

Meteorological radars transmit short pulses of electromagnetic radiation at microwave or radio frequencies and detect energy backscattered toward the radar's antenna by scattering elements in the atmosphere. Radiation emitted by radar is scattered by water and ice particles, insects, other objects in the path of the beam, and refractive index heterogeneities associated with air density and humidity variations. The returned signal is the combination of radiation backscattered toward the radar by each scattering element within the volume illuminated by a radar pulse. This volume is determined from the angular beam width, the pulse duration, and the distance of the pulse from the radar, as it travels outward at the speed of light. Typically, the beam width is about 1° and the pulse duration 1 microsecond. At a range of 50 km, the scattering volume equals about 10^8 m^3. In moderate rain, this volume may contain more than 10^11 raindrops. The contributions of each scattering element in the volume add in phase to create the returned signal. The returned signal fluctuates from pulse to pulse as the scattering elements move. For this reason, the returned signals from many pulses are averaged to determine the average received power. Meteorologists use the average received power to estimate the intensity of precipitation. Doppler radar also measures the pulse-to-pulse phase change due to the average motion of the scattering elements along the radar beam and uses this to determine the wind speed toward or away from the radar. Meteorologists image and animate radar data to track and predict the movement of storms, identify dangerous weather situations, estimate winds and amounts of precipitation, and relate rainfall distributions to atmospheric phenomena such as fronts. The radar range equation for meteorological targets such as raindrops is given by Eq. (1),
$$P_r = \frac{P_t G^2 \lambda^2}{(4\pi)^3 R^4}\,V\eta, \qquad (1)$$
where Pr is the average received power, Pt the transmitted power, G the antenna gain, λ the wavelength of transmitted radiation, R the range, V the scattering volume, and η the reflectivity. The radar cross section σ of a spherical water or ice particle whose diameter D is small compared to the wavelength λ is given by the Rayleigh scattering law, Eq. (2),
$$\sigma = \frac{\pi^5}{\lambda^4}\,|K|^2 D^6, \qquad (2)$$
where |K|^2 is a dimensionless factor that depends on the dielectric properties of the particle; it is approximately equal to 0.93 for water and 0.18 for ice at radar wavelengths. The radar reflectivity η of clouds and precipitation is obtained by summing the cross sections of all the particles in the scattering volume and is written as Eq. (3),
$$\eta = \frac{\pi^5}{\lambda^4}\,|K|^2 Z. \qquad (3)$$
The radar reflectivity factor Z is defined by Eq. (4), where the summation is across the scattering volume:
$$Z = \frac{\sum_V D^6}{V}. \qquad (4)$$
It is customary to use meters cubed as the unit for volume and to measure the particle diameters in millimeters, so that Z has conventional units of mm^6/m^3. Typical values of the radar reflectivity factor range from 10^-5 to 10 mm^6/m^3 in nonprecipitating clouds, 10 to 10^6 mm^6/m^3 in rain, and as high as 10^7 mm^6/m^3 in large hail. Because of the sixth-power weighting on diameter in Eq. (4), raindrops dominate the returned signal in a mixture of rain and cloud droplets. The radar reflectivity factor Z is important because it is related to the raindrop size distribution within the scattering volume. In meteorological applications, the averaged returned power is measured and converted to values of Z using Eqs. (1)–(4). Because Z varies across orders of magnitude, a logarithmic scale, defined by Eq. (5),
$$\mathrm{dBZ} = 10\,\log_{10}\!\left(\frac{Z}{1~\mathrm{mm^6\,m^{-3}}}\right), \qquad (5)$$
is used to display the radar reflectivity factor. For example, an increase from 20 to 40 dBZ represents a hundredfold increase in the radar reflectivity factor. Radar images of weather systems commonly seen in the media (e.g., Fig. 8) show the radar reflectivity factor in logarithmic units. Images of the radar reflectivity factor overlain with regional maps permit meteorologists to determine the location and intensity of precipitation. Meteorologists often interchangeably use the terms ''radar reflectivity factor'' and ''reflectivity,'' although radar experts reserve the term ''reflectivity'' for η. A small change in the frequency of the returned signal, called the Doppler shift, occurs when scattering elements move toward or away from the radar with the ambient wind. Doppler radar determines this small frequency shift by measuring the pulse-to-pulse change in the phase of the returned signal. Doppler measurements provide an estimate of the along-beam, or radial, wind component, called the radial velocity. Images of the radial velocity across the full spatial domain of the radar allow meteorologists to estimate the direction and speed of the winds. When thunderstorms move across the radar viewing area, the average motion of the storms can be determined from animation of the radar reflectivity factor. This motion is often subtracted from the measured radial velocities to obtain the storm-relative radial velocity. Images of the storm-relative radial velocity are particularly useful in identifying rotation and strong winds that may indicate severe conditions.
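As a concrete illustration of Eqs. (4) and (5), the short Python sketch below computes Z for a hypothetical one-cubic-meter sample of raindrops and converts it to dBZ; the drop counts and sizes are invented for the example.

```python
import numpy as np

def reflectivity_factor(diameters_mm, volume_m3):
    """Radar reflectivity factor Z (mm^6/m^3) from the drop diameters (mm)
    found in a scattering volume (m^3), following Eq. (4)."""
    return np.sum(np.asarray(diameters_mm) ** 6) / volume_m3

def to_dbz(z):
    """Convert Z (mm^6/m^3) to logarithmic units, Eq. (5)."""
    return 10.0 * np.log10(z)

# A made-up 1-m^3 sample of moderate rain: many 1-mm drops plus a few 3-mm drops.
drops = [1.0] * 1000 + [3.0] * 10
z = reflectivity_factor(drops, 1.0)       # 1000*1 + 10*729 = 8290 mm^6/m^3
print(round(z), round(to_dbz(z), 1))      # ~8290 mm^6/m^3, ~39.2 dBZ

# An increase from 20 to 40 dBZ is a hundredfold increase in Z.
print(10 ** (40 / 10) / 10 ** (20 / 10))  # 100.0
```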
Figure 8. Composite radar images of the radar reflectivity factor (dBZ) from several radars in the midwestern United States during the 19 April 1996 tornado outbreak. Panel a is at 4 P.M. local time. Successive panels are each one hour later.
Images from individual radars are typically plotted in a map-like format called the ''plan-position indicator'', or PPI display; the radar is at the center, north is at the top, and east is at the right. Earth curvature and atmospheric refraction cause a radar beam to rise above the earth's surface as the beam propagates away from the radar. For this reason, distant radar echoes on a PPI display are at higher altitudes than those near the radar. Composite radar images constructed from several radars are projections of data in PPI format onto a single, larger map. For this reason, there is ambiguity concerning the altitude of the echoes on composite images. Sometimes, a radar beam is swept between the horizon and the zenith at a constant azimuth. In this case, data are plotted in a ''range-height indicator'' or RHI display, which allows the meteorologist to view a vertical cross section through a storm.
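The rise of the beam with range can be estimated with the 4/3-effective-earth-radius model commonly used in radar meteorology for standard refraction. The Python sketch below is only an illustrative calculation; the antenna height, elevation angle, and ranges are made-up inputs, not the parameters of any particular operational radar.

```python
import math

def beam_height_m(range_m, elev_deg, antenna_m=10.0, earth_radius_m=6.371e6):
    """Approximate height of the beam center above the radar site, using the
    4/3-effective-earth-radius model for standard atmospheric refraction.
    antenna_m is an assumed antenna height above ground."""
    re = (4.0 / 3.0) * earth_radius_m          # effective earth radius
    theta = math.radians(elev_deg)
    h = math.sqrt(range_m**2 + re**2 + 2.0 * range_m * re * math.sin(theta)) - re
    return h + antenna_m

# At 0.5 deg elevation, an echo 200 km away is already about 4 km above the radar.
for r_km in (50, 100, 200):
    print(r_km, round(beam_height_m(r_km * 1000.0, 0.5)))
```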
Radar Applications

Storm Tracking. Images and animations of the radar reflectivity factor allow meteorologists to observe storm development, determine whether storms are intensifying or weakening, and determine their speed and direction. Storm tracking allows forecasters to estimate the time of storm passage several hours in advance. Meteorologists animate images from single radars to determine the movement, intensity change, and dimensions of individual storms and storm complexes. Composite images from radar networks, combined with other meteorological data, permit meteorologists to determine how precipitation regions organize within their parent weather systems, which can extend over thousands of kilometers. Figure 8 shows a series of images that are composites of radar reflectivity data from radars located in the midwestern United States during a tornado outbreak on 19 April 1996. A line of storms can be seen developing near the western Illinois border and moving eastward into central Illinois. Thunderstorms appear on these radar images as isolated regions of high reflectivity or as small cores of high reflectivity embedded in widespread precipitation.
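One simple way to automate this kind of tracking is to estimate a bulk echo-motion vector by cross-correlating successive reflectivity images, the idea behind correlation-based nowcasting schemes. The sketch below uses invented arrays and a brute-force search over integer shifts, so it is only a conceptual illustration, not the procedure used by any operational system.

```python
import numpy as np

def echo_motion(frame0, frame1, max_shift=10):
    """Return the integer pixel shift (rows, cols) that best aligns frame0
    with frame1, judged by normalized cross-correlation. np.roll wraps at
    the edges, which is acceptable for this small illustration."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(frame0, dy, axis=0), dx, axis=1)
            a, b = shifted.ravel(), frame1.ravel()
            denom = np.linalg.norm(a) * np.linalg.norm(b)
            score = float(a @ b) / denom if denom > 0 else 0.0
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

# Toy example: a reflectivity blob that moves 3 pixels east between scans.
f0 = np.zeros((40, 40)); f0[18:22, 10:14] = 50.0
f1 = np.roll(f0, 3, axis=1)
print(echo_motion(f0, f1))   # -> (0, 3); scale by pixel size / scan interval for speed
```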
Figure 9. Radar images of (a) the radar reflectivity factor (dBZ) and (b) the storm-relative radial wind velocity (knots) for a supercell thunderstorm that produced a tornado near Jacksonville, IL on 19 April 1996. On the radial velocity display, red (green) colors represent air motion away from (toward) the radar. White lines are county boundaries. See color insert.

Severe Thunderstorm Detection. Severe thunderstorms can produce large hail, damaging straight-line winds, and
tornadoes. Radars, particularly those that have Doppler capability, allow meteorologists to issue timely warnings when severe thunderstorms threaten. Figures 9a and b are high-resolution images of the reflectivity and storm-relative radial velocity fields from a tornadic thunderstorm in the 19 April 1996 outbreak (see arrow in Fig. 8c) and illustrate the signature features of tornadic thunderstorms. Rotation in Fig. 9b appears as a tight couplet of adjacent strong inbound (green) and outbound (red) radial motions. Severe thunderstorms often develop a hook-like appendage in the reflectivity field (Fig. 9a). Tornadoes typically develop near the center of the hook. Animations of the reflectivity and radial velocity images, used to track hook positions and radial velocity couplets, allow meteorologists to estimate the arrival time of dangerous conditions and issue specific warnings. A second indicator of a storm's severity and potential destructiveness is the reflectivity intensity. Values of reflectivity that exceed 55 dBZ are often associated with hail. Large hail that can reach diameters of 1–6 cm appears on radar images as reflectivity approaching 70 dBZ.

Figure 10. Total precipitation through 6 A.M. local time on 29 September 1998, as measured by the National Weather Service radar at Mobile, Alabama during the landfall of Hurricane Georges.

Precipitation Measurement and Hydrology. The radar reflectivity factor Z is only a general indicator of precipitation intensity. An exact relationship between Z and the precipitation rate R does not exist. Research has shown that Z and R are approximately related by Eq. (6),
$$Z = aR^b, \qquad (6)$$
where the coefficient a and the exponent b take on different values that depend on the precipitation type. For example, in widespread rain, a is about 200 and b is 1.6 if R is measured in mm/h and Z is in mm^6/m^3. In general, radar estimates of the short-term precipitation rate across an area can deviate by more than a factor of 2 from surface rain gauge measurements made at a point. These differences are due to uncertainties in the values of a and b, radar calibration uncertainties, sampling uncertainties, and other sources of error. Some of these errors are random, so radar estimates of total accumulated rainfall over larger areas and longer times tend to be more accurate. Figure 10, the radar-estimated precipitation during the landfall of Hurricane Georges along the Gulf Coast of Louisiana, Alabama, and Florida in 1998, illustrates the high resolution with which rainfall patterns can be determined from radar imagery. Precipitation estimates
such as those in Fig. 10 allow hydrologists to estimate the total precipitation across a watershed, which can be used to determine stream runoff. For this reason, radar is an important tool for issuing flash flood warnings. Aviation Meteorology. Thunderstorms, hail, lightning, and strong winds pose a direct threat to aviation. In particular, a phenomenon called wind shear, a sudden sharp change in wind speed or direction along the flight path of an aircraft, can pose an extreme hazard for an aircraft that is approaching or departing an airport. Wind shear occurs in downbursts, localized intense downdrafts that induce an outburst of potentially damaging winds on or near the ground. Wind shear also occurs along gust fronts, the leading edge of fast moving rain-cooled air that rushes outward from precipitating areas of thunderstorms. Radar imagery is routinely used to identify downbursts and track the position of gust fronts. Figure 11 shows an example of gust fronts located southwest of three severe thunderstorms in South Dakota on 27 July 1999. The leading edge of the strong winds appears as thin lines of weak reflectivity in the radar reflectivity field. The thin lines are associated with cloud lines that form at the leading edge of the advancing cool air. Gust fronts can also appear as sharp gradients in the radial velocity field. Doppler radars are now installed near larger airports to provide warnings of wind shear. Storm Structure Studies. Doppler radar has made it possible to examine the kinematic and thermodynamic structure of storms in fine detail. When two or more Doppler radars simultaneously observe a storm from different viewing angles, the wind fields within the storm can be retrieved at a spatial resolution as high as 200–500 m. Using appropriate processing techniques, the entire wind field, including updrafts and downdrafts,
Figure 11. Composite radar images of the radar reflectivity factor during a thunderstorm outbreak in South Dakota on 27 July 1999. The thin lines of higher reflectivity indicate gust fronts, the leading edges of cool air that rush outward from thunderstorms.
can be determined. Images of the wind and reflectivity fields help meteorologists understand 3-D flow patterns within storms and address questions about storm structure and evolution. For example, Fig. 12a shows an infrared satellite image of a storm system over the central United States. Radar data superimposed on the satellite image show a thunderstorm over northeast Kansas. Figure 12b shows a vertical cross section of the radar reflectivity and winds across this thunderstorm derived from measurements from two Doppler radars located near the storm (Fig. 12a). The forward speed of the storm has been subtracted from the winds to illustrate the circulations better within the storm. The vertical scale on Fig. 12b is stretched to illustrate the storm structure better. A 15-km wide updraft appears in the center of the storm, where central updraft speeds approach 5 m s−1 . The updraft coincides with the heaviest rainfall, indicated by the high reflectivity region in the center of the storm. Techniques have also been developed to estimate pressure and temperature fields within storms from the wind fields, once they are deduced. Since the mid-1970s, atmospheric scientists have deployed networks of Doppler radars to investigate the three-dimensional structure of thunderstorms, hurricanes, fronts, and many other weather phenomena. Precipitation Physics. Precipitation forms by two processes. The first involves the growth (and possible melting) of ice particles, and the second involves the collision and coalescence of cloud droplets. Radar provides one of the best methods for detecting precipitation particles in clouds because of the sixth-power relationship between particle diameter and reflectivity (Eq. 4). Radar measurements, combined with aircraft samples and other measurements of cloud properties, have been used to study the relative importance of these two processes in different cloud types. Radar can identify regions of new precipitation formation in clouds. The subsequent growth of precipitation particles can be inferred from images and animations of the reflectivity, as precipitation falls to the earth’s surface. For example, in cold stratiform clouds, streams of ice particles are often detected forming near a cloud top and falling into the lower part of the cloud. The reflectivity increases as the streams of particles descend, allowing meteorologists to estimate the rate of growth of the ice particles. Radar also provides information about the fall speed of precipitation particles. The fall speed is related to the size and shape of particles and to the motion of the air in which the precipitation is embedded. In stratiform clouds, which often have vertical motions of a few cm s−1 or less, the distribution of fall speeds measured by a vertical pointing Doppler radar can be related to the particle size distribution. A prominent feature on radar images from widespread clouds is a ‘‘bright band’’ of high reflectivity at the altitude of the melting level. When snowflakes melt, their reflectivity increases because the dielectric factor |K|2 of water exceeds that of ice (Eq. 2). Snowflakes become more compact and finally collapse into raindrops, as they continue to melt and descend through the cloud. The drops
Figure 12. (a) Infrared satellite image of a storm system over the Central Plains of the United States on 14 February 1992. Composite radar data are superimposed on the image. The location of the two Doppler radars used to create the image in panel B and the location of the cross section in panel B are shown on the image. (b) Vertical cross section of the radar reflectivity factor and storm-relative wind in the plane of the cross section through the thunderstorm in northeast Kansas that appears on panel A.
have a smaller radar cross section and fall faster than snowflakes, leading to reduced reflectivity below the melting layer. Wind Measurements. Some radars are designed to operate at longer wavelengths, so that they are sensitive
to turbulent irregularities in the radio refractive index associated with temperature and humidity variations on a scale of half the radar’s wavelength. These radars, called wind profilers, measure a vertical profile of the wind fields above the radar. Doppler frequency measurements enable estimating the drift velocities of the scattering elements,
Figure 13. Hourly profiles of winds as a function of height above a wind profiler on 20 January 2000. The wind profiler is located at Fairbury, Nebraska. Short barbs, long barbs, and flags on the staff denote winds of 5, 10 and 50 knots, respectively. The orientation of the staff denotes the wind direction. For example, a vertical staff that has barbs on top denotes wind from the north, and a horizontal staff that has barbs on the left denotes a wind from the west. Most winds on this figure are from the northwest. Color permits easier visualization of the wind speed (blue: ≤20 knots; green: 20–50 knots; yellow: 50–60 knots; orange: 60–70 knots; red: >70 knots). See color insert.
from which wind velocity can be obtained. Wind profilers operate best in precipitation-free conditions. These radars can measure a vertical profile of the wind speed and direction across the profiler site at time intervals as short as 5 minutes. Shorter wavelength radars used for storm detection also can generate vertical wind profiles. The technique involves analyzing the radial (along-beam) wind component on a circle at a fixed distance from the radar. More distant circles correspond to higher elevations. Echoes must cover a sufficient area around the radar to make these measurements. For this reason, the technique works best when precipitation is present. Figure 13 shows an image that depicts hourly vertical wind profiles measured by a profiler in Fairbury, Nebraska on 20 January 2000 during a cold air outbreak. The winds at the site were from the northwest and increased with height to the level of the jetstream, which was located 8 km above the surface (red barbs). A shift in the low level winds near the earth's surface from northwesterly to southwesterly marked the passage of the center of the high pressure system associated with the cold air. Together, wind profilers and shorter wavelength radars provide wind measurements aloft in most atmospheric conditions.

ADVANCED USES OF IMAGERY
Advances in computing capabilities now allow meteorologists to superimpose meteorological data analyses, obtained from either measurements or numerical simulations, on radar and satellite imagery. Animations of images with superimposed data permit meteorologists to associate cloud and precipitation patterns better with fronts, jetstreams, or other meteorological phenomena that are responsible for creating the patterns. Figures 14a–d, for example, show a four-hour sequence of infrared satellite images taken early in the day on the same day that the thunderstorm illustrated in Fig. 12b developed. Superimposed on Fig. 14 are radar data, terrain contours, and analyses of the surface wind and relative humidity based on measurements taken at weather stations. The earliest evidence of the developing thunderstorm in Fig. 12b actually appeared 7 hours before the time of Fig. 12 when, at 8 A.M. local time (panel a of Fig. 14), a small bow-like cloud band first developed over southwest Kansas (see arrow). The cloud band developed south of a low pressure center, which in Fig. 12a is located at the center of the counterclockwise wind circulation. A region of very dry air appears in the relative humidity field extending southeastward from the east slope of the Rockies and then northeastward around the low pressure center. The dry air, which had descended the east slope of the Rocky Mountains, was associated with strong winds. The cloud band that eventually developed into the thunderstorm shown in Fig. 12 was located at the leading edge of this advancing, dry air mass (Fig. 14a–d). This simple illustration shows how meteorologists use images and superimposed data to interpret dynamic and physical processes in the atmosphere and associate them with clouds and storm systems. Atmospheric scientists use imaging techniques to display other types of meteorological data in addition to satellite and radar data. For example, lidars, devices that transmit laser light and measure light backscattered by aerosol and cloud particles, can be used to determine atmospheric circulations impossible to detect by satellite and radar. Figure 15, for instance, shows a high-resolution image from the University of Wisconsin's Volume Imaging Lidar, a scanning lidar that was located in Sheboygan, Wisconsin, approximately 20 meters from the west shore of Lake Michigan in winter. The image shows patterns produced by very light snow (not visible to the eye) falling into the convective atmospheric surface layer over the lake. The patterns result from combinations of patterns in the snowfall and the subsequent organization caused by surface layer convection. Animated time sequences of images such as Fig. 15 allow researchers to observe wind circulations as they track the movements of particles and aerosols. Animations of vertical cross sections, produced by scanning the lidar between the horizon and zenith, show convection and turbulent circulations. Forecasting today begins with predictions generated by computer models. These models, systems of mathematical equations that describe the behavior of the atmosphere, are based primarily on Newton's laws of motion and conservation of mass and energy. Additional equations
Figure 14. Surface relative humidity (%, white lines) and wind fields (white vectors) overlaid on infrared satellite images and radar echoes at 8, 9, 10 and 11 A.M. Central Standard Time, 14 February 1992, for a region of the Central Plains of the United States centered on the state of Kansas. The black lines are elevation contours; the outer contour is at 1,400 m, and the contour interval is 600 m. The arrows on each panel point to the small cloud band discussed in the text. This small cloud band developed into the thunderstorm illustrated in Figs. 12a and 12b.
describe the behavior of ideal gases; heat and moisture transfer from the earth’s surface to the atmosphere; the phase changes of water between vapor, liquid, and ice; and solar and terrestrial radiation transfer within the atmosphere. When the equations are packaged together in a computer code, they are called a ‘‘numerical model’’ of the atmosphere. Numerical models have been developed to study a broad range of atmospheric processes that range from long-term climate change to the growth and evolution of individual clouds. The models are typically formulated on a three-dimensional grid whose resolution depends on the phenomena to be simulated. Numerical models generate huge volumes of data. For
example, a model that produces a 48-hour weather forecast might generate a trillion pieces of data. The ability to employ imaging and visualization techniques to analyze these data has been a major development in the atmospheric sciences. For example, Fig. 16 shows a threedimensional rendering of the thunderstorms associated with the 19 April 1996 tornado outbreak (Figs. 8 and 9). The data used to create this image are from a numerical simulation. The ‘‘thunderstorms’’ in this image are actually isosurfaces of the rainwater concentration calculated by the model. The coloring denotes the surface water vapor field, and the arrows show the surface wind. By examining the relationship of the thunderstorms to
Figure 15. Lidar image of very light snow (invisible to the eye) falling into the convective atmospheric surface layer over Lake Michigan, collected by the University of Wisconsin Volume Imaging Lidar at 15:19:13 UT on 20 December 1997. The data have a range resolution of 15 m and an azimuthal resolution of 25 m at 18 km from the lidar. The lidar beam was 5 m above the lake surface (courtesy of the University of Wisconsin-Madison Lidar Group and Dr. Ed Eloranta).
various other quantities calculated by the model, the cause of the thunderstorms and the reasons for their severity can be investigated. Three-dimensional rendering and animation of numerical model data, a relatively new tool in meteorological research and forecasting, are contributing to both better understanding and prediction of atmospheric processes.

FUTURE DIRECTIONS

New instrumentation, analytic techniques, and data visualization methods continue to revolutionize the field of meteorology. In recent years, advances in computer processing power have been matched by advances in our ability to collect information from satellites. Satellite data will continue to be collected at better spatial, temporal, and spectral resolution, and computing power will continue to grow, yielding more information and better analytic techniques for meteorology. This anticipated growth of information must also be matched by our ability to assimilate the data into an operational forecasting environment. Operational satellite imaging technologies must first pass an experimental phase. Recently, several new concepts in experimental satellites have come forth. For example, the Tropical Rainfall Measuring Mission (TRMM), launched in November 1997, carries the first
Figure 16. Numerical simulation of tornadic thunderstorms from the 19 April 1996 tornado outbreak (see Figs. 8 and 9). The viewpoint is from an altitude well above the storms looking north. The image shows surface wind vectors, rainwater (the isosurface where rain water concentration = 0.5 grams of liquid water/kilogram of dry air), and ground-level water vapor content (colored and contoured every 1 gram of water vapor/kilogram of dry air). Green (orange) shading represents moist (dry) air (courtesy of Dr. Brian Jewett, University of Illinois at Urbana/Champaign). See color insert.
precipitation radar on an earth-orbiting satellite platform. For the first time, the three-dimensional distribution of precipitation particles can be studied from space. The data from the precipitation radar is already finding operational demand, and the images constructed from the
data are giving meteorologists new insight into tropical precipitation processes. Placing active instruments in space has been hampered by their large costs and bulky technologies. However, these problems are being resolved, and the future of spacebased active remote sensing looks promising. During the next several years, the National Aeronautics and Space Administration (NASA) and other space agencies around the world will be launching several active meteorological instruments, including cloud radar and lidar instruments. These are not imagers; rather, they collect a vertical cross section of data along the orbital path, thus giving us a 2-D vertical cross-sectional view of the underlying meteorology, as opposed to a 2-D horizontal view from imagers. Multiangle viewing instruments are also generating new ways of gathering meteorological information from space. Traditionally, satellite instruments have been single viewing, or in the case of many scanners, have tended to scan across the orbital track, so that different view angles correspond to different scenes. These instruments use spectral signatures to sense scene properties remotely. Multiangle viewing instruments are designed to view the same scene from different view angles, typically within several minutes of each view. These instruments take advantage of the angular anisotropy of the upwelling radiation field to sense scene properties remotely; thus, they combine both spectral and angular signature techniques for remote sensing. Three multiangle viewing instruments have been placed in orbit thus far: the Along Track Scanning Radiometer (ATSR), the Polarization and Directionality of the Earth’s Reflectance (POLDER), and the Multiangle Imaging SpectroRadiometer (MISR). In addition to angular signatures, instruments such as MISR offer stereoscopic capabilities that allow fusing images from different view angles to give depth information about the scene. This essentially allows us to view the world in 3D. For example, Fig. 17 shows the MISR stereoanaglyph image of Hurricane Alberto taken on 19 August 2000. The three-dimensional structure of the hurricane stands out when viewed by red-blue 3-D glasses. New visualization tools are being developed to take full advantage of these new 3-D images from space. No one satellite instrument can gather all the information needed from space. A well-coordinated suite of instruments and platforms is required to monitor the earth’s weather and climate systems properly. This statement has recently been reinforced by NASA’s introduction of the Earth Observing System (EOS). EOS is a long-term coordinated effort through NASA’s Earth Science Enterprise program. It is designed to provide a suite of earth-orbiting satellites needed for long-term monitoring of the earth’s land surface, biosphere, solid earth, ocean, and atmosphere. By providing synergy amongst instruments onboard a satellite and across satellite platforms, EOS will provide us with a more complete picture of our natural environment. Similar advances are occurring in radar meteorology. Current operational meteorological radars are designed to radiate and receive electromagnetic waves that have
Figure 17. MISR stereoanaglyph image of Hurricane Alberto taken on 19 August 2000. At this time, Alberto was located about 1,700 km west of the Azores. Viewing this stereoanaglyph image with red/blue 3-D glasses (red lens over the left eye) clearly shows the three-dimensional structure of this storm. The hurricane’s eye is about 60 km in diameter. The thunderstorms in the hurricane’s spiral arms and the funnel-shaped eye wall are apparent when viewed in 3D (courtesy of the National Aeronautics and Space Administration, Goddard Space Flight Center and Jet Propulsion Laboratory MISR Science Team). See color insert.
fixed polarization, a single, fixed orientation of the electric field. Special research radars called polarization diversity radars allow varying the polarization state of the transmitted and/or received signal. Polarization measurements are sensitive to particle orientation. As large raindrops fall, they have a tendency to flatten into an oval shape. Hail tumbles as it falls and generally has no preferred orientation. Ice crystals may or may not have preferred orientation depending on their shape. Polarization measurements can discriminate hail from rain and identify predominant types of ice particles in clouds. Researchers are also exploring methods for using polarization techniques to estimate rainfall better. Polarization diversity measurements are expected to be incorporated into operational Doppler radars within the first decade of the new century. Bistatic radar systems are radars that have one transmitting antenna but several receivers located 10–50 km away from the radar. Researchers are currently examining the potential of these radar systems to retrieve high-resolution wind fields in storms. Bistatic radars measure microwave energy scattered toward each receiver. The additional receivers measure the pulse-to-pulse phase change associated with the Doppler shift, from which they determine the wind component toward or away from the receiver. Because the receivers view a storm from different directions, all wind components are measured, making it possible to retrieve wind fields within a storm much as is currently done with two or more Doppler radars. Future networks of bistatic radars may make it possible to create images of detailed wind fields within storms in near-real time, providing forecasters a powerful tool to determine storm structure and severity.
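To show the geometric principle behind retrieving a wind vector from radial velocities measured along different look directions (the idea underlying multiple-Doppler and, in simplified form, bistatic retrievals), the sketch below solves a small linear system for a horizontal wind. It ignores the full bistatic geometry, vertical motion, and hydrometeor fall speeds, and the numbers are invented, so it is a conceptual illustration only.

```python
import numpy as np

def horizontal_wind(radial_velocities, azimuths_deg):
    """Solve v_r,i = u*sin(az_i) + v*cos(az_i) for the horizontal wind (u, v),
    where az is the look direction measured clockwise from north and u, v are
    the eastward and northward wind components (m/s)."""
    az = np.radians(azimuths_deg)
    a = np.column_stack((np.sin(az), np.cos(az)))   # one row per look direction
    u, v = np.linalg.lstsq(a, np.asarray(radial_velocities), rcond=None)[0]
    return u, v

# Two look directions 90 degrees apart observing a 20 m/s westerly wind
# (u = 20, v = 0): the eastward-looking beam sees +20 m/s, the northward one 0.
print(horizontal_wind([20.0, 0.0], [90.0, 0.0]))    # -> approximately (20.0, 0.0)
```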
New data visualization methods continue to be developed to image, animate, and manipulate meteorological data. Techniques now exist to render surfaces in three dimensions (e.g. Fig. 16); superimpose data and air trajectories on those surfaces; and animate, rotate, and view the displayed data from various perspectives. These new computational capabilities and new data sets are improving meteorologists’ understanding of weather and storms and are leading to better forecasts and warnings of severe weather for the public.
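As a minimal illustration of superimposing a wind analysis on a gridded image field, in the spirit of the overlays described above, the Python/Matplotlib sketch below draws wind vectors over a synthetic background field. All of the data here are fabricated stand-ins, not GOES imagery or model output.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical gridded fields standing in for a satellite image and a surface
# wind analysis.
x, y = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
field = np.sin(x) * np.cos(y)          # stand-in for a brightness-temperature field
u, v = np.cos(y), np.sin(x)            # stand-in for surface wind components

fig, ax = plt.subplots()
im = ax.contourf(x, y, field, levels=20, cmap="gray")     # "imagery" background
ax.quiver(x[::5, ::5], y[::5, ::5], u[::5, ::5], v[::5, ::5],
          color="white")                                   # superimposed wind vectors
fig.colorbar(im, ax=ax, label="stand-in brightness temperature")
plt.show()
```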
ABBREVIATIONS AND ACRONYMS

ATS        applications technology satellite
ATSR       along track scanning radiometer
AVHRR      advanced very high resolution radiometer
EOS        earth observing system
GMS        geostationary meteorological satellite
GOES       geostationary operational environmental satellite
METEOSAT   meteorological satellite
MISR       multiangle imaging spectroradiometer
NASA       national aeronautics and space administration
NOAA       national oceanic and atmospheric administration
POLDER     polarization and directionality of the earth's reflectance
PPI        plan position indicator
RHI        range height indicator
TIROS      television and infrared observational satellite
IMAGING SCIENCE IN OVERHEAD SURVEILLANCE
ROBERT D. FIETE
Eastman Kodak Company
Rochester, NY

Figure 1. Gaspard Felix Tournachon, also known as Nadar, collected photographs from a hot-air balloon over Paris in 1859.
BRIEF HISTORY

Overhead surveillance has come a long way since the days when Gaspard Felix Tournachon, also known as ''Nadar,'' collected the first known aerial photographs from a hot-air balloon 1,200 feet (370 m) over Paris in 1858 (Fig. 1). The first successful aerial photo taken in the United States and the earliest existing aerial photograph was
taken on October 13, 1860, by James Wallace Black from a balloon above Boston (Fig. 2). Thaddeus Lowe urged the creation of the U.S. Army Balloon Corps in 1862 to survey the Confederate positions during the U.S. Civil War. Unfortunately, the balloons made easy targets for the Confederate soldiers to shoot down, so the unit was deactivated in 1863. In 1906, George Lawrence used a camera carried aloft by seventeen kites to capture an aerial view of San Francisco, California, several weeks after the devastating earthquake (Fig. 3). In 1907, Alfred Maul patented a gyroscopically stabilized camera for taking overhead pictures (1) using rockets that could reach an altitude of 2,600 feet (700 m) (Fig. 4). Using breast-mounted cameras for pigeons, patented by Julius Neubronne in 1903, the Bavarian Pigeon Corps captured overhead images of Germany in 1909 by using a timing mechanism to take a picture every 30 seconds during the pigeon's flight (Fig. 5). The invention of the airplane in 1903 made overhead surveillance more practical than kites, pigeons, or hot air balloons. Wilbur Wright's passenger, L. P. Bonvillain, took the first photographs from an airplane in 1908 near Le Mans, France. In 1909, Wilbur Wright took photographs from an airplane using a motion picture camera over Centrocelli, Italy (Fig. 6). France and Germany were quick to develop aerial reconnaissance cameras and photointerpretation methods during World War I. Cameras were eventually mounted on the side of the biplane to relieve the photographer of holding the bulky
Figure 2. Earliest existing aerial photograph taken in 1860 by J. Wallace Black from a balloon above Boston.
Figure 3. George Lawrence used seventeen kites to capture an aerial view of San Francisco after the devastating earthquake in 1906.
camera and to acquire consistent vertical views (Fig. 7). These images provided invaluable information on troop positions and a visual map of the vast network of trenches through the battlefields (Fig. 8). The first nighttime aerial photograph was taken by George W. Goddard in 1925 when he ignited an 80-pound flash powder bomb over Rochester, New York. The science of photoreconnaissance and photointerpretation advanced dramatically during World War II due to the development of color infrared film, radar systems, and the collection of stereo images. Harold Edgerton developed a powerful flash system that was used for collecting
nighttime aerial photographs of Normandy before the D-Day invasion. Innovative camera designs were developed to operate at altitudes above 20,000 feet (6 km), well out of range of enemy guns. These high altitude images were used primarily for strategic purposes, such as locating and identifying production facilities. Figure 9 shows an aerial photograph taken in 1944 over Peenemunde that captured an image of the V-2 rocket being developed by the Germans. Unfortunately, tactical information still required higher resolution images, which required pilots to fly high-speed, low-altitude missions, generally putting their lives at great risk (2).
Figure 4. Overhead image from Alfred Maul’s rocket.
As the cold war between the United States and the Soviet Union intensified in the 1950s, the need for high-altitude reconnaissance was answered by the development of the U-2 at Lockheed’s ‘‘Skunk Works’’
under the leadership of C. L. "Kelly" Johnson. The U-2 (Fig. 10) had a wingspan of 80 feet (24 meters) and could fly at an altitude of 80,000 feet (24 km), well out of reach of Soviet missiles at the time. It had a range of 3,000 miles (4,800 km) and a maximum speed of more than 500 mph (220 m/s). The U-2's B camera could take images with 2.5 feet (0.8 m) resolution from an altitude of 65,000 feet (20 km). The first U-2 mission was flown in 1955, and modified versions are still in operation 45 years later. The images collected by the U-2 provided the U.S. military with information that could not have been provided by any other methods. This was most dramatically demonstrated during the Cuban missile crisis in 1962. Although intelligence reports suspected that the Soviet Union intended to deploy missiles in Cuba, Soviet officials denied this. On 14 October, two U-2 aircraft flew over Cuba, and the photographs provided undeniable proof that the Soviets were constructing bases for intermediate-range missiles. Figure 11 shows an aerial image of a missile launch site in Cuba during the crisis. On 28 October, Soviet Premier Khrushchev agreed to remove the missiles. Several years after the U-2 became operational, advances in Soviet radar capabilities allowed them to track the U-2 reliably. Although the Soviets did not
Figure 5. Carrier pigeons were used to collect overhead images of Germany in 1909.
Figure 6. Photograph taken from an airplane by Wilbur Wright over Centocelle, Italy in 1909.
Figure 7. Reconnaissance camera attached to the side of a plane during World War I.
Figure 8. Aerial image taken of Fort Douaumont near Verdun, France during World War I.
possess a reliable means for shooting down the U-2 aircraft, the tracking data could be used to protest the overflights. The SR-71 (Fig. 12), also known as the "Blackbird," was developed from the Lockheed A-12 and YF-12A aircraft as an advanced reconnaissance aircraft to replace the U-2. Like the U-2, it was developed and built
Figure 9. Aerial image of a German V-2 rocket (circled) at Peenemunde during World War II.
Figure 10. U-2 in flight (courtesy of USAF).
at Lockheed's "Skunk Works." The SR-71 aircraft could fly at more than 2,200 mph (980 m/s), could reach cruising altitudes of more than 85,000 feet (26 km), and could fly more than 2,000 miles (3,220 km) without refueling. The first operational SR-71 was flown in 1964, and they continued in operation until the U.S. Air Force retired them in 1990. Several SR-71s are still in service today, but only for research purposes. Although more than 1,000 antiaircraft missiles were fired at the SR-71s during their lifetime, no SR-71s were ever hit. On 6 March, 1990, an SR-71 flew from Los Angeles, California, to Washington, District of Columbia in 1 hour and 4 minutes. SR-71s are still the world's fastest and highest flying production aircraft. After Gary Powers' U-2 was shot down in 1960, the United States pledged that it would cease overflights of the Soviet Union. This pledge made the SR-71, which was then under development, useless for collecting overhead surveillance of the Soviet Union. Another method for collecting reconnaissance images of the Soviet Union was needed. In 1945, the Air Force commissioned the RAND Corporation to investigate the feasibility of reconnaissance
Figure 11. Aerial image of a missile launch site in Cuba on 23 October, 1962 (courtesy of USAF).
Figure 12. SR-71 in flight (courtesy of USAF).
from space. Their recommendation was for a television-type system that had severe limitations in spatial resolution. A new program was started that was based on collecting images on film and then returning the film to earth for processing. This program became the CORONA program (3,4) and was classified until 1995. The program was masked as a scientific mission called DISCOVERER. The CORONA camera acquired images on film, which was spooled into a capsule and returned to earth for processing. The first successful mission launch and retrieval of the film capsule finally occurred with the launch of DISCOVERER XIII on 10 August, 1960, but the mission carried diagnostic equipment rather than a camera and film.
The recovery of the film capsule was historic in that it was the first man-made object recovered from space. The next mission, DISCOVERER XIV, launched on 18 August, 1960, achieved an orbit whose perigee was 190 km and apogee was 810 km and returned the first overhead reconnaissance images from space. The first CORONA image taken of the Soviet airfield at Mys Shmidta on 18 August, 1960, is shown in Fig. 13. The ground resolution was approximately 10 meters and the image quality was poor, but the images provided significant intelligence information on missile sites and airfields. The early CORONA missions provided evidence that the Soviet Union had fewer intercontinental ballistic missiles than they had claimed, thus disproving the "missile gap." With improvements in the film, optics, and the reduction of system vibrations, the final CORONA system design, the KH-4B, could achieve ground resolution as good as 2 meters and had an orbital life of 18 days. Figure 14 shows a CORONA image of the Pentagon taken on 25 September, 1967, from the first KH-4B system. The CORONA program was discontinued in 1972. The first image of the earth from space was transmitted from Explorer VI, launched on 7 August, 1959. Figure 15 shows an image of the northern Pacific Ocean taken from an altitude of 27,000 km by Explorer VI on 14 August, 1959. Although crude, it demonstrated the capability of photographing the earth's cloud cover using a television camera in space. NASA developed the Television and Infrared Observation Satellite program (TIROS) as the world's first meteorological satellite. TIROS-1 was launched on 1 April, 1960 and consisted of two television cameras. Although it collected data for only 78 days, it
Figure 13. First CORONA image taken of the Soviet airfield at Mys Shmidta on 18 August, 1960.
Figure 15. First image of the earth taken from space, by Explorer VI on 14 August, 1959 (courtesy of NASA).
Figure 14. CORONA image taken of the Pentagon on 25 September, 1967.
successfully demonstrated the utility of meteorological satellites. The first television picture taken from space by TIROS-1 is shown in Fig. 16. The next generation of meteorological satellites consisted of the NIMBUS satellites, first launched on 28 August, 1964. These satellites were placed in sun-synchronous orbits and carried more sophisticated television cameras as well as infrared systems that allowed capturing images at night. The Geostationary Operational Environmental Satellite (GOES) program, first launched in 1975, operates two geostationary weather satellites for the National Oceanic and Atmospheric Administration (NOAA). The two satellites give continuous coverage of both the Western and Eastern Hemispheres and track large storms, such as hurricanes. Figure 17 shows a GOES satellite image of hurricane Diana on 11 September, 1984.
Figure 16. First TIROS-1 image and the first television picture taken from space on 1 April, 1960 (courtesy of NASA).
Due to the success of the weather satellites, NASA developed satellites to collect scientific data on the earth’s resources. The first Landsat satellite, originally known as the Earth Resources Technology Satellite (ERTS), was
Figure 17. GOES satellite image of hurricane Diana, 11 September, 1984 (courtesy of NOAA).
launched on 23 July, 1972 and collected digital image data that was downlinked to several ground stations. Landsat 1 had a return beam vidicon (RBV) and a multispectral scanner (MSS) that collected images at a resolution of 80 meters (5). Landsat 4, launched on 16 July, 1982, replaced the RBV with the Thematic Mapper (TM), which collected multispectral data at a resolution of 30 meters. Landsat 7, launched on 15 April, 1999, added
the capability to collect panchromatic (black-and-white) images at a 15-meter resolution. Landsat images can reveal the extent of natural disasters, such as the 1993 St. Louis flooding captured by Landsat 5 TM images from an altitude of 705 km (Fig. 18). The French Space Agency launched its first Systeme Probatoire d’Observation de la Terre (SPOT) satellite on 22 February, 1986. The sensors on the SPOT satellites can collect multispectral images
Figure 19. One-meter resolution IKONOS image of the San Diego Convention Center, taken on 4 April, 2000 (courtesy of Space Imaging).
Figure 18. Landsat 5 TM images of St. Louis, Missouri, show the extent of the 1993 flood (courtesy of EOSAT).
at a 20-meter resolution and panchromatic images at a 10-meter resolution. Presidential Decision Directive 23 and the Remote Sensing Act of 1992 liberalized licensing of commercial remote sensing systems in the United States. Space Imaging's IKONOS satellite, launched on 24 September, 1999, was the first commercial satellite to capture imagery at a resolution of 1 meter. IKONOS has a panchromatic sensor that can capture black-and-white images at a resolution of one meter and a multispectral sensor that can capture color images at a resolution of four meters from an altitude of 680 km. The multispectral image can be combined with the panchromatic image to generate a 1-meter resolution color image (Fig. 19). The image quality and collection capabilities of satellite surveillance systems continue to improve dramatically. Figure 20 shows a comparison of a CORONA image of the Washington Monument, collected in 1967, and an IKONOS image collected in 1999. Note that the scaffolding on the monument is clearly visible in the IKONOS image. The image quality of the commercial image in 1999 is far superior to the image quality of the then classified image captured 32 years earlier. Today, an increasing amount of overhead surveillance imagery is collected by systems that do not require pilots. Satellites are used to collect data high above the earth, and unmanned aerial vehicles (UAVs) are being used to collect information at lower altitudes. Overhead
Figure 20. Comparison of CORONA (25 September, 1967) and IKONOS (30 September, 1999) images of the Washington Monument.
surveillance systems continue to provide environmental and meteorological data to the scientific community as well as important intelligence data from areas that are difficult to access. Even human atrocities cannot escape the images captured by overhead surveillance cameras. Aerial images collected in 1944 showed the existence of Nazi concentration camps (Fig. 21) in countries controlled by Germany. Satellite images showed the extent of the oil fires in Kuwait set by retreating Iraqi soldiers in 1991 (Fig. 22). In early 1999, Kosovo refugees described mass killings that were being committed by Serbian forces, but these accounts could not be confirmed. Overhead images taken in April 1999 indicated scores of new graves where none had appeared a month earlier (Fig. 23), offering compelling evidence of Serbian atrocities.
Figure 21. Aerial image of the Auschwitz-Birkenau extermination camp in Poland on 25 August, 1944 (courtesy of National Archives).
Figure 22. Satellite images taken in 1991 show the extent of the oil fires in Kuwait: NOAA (4-km resolution) and Landsat (30-m resolution) (courtesy of NOAA and EROS Data Center).
is the portion to which the human eye is sensitive and extends from approximately 0.4 to 0.7 µm. The radiometry determines the strength of the signal that will be produced at the detector. The characteristics of the ground target, the atmosphere, and the camera system play critical roles in determining the signal. The spectral radiant exitance of a blackbody for a given wavelength of light λ is given by Planck's equation (5–7):
Figure 23. Overhead images taken in April 1999 indicated scores of new graves near Izbica, Kosovo (courtesy of NATO).
IMAGE CHAIN OF OVERHEAD SURVEILLANCE SYSTEMS Linking together many steps in an image chain creates the final image product produced by an overhead surveillance system. Each link plays a vital role in the final quality of the image. Figure 24 illustrates the key components of the imaging chain of a remote sensing satellite. The image chain begins with the electromagnetic energy from a radiant source, for example, the sun. Surveillance systems are generally categorized by the wavelength region of the electromagnetic spectrum (Fig. 25) that the sensor images. The visible spectrum
$$M_{BB}(\lambda, T) = \frac{2\pi h c^{2}}{\lambda^{5}}\,\frac{1}{e^{hc/\lambda kT} - 1}\;\;\left(\frac{\mathrm{watts}}{\mathrm{m^{2}\,\mu m}}\right), \qquad (1)$$
where T is the temperature of the source in K, h = 6.63 × 10⁻³⁴ J·s, c = 3 × 10⁸ m/s, and k = 1.38 × 10⁻²³ J/K. For a Lambertian surface, the spectral radiance from a blackbody is given by

$$L_{BB}(\lambda) = \frac{M_{BB}(\lambda, T)}{\pi}\;\;\left(\frac{\mathrm{watts}}{\mathrm{m^{2}\,\mu m\,sr}}\right). \qquad (2)$$

If the exitance from the sun is approximated by a blackbody, then the solar spectral irradiance on a target on the ground can be approximated as

$$E_{target}(\lambda) \approx M_{BB}(\lambda, T_{sun})\cos(\phi_{zenith})\,\frac{r_{sun}^{2}}{r_{earth\text{-}sun}^{2}}\,\tau_{atm}^{sun\text{-}targ}(\lambda)\;\;\left(\frac{\mathrm{watts}}{\mathrm{m^{2}\,\mu m}}\right), \qquad (3)$$
where r_sun is the radius of the sun, r_earth-sun is the distance from the earth to the sun, τ_atm^(sun-targ) is the atmospheric transmittance along the path from the sun to the target, φ_zenith is the solar zenith angle, and T_sun is approximately 5,900 K. The atmosphere has a significant effect on the radiometry. Various molecules in the atmosphere absorb radiation at different wavelengths (Fig. 26) and allow only certain atmospheric windows to be used for imaging. The spectral radiance from a Lambertian target on earth at the entrance aperture of a remote sensing satellite can be calculated from (5–7)

$$L_{target}(\lambda) = \tau_{atm}^{targ\text{-}sat}(\lambda)\left[\frac{\rho_{target}(\lambda)}{\pi}\left(E_{target}(\lambda) + E_{skylight}(\lambda)\right) + \varepsilon_{target}(\lambda)\,L_{BB}(\lambda, T_{target})\right], \qquad (4)$$
Figure 24. Illustration of an image chain for a satellite surveillance system.
Figure 25. The electromagnetic spectrum, from gamma rays and X-rays through the UV, visible, and IR to microwave, radar, and radio wavelengths.
Figure 26. Atmospheric transmission from 0.2 to 10 µm calculated by MODTRAN v4.
where τ_atm^(targ-sat) is the atmospheric transmittance along the path from the target to the satellite, ρ_target is the reflectance of the target, ε_target is the emissivity of the target, and E_skylight is the irradiance on the target due to the skylight from atmospheric scattering. Radiometry models, such as MODTRAN, are generally used to calculate L_target because the radiometric calculations depend on the acquisition geometry and can be complicated (6). Note that the spectral radiance L_target is a combination of the solar irradiance that is reflected from the ground target, including any radiance scattered directly into the aperture from the atmosphere, as well as the blackbody radiance from the target.
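Eqs. (1)–(4) are simple to evaluate numerically. The following Python sketch only illustrates those equations, not the article's radiometric model: the solar geometry, the transmittances, the skylight term, and the target properties are assumed, placeholder values, and a real calculation would come from a code such as MODTRAN.

```python
import numpy as np

# Constants as used in the text
h = 6.63e-34   # Planck's constant, J*s
c = 3.0e8      # speed of light, m/s
k = 1.38e-23   # Boltzmann's constant, J/K

def blackbody_exitance(wavelength_um, T):
    """Spectral radiant exitance M_BB of Eq. (1), in W/(m^2-um)."""
    lam = wavelength_um * 1e-6                                   # um -> m
    M = (2.0 * np.pi * h * c**2 / lam**5) / np.expm1(h * c / (lam * k * T))
    return M * 1e-6                                              # per m -> per um

def solar_irradiance(wavelength_um, zenith_deg, tau_down):
    """Approximate solar spectral irradiance on the ground, Eq. (3)."""
    r_sun, r_earth_sun = 6.96e8, 1.496e11                        # m (assumed values)
    return (blackbody_exitance(wavelength_um, 5900.0) * np.cos(np.radians(zenith_deg))
            * (r_sun / r_earth_sun) ** 2 * tau_down)

def target_radiance(wavelength_um, rho, tau_up, E_target, E_sky, emissivity, T_target):
    """Spectral radiance at the sensor aperture from a Lambertian target, Eq. (4)."""
    L_bb = blackbody_exitance(wavelength_um, T_target) / np.pi   # Eq. (2)
    return tau_up * (rho / np.pi * (E_target + E_sky) + emissivity * L_bb)

# Example: 0.55-um radiance for a 15% reflector (all transmittances assumed)
E = solar_irradiance(0.55, zenith_deg=30.0, tau_down=0.8)
print(target_radiance(0.55, rho=0.15, tau_up=0.8, E_target=E,
                      E_sky=0.1 * E, emissivity=0.9, T_target=300.0))
```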
Figure 27. An imaging system at a distance R_target from the target where the focal plane is at a distance R_image from the camera optics.

Assume that an imaging system is at a distance R_target from the target where the focal plane is at a distance R_image from the camera optics, as shown in Fig. 27. For a polychromatic remote sensing camera where the aperture is small compared to the focal length f, the radiant flux within the spectral bandpass that reaches the entrance aperture of the camera from the target is

$$\Phi_{aperture} = \frac{A_{target}A_{aperture}}{R_{target}^{2}}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,d\lambda = A_{target}\,\Omega\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,d\lambda \;\;(\mathrm{watts}), \qquad (5)$$

where λ_min and λ_max define the spectral bandpass, A_target is the area of the target, A_aperture is the area of the camera aperture, and Ω is the solid angle that encompasses the aperture area. The area of the image, A_image, is given by

$$A_{image} = m^{2}A_{target}, \qquad (6)$$

where m is the magnification given by

$$m = \frac{R_{image}}{R_{target}}. \qquad (7)$$

Thus, A_target can be written as

$$A_{target} = A_{image}\,\frac{R_{target}^{2}}{R_{image}^{2}}. \qquad (8)$$

Rewriting the Gaussian lens formula,

$$\frac{1}{R_{target}} + \frac{1}{R_{image}} = \frac{1}{f}, \qquad (9)$$

in terms of m and R_image, where f is the focal length of the optical system,

$$R_{image} = f(m + 1). \qquad (10)$$

Using Eq. (8) and Eq. (10), and multiplying by the transmittance of the optics τ_optics, the radiant flux reaching the image plane is

$$\Phi_{image} = \frac{A_{image}A_{aperture}}{f^{2}(m+1)^{2}}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda. \qquad (11)$$

If the size of the target is large compared to the ground-instantaneous field of view (GIFOV), then the target is an extended source and A_image ≫ A_detector, where A_detector is the area of the detector. The radiant flux on the detector for an extended source is

$$\Phi_{detector} = \frac{A_{detector}}{A_{image}}\,\Phi_{image} = \frac{A_{detector}A_{aperture}}{f^{2}(m+1)^{2}}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda. \qquad (12)$$

For overhead surveillance cameras that are large distances from their targets, such as those used in satellites, a telescope is required and R_target ≫ R_image; therefore, m + 1 ≅ 1 and f ≅ R_image. If the overhead surveillance camera uses a telescope design that has primary and secondary mirrors, such as a Ritchey-Chretien or a Cassegrain, as shown in Fig. 28, then the radiant flux on the detector can be written as

$$\Phi_{detector} = \frac{A_{detector}\,\pi\!\left(D_{ap}^{2} - D_{obs}^{2}\right)}{4f^{2}}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda, \qquad (13)$$

or

$$\Phi_{detector} = \frac{A_{detector}\,\pi(1-\varepsilon)}{4(f\#)^{2}}\int_{\lambda_{min}}^{\lambda_{max}} L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda, \qquad (14)$$

where D_ap is the diameter of the optical aperture, D_obs is the diameter of the central obscuration, ε is the fraction of the optical aperture area obscured, and f# is the system f number given by

$$f\# = \frac{f}{D_{ap}}. \qquad (15)$$

Figure 28. Telescope design that has primary and secondary mirrors (aperture diameter D_ap, central obscuration diameter D_obs).

The quality of the optics is critical to the final image quality and must be manufactured and built to very tight specifications.
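As a numerical illustration of Eq. (14), the sketch below integrates an assumed, flat aperture radiance over a panchromatic band; the detector size, f number, and obscuration are hypothetical values chosen only to show the order of magnitude of the flux, not the parameters of any particular system.

```python
import numpy as np

def detector_flux_watts(wavelengths_um, L_target, tau_optics,
                        detector_area_m2, f_number, eps_area):
    """Radiant flux on one detector for an extended source, Eq. (14).

    L_target is the aperture spectral radiance in W/(m^2-um-sr) and tau_optics the
    optics transmittance, both sampled at wavelengths_um; eps_area is the obscured
    fraction of the aperture area.
    """
    band_integral = np.trapz(L_target * tau_optics, wavelengths_um)   # W/(m^2-sr)
    return detector_area_m2 * np.pi * (1.0 - eps_area) / (4.0 * f_number**2) * band_integral

# Assumed example: 12-um pixel, f/14 optics with 10% of the aperture area obscured,
# and a flat 40 W/(m^2-um-sr) radiance over a 0.45-0.90 um panchromatic band
wl = np.linspace(0.45, 0.90, 10)
phi = detector_flux_watts(wl, 40.0 * np.ones_like(wl), 0.8 * np.ones_like(wl),
                          (12e-6) ** 2, 14.0, 0.10)
print(phi)   # a few picowatts at the detector
```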
Small deviations in the curvature of the mirror can cause a large degradation in image quality. Light that is imaged by the optics will spread out; the point-spread function (PSF) describes the spreading of the light for a point object. The optical transfer function (OTF) is the Fourier transform of the optics PSF, and the magnitude of the OTF is the modulation transfer function (MTF) of the optics (8,9). The MTF measures the output modulation divided by the input modulation for each spatial frequency ν, where modulation is defined as

$$\mathrm{Modulation} = \frac{\text{maximum signal} - \text{minimum signal}}{\text{maximum signal} + \text{minimum signal}}. \qquad (16)$$

The MTF for an incoherent diffraction-limited optical system that has a circular aperture of diameter D and a circular central obscuration of diameter D_0 is given by

$$\mathrm{MTF}_{optics}(\nu') = \frac{2\,(A + B + C)}{\pi\,(1 - \varepsilon^{2})}, \qquad (17)$$

where

$$\varepsilon = \frac{D_{0}}{D}, \qquad (18)$$

$$\nu' = \frac{\nu}{\nu_{c}}, \qquad (19)$$

$$\nu_{c} = \frac{D}{\lambda f} = \frac{1}{\lambda(f\#)}, \qquad (20)$$

$$A = \cos^{-1}(\nu') - \nu'\sqrt{1 - \nu'^{2}} \quad \text{for } 0 \le \nu' \le 1, \qquad (21)$$

$$A = 0 \quad \text{for } \nu' > 1, \qquad (22)$$

$$B = \varepsilon^{2}\left[\cos^{-1}\!\left(\frac{\nu'}{\varepsilon}\right) - \frac{\nu'}{\varepsilon}\sqrt{1 - \left(\frac{\nu'}{\varepsilon}\right)^{2}}\,\right] \quad \text{for } 0 \le \nu' \le \varepsilon, \qquad (23)$$

$$B = 0 \quad \text{for } \nu' > \varepsilon, \qquad (24)$$

$$C = \varepsilon\sin\phi + \frac{\phi}{2}\left(1 + \varepsilon^{2}\right) - \left(1 - \varepsilon^{2}\right)\tan^{-1}\!\left[\frac{1 + \varepsilon}{1 - \varepsilon}\tan\frac{\phi}{2}\right] - \pi\varepsilon^{2} \quad \text{for } (1 - \varepsilon) \le 2\nu' \le (1 + \varepsilon), \qquad (25)$$

$$C = 0 \quad \text{for } 2\nu' \ge (1 + \varepsilon), \qquad (26)$$

and

$$\phi = \cos^{-1}\!\left(\frac{1 + \varepsilon^{2} - 4\nu'^{2}}{2\varepsilon}\right). \qquad (27)$$

Figure 29. The optics MTF for an incoherent diffraction-limited Ritchey–Chretien optical system that has a circular aperture and circular central obscurations of 0, 5, 10, and 25% (ε = 0, 0.22, 0.32, and 0.50).
Figure 30. The individual MTF curves (optics, optics quality, detector aperture, detector diffusion, jitter, and smear) for a notional design and the final system MTF after the individual MTF curves have been multiplied together.
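Equations (17)–(27) can be coded directly. The Python sketch below is one possible rendering; the case C = −πε² for 2ν′ ≤ (1 − ε), implied by continuity of Eq. (25), is filled in as an assumption, and the final lines cascade the result with assumed quality, jitter, and smear MTFs only as a simple product of curves.

```python
import numpy as np

def optics_mtf(nu_norm, eps=0.0):
    """Diffraction-limited MTF of a circular aperture with a central obscuration,
    Eqs. (17)-(27).  nu_norm is the spatial frequency normalized to the cutoff nu_c;
    eps is the obscuration diameter ratio D0/D."""
    nu = np.clip(np.asarray(nu_norm, dtype=float), 0.0, None)
    A = np.where(nu <= 1.0,
                 np.arccos(np.minimum(nu, 1.0)) - nu * np.sqrt(np.clip(1 - nu**2, 0, None)),
                 0.0)
    if eps == 0.0:
        return 2.0 * A / np.pi
    x = np.minimum(nu / eps, 1.0)
    B = np.where(nu <= eps, eps**2 * (np.arccos(x) - x * np.sqrt(1 - x**2)), 0.0)
    phi = np.arccos(np.clip((1 + eps**2 - 4 * nu**2) / (2 * eps), -1.0, 1.0))
    C_mid = (eps * np.sin(phi) + 0.5 * phi * (1 + eps**2)
             - (1 - eps**2) * np.arctan((1 + eps) / (1 - eps) * np.tan(phi / 2))
             - np.pi * eps**2)
    C = np.where(2 * nu <= 1 - eps, -np.pi * eps**2,          # assumed low-frequency case
                 np.where(2 * nu <= 1 + eps, C_mid, 0.0))
    return 2.0 * (A + B + C) / (np.pi * (1 - eps**2))

# A crude system MTF: cascade the optics MTF with assumed quality, jitter, and smear terms
nu = np.linspace(0.0, 1.0, 101)
system_mtf = (optics_mtf(nu, eps=0.32) * 0.9                  # assumed optics-quality factor
              * np.exp(-(2 * np.pi * 0.1 * nu) ** 2 / 2)      # assumed Gaussian jitter MTF
              * np.sinc(0.2 * nu))                            # assumed linear-smear MTF
```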
Figure 29 shows MTF plots for central obscurations of 0, 5, 10, and 25%. Note that the MTF decreases as the spatial frequency increases, which has a blurring effect on the image. Note also that there is a cutoff spatial frequency at ν_c, which corresponds to the resolution limit of the optics. Increasing the size of the central obscuration does not
change νc , but it does decrease the MTF, which results in additional blurring. Other factors will also blur the image, for example, vehicle motion; each has its own MTF (10). The MTF of the actual optics will be lower than the MTF given in Eq. (17) due to imperfections in manufacturing the optics. The MTF of the optics is multiplied by an optical quality MTF to achieve the actual MTF. Other MTF contributors, such as the jitter and smear caused by camera motion, can be cascaded with the MTF of the optics to yield a system MTF. Figure 30 shows the individual MTF curves for a notional design of a digital camera and the final system MTF after the individual MTF curves have been multiplied together. The MTF of the optics is usually the most significant component of the system MTF. Although the traditional detector for surveillance systems has been film, today, most systems use digital charge-coupled devices (CCDs) for the detectors (10). The CCD detectors generate an electric charge that is proportional to the number of photons that reach the
detector, given by

$$n_{detector} = \frac{A_{detector}\,\pi(1-\varepsilon)}{4(f\#)^{2}}\int_{\lambda_{min}}^{\lambda_{max}} \frac{\lambda\,t_{int}}{hc}\,L_{target}(\lambda)\,\tau_{optics}(\lambda)\,d\lambda \;\;(\mathrm{photons}), \qquad (28)$$

where t_int is the integration time of the imaging system. The signal from the ground target, measured in electrons, that is generated at the detector is

$$s_{target} = \frac{A_{detector}\,\pi(1-\varepsilon)\,t_{int}}{4(f\#)^{2}\,hc}\int_{\lambda_{min}}^{\lambda_{max}} \eta(\lambda)\,L_{target}(\lambda)\,\tau_{optics}(\lambda)\,\lambda\,d\lambda \;\;(\mathrm{electrons}), \qquad (29)$$

where η is the quantum efficiency, which is the average number of photoelectrons generated per incident photon.
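Eq. (29) reduces to a single band integral. In the sketch below (Python), the radiance, optics transmittance, and quantum efficiency arrays would come from the system model; the names and sample values are placeholders.

```python
import numpy as np

h, c = 6.63e-34, 3.0e8   # J*s, m/s

def target_signal_electrons(wavelengths_um, L_target, tau_optics, qe,
                            detector_area_m2, f_number, eps_area, t_int_s):
    """Photoelectrons generated at one detector by the ground target, Eq. (29).

    L_target [W/(m^2-um-sr)], tau_optics, and qe are arrays sampled at wavelengths_um;
    eps_area is the obscured fraction of the aperture area.
    """
    lam_m = wavelengths_um * 1e-6
    band_integral = np.trapz(qe * L_target * tau_optics * lam_m, wavelengths_um)
    return (detector_area_m2 * np.pi * (1.0 - eps_area) * t_int_s
            / (4.0 * f_number**2 * h * c)) * band_integral

# Placeholder example over an assumed 0.45-0.90 um panchromatic band
wl = np.linspace(0.45, 0.90, 10)
s_t = target_signal_electrons(wl, 40.0 * np.ones_like(wl), 0.8 * np.ones_like(wl),
                              0.5 * np.ones_like(wl), (12e-6)**2, 14.0, 0.10, 1e-3)
print(s_t)   # electrons collected during the integration time
```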
The CCD detector allows the image data to be electronically transmitted in near real time to the user rather than waiting for the film to be returned to a lab, processed, and then shipped to the user. Push-broom sensors use a linear array of CCDs that scan the ground in one direction to build up a two-dimensional image, as shown in Fig. 31. Linear arrays allow imaging larger areas in less time than framing cameras that use two-dimensional CCD arrays. These sensors typically have time delay and integration (TDI) stages to improve the signal. A single, continuous, long array can be difficult to build, so the arrays are generally built in staggered segments. The two-dimensional scene can be reconstructed from these staggered segments through ground processing. Every detector in an electronic image sensor, such as a CCD image sensor, may have a different response function that relates the target radiance to the number of photoelectrons generated. This response function can change with time or operating temperature. For a linear CCD detector array, this can lead to streaking in the image. The sensors are calibrated to reduce this nonuniformity, so that each detector has approximately the same response for the same illumination radiance. The calibration is generally performed by illuminating each detector with a given radiance from a calibration lamp and then recording the signal from each detector to
estimate the response function for each detector. Then, the response function for each detector is used in the ground processing to equalize the output of all of the detectors, so that uniform illumination across all of the detectors will produce a uniform output. Even when the detectors are calibrated, some errors from the calibration process are unavoidable. Each detector is sensitive to a slightly different spectrum of light, but they are all calibrated using the same calibration lamp that has a broad, nonuniform spectrum. Because the scene spectrum is unknown, the calibration process assumes that the spectra of the calibration lamp and the scene are identical. The spectrum of the calibration lamp will usually be somewhat different from the spectrum of the scene being imaged; hence, calibration errors will occur. Calibration errors also occur because the calibration process includes an incomplete model of the complete optical process and because the response function for each detector changes over time and operating temperature.

Figure 31. Push-broom sensors use a linear array of CCDs that scan the ground in one direction to build up a two-dimensional image.

Random noise in the signal arises from elements that add uncertainty to the signal level of the target and is quantified by the standard deviation of its statistical distribution. If the distribution of each of the different noise contributors follows a normal distribution, then the variance of the total noise is the sum of the variances of each noise contributor (10). For N independent noise contributors, the standard deviation of the noise is

$$\sigma_{noise} = \sqrt{\sum_{n=1}^{N}\sigma_{n}^{2}}. \qquad (30)$$

For images with high signals, the primary noise contributor is the photon noise, which arises from random fluctuations in the arrival rate of photons. The photon noise follows a Poisson distribution; therefore, the variance of the signal equals the expected signal level s:

$$\sigma_{photon} = \sqrt{s}. \qquad (31)$$

When s > 10, the Poisson distribution approximates a normal distribution. The radiance from the target is not the only light that reaches the detector. Scattered radiance from the atmosphere, as well as any stray light within the camera, will produce a background signal with the target signal at the detector. The background contribution adds an additional photon noise factor to the noise term; thus the photon noise, measured in electrons, is

$$\sigma_{photon} = \sqrt{\sigma_{photon\,target}^{2} + \sigma_{photon\,background}^{2}} = \sqrt{s_{target} + s_{background}}. \qquad (32)$$

As in calculating L_target, calculating the atmospheric contribution to the signal is a complicated process (6); therefore, radiometry models, such as MODTRAN, are generally used to calculate the background radiance component of s_background. When no light is incident on the CCD detector, electrons may still be generated due to the dark noise σ_dark. Although many factors contribute to the dark noise (10), the principal contributor to σ_dark at nominal operating integration times of less than one second is the CCD read noise, caused by variations in the detector voltage. The value of σ_dark for a digital sensor is usually obtained from test measurements of the detector at a given temperature. The analog-to-digital converter quantizes the signal when it is converted to digital counts. This produces an uncertainty in the target signal because a range of target signals can produce the same digital count value. The standard deviation of a uniform distribution is 1/√12; therefore, if the total number of electrons that can be stored at each detector, N_well depth, is divided by N_DR digital counts, where N_DR is the dynamic range in digital counts, then the quantization noise is

$$\sigma_{quantization} = \frac{N_{well\,depth}}{N_{DR}\sqrt{12}} = \frac{QSE}{\sqrt{12}}, \qquad (33)$$

where QSE is the quantum step equivalence in electrons per count. Combining Eq. (32) and Eq. (33) with the dark noise, the system noise in electrons can be written as

$$\sigma_{noise} = \sqrt{s_{target} + s_{background} + \sigma_{quantization}^{2} + \sigma_{dark}^{2}} \;\;(\mathrm{electrons}). \qquad (34)$$

The digital count value from the signal and noise at each detector represents a pixel, or picture element, in the image data that is then downlinked to the ground station using a transmitter that has a maximum transmission rate in bits per second.
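Eqs. (30)–(34) combine into a few lines of code; the well depth, background signal, and dark noise in the sketch below are assumed values used only for illustration.

```python
import numpy as np

def system_noise_electrons(s_target, s_background, well_depth, n_dr, sigma_dark):
    """Total system noise in electrons, Eq. (34)."""
    sigma_photon_sq = s_target + s_background        # Poisson photon noise, Eq. (32)
    qse = well_depth / n_dr                          # quantum step equivalence, e-/count
    sigma_quant = qse / np.sqrt(12.0)                # quantization noise, Eq. (33)
    return np.sqrt(sigma_photon_sq + sigma_quant**2 + sigma_dark**2)

# Assumed example: 20,000 e- well depth, 11-bit ADC, 75 e- dark noise
sigma = system_noise_electrons(s_target=9000.0, s_background=2000.0,
                               well_depth=20000.0, n_dr=2**11, sigma_dark=75.0)
print(sigma, 9000.0 / sigma)    # noise in electrons and the resulting SNR
```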
Figure 32. Reducing the number of bits per pixel (11, 5, 4, 3, 2, and 1 bpp) by simply quantizing the count values of each pixel.
If the number of bits per pixel can be reduced or "compressed," then more data can be transmitted within the limited bandwidth of the transmitter. For an 11-bpp (bits per pixel) detector, the dynamic range is N_DR = 2^11 = 2,048. Reducing the number of bits per pixel by simply quantizing the count values of each pixel throws away image information and also produces unacceptable quantization of the gray levels at compression levels of 4 bpp and less, as shown in Fig. 32. Bandwidth compression (BWC) algorithms attempt to reduce the average dynamic range of the image, that is, reduce the number of bits per pixel, while maintaining the image quality. Lossless BWC algorithms allow an exact reconstruction of the original image data, whereas "lossy" BWC algorithms trade off this ability to allow greater compression of the image data (5). Figure 33 shows the effect on image quality as an 11-bit image is compressed to fewer bits per pixel using an adaptive differential pulse code modulation (DPCM) algorithm. The 4:1 compression does not allow exact reconstruction of the image data, but the compression is visually lossless, that is, there is no impact on image quality. The image quality loss from the 6:1 and 11:1 compression, however, is significant. Many strategies exist for compressing image data, such as wavelets, fractals, and vector quantization. After the image data has been downlinked to the ground station and decompressed, it may be hard to interpret the image due to low contrast, blurred edges, streaks from
sensitivity differences between detectors, failed detectors, and discontinuities at the segment boundaries. Image processing techniques can be used to optimize the visual interpretability of the image data. Calibration removes the streaks, and the data from each detector segment are processed to synthesize an image collected by a single continuous detector array. The image is also processed to remove any image motion effects, such as vehicle oscillation, and geometric distortions that might occur. The image data can also be processed to a specific mapping geometry, such as orthorectification. Edge sharpening and contrast enhancement processes may be applied to generate the final image. The image chain ends with the human visual system if the image is exploited by an image analyst or alternatively by a computer for machine processing, such as feature classification (7). Figure 34 shows a series of images that illustrate the effects of the image chain as the ground scene is imaged and processed.

Figure 33. An 11-bit image compressed to fewer bits per pixel: 11 bpp (no compression), 2.8 bpp (4:1 compression), 1.8 bpp (6:1 compression), and 1.0 bpp (11:1 compression).
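The simple requantization that Fig. 32 warns against can be written directly; the sketch below (Python, operating on a synthetic image) shows why discarding the low-order bits produces gray-level contouring. It is not a model of the adaptive DPCM algorithm used for Fig. 33.

```python
import numpy as np

def requantize(image_counts, bits_in=11, bits_out=4):
    """Reduce the bits per pixel by simply discarding the low-order bits."""
    shift = bits_in - bits_out
    return (image_counts.astype(np.uint16) >> shift) << shift

img = np.random.randint(0, 2**11, size=(512, 512), dtype=np.uint16)   # synthetic 11-bpp image
img_4bpp = requantize(img, bits_in=11, bits_out=4)                     # only 16 gray levels remain
```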
Sensor Types

As mentioned previously, surveillance systems are generally categorized by the wavelength region of the electromagnetic spectrum that they image. The earliest cameras used black-and-white film that captured panchromatic images, that is, film that is sensitive to the visible portion of the electromagnetic spectrum. Even with CCD detectors today, black-and-white images are the most common because the light is captured over a broad spectrum of wavelengths and thus allows a high enough signal to collect high-resolution images. Visible imaging systems generally acquire images by using the sun as the illumination source. Multispectral imaging collects several images within different narrow bands of the electromagnetic spectrum that collectively span a broader portion of the spectrum. The most common multispectral image is a true color image; three images are collected where one images the red portion of the visible spectrum, another images the green portion of the spectrum, and the other images the blue portion of the spectrum. When these three images are viewed together on a color monitor, a true color image is produced. Figure 35 shows an example of the way multispectral images can improve the detection of military vehicles. Panchromatic (black-and-white), true color (blue, green, and red), and false color (green, red, and near infrared) images were generated using the digital image and remote sensing image generation (DIRSIG) (6) process developed at Rochester Institute of Technology (RIT). Note that the vehicles are not easily discernible from the background in the panchromatic image, but they are more readily detected in the true color image and are even more apparent in the false color image where the difference in the spectrum is greater. Hyperspectral images have much finer spectral resolution than multispectral images. A multispectral collection may divide a spectral region into a few bands, whereas a hyperspectral collection may divide the same spectral region into tens to hundreds of bands. Each pixel in a hyperspectral image contains spectral characteristics about the surface material imaged within that pixel. Hyperspectral imagery is collected to help identify and discriminate different materials in the scene and is generally used with machine vision algorithms. Figure 36 shows a hyperspectral cube of Moffet Field, California, where the spectral images are stacked in order of the spectral wavelength so that the z axis shows the spectrum of each pixel. The hyperspectral data was collected using NASA's Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor, which has 224 channels from 0.4 to 2.5 µm; each has a spectral bandwidth of approximately 10 nm. Figure 37 compares the 224 hyperspectral bands imaged by AVIRIS with the multispectral bands imaged by the Landsat Thematic Mapper.
Figure 34. Series of images illustrating the effects of the image chain (radiometry of the ground scene, optics, sensor, ground processing, and image enhancements) as the ground scene is imaged and processed.
Figure 35. Panchromatic (black-and-white), true color (blue, green, and red), and false color (green, red, and near infrared) images generated using RIT's DIRSIG process to simulate the detection of military vehicles (courtesy of RIT). See color insert.
Figure 36. Hyperspectral cube of Moffet Field, California, taken on 20 August, 1992, using the AVIRIS sensor (courtesy of NASA JPL).
Figure 37. The 224 hyperspectral bands imaged by AVIRIS compared with the multispectral bands imaged by the Landsat Thematic Mapper.

Infrared images are collected by using sensors that detect the electromagnetic spectrum between 0.7 and 100 µm, outside the range visible to the human eye. Figure 38 compares the spectral irradiance of the sun (T = 5,900 K) above the earth's atmosphere and the earth's spectral radiant exitance (T = 300 K), where each is modeled as a blackbody distribution using Eq. (1). The earth emits radiation at wavelengths in the infrared part of the spectrum above 3 µm. For this reason, infrared surveillance systems can be used to acquire overhead images at night. Infrared images are also used to detect and measure differences in the temperature of objects. Underground features that cause a temperature difference at the surface, such as hot pipes, can be detected, as shown in Fig. 39. Trucks or planes that have their engines running can be discriminated from those that are idle in an infrared image, whereas no differences may appear in a visible image.

Figure 38. Blackbody spectral curves for the sun (T = 5,900 K) and the earth (T = 300 K).

Synthetic aperture radar (SAR) imaging emits high-frequency radio signals and converts the returned signal into imagery. The motion of the platform is used to synthesize a larger antenna and produce higher resolution imagery. Because SAR supplies its own illumination source and the radio waves are not scattered by clouds, SAR surveillance systems can acquire images during any time of the day or night and under cloudy conditions. Figure 40 shows an example of an SAR image and illustrates the speckle characteristic of an SAR image due to the coherent nature of the radar signal.

Image Quality
Image quality is a broad term that encompasses many factors and has many measures. Image quality may have different meanings to different users; for example, a user of hyperspectral data will require high spectral resolution, whereas a user of visible panchromatic imagery may require high spatial resolution. The utility of an image should not be equated with the quality of the image. For example, geographic surveys can be performed better with overhead images that trade off lower resolution for a larger area of coverage. Figure 41 is a simulation of two overhead images acquired from the same camera but at different altitudes. The image acquired at the lower altitude has more detail that can be resolved on the ground, but much of the surrounding information is lost, which may be very important for understanding the context of the objects on the ground. The image quality depends on each element of the image chain, illustrated in Fig. 24. Assuming that all elements of the image chain have been optimized to maximize image quality, the primary limitations on image quality for most overhead surveillance systems will be the spatial resolution and the SNR (signal-to-noise ratio).

Figure 39. Visible and infrared image of the same scene. Note that the hot underground pipe is visible in the infrared image.
Figure 40. SAR image of the U.S. Capitol building (courtesy of Sandia National Laboratories).
Figure 41. Simulation of two overhead images acquired from the same camera but at different altitudes (high-altitude and low-altitude collection).

Spatial Resolution

The highest spatial resolution, that is, the resolving power, of an imaging system is the highest spatial frequency that can be resolved in the final image. In film systems, the limiting resolution was determined by imaging ground resolution targets and then determining the ground resolvable distance (GRD) of the film product. The performance of the optics, the film, and the stability of the collection system primarily limited the GRD. Most digital overhead surveillance systems use the ground sampled distance (GSD) as the measure for spatial resolution. The GSD, however, refers only to the detector sampling projected onto the ground and, unlike the GRD, ignores any effects that the optical system may have on spatial resolution. Even if the detector sampling is the limiting factor in spatial resolution, the interaction between detector sampling and the performance of the optics plays an important role in determining final image quality. The GSD is usually the only figure of merit used to communicate the image quality of an overhead surveillance system that uses a digital camera. Figure 42 shows image simulations of various imaging system designs at the same GSD, but each has a different system MTF, SNR, and amount of image motion that results in very different image quality. The detector resolution and the optical resolution for all camera systems must be understood to assess image quality.

Detector Resolution
Assuming that the digital detector has good signal performance, the detector sampling pitch, that is, the distance between the centers of adjacent detector elements, limits the highest spatial frequency that can be sampled without aliasing. Spatial frequencies higher than the Nyquist frequency, defined by (9)

$$\nu_{N} = \text{Nyquist frequency} \equiv \frac{1}{2p}, \qquad (35)$$

where p is the detector sampling pitch, will be aliased and will appear as spatial frequencies less than the Nyquist
Figure 42. Images at the same GSD but of different image quality.
frequency in the image. Figure 43 illustrates the effect of aliasing on an image when the sampling pitch is increased. For this discussion, the imaging detector will be modeled as a linear array of detector elements that is imaged in a push-broom fashion, as shown in Fig. 31. The GSD is calculated by projecting the detector sampling pitch through the imaging system onto the ground. If the fill factor is 100% for each detector element, then the GSD is equal to the ground instantaneous field of view (GIFOV), which is the linear extent of each detector projected onto the ground. The GSD in one dimension for a nadir viewing geometry is

$$\mathrm{GSD} = p\,\frac{h}{f}, \qquad (36)$$

where p is the detector pitch, h is the altitude of the satellite, and f is the effective focal length of the optical system. In terms of the GSD, the highest spatial frequency on the ground that can be sampled without aliasing is

$$\nu_{N} = \frac{h}{2f\,\mathrm{GSD}}. \qquad (37)$$

Referring to Fig. 44, the slant range H is the distance from the satellite to the ground target being imaged and is given by

$$H = \sqrt{(R+h)^{2} - R^{2}\cos^{2}(\mathrm{ISEL})} - R\sin(\mathrm{ISEL}), \qquad (38)$$

where R is the radius of the earth. The imaging satellite elevation angle (ISEL) is the angle between the line from the ground target to the satellite and the line tangent to the earth at the ground target. The ISEL is related to the look angle L, which is the angle between the line from the satellite to the ground target and the line from the satellite to the center of the earth, by

$$\sin(L) = \left(\frac{R}{R+h}\right)\cos(\mathrm{ISEL}). \qquad (39)$$

In two dimensions, the GSD is commonly defined as the geometric mean of the x and y detector pitch projected to the ground. The x axis is associated with the cross-scan direction of the array and the y axis is associated with the along-scan direction. Because the GSD is the sampling distance projected onto the ground plane, the sampling distance along the line of sight will be increased by a factor of 1/sin(ISEL). If β is the angle between the x direction of the digital array and the line perpendicular from the line of sight, then the GSD in the x direction will be

$$\mathrm{GSD}_{x} = p\,\frac{H}{f}\left[\cos^{2}(\beta) + \left(\frac{\sin(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]^{1/2}. \qquad (40)$$

Then, the geometric mean GSD is

$$\mathrm{GSD} = \sqrt{\mathrm{GSD}_{x}\,\mathrm{GSD}_{y}} = p\,\frac{H}{f}\left\{\left[\cos^{2}(\beta) + \left(\frac{\sin(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\left[\sin^{2}(\beta) + \left(\frac{\cos(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\right\}^{1/4}. \qquad (41)$$

Figure 43. The effect of aliasing on an image when the sampling pitch is increased (original image; sampling pitch increased 2×, 3×, 4×, 5×, and 6×).
Figure 44. Satellite acquisition geometry used to calculate the off-nadir GSD.

Diffraction Resolution

The diffraction resolution of a diffraction-limited incoherent imaging system is determined by the optics. The diffraction of light imaged through the optics causes a point of light to spread out, as described by the point-spread function (PSF) (8,9). The diffraction-limited incoherent optics PSF for a circular aperture is given by

$$\mathrm{PSF}_{optics}(r) = \left[\frac{2J_{1}\!\left(\dfrac{\pi D r}{\lambda f}\right)}{\dfrac{\pi D r}{\lambda f}}\right]^{2}, \qquad (42)$$

where r = √(x² + y²) and J₁(r) is the first-order Bessel function. The modulus of the Fourier transform of the optics PSF is the MTF. The MTF of a diffraction-limited incoherent optical system that has a circular aperture has a distinct spatial frequency cutoff ν_c, given by (8,9)

$$\nu_{c} = \frac{D}{\lambda f} = \frac{1}{\lambda(f\#)}, \qquad (43)$$

where D is the diameter of the optics aperture, f is the focal length, λ is the wavelength of light, and f# is the system f number that equals f/D. This spatial frequency cutoff limits the spatial resolution that can be imaged with the optical system. This cutoff does not change, even if a central obscuration is placed in the optical aperture. Although it is common in the optics community to define the width of the diffraction-limited incoherent optics PSF for a circular aperture as the diameter of the first zero in PSF_optics from Eq. (42), given by 2.44λ(f#), it will be convenient in this analysis if we define the PSF width as λ(f#), which is approximately the full width at half maximum (FWHM) of the PSF for a circular aperture. If the optics ground spot size (GSS_optics) is defined as the width of the optics PSF projected through the imaging system onto the ground, as illustrated in Fig. 45, then the GSS_optics at a nadir viewing geometry for a diffraction-limited incoherent optical system that has a circular aperture is

$$\mathrm{GSS}_{optics} = \lambda\,\frac{h}{D} = \lambda(f\#)\,\frac{h}{f}. \qquad (44)$$

For viewing geometries off-nadir, the GSS_optics can be calculated as the geometric mean of the projection onto the ground of the x and y directions, given by

$$\mathrm{GSS}_{optics} = \sqrt{\mathrm{GSS}_{optics_{x}}\,\mathrm{GSS}_{optics_{y}}} = \lambda\,\frac{H}{D}\left\{\left[\cos^{2}(\beta) + \left(\frac{\sin(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\left[\sin^{2}(\beta) + \left(\frac{\cos(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\right\}^{1/4}. \qquad (45)$$

Figure 45. The optics PSF projected onto the ground.
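Eqs. (38), (41), and (45) share the same acquisition-geometry factor and are easy to evaluate together. The sketch below uses assumed, IKONOS-like numbers (680-km altitude, 10-m focal length, 0.7-m aperture, 12-µm pixels) purely for illustration.

```python
import numpy as np

R_EARTH = 6.378e6   # m, radius of the earth (assumed spherical)

def slant_range(h, isel_deg):
    """Slant range H from the satellite to the ground target, Eq. (38)."""
    isel = np.radians(isel_deg)
    return np.sqrt((R_EARTH + h)**2 - (R_EARTH * np.cos(isel))**2) - R_EARTH * np.sin(isel)

def geometry_factor(beta_deg, isel_deg):
    """The bracketed projection term shared by Eqs. (41) and (45)."""
    b, isel = np.radians(beta_deg), np.radians(isel_deg)
    t1 = np.cos(b)**2 + (np.sin(b) / np.sin(isel))**2
    t2 = np.sin(b)**2 + (np.cos(b) / np.sin(isel))**2
    return (t1 * t2) ** 0.25

def mean_gsd(p, f, h, isel_deg, beta_deg=0.0):
    """Geometric mean ground sample distance, Eq. (41)."""
    return p * slant_range(h, isel_deg) / f * geometry_factor(beta_deg, isel_deg)

def mean_gss(lam, d_aperture, h, isel_deg, beta_deg=0.0):
    """Geometric mean optics ground spot size, Eq. (45)."""
    return lam * slant_range(h, isel_deg) / d_aperture * geometry_factor(beta_deg, isel_deg)

# Assumed IKONOS-like numbers, nadir view (ISEL = 90 deg)
print(mean_gsd(p=12e-6, f=10.0, h=680e3, isel_deg=90.0))              # ~0.8 m
print(mean_gss(lam=0.55e-6, d_aperture=0.7, h=680e3, isel_deg=90.0))  # ~0.5 m
```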
Optimizing Detector and Diffraction Resolution

A digital imaging system can be designed so that the system resolution is either detector-limited or diffraction-limited. The two resolution limits can be compared to one another by calculating the ratio of the sampling frequency to the optical bandpass limit of the optical system. For a diffraction-limited incoherent optical system, this ratio is

$$\frac{\text{Detector sampling frequency}}{\text{Optical bandpass limit}} = \frac{1/p}{1/[\lambda(f\#)]} = \frac{\lambda(f\#)}{p}. \qquad (46)$$

In terms of the Nyquist frequency and the cutoff frequency,

$$\frac{\lambda(f\#)}{p} = \frac{2\nu_{N}}{\nu_{c}}. \qquad (47)$$

λ(f#)/p can also be interpreted as a measure of how finely the detector samples the diffraction-limited optics PSF. Thus, in remote sensing, λ(f#)/p is interpreted as a measure of how finely the GSD samples the ground scene with respect to the GSS_optics, given by

$$\frac{\mathrm{GSS}_{optics}}{\mathrm{GSD}} = \frac{\lambda\,\dfrac{H}{D}\left\{\left[\cos^{2}(\beta) + \left(\dfrac{\sin(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\left[\sin^{2}(\beta) + \left(\dfrac{\cos(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\right\}^{1/4}}{p\,\dfrac{H}{f}\left\{\left[\cos^{2}(\beta) + \left(\dfrac{\sin(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\left[\sin^{2}(\beta) + \left(\dfrac{\cos(\beta)}{\sin(\mathrm{ISEL})}\right)^{2}\right]\right\}^{1/4}} = \frac{\lambda(f\#)}{p}. \qquad (48)$$

Figure 46 shows the sampling of the optics PSF from Eq. (42) for λ(f#)/p equal to 1, 2, and 3. For panchromatic systems, λ represents the mean wavelength. Note that λ(f#)/p is independent of the acquisition geometry; that is, it is a fundamental design parameter of the imaging system. The diffraction resolution equals the detector resolution when λ(f#)/p = 2 because the optics MTF falls to zero at the Nyquist frequency and ν_N = ν_c. If λ(f#)/p < 2, then the spatial resolution is limited by the detector because ν_N < ν_c. Spatial frequencies captured by the optics that are above ν_N will be manifested as aliasing artifacts. If λ(f#)/p > 2, then the spatial resolution is limited by the diffraction of the optics because ν_c < ν_N. This would imply that an overhead surveillance system designed to λ(f#)/p = 2 by reducing the GSD or the GSS_optics will provide the best image quality, yet remote sensing systems are generally designed so that λ(f#)/p < 2. It is important to note that spatial resolution alone is not synonymous with image quality. Although more aliasing occurs as λ(f#)/p is reduced, the higher MTF below ν_N will cause edges in the image to appear sharper. Figure 47 shows image simulations that illustrate the improvement in image sharpness as λ(f#)/p decreases (11). Stronger sharpening filters could be used to improve the edge sharpness of the higher λ(f#)/p images, but the noise gain and edge overshoot effects would be undesirable. Another important benefit of reducing λ(f#)/p is that the SNR improves.

Figure 46. λ(f#)/p as a measure of how finely the detector samples the optics PSF, shown for λ(f#)/p = 1, 2, and 3 (the first zero of the PSF has a diameter of 2.44λ(f#)).
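The design ratio itself is a one-line calculation; continuing the assumed numbers from the sketch above:

```python
def lambda_fn_over_p(wavelength_m, f_number, pitch_m):
    """The design ratio lambda*(f#)/p of Eqs. (46)-(48)."""
    return wavelength_m * f_number / pitch_m

# Assumed values: 0.55-um mean wavelength, f/14.3 optics (10 m / 0.7 m), 12-um pitch
q = lambda_fn_over_p(0.55e-6, 14.3, 12e-6)
print(q)   # ~0.66 < 2: detector-limited (nu_N < nu_c), so some aliasing is accepted
```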
Figure 47. Changing λ(f#)/p while maintaining a GSD of 1 meter at constant SNR (λ(f#)/p = 0.3, 1.0, 1.5, and 2.0).

Signal-to-Noise Ratio (SNR)

The quality of an overhead surveillance image depends on the amount of signal that is captured from the desired ground target compared to the noise level in the image. Many different metrics have surfaced in the remote sensing community over the years to define the SNR. In their basic form, all of the metrics relate a signal level to a noise level, that is,

$$\mathrm{SNR} \equiv \frac{\text{signal}}{\text{noise}}, \qquad (49)$$

but the differences arise in what is considered signal and what is considered noise. Most SNR metrics compare the mean signal with the standard deviation of the noise, such that

$$\mathrm{SNR} = \frac{\text{mean target signal}}{\text{standard deviation of the noise}} = \frac{s_{target}}{\sigma_{noise}}, \qquad (50)$$

where s_target is given by Eq. (29) and σ_noise is given by Eq. (34). This SNR calculation for a remote sensing system design would be straightforward, except for the calculation of the target spectral radiance L_target(λ). The target spectral radiance depends on the imaging collection parameters, that is, the solar angle, the atmospheric conditions, and the viewing geometry of the remote sensing system, as well as the target reflectance ρ_target(λ). Figure 48 shows the effects of the atmosphere on image quality. As the visibility of the atmosphere decreases, the contrast of the image decreases. Image processing used to improve the image contrast also enhances the noise in the image.

Figure 48. The visibility of the atmosphere and the look angle influence the SNR and the quality of the image (horizontal visibility of 48, 24, and 8 km at a nadir viewing geometry, and 8 km at a 60° look angle).

Assuming that "typical" imaging collection parameters will be used to calculate the SNR, the most common difference between SNR metrics is the value used for ρ_target(λ). A common assumption is to use the signal from a 100% reflectance target, given by

$$\mathrm{SNR}_{\rho=100\%} = \frac{s_{target}\big|_{\rho_{target}=100\%}}{\sigma_{noise}}, \qquad (51)$$

where the vertical line "|" means "evaluated at." This metric is not very realistic for remote sensing purposes, so values closer to the average reflectance of the earth are used instead. For land surfaces, the average reflectance of the earth between λ_min = 0.4 µm and λ_max = 0.9 µm is approximately 15% but will vary depending on the terrain type, such as soil and vegetation. Two targets cannot be distinguished from one another in the image if the difference between their reflectance values is less than the signal differences caused by noise. Therefore, it is beneficial to define the signal that uses the difference of the reflectance between two targets (or a target and its background):

$$\Delta\rho = \rho_{high} - \rho_{low}. \qquad (52)$$

The SNR metric for the reflectance difference between the two targets is

$$\mathrm{SNR}_{\Delta\rho} = \frac{s_{target}\big|_{\rho_{target}=\rho_{high}} - s_{target}\big|_{\rho_{target}=\rho_{low}}}{\sigma_{noise}} = \frac{s_{target}\big|_{\rho_{target}=\Delta\rho}}{\sigma_{noise}}. \qquad (53)$$

This SNR metric is used often in remote sensing, but the value of SNR_Δρ depends on the values chosen for ρ_high and ρ_low. The value for ρ_high is typically used to calculate the photon noise in σ_noise. Another metric commonly used is the noise equivalent change in reflectance, or NEΔρ, which calculates the difference of the reflectance between two targets that is equivalent to the standard deviation of the noise. Therefore, it will be difficult to differentiate two targets that have reflectance differences less than the NEΔρ due to the noise. The NEΔρ can be calculated by solving SNR_Δρ for Δρ. If Δρ is independent of λ, then the NEΔρ is simply

$$\mathrm{NE}\Delta\rho = \frac{\Delta\rho}{\mathrm{SNR}_{\Delta\rho}} = \frac{\sigma_{noise}}{s_{target}\big|_{\rho_{target}=100\%}} = \frac{1}{\mathrm{SNR}_{\rho=100\%}}. \qquad (54)$$
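Eqs. (53) and (54) can be evaluated once the signal model of Eq. (29) is available. The toy example below assumes a signal that is linear in reflectance and an arbitrary noise level; the numbers are illustrative only.

```python
def snr_delta_rho(s_of_rho, rho_high, rho_low, sigma_noise):
    """SNR for the reflectance difference between two targets, Eq. (53).

    s_of_rho is a callable giving the target signal in electrons for a given
    reflectance (for example, Eq. (29) evaluated with L_target at that reflectance).
    """
    return (s_of_rho(rho_high) - s_of_rho(rho_low)) / sigma_noise

def ne_delta_rho(s_of_rho, sigma_noise):
    """Noise equivalent change in reflectance, Eq. (54), for a signal linear in rho."""
    return sigma_noise / s_of_rho(1.0)

# Toy model: 60,000 electrons for a 100% reflector and 110 electrons of noise (assumed)
s = lambda rho: 60000.0 * rho
print(snr_delta_rho(s, 0.17, 0.15, 110.0))   # SNR for a 2% reflectance difference
print(ne_delta_rho(s, 110.0))                # ~0.0018, i.e., about 0.2% reflectance
```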
Figure 49 shows image simulations for a camera design at different SNR and NEΔρ values. A 15% target reflectance value and 75 electrons of dark noise were used.

Figure 49. Image simulations for a camera design at different SNR and NEΔρ values: SNR(ρ = 15%) = 84, 12, 6, and 1, corresponding to NEΔρ = 0.2%, 1.3%, 2.4%, and 11%.

National Imagery Interpretability Rating Scale

The National Imagery Interpretability Rating Scale (NIIRS) is a 0–9 scale developed by the U.S. government's Imagery Resolution Assessment and Reporting Standards (IRARS) Committee (12). The NIIRS is an exploitation task-based scale that quantifies the interpretability of
Table 1. Visible NIIRS Criteria

Visible NIIRS 0: Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.
Visible NIIRS 1: Detect a medium-sized port facility and/or distinguish between taxiways and runways at a large airfield.
Visible NIIRS 2: Detect large hangars at airfields. Detect large static radars (e.g., AN/FPS-85, COBRA DANE, PECHORA, HENHOUSE). Detect military training areas. Identify an SA-5 site based on road pattern and overall site configuration. Detect large buildings at a naval facility (e.g., warehouses, construction hall). Detect large buildings (e.g., hospitals, factories).
Visible NIIRS 3: Identify the wing configuration (e.g., straight, swept, delta) of all large aircraft (e.g., 707, CONCORD, BEAR, BLACKJACK). Identify radar and guidance areas at a SAM site by the configuration, mounds, and presence of concrete aprons. Detect a helipad by the configuration and markings. Detect the presence/absence of support vehicles at a mobile missile base. Identify a large surface ship in port by type (e.g., cruiser, auxiliary ship, noncombatant/merchant). Detect trains or strings of standard rolling stock on railroad tracks (not individual cars).
Visible NIIRS 4: Identify all large fighters by type (e.g., FENCER, FOXBAT, F-15, F-14). Detect the presence of large individual radar antennas (e.g., TALL KING). Identify, by general type, tracked vehicles, field artillery, large river crossing equipment, wheeled vehicles when in groups. Detect an open missile silo door. Determine the shape of the bow (pointed or blunt/rounded) on a medium-sized submarine (e.g., ROMEO, HAN, Type 209, CHARLIE II, ECHO II, VICTOR II/III). Identify individual tracks, rail pairs, control towers.
Visible NIIRS 5: Distinguish between a MIDAS and a CANDID by the presence of refueling equipment (e.g., pedestal and wing pod). Identify radar as vehicle-mounted or trailer-mounted. Identify, by type, deployed tactical SSM systems (e.g., FROG, SS-21, SCUD). Distinguish between SS-25 mobile missile TEL and Missile Support Vans (MSVs) in a known support base, when not covered by camouflage. Identify TOP STEER or TOP SAIL air surveillance radar on KIROV-, SOVREMENNY-, KIEV-, SLAVA-, MOSKVA-, KARA-, or KRESTA-II-class vessels.
Visible NIIRS 6: Distinguish between models of small/medium helicopters (e.g., HELIX A from HELIX B from HELIX C, HIND D from HIND E, HAZE A from HAZE B from HAZE C). Identify the shape of antennas on EW/GCI/ACQ radars as parabolic, parabolic with clipped corners, or rectangular. Identify the spare tire on a medium-sized truck. Distinguish between SA-6, SA-11, and SA-17 missile airframes. Identify individual launcher covers (8) of vertically launched SA-N-6 on SLAVA-class vessels. Identify automobiles as sedans or station wagons.
Visible NIIRS 7: Identify fitments and fairings on a fighter-sized aircraft (e.g., FULCRUM, FOXHOUND). Identify ports, ladders, vents on electronics vans. Detect the mount for antitank guided missiles (e.g., SAGGER on BMP-1). Detect details of the silo door hinging mechanism on Type III-F, III-G, and III-H launch silos and Type III-X launch control silos. Identify the individual tubes of the RBU on KIROV-, KARA-, KRIVAK-class vessels. Identify individual rail ties.
Visible NIIRS 8: Identify the rivet lines on bomber aircraft. Detect horn-shaped and W-shaped antennas mounted atop BACKTRAP and BACKNET radars. Identify a hand-held SAM (e.g., SA-7/14, REDEYE, STINGER). Identify joints and welds on a TEL or TELAR. Detect winch cables on deck-mounted cranes. Identify windshield wipers on a vehicle.
Visible NIIRS 9: Differentiate cross-slot from single-slot heads on aircraft skin panel fasteners. Identify small light-toned ceramic insulators that connect wires of an antenna canopy. Identify vehicle registration numbers (VRN) on trucks. Identify screws and bolts on missile components. Identify braid of ropes (1–3 inches in diameter). Detect individual spikes in railroad ties.
Table 2. Civilian NIIRS Criteria

NIIRS 0. Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.

NIIRS 1. Distinguish between major land use classes (e.g., urban, agricultural, forest, water, barren). Detect a medium-sized port facility. Distinguish between runways and taxiways at a large airfield. Identify large area drainage patterns by type (e.g., dendritic, trellis, radial).

NIIRS 2. Identify large (i.e., greater than 160-acre) center-pivot irrigated fields during the growing season. Detect large buildings (e.g., hospitals, factories). Identify road patterns, like clover leafs, on major highway systems. Detect ice-breaker tracks. Detect the wake from a large (e.g., greater than 300-foot) ship.

NIIRS 3. Detect large area (i.e., larger than 160 acres) contour plowing. Detect individual houses in residential neighborhoods. Detect trains or strings of standard rolling stock on railroad tracks (not individual cars). Identify inland waterways navigable by barges. Distinguish between natural forest stands and orchards.

NIIRS 4. Identify farm buildings as barns, silos, or residences. Count unoccupied railroad tracks along right-of-way or in a railroad yard. Detect basketball court, tennis court, volleyball court in urban areas. Identify individual tracks, rail pairs, control towers, switching points in rail yards. Detect jeep trails through grassland.

NIIRS 5. Identify Christmas tree plantations. Identify individual rail cars by type (e.g., gondola, flat, box) and locomotives by type (e.g., steam, diesel). Detect open bay doors of vehicle storage buildings. Identify tents (larger than two person) at established recreational camping areas. Distinguish between stands of coniferous and deciduous trees during leaf-off condition. Detect large animals (e.g., elephants, rhinoceros, giraffes) in grasslands.

NIIRS 6. Detect narcotics intercropping based on texture. Distinguish between row (e.g., corn, soybean) crops and small grain (e.g., wheat, oats) crops. Identify automobiles as sedans or station wagons. Identify individual telephone/electric poles in residential neighborhoods. Detect foot trails through barren areas.

NIIRS 7. Identify individual mature cotton plants in a known cotton field. Identify individual railroad ties. Detect individual steps on a stairway. Detect stumps and rocks in forest clearings and meadows.

NIIRS 8. Count individual baby pigs. Identify a USGS benchmark set in a paved surface. Identify grill detailing and/or the license plate on a passenger/truck type vehicle. Identify individual pine seedlings. Identify individual water lilies on a pond. Identify windshield wipers on a vehicle.

NIIRS 9. Identify individual grain heads on small grains (e.g., wheat, oats, barley). Identify individual barbs on a barbed wire fence. Detect individual spikes in railroad ties. Identify individual bunches of pine needles. Identify an ear tag on large game animals (e.g., deer, elk, moose).
an image and has become a principal measure of image quality for reconnaissance systems (13). An experienced image analyst can assign an NIIRS rating to an image. The specific objects listed in the NIIRS criteria do not need to be in the image, but the experience of the image analyst is required to decide if the NIIRS criteria would be
met if the information were present in the image. If more information can be extracted from the image, then the NIIRS rating will increase. Therefore, the NIIRS is directly related to the quality of the image being exploited and is an important tool for defining the imaging requirements of surveillance systems.
Table 3. Infrared NIIRS Criteria

NIIRS 0. Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.

NIIRS 1. Distinguish between runways and taxiways on the basis of size, configuration, or pattern at a large airfield. Detect a large (e.g., greater than 1 square kilometer) cleared area in dense forest. Detect large ocean-going vessels (e.g., aircraft carrier, supertanker, KIROV) in open water. Detect large areas (e.g., greater than 1 square kilometer) of marsh/swamp.

NIIRS 2. Detect large aircraft (e.g., C-141, 707, BEAR, CANDID, CLASSIC). Detect individual large buildings (e.g., hospitals, factories) in an urban area. Distinguish between densely wooded, sparsely wooded, and open fields. Identify an SS-25 base by the pattern of buildings and roads. Distinguish between naval and commercial port facilities based on type and configuration of large functional areas.

NIIRS 3. Distinguish between large (e.g., C-141, 707, BEAR, A300 AIRBUS) and small aircraft (e.g., A-4, FISHBED, L-39). Identify individual thermally active flues running between the boiler hall and smoke stacks at a thermal power plant. Detect a large air warning radar site based on the presence of mounds, revetments, and security fencing. Detect a driver training track at a ground forces garrison. Identify individual functional areas (e.g., launch sites, electronics area, support area, missile handling area) of an SA-5 launch complex. Distinguish between large (e.g., greater than 200-meter) freighters and tankers.

NIIRS 4. Identify the wing configuration of small fighter aircraft (e.g., FROGFOOT, F-16, FISHBED). Detect a small (e.g., 50-meter square) electrical transformer yard in an urban area. Detect large (e.g., greater than 10-meter diameter) environmental domes at an electronics facility. Detect individual thermally active vehicles in garrison. Detect thermally active SS-25 MSVs in garrison. Identify individual closed cargo hold hatches on large merchant ships.

NIIRS 5. Distinguish between single-tail (e.g., FLOGGER, F-16, TORNADO) and twin-tailed (e.g., F-15, FLANKER, FOXBAT) fighters. Identify outdoor tennis courts. Identify the metal lattice structure of large (e.g., approximately 75-meter) radio relay towers. Detect armored vehicles in a revetment. Detect a deployed transportable electronics tower (TET) at an SA-10 site. Identify the stack shape (e.g., square, round, oval) on large (e.g., greater than 200-meter) merchant ships.

NIIRS 6. Detect wing-mounted stores (i.e., ASM, bombs) protruding from the wings of large bombers (e.g., B-52, BEAR, BADGER). Identify individual thermally active engine vents atop diesel locomotives. Distinguish between a FIX FOUR and FIX SIX site based on antenna pattern and spacing. Distinguish between thermally active tanks and APCs (armored personnel carriers). Distinguish between a two-rail and four-rail SA-3 launcher. Identify missile tube hatches on submarines.

NIIRS 7. Distinguish between ground attack and interceptor versions of the MIG-23 FLOGGER based on the shape of the nose. Identify automobiles as sedans or station wagons. Identify antenna dishes (less than 3 meters in diameter) on a radio relay tower. Identify the missile transfer crane on an SA-6 transloader. Distinguish between an SA-2/CSA-1 and a SCUD-B missile transporter, when missiles are not loaded. Detect mooring cleats or bollards on piers.

NIIRS 8. Identify the RAM airscoop on the dorsal spine of FISHBED J/K/L. Identify limbs (e.g., arms, legs) on an individual. Identify individual horizontal and vertical ribs on a radar antenna. Detect closed hatches on a tank turret. Distinguish between fuel and oxidizer multisystem propellant transporters based on twin or single fitments on the front of the semitrailer. Identify individual posts and rails on deck edge life rails.

NIIRS 9. Identify access panels on fighter aircraft. Identify cargo (e.g., shovels, rakes, ladders) in an open-bed, light-duty truck. Distinguish between BIRDS EYE and BELL LACE antennas based on the presence or absence of small dipole elements. Identify turret hatch hinges on armored vehicles. Identify individual command guidance strip antennas on an SA-2/CSA-1 missile. Identify individual rungs on bulkhead-mounted ladders.
Table 4. Radar NIIRS Criteria

NIIRS 0. Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.

NIIRS 1. Detect the presence of aircraft dispersal parking areas. Detect a large, cleared swath in a densely wooded area. Detect a port facility based on presence of piers and warehouses. Detect lines of transportation (either road or rail), but do not distinguish between.

NIIRS 2. Detect the presence of large (e.g., BLACKJACK, CAMBER, COCK, 707, 747) bombers or transports. Identify large phased-array radars (e.g., HEN HOUSE, DOG HOUSE) by type. Detect a military installation by building pattern and site configuration. Detect road pattern, fence, and hardstand configuration at SSM launch sites (missile silos, launch control silos) within a known ICBM complex. Detect large noncombatant ships (e.g., freighters or tankers) at a known port facility. Identify athletic stadiums.

NIIRS 3. Detect medium-sized aircraft (e.g., FENCER, FLANKER, CURL, COKE, F-15). Identify an ORBITA site on the basis of a 12-meter dish antenna normally mounted on a circular building. Detect vehicle revetments at a ground forces facility. Detect vehicles/pieces of equipment at an SAM, SSM, or ABM fixed missile site. Determine the location of the superstructure (e.g., fore, amidships, aft) on a medium-sized freighter. Identify a medium-sized (approx. six-track) railroad classification yard.

NIIRS 4. Distinguish between large rotary-wing and medium fixed-wing aircraft (e.g., HALO helicopter versus CRUSTY transport). Detect recent cable scars between facilities or command posts. Detect individual vehicles in a row at a known motor pool. Distinguish between open and closed sliding roof areas on a single bay garage at a mobile missile base. Identify square bow shape of ROPUCHA class (LST). Detect all rail/road bridges.

NIIRS 5. Count all medium helicopters (e.g., HIND, HIP, HAZE, HOUND, PUMA, WASP). Detect deployed TWIN EAR antenna. Distinguish between river crossing equipment and medium/heavy armored vehicles by size and shape (e.g., MTU-20 vs. T-62 MBT). Detect missile support equipment at an SS-25 RTP (e.g., TEL, MSV). Distinguish bow shape and length/width differences of SSNs. Detect the break between railcars (count railcars).

NIIRS 6. Distinguish between variable and fixed-wing fighter aircraft (e.g., FENCER vs. FLANKER). Distinguish between the BAR LOCK and SIDE NET antennas at a BAR LOCK/SIDE NET acquisition radar site. Distinguish between small support vehicles (e.g., UAZ-69, UAZ-469) and tanks (e.g., T-72, T-80). Identify SS-24 launch triplet at a known location. Distinguish between the raised helicopter deck on a KRESTA II (CG) and the helicopter deck with main deck on a KRESTA I (CG).

NIIRS 7. Identify small fighter aircraft by type (e.g., FISHBED, FITTER, FLOGGER). Distinguish between electronics van trailers (without tractor) and van trucks in garrison. Distinguish, by size and configuration, between a turreted, tracked APC and a medium tank (e.g., BMP-1/2 vs. T-64). Detect a missile on the launcher in an SA-2 launch revetment. Distinguish between bow-mounted missile system on KRIVAK I/II and bow-mounted gun turret on KRIVAK III. Detect road/street lamps in an urban residential area or military complex.

NIIRS 8. Distinguish the fuselage difference between a HIND and a HIP helicopter. Distinguish between the FAN SONG E missile control radar and the FAN SONG F based on the number of parabolic dish antennas (three vs. one). Identify the SA-6 transloader when other SA-6 equipment is present. Distinguish limber hole shape and configuration differences between DELTA I and YANKEE I (SSBNs). Identify the dome/vent pattern on rail tank cars.

NIIRS 9. Detect major modifications to large aircraft (e.g., fairings, pods, winglets). Identify the shape of antennas on EW/GCI/ACQ radars as parabolic, parabolic with clipped corners, or rectangular. Identify, based on presence or absence of turret, size of gun tube, and chassis configuration, wheeled or tracked APCs by type (e.g., BTR-80, BMP-1/2, MT-LB, M113). Identify the forward fins on an SA-3 missile. Identify individual hatch covers of vertically launched SA-N-6 surface-to-air system. Identify trucks as cab-over-engine or engine-in-front.
Separate NIIRS criteria have been developed for visible, infrared, radar, and multispectral sensor systems because the exploitation tasks for each sensor type can be very different. The visible NIIRS criteria are shown in Table 1. The NIIRS criteria were originally developed for military applications. Realizing the need to relate NIIRS to commercial systems, a civilian NIIRS for visible imagery was eventually developed; it is shown in Table 2. The infrared, radar, and multispectral NIIRS criteria are shown in Tables 3, 4, and 5, respectively. Although NIIRS is defined as an integral scale, ΔNIIRS (delta-NIIRS) ratings at fractional NIIRS are used to measure small differences in image quality between two images. A ΔNIIRS of less than 0.1 NIIRS is usually not perceptible and does not impact the interpretability of the image, whereas a ΔNIIRS of more than 0.2 NIIRS is easily perceptible. The NIIRS scale is designed so that ΔNIIRS ratings are independent of the NIIRS rating of the image; for example, a degradation that produces a 0.2 NIIRS loss in image quality on a NIIRS 6 image will also produce a 0.2 NIIRS loss on a NIIRS 4 image. An image quality equation (IQE) is a tool designed to predict the NIIRS rating of an image, given an imaging system design and collection parameters. The General Image Quality Equation (GIQE), version 4, for visible EO systems is (14)
NIIRS = 10.251 − a log10(GSD_GM) + b log10(RER_GM) − 0.656 H_GM − 0.344(G/SNR),    (55)

where GSD_GM is the geometric mean ground sample distance (GSD), RER_GM is the geometric mean of the normalized relative edge response (RER), H_GM is the geometric mean of the height of the overshoot caused by the edge sharpening, G is the noise gain from the edge sharpening, and SNR is the signal-to-noise ratio. Figure 50 illustrates the calculation of RER and H from a normalized edge response. The SNR calculations use Eq. (53), where ρhigh = 15% and ρlow = 7%. The coefficient a equals 3.32 and b equals 1.559 if RER_GM ≥ 0.9; a equals 3.16 and b equals 2.817 if RER_GM < 0.9. Figure 51 shows image simulations at one-NIIRS increments to illustrate the change in image quality as NIIRS increases from NIIRS 3 to NIIRS 6. Figure 52 shows the decimal NIIRS predictions to illustrate the image quality change at 0.2 NIIRS increments.

Figure 50. RER and H calculation from a normalized edge response (RER is the slope of the normalized edge response; H is the height of the overshoot of the enhanced edge profile relative to the blurred edge profile).

Figure 51. Image simulations at 1 NIIRS increments (predicted NIIRS = 3, 4, 5, and 6).
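To make the structure of Eq. (55) concrete, the short Python sketch below evaluates the GIQE for a hypothetical set of collection parameters. The function name and the example values (GSD, RER, overshoot, gain, and SNR) are illustrative assumptions rather than figures from this article, and GSD is assumed to be expressed in inches, the convention used in the GIQE literature.

    import math

    def predicted_niirs(gsd_gm_inches, rer_gm, h_gm, g, snr):
        """Evaluate GIQE version 4 (Eq. 55) for a visible EO system.

        gsd_gm_inches : geometric-mean ground sample distance (inches, assumed)
        rer_gm        : geometric-mean normalized relative edge response
        h_gm          : geometric-mean height of the edge-sharpening overshoot
        g             : noise gain from edge sharpening
        snr           : signal-to-noise ratio
        """
        # The coefficients switch at RER_GM = 0.9, as stated in the text.
        if rer_gm >= 0.9:
            a, b = 3.32, 1.559
        else:
            a, b = 3.16, 2.817
        return (10.251 - a * math.log10(gsd_gm_inches)
                + b * math.log10(rer_gm)
                - 0.656 * h_gm
                - 0.344 * (g / snr))

    # Hypothetical example: 30-inch GSD, RER = 0.9, H = 1.3, G = 10, SNR = 50.
    print(round(predicted_niirs(30.0, 0.9, 1.3, 10.0, 50.0), 2))   # roughly NIIRS 4.4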
ABBREVIATIONS AND ACRONYMS

AVIRIS     airborne visible/infrared imaging spectrometer
bpp        bits per pixel
BWC        bandwidth compression
CCD        charge coupled device
DIRSIG     digital image and remote sensing image generation
DPCM       differential pulse code modulation
ERTS       earth resources technology satellite
FWHM       full width at half maximum
GIFOV      ground-instantaneous field of view
GIQE       general image quality equation
GOES       geostationary operational environmental satellite
GRD        ground resolvable distance
Table 5. Multispectral NIIRS Criteria

NIIRS 0. Interpretability of the imagery is precluded by obscuration, degradation, or very poor resolution.

NIIRS 1. Distinguish between urban and rural areas. Identify a large wetland (greater than 100 acres). Detect meander flood plains (characterized by features such as channel scars, oxbow lakes, meander scrolls). Delineate coastal shoreline. Detect major highway and rail bridges over water (e.g., Golden Gate, Chesapeake Bay). Delineate extent of snow or ice cover.

NIIRS 2. Detect multilane highways. Detect strip mining. Determine water current direction as indicated by color differences (e.g., tributary entering larger water feature, chlorophyll, or sediment patterns). Detect timber clear-cutting. Delineate extent of cultivated land. Identify riverine flood plains.

NIIRS 3. Detect vegetation/soil moisture differences along a linear feature (suggesting the presence of a fence line). Identify major street patterns in urban areas. Identify golf courses. Identify shoreline indications of predominant water currents. Distinguish among residential, commercial, and industrial areas within an urban area. Detect reservoir depletion.

NIIRS 4. Detect recently constructed weapon positions (e.g., tank, artillery, self-propelled gun) based on the presence of revetments, berms, and ground scarring in vegetated areas. Distinguish between two-lane improved and unimproved roads. Detect indications of natural surface airstrip maintenance or improvements (e.g., runway extension, grading, resurfacing, bush removal, vegetation cutting). Detect landslide or rockslide large enough to obstruct a single-lane road. Detect small boats (15–20 feet in length) in open water. Identify areas suitable for use as light fixed-wing aircraft (e.g., Cessna, Piper Cub, Beechcraft) landing strips.

NIIRS 5. Detect automobile in a parking lot. Identify beach terrain suitable for amphibious landing operation. Detect ditch irrigation of beet fields. Detect disruptive or deceptive use of paints or coatings on buildings/structures at a ground forces installation. Detect raw construction materials in ground forces deployment areas (e.g., timber, sand, gravel).

NIIRS 6. Detect summer woodland camouflage netting large enough to cover a tank against a scattered tree background. Detect foot trail through tall grass. Detect navigational channel markers and mooring buoys in water. Detect livestock in open, but fenced areas. Detect recently installed minefields in ground forces deployment area based on a regular pattern of disturbed earth or vegetation. Count individual dwellings in subsistence housing areas (e.g., squatter settlements, refugee camps).

NIIRS 7. Distinguish between tanks and three-dimensional tank decoys. Identify individual 55-gallon drums. Detect small marine mammals (e.g., harbor seals) on sand/gravel beaches. Detect underwater pier footings. Detect foxholes by ring of spoil outlining hole. Distinguish individual rows of truck crops.
GSD        ground sampled distance
GSS        ground spot size
IRARS      imagery resolution assessment and reporting standards
ISEL       imaging satellite elevation angle
MODTRAN    moderate resolution transmittance
MSS        multispectral scanner
MTF        modulation transfer function
NASA       national aeronautics and space administration
NIIRS      national imagery interpretability rating scale
NOAA       national oceanic and atmospheric administration
OTF        optical transfer function
PSF        point spread function
QSE        quantum step equivalence
RBV        return beam vidicon
RER        relative edge response
SAR        synthetic aperture radar
SNR        signal-to-noise ratio
SPOT       systeme probatoire d'observation de la terre
TDI        time delay and integration
TIROS      television and infrared observation satellite
TM         thematic mapper
UAV        unmanned aerial vehicle

Figure 52. Image simulations at 0.2 NIIRS increments (predicted NIIRS = 4.1 to 5.1).
BIBLIOGRAPHY

1. F. J. Janza, ed., Manual of Remote Sensing, American Society of Photogrammetry, Falls Church, 1975.
2. R. Bailey, The Air War in Europe, Time-Life Books, Alexandria, 1979.
3. C. Peebles, The Corona Project, Naval Institute Press, Annapolis, 1997.
4. D. Day, J. Logsdon, and B. Latell, Eye in the Sky, Smithsonian Institute Press, Washington, 1998.
5. T. Lillesand and R. Kiefer, Remote Sensing and Image Interpretation, John Wiley & Sons, Inc., NY, 2000.
6. J. R. Schott, Remote Sensing, The Image Chain Approach, Oxford University Press, NY, 1997.
7. R. A. Schowengerdt, Remote Sensing, Models and Methods for Image Processing, Academic Press, NY, 1997.
8. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, NY, 1968.
9. J. D. Gaskill, Linear Systems, Fourier Transforms, and Optics, John Wiley & Sons, Inc., NY, 1978.
10. G. C. Holst, CCD Arrays, Cameras, and Displays, SPIE Optical Engineering Press, Bellingham, 1998.
11. R. D. Fiete, Opt. Eng. 38, 1,229–1,240 (1999).
12. J. C. Leachtenauer, ASPRS/ASCM Annual Convention and Exhibition Technical Papers: Remote Sensing and Photogrammetry 1, 262–272 (1996).
13. K. Riehl and L. Maver, Proc. SPIE 2829, 242–254 (1996).
14. J. C. Leachtenauer, W. Malila, J. Irvine, L. Colburn, and N. Salvaggio, Appl. Opt. 36, 8,322–8,328 (1997).
INFRARED THERMOGRAPHY

BRENT GRIFFITH
DANIEL TÜRLER
HOWDY GOUDEY
Lawrence Berkeley National Laboratory
Berkeley, CA
INTRODUCTION

Infrared (IR) thermographic systems, or IR imagers, provide images that represent surface temperatures, or thermograms, by measuring the magnitude of infrared radiation emanating from the surface of an object. Because IR imagers see the radiation naturally emitted by objects, imaging may be performed in the absence of any additional light source. Modern IR imagers resolve surface temperature differences of 0.1 °C or less. At this high sensitivity, they can evaluate subtle thermal phenomena that are revealed only as slight temperature gradients. Some applications that employ IR thermography include inspections for predictive maintenance, nondestructive evaluation of thermal and mechanical properties, building
science, military reconnaissance and weapons guidance, and medical imaging. Infrared thermography can be used both as a qualitative and a quantitative tool. Some applications do not require exact surface temperatures. In such cases, it is sufficient to acquire thermal signatures, characteristic patterns of relative temperatures of phenomena or objects. This method of qualitative visual inspection is expedient for collecting a large number of detailed data and conveying them, so that they can be easily interpreted. In contrast, accurate quantitative thermography demands a more rigorous procedure to extract valid temperature maps from raw thermal images. However, the extra effort can produce large arrays of high-resolution temperature data that are unrivaled by contact thermal measurement techniques, such as using thermocouple wires. A skilled operator of an IR thermographic system, or thermographer, must be conscious of the possibility that reflected or transmitted, rather than emitted, IR radiation may be emanating from an object. These additional sources manifest themselves as signals that appear to be, but are not actually, based exclusively on the temperature of the spot being imaged. To understand the challenges and possibilities of IR thermography, it is first necessary to review the principles of physics on which it relies.

THEORY OF OPERATION

The fundamental principles that make IR thermal imaging possible begin with the observation that all objects emit a distribution of electromagnetic radiation that is uniquely related to the object temperature. Temperature is a measure of the internal energy within an object, a macroscopic average of the kinetic energy (the energy of motion) of the atoms or molecules of which the object is composed. Electromagnetic radiation arises from the oscillation of electrostatically charged particles, such as the charged particles found within an atom, the electron and the proton. Electromagnetic radiation propagates by the interaction between oscillating electric and magnetic fields and can sustain itself in the absence of any conveying media. The wavelength, the distance between successive peaks in the oscillations of the electric and magnetic fields, can vary across a wide range that represents a diverse range of phenomena, including radio transmissions, microwaves, infrared (IR), visible and ultraviolet (UV) light, X rays, and gamma rays. The IR portion of the electromagnetic spectrum, which is of primary interest to thermographers, includes wavelengths from about 1 to 100 µm. The interaction of materials with radiation of different wavelengths is extremely varied. Electromagnetic radiation may be absorbed, reflected, or transmitted by a material, depending on the material properties with respect to the wavelength in question. The IR band of radiation is considered ‘‘thermal’’, mostly because it contains the wavelengths of radiation emitted by objects at ordinary temperatures. However, if it were not for the high absorption of IR by most objects, it would not be nearly as important in heat transfer. For example, human skin absorbs well in the IR; for this reason, we perceive IR radiation as heat more readily than we perceive radiation of other wavelengths, such as X rays, which are mostly transmitted. Visible radiation is also considered thermally important, because visibly dark objects absorb it well and because it is a substantial component of the radiation emitted by the sun.

An object at a single temperature does not simply emit a single wavelength of electromagnetic radiation. Because temperature is a macroscopic average of molecular scale oscillations, there is, in fact, a distribution of molecular kinetic energies underlying a single temperature. Correspondingly, there is a distribution of wavelengths and intensities of electromagnetic radiation emitted by an object at a single temperature, as a result of the varied oscillation rates of the charged particles within. Using a theoretical, idealized emitter termed a blackbody, Planck first derived a mathematical expression for the emissive power of radiation as a function of wavelength and temperature; hence, it is known as the Planck distribution. Qualitatively, at low temperatures, the shape of the distribution is broad and does not have a well-defined peak. At higher temperatures, the distribution is narrower, and the peak is very well defined. Figure 1 shows Planck distributions, or blackbody curves, for objects at 5,800 K (temperature of the sun), 2,000 K, 1,273 K (1,000 °C), 373 K (100 °C), and 293 K (20 °C, room temperature). It can also be seen in Fig. 1 that the wavelength emitted with the most intensity, or peak wavelength, is also a function of temperature.

Figure 1. Blackbody emissive power vs. wavelength.

The Wien displacement law expresses this relationship of temperature T (in kelvin) to the peak wavelength λpeak (in µm) of the radiation emitted by a body:

λpeak = 2897.8/T.    (1)

Using this expression, it is clear that most commonly encountered temperatures correspond to radiation in the IR band (1 to 100 µm). For example, a temperature of 300 K (room temperature) corresponds to a wavelength of about 10 µm. In contrast, the surface of the sun is 5,800 K; hence, the peak wavelength of solar emission is in the visible portion of the spectrum (about 0.5 µm). Because
most terrestrial temperatures yield emission in the IR band, it is important to recognize that, in IR imaging, unlike in common experience in visible imaging, almost every object in the field of view is a source, not just a reflector of a source. Integrating the Planck distribution over all wavelengths for a given temperature yields the blackbody emissive power Eb. Equation (2) is a simple expression for Eb called the Stefan–Boltzmann law (σ is the Stefan–Boltzmann constant, 5.67 × 10⁻⁸ W/m² K⁴):

Eb = σT⁴.    (2)
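As a quick numerical illustration of Eqs. (1) and (2), the short Python sketch below evaluates the Wien peak wavelength and the blackbody emissive power for a few of the temperatures plotted in Fig. 1; the function names are arbitrary.

    SIGMA = 5.67e-8          # Stefan-Boltzmann constant, W/(m^2 K^4)
    WIEN_CONSTANT = 2897.8   # micrometer-kelvin product

    def peak_wavelength_um(temperature_k):
        """Wien displacement law, Eq. (1): peak wavelength in micrometers."""
        return WIEN_CONSTANT / temperature_k

    def blackbody_emissive_power(temperature_k):
        """Stefan-Boltzmann law, Eq. (2): blackbody emissive power in W/m^2."""
        return SIGMA * temperature_k ** 4

    for t in (293.0, 373.0, 5800.0):   # room temperature, boiling water, the sun
        print(t, round(peak_wavelength_um(t), 2), round(blackbody_emissive_power(t), 1))

For 293 K this gives a peak near 10 µm and roughly 420 W/m², and for 5,800 K a peak near 0.5 µm, consistent with the values quoted in the text.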
This relationship is at the core of IR thermography because it is the emissive power that an IR imager physically measures, whereas temperature is the parameter of interest. Furthermore, the success of IR thermography is highly dependent on the fortuitous fourth-power relationship, so that emissive power is a very strong function of temperature. The large response makes it possible to achieve excellent temperature resolution. Because it is derived from Planck's distribution, the Stefan–Boltzmann law is limited to describing an idealized blackbody emitter. In reality, no body emits exactly as much power as a blackbody. The ratio of the power actually emitted by a given body to the power emitted by a blackbody at the same temperature is called the emissivity and is represented by a nondimensional number between zero and one. Emissivity is a wavelength-, temperature-, and incidence-angle-dependent property specific to each material. The term emittance, or e, is more useful to the thermographer because it is used to describe the emissivity of an object in aggregate across a range of wavelengths, temperatures, and incident angles. Usually this simplified quantity is valid only across a limited range of these parameters, within which the variation in emissivity with respect to each parameter is not highly significant. A surface that meets the criteria of nearly constant emissivity with respect to wavelength and incidence angle within a certain regime is termed a gray surface and can be characterized by a single emittance for simplified calculations. Published values of emittance should be used only when the wavelength, temperature, and angular parameters agree closely with those for the IR imaging arrangement at hand. Surface emittance needs to be well understood for successful operation of IR thermographic systems. Introducing emittance into Eq. (2) yields Eq. (3), the Stefan–Boltzmann law modified by emittance e:

E = eσT⁴.    (3)

Thermal radiation is largely a surface phenomenon because most materials are not transparent to IR. As a result, it is the material properties of the surface of an object that determine emittance. Polished metals have low emittances, but a thin layer of paint can transform them to high emittance. For materials that are opaque to IR radiation, emittance can be considered the complement of reflectance (the amount of incident radiation that is reflected by a surface), expressed by 1 − e. It is important to realize that because no real surface has an emittance
e = 1, the radiation that is viewed coming from an object is always a combination of emitted and reflected radiation, termed radiosity, and it contains information regarding both the temperature of the object and its surroundings. Special techniques described within this article are necessary to distinguish the two components and obtain accurate surface temperatures for the object of interest. When imaging outdoors, it should be taken into account that the sun is a significant source of IR radiation, particularly in the shorter wavelengths of the IR band.

TYPES OF IMAGING SYSTEMS

General Characteristics

Infrared thermographic systems are essentially imaging IR radiometers. Often, they provide IR images continuously, in real time, similar to the TV image provided by conventional video cameras. The imager itself contains, at a minimum, a detector and an image formation component. Complete thermographic systems also integrate an image processing and display system. An IR imager is often called radiometric when it is designed for measuring temperatures. Nonradiometric IR imagers are used in applications that do not require measuring quantitative temperature differences, but rather are satisfied by a qualitative image display. For example, this type of imager is used for night vision and surveillance. Nonradiometric imagers do not need extensive calibration, thermal stability, or image processing capabilities, which makes them less expensive. The two basic types of IR imagers are focal plane arrays (FPA) and scanners. There are also two categories of detector technologies used in these systems: photon absorbers and photon detectors. They are summarized in Table 1.

System Selection

The choice of an imaging system for a particular application depends on a number of variables, including the temperatures of the specimens to be measured, the amount of money available to purchase thermographic equipment, the necessary measurement accuracy, the ease of use, and the appropriate wavelength for the application. Table 1 lists typical applications for the various types of IR imagers. Shorter wavelength IR imagers are also referred to as near-IR imagers, because the wavelengths are near the visible range (1 to ∼8 µm). IR imagers that are sensitive in the longer wavelengths (∼8 to 14 µm) of the IR band are typically called long-wave imagers. Near-IR imagers work well for high-temperature subjects, but there is almost always enough thermal radiation for high-temperature objects to be imaged using long-wave imagers, as well. Near-IR imagers are often used to image lasers, such as in the development of LIDAR systems. Long-wave IR imagers experience fewer problems when measuring in sunlight, because the solar spectrum peaks in the visible and has very little power at longer IR wavelengths. More materials are transparent to near-IR than to long-wave IR. To measure the surface temperature of an object, it is important to choose a detector that
uses wavelengths to which the object of interest is opaque. Conventional glass is a common material that is transparent to near-IR, yet opaque to long-wave IR. IR imagers that have large array sizes tend to be more expensive to produce, largely because of their increased complexity and the challenge of making large arrays without excessive numbers of nonoperational pixels. For many applications, it will be important to consider the imager's instantaneous field of view (IFOV), measured in milliradians, a combination of array size and optical arrangement that determines the physical dimensions represented by any one pixel for a given distance from the imager. Special optics are available to allow IR imaging on the micron length scale; however, most IR imagers have a wide field of view, resulting in pixels whose physical dimensions are of the order of millimeters for object distances in meters. Some detectors require cooling to low temperatures using liquid nitrogen or a closed-cycle Stirling cooler. The need for cooling can add inconvenience, expense, and start-up delay, depending on the cooling method used. Stirling coolers are expensive and require several minutes of operation before the imager can be used. Liquid nitrogen cooling is inexpensive; however, it can be inconvenient for field use where it may not be readily available. There are also detector technologies that do not require cooling.

Table 1. Summary of Specifications of Various Types of IR Imaging Systems

FPA, indium gallium arsenide (InGaAs): resolutions available, 320 × 256 and 640 × 512 pixels; response range, 0.9 to 1.68 µm; NETD (noise equivalent temperature difference), N/A; typical uses, IR laser/LED development, high temperatures.

FPA, indium antimonide (InSb): resolutions available, 320 × 256 and 640 × 512 pixels; response ranges, 1.0 to 5.4 µm and 3 to 5 µm; NETD, 25 mK; typical uses, general, military imaging, fast.

FPA, microbolometer (absorber): resolutions available, 160 × 128 and 320 × 240 pixels; response range, 8 to 14 µm; NETD, 100 mK; typical uses, general, predictive maintenance, fire fighting.

FPA, quantum well infrared photodetectors (QWIP): resolutions available, 320 × 256 and 640 × 512 pixels; response range, narrow band within 8 to 10 µm; NETD, 30 mK; typical uses, R&D, laboratory.

Scanner, mercury cadmium telluride (HgCdTe or MCT): resolutions available, variable, 173 × 240 and 1,024 × 600 pixels; response ranges, 8 to 14 µm and 3 to 5 µm; NETD, 50 mK; typical uses, R&D, laboratory.
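As a rough illustration of the IFOV relationship described above, the Python sketch below converts an imager's IFOV and working distance into the approximate footprint of one pixel on the object; the numerical values are arbitrary examples, not specifications of any of the systems in Table 1.

    def pixel_footprint_m(ifov_mrad, distance_m):
        """Approximate size of the patch seen by one pixel (small-angle approximation)."""
        return (ifov_mrad * 1.0e-3) * distance_m

    # Hypothetical example: a 1.3 mrad IFOV viewed from 3 m gives a ~3.9 mm pixel footprint,
    # consistent with the millimeter-scale pixels mentioned in the text.
    print(round(pixel_footprint_m(1.3, 3.0) * 1000.0, 1), "mm")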
Focal Plane Arrays

Focal plane array (FPA) imagers are the most common types of systems today. They are analogous to the CCD arrays found in handheld video cameras but are sensitive to IR, rather than to visible radiation. In an FPA, each pixel that makes up an IR image is measured by an individual detector. The detectors are arranged in a flat, two-dimensional array. One-dimensional arrays are also used to provide line imaging. The array is placed in the focal plane of the optical system (lens) of the imager, as diagrammed in Fig. 2. Typically, array size does not exceed 1,024 × 1,024 pixels. However, larger arrays are produced on a custom basis for specialized uses, such as astronomical imaging. Operability describes the percentage of functional pixels in an array. Most fabrication processes yield operability greater than 99%. Still, most arrays inevitably contain bad pixels. Software usually masks this defect by interpolating missing data from neighboring pixels. In contrast to scanners, FPA imagers are more mechanically robust and provide improved depth of field and field of view, as a result of a simple optical design. FPAs do not have any moving parts, other than a simple focus adjustment.

Figure 2. Focal plane array detectors. See color insert.

Scanners

In a scanning imaging system, one or several detectors are combined with a single or multiple-axis mirror system. Images are acquired sequentially by combining
individual measurements. Scanning mirror imagers were used before large FPAs became readily available in the 3–12 µm range. Multiple detector units are used in multispectral imaging systems; each detector records a certain band of the spectrum. Old Landsat satellites are equipped with multispectral scanning systems. Figure 3 is a diagram of two scanning designs. So-called whisk-broom scanning uses a two-axis mirror system. Single point measurements are combined into a line; many lines then compose the final image. Interlaced whisk-broom scanning acquires every other line initially and then fills in the remaining lines in a second pass. Noninterlaced whisk-broom scanning acquires all lines sequentially. Older IR scanners are usually of the whisk-broom type. Push-broom scanning acquires an entire line at once using a linear array detector. A single-axis mirror configuration scans perpendicularly to the linear array to compose the image. Some systems can be set to different scanning speeds; 60 and 30 Hz frame rates are common. Scanning systems have the advantage that they can acquire image arrays of any size. Often the manufacturer fixes the image array size; however, there are also systems that dynamically configure the scanning head to acquire the desired image array. Mercury cadmium telluride detectors are the most frequently used single-element detectors in handheld scanning systems, although any detector that has a sufficiently fast dynamic response could be used. Scanners are delicate mechanical instruments that do not always tolerate vibration or high instrument acceleration. Another disadvantage is that the dynamic response of the detector limits the maximum gradient obtainable in an image at a given scanning speed. As a result, the frame rates are relatively low compared to FPAs. The acquisition time delay across the image may also pose a problem when it is capturing rapid transient events.

Figure 3. Scanning system configurations. See color insert.

Photon Absorbers

A microbolometer is an example of a photon absorber. The absorber is made of a passive energy-absorbing material that is simply warmed by the IR radiation. Incoming thermal radiation results in an increase in absorber temperature that is proportional to the radiosity of the surface being imaged. The detector temperature is then determined by measuring a temperature-dependent material property such as electrical resistance. The absorber has to be thermally decoupled from the substrate and the environment for maximum performance. Absorbing materials perform well across a wide range of the spectrum. Microbolometer arrays are temperature
stabilized, but do not need to be cooled to cryogenic temperatures, as is necessary for some photon detectors discussed in the following section. Pixels are typically 30 × 30 µm and are micromachined in monolithic wafers that also incorporate signal acquisition and processing. Individual pixels are suspended and electrically connected by two arms. Heat exchange through gas convection is suppressed by packing the array in a hard vacuum housing. The broadband absorption intrinsic to the detector is thus limited to 6–14 µm by the transmission of the vacuum housing window. This technology is likely to see further development in the near future that will increase sensitivity (NETD less than 50 mK).

Photon Detectors

Photon detectors are active elements. A photon striking the detector triggers a free charge, which is collected and amplified by an electronic circuit. Detectors and readout circuits are constructed on different substrates and are electrically connected into a hybrid assembly by indium bump bonding. A variety of detector materials is used today; each has a specific spectral range and specialized application. Some detectors require cryogenic cooling to reduce the dark current (the amount of current passed by the detector in the absence of any photon signal) to acceptable levels. The detectors listed in Table 2 are the most commonly used today.

Indium Gallium Arsenide (InGaAs). Indium gallium arsenide detectors work best mounted on a lattice-matched substrate. Unfortunately, this arrangement also limits the spectral response to 0.9–1.6 µm, which makes this detector material unsuitable for room temperature applications. However, InGaAs is well suited for measuring temperatures above 500 °C. Uncooled FPA and linear arrays are produced in large quantities and many size variations. This is the detector of choice for military applications such as heat-seeking missiles. InGaAs arrays employed in general purpose IR imagers are commonly 320 × 240 pixels.

Indium Antimonide (InSb). Indium antimonide detectors are functional in the 3–5 µm range, so they are useful in room temperature IR imagers. Glass and certain plastics are partially transparent in this range of the spectrum and need to be painted or covered by an opaque adhesive tape to collect accurate radiometric data. Some materials exhibit larger reflectivity in this spectral range compared to the 8–12 µm range, so distinguishing emitted radiation from reflected background radiation may prove more challenging. Arrays up to 1,024 × 1,024 pixels and NETD
Table 2. Focal Plane Array (FPA) Devices

InGaAs: wavelength, 0.9–1.6 µm; FPA size, 320 × 240; sensing, detection; operability, >99%; cooling, uncooled.

InSb: wavelength, 3.0–5.0 µm; FPA size, 1,024 × 1,024; sensing, detection; operability, >99%; cooling, cooled to 70 K.

QWIP (GaAs/AlGaAs): wavelength, 8.0–9.0 µm; FPA size, 512 × 512; sensing, detection; operability, >99%; cooling, cooled to <70 K.

Microbolometer: wavelength, 7.5–13.5 µm (limited by optics); FPA size, 320 × 240; sensing, absorption; operability, typical of Si process; cooling, uncooled.

HgCdTe: wavelength, 1.0–20.0 µm (limited by optics); FPA size, 1,024 × 1,024; sensing, detection; operability, 85–95%; cooling, cooled to 60–160 K.
as low as 10 mK have been demonstrated using InSb. These large arrays are not commonly produced and are used primarily in observatory telescopes. InSb detectors are known for rapid acquisition and have frame rates up to 80 Hz. Cryogenic cooling is required to minimize dark current.

Quantum-Well Infrared Photodetectors (QWIP). QWIP arrays were developed in the mid-1990s. Only in recent years have QWIP detectors become commercially available in mass-produced, handheld IR imagers. Enormous advances in crystal growth technology and processing techniques made it possible to produce uniform, large arrays. Quantum well detectors are characterized by a sharp spectral absorption line. The well is tuned in depth and width, so it can hold an electron only in the ground or first excited state. A photon whose energy is equal to the difference between the energy level of the ground state and the first excited state will promote a ground-state electron to the first excited state. An externally applied voltage then sweeps the excited electron out, resulting in a photocurrent. The spectral response can be set during fabrication by adjusting the well width and depth. QWIP arrays can typically have their absorption lines anywhere in the 8–10 µm range, which is the ideal range for observing room temperature objects. Stirling coolers are used to reduce the temperature of these detectors to 70 K. Quantum wells are made of layers of aluminum gallium arsenide and gallium arsenide.

Mercury Cadmium Telluride (HgCdTe or MCT). Mercury cadmium telluride is sensitive to a wide range of wavelengths (1 to 20 µm), which allows making detectors for highly varied applications. Large arrays up to 2,000 × 2,000 pixels are produced in very small numbers for astronomical applications. Unfortunately, technical difficulties prevent such large arrays from being commercialized in small handheld IR imagers. The fabrication process requires very toxic substances and poses an environmental problem. In addition, the detector arrays currently produced have a poor operability of 85 to 95%. Cooling to between 60 and 160 K is required, depending on the bandwidth selected. MCT is commonly employed as a single-element detector in scanning systems.

Data Acquisition/Image Processing

Collecting image data is an important part of operating a real-time imager. The data are commonly recorded for later analysis, assembly into reports, etc. Continuous video signals may be recorded on tape or by using a computerized video digitizer. In steady-state thermal situations, a single IR image would be generated by averaging a number of frames over a period of time. Transient thermal situations are usually recorded continually, and a series of images are stored over time. The resolution of acquired data is typically expressed in number of bits. An 8-bit system distinguishes 256 different levels of temperature or radiosity; a 12-bit system distinguishes 4,096 levels.
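As a minimal sketch of the frame averaging described above (and of how bit depth sets the number of distinguishable levels), the Python fragment below averages a stack of frames pixel by pixel with NumPy; the array shape, frame count, and random data are arbitrary assumptions used only for illustration.

    import numpy as np

    def average_frames(frames):
        """Average a stack of frames pixel by pixel to suppress random noise.

        frames : array of shape (n_frames, rows, cols)
        The standard error of the mean falls as 1/sqrt(n_frames).
        """
        return frames.mean(axis=0)

    # Hypothetical example: thirty 240 x 320 frames of 12-bit data.
    rng = np.random.default_rng(0)
    stack = rng.integers(0, 2 ** 12, size=(30, 240, 320))   # 12 bits -> 4,096 levels
    averaged = average_frames(stack)
    print(averaged.shape, 2 ** 8, 2 ** 12)                   # 256 and 4,096 levels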
GOOD PRACTICE/SYSTEM OPERATION GUIDANCE

Emittance and Background

The key to using an IR thermographic system well is to combine skilled operation of the imaging system, a good understanding of the way IR radiation interacts with the object(s) being imaged, and consideration of the movement of heat energy in and around the object(s) by all modes of heat transfer (conduction, convection, and radiation). Emittance and its relation to reflected IR radiation from the background were discussed briefly in the theory section of this article. However, this topic deserves a more thorough treatment because understanding how to interpret these effects is essential to successful thermography. Variations in surface emittance can cause large changes in the appearance of an IR image, so the thermographer must know the emittances of the various objects in the image to distinguish genuine temperature differences from those resulting from nonuniform emittance. Emittance describes the capability of a surface to emit IR radiation; it is a nondimensional value ranging from zero to one: zero for no emission and complete reflection; one for complete emission and no reflection. As long as the surface is opaque to radiation of the band of wavelengths measured by the imager's detector, then reflectance and emittance are complementary, and added together they equal one. Some published sources of emittance values may not be appropriate for IR thermography. Accurate IR thermographic results are best achieved by establishing emittances through testing, as described later in this article in the Emittance Measurement section under Scientific Temperature Mapping. Because no real object has an emittance of one, background thermal radiation that arrives at a surface from the surroundings is in part reflected and becomes a component of the apparent radiation emanating from the surface. The complete background thermal scene is a combination of the geometry or orientation of surfaces (including the sky) that are in view of the object being imaged and their effective temperatures. The sky has a very cold effective temperature that is hard to quantify, and it depends heavily on cloud cover and the moisture content of the air. Therefore, IR thermographic measurements should be made indoors, if possible, or the sky should be screened by large cloth backdrops. A thermographer should not consider merely the emittance and temperature of objects in the direct field of view, but should also be aware of the entire background thermal scene. What IR radiation is arriving at the surfaces being imaged? Is an observed hot spot real or a reflection from a nearby object? IR radiation generated by emission is reflected and re-reflected and may eventually emanate from all directions at varying magnitudes that depend on the temperatures, geometries, and emittances of the walls, floors, lights, windows, open sky, trees, and everything else that surrounds the objects or phenomena being imaged.

Specimen Considerations

For a specimen to be imaged, a thermographer should consider the physical properties of emittance, reflectance, and
transmittance/absorption coefficients, especially across the wavelengths detected by the IR imager. Reflectance and emittance are closely related and mathematically complementary when there is no transmission. In IR imaging, high emittance is preferable. In this case, most of the apparent radiation is actually due to emission from the object of interest. Low-emittance (low-e) surfaces are problematic because the reflection of background radiation can be a more significant portion of the apparent radiation than the emitted signal of interest. Table 3 lists some emittance values for various materials. These measurements were made by the authors using the procedures described in the Emittance Measurement section found later in this article.

Table 3. Emittance Values for IR Temperature Measurements Using an 8 to 12-µm Mercury Cadmium Telluride Detector

Glass, 0.86; wood, 0.90; vinyl tape, 0.90; paper, 0.90; gold, 0.02; shiny aluminum, 0.04; common paint, 0.90.

Most common materials have emittances of e = 0.9, which means that roughly 10% of the IR radiation incident from surrounding objects will be reflected and 90% of the theoretical blackbody radiation will be emitted. These surfaces work well for IR imaging. For surfaces that have lower emittances, particularly below e = 0.5, precautions are in order. A surface of very low emittance, such as a shiny metal surface, will reflect much more IR radiation than it emits, and the IR imager will observe a collection of reflections from the surfaces surrounding the specimen, rather than information about the specimen itself. This phenomenon may be used to the thermographer's advantage when analyzing the background thermal scene and as a means to document reference positions by using small, highly reflective location markers. However, when the task at hand is to collect temperature data from an object that has low emittance, the surface needs to be modified by applying a thin layer of a higher emittance material to the surface. An opaque adhesive tape or paint can be applied to the surface to raise its emittance. One quick way to check surface properties is to manipulate the background thermal scene. For example, a thermographer may introduce a background object whose temperature is significantly different from ambient (e.g., the thermographer's hand, a butane lighter, or an ice cube). Moving the background object laterally near the specimen and observing any corresponding movement or changes in the IR image can identify highly reflective surfaces. It is important to recognize that reflections from a specimen can be both specular and diffuse. A diffuse reflection is angularly independent and blurs out nonuniformities in the background thermal scene. Specular reflections are angularly dependent and preserve
the image of thermal nonuniformities in the background. It is more difficult to evaluate the emittance of a specular surface by visual inspection. For example, window glass is specular and, despite its high emittance (e = 0.86), it can show clear imaged reflections because of the high sensitivity of IR imagers. The distinction becomes one of magnitude. If a specular surface also has low-e, the reflections appearing as movement in the image will be more intense than those from a higher emittance surface. A roll of paper/masking tape (e = 0.90) is a valuable asset to a thermographer faced with a specimen of unknown surface properties. A quick check for problematic low emittance or transmittance can be performed by applying tape to the specimen and then imaging the specimen. The thermographer will know that the specimen has an emittance different from 0.9 (or that the specimen exhibits measurable transmittance) if the piece of tape is apparent in the image. The tape should have good thermal contact with the specimen and should not appreciably alter the true temperature. For example, if the background thermal scene has an effective temperature colder than the specimen and the taped region shows as warmer, then it is very likely that the surface has low emittance and needs to be modified by painting or more completely covered by tape.

Painting low-e surfaces is a good idea if it is feasible. An even, opaque coating of paint is often the best solution for imaging low-e, varying emittance, specular, or transmitting surfaces when a clean, accurate image or an accurate temperature measurement is desired. Water-based, temporary paint (e = 0.9) applied as a spray is useful and relatively easy to clean up and remove. Repeat the tests described before for determining whether a surface has low-e or is specular to be sure that the paint is applied sufficiently thickly; a coating that is too thin will not raise the emittance completely or uniformly. Locational markers can be applied to a specimen for later use in identifying geometry and spatial locations in the thermogram. Small strips of aluminum adhesive tape are examples of low-e surfaces that can be used as locational markers in an IR image, if the background is at a temperature different from the specimen.

Altering the emittance of surfaces for IR imaging will also alter the heat transfer. This can lead to significant changes in the real surface temperatures of a test subject whose surface has been painted, especially when radiative heat transfer dominates convection and conduction. A thermographer should be careful to report any modifications made for imaging, so that the results can be interpreted properly. Repeat some measurements with and without paint to clarify the effect of modifying the surface emittance. The usefulness of good quality IR data often offsets the drawback of extrapolating from measured results to account for altering the radiative heat transfer. If altering emittance is not possible and high-quality data are required for a varied surface (such as a semiconductor), it is possible to use additional procedures to obtain a pixel-by-pixel emittance map for correcting an image if the subject can be fixed in place, made isothermal,
and controlled to two temperatures that differ from the background (1).

Thermal Setup

As a thermographer develops facility in accounting for emittance, reflection, and transmission, the potential for collecting misleading IR images decreases. The challenge is then to do a good job of interpreting the heat transfer phenomena that are revealed in the information contained in the image. Because an IR imager displays surface temperatures and temperature differences drive heat flow, thermograms can also be used to visualize where energy is flowing, as it moves from hot to cold. Consideration of heat transfer is important both when designing an experiment that will use IR imaging and when interpreting thermal patterns in the image. IR images can reveal temperature differences in a specimen; when there is a temperature difference, heat energy is moving. If thermographers want to image something based on observing temperature differences, then they are likely to need to make heat energy move as part of that effort. In any IR imaging setup, there are two possible heat transfer states, steady state and transient. In steady-state heat transfer, temperatures and heat flows in any one place are constant over time. In transient heat transfer, temperatures and rates of heat flow change over time. Steady-state heat transfer depends on varying thermal conductance; transient heat flow also depends on the varying thermal capacity of materials. Often the objects being imaged will determine which is used because of a preexisting condition such as internal heat generation, a fabrication process, or other normally occurring temperature differences. Most objects that consume energy are likely to be imaged during actual or simulated operation. However, a thermographer is frequently faced with an energetically passive situation where heat flow must be induced to gather the desired information. The same physical phenomenon on which IR imagers rely, thermal radiation, is an important vehicle for heat transfer; conduction and convection are the other two. A thermographer can use thermal radiation as a method of heating a surface in an imaging setup. Lamps can provide a convenient source of thermal radiation when a thermographer needs to add heat to a system. Nondestructive testing often uses high-powered flash lamps to generate an abrupt pulse of radiant thermal energy that quickly heats a surface in transient studies. Other techniques for generating heat flow include mounting specimens between warm and cold environmental chambers, attaching resistive contact heaters, blowing hot or cold air on the specimen, and scanning lasers. Features of the material being imaged can be observed from the propagation of the heat into the material, indicated by the temperatures as they evolve over time. To test the effectiveness of a fluid-based cooling circuit in a heat exchanger using thermal imaging, both the heat load and the cooling mechanism should be activated. Often, establishing a known, reproducible heat flow is the most challenging aspect of performing an IR
An IR imager can also be used to gather information based on varying transmission/absorption of IR by a material. For example, a gas may absorb IR differently from air, so it can be detected or quantified using an IR imager. For accurate surface temperature mapping, transmission of IR by the object of interest is a liability, but there are applications where IR transmission can be beneficial. A material may be slightly transparent in the IR (but not in the visible band), so an IR imager would be useful for tasks such as aligning components. Spectral filters can be used to control the wavelengths monitored during transmission/absorption measurements.

System Operation

It is impractical to try to describe the operation of all of the various IR imagers available. However, some important aspects of operation are relevant to most systems. The magnitude of the temperature in a thermogram is represented by a color or gray scale. For visual interpretation, it is not necessary to exceed 256 (8-bit) possible display colors, and gray-scale images often need only 16 (4-bit) shades of gray. The span adjustment of an imaging system allows the user to manipulate the maximum displayable temperature difference by assigning a different temperature increment to each color step (determining the width of the scale in temperature units). The center temperature adjustment provides a means for shifting the absolute temperature of the middle of the scale.

Random noise is inherent in IR detectors; it can be mitigated by averaging many frames together, pixel by pixel. The standard error of the average of N "identical" measurements that have random noise is the standard deviation of those measurements divided by the square root of N. Because the standard deviation of most imagers is small, it is not usually necessary to average more than 15-50 frames before the random error is reduced to a much smaller component of the total error than the systematic error involved in using the instrument. Most commercial thermographic systems provide a means to average frames to remove random noise.
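As a concrete illustration of this noise-reduction step, the short Python sketch below averages a stack of frames pixel by pixel and reports the expected drop in random noise. It is a generic illustration; the array sizes and the noise figure are invented and do not describe any particular commercial system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stack of N frames of 12-bit thermal data (rows x cols):
# each frame = true scene + zero-mean random detector noise.
N, rows, cols = 32, 240, 320
true_scene = np.full((rows, cols), 2048.0)   # arbitrary mid-scale level, in counts
noise_sigma = 4.0                            # per-frame random noise, in counts
frames = true_scene + rng.normal(0.0, noise_sigma, size=(N, rows, cols))

# Pixel-by-pixel average over the N frames.
averaged = frames.mean(axis=0)

# Residual random noise shrinks roughly by 1/sqrt(N).
per_frame_noise = frames[0].std()
averaged_noise = (averaged - true_scene).std()
print(f"single-frame noise ~ {per_frame_noise:.2f} counts")
print(f"{N}-frame average noise ~ {averaged_noise:.2f} counts "
      f"(expected ~ {noise_sigma / np.sqrt(N):.2f})")
```

With 32 frames the random component drops by a factor of about 5.7, after which the systematic errors of the instrument dominate, consistent with the 15-50 frame guideline above.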
IR image data are essentially two-dimensional numerical arrays that often have 12-bit resolution (4,096 temperature steps). Colors are assigned to levels, but there is no such thing as true color, as in visible imaging. Black-and-white thermal images are better for indicating the forms of objects, and color IR images are better for indicating temperature magnitudes and differences. Figure 4 shows the same IR image data in both color and black and white. The image is of an architectural model of a city that is being artificially heated by halogen lights.

Figure 4. Black-and-white vs. color image display. See color insert.

Imaging Procedures

IR imagers need to be focused. The best focus will show the steepest gradients at edges that have abrupt changes in temperature. If there is a region on the object that has sharp thermal contrast, this is relatively straightforward, but if the object is isothermal, then focusing can be a challenge. One focusing technique is to apply a piece of low-e tape, such as aluminum, and focus on the contrast created by the tape. The tape can then be removed, and the focus maintained.

The background thermal scene should be assessed early in the arrangement of an IR imaging experiment. An IR mirror is indispensable for this purpose. Use a material whose front surface is highly reflective in the IR. Typical glass mirrors are inappropriate because they usually have glass front surfaces and the reflective metal is behind the glass; a long-wave IR imager will image the surface temperatures of the glass more than reflecting the background because glass has a relatively high emittance. A piece of metalized polyester film is a good choice for a background mirror. It can be placed on or in front of the specimen to be measured and allows imaging the background radiation which is incident on, and hence may be reflected from, the specimen. It is desirable to establish a thermal background that is as uniform as possible. Nonuniformities in the background can manifest themselves as nonuniformities in the specimen being measured. A backdrop such as a sheet of foam or heavy cloth can provide a uniform background.

IR mirrors are usually specular; this is useful for identifying thermal patterns in the background, but unless the mirror is very large or the background radiation very uniform, it may be difficult to collect a good average background temperature to use in surface temperature corrections. Diffuse reflecting mirrors, special highly reflective but convoluted surfaces, are available to provide thermal background information that represents more of an average background temperature as incident on the location of the mirror. This can be convenient for determining a single value for background temperature when it is not possible to attain good background uniformity.

THERMOGRAPHIC INSPECTION

Thermographic inspection evaluates images primarily for qualitative thermal signatures (rather than absolute temperature levels) to discern the presence and proximity of a feature of interest or defect. The status-of-operation or maintenance condition of a piece of equipment may also be discerned. Thermographic inspections can be used in industry, medicine, and aerospace to reveal malfunction, leaks, material loss, delamination, contamination, friction, pain, tumors, etc. It is important to recognize the distinction between radiometric imaging for absolute temperature accuracy and thermal imaging for relative temperature information or thermal signatures.
Radiometric Imaging

In radiometric imaging, IR imagers are used as noncontact temperature sensors. The objective of the imaging is to monitor, check, or compare an object's surface temperatures or temperature differences. Most IR imagers sold today are radiometric instruments. Although the imaging interface of a radiometric thermographic system will provide quantitative temperatures, it is imperative to recognize that these temperatures are accurate only when sufficient attention is given to the imaging setup, calibration, surface emittance, and background radiation (see Scientific Temperature Mapping). Examples that require this approach are process temperature monitoring in activities such as baking, or evaluating the heat dissipative behavior of a prototype electronic circuit.

Thermal Imaging

Most routine inspections require only simple thermal imaging without consideration for the parameters necessary to achieve accurate absolute temperatures. There are many applications where the temperature difference of the thermal signature is large enough to be distinct, despite less than ideal imaging conditions. Given the basic criteria of high, uniform specimen emittance and a uniform thermal background, the relative temperatures displayed by the imager will also be reasonably accurate. Qualitative visual inspection is usually sufficient to resolve the thermal signatures in basic thermal imaging applications, and numerical data collection and postprocessing are not necessary. Typically, thermal imaging inspections focus on surface temperature differences; however, surface contamination might be revealed by an altered local surface emittance. Differences in the surface emittance of the imaged object can be determined only if the energy levels of the object and background are different and background radiation is uniform.
Predictive Maintenance. Historically, most IR thermographic systems have been marketed and sold for predictive maintenance using largely qualitative thermal imaging. Commercial training and certification are available (Snell Infrared and Academy of Infrared Thermography). The most common application is monitoring the condition of electrical power distribution systems and infrastructure. Figure 5 shows an IR image of a circuit breaker panel that has an overloaded circuit and the heat generated at the electrical contact resistance where a wire is attached to a circuit breaker. In this electrical example, the heat of electrical I²R losses generates the observed thermal signature. However, any phenomenon that impacts heat flow is likely to be detectable. Bad bearings in a conveyor system will develop heat from additional mechanical friction. A fluid leak might lead to localized cooling from evaporation. Missing or degraded thermal insulation will cause an increase in heat flux. All of these heat flow phenomena can be imaged by IR thermography.

Figure 5. IR image of overloaded electrical panel (color scale 22-30 °C). See color insert.

Nondestructive Evaluation (NDE)

Fast Transient NDE. Fast transient thermal imaging is based on the temporal evolution of surface temperatures on a test specimen during heating or cooling. Variations in temperature depend on the heat flow through the object; heat flow is sensitive to cracks, delaminations, voids, and changes in material properties. In a typical setup, flash equipment is used to generate a heat pulse, and the surface temperature of the test specimen is monitored by an IR imager. The pulse duration and the total energy delivered across the length of the pulse depend on the material properties and the nature of the defect to be detected. A series of approximately 10 to 30 frames is recorded at frame rates up to 80 Hz using scanners or at 1 kHz using FPA imagers. In the absence of thermal pulse equipment and a high-speed imager, transient NDE can be successfully performed on some specimens, particularly those of low thermal conductivity. Figure 6 shows several frames taken at 10-second intervals after a steady IR heat source was applied to the back side of a composite truck-bed panel. The well-bonded portions warm faster on the opposite, imaged side.

Figure 6. NDE transient series (frames at 20, 30, 40, and 50 s).

An image sequence like this can already be very informative when visually inspected by a technician experienced in interpreting thermograms. However, image postprocessing can greatly enhance images and reveal features that are hidden in the unprocessed image. Postprocessing includes averaging image sequences, coloration by different palettes, image subtraction, and edge enhancement. Complex processing is possible, including Fourier transformation of time sequence images and tomography, if sufficient computer power is available for these tasks. Tomographic analysis translates the time image sequence into a three-dimensional object sequence; defects can then be located in depth as well as in the image plane, even though data are collected only at the surface (2).
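A minimal sketch of the kind of sequence postprocessing mentioned above is given below in Python; it simply subtracts the first (pre-heating) frame from each later frame to form contrast images, which is one common way to make slowly developing defect signatures visible. The arrays are synthetic placeholders, not data from the study described in the text.

```python
import numpy as np

def contrast_sequence(frames):
    """Subtract the first (reference) frame from every frame in a transient series.

    frames: array of shape (n_frames, rows, cols) of temperatures or raw counts.
    Regions that warm faster (well-bonded areas) show larger positive contrast
    than regions above voids or disbonds.
    """
    frames = np.asarray(frames, dtype=float)
    return frames - frames[0]

# Synthetic example: 6 frames of uniform warming with a slower-warming patch.
n, rows, cols = 6, 64, 64
t = np.arange(n, dtype=float).reshape(n, 1, 1)
sequence = 20.0 + 0.5 * t * np.ones((n, rows, cols))              # 0.5 K per frame
sequence[:, 20:30, 20:30] = 20.0 + 0.3 * t[:, 0, 0].reshape(n, 1, 1)  # lagging patch

contrast = contrast_sequence(sequence)
print(contrast[-1, 25, 25], contrast[-1, 40, 40])  # patch vs. sound area, last frame
```

Commercial thermographic software offers similar subtraction and averaging operations, although the details and terminology differ from system to system.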
Steady-State NDE. Fast transient thermography can provide a more robust nondestructive evaluative technique than steady-state thermography, but its use has been limited because it requires expensive thermal-pulse equipment and storing a large amount of data for each specimen. Steady-state IR thermography, by contrast, is much less expensive, requiring minimal equipment and data storage. Near steady-state thermography could be used for NDE during phases of production where temperature cycles have been applied to parts, for example, the cool-down period after paint baking, adhesive curing, or injection molding of plastic parts. The required temperature difference between the part to be evaluated and the environment depends on the material, geometry, and size of the part; the size and nature of the defects being sought; and the minimum resolvable temperature difference of the IR imager being used. This difference can be as small as a few degrees Celsius.
Figure 7. NDE composite panel steady state with and without processing.

Defects in the sample are more blurred in the steady-state condition than in a transient image, as can be seen in Fig. 7, a pattern-bonded composite truck-bed panel. The blurring is caused by thermal conduction within the sample. However, image processing algorithms based on convolution of the data with a kernel function can improve image definition (3).

Before postprocessing of the images, it is necessary to remove random and digital noise from the data. A filtering technique based on convolving the data with a kernel function can be used for this purpose (3). The filtered data are calculated from the inverse Fourier transformation of the product of the discrete Fourier transforms of the thermographic data and a kernel function. In the nomenclature of signal processing, the kernel function is a low-pass or band-pass filter. The kernel function is adjusted to provide the level of detail necessary for further image analysis, for example, temperature gradients or heat flow in a joint of the specimen. The final image may not represent valid temperatures but is likely to reveal features of interest more prominently.
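The Fourier-domain filtering just described can be sketched in a few lines of Python. This is a generic illustration: the Gaussian kernel and its width are arbitrary choices for demonstration, not values taken from Ref. 3.

```python
import numpy as np

def kernel_filter(thermogram, sigma_pixels=3.0):
    """Low-pass filter a 2-D thermogram by multiplying Fourier transforms.

    Equivalent to circular convolution with a Gaussian kernel: the FFT of the
    data is multiplied by the kernel's transfer function and inverted back to
    the image domain, as described in the text.
    """
    rows, cols = thermogram.shape
    fy = np.fft.fftfreq(rows).reshape(-1, 1)   # cycles per pixel, vertical
    fx = np.fft.fftfreq(cols).reshape(1, -1)   # cycles per pixel, horizontal
    # Fourier transform of a Gaussian kernel of width sigma_pixels.
    transfer = np.exp(-2.0 * (np.pi * sigma_pixels) ** 2 * (fx ** 2 + fy ** 2))
    filtered = np.fft.ifft2(np.fft.fft2(thermogram) * transfer)
    return filtered.real

# Usage: 'noisy' stands in for a temperature array exported from the imaging system.
noisy = 25.0 + np.random.default_rng(1).normal(0.0, 0.5, size=(128, 128))
smoothed = kernel_filter(noisy, sigma_pixels=2.0)
print(noisy.std(), smoothed.std())
```

A band-pass version can be obtained by subtracting two such low-pass results that have different kernel widths.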
Medical Imaging: Breast Cancer

IR imaging of the exterior of the human body is straightforward and completely harmless. Pain management doctors have long used thermography to assess inflammation and other conditions. Breast cancer is a widespread problem that can be mitigated by successful early detection. IR thermography offers a completely harmless method of imaging tissue that can resolve subtle changes in blood vessel temperature and configuration associated with the early stages of cancer development. Evidence suggests that thermal imaging can detect the beginning stages of cancerous cell growth up to 2 years earlier than would be detectable by other methods. The detection rate of the IR method is 87%, whereas the traditional mammography rate is 67% (4). Because they test for different indicators, they are actually quite complementary, and used together attain detection rates of 95% (4). IR imaging is particularly well suited to early detection in young women, whose higher hormone levels support fast cell growth, because the IR method relies on indicators that are much more pronounced under these circumstances. It is also valuable to have reference images starting as early as possible because an individual's thermal signatures are quite consistent over time and any variations are likely to be indicators of increased breast cancer risk.

First, the thermal image is analyzed for asymmetrical temperature signatures between the left and right breast. Warm spots are usually an indication that the person is at risk. By exposing patients to a thermal stress (a cold room), the response of their bodies' sympathetic nervous systems can be evaluated. The sympathetic nervous system is responsible for adjusting the blood flow near the skin surface when stimulated by external conditions. Dysfunction in the sympathetic nervous system is associated with nerve damage in the region of tumor development. Localized regions of different skin surface temperatures under conditions of environmental thermal stress thus suggest cancer development. Sophisticated image processing can be performed to enhance the patient's blood vessels in the image. Often tumors cause a change in the blood vessel density and orientation (or vascular pattern). More vessels become apparent, and the vessels are oriented circularly; however, fibrocystic cell growth can cause similar patterns and does not pose a threat to the person's health. Usually the tumor is located in the center of that circular pattern and is the cause of the altered blood vessels. Figure 8 shows a raw thermographic image, a sympathetic dysfunction processed image, and a vascular map processed image. IR thermography for breast cancer screening is a new and very promising field; it provides early detection, and imaging can be performed as often as desired without the additional health concerns from the ionizing radiation of mammography.

SCIENTIFIC TEMPERATURE MAPPING

Temperatures are a common parameter of interest to measure. Accurate temperatures are typically acquired using contact probes such as thermocouples and thermistors. These methods are highly reliable and accurate; however, it is difficult to make large numbers of simultaneous measurements, and the presence of a contact probe can also interfere with the heat flows that establish the temperature of interest. The two-dimensional
Figure 8. Medical imaging — breast cancer screening. See color insert.

array of temperatures provided by IR thermography is unparalleled for its noninvasive acquisition of many high resolution temperatures. However, it takes extra effort to raise basic thermography to a level where it can be relied upon for reproducible, quantitative scientific measurements. Rigorous attention must be given to the measurement of emittance and background radiation as well as to the collection of so-called metadata parameters (e.g., the environmental conditions at the time of the test) to ensure accurate results.

External Temperature Referencing

Each thermal image is composed so that a reference emitter that has multiple targets for calibrating the IR data is included in the view. The addition of external referencing targets allows absolute accuracy of temperature measurements. The reference emitter should be situated near the specimen being measured, within the field of view of the IR imager, and should be kept reasonably in focus while the imager is focused on the specimen. Commercial reference emitters are available; they consist of a thermoelectric heat pump on a plate of highly conductive metal. The metal plate is painted to provide a well-characterized surface emittance, and the temperature of the plate is stabilized by a sophisticated closed-loop controller. If a temperature-stable circulating water bath is available, a reference emitter can also be fabricated by providing channels for fluid flow in a highly conductive metal plate. To measure an accurate absolute temperature, it is desirable to have a bore in the reference emitter plate that can accommodate a platinum resistance thermometer (PRT) contact probe. Two IR mirrors should also be mounted on the reference emitter to quantify background thermal radiation. One mirror is aluminized polyester film, and the other is gold-coated aluminum whose surface is shaped to provide diffuse reflection.

View Arrangement

In contrast to flat or convex surfaces, concave or self-viewing surfaces are more complex to image and require additional steps to gather background radiation levels and discern accurate temperatures. After obtaining a first set of uncorrected IR data for the specimen, data are gathered for the effective temperature of the reflected background at each location by temporarily applying background mirrors (aluminized polyester film) to the specimen in multiple stages. Thus, for an interior corner (e.g., the sill of a window frame), three sets of images would be collected: the unmodified surface, the background levels for vertical surfaces, and the background levels for horizontal surfaces. This technique allows for temperature corrections only in the area where the mirror is applied, so it is difficult to image an entire object this way. However, this procedure can be useful for extracting accurate linear temperature profiles in complex areas of interest. It is necessary to keep in mind that the addition of the low-e mirror alters the thermal environment that determines the background, so it is important to keep the mirrors as small as possible.

Emittance Measurement

Emittance describes the ability of a body to emit radiation. It is necessary to determine the appropriate emittance value for each surface of a specimen with respect to the wavelength band of the IR imager being used and the temperature range on the object. Emittance is a basic material property; however, the appropriate emittance value to use for quantitative IR thermography is scaled by the wavelength-dependent response of the detector used in the radiometer and the temperature range of the object being imaged. Therefore, the emittance values used in thermography may differ from literature values and may also vary among different types of detectors. The IR thermographer should use values for emittance of the various materials on a specimen's surface that have been measured first in a separate experiment, which compares the specimen's emittance to the emittance of a known material. An experiment of this type is diagrammed in Fig. 9. Thin specimens of both the known material and the unknown sample material are mounted in good thermal contact to an isothermal, temperature-controlled plate and brought to the same surface temperature. Paper masking tape (e = 0.9) makes a good reference. The temperature is set to at least 20 °C above or below the background temperature to ensure high contrast between the radiosity from the specimens and the radiation from the background. The IR imager is set to emittance 1.0 to turn off any background radiation compensation. Both specimens are imaged simultaneously. Readings are averaged over both time and space for the equivalent blackbody temperature of the unknown sample, Te=1,smpl, and the equivalent blackbody temperature of the known reference material, Te=1,ref.
Figure 9. Emittance measuring schematic (known reference and unknown sample mounted on an isothermal plate at Tplate = Tamb + 20 °C; the imager records the radiosity of each together with the reflected background irradiation).

Background radiation equivalent blackbody temperature, Tback, is quantified using a background mirror. Care is taken to provide a very uniform background, which is verified by imaging the background mirror. The emittance of the sample material, esmpl, is then calculated from the emittance of the known material, eref, using Eq. (4). Temperatures are in kelvins. Several measurements are made and then averaged:

$$ e_{\mathrm{smpl}} = e_{\mathrm{ref}}\,\frac{T_{e=1,\mathrm{smpl}}^{4} - T_{\mathrm{back}}^{4}}{T_{e=1,\mathrm{ref}}^{4} - T_{\mathrm{back}}^{4}}. \qquad (4) $$
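For readers who export the averaged equivalent blackbody temperatures and compute the emittance outside the imaging software, Eq. (4) amounts to a one-line calculation. The Python sketch below applies it; the variable names and the numerical values are illustrative placeholders only.

```python
def emittance_from_reference(t_e1_sample_k, t_e1_ref_k, t_back_k, e_ref=0.9):
    """Sample emittance from Eq. (4), using equivalent blackbody temperatures in kelvin.

    t_e1_sample_k, t_e1_ref_k: imager readings (emittance set to 1.0) for the
    unknown sample and the known reference (e.g., paper masking tape, e = 0.9).
    t_back_k: equivalent blackbody temperature of the background, from a mirror.
    """
    return e_ref * (t_e1_sample_k**4 - t_back_k**4) / (t_e1_ref_k**4 - t_back_k**4)

# Example: plate held roughly 20 K above a 293 K background (made-up readings).
print(emittance_from_reference(t_e1_sample_k=309.5, t_e1_ref_k=311.8, t_back_k=293.0))
```

The result is sensitive to Tback, which is why the text stresses verifying background uniformity with the mirror before trusting the value.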
Infrared Temperature Calculations

To obtain quantitative surface temperatures, raw thermographic data may need to be extracted from the imaging system and processed in a separate mathematical program. The image processing component of the imaging system is used to extract data from the thermal image and export them in appropriate arrays of text values to be synchronized with spatial coordinates. The result for temperature at each data point should be calculated from the correct emittance and a location-specific background level. The calculated IR temperature for the reference emitter may be compared to direct contact measurements, and deviations may be used to scale the rest of the IR data. Separate data sets may be merged together to combine data from different views.

Equation (5) shows an expression for calculating a surface temperature, TIR, from the total thermal radiation represented by the variable Te=1, the emittance of the surface esurf, and the background radiation level represented by the variable Tback:

$$ T_{\mathrm{IR}} = \left[ \frac{T_{e=1}^{4} - (1 - e_{\mathrm{surf}})\,T_{\mathrm{back}}^{4}}{e_{\mathrm{surf}}} \right]^{1/4}. \qquad (5) $$

Te=1 is an equivalent blackbody temperature for the surface being measured. Tback is the equivalent blackbody temperature for the background thermal radiation level measured at each location using an applied mirror. Values for both Te=1 and Tback are obtained from the thermographic system by setting emissivity to unity. Conversely, setting emittance to less than one should cause the IR image processing component of the imaging system to process the image data using the calculation in Eq. (5). Thermographers should consider performing this calculation themselves, using varying values for Tback and esurf.

Equation (5) is used to determine both the apparent reference emitter temperature, TIR,Ref (as measured by IR imaging), and the apparent sample surface temperature, TIR,smpl. The difference between TIR,Ref and the direct contact measured value TDC,Ref may be applied to correct TIR,smpl and to arrive at the final IR surface temperature result T, as shown in Eq. (6). This correction should be made for each temperature datum and for each IR image to produce arrays of temperature values:

$$ T = T_{\mathrm{IR,smpl}} - \left( T_{\mathrm{IR,Ref}} - T_{\mathrm{DC,Ref}} \right). \qquad (6) $$

Final data sets merge spatial location coordinates and temperature values. The real distances between location markers on the specimen are measured, and a coordinate system is used to create temperature/location data pairs. The temperatures may be distributed linearly between markers. Temperature data are then made a function of spatial coordinates, mapped to the appropriate coordinate system.
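A compact sketch of this processing chain, from exported equivalent blackbody temperatures to corrected surface temperatures via Eqs. (5) and (6), is given below in Python. It is a schematic of the calculation only; the array values, the emittance, and the reference readings are placeholders rather than measured data.

```python
import numpy as np

def surface_temperature(t_e1_k, t_back_k, e_surf):
    """Eq. (5): surface temperature from equivalent blackbody data, in kelvin."""
    return ((t_e1_k**4 - (1.0 - e_surf) * t_back_k**4) / e_surf) ** 0.25

# Arrays exported with emissivity set to 1.0 in the imaging software:
t_e1 = np.array([[301.2, 302.0], [300.5, 303.1]])   # specimen readings, K
t_back = 295.0                                       # background mirror reading, K
e_surf = 0.92                                        # measured per Eq. (4)

t_ir_sample = surface_temperature(t_e1, t_back, e_surf)

# Eq. (6): shift by the reference emitter's IR-vs-contact discrepancy.
t_ir_ref = surface_temperature(310.4, t_back, 0.95)  # apparent reference temperature
t_dc_ref = 309.8                                      # PRT contact reading of the reference
t_corrected = t_ir_sample - (t_ir_ref - t_dc_ref)

print(np.round(t_corrected - 273.15, 2))              # final surface temperatures, deg C
```

The same two functions can be applied pixel by pixel to full image arrays, producing the arrays of corrected temperature values that are later merged with spatial coordinates.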
Infrared Temperature Uncertainty

The thermographer will want to estimate the uncertainty in temperature data. A previous publication by the authors discusses in detail the origin of the following equations used to propagate uncertainty (5). Errors in esurf should be treated as a source of undefinable systematic uncertainty, δesurf, and may be analyzed by propagating uncertainty in Eq. (4). Errors in Te=1 and Tback lead to random uncertainties, δTe=1 and δTback, which are usually closely related to the equipment specification for the noise equivalent temperature difference (NETD). To analyze error propagation in Eq. (5), the sum of the squares of the partial derivatives of Eq. (5) is calculated with respect to the variables Te=1, Tback, and esurf, which leads to one possible solution, shown in Eq. (7). The uncertainty in TIR, δTIR, is then calculated using Eq. (7):

$$ \delta T_{\mathrm{IR}} = \frac{1}{4} \left[ \frac{T_{e=1}^{4}}{e_{\mathrm{surf}}} + T_{\mathrm{back}}^{4} - \frac{T_{\mathrm{back}}^{4}}{e_{\mathrm{surf}}} \right]^{-3/4} \sqrt{ \left( \frac{4\,T_{e=1}^{3}\,\delta T_{e=1}}{e_{\mathrm{surf}}} \right)^{2} + \left( 4\,T_{\mathrm{back}}^{3}\,\delta T_{\mathrm{back}} - \frac{4\,T_{\mathrm{back}}^{3}\,\delta T_{\mathrm{back}}}{e_{\mathrm{surf}}} \right)^{2} + \left( \frac{T_{\mathrm{back}}^{4}\,\delta e_{\mathrm{surf}}}{e_{\mathrm{surf}}^{2}} - \frac{T_{e=1}^{4}\,\delta e_{\mathrm{surf}}}{e_{\mathrm{surf}}^{2}} \right)^{2} }. \qquad (7) $$

The total uncertainty in the measured surface temperature, δT, is obtained from Eq. (8). Values for both δTIR,smpl and δTIR,Ref are obtained using Eq. (7). The uncertainty in the direct contact measurement of the reference emitter surface temperature, δTDC,Ref, is determined from the overall system accuracy of the direct contact sensor combined with any errors in adjustments that correct for gradients in the surface material. The uncertainty arising from variations across the field of view, δTFOV, is determined from the magnitude of deviations in heavily averaged data for an isothermal plate that fills the section of the field of view being used. The uncertainty arising from nonlinear performance or calibration of the imager, δTLin, is determined by evaluating TIR for a temperature-controlled and separately measured target across a range of temperatures and evaluating how deviations change across that range:

$$ \delta T = \delta T_{\mathrm{IR,smpl}} + \delta T_{\mathrm{IR,Ref}} + \delta T_{\mathrm{DC,Ref}} + \delta T_{\mathrm{FOV}} + \delta T_{\mathrm{Lin}}. \qquad (8) $$
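Because Eq. (7) is tedious to evaluate by hand, a short Python helper is sketched below. The numerical inputs are arbitrary illustrative values, and the simple summation of Eq. (8) is reproduced exactly as written in the text.

```python
def delta_t_ir(t_e1, t_back, e_surf, d_t_e1, d_t_back, d_e_surf):
    """Uncertainty in T_IR from Eq. (7); temperatures in kelvin."""
    base = t_e1**4 / e_surf + t_back**4 - t_back**4 / e_surf
    term_e1 = 4.0 * t_e1**3 * d_t_e1 / e_surf
    term_back = 4.0 * t_back**3 * d_t_back - 4.0 * t_back**3 * d_t_back / e_surf
    term_e = t_back**4 * d_e_surf / e_surf**2 - t_e1**4 * d_e_surf / e_surf**2
    return 0.25 * base**-0.75 * (term_e1**2 + term_back**2 + term_e**2) ** 0.5

# Illustrative inputs: NETD-like random errors of 0.05 K, emittance known to +/- 0.02.
d_smpl = delta_t_ir(t_e1=301.0, t_back=295.0, e_surf=0.92,
                    d_t_e1=0.05, d_t_back=0.05, d_e_surf=0.02)
d_ref = delta_t_ir(t_e1=310.0, t_back=295.0, e_surf=0.95,
                   d_t_e1=0.05, d_t_back=0.05, d_e_surf=0.02)

# Eq. (8): total uncertainty as the sum of the individual contributions.
d_dc_ref, d_fov, d_lin = 0.10, 0.05, 0.10
d_total = d_smpl + d_ref + d_dc_ref + d_fov + d_lin
print(round(d_smpl, 3), round(d_ref, 3), round(d_total, 3))
```

With these example inputs the emittance term dominates δTIR, which is typical when the surface emittance is known only to a few percent.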
Window Thermal Testing Example

An example of IR imaging science producing a temperature map is drawn from the authors' work in building sciences (6,7). Figure 10 shows two IR images of the sill region of a window. The window was conditioned in laboratory chambers to simulate a long, cold winter night. Building scientists would like to compare the exact distribution of surface temperatures on the window to computer simulations. The upper image in Fig. 10 shows a window that has an external reference emitter on the right side. Location markers are visible that delineate spatial features in the IR image. The lower image shows the same specimen but has IR mirrors mounted on it for gathering data on the effective temperature of the background. The final temperature data, shown in Fig. 11, were obtained using Eqs. (5) and (6).

Figure 10. Temperature mapping for self-viewing surfaces using background mirrors. Upper panel: IR image of the window sill with an external reference emitter; lower panel: IR image of the window sill with background mirrors applied to the vertical surfaces. The view arrangement spans the cold weather side (−5 °C) and the room temperature side (22.0 °C). See color insert.

Figure 11. Surface temperatures of a building window: centerline surface temperature (°C) vs. accumulated distance (mm), with the edge of glass, the wood frame corners, and a thermocouple wire with 25-mm tape marked.

CONCLUSION

This article provides a summary of infrared thermographic systems. The theory of operation, which is based on measuring blackbody thermal radiation, and the fourth-power temperature dependence of emitted IR energy demonstrate the strong signal inherent in this sensing technology and hence the high temperature resolution that is achievable. There are various types of IR imaging systems. All of them generate images that represent the surface temperatures of an object, but they can be composed of a variety of detector materials; each has unique spectral-physical properties, which makes it important
to select the proper detector technology for a particular application. There are two major families of optical design, focal plane array imagers and motorized scanning mirror-based imagers. For successful IR imaging, it is essential to be aware of the surface property emittance. Knowledge of surface emittance in the spectral band of the imager being used allows the thermographer to distinguish the emitted portion of the measured radiation from the reflected portion. The thermal setup and environment also require special considerations for successful thermography. The thermographer often manipulates the overall heat transfer that leads to the temperature differences imaged on the specimen of interest to produce the most accurate and informative thermal data. Scientific temperature mapping is a methodology that improves on the absolute accuracy of quantitative thermography. The procedures employed include external referencing targets and background mirrors. Typical applications of IR imaging include thermographic inspection for predictive maintenance, medical imaging, nondestructive materials testing, and building component testing.

The future of IR imaging is likely to see many more applications emerge as this powerful technology becomes more accessible due to less expensive and more compact equipment. New detector types, such as microbolometers, QWIPs, or direct sensing optical systems (change of refraction due to temperature), will become more prominent. IR imagers will be substantially smaller and more portable and have reduced power requirements as a result of compact focal plane array optics and the elimination of cryogenic cooling for sensors. Increasing computing power will make image analysis and processing possible for on-line usage. Image resolution of focal plane arrays is expected to rise, and megapixel arrays are likely to be available for thermal imaging. Faster image acquisition speeds will open a new world of diagnostic possibilities for very fast thermal phenomena such as combustion or blasting technology. The various detection mechanisms may make IR imagers even more specialized for their intended applications. Selecting the appropriate technology for the job will continue to be important. For instance, an imager that has a very high frame rate may lack the ability to make absolute temperature measurements as accurately as an imager designed for slower phenomena, where absolute accuracy is most crucial. Overall, there is much to be learned by "seeing" temperatures by using an IR imager, and this capability is likely to grow more powerful and become more accessible in the future.

Acknowledgments
This work was supported by the Assistant Secretary for Energy Efficiency and Renewable Energy, Office of Building Technology, State and Community Programs, Office of Building Research and Standards of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098.

BIBLIOGRAPHY
1. G. C. Albright, J. A. Stump, J. D. McDonald, and H. Kaplan, Proc. SPIE, vol. 3700, Thermosense XXI, 5–9 April 1999.
2. V. Vavilov et al., in D. O. Thompson and D. E. Chimenti, eds., Review of Progress in Quantitative NDE, Plenum Press, NY, 1991, 1992, pp. 425–432.
3. D. Turler, Proc. SPIE, vol. 3700, Thermosense XXI, 5–9 April 1999.
4. N. Belliveau et al., Breast J. 4(4) (1998).
5. D. Turler, B. T. Griffith, and D. Arasteh, in R. S. Graves and R. R. Zarr, eds., Insulation Materials: Testing and Applications: ASTM STP 1320, American Society for Testing and Materials, Philadelphia, PA, 1997.
6. B. T. Griffith and D. Arasteh, Proc. SPIE, vol. 3700, Thermosense XXI, 5–9 April 1999.
7. B. T. Griffith, H. Goudey, and D. Arasteh, ASHRAE Trans., (in press).
Further Reading

Heat transfer
Mills A. F., Heat Transfer, 2e, Prentice-Hall, Englewood Cliffs, NJ, 1999.
Predictive maintenance
Chan W. L., So A. T. P., and Lai L. L., IEE Proc., Generation Transmission Distribution 147(6), 355–360 (2000).
Cronholm M., Proc. SPIE, vol. 4020, Thermosense XXII, April 2000, pp. 99–106.
NDE/Tomography
Vallerand S. and Maldague X., NDT & E Int. 33(5), 307–315 (2000).
Inagaki T., Ishii T., and Iwamoto T., NDT & E Int. 32(5), 247–257 (1999).
Szekely V. and Rencz M., IEEE Trans. Components Packag. Technol. 22(2), 259–265 (1999).
Vavilov V., Proc. SPIE, vol. 1313, Thermosense XII, 1990, p. 307.
Maldague X. and Moore P. O., Infrared and Thermal Testing, 3e, American Society for Non-Destructive Testing, Columbus, OH, 2001.
Maldague X., ed., NDT Handbook on Infrared and Thermal Testing, vol. 3, 3rd ed., American Society for Non-Destructive Testing Handbook series, ASNT Press, 2001.
Maldague X., in Birnbaum G. and Auld B. A., eds., Topics on NDE, vol. 1: Sensing for Material Characterization, Processing, and Manufacturing, American Society for Non-Destructive Testing–ASNT, 1998, pp. 385–398.

Gas transmission/absorption applications
Tregoures A. et al., Waste Manage. Res. 17(6), 453–458 (1999).

Medical IR thermography
Koyama N. et al., Pain 84(2–3), 133–139 (2000).
Jones B. F., IEEE Trans. Med. Imaging 17(6), 1,019–1,027 (1998).
Silverman H. L. and Silverman L. A., Clinical Thermographic Technique: A Primer for Physicians and Technicians, Rabun Chiropractic Clinic, Clayton, GA, 1987.

General IR thermography
Rozlosnik A. E., Proc. SPIE, vol. 4020, Thermosense XXII, pp. 387–404, April 2000.
Heinrich H., Proc. SPIE, vol. 4020, Thermosense XXII, pp. 310–313, April 2000.
Astarita T., Cardone G., Carlomango G. M., and Meola C., Opt. Laser Tech. 32, 593–610 (2000).
Inagaki T. and Okamoto Y., Heat Transfer — Asian Res. 28(6), 442–455 (1999).
Mayer R., Henkes R. A. W. M., and VanIngen J. L., Int. J. Heat Mass Transfer 41(15), 2,347–2,356 (1998).
Shulz A., Measure. Sci. Tech. 11(7), 948–956 (2000).
Atkin K., Heyman C., and Scott R., Thermal Imaging Markets, 3rd ed., Jane's, Coulsdon, 2001.
Kruse P. W., Uncooled Thermal Imaging Arrays, Systems and Application, SPIE Optical Engineering Press, Bellingham, WA, 2001.
Thomas R. A., The Thermography Monitoring Handbook, Coxmoor Publishing, Oxford, UK, 1999.
Holst G. C., Common Sense Approach to Thermal Imaging, SPIE Optical Engineering Press, Bellingham, WA, 2000.
Kaplan H., Practical Applications of Infrared Thermal Sensing and Imaging Equipment, SPIE Optical Engineering Press, Bellingham, WA, 1999.
Gaussorgues G., Infrared Thermography, Chapman & Hall, London, 1994.

Image processing/color data representation
Wyszecki G. and Stiles W. S., Color Science: Concepts and Methods, Quantitative Data and Formula, John Wiley & Sons, Inc., NY, 1982.
Gonzalez R. C. et al., Digital Image Processing, Addison-Wesley, 1992.
Jain A. K., Fundamentals of Digital Image Processing, Prentice-Hall, 1989.
Fortner B. and Meyer T. E., Number By Colors, Springer-Verlag, NY, 1996.

INK JET PRINTING FOR ORGANIC ELECTROLUMINESCENT DISPLAY

YANG YANG
SHUN-CHI CHANG
JAYESH BHARATHAN
JIE LIU
University of California, Los Angeles, CA
INTRODUCTION

Polymer light-emitting diodes (PLEDs) and organic light-emitting diodes (OLEDs) have a potential for extensive applications such as multicolor emissive displays, road signs, indicator lights, and logos (1–3). One of the major advantages of PLEDs and OLEDs is their color tuning capability; various emitted colors can be easily obtained by changing the chemical structure of the organic compounds. Another advantage is the solution processibility of the conjugated organics (4). Traditionally, polymer solutions are deposited by spin-coating, a technology incapable of patterning materials because it nonselectively distributes the materials all over the substrate. However, to realize the applications mentioned, such as multicolor displays, it is necessary to achieve lateral control during deposition of the different polymers for multicolor emission. This drawback of spin-coating can be overcome by using ink-jet printing (IJP) technology.

IJP technology is a common technology used for desktop publishing. It is a contactless method of printing and has found wide application in the packaging and printing industries. Ink-jet printers work on the principle of ejecting a fine jet of ink through nozzles 10–200 µm in diameter. The jet stream is broken up into a series of droplets that are deposited as a dot matrix image. The advantages of ink-jet printing technology over conventional spin-coating technology are shown in Table 1.

CURRENT STATUS OF INK-JET PRINTING (IJP) TECHNOLOGY IN DISPLAY APPLICATIONS

The idea of using ink-jet printing for organic and polymeric devices was first demonstrated by Garnier et al. in their research on organic transistors (5). In 1996, several industrial and academic groups started to use this technology to print organic luminescent dyes and conjugated polymers. The Princeton group, for example, replaced the ink cartridges of a commercial ink-jet printer with cartridges loaded with luminescent polymer solutions and printed arrays of polymer dots onto a thin flexible polyester substrate coated with indium tin oxide (ITO) (). To avoid thermal degradation of the luminescent materials, they used a piezoelectric ink-jet printer instead of a thermal bubble ink-jet printer. Using a color printer with multiple ink cartridges and printing nozzles, they produced arbitrary single-color and multicolor patterns. Continuous films were also obtained by printing solid patterns, and functioning organic LEDs were fabricated on such films (7). Despite this success, however, the fabrication of OLED/PLED by using this technology was hindered by
technical difficulties such as pinholes in the printed layer, which are intrinsic to this technology. Bharathan and Yang resolved this problem by introducing a hybrid IJP technology (3,8). In their technology, the printed polymer layer was overcoated by another uniform polymer thin-film layer to seal the pinholes and to provide other functions. The details of this technology are the main focus of this review article.

The fundamental research on IJP technology has been pushed forward by academic research, but the industry has been making rapid strides applying this technology to fabricate multicolor and full-color displays. Major industrial companies involved in ink-jet printing technology are Seiko-Epson, Cambridge Display Technology (CDT), and Philips Electronics. Among these companies, Seiko-Epson and CDT have formed a technical alliance to codevelop this technology. CDT demonstrated a full-color 2'' display at the Society for Information Display (SID) conference in 1999 (9). One year later, Philips demonstrated a dual-color PLED display for cell phones at the Long Beach SID conference in 2000. The latest developments from these companies in this technology can be found at their respective web sites (www.cdtltd.co.uk and www.research.philips.com).

Table 1. Comparison of Spin-Coating and Ink-Jet Printing Technologies

Characteristics | Spin-Coating | Ink-Jet Printing
Patterning capability | No patterning capability. | Capable of patterning with micrometer resolution.
Large device area capability | Sensitive to dust particles and substrate defects, and not suitable for large area processing. | IJP is not sensitive to substrate defects, and it is a better technology for fabricating large area devices.
Efficiency of using material | More than 99% of the polymer solution is wasted. | Only less than 2% of the material is wasted.
Multicolor display fabrication capability | No multicolor patterning capability. | Ideal for multicolor patterning.

HYBRID INK-JET PRINTING (HIJP)

As mentioned earlier in this article, IJP is ideal for printing polymer and organic solutions, but this technology cannot be used in its original form because of the inherent presence of voids between the solution droplets, which would create electrical shorts in the device. This can be clearly understood from Fig. 1. To overcome these drawbacks and successfully apply the patterning capability of ink-jet printing technology, a group at UCLA developed a hybrid ink-jet printing technology (HIJP) (3,8).

Figure 1. Left: IJP technology in its original form, in which a potential short point appears between printed droplets. Middle and right: the combination of IJP technology and a buffer layer (HIJP technology). Middle: the buffer layer can be the bottom polymer layer, which absorbs the IJP droplets. Right: the buffer layer can be above the IJP layer and seal the pinholes.

This technology consists of an ink-jet printed layer in conjunction with another uniform, spin-coated (or precision-coated) polymer layer. This uniform layer, which is called the buffer layer, seals the pinholes, and the ink-jet printed layer consists of the desired pattern, for example, red–green–blue dots for a multicolor display. The use of the polymer buffer layer is demonstrated in Fig. 1. Following is a brief description of the unique characteristics of HIJP technology:

• The buffer layer is a pinhole-free layer that can seal the pinholes (or voids) produced by the ink-jet printed layer.
• The buffer layer could be the ink-absorbing layer and could effectively "fix" the printed materials when a proper solvent for the ink has been chosen. In addition, this buffer layer effectively planarizes the surface roughness produced by the ink-jet printed dots.
• Due to the small diameter of the jet nozzle and the high molecular weight of the conjugated polymer, IJP is ideal for printing small amounts of materials. Therefore, this technology is ideal for depositing dopants. Multicolor emission can be realized from efficient energy transfer when the buffer layer is a wide band-gap, semiconducting polymer layer and the ink-jet printed materials are dopants whose band gaps are smaller than those of the buffer layer.

The HIJP concept is a unique approach to fabricating polymer and organic electronic devices. One can apply this technology to deposit various functional materials such as charge-injection layers, charge-blocking layers, and multicolor polymer/organic emissive layers. All of these possibilities are demonstrated by making four different types of devices: polymer light-emitting logos, dual-color PLEDs, multicolor OLEDs, and pixelated PLEDs. These four devices and the ink-jet printed materials are listed in Table 2. The discussion of these four devices is the main focus of this review article.
Table 2. Makeup of Light-Emitting Devices

Devices Fabricated | IJP Material | Buffer Layer Material
Polymer light-emitting logo | Hole-injection layer: PEDOT (a) | EL polymer: MEH-PPV (b)
Dual-color PLED | Red-emission polymer: MPS–PPV (diffusion of MPS–PPV into the PPP buffer layer; energy transfer occurred) | Blue-emission polymer: PPP–NEt3+ (c)
Multicolor OLED | Green-emission organic molecule: Almq3; red-emission organic dye: DCM (bilayer structure, no diffusion occurred) | Blue-emission polymer: PVK (d)
Pixelated PLED | Insulating polymer: PVA (e); PVA is used as the insulating shadow mask for patterning a cathode | EL polymer: MEH-PPV

(a) PEDOT: (3,4-polyethylenedioxythiophene-polystyrenesulfonate).
(b) MEH–PPV: poly(2-methoxy-5-(2'-ethyl-hexyloxy)-1,4-phenylene vinylene).
(c) PPP–NEt3+: poly[2,5-bis[2-(N,N,N-triethylammonium)ethoxy]-1,4-phenylene-alt-1,4-phenylene] dibromide.
(d) PVK: poly-9-vinylcarbazole.
(e) PVA: polyvinylalcohol.
POLYMER LIGHT-EMITTING LOGO

The first example of the successful application of HIJP technology is the polymer light-emitting logo (3). In this example, a water-soluble conducting polymer, PEDOT (3,4-polyethylenedioxythiophene-polystyrenesulfonate), was used as the ink for the printer, and an emissive polymer, MEH-PPV, was used as the buffer layer. To fabricate the polymer light-emitting logos (PLELs), the desired pattern was printed on 3 × 3 cm glass/ITO substrates, followed by spin-coating the emissive polymer MEH-PPV. The typical film thickness was around 1,000 Å, as determined by alpha-step profilometry. The devices were then fabricated in air by routine LED technology. The fabrication procedure is shown schematically in Fig. 2. The charge-injection efficiency of a conducting polymer is much better than that of ITO, so only the areas covered by the conducting polymer illuminate under the same operating voltage (10). Therefore, the ink-jet printed PEDOT defines the pattern of the logo. The voltage–brightness measurements in Fig. 3 show
(c)
Figure 2. The polymer light-emitting logo fabrication process: (a) preparation of the substrate; (b) printing of the conducting polymer into the desired pattern; (c) deposition of the luminescent polymer and the cathode material.
820
INK JET PRINTING FOR ORGANIC ELECTROLUMINESCENT DISPLAY
103 PT anode
Brightness (cd/m 2 )
102
ITO anode
101 100 10−1 10−2 0
5
10
15
20
Bias (V) Figure 3. The brightness–voltage curves of devices with and without PEDOT as the hole injection layer. The contrast–voltage curve is shown in the inset.
Fig. 5(b) shows the brightness of each gray-scale level. This gray scale can be tuned nearly continuously by changing either the dot size or the density of dots. Another advantage of this ink-jet printing technology is that micron-sized polymer light-emitting diodes can be fabricated without going through the regular patterning of anode and cathode. Usually, small-sized LEDs can be fabricated only by crossing the cathode and anode fingers where the overlap area defines the pixel size. This unique patterning capability of the ink-jet printer using a conducting polymer provides a convenient alternative for generating regular arrays of micron-sized polymer LEDs. Typical emissive dot sizes mentioned in this report ranged from 180–400 µm, depending on the amount of conducting polymer ink sprayed from the nozzle. The dimensions of the pixels produced by this technology are also a function of the nozzle size of the ink-jet head; the dimensions can be reduced by reducing the nozzle size. Another important consideration for in using the ink-jet printing process to create organic electronic devices and circuits is the film quality, particularly compared to the spin-coating process. Thin polymer films prepared by the ink-jet printing process and the spin-coating process were compared qualitatively. The conducting polymer layer
PEDOT was coated on precleaned ITO substrates by spin coating and ink-jet printing. The spin-coating process was conducted under ambient conditions similar to the regular PLED fabrication process. The PEDOT solution was first filtered with a 2.7-µm syringe filter, and the resultant solution was spin coated on the substrates at a spin rate of 4,000 rpm. The film thickness was about 85 nm (measured with Sloan Dektak IIA). The ink-jet printing process was carried out with a commercially available SeikoEpson printer purchased through commercial channels and modified at UCLA. As part of the modifications, the ink from the ink reservoir was drained, and the cartridge was subsequently washed thoroughly with DI water. PEDOT was reloaded into the ink reservoir [PEDOT viscosity measurement conditions: ambient air, room temp. (23.6 ° C); motor spin speed: 30 rpm; viscosity: 8.04 cp (centipoise) = 8.04 mPa s (millipascal-s)]. One can print good quality PEDOT films using this printer. Finally, all of the substrates coated with PEDOT were baked at 120 ° C for 2 hours. An atomic force microscope in contact mode (Qscope 250, Quesant Instrument Corp.) was used to investigate the film morphology. The scan size was 5 × 5 µm. Figure 6(a) and (b) shows the AFM images of the spin-coated and ink-jet printed films, respectively. It was quite surprising to observe the similarity in film uniformity between the ink-jet printed and spin-coated film. However, the major technical challenge in using the IJP process for organic electronics is the mark at the edges of the droplets from drying of the ink droplets. DUAL-COLOR POLYMER LIGHT-EMITTING DIODES The most promising application of ink-jet printing is the fabrication of polymer multicolor displays. Hence, the second example to illustrate the potential of HIJP technology is the fabrication of a dual-color PLED (11). To fabricate dual-color pixels, the polymer buffer layer was a wide band-gap, blue-emitting semiconducting polymer, poly[2,5-bis[2-(N,N,N-triethylammonium)ethoxy]1,4-phenylene-alt-1,4-phenylene] dibromide (PPP-NEt3 + ), prepared by spin coating and approximately 2,000 A˚ thick, as determined by profilometry. The ink-jet printed layer was a red-orange semiconducting polymer, poly(5-methoxy-2-propanoxysulfonide-1,4-phenylene
Figure 4. The polymer light-emitting logo patterned by IJP technology: (a) a UCLA logo, and (b) a Valentine heart logo.
(a)
(a)
250 Brightness (cd/m 2 )
(b)
200 150 100 50 0 0
50
100 150 Density of dots
200
250
Figure 5. (a) Four levels of gray scale of the polymer light-emitting logo created by ink-jet printing technology. The brightness of these four regions is defined by the density of dots. (b) Brightness vs. density of dots.
vinylene) (MPS-PPV) which was printed on the buffer layer. A thin layer of conducting polymer, polyaniline, was spin-coated onto the ITO as an efficient hole-injection layer (10). (Figure 7 shows the structure of the dual-color LED.) The PPP-NEt3 + solution was prepared in acidic DI water (pH = 4) at 1% wt concentration, and the MPSPPV solution was prepared in neutral DI water at 2% wt concentration. The PPP-NEt3 + buffer layer plays a critical role in this dual-color device. The PPP-NEt3 + is first synthesized as a neutral polymer that is soluble in organic solvents (CHCl3 , THF) and in acidic water through quaternization of the amine functionality. In a second step, the neutral polymer is treated with bromoethane to yield the bromide salt of the polymer. This form of the polymer has varying neutral water (pH = 7) solubility characteristics that depend on the number of amine sites quaternized. This particular PPP-NEt3 + sample was quaternized so that an aqueous solution of pH = 4 was necessary to achieve the desired solubility. This unique property makes PPP-NEt3 + an ideal ink-absorbing material for the MPS-PPV, which was prepared in neutral DI water. If the PPP-NEt3 + were completely soluble in the neutral DI water, then the solvent for MPS-PPV would have smeared the buffer layer. Hence, the concepts of ink absorbing (or dopant diffusion) and energy transfer effects are successfully demonstrated in this PPP-NEt3 + :MPS-PPV system. The solvent of the ink-jet printed MPS-PPV slightly dissolves the PPP-NEt3 +
Figure 6. (a) The AFM image of the spin-coated PEDOT film. (b) The AFM image of an IJP PEDOT film. These two images show similar surface morphology and smoothness for spin-coated and IJP films.
layer and allows the MPS-PPV to diffuse into the PPPNEt3 + layer. This leads to efficient energy transfer from the PPP-NEt3 + to the MPS-PPV, thereby generating a redorange photoluminescence and electroluminescence from the ink-jet printed sites. The optical microscope pictures of three different IJP patterns were taken to illustrate the importance of solvent selection. Figure 8(a) shows an IJP dot of MPS-PPV printed directly onto the surface of a glass slide. This dot is circular and has a clear drying mark around the edge. On the other hand, Figure 8(b) is the picture of an IJP dot of MPS-PPV on the surface of the PPP-NEt3 + buffer layer. As indicated earlier, the solvent for MPS-PPV only slightly dissolves PPP-NEt3 + ; hence MPS-PPV can diffuse slightly into the buffer layer. This diffusion of MPS-PPV into PPP-NEt3 + can be clearly observed from the optical microscope image in Fig. 8(b). Along the edge of the MPS-PPV dot is a ring-shaped area surrounding the MPS-PPV dot, indicating that the solvent of MPS-PPV diffused into the buffer layer. Finally, when a solvent was chosen that can completely dissolve the buffer layer, the ink-jet printed dot mixed completely with the buffer layer polymer. Figure 8(c) is a microscope picture of a red-orange MPS-PPV dot completely blended into the PPP-NEt3 + buffer layer by using acidic DI water as
(b)
Figure 6. (Continued)
Ink-jet printed MPS−PPV dots
Blue PPP−NEt3+ buffer layer
MPS−PPV diffused into PPP−NEt3+
PANI / ITO anode
substrate. Two were blue emission LEDs (devices A and B), and the other two were orange-red emission devices (devices C and D) as can be seen from Fig. 7. The PL and EL spectra were measured using an Ocean Optics Spectrometer. The photoluminescent (PL) spectrum of the PPPNEt3 + :MPS-PPV blend is identical to the MPS-PPV PL spectrum, and this is direct evidence of the dopant diffusion and energy transfer effects discussed earlier. The PL spectra of PPP-NEt3 + and PPP-NEt3 + :MPS-PPV are shown in Fig. 9. The current–voltage curves of the PPP-NEt3 + and PPP-NEt3 + :MPS-PPV devices are shown in Fig. 10. The slightly high operating voltage for both types of LEDs was due to the thick PPP-NEt3 + layer. The PPP-NEt3 + :MPS-PPV system has a much lower turn-on and operating voltage than that of the PPP-NEt3 + system. This may be due to the fact that MPS-PPV has a smaller band gap and hence carrier injection and carrier transport is much easier than those of the PPP-NEt3 + system. The brightness of both devices however, is, not very high. For the blue emission, the peak brightness at 33 volts was about 10 cd/m2 . For the red emission, the peak brightness at 22 volts was about 7 cd/m2 . Such low brightness is probably due to the fact the devices were fabricated in air and photochemical oxidation reaction could have damaged the polymer. In addition, the low efficiency and brightness of the device could also be due to the use of DI water and acidic DI water as solvents for MPS-PPV and PPP- NEt3 + , respectively. These examples of blue- and red-emitting PLEDs, indicate that the solvent chosen for the IJP material (the dopant) is very important in achieving dopant diffusion and energy transfer effects. These two examples also illustrate the advantage of HIJP technology in controlling the local morphology between the dopant and the buffer layer (the host). The details of the diffusion process, the local morphology, and the potential for forming high surface area contact between the dopant polymer and the host polymer are still under investigation. MULTICOLOR ORGANIC LIGHT-EMITTING DIODES
Anode contact
A
B
Blue emission
C
D
Cathode
Red-orange emission
Figure 7. The device structure of dual-color polymer light-emitting diodes. Devices A and B are blue-emitting LEDs where PPP-NEt3 + is the active material. Devices C and D are red-orange devices where PPP-NEt3 + :MPS-PPV is the active material. See color insert.
the solvent. This example illustrates the hybrid ink-jet printing process and the importance of the choice of organic solvent for the IJP material. Based on this HIJP principle, blue and orange-red dual-color polymer light-emitting diodes were fabricated on the same substrate. Four LEDs were fabricated on one substrate: two were blue-emission LEDs (devices A and B), and the other two were orange-red emission devices (devices C and D), as can be seen from Fig. 7. The PL and EL spectra were measured using an Ocean Optics spectrometer. The photoluminescence (PL) spectrum of the PPP-NEt3+:MPS-PPV blend is identical to the MPS-PPV PL spectrum, and this is direct evidence of the dopant diffusion and energy transfer effects discussed earlier. The PL spectra of PPP-NEt3+ and PPP-NEt3+:MPS-PPV are shown in Fig. 9. The current–voltage curves of the PPP-NEt3+ and PPP-NEt3+:MPS-PPV devices are shown in Fig. 10. The somewhat high operating voltage of both types of LEDs was due to the thick PPP-NEt3+ layer. The PPP-NEt3+:MPS-PPV system has a much lower turn-on and operating voltage than the PPP-NEt3+ system. This may be because MPS-PPV has a smaller band gap, so that carrier injection and carrier transport are much easier than in the PPP-NEt3+ system. The brightness of both devices, however, is not very high. For the blue emission, the peak brightness at 33 volts was about 10 cd/m2. For the red emission, the peak brightness at 22 volts was about 7 cd/m2. Such low brightness is probably due to the fact that the devices were fabricated in air, and photochemical oxidation could have damaged the polymer. In addition, the low efficiency and brightness of the devices could also be due to the use of DI water and acidic DI water as solvents for MPS-PPV and PPP-NEt3+, respectively. These examples of blue- and red-emitting PLEDs indicate that the solvent chosen for the IJP material (the dopant) is very important in achieving dopant diffusion and energy transfer. They also illustrate the advantage of HIJP technology in controlling the local morphology between the dopant and the buffer layer (the host). The details of the diffusion process, the local morphology, and the potential for forming high-surface-area contact between the dopant polymer and the host polymer are still under investigation.
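The brightness values quoted above can be combined with the drive current into a rough current efficiency (candela per ampere), which is one way to compare the blue and red-orange devices on an equal footing. The short sketch below is illustrative only: the luminance values and the ~0.1 cm2 device area come from the text and the Fig. 10 caption, but the drive currents are assumed, order-of-magnitude placeholders rather than reported measurements, and the helper function current_efficiency is defined here purely for illustration.

    # Rough current-efficiency estimate (cd/A) for the two ink-jet printed PLEDs.
    # Luminance values and the ~0.1 cm2 device area come from the text and the
    # Fig. 10 caption; the drive currents below are assumed placeholders.

    DEVICE_AREA_M2 = 0.1e-4  # 0.1 cm2 expressed in m2

    def current_efficiency(luminance_cd_per_m2, current_a, area_m2=DEVICE_AREA_M2):
        """Current efficiency in cd/A: luminance times emitting area per unit current."""
        return luminance_cd_per_m2 * area_m2 / current_a

    # Assumed drive currents near the quoted operating points (placeholders).
    blue_eff = current_efficiency(10.0, 0.7e-3)  # PPP-NEt3+ device, ~10 cd/m2 near 33 V
    red_eff = current_efficiency(7.0, 4.0e-3)    # PPP-NEt3+:MPS-PPV device, ~7 cd/m2 near 22 V

    print(f"blue device:       ~{blue_eff:.2g} cd/A")
    print(f"red-orange device: ~{red_eff:.2g} cd/A")

With these assumed currents, both values fall well below 1 cd/A, in line with the low brightness and the air-processing limitations discussed above.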
MULTICOLOR ORGANIC LIGHT-EMITTING DIODES

The third example demonstrated was multicolor organic light-emitting diodes (OLEDs). In addition to polymer light-emitting diodes, HIJP can also be applied to OLEDs (12). The traditional fabrication of OLEDs involves thermal sublimation of organic materials in an ultrahigh vacuum. However, this process is rather time-consuming and complicated for patterning fine multicolor pixels. As illustrated earlier, HIJP is a convenient alternative for patterning multicolor pixels through efficient energy transfer from an appropriate buffer layer (a semiconducting polymer layer) that has a wide band gap to the ink-jet printed materials (dopants) that have smaller band gaps than the buffer layer. Alternatively, the buffer layer can be a continuous polymer layer insoluble in the solvents used for the IJP droplets. In this case, it seals the pinholes and can also function as the hole-transport layer (HTL). The latter case is ideal for fabricating organic LEDs because the most efficient OLEDs consist of a bilayer structure of an HTL and an emissive layer (13).
Figure 8. (a) IJP of MPS-PPV on a glass surface (no buffer layer). (b) IJP of MPS-PPV on the PPP-NEt3+ buffer layer; the solvent is DI H2O, and solvent diffusion can be seen clearly. (c) The solvent for MPS-PPV is acidic water, and it completely dissolved the buffer layer. This example illustrates the importance of choosing the right solvent for the HIJP process.
To fabricate organic multicolor LEDs by this technology, a blue-emitting, semiconducting polymer, poly(9-vinylcarbazole) (PVK), was chosen as the buffer layer. This layer was approximately 1,500 Å thick. The ink-jet printed dopants were tris(4-methyl-8-quinolinolato)Al(III) (Almq3) and 4-(dicyanomethylene)-2-methyl-6-(4-dimethylaminostyryl)-4H-pyran (DCM), which were printed on the PVK buffer layer. Based on this principle, multicolor organic LEDs comprising bilayer structures of PVK/DCM (orange-red emitting) and PVK/Almq3 (green-blue emitting) were fabricated, and the blue-emitting PVK buffer layer served as the hole-transport layer.
Commercially available ink-jet printers cannot withstand the corrosive action of most common organic solvents, so methanol was chosen as the solvent for DCM and Almq3, and each dopant was deposited separately. The structure of the ink-jet printed multicolor organic LEDs is similar to that shown in Fig. 7. Each substrate contained four LEDs; two of these were blue-emitting LEDs (devices A and B in Fig. 7), in which PVK was the active material. The remaining two were either orange-red or blue-green emitting LEDs (devices C and D in the same figure), and the emission was obtained from the PVK/DCM or PVK/Almq3 bilayer structures, respectively. A thin layer of PEDOT was spin-coated onto the ITO for efficient hole injection, and PVK was chosen
as the buffer layer because of its insolubility in methanol. Hence, a well-defined bilayer structure can be obtained by using this layer. For the orange-red emissive devices, DCM was printed onto a PVK layer, and for the green-emitting device, Almq3 was printed onto PVK buffer layers using the same printer. The spin-coating of PVK and the inkjet printing process of DCM and Almq3 were carried out in air. The volatility of methanol permits localized
Figure 9. The PL emission of PPP-NEt3+ and PPP-NEt3+:MPS-PPV. The EL spectra are similar. There is no blue emission in the PPP-NEt3+:MPS-PPV blend; this spectrum indicates that the energy has been transferred from PPP-NEt3+ to MPS-PPV.
Figure 10. The I–V curves of the two types of devices: ITO/PANI/PPP-NEt3+/Ca and ITO/PANI/PPP-NEt3+:MPS-PPV/Ca. The device area is about 0.1 cm2.
deposition of the Almq3 and DCM droplets, and hence no smearing of these droplets occurred during printing. For comparison, a regular organic LED with the structure ITO/PEDOT/PVK/Almq3/Ca was also prepared, in which Almq3 was deposited by vacuum sublimation. Figures 11(a) and (b) show pictures of the Almq3-based LEDs fabricated by thermal deposition and by HIJP, respectively. These two pictures were taken under identical exposure conditions, and the results qualitatively reflect the difference in emissive intensity between the two devices. The device fabricated by thermal sublimation shows more uniform emission than the ink-jet printed device. The thin dark lines seen crossing the devices fabricated by HIJP are caused by clogged nozzles in the printer head that prevented material from being printed along these lines. This is a common problem afflicting ink-jet printers and hence demonstrates the importance of the buffer layer. Without the buffer layer, it would be virtually impossible to make good devices because of the unavoidable presence of pinholes that result from the IJP. The current–voltage (I–V) and light–voltage (L–V) curves for the PVK/Almq3 bilayer LEDs fabricated by HIJP are shown in Fig. 12(a), and the PL and EL emission spectra are shown in Fig. 12(b). The device turned on at around 8 volts, and the external quantum efficiency was about 0.02%. On the other hand, the devices obtained by thermal sublimation were superior to their ink-jet printed counterparts: the device efficiency was about 10 times higher (0.2%), and the device stability was much better. The low efficiency and short lifetime of the LEDs fabricated by HIJP technology could be due to the fact that the devices were fabricated in air, so that significant device degradation could have occurred because of residual moisture and oxygen. Another possible reason for the faster decay of the ink-jet printed devices is a difference in the morphology of the Almq3; thermal sublimation produces films that are much more uniform than ink-jet printed films. The PL and EL emission spectra from devices made with ink-jet printed Almq3 are identical to the spectrum obtained from an Almq3 thin film deposited by thermal sublimation. This is an indication that the recombination zone is entirely within the Almq3 layer and that the PVK layer functions only as the hole-transport layer. However, the small peak in the EL spectrum around 600 nm is perhaps due to molecular aggregation of Almq3 (15). The peaks at 370 and 740 nm in the PL spectrum are from the mercury lamp used for the PL measurements. Figure 13 shows color pictures of red-, green-, and blue-emitting LEDs. The red-emitting device, in which DCM is the emissive material, has an emission spectrum consistent with regular DCM. (The external quantum efficiency was estimated at less than 0.01%.) Analogous to the PVK/Almq3 bilayer structure, this is an indication that the recombination zone is entirely within the DCM layer and that PVK functions as the hole-transport layer. The blue devices, which have the structure ITO/PEDOT/PVK/Ca, exhibit a blue-purple color. In this structure, the PVK layer behaves as the active layer
Figure 11. Color pictures of an OLED where Almq3 is the emitting material. In (a), Almq3 is processed by thermal sublimation, and in (b) Almq3 is processed by ink-jet printing. See color insert.
Figure 12. The current–light–voltage (I–L–V) curves (a), and the EL and PL emission spectra (b), for the ITO/PEDOT/PVK/Almq3 LED.
for generating electroluminescence. However, there is a red emission peak in the spectrum of PVK. This phenomenon has been observed previously and has been attributed to defects in the PVK film or perhaps to moisture absorbed during the spin-coating and ink-jet printing processes, which were carried out in air (16).

Figure 13. Color pictures of red, green, and blue OLEDs. The red and blue LEDs are on the same substrate, and the green LED was fabricated on a separate substrate. See color insert.

SHADOW MASK BY INK-JET PRINTING

The final example of HIJP technology is a shadow mask for patterning the cathode metal. In addition to multicolor patterning capability, another important technical issue for an EL display is patterning of the anode and the cathode. For organic EL displays, the easiest and lowest cost method for driving the display is the so-called "X-Y" passive addressable driving scheme, achieved by patterning the anode and cathode materials into mutually perpendicular row and column electrodes. The ITO anode is patterned by regular, high-resolution photolithography. Patterning the cathode, on the other hand, is more difficult. The photolithography process cannot be applied to pattern the cathode because the solvents used to process the photoresist would destroy the underlying organic
materials. Therefore, the cathode is usually patterned by using stainless steel shadow masks. However, this technique lacks the required fine resolution. An alternative method is to use prepatterned mushroom-shaped photoresist dividers on top of the active organic layers as shadow masks. In addition to multicolor patterning capability, the patterning of a "built-in" shadow mask by ink-jet printing of inert polymers was also demonstrated. Figure 14 is a picture of metal (cathode) strips patterned by this technique. The printed material is an aqueous solution of polyvinyl alcohol (PVA). As can be seen from the figure, the spacing between the ink-jet printed lines determines the width of the cathode. The width of the metal strip, that is, the opening of the shadow mask, is about 1 mm. Eventually, the metal line width can be reduced to the resolution of the ink-jet printer, which has recently been improved to 10 µm (17). The thickness of the final PVA layer is estimated at more than 1 µm. In the future, shadow masks can be fabricated after printing the multicolor organic compounds. This process will significantly simplify the procedure for fabricating organic multicolor, X-Y addressable, passive matrix displays. However, significant research is still required to form a "mushroom-shaped" photoresist mask, which has been proven effective in preventing cross-talk between pixels.

Figure 14. Column cathode (vertical shining metal strips) patterned by the ink-jet printed built-in shadow mask. Also shown is the row PEDOT anode (horizontal dark strips) printed directly by IJP. The active polymer light-emitting pixels are shown as the inset.

CONCLUSION

To summarize, the feasibility of processing organic materials using ink-jet printing technology has been demonstrated. IJP is compatible with organic materials and also has the significant advantages of low cost and convenience for processing large-area substrates. It is also environmentally friendly because there is minimal waste of materials. This technology has tremendous advantages over the traditional spin-coating method of processing organic materials, as can be seen from Table 1. Because this technology can pattern organic materials, its potential applications are extensive. It can be used to fabricate logos, indicator lights, and multicolor displays, and also for biomedical applications such as biosensors for low-cost diagnostics.

Acknowledgments
The authors are very grateful for the supply of Almq3 from Prof. Junji Kido of Yamagata University, PPP-NEt3+ from Prof. John Reynolds of the University of Florida, and MPS-PPV from Prof. Fred Wudl of the University of California, Los Angeles. This research was partially supported by the University of California Campus Laboratory Collaboration Program (Award No. 016715), the Office of Naval Research (Award No. N00014-98-1-0484), and the National Science Foundation (Career Award No. ECS-9733355).
ABBREVIATIONS AND ACRONYMS

HIJP      hybrid ink-jet printing
IJP       ink-jet printing
PLEDs     polymer light-emitting diodes
PLELs     polymer light-emitting logos
OLEDs     organic light-emitting diodes
Almq3     tris(4-methyl-8-quinolinolato)Al(III)
DCM       4-(dicyanomethylene)-2-methyl-6-(4-dimethylaminostyryl)-4H-pyran
ITO       indium tin oxide
MEH-PPV   poly(2-methoxy-5-(2'-ethylhexyloxy)-1,4-phenylene vinylene)
MPS-PPV   poly(5-methoxy-2-propanoxysulfonide-1,4-phenylene vinylene)
PEDOT     poly(3,4-ethylenedioxythiophene)-polystyrenesulfonate
PPP-NEt3+ poly[2,5-bis[2-(N,N,N-triethylammonium)ethoxy]-1,4-phenylene-alt-1,4-phenylene] dibromide
PVA       polyvinyl alcohol
PVK       poly(9-vinylcarbazole)
BIBLIOGRAPHY

1. (a) J. H. Burroughes, D. D. C. Bradley, A. R. Brown, R. N. Marks, K. Mackay, R. H. Friend, P. L. Burns, and A. B. Holmes, Nature 347, 539 (1990). (b) G. Gustafsson, Y. Cao, G. M. Treacy, F. Klavetter, N. Colaneri, and A. J. Heeger, Nature 357, 477 (1992).
2. H. Nakada and T. Tohma, Display Devices 17, 29–32 (Spring 1998).
3. J. Bharathan and Y. Yang, Appl. Phys. Lett. 72, 2,660 (1998).
4. D. Braun and A. J. Heeger, Appl. Phys. Lett. 58, 1,982 (1991).
5. F. Garnier, R. Hajlaoui, and A. Yassar, Science 265, 1,684 (1994).
6. C. C. Wu, D. Marcy, M. Lu, and J. C. Sturm, Electron. Mater. Conf. (EMC), Minerals, Metals and Materials Society, Fort Collins, Colorado, June 1997.
7. T. R. Hebner and J. C. Sturm, Appl. Phys. Lett. 73, 1,775 (1998).
8. (a) Y. Yang, U.S. Pat. pending, Serial No. 60/062,294 and 60/072,709. (b) Y. Yang and J. Bharathan, SPIE Proc. Photonics West Conf. on Light-Emitting Diodes, San Jose, Jan. 28–29, 1998, vol. 3279, pp. 78–81.
9. T. Shimoda et al., Proc. Soc. Inf. Disp. (SID) 99, 376 (1999).
10. Y. Yang and A. J. Heeger, Appl. Phys. Lett. 60, 1,245 (1994).
11. S. C. Chang, J. Bharathan, Y. Yang, R. Helgeson, F. Wudl, M. R. Ramey, and J. R. Reynolds, Appl. Phys. Lett. 73, 2,561 (1998).
12. S. C. Chang, J. Liu, J. Bharathan, Y. Yang, J. Onohara, and J. Kido, Adv. Mater. 11, 734 (1999).
13. C. W. Tang, S. A. VanSlyke, and C. H. Chen, J. Appl. Phys. 65, 3,610 (1989).
14. J. Kido and Y. Iizumi, Appl. Phys. Lett. 73, 2,721 (1998).
15. F. Papadimitrapolis, private communication.
16. (a) C. Zhang, H. von Seggern, K. Pakbaz, B. Kraabel, H. W. Schmidt, and A. J. Heeger, Synth. Met. 62, 35 (1994). (b) C. Zhang, private communication.
17. G. Percin, T. S. Lundgren, and B. T. Khuri-Yakub, Appl. Phys. Lett. 73, 2,375 (1998).
INSTANT PHOTOGRAPHY

STANLEY H. MERVIS
Cambridge, MA
VIVIAN K. WALWORTH Concord, MA
INTRODUCTION

The term instant photography originally referred to one-step film processes that provide finished photographs within a minute after exposure of a silver halide film. The processing of each film unit is initiated in the camera immediately after the film has been exposed. Such processes are outwardly dry, and the reagent is usually provided as part of the film unit. Processing proceeds rapidly under ambient conditions, as opposed to the multistep darkroom processes that require precise time and temperature control. The technology of instant photography has been extended to include the application of similar chemistry to films wherein processing may be delayed for a time after exposure. The advent of digital photography has further expanded the concept of instant photography. Now, CCD cameras make it possible to make an exposure, preview the resulting picture within seconds, and generate a corresponding color or black-and-white print within a few minutes. Furthermore, the major companies that provide instant films are now creating so-called hybrid systems that use both digital imaging and imaging on film to produce a variety of rapidly generated images. All such systems seek to enable the photographer to create a finished photograph under ambient conditions so as to view and judge the results quickly.
This article emphasizes silver halide instant films and the cameras that expose and process these films. These were the predominant instant systems at the start of the twenty-first century. The term instant photography as used herein refers to these film-based systems. Also included in this article are examples of the rapidly emerging digital/film imaging systems. An incentive for such systems is the convenience of immediate hard copy along with a digital record. Examples include digital/film imaging and digital camera/printer combinations. The discussion will note relevant low-speed reprographic systems, usually described as diffusion transfer reversal (DTR) processes, that use silver halide photosensitive layers and follow image-forming chemistry somewhat analogous to that of instant processes. Film systems using dye transfer based on microencapsulated dyes and photopolymerization (1), such as Cycolor, or photoinduced capsule rupture (2) to control image formation are outside the scope of this article. Similarly, dry-processed nonsilver black-and-white films, in which an image comprising fine carbon particles is formed by laser-induced imagewise phase transformation and resulting differential adhesion at the interface between the laser-sensitive layer and the carbon-bearing layer (3), are not included.

HISTORICAL BACKGROUND

The first one-step photographic process, which produced sepia prints directly from the camera, was introduced by Land in 1947 (4). Land, Rogers, and Walworth provided a comprehensive account of one-step photography detailing the development of instant black-and-white and color processes from 1944 through 1976 (5). A monograph by Rott and Weyde describes concomitant developments in silver diffusion transfer processes and includes a chronological list of selected DTR patents through 1972 (6). Hill reviewed DTR and web transfer processes in 1976 (7). In 1978, the Sterckshof Provinciaal Museum voor Kunstambachten provided a well-illustrated catalog of its exhibit documenting the first 40 years of developments in both one-step photography and DTR processes (8). Papers by Hanson in 1976 and 1977 introduced a new instant camera and film (9). A 1983 paper by Van de Sande included a thorough discussion of instant color image formation (10). In 1987, Weyde and Land provided historical accounts of work in their respective fields as part of the conference Pioneers of Photography, documented in the book bearing the same title (11). Articles by Thirtle and Zwick in the Kirk-Othmer Encyclopedia of Chemical Technology (ECT) 2nd edition, by Walworth in the ECT 3rd edition, and by Mervis and Walworth in the ECT 4th edition (12) detail developments in the chemistry of instant color processes. A later publication by Mervis and Walworth describes developments in instant photography and related processes through 1996 (13). McCann's 1993 compendium of Land's writings includes all of his papers on instant photography and a complete list of Land's more than 500 patents (14).
Contemporary Product Information
The instant films described herein, unless otherwise noted, were available in May, 2001. For more comprehensive information on current products and applications, see the manufacturers’ technical bulletins or visit their respective web sites: www.polaroid.com; www.polaroiddigital.com; www.fujifilm.com.
PRINCIPLES OF INSTANT FILMS AND PROCESSES

Film and Process Design

Handheld camera use requires a film of sufficient photographic sensitivity to permit short exposures at small apertures. Silver halide emulsions have high sensitivity and, upon development, enormous amplification. In a one-step process a viscous reagent is spread between two sheets; one bears an exposed silver halide emulsion, and the other an image-receiving layer. Both sheets pass through a pair of pressure rollers, and a sealed pod (Fig. 1) (15) attached to one of the sheets ruptures to release the viscous reagent, which spreads to form a thin layer between the two sheets, bonding them together. The action of the reagent produces, concomitantly, a negative image in the emulsion layer and a visible black-and-white or color image in the image-receiving layer.

Film Configurations: Peel-Apart and Integral Films

In so-called "peel-apart" products, the two sheets are stripped apart to reveal the image. In "integral" films, the image-forming layers are located on the inner surfaces of the two sheets, at least one of which is transparent. The emulsion layers are exposed through a transparent support layer, and the processed print is viewed either through the same surface (16) or through the opposite surface (17) of the film unit. A reflective white pigment layer within the film unit facilitates viewing by reflected light.
Figure 1. Typical pod structure (as in US Pat. 2,543,181). (a) Unfilled pod; (b) filled, sealed pod; (c) cross section of filled pod along line C–C in (b).
Reagents for Instant Photography; Pods

An essential component of each instant film is the reagent system. Essentially dry processing is realized by using a highly viscous fluid reagent and restricting the amount to just that needed to complete the image-forming reaction for a single picture. The high viscosity of the reagent is provided by water-soluble polymeric thickeners (18). Suitable polymers include hydroxyethyl cellulose, alkali metal salts of carboxymethylcellulose, and carboxymethyl hydroxyethyl cellulose. The high viscosity of the reagent facilitates accurate metering to form a thin layer that also serves as an adhesive for the two sheets during processing. In peel-apart films, the reagent layer may remain on the surface of the print, or it may be stripped away by the other sheet. In integral films, the reagent forms a new layer within the processed film unit.

A sealed pod of reagent for each picture permits using more highly reactive reducing agents and more strongly alkaline conditions than would be feasible in a tank or tray process. Sealing the pod protects the reagent from oxygen until it is used. In addition to a high molecular weight polymer, reducing agents, and alkali, the viscous reagent may contain reactive components that participate in image formation, deposition, and stabilization. Some reactants may also be incorporated in coatings on either of the two sheets. Such an arrangement may isolate reactive components from one another before processing, facilitate sequential availability during processing, and provide a stable and compatible environment for each component.

The reagent-containing pod must be carefully designed for both containment and discharge of its contents. The pod lining must be inert to strong alkali and other components of the reagent, and the pods must be impervious to oxygen and water vapor over long periods. When the pod passes through the processing rollers, the seal must rupture by a peeling separation, rather than by explosive bursting, so that the reagent is released along one edge and spread uniformly, starting as a fine bead and forming a smooth, thin layer. The amount of reagent contained in each pod is particularly critical for integral film units that have limited capacity to conceal surplus reagent.

One-Step Cameras and Processors

The earliest one-step cameras used roll film and completed processing inside a dark chamber within the camera (19). With flat-pack cameras (Fig. 2), introduced in 1963, the user draws the film between processing rollers and out of the camera before processing is completed (20). Film holders for instant film packets (Fig. 3) contain retractable rollers that enable the user to load the film unit without rupturing the pod (21). For 8 × 10 in. (20 × 25 cm) films, the rollers are part of a tabletop processor. The exposed film, contained in a protective black envelope, and a positive sheet that has the pod attached are inserted into separate slots of a tray that leads into the processor. The film passes through the rollers into a covered compartment within which processing is completed. The Polaroid 20 × 24 in. (51 × 61 cm) camera uses separate negative and positive sheet films in continuous rolls. The pod is attached manually before the sheets enter motorized processing rollers. The 40 × 80 in. (102 × 203 cm) film used for Polaroid museum reproductions is
also in continuous roll form. For this format, the reagent is applied from a flexible plastic "straw" that is drawn between rollers to extrude a bead of reagent along the nip between the sheets just before they enter the processing rollers. In the 35-mm instant format, both negative and positive layers are coated on a single support (22). Following exposure in a conventional 35-mm camera, the film is processed in a tabletop processor, such as the Polaroid AutoProcessor, as illustrated in Fig. 4. A processing pack provides a pod of reagent and a strip sheet. Processing is initiated as a thin layer of reagent applied to a strip sheet contacts the film surface. The strip sheet and film pass together between processing rollers and wind together onto a take-up spool. After a short interval, the strip sheet and film are rewound, separating, so that the processed film goes back into its cartridge and the strip sheet and adhered reagent go into the processing pack, which is then discarded. Fully automatic processing was introduced in 1972 by the SX-70 camera, which ejects each exposed integral film unit automatically. Immediately after exposure, a pick moves the uppermost film unit from the pack to the nip of motorized processing rollers, which spread the reagent and move the film unit out of the camera (23,24). Kodak instant cameras for integral films, introduced in 1976 and now discontinued, included both motorized and hand-cranked models.
Figure 2. Schematic section of the Model 100 camera (1963), illustrating processing of pack film outside the camera. The outer surfaces of both negative and positive sheets are opaque. (a) Position of each element during exposure; (b) following exposure, pulling the white paper tab leads the negative into position for processing; (c) pulling the yellow paper tab draws the entire film unit between processing rollers, rupturing the pod and spreading the reagent, and leads the film unit out of the camera.
Figure 3. Film holder for 4 × 5 in. sheet films, used with professional 4 × 5 cameras and in MP-4 industrial cameras. The external control arm controls retractable processing rollers inside the holder. The sequence of operations is as follows: (a) insertion of the light-tight film unit into the holder with the control arm in the LOAD position causes engagement of the cap on the end of the packet by a spring-loaded retainer; (b) partial withdrawal of the outer envelope enables film exposure; (c) reinsertion of the unit repositions the outer envelope; (d) pulling the unit out of the holder with the control arm in the PROCESS position ruptures the pod and spreads its contents between the negative and positive sheets to initiate processing. Processing reaches completion outside the film holder.
Fuji instant cameras for integral films are motorized. In several Polaroid instant cameras, pulling a handle out of the camera actuates the processing rollers, initiates processing, and ejects the developing film unit.

Formation of Instant Images

One or more of a series of complementary positive and negative images can serve as the starting point for a transfer process that leads to a useful final image. Candidate initial image pairs include exposed grains, unexposed grains; developed silver, undeveloped silver; oxidized developer, unoxidized developer; neutralized alkali, alkali not neutralized; and hardened gelatin, unhardened gelatin. As an example, black-and-white instant print films use imagewise transfer of undeveloped silver halide, unoxidized developer, and unused alkali to an image-receiving layer, within which they react to form a positive silver image. In instant color processes that use dye developers (molecules that are both image dyes and photographic developers), an image in terms of unoxidized dye developer transfers to form a positive image in the image-receiving layer. In dye release systems, either negative-working or positive-working emulsions are used, and dye release is effected by one or more of the initial images in terms of silver or developer.
BLACK-AND-WHITE INSTANT IMAGING PROCESSES

Chemistry of Silver Image Formation

In the processing of a silver halide emulsion after camera exposure, a reagent containing developing agent, alkali, and a silver halide solvent reduces the exposed grains in situ [Eq. (1)]. The solvent concomitantly dissolves the unexposed grains to form soluble silver complexes [Eq. (2)]. These soluble complexes transfer from the emulsion layer, or donor layer, to the image-receiving, or receptor, layer. The developer reduces the complex silver salts to form an image that comprises metallic silver; at the same time it regenerates the solvent so that it may recycle to form additional soluble complex [Eq. (3)].

AgX(exposed) + Developer(red) → Ag0 + Developer(ox)   (1)

AgX(unexposed) + 2(S2O3)2− → Ag(S2O3)2 3− + X−   (2)

Ag(S2O3)2 3− + Developer(red) → Ag0 + Developer(ox) + 2(S2O3)2−   (3)
This sequence permits efficient silver transfer using minimum solvent. Further efficiency is realized by providing only the amounts of developer and solvent in
Figure 4. Schematic diagram of the AutoProcessor for instant 35-mm films. (a) At the start of processing, the strip sheet (A) exits the processing pack, and the viscous reagent (B) spreads onto the surface of the strip sheet. The coated strip sheet and the exposed film (D) pass together between the processing rollers (E) and wind onto the take-up spool (F); (b) After a timed interval, the strip sheet and film are rewound. The strip sheet goes back into the processing pack (A) and the processed film goes back into its original cartridge (B). The used processing pack is discarded, and the processed film is ready for viewing. Both hand-cranked and motorized AutoProcessors are available.
each pod needed to process a single image (or strip of images, in the 35-mm films). The use of fresh reagent and the cyclic use of the solvent enable precise control of the competition between development and dissolution. In Eqs. (2) and (3), the silver solvent is sodium thiosulfate (hypo). The selection of developer and silver halide solvent depends on the nature of the image-receiving layer. For siliceous receptors, a typical reagent comprises a hydroquinone developer and sodium thiosulfate as solvent. For cellulosic receptors, typical developing agents are hydroxylamines and tetramethylreductic acid, and uracils serve as solvents. Figure 5 is a schematic cross section of a black-andwhite peel-apart instant film, such as Polaroid Type 667, before, during, and after processing.
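The interplay of Eqs. (1)–(3), in particular the way reduction at the receiver returns thiosulfate for further dissolution, can be pictured with a toy rate model. The sketch below is purely illustrative: the rate constants, starting amounts, and simplified one-to-one stoichiometry are assumptions made for this example and are not taken from any actual film formulation.

    # Toy kinetic sketch of Eqs. (1)-(3): exposed grains develop in situ, unexposed
    # grains are dissolved by thiosulfate, and reduction of the transferred complex
    # at the receiving layer regenerates the thiosulfate. All rate constants and
    # starting amounts are arbitrary, for illustration only.

    dt, steps = 0.01, 3000
    k_dev, k_dis, k_red = 2.0, 1.5, 3.0     # assumed rate constants
    exposed, unexposed = 0.6, 0.4           # relative amounts of AgX grains
    thio, complex_ag = 0.1, 0.0             # free thiosulfate and dissolved silver complex
    neg_silver, pos_silver = 0.0, 0.0       # developed silver in negative / positive

    for _ in range(steps):
        r1 = k_dev * exposed                # Eq. (1): in situ development
        r2 = k_dis * unexposed * thio       # Eq. (2): dissolution by thiosulfate
        r3 = k_red * complex_ag             # Eq. (3): reduction at receiver, solvent returned
        exposed    -= r1 * dt
        unexposed  -= r2 * dt
        thio       += (r3 - r2) * dt        # consumed in Eq. (2), regenerated in Eq. (3)
        complex_ag += (r2 - r3) * dt
        neg_silver += r1 * dt
        pos_silver += r3 * dt

    print(f"negative silver: {neg_silver:.2f}  positive silver: {pos_silver:.2f}  "
          f"thiosulfate remaining: {thio:.2f}")

Because the thiosulfate consumed in Eq. (2) is returned in Eq. (3), the total transferred silver in such a model can exceed the amount the initial solvent charge could complex at any one time, which is the point of the recycling described above.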
Image-Receiving Layers

In addition to determining a suitable composition of the reagent, the composition and structure of the image receiver control the tone of the silver image and the stabilization of that image.

Image Tone; Image Stabilization. The tone of the positive silver image depends on the nature of the receiving sheet and its effect on the particle size of the image silver. Particles that are very small produce a yellowish image silver deposit, and particles that are too large produce image silver that has low covering power. For consistent tone throughout the density range, it is necessary to have a well-controlled silver deposit size distribution. Means of achieving such control include limiting the deposit to a thin receiving layer, limiting the volume available within that layer, and providing nuclei that limit the number of sites available for deposition. Matrix materials that provide such controlled deposition include colloidal silica and several cellulosic polymers. Colloidal nucleating particles include metal sulfides and noble metals. The inclusion of certain mercaptans may also influence image tone by controlling silver particle and aggregate sizes. In peel-apart films, even though the reagent adheres to the negative when the negative and positive are separated, the thin positive silver image, left unwashed, may be vulnerable to attack by residual reagent. The following sections discuss both image formation and the steps taken to ensure the stability of the image.
Silica Receiving Layers. In the earliest one-step film (Polaroid Type 40, 1948), which produced sepia images, the image-receiving layer comprised a thin colloidal suspension of silica containing very fine particles of an insoluble metal sulfide precipitated within the suspension. This suspension was coated onto baryta paper. The sulfide particles formed fine aggregates in the interstices of the silica and served as nuclei for depositing image silver in aggregates 25–125 nm in diameter. These images were stabilized by incorporating soluble salts within the support that diffused into the image layer as the image neared completion, neutralizing the reagent and rendering the image highly stable.
Figure 5. Schematic cross section of a black-and-white peel-apart film, for example, Polaroid Type 667. (a) Negative and positive sheets and pod before processing; (b) The same components during processing. The pod has ruptured, and the viscous reagent has spread between the sheets to form a temporary lamination. In exposed areas (left), silver halide grains develop in situ. At the same time, grains in unexposed areas (right) dissolve and transfer as a soluble complex to the image-receiving layer and undergo reduction there to form the positive image; (c) The sheets stripped apart after processing. The reagent has adhered to the negative sheet, and the positive print is ready for viewing.
In 1950, Polaroid initiated the use of silica-based receiving layers that produce black silver images. Larger aggregates form in a very thin layer of silica/metal sulfide, approximately 0.5 µm thick, coated over a layer of polyvinylbutyral. The image aggregates so constrained are 100–160 nm in diameter, and each aggregate comprises fine particles 10–30 nm in diameter. Because the underlying layer is impervious, diffusion of neutralizing components from beneath the image-receiving layer is not feasible. Instead, each print is stabilized after processing by swabbing with a print coater, a disposable swab saturated with an acidic solution of a basic polymer. The print coater removes residual reagent and deposits a thin polymeric layer that dries quickly to form a tough, impermeable coating. Typical print-coater films include Polaroid Types 611, 51, 52, 57, and 55P/N (positive/negative).
Cellulosic Receiving Layers. Coaterless films were introduced by Polaroid in 1970. These films use image-receiving layers produced by hydrolyzing a cellulose acetate layer containing nuclei (25). Suitable nuclei include colloidal metals and colloidal metal sulfides and selenides. The depth of hydrolysis determines the depth of the image-forming stratum. The silver aggregates form between 0.04 and 0.40 µm below the surface. The reagents for coaterless black-and-white films use nonstaining developing agents and silver halide solvents. The developing agents are substituted hydroxylamines (26), and the silver halide solvents are colorless, sulfur-free compounds, such as cyclic imides (27). An immobile layer of an acidic polymer underlying the image-receiving layer neutralizes residual alkali, and the onset of neutralization is controlled by a polymeric timing layer between the image-receiving layer and the acidic polymer layer. Stabilization of this type was first developed for the Polacolor color print process, and similar mechanisms are used in integral black-and-white and color films. Coaterless black-and-white print films using hydrolyzed cellulose acetate image-receiving layers include many of Polaroid's peel-apart films. Fuji coaterless black-and-white films also use hydrolyzed cellulose acetate receiving layers. Polaroid's integral black-and-white print films and instant black-and-white slide films use hydroxyethyl cellulose receiving layers containing noble metal nuclei (28).

Black-and-White Instant Films

Polaroid Pack and Sheet Films. Continuous-tone peel-apart print films provided by Polaroid fall into several speed classes, ranging from ISO 100 to ISO 3,000 and 3,200. Most pack films produce 3 1/4 × 4 1/4-in. prints that have an image area of 7.3 × 9.5 cm (2 7/8 × 3 3/4 in.). Professional 4 × 5 pack films provide prints whose image area is 9 × 11.7 cm (3 1/2 × 4 5/8 in.), and prints from professional 4 × 5 sheet films have a 9 × 11.4 cm (3 1/2 × 4 1/2 in.) image area. Most of the black-and-white films are coaterless. The 100-speed Polapan Pro films are professional proofing films characterized by medium contrast and a long tonal range. The faster print films are for general use. P/N positive-negative films, Type 655 pack film and Types 55 and 55 HC (high-contrast) sheet films, provide both a print for immediate viewing and a fully developed and fixed negative of high resolution and very fine grain from each exposed film unit. The negatives require aftertreatment in a clearing bath to remove residual reagent. For the pack films, this treatment also removes a temporary black opacifying back coat. A peel-apart transparency film for X-ray imaging, TPX, is available in 8 × 10-in. format. Type 611, a specialty low-contrast ISO 200 film, provides extended dynamic range and exposure latitude for recording CRT images. An ISO 400 sepia print film is used to simulate old-time photographs. Integral black-and-white films include 600 Classic (600 Extreme in Europe, 600 B&W in Japan) and 600 Copy&Fax, which incorporates a halftone screen. These ISO 640 square-format films provide prints 7.9 × 7.9 cm (3 1/8 × 3 1/8 in.) in image area. Autofilms, which provide a 10.2 × 7.6 cm (4 × 3-in.) image area, are
Type 331 print film, ISO 400, and Type 337 high-speed print film, ISO 3200.

Polaroid 35-mm Transparency Films. Instant slide films in 35-mm cartridges fit into standard 35-mm cameras and camera backs. The films are processed in the tabletop processor described earlier. The black-and-white positive transparency films are Polapan CT continuous-tone slide film and PolaGraph HC high-contrast slide film, both ISO 125.

Fuji Print Films. In 1984, Fuji introduced a series of black-and-white and color peel-apart films. Most of the black-and-white films are available in both pack and sheet formats. Equipment provided for these films includes camera backs and film holders in each format and the Fuji UP-L camera, which has interchangeable backs in both formats. The Fuji films are compatible with corresponding-format Polaroid cameras, camera backs, and film holders. The products include general purpose films, whose speeds range from ISO 100 to 3,000, and a low-contrast ISO 400 film for medical imaging and CRT screen imaging. Films available in the United States are FP-100B, FP-100C, and FP-3000B. The Fuji black-and-white prints do not require coating or other aftertreatment.

BLACK-AND-WHITE DIFFUSION TRANSFER PROCESSES

The tank processing of black-and-white diffusion transfer reversal (DTR) materials involves silver halide reduction, complex formation, and image transfer steps, as for instant films. Unlike instant films, the light-sensitive emulsions are low speed, designed for use in copy equipment under ambient room light. In DTR processing, donor and receptor sheets pass through separate aligning feed trays into an activator bath comprising developing agent, silver halide solvent, and an organic base. The sheets exit the activator through rollers that temporarily laminate the two sheets. Stripping the two apart reveals the positive image on the receptor sheet. It is also possible to have both donor and receptor on the same support, so that just one sheet passes through the activator. Other variations incorporate some or all of the processing components in the sheet materials. If all components are so incorporated, the bath may be only water. DTR products current in 2001 include Agfa's Copyproof negatives, receiving films and papers, chemicals, equipment, and accessories. The processes provide film and paper reproductions of both continuous-tone and high-contrast originals. Principal applications are in graphic arts. Panchromatic materials are also used for color separations. A specialty sepia "antique" receiving paper is used for continuous-tone negatives and is processed in a dedicated activator bath. Kodak's diffusion transfer products, identified as PMT materials, have been discontinued.

INSTANT COLOR IMAGING PROCESSES

Methods of Color Reproduction

The reproduction of color requires selectively recording and presenting principal regions of the visible spectrum,
which extends roughly from 400 to 700 nm. For most processes, three records are used, corresponding respectively to the blue (400–500 nm), green (500–600 nm), and red (600–700 nm) regions. In subtractive color photography, the three images are formed within a multilayer film comprising separate silver halide emulsion layers sensitive, respectively, to blue, green, and red light. Early experimental color negatives for instant photography used subtractive negative layers in a screenlike configuration (29); the three color elements were arrayed side by side in an arrangement that had been used somewhat successfully in earlier work. These structures were abandoned in favor of multiple continuous layers (30). Processing a subtractive color film produces positive images in terms of complementary dyes: the blue record is transformed to an image in yellow, or minus blue, dye; the green record to an image in magenta, or minus green, dye; and the red record to an image in cyan, or minus red, dye. When white light passes through the superposed set of yellow, magenta, and cyan images, the dye images absorb the blue, green, and red components of the white light, so that the transmitted light provides an image in terms of the blue, green, and red original colors. Subtractive color systems include all of the instant print and large-format transparency materials. In additive color photography, the three color records are separated laterally by an array, or screen, of blue, green, and red filter elements superposed over a single panchromatic silver halide emulsion. Processing produces a positive transparency comprising black-and-white records in silver of the respective blue, green, and red components of the original scene. When these silver images are projected in registration with the superposed color screen, the images add to reproduce the full-color image. Additive color photography is used only for transparencies because the minimum density is too high for reflection prints. Polachrome 35-mm slide films, discussed in a later section, use additive color.

Subtractive Color Imaging Systems

There are numerous systems for producing dye images from the records formed in a set of blue-, green-, and red-sensitive emulsions that constitute a multilayer instant color film (4,6,30). Although image dyes may be formed in situ by chromogenic color development, as in noninstant color films, in most instant color processes preformed dyes are transferred from the negative to an image-receiving layer. The dyes or dye precursors may be initially diffusible in alkali, in which case they will be immobilized imagewise, or they may be initially immobile in alkali and released imagewise to transfer. Positive-working processes produce dye transfer inversely related to the developed silver density; conversely, negative-working processes produce dye transfer density in direct proportion to the developed silver. The use of a negative-working emulsion, in which exposed grains develop to form a negative silver image, in a positive-working process results in the formation of a positive dye transfer image. In negative-working dye processes, positive images may be obtained by
using direct reversal, or direct positive, emulsions, which develop unexposed rather than exposed grains to form positive silver images. As an alternative to using direct positive emulsions, unoxidized developer remaining after development of a negative-working emulsion may reduce prefogged silver in an adjacent layer; the resulting oxidized developer may then couple to release dyes that form positive dye transfer images (31,32). Dye Developer Processes. The first instant color film, Polacolor, introduced the dye developer (33), a bifunctional molecule comprising both a preformed dye and a silver halide developer. The multilayer Polacolor negative (30) comprises a set of three negative-working emulsions, blue-, green-, and red-sensitive, respectively, each overlying a layer containing a dye developer complementary in color to the emulsion’s spectral sensitivity. During processing, development of exposed grains in each of the emulsions results in oxidation and immobilization of a corresponding portion of the contiguous dye developer. Dye developer that is not immobilized migrates through the layers of the negative to the image-receiving layer to form the positive image. In addition to having suitable diffusion properties, dye developers must be stable and inert within the negative before processing. After completion of the process, the dye developer deposited in the image-receiving layer must have suitable spectral absorption characteristics and stability to light. The requirements of a developer moiety for incorporation into a dye developer are well fulfilled by hydroquinones. Under neutral or acidic conditions, hydroquinones are very weak reducing agents, and the weakly acidic phenolic groups confer little solubility. In alkali, however, hydroquinones are readily soluble, powerful, developing agents. Dye developers containing hydroquinone moieties have solubility and redox characteristics in alkali related to those of the parent compounds. Among the first dye developers were simple azohydroquinones, which have hydroquinone and a coupler attached directly through an azo group to form the chromophore, and hydroquinone-substituted anthraquinones. These compounds were very active developing agents, and they diffused readily to form excellent positive images. However, because the hydroquinone was an integral part of the chromophore, the colors shifted due to pH changes and the oxidation state of the hydroquinone. To circumvent such unwanted color shifting, alkylene links that interrupt conjugation were added as insulation between the chromophore and developer moieties (34).
Insulated Dye Developers. In the course of developing the Polacolor and SX-70 processes, many insulated dye developers were synthesized and investigated. An extensive review of this work is available (35). Each of the insulating linkages, chromophores, and developer moieties can be varied. Substituents on the developer modify development and solubility characteristics; substituents on the chromophore modify the spectral characteristics in both color and light stability. The attachment of two dyes to a single developer by amide linkage has also been described (36).
(a)
O OH
CNHC6H13 CH2CH2
N
N N HO
N
OH
(b)
OH OH N
N
CH2CH2
OH OCH(CH3)2 (c)
CH3
OH
CHCH2 OH
O
HN OH
OH OH
O
HN CHCH2 CH3 OH
Figure 6. Dye developers used in Polacolor (1963): (a) yellow [14848-0008-9]; (b) magenta [1880-52-5]; and (c) cyan [2498-16-0].
Pyrazolone dyes are particularly versatile yellow chromophores; the yellow dye used in Polacolor, the first instant color film, was a pyrazolone dye. The structures of the three Polacolor dyes are shown in Fig. 6. The study of azo dyes derived from 4-substituted 1-naphthols led to the chromophore used in the Polacolor magenta dye developer. The Polacolor cyan dye developer contained a 1,4,5,8-tetrasubstituted anthraquinone as the chromophore (27). Figure 7 shows the structures of the metallized dye developers used in the first SX-70 film (1972) and in Polacolor 2 (1975). The images formed by these dye developers are characterized by very high light stability (38). The metallized cyan dye developer is based on a copper phthalocyanine pigment (39). Incorporation of developer groups converts the pigment into an alkali-soluble dye developer. A study of the chromium complexes of azo and azomethine dyes led to the design of the magenta and yellow metallized dye developers (40). The latter was derived from the 1 : 1 chromium complex of an o,o dihydroxyazomethine dye. Later versions of both integral and peel-apart color films use a stable nonmetallized magenta dye developer [78052-95-6] (1) containing a xanthene dye moiety (41,42).
HO
Figure 7. Metallized dye developers used in SX-70 film (1972) and Polacolor 2 film (1975): (a) yellow [31303-42-1]; (b) magenta [59518-89-7]; and (c) cyan [28472-22-2].
This dye developer has much less unwanted blue absorption than previous magenta dye developers and thus provides more accurate color rendition (43).
CH2 CH2 CH2 CH3 O
N
OH OH
+
C3H6
SO−3
C3H6 OH
OH (1)
Color-Shifted Dye Developers. Although none of the commercialized films incorporate color-shifted dyes at this time, such compounds have been investigated extensively. These materials offer the option of incorporating the dye
developer and the silver halide in the same layer without losing speed through unwanted absorption of light by the dye developer. Prototypes are shown in Fig. 8. The magenta azo dye developer shown in Fig. 8a has been color-shifted by acylation (44). In the form shown, its color is a weak yellow. Indophenol dye developers colorshifted by protonation have been described (45). The cyan indophenol dye developer, Fig. 8b, shows little color when the chromophore is in the protonated form. In alkali, the dye is ionized to the colored indophenoxide form, and it may be stabilized in this state by association with a quaternary mordant in the receiving layer. Another approach involves the formation of dye images from colorless oxichromic developers, leuco azomethines stabilized against premature oxidation by acylation and linked to developer moieties (46), as shown in Figure 8c. The transferred images are oxidized in the colored form either by aerial oxidation or by oxidants in the receiving layer.
(a)
O OH
OCCH3 N
CH2CH2
N
OH
OCH3
(b)
O OH
NHCO
OC5H11
(CH2)4CONH N OH
Cl
Cl OH
(c)
OH OH
NHCO
OC5H11
(CH2)4CONH NCOCF3
Release by Oxidation. Dyes or dye precursors may be released from alkali-immobile compounds through interaction with the mobile oxidized form of a developing agent. Mobility of the oxidized developing agent is important because both the silver halide and the dye or dye precursor are initially immobile in alkali and thus cannot interact directly. The mobile oxidized developing agent may participate directly in dye release, or it may act as an electron-transfer agent between silver halide and a dye releaser that is initially a reducing agent. For alkali-mobile image-forming species, such as the alkali-mobile dye developers, the use of electron transfer by auxiliary developers is optional, whereas for alkaliimmobile species, the use of an electron-transfer agent may be essential to the process. The image-related transfer of a diffusible dye formed as a product of the oxidation of a dye containing a developing agent moiety was described in 1966 (48). The process depends on the preferential transfer of an oxidation product that has greater mobility than the unoxidized species. The use of images in terms of oxidized developer to release dyes initially immobilized through a sulfonamide linkage has been described (49). In one approach, color coupling leads to ring closure and concomitant release of an alkali-soluble dye [Eq. (4)]. Here, R = H or alkyl, and X = an immobilizing group.
OH
OH
NH2 Cl
X
Cl
+
OH
Figure 8. Color-shifting dye developers: (a) yellow dye developer [16044-30-7] that becomes magenta upon hydrolysis; (b) cyan dye [50695-79-0] that becomes colorless upon protonation; and (c) the leuco form of a dye developer [50481-86-2] that becomes yellow upon hydrolysis and oxidation.
NHSO2
N
Ag+
Cl
R
dye OH−
O
R X
N−
Additional Chromophores. Other types of dyes that have been studied as chromophores in dye developers include rhodamine dyes, azamethine dyes, indophenol dyes, and naphthazarin dyes. Cyanine dyes, although not generally stable enough for use as image dyes, have also been incorporated in dye developers (47). Dye Release Processes. Dyes or dye precursors that do not participate directly in the development process and are initially immobile or have low mobility in alkali may be released by agents generated imagewise during development. One of the advantages of such a process is that the released species may themselves be unreactive with respect to other components of the negative. Dyes may thus diffuse through layers of the negative to reach the image-receiving layer without undergoing unwanted reactions. Dye release may relate either directly or inversely to the image-related reduction of silver halide. Release of the dye or dye precursor may be accomplished or initiated by the oxidized developing agent or the unoxidized developing agent or alkali or silver salts.
SO2
dye
N
O−
R
X N N−
SO2
dye
N +
R
N R O− X N N
+ dye
SO2− R
N R
R
(4)
positive image may be obtained from such a negativeworking process by using a direct positive emulsion, so that the unexposed, rather than the exposed grains, undergo development. Alternatively, both the immobile dye releaser and silver-precipitating nuclei are included in a layer adjacent to the emulsion layer, and the processing reagent includes a silver halide solvent. In unexposed regions, soluble silver complex diffuses into the layer containing the nuclei and the immobile dye releaser. Development of silver is catalyzed by the nuclei, and the resulting oxidized developing agent cross-oxidizes the dye releaser, as in Eq. (5). The oxidized dye releaser undergoes ring closure, and the mobile dye is released for transfer to a receiving layer. The transferred image in this case is positive. Immobile p-sulfonamidonaphthol dye-release compounds (Fig. 9) were used in the discontinued Kodak instant color films (50). The compounds undergo imagewise oxidation to quinonimides through interaction with an oxidized developing agent, followed by alkaline hydrolysis of the quinonimides to release soluble, diffusible image-forming dyes and immobile quinones [Eq. (6)]. Here, X and Y are immobilizing groups.
A second approach uses the oxidation of a low-mobility 4-hydroxydiphenylamine to which an image dye is linked through a sulfonamide group. Oxidation and hydrolysis result in ring closure and release of the alkali-soluble dye [Eq. (5)]. Here, R = alkyl, and X = an immobilizing group. OH
OH +
N
NH R
H
NHSO2
dye
X O
OH−
Ag+
(5) N− SO2
dye
N
OH O−
X
X
N N
O
NHSO2
SO−2
+ dye
Y + developerox
X
dye
NSO2 OH−
O X
Y + dye
These release processes are negative-working; dye is released where silver development takes place. A
(a)
(b)
SO2NH−
OH
t-C5H11
CONH(CH2)4O
t-C5H11
t-C5H11
t-C5H11 NHSO2
NHSO2 NHSO2
N
OCH3
CH3SO2N
N N
N
CN
SO2NHC(CH3)3 HO
N
OH
Figure 9. p-Sulfonamidonaphthol dye releasers used in Kodak integral instant films: (a) yellow; (b) magenta [67041-93-4]; and (c) cyan [42905-20-4].
dye (6)
+ developerred
O
OH CONH(CH2)4O
Y
OH
t-C5H11
CONH(CH2)4O
OH
t-C5H11 NHSO2 SO2HN
NO2
N
N
NHSO2
SO2CH3
Figure 9. (Continued)
Fuji instant color films are based on a similar dye-release mechanism using the o-sulfonamidophenol dye-release compounds shown in Fig. 10 (51). Similarly, immobile p-sulfonamidoanilines, in which Y is H and the phenol group is replaced by an NHR moiety, may be used as dye releasers. o-Sulfonamidophenol dye-release compounds are also used by Fuji in the photothermographic printing process introduced in 1987 as Fujix Pictrography (52). In this material, the chosen developer moiety of the dye releaser is a weak reducing agent at room temperature, and processing is effected at 90 °C.
Figure 10. o-Sulfonamidophenol dye releasers used in early Fuji instant films: (a) yellow [06249-54-6]; (b) magenta [73869-83-7]; (c) cyan. [Structures not reproduced.]
Further examples of image-derived labilization and release by hydrolysis include redox dye releasers that contain ballasted hydroquinones bearing sulfonyl, oxy, or thio linkages. Imagewise oxidation to form quinones, followed by alkaline hydrolysis, releases mobile image dyes (53). Corresponding hydroquinone derivatives without such ballast are dye developers that transfer in nondeveloping regions to form positive images (54).

A dye-release system that yields positive images may also be based on immobile benzisoxazolone dye releasers, such as (2), where X is an immobilizing group and R and R′ = alkyl. For these compounds, oxidation prevents cleavage. In alkali, the heterocyclic ring opens, forming a hydroxylamine, and in unexposed areas, where silver halide is not developing, the hydroxylamine cyclizes to form a second benzoxazolone (3), eliminating the dye moiety. In exposed areas, silver halide is reduced by a mobile developing agent, which in turn transfers electrons to oxidize the hydroxylamine to the nonreleasing species (4).
[Structures (2), (3), (4), and (7): not reproduced.]
Another positive-working release utilizing cyclization, illustrated by Eq. (8), starts with an immobile hydroquinone dye releaser (5), where R = alkyl and X is an immobilizing group. Cyclization and dye release take place in alkali in areas where silver halide is not undergoing development. In areas where silver halide is being developed, the oxidized form of the mobile developing agent oxidizes the hydroquinone to its quinone (6), which does not release the dye (55).

Redox dye-release systems based on immobile sulfonylhydrazones, as in compound (7), were disclosed in 1971. These compounds are oxidized by oxidized p-phenylenediamine developing agents, yielding azosulfones that undergo alkaline cleavage to release soluble moieties (56). To minimize stain, the developing agent may itself be rendered immobile and used in conjunction with a mobile electron-transfer agent (57).

Displacement by Coupling. The coupling reactions of color developing agents have been used as release mechanisms for initially immobile image dyes or dye precursors (58). The dye-releasing coupler may have substituents in the coupling position that are displaced in the course of the coupling reaction with an oxidized color developer. In one process, the eliminated substituent is the immobilizing group, so that the dye formed by coupling is rendered mobile. Compound (8) [513515-9] is an example of a coupler that splits off an immobilizing group in this manner. In a related process, the coupler itself is the immobile moiety, as in (9) [4137-16-0], and the substituent that is split off is a mobile dye.
[Structures (5), (6), (8), and (9) and reaction scheme (8): not reproduced.]
Dye Release by Reduction. The early Agfa-Gevaert instant reprographic processes, Agfachrome-Speed and Copycolor CCN, introduced a method of releasing positive dye images by reducing the alkali-immobile dye-releasing quinone compounds shown in Fig. 11 (59).
These compounds were incorporated within the light-sensitive layers along with a ballasted electron donor compound. During processing, there was competition between the oxidized developer and the reducible dye releaser for reaction with the electron donor. In exposed areas, silver halide developed, and the oxidized developer that formed was reduced by the electron donor. In unexposed areas, the electron donor reacted instead with the dye releaser, reducing its quinone moiety. The reduced dye releaser in turn underwent a quinone methide elimination in alkali, releasing a mobile dye, as shown in Eq. (9). The system produced positive color images from negative-working emulsions. R, R′, R″, and R‴ are all alkyls.
Figure 11. Dye releasers used in Agfachrome-Speed and Copycolor films: (a) yellow [85432-41-3]; (b) magenta [80406-97-9]; (c) cyan. [Structures and reaction scheme (9) not reproduced.]
Dye Release by Ring-Opening, Single-Electron Transfer. A positive-working system based on ring opening by the cleavage of a single N−O bond in a 2-nitroaryl-4-isoxazolin-3-one compound, such as (10), has been described (60). In exposed areas, silver halide reduction is accompanied by oxidation of an electron-transfer agent, which cross-oxidizes an electron donor; in unexposed areas, the electron donor initiates the ring opening and consequent dye release (61). This system, ring opening by single-electron transfer (ROSET), was first used in Fuji Colorcopy DC photothermographic color print material provided for use in the Fuji Colorcopier DC3000 (62). The ROSET chemistry is used in the Pictrostat 100, 300, and 50 systems.
[Structure (10): not reproduced.]
Dye Release by Silver-Assisted Cleavage. A soluble silver complex formed imagewise in the undeveloped areas of the silver halide layer may be used to effect a cleavage reaction that releases a dye or dye precursor. Polaroid Spectra film introduced silver-assisted thiazolidine cleavage to produce the yellow dye image (63), a system subsequently used in Polaroid 600 Plus, later designated Polaroid 600 Platinum, and in Polacolor Pro 100 films. The process yields positive dye transfer images directly from negative-working emulsions (64). An example is the silver-assisted cleavage of a dye-substituted thiazolidine compound, as shown in Eq. (10). The dye is initially linked to a ballasted thiazolidine, which reacts with silver to form a silver iminium complex. The alkaline hydrolysis of that complex yields an alkali-mobile dye. Concomitantly, the silver ion is immobilized by reaction with the ballasted aminoethanethiol formed by cleavage of the thiazolidine ring.
[Reaction schemes (10)–(14): structures not reproduced.]
Discrimination between exposed and unexposed areas in this process requires selecting thiazolidine compounds that do not readily undergo alkaline hydrolysis in the absence of silver ions. In a study of model compounds, the rates of hydrolysis of N-methyl and N-octadecyl thiazolidine compounds were compared (65). An alkaline hydrolysis half-life of 33 min was reported for the N-methyl compound, whereas a half-life of 5,525 min (3.8 days) was reported for the corresponding N-octadecyl compound. Other factors that affect the kinetics include the particular silver ligand chosen and its concentration (66).
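As a rough quantitative aside, the half-lives quoted above can be converted into rate constants, assuming pseudo-first-order alkaline hydrolysis (an assumption; the cited study (65) is not reproduced here). This makes the roughly 170-fold difference in hydrolysis rate explicit:

```python
import math

# Minimal sketch, assuming pseudo-first-order hydrolysis: k = ln(2) / t_half.
# Half-lives are the values reported in the text above.
t_half_n_methyl_min = 33       # N-methyl model thiazolidine
t_half_n_octadecyl_min = 5525  # N-octadecyl (ballasted) model thiazolidine

k_methyl = math.log(2) / t_half_n_methyl_min
k_octadecyl = math.log(2) / t_half_n_octadecyl_min

print(f"k(N-methyl)    ~ {k_methyl:.4f} per min")
print(f"k(N-octadecyl) ~ {k_octadecyl:.6f} per min")
print(f"rate ratio     ~ {k_methyl / k_octadecyl:.0f}x")  # ~170x slower for the ballasted compound
```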
Dye Formation Processes
Color Development. Experimental chromogenic negatives included a structure comprising interdigitated multilayer color-forming elements (30). With the advent of dye developers, chromogenic three-color work was discontinued. However, a monochrome "blue slide" film, PolaBlue, was introduced by Polaroid in 1987. Color development of the exposed film forms a nondiffusing dye image within the emulsion layer (67). In 1988, a chromogenic photothermographic process based on immobile couplers and a color developer precursor was described (68). In another process, instead of color coupling to form the image dyes, coupling was used to immobilize preformed dyes in exposed areas (69). Unreacted coupling dye transferred to form a positive image. Each of the coupling dyes incorporated the same coupler moiety, so that all would react at the same rate with the oxidized developing agent. Coupler-developers comprising a color coupler and a color developer linked together by an insulating group have been described (70). In areas where silver halide undergoes development by such a compound, the coupler-developer is oxidized and immobilized by intermolecular coupling. In undeveloped areas, the coupler-developer transfers to a receiving layer, where oxidation leads to intermolecular coupling to form a positive dye image.

Other Colorless Dye Precursors. Color images may be produced from initially colorless image-forming compounds other than traditional chromogenic couplers. The use of colorless compounds avoids loss of light by unwanted absorption during exposure, and the colored form is generated during processing. A leuco indophenol may be used both to develop a silver image in the negative and to form an inverse, positive image in the receiving layer (71). Colorless triazolium bases have also been proposed as image dye precursors (72). In unexposed regions, where silver halide is not undergoing development, unused developing agent reduces such a base to its colored form, rendering it immobile. In developing regions, the base may migrate to a receiving layer and there undergo reduction to form a negative dye transfer image.

Stabilization of Dye Images. Regardless of the method of forming the component image dyes, a major consideration is the stability of the final color image in the receiving layer. Unlike the dye images produced by bath processes, the instant image is formed in a single processing step and is not washed or treated afterward. Therefore, the instant process must incorporate both image-forming reactions and reactions that provide a safe environment for the finished image.
In integral films, the positive color image layer and also the negative layers that remain concealed within the film unit must be rendered inert. Subtractive instant color films provide for image stabilization by including polymeric acid layers that operate in conjunction with timing layers to provide a carefully timed reduction in pH following image formation. Reducing the pH terminates development reactions and stops the migration of dyes and other alkali-mobile species, thus stabilizing both the dye image and its environment. Stabilization by timed pH reduction was first provided in the original Polacolor film and continued in later Polacolor films. As shown in Fig. 12, the Polacolor positive sheet includes a timing layer and a polymeric acid layer (73). In integral films, the timing layer and the acid polymer layer may alternatively be placed beneath the emulsion layers (74). The permeability of the timing layer determines the length of time before alkali reaches the polymeric acid layer and begins to undergo neutralization. The consumption of alkali rapidly reduces the pH of the image-receiving layer to a level conducive to print stability. To provide suitable timing of the pH reduction across the wide range of temperatures that may be encountered, instant films may use polymeric timing layers whose permeability to alkali varies inversely with temperature. In integral films, where all components are retained within the film unit after processing and the moisture content remains high for several days, care must be taken to avoid materials that migrate or initiate unwanted reactions even at reduced pH.

Another concern is the stability of image dyes to prolonged exposure to light. Many dyes that would otherwise be suitable as image dyes undergo severe degradation upon such exposure and are particularly sensitive to UV irradiation. Important considerations include the dye structures and the structure of the mordant in the image-receiving layer, as well as the chemical environment after processing ends. The metallized dyes introduced in SX-70 and Polacolor 2 films and included in later films have outstanding stability to light and are also highly stable in dark storage (75). The light stability of certain nonmetallized dyes may be increased by posttransfer metallization. For integral films, further protection against UV attack may be afforded by incorporating UV absorbers in the transparent polyester cover sheet. This sheet also provides protection against physical damage and attack by external chemicals. Similarly, Polacolor images show enhanced stability when laminated in plastic covers that provide UV filtration and exclude air and moisture from the image environment. Spray coating of large-format Polacolor images, such as those used in early Polacolor museum replicas, with a solution of a UV absorber in a polymethylmethacrylate carrier also afforded protection against prolonged light exposure.
Stability of Instant Dye Images. As with noninstant color images, the changes in an instant image as it ages depend on the conditions to which the image is subjected. Detailed recommendations for displaying, storing, and preserving instant photographs have been published (76).
Figure 12. Schematic cross section of Polacolor positive and negative sheets (not to scale). Positive sheet layers: paper base, acid polymer, timing layer, image-receiving layer. Negative sheet layers: antiabrasion layer, blue-sensitive emulsion, yellow dye developer, interlayer, green-sensitive emulsion, magenta dye developer, interlayer, red-sensitive emulsion, cyan dye developer, base. During processing, pressure rollers rupture a pod attached to one of the sheets, thereby releasing a viscous reagent. The viscous reagent temporarily laminates the two sheets and initiates processing. As the exposed silver develops in situ, the associated dye developers undergo oxidation and immobilization. In unexposed regions, the dye developers diffuse from their respective layers to the image-receiving layer, where they form the positive color image. After the required processing time, the two sheets are stripped apart. The reagent adheres to the negative, which is discarded, and the finished color print is ready for viewing. The timing layer and the acid polymer layer, which underlie the image-receiving layer, provide a timed pH reduction that stabilizes the color print.
Testing instant color images in dark storage and under both accelerated and normal light exposure has enabled evaluation and estimates of the longevity of instant color prints. Polacolor 2 images, which used the metallized dyes described earlier, and Polacolor ER prints, in which the magenta dye is a nonmetallized xanthene dye, both have shown significantly higher light stability than original Polacolor images (77). At elevated temperatures in dark storage and upon prolonged exposure to daylight, some yellowish stain was reported for both early integral and early peel-apart films (78). Standardized testing established in 1991 provides protocols for evaluating the stability of both instant and noninstant products (79).
INSTANT COLOR FILM FOR INSTANT CAMERAS
Polacolor
The first instant color film, Polacolor, introduced by Polaroid Corporation in 1963, used dye developers for image formation (see the earlier section on Dye Developer Processes). To use negatives that have multiple continuous layers, hold-release mechanisms were developed to keep each dye developer in close association with the emulsion designated to control it until development had progressed substantially (30,80).
Polacolor Negative Structure. In addition to designing and selecting dye developers characterized by a high development rate relative to the diffusion rate, useful measures included introducing small quantities of auxiliary developers and intercalating temporary barrier layers between the three monochrome sections of the negative. Figure 12 shows the layered structure of the original Polacolor negative.

Auxiliary Developers. Although using auxiliary developers as electron-transfer agents is not necessary for alkali-mobile dye developers, it is usually advantageous. The auxiliary developer is smaller and more mobile than the dye developers and can reach the exposed silver halide grains more rapidly to initiate development. The oxidation product of the auxiliary developer in turn oxidizes and immobilizes the associated dye developers. Auxiliary developing agents that have been used in this way include Phenidone, Metol, and substituted hydroquinones, such as 4′-methylphenylhydroquinone [10551-32-3], C13H12O2 (81), either alone or with Phenidone (82).

Barrier Interlayers. Depending on composition, barrier layers can function simply as spatial separators, or they can provide specified time delays by swelling at controlled rates or by undergoing reactions such as hydrolysis or dissolution. Temporary barriers assist in preventing interimage cross talk. For example, a temporary barrier prevents cyan dye developer that is migrating toward the image-receiving layer from competing with the overlying magenta dye developer in reducing the green-sensitive emulsion in exposed areas. Preproduction dye developer negatives used a combination of cellulose acetate and cellulose acetate hydrogen phthalate as barrier layers. The images produced from these negatives were outstanding in color isolation, color saturation, and overall color balance. However, solvent coating was required for this composition, and therefore it was not used in production. The original Polacolor negative had water-coated interlayers of gelatin (83). Later Polacolor negatives, starting with Polacolor 2, and integral film negatives use combinations of a polymer latex and a water-soluble polymer as interlayers. A key development was the synthesis of latices that would function as temporary barriers to reduce interimage problems. The water-soluble polymer functions as a permeator, so that the barrier properties are tunable (84).
Figure 13. Schematic cross section of Polacolor Pro, a hybrid peel-apart film. Layers as labeled: clear polyester, acid polymer, timing layer, image-receiving layer, pod, antiabrasion layer, blue-sensitive emulsion, colorless developer, yellow dye releaser, yellow filter dye, interlayer (silver scavenger), green-sensitive emulsion, spacer, magenta dye developer, interlayer, red-sensitive emulsion, spacer, cyan dye developer, opaque polyester. The yellow image is formed by dye release based on silver-assisted cleavage of a yellow dye releaser. The magenta and cyan image components use dye developers, as in the earlier Polacolor films.
Additional interlayers may contain filter dyes, pigments, or reactive components.

Polacolor Receiving Sheet. The original Polacolor receiving sheet comprised three active layers (Figs. 12 and 13). The outermost layer was an image-receiving layer composed of polyvinyl alcohol and the mordant poly(4-vinylpyridine), which immobilizes image dyes (85). Next was a polymeric timing layer and then an immobile polymeric acid layer. In subsequent Polacolor receiving sheets, the permeability of the timing layer to alkali was greater at lower temperatures than at elevated temperatures, thereby providing appropriate timing of the pH reduction across a range of temperatures before the alkalinity of the system is reduced (86). The removal of alkali from the image-receiving layer prevents salt formation on the print surface after the negative and positive sheets are stripped apart. The reduction of pH in the image layer proceeds to completion after the sheets are separated.

Image Formation and Stabilization. The sequence of reactions responsible for image formation and stabilization begins as the exposed negative sheet and the image-receiving sheet pass together through a pair of rollers, rupturing the reagent pod attached to one sheet and releasing a viscous reagent.
The reagent temporarily laminates the two sheets together. Alkali in the reagent permeates the layers of the negative, ionizing and solubilizing each of the dye developers and an auxiliary developer contained in layers of the negative. As the reagent reaches the emulsions, they undergo development in exposed regions, and the associated dye developers become oxidized and immobilized. In unexposed areas, dye developer molecules migrate to the image-receiving layer and are mordanted and immobilized there, forming the positive dye image. The auxiliary developer assists in color isolation by serving as an electron-transfer agent between the less mobile dye developers and the exposed silver halide. As the auxiliary developer is oxidized by the development of silver, it yields an oxidation product that in turn oxidizes dye developer molecules. Further control of the development process is effected by alkali-releasable antifoggants, such as phenylmercaptotetrazole, contained in the dye developer layers.

The polymeric acid layer in the receiving sheet terminates the process. Acting as an ion exchanger, the acid polymer forms an immobile polymeric salt with the alkali cation and returns water in place of alkali. Capture of alkali by the polymer molecules prevents deposition of salts on the print surface. The dye developers thus become immobile and inactive as the pH of the system is reduced. Polacolor processing takes approximately 60 s. Within this period, the image-forming and image-stabilizing reactions proceed almost simultaneously. When image formation is concluded by stripping apart the negative and positive sheets, the print is ready for viewing; its surface is almost dry, and the pH of the image layer is rapidly approaching neutrality. The continuation of pH reduction after the sheets are separated is important for print stability because it prevents reactions that might otherwise result in highlight staining. The surface dries quickly to a hard gloss, and the finished print is durable and stable.

Polacolor 2; Polacolor ER. In 1975, Polacolor was replaced by Polacolor 2, a film that had improved light stability provided by the metallized dye developers shown in Fig. 7. An extended-range film, Polacolor ER (1980), used cyan and yellow metallized dye developers and introduced a magenta dye developer [78052-95-6] that incorporates a xanthene dye (1). This dye was chosen because it reduced unwanted blue absorption, thereby providing improved color fidelity. Polacolor ER films are balanced for daylight exposure and are rated at ISO 80.

Polacolor Pro; Professional Polacolor Films. Hybrid Polacolor negatives, starting with Polacolor Pro (1996), use dye developers for cyan and magenta in combination with silver-assisted cleavage to generate the yellow image (87). Hybrid negatives had been introduced in 1986 in the integral film Spectra. (Polaroid integral films are discussed further in the following section.) These films also use a completion developer, such as 2,5-di-tert-butylhydroquinone, which reduces nonimage silver and lessens reciprocity failure. The use of a completion developer also contributes to increased color fidelity and whiter whites.
In hybrid negatives, interlayers act as barriers and may also contain filter dyes, pigments, or reactive components. A feature introduced by Type 600 integral film and included in hybrid Polacolor negatives involves incorporating pigmented spacer layers behind the blue-sensitive and red-sensitive layers. These layers contribute to the efficiency of the negatives. A new interlayer in hybrid negatives contains a colorless hydroquinone-based developer for the blue-sensitive emulsion (88). Another interlayer provides a silver scavenger that prevents release of yellow dye by silver ions originating in the underlying green- and red-sensitive layers.

Figure 13 is a schematic section of the Polacolor Pro hybrid negative. The negative comprises three layered color sections: the blue-sensitive emulsion, its colorless developer, the yellow dye releaser, and a yellow dye filter layer; the green-sensitive emulsion and the magenta dye developer; and the red-sensitive emulsion and the cyan dye developer. Spacers and additional interlayers separate these three sections. During exposure, the yellow filter layer and each of the dye developers function as antihalation dyes for the respective emulsion layers above them. The yellow filter dye protects the green- and red-sensitive layers below from exposure to blue light, and the magenta dye developer protects the red-sensitive emulsion below from exposure to green light.

Polacolor Pro daylight films are rated at ISO 100, and a tungsten version is rated at ISO 64. Formats include both pack films and sheet films. Film types include Polacolor 579, Polacolor 679, and Studio Polaroid.

Polaroid Integral Films
The first integral film, SX-70, and the SX-70 single-lens reflex camera, described in Fig. 14, were introduced in 1972. These products heralded totally new concepts in instant photography (89). With this system, in-camera initiation of processing became fully automatic. The negative and the receiving sheet are sealed into a single unit, which also contains the pod. Immediately after exposure, the film unit passes between processing rollers and is ejected from the camera. During spreading, the reagent, which contains a white pigment, forms a new layer between the negative and the image-receiving layer. There are no air spaces in the multilayer structure after processing. Processing continues within the film unit under ambient conditions, and the emerging image becomes visible as processing reaches completion. The image is seen by reflected light against the new pigmented layer within the unit. Polaroid followed the original SX-70 film with Time Zero/SX-70 film (1979) and the higher speed 600 film (1981), used in cameras designed to accept only high-speed films. Spectra film (1986) and succeeding integral films are described in a later section.

SX-70 Film Structure. The integral film format required some components quite different from those used in the peel-apart system. Figure 15 is a schematic cross section of the SX-70 film unit. The two sheets comprising the film unit were analogous to the Polacolor negative and positive sheets.
Figure 14. Schematic cross section of the original SX-70 camera in viewing mode. Rays of light pass through the camera lens to the permanent mirror (a), which reflects them downward to a decentered Fresnel mirror (b) that lies just over the image plane. The Fresnel reflects the light rays upward to mirror (a), which directs them through the small aperture (c) to a decentered aspheric mirror (d). This mirror reflects the rays through the decentered lens (e) to the eye. The folded light path facilitated the camera’s compact design. In the taking mode, the hinged unit swings up against mirror (a), and another mirror on the back of the Fresnel reflects the light rays downward to expose the uppermost film unit. In the original SX-70 camera, the surface of the Fresnel was roughened to form a focus screen. This system worked well at high light levels but was less accurate at low light levels. Later versions incorporated a small circular split-image rangefinder that comprised a pair of decentered Fresnel sections; decentering is the optical equivalent of tilting, and matching the two images provided accurate focus control. Still later models incorporated a fully automatic sonar rangefinder.
The two outermost layers, both polyester, provided structural symmetry that ensured flatness of the film unit under all atmospheric conditions. The upper polyester sheet was transparent, and the lower one opaque black. Light passing through the upper sheet exposed the layers of the negative, and the picture was viewed through the same sheet. A durable, low-refractive-index, quarter-wave antireflection coating on the outer surface of the transparent polyester minimized flare during exposure, increased the efficiency of light transmission during both the exposure of the negative and the viewing of the final image, and, because of its minimal surface luster, made the image easier to see through the polyester (90). The color balance of an integral film exposed through the transparent polyester sheet may be adjusted by incorporating color-correcting filters in a layer of the positive sheet; these dyes are subsequently bleached to colorless form during processing (91).
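The quarter-wave condition mentioned above fixes the coating's optical thickness at one quarter of the design wavelength divided by the coating's refractive index. The sketch below is illustrative only: the article gives neither the design wavelength nor the coating index, so both values are assumptions.

```python
# Minimal sketch of the quarter-wave antireflection condition, t = wavelength / (4 * n).
# Assumed values (not from the text): design wavelength 550 nm, coating index 1.38.
design_wavelength_nm = 550.0
n_coating = 1.38

thickness_nm = design_wavelength_nm / (4.0 * n_coating)
print(f"Quarter-wave coating thickness ~ {thickness_nm:.0f} nm")  # roughly 100 nm
```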
Figure 15. Schematic cross section of an SX-70 integral film unit. Layers, from the viewing side: clear polyester, acid polymer, timing layer, image-receiving layer, pod, antiabrasion layer, blue-sensitive emulsion, yellow dye developer, interlayer, green-sensitive emulsion, magenta dye developer, interlayer, red-sensitive emulsion, cyan dye developer, opaque polyester. The film is exposed through the clear upper sheet. As the reagent is spread from the pod, it forms a white pigmented layer. The image formed in the image-receiving layer is viewed against the reflective pigment layer.
SX-70 Image Formation. A small pod concealed within the wide border of the SX-70 film unit contained the viscous reagent. As an exposed film unit was ejected through the processing rollers and out of the camera, the pod burst and the viscous reagent spread between the top negative layer and the image-receiving layer, forming a new layer comprising water, alkali, white pigment, polymer, and other processing addenda. As in Polacolor processing, alkali rapidly permeated the layers of the negative. In the exposed areas, silver halide was reduced, and the associated dye developer was oxidized and immobilized. In the unexposed areas, the dye developers, ionized and solubilized by the alkali, migrated through overlying layers of the negative to reach the image-receiving layer above the new stratum of pigment-containing reagent. The transferred image is visible through the transparent polyester sheet.

Opacification. An important aspect of the SX-70 system is that the developing film unit is ejected while still sensitive to light. Hence both surfaces of the negative must be protected from ambient light during development. Two potential sources of fog must be prevented when the film emerges from the camera. The first, lateral light piping within the transparent polyester sheet, is prevented by incorporating a very low concentration of a light-absorbing pigment, such as carbon black, in the sheet. A pigment concentration that increases transmission density normal to the surface by only 0.02 provides efficient absorption of light that otherwise could travel laterally by internal reflection within the sheet (92). Additional protection against further exposure is provided by the combined effects of opacifying dyes and light-reflecting titanium dioxide pigment included in the viscous reagent layer (93).
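A back-of-the-envelope view of why a density increase of only 0.02 is sufficient: optical density scales with path length, so a ray trapped by internal reflection and traveling laterally through many sheet thicknesses accumulates a far higher total density than a ray passing straight through. The path multiples below are illustrative assumptions, not figures from the text.

```python
# Minimal sketch: density adds with path length (Beer-Lambert behavior), so
# transmittance = 10 ** (-normal_density * path_multiple).
normal_density = 0.02  # added transmission density normal to the sheet (from the text)

for path_multiple in (1, 10, 100, 500):  # assumed lateral paths, in sheet thicknesses
    density = normal_density * path_multiple
    transmittance = 10.0 ** (-density)
    print(f"{path_multiple:>3}x thickness: density {density:5.2f}, transmittance {transmittance:.2%}")
```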
The opacifying dyes used in the SX-70 and subsequent Polaroid integral films represent a new class of phthalein dyes (94) designed to have unusually high pKa values. A solution of an indicator such as those represented by compound (11) is highly colored when the pH is at or above the pKa value of the dye and becomes progressively less colored as the pH is reduced. The very high pKa values of these opacifying indicators are induced by hydrogen-bonding substituents located in juxtaposition to the hydrogen that is removed by ionization (95). The pKa values for a series of naphthalein opacifying indicators are shown below compound (11).
[Structure (11): not reproduced.]
X        Y                 pKa          CAS Registry Number
H        H                 11.1, 13.8   [68975-54-2]
COOH     COOH              13.2, >15    [68975-55-3]
COOH     NHSO2C16H33       12.9, >15    [37921-74-7]
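The role of these high pKa values can be illustrated with the standard single-ionization (Henderson–Hasselbalch) relation for the fraction of indicator in its colored, deprotonated form. This is a simplified sketch: only the first pKa of the X = Y = H indicator is used, whereas the real opacifiers ionize twice.

```python
# Minimal sketch, assuming a single ionization:
# fraction colored = 1 / (1 + 10 ** (pKa - pH)).
def fraction_colored(pH: float, pKa: float) -> float:
    return 1.0 / (1.0 + 10.0 ** (pKa - pH))

pKa_first = 11.1  # first pKa of the X = Y = H indicator in the table above
for pH in (14.0, 12.0, 11.1, 10.0, 8.0):
    print(f"pH {pH:>4}: {fraction_colored(pH, pKa_first):6.1%} in the colored form")
```

At the very high pH of the freshly spread reagent the dye is essentially fully colored; once the timed acid layer pulls the pH a few units below the pKa, most of the color is lost, which is the opacification-then-clearing behavior described in the surrounding text.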
The reduction of pH within the film unit is effected by a polymeric acid layer, as in the Polacolor process. The contiguous timing layer controls the onset of neutralization. In the original SX-70 film unit, these layers were on the inner surface of the transparent polyester sheet (Fig. 15) (16). In Time Zero/SX-70 and later Polaroid integral films, these layers are on the inner surface of the opaque negative support, as shown in Fig. 16 (96).

Time Zero/SX-70 Film. The Time Zero/SX-70 film (1979) radically reduced the time required to complete the development process and also changed the appearance of the emerging print. In the original SX-70 film, the picture area initially showed the pale green color of the opacifying dyes. This color gradually faded over a few minutes as the pH dropped. In a Time Zero/SX-70 print, the image becomes visible against a white background only a few seconds after the film unit has been ejected, and the picture appears complete within about a minute. The faster appearance of the image and the change in background color reflect changes in film structure and composition. When the reagent is spread, it contacts a new clearing layer on the receiving-layer surface. The clearing agent immediately decolorizes the opacifying dyes at the interface of the reagent layer and the receiver layer without affecting the opacifying dyes within the pigmented reagent layer. This decolorization provides a white background for the emerging image, and the opacifying dyes within the reagent layer continue to protect the developing negative (97).
Figure 16. Schematic cross section of Time-Zero SX-70 integral film. Layers, from the viewing side: clear polyester, image-receiving layer, clearing layer, pod, antiabrasion layer, blue-sensitive emulsion, yellow dye developer, interlayer, green-sensitive emulsion, magenta dye developer, interlayer, red-sensitive emulsion, cyan dye developer, timing layer, acid polymer, opaque polyester. In this film, the polymeric acid layer and the timing layer are located beneath the negative layers, rather than in the positive sheet. Time-Zero and all later Polaroid integral films have an antireflective layer coated on the outer surface of the clear polyester layer through which the image is viewed.
An effective clearing layer comprises a low-molecular-weight poly(ethylene oxide) hydrogen-bonded to a diacetoneacrylamide-co-methacrylic acid copolymer. The alkali of the reagent breaks the complex, releasing the ethylene oxide and decolorizing the nearest opacifying dyes. The Time Zero reagent incorporates poly(diacetoneacrylamide oxime) as a thickener. This polymer is stable at very high pH but precipitates below pH 12.5. Its use at concentrations below 1% provides the requisite reagent viscosity and permits rapid dye transfer (98). A further aid to rapid transfer is the use of negative layers appreciably thinner than those of the original SX-70 negative. The Time Zero film process has an efficient receiving layer consisting of a graft copolymer of 4-vinylpyridine and a vinylbenzyltrimethylammonium salt grafted to hydroxyethylcellulose (99). The fidelity of color rendition was improved in the Time Zero film by using the xanthene magenta dye developer (1), which has reduced blue absorption.

Cameras using SX-70 films include both folding single-lens reflex models that can focus as close as 26.4 cm (10.4 in.), designated as SX-70 cameras, and nonfolding models that have separate camera and viewing optics. Pictures may be taken as frequently as every 1.5 s. The SX-70 picture unit is 8.9 × 10.8 cm and has a square image area of approximately 8 × 8 cm.
The SX-70 film pack contains 10 of these picture units, a flat 6-volt battery about the size of a film unit, and an opaque cover sheet that is automatically ejected when the pack is inserted into the camera. SX-70 films are balanced for daylight exposure and are rated at ISO 150. The temperature range for development is approximately 7–35 °C. The range may be extended to 38 °C by slight exposure compensation.

Type 600 Film. The first high-speed integral film, Type 600 (1981), was rated at ISO 640, approximately two stops faster than the earlier SX-70 films. Features included high-speed emulsions and reflective pigmented spacer layers underlying the blue-sensitive and red-sensitive emulsion layers. The dye developers were the same as those used in Time Zero/SX-70 film. As in Time Zero film, the polymeric acid layer and the timing layer were located over the opaque polyester layer, below the negative layers, and the structure included an additional clearing layer between the reagent layer and the image-receiving layer. Although Type 600 film was similar in format to SX-70 films, the higher speed film could not be used in SX-70 cameras. New cameras, including a single-lens reflex model, were provided for Type 600 film.

Spectra Film. Spectra film was introduced in 1986 as part of a new camera–film system featuring a rectangular picture format, 7.3 × 9.2 cm. The Spectra system introduced the first hybrid film structure (87), as shown schematically in Fig. 17. Spectra film used silver-assisted thiazolidine cleavage to form the yellow dye image and dye developer image formation for the magenta and cyan images, as described in the earlier section on Polacolor Pro film. Among later integral films that follow the same hybrid scheme are Spectra Platinum (also known as Image Extreme), 600 Plus Film (1988), Spectra HighDefinition film (1991), 600 Plus HighDefinition (1993), and 600 Platinum. Variations include a number of specialty films, some incorporating preexposed patterns, and 600 Write On, a matte version of Platinum film that enables writing on the processed print surface. All are rated at ISO 640.

Spectra films for specialized applications are obtained by preexposing the film to unique patterns. Polaroid provides quantity users with integral films that are customized by preprinted borders or messages within the image area. Polaroid Copy&Fax Film incorporates a preexposed halftone screen that improves the quality of images produced by photocopying or fax transmission of continuous-tone photographs. In Polaroid GridFilm, a preexposed grid pattern provides a built-in scale on each print for measuring image features.

Type 500 Films; Pocket Films. Type 500 film (formerly known as Captiva, Vision, and 95) uses the Spectra film structure and chemistry in a smaller format and provides prints whose image area is 7.3 × 5.4 cm. This film was introduced with the Captiva camera, now discontinued.
Figure 17. Schematic cross section of Spectra integral film. Layers, from the viewing side: clear polyester, image-receiving layer, clearing layer, pod, antiabrasion layer, blue-sensitive emulsion, colorless developer, yellow dye releaser, yellow filter dye, interlayer (silver scavenger), green-sensitive emulsion, spacer, magenta dye developer, interlayer, red-sensitive emulsion, spacer, cyan dye developer, timing layer, acid polymer, opaque polyester. The 600 Platinum film has a similar structure. In these films, the yellow image is formed by silver-assisted cleavage of a yellow dye releaser. A colorless developer reduces exposed silver halide in the blue-sensitive emulsion; in unexposed areas, dissolved silver diffuses to the dye releaser layer and triggers the release of the yellow image dye.
The Captiva camera was unique in that the developing integral film units passed between pressure rollers and around the end of the camera to a storage compartment in the back of the camera, where each print could be viewed as the image appeared. The same film is used in the JoyCam and in the one-time-use PopShots camera (100). Pocket films, used in pocket-sized I-Zone cameras (101), produce integral images 3.6 × 2.4 cm. Film speed is ISO 640, and average processing time is 3 minutes. Pocket Sticker Film has an adhesive layer so that the prints may be affixed to a variety of surfaces. The I-Zone Convertible Pocket Camera features changeable colored faceplates, and the I-Zone Digital and Instant Combo Camera enables the user to select between digital and instant film pictures. An accessory to the I-Zone camera is the Polaroid Webster, a handheld miniscanner sized to scan I-Zone pictures and other small, flat objects. The scanner will store as many as 20 images and upload them to a PC.

Polaroid Additive Color Films
An additive color system was first used by Polaroid in the Polavision Super-8 instant motion picture system, introduced in 1977 (102), and later in Polachrome 35-mm slide films. The Polaroid additive color screens comprise interdigitated stripes that range in frequency from 590 triplets/cm for Polavision to 394 triplets/cm for Polachrome. These stripes are fine enough to be essentially invisible when projected (103). To form the stripes, the film base is embossed to form fine lenticules on one surface, and a layer of dichromated gelatin is coated on the opposite surface. Exposure of the dichromated gelatin through the lenticules at an appropriate angle forms hardened line images in the dichromated gelatin. After the unhardened gelatin is washed away, the lines that remain are dyed. These steps are repeated at different exposure angles to complete the formation of triads of red, green, and blue lines. Then the lenticules are removed.

Polavision; Polachrome. The Polavision system, which is no longer on the market, included a movie camera and a player that processed the exposed film and projected the movie. The Polavision film was provided in a sealed cassette, and the film was exposed, processed, viewed, and rewound without leaving the cassette. Polaroid introduced Polachrome CS 35-mm slide film in 1982 (104) and a high-contrast version, Polachrome HCP, in 1987. Polachrome films are provided in standard-size 35-mm cassettes.
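For a sense of scale, the screen frequencies quoted above can be converted into approximate stripe widths, assuming three equal-width stripes per triplet (an assumption; the text does not give individual stripe widths):

```python
# Minimal sketch: 1 cm = 10,000 um, so triplet pitch = 10,000 / (triplets per cm),
# and each red, green, or blue stripe is roughly one third of that pitch.
for name, triplets_per_cm in (("Polavision", 590), ("Polachrome", 394)):
    triplet_pitch_um = 10_000 / triplets_per_cm
    stripe_width_um = triplet_pitch_um / 3.0
    print(f"{name}: triplet pitch ~ {triplet_pitch_um:.1f} um, stripe width ~ {stripe_width_um:.1f} um")
```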
Silver Image Formation and Stabilization. The integral additive transparency is based on the rapid, one-step formation of a stable, neutral, positive black-and-white image of high covering power behind the fine pattern of colored lines. The negative image formed concurrently has much lower covering power and is thus so inconsequential optically that it need not be removed to permit viewing the positive image (105). The negative image formed in the emulsion layer has only about one-tenth the covering power of the positive image. In Polavision, the negative image remained in place over the positive image after processing, whereas in Polachrome the negative is stripped away, so that only the positive silver image layer and the additive screen are present in the processed slide. To achieve good color rendition, it is necessary for the emulsion resolution to be significantly higher than that of the color screen. This requirement is fulfilled by using a fine-grained silver halide emulsion and coating the emulsion in a thin layer that contains minimal silver. The high covering power positive image is a mirror-like deposit formed in a very thin image-receiving layer. Because the image is transferred to a contiguous layer, the resolution is higher than for images transferred through a layer of processing fluid. The Polachrome film structure and the formation of a Polachrome color image are illustrated schematically in Fig. 18. The color screen is separated from the imagereceiving layer by an alkali-impermeable barrier layer. Over the image-receiving layer is a stripping layer, and above that are the panchromatic silver halide emulsion and an antihalation layer. Stabilizer precursors are contained in one or more of these layers (106). (The Polavision film structure was similar but did not include a stripping layer.) As indicated in Fig. 18, exposure of the film produces blue, green, and red records behind the respective color stripes.
Processing of Polachrome 35-mm film, carried out in an Autoprocessor (Fig. 4), is initiated when a thin layer of reagent applied to a strip sheet contacts the film surface. Exposed silver halide grains develop in situ; unexposed grains dissolve, and the resulting soluble silver complex migrates to the positive image-receiving layer, developing there to form a thin, compact positive image layer. After a short interval, the strip sheet and film are rewound. The strip sheet rewinds into the processing pack, which is subsequently discarded, and the film rewinds into its cartridge. The processed film is dry and ready to be viewed or to be cut and mounted for projection.

Both during and after the formation of the negative and positive silver images, additional reactions take place. The antihalation dyes are decolorized, and stabilizer is released to diffuse to the image-receiving layer and stabilize the developed silver images. The developing agent, tetramethylreductic acid [1889-96-9] (12) (2,3-dihydroxy-4,4,5,5-tetramethyl-2-cyclopenten-1-one), C9H14O3, undergoes oxidation and alkaline decomposition to form colorless, inert products (107). As shown in Eq. (15), the chemistry of this developer's oxidation and decomposition is complex. The oxidation product tetramethylsuccinic acid (13) is not found under normal circumstances. Instead, the products are the α-hydroxyacid (16) and the α-ketoacid (17). When silver bromide is the oxidant, a four-electron oxidation can occur to give (17). In model experiments, the hydroxyacid was not converted to the ketoacid. It therefore appeared that the two-electron intermediate, the triketone hydrate (14), would reduce more silver in the presence of a stronger oxidant, possibly involving a species such as (15) as the reactive intermediate. This mechanism was verified experimentally using a controlled constant electrochemical potential: at potentials like that of silver chloride, four electrons were used; at lower potentials, only two were used (108).
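The practical significance of the two- versus four-electron pathways is simply the amount of silver each mole of developer can reduce. The sketch below states that bookkeeping explicitly; it follows from the electron count discussed above (one Ag+ reduced per electron transferred), not from any additional data in the text.

```python
# Minimal sketch of the developer stoichiometry: each electron given up by
# tetramethylreductic acid reduces one silver ion.
pathways = {
    "2-electron oxidation (to the alpha-hydroxyacid, 16)": 2,
    "4-electron oxidation (to the alpha-ketoacid, 17)": 4,
}
for label, electrons in pathways.items():
    print(f"{label}: {electrons} mol Ag reduced per mol developer")
```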
[Reaction scheme (15), showing species (12)–(17): structures not reproduced.]
Polachrome films are exposed in standard 35-mm cameras, and the slides are mounted in standard-size 5 × 5 cm (2 × 2 in.) mounts for use in conventional 35-mm projectors. Because of the density of the color base (about 0.7), Polachrome slides are most effective when viewed either in a bright rear projection unit or in a well-darkened projection room.
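The viewing recommendation follows directly from the density figure: a base density of about 0.7 passes only about a fifth of the light even before any image density is added. A quick check:

```python
# Minimal sketch: transmittance = 10 ** (-density).
base_density = 0.7  # color-base density quoted in the text
print(f"Base transmittance ~ {10.0 ** (-base_density):.0%}")  # about 20%
```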
Figure 18. Schematic representation of the Polachrome process. Layers, from the top: antihalation dye layer; silver halide emulsion (exposed and unexposed AgX grains); release layer; protector layer for the positive image; positive image receiver containing nuclei; protector layer for the color screen; additive color screen (repeating R, G, B stripes); clear polyester support. (a) Film during exposure to green light, where only the green lines transmit light to the emulsion. In (b), processing has been initiated by the application of a very thin stratum of the viscous reagent, represented by arrows pointing downward. The reagent reduces exposed grains in situ and dissolves unexposed grains to form a soluble complex that migrates toward the receiving layer. (c) represents the completion of the process. The top layers are peeled away, leaving the integral positive image and color screen ready to view.
Kodak Instant Films
Kodak entered the instant photography market in 1976 by introducing PR-10 integral color print film, rated at ISO 150, and a series of instant cameras designed for this film (9). The film system was based on a negative-working dye release process using preformed dyes (50), as described earlier under Dye Release Processes (see Fig. 9). These films and cameras were incompatible with Polaroid instant photographic systems. Later Kodak integral films included Kodak Instant Color Film (ISO 150), Kodamatic and Kodamatic Trimprint films (ISO 320), Instagraphic Print film (ISO 320), and Instagraphic color slide film (ISO 64). The films were balanced for daylight.
Kodak discontinued the production of instant films and cameras in 1986. These Kodak films and their chemistry are described in detail in (12).

Fuji Instant Films
The first Fuji instant system, Fotorama (1981), provided the integral color film FI-10, rated at ISO 150. Both film and camera were compatible with the Kodak instant system and incompatible with the Polaroid system. The film was similar in structure and processing to Kodak PR-10 film but had a different set of dye release compounds (see Figs. 9 and 10).
A high-speed integral color film, FI-800 (ISO 800), also designated as ACE film, and a new series of cameras suited to the higher speed films were introduced in 1984. The ACE products were distributed principally in Japan. Picture units are 9.7 × 10.2 cm, and image area is 6.7 × 9.0 cm. FI-10 was also provided in a 10 × 13-cm format for use in a special back for professional cameras. In 1984, Fuji also introduced a peel-apart color print pack film and a series of black-and-white peel-apart pack films. These films are compatible with Polaroid cameras and camera backs that use peel-apart films. Film units measure 8.3 × 10.8 cm, and image area is 7.3 × 9.5 cm.

In 1998, Fuji introduced the Instax system, which included new ISO 800 films in two formats and completely redesigned camera models (109). The films incorporated many improvements over the earlier FI-10 and ACE films, and the cameras introduced new picture formats. The Instax film units are 10.8 × 8.6 cm, and image area is 9.9 × 6.2 cm. Instax Mini film units are 8.6 × 5.4 cm, and image area is 6.2 × 4.6 cm. The following sections describe the principles involved in image formation and discuss the specific films.

The Fuji films are based on a negative-working dye release process using preformed dyes. Direct positive emulsions yield positive images upon development, as described earlier under Dye Release Processes. The transfer of dyes is thus initiated by oxidized developing agent in areas where unexposed silver halide grains are undergoing development. The oxidized developing agent reacts with alkali-immobile dye-releasing compounds, which in turn release mobile image dyes that then diffuse to the image-receiving layer.

The direct positive emulsions comprise silver halide grains that form latent images internally and not on the grain surface. Reversal is effected during processing by the action of a surface developer and a nucleating, or fogging, agent in the emulsion layer. Photoelectrons generated during exposure are trapped preferentially inside the grain, forming an internal latent image. In an exposed grain, the internal latent image is an efficient trap for conduction electrons provided by the nucleating agent. These grains have no surface latent image and do not develop in a surface developer. The unexposed grains trap electrons from the nucleating agent, at least temporarily, on the grain surface, forming fog nuclei that undergo development by the surface developer.

Fuji FI-10 Integral Film. The principal components of the FI-10 integral film unit are shown schematically in Figure 19. The clear polyester support, through which the final image is viewed, has both an image-forming section and an image-receiving section on its inner surface. The image-forming section includes direct positive emulsion layers, dye releaser layers, and scavenger layers; the image-receiving section includes the receiving layer and opaque layers to protect the image-forming section. The pod contains a viscous reagent and carbon black that serves as an opacifier.
Figure 19. Schematic cross section of Fuji FI-10 integral instant film. Layers, from the viewing side: clear embossed layer, clear polyester, image-receiving layer, white reflective layer, black opaque layer, cyan dye releaser, red-sensitive emulsion, interlayer, magenta dye releaser, green-sensitive emulsion, interlayer, yellow dye releaser, blue-sensitive emulsion, UV absorber, pod, timing layers, acid polymer, clear polyester, backing layer. This film is exposed through the transparent cover sheet, and the image is viewed through the opposite surface. During processing, released image-forming dyes pass through the black opaque layer and the white reflective layer to reach the image-receiving layer. During processing under ambient light, the photosensitive emulsion layers are protected from further exposure by pigments within the reflective layer and the black opaque layer. The image is viewed against the reflective white pigment layer.
The transparent cover sheet through which the film is exposed has polymeric timing layers and a polymeric acid layer on its inner surface. When the exposed film unit passes between processing rollers, viscous reagent released from the pod spreads between the two sheets, initiating a series of reactions that develop the unexposed silver halide, release the dyes, and form the positive image in the image-receiving layer. As in Polaroid integral films, completion of processing takes place under ambient conditions outside the camera. The pigment layers of the image-receiving section protect the developing film from exposure to ambient light through the viewing surface. The carbon black layer formed as the reagent spreads prevents exposure through the opposite surface. Neutralization to terminate processing commences as the reagent permeates the polymeric timing layers and reaches the polymeric acid layer.
Fuji ACE (FI-800) Integral Film. The ACE film structure, earlier designated as FI-800, introduced new spacer layers that separate each of the light-sensitive emulsions from the dye releaser layer that it controls. Figure 20a is a schematic cross section. The overall thickness of the ACE negative is appreciably greater than that of the FI-10 negative, and image completion is somewhat slower. This film and the earlier FI-10 are available in Japan and in other Asian countries.
Fuji Peel-Apart Film FP-100. The FP-100 films, like the Fuji integral films, use a dye release system and direct positive emulsions. Figure 20b shows the negative structure, which includes a spacer layer between the red-sensitive layer and the cyan dye releaser layer, as in the ACE structure. Unlike the ACE negative, there are no spacers between the other emulsions and the corresponding dye releaser layers. These films are available in both 4 × 5-inch and 3¼ × 4¼-inch formats, compatible with corresponding Polaroid cameras and camera backs.

Fuji Instax Integral System. The introduction of Instax film in 1998 marked a complete redesign of the Fuji instant cameras and films (109). New emulsions, interlayers, dyes, dye releasers, and timing polymers are incorporated in the Instax film structure illustrated schematically in Fig. 21. The use of small tabular direct positive grains permits thinner, more uniform emulsion layers. Thin interlayers, new development accelerators, and faster-acting pH reduction contribute to improved imaging properties. New features include a broadened processing temperature range, 5–40 °C (41–104 °F); improved sharpness, modulation transfer function (MTF), and color isolation; and greater light stability (110). The new film units incorporate narrower margins and more compact reagent pods, and the reagents have more uniform spreading characteristics. The use of a single polymer, polystyrene, for the film pack and the opaque cover sheet enables efficient disposal and recycling. The Instax products are distributed in Japan, Western Europe, Canada, Mexico, Brazil, and Asia.

PHOTOTHERMOGRAPHIC IMAGING

Dry Silver Materials and Processes
Photothermographic materials have been developed using low-sensitivity silver salts, such as silver acetylides and silver carboxylates, sometimes in combination with silver bromide. Starting in 1964, 3M produced a series of commercial dry silver products and conducted extensive research on the mechanisms involved. The dry silver products and patents, along with other 3M imaging systems, were turned over to the 3M spinoff Imation Corp. in 1996. The dry silver technology was sold to Eastman Kodak in 1999. Contemporary films of this type include Kodak DryView, Fuji Dry CRI-AL film, and Konica DryPro. In these materials, infrared laser exposure creates a latent image that is developed by heat to provide high-resolution images. Applications of such films include medical imaging and graphic arts. Both black-and-white and color processes have been described. Detailed reviews of photothermographic imaging materials and processes discuss the principles of such image formation (111). Although most dry silver processes use low-speed materials, a recent patent described a camera-speed color process based on photothermographic components (112).

Fuji Photothermographic Systems
Several photothermographic systems using instant color film output have been developed by Fuji in recent years. The processing takes place in dedicated equipment that uses donor and receptor sheets in roll form. All processing components are incorporated within the coatings, so that processing requires only the application of water and heat. Film speeds are generally lower than those of the films suited to instant camera use. Both analog and digital inputs are used.
Fujix Pictrography 1000. Redox dye release chemistry by the ROSET (ring opening by single-electron transfer) system was used in negative-working emulsions in the Fujix Pictrography 1000 system (1987) (113). The films were sensitized for exposure to light-emitting diodes, rather than the more conventional blue, green, and red spectral regions. Exposure was by yellow (570 nm), red (660 nm), and near-infrared (NIR) LEDs. The multilayer negative comprised a yellow-sensitive emulsion layer that controlled a yellow dye releaser, a red-sensitive emulsion that controlled a magenta dye releaser, and an NIRsensitive emulsion that controlled a cyan dye releaser. The peel-apart film used two sheets, a negative, or donor, sheet and a receptor sheet. The donor sheet contained a redox dye releaser, a development accelerator, and a basic metal compound, in addition to silver halide and binder (114). The redox dye releaser was a 4-alkoxy-5-tert-alkyl-2-sulfonamidophenol derivative (18). The substituent in the 5-position was considered significant for the combination of stability at room temperature and reactivity at high pH and elevated temperature. The development accelerator for the high temperature processing, a silver arylacetylide, precluded the need for an auxiliary developing agent. The receiving sheet contained binder, polymeric mordant, and a chelating compound. OH SO2NR1R2
[Structure (18): the redox dye releaser, a magenta dye moiety joined through a sulfonamide linkage to a ballasted sulfonamidophenol reducing moiety.]
Figure 20. Schematic cross sections of (a) Fuji ACE (FI-800) integral instant film and (b) Fuji FP-100 peel-apart instant color film. The ACE film structure includes new spacer layers between each of the photosensitive emulsion layers and the associated dye releaser layer that each controls. The higher speed of this film is attributed to new direct positive emulsions and a new dye releaser. The overall thickness of the FI-800 negative is appreciably greater than that of the earlier FI-10 negative, and image completion is somewhat slower. The FP-100 film includes a spacer layer between the red-sensitive emulsion and the cyan dye releaser layer but not between the other emulsions and their associated dye releaser layers.
The exposed film was coated with a thin layer of water and brought into contact with the receptor sheet, and the assembled sheets were heated to ∼90 °C. The chelate compound diffused from the receiving sheet to the donor sheet and reacted there with the metal compound to generate a soluble base. Under basic conditions, the exposed silver halide grains and the dye releaser underwent a redox reaction, effecting silver halide reduction and image dye release, followed by diffusion of dyes into the receptor layer. The two sheets were stripped apart after 20 s to reveal the color image.

Pictrostat 300, Pictrography 3000, Pictrography 4000, and Pictrostat Digital 400. Current versions of Fuji photothermographic systems combine laser diode exposure of silver halide emulsions with thermal development and ROSET dye release image chemistry. The Pictrostat 300 (PS300) analog system produces color prints up to A4 size [210 × 297 mm (8¼ × 11¾ in.)] from color negatives, color positives, and 3-D objects. Pictrography 4000 (PG4000), a 400-dpi digital color printer, prints up to A4 size; Pictrography 3000 (PG3000) is a 400-dpi color printer that accepts multiple paper sizes up to A3 [297 × 420 mm (11¾ × 16½ in.)] and outputs color prints and transparencies up to A4 and A5 (148 × 210 mm, or 5⅞ × 8¼ in.) sizes. The Pictrostat Digital 400 (PSD400), illustrated schematically in Figure 22, is a full-color digital scanner/printer that handles a variety of input sources, including a flatbed scanner, a film scanner, and SCSI and PC card interfaces (115). The system also incorporates a digital image processing unit (Fig. 23). The printer section of the PSD400 is the same as that of the PG4000.
Figure 21. Schematic cross section of Fuji Instax integral instant film during exposure (a) and after development (b). The film incorporates high-speed tabular-grain emulsions, new development accelerators, and thinner layers than the earlier Fuji integral films.
Pictro Proof is a high-speed, high-quality digital color proofer introduced in 1998. It uses laser diode exposure and the Pictro three-color printing process, which is very similar to the PG4000 system, to facilitate computer-to-plate prepress work. Fuji also provides Pictro Match NT, a preliminary digital display system that simulates the Pictro Proof result.

THERMOGRAPHIC IMAGING

Thermographic, or thermal, imaging comprises the formation of images by localized application of heat without the involvement of photosensitivity (116). Most contemporary thermal processes depend on solid-state thermal printheads. In a direct thermal process, heat effects a chemical reaction, such as that of a leuco dye with an acidic component to produce the colored form of the dye. In such systems, the color change takes place within a single sheet. In thermal transfer processes, two sheets are involved, and each may contain reactive components. Typically, image-forming dyes or pigments transfer from
one sheet or tape to the other. Dye diffusion thermal transfer (D2T2), also described as sublimation thermal transfer, has been commercialized for rapid production of continuous tone color prints. The thermal heads trigger the release of cyan, magenta, and yellow dyes from donor sheet patches to an image-receiving layer. Ink-jet printing, based on the release of inks or pigments from either thermal printheads or charge-sensitive printheads, is also used to produce high-quality digital images rapidly. Certain of the digital thermal printing systems are of interest in the context of instant photography, inasmuch as they can provide full color images, using small, portable home printers that accept digital camera memory cards and do not rely on computer input. Print quality may be comparable with that of instant photographic prints on photosensitive film or paper. The Kodak Personal Picture Maker PPM 200 incorporates Lexmark ink-jet technology. The Sony Snapshot Printer and the Olympus Camedia P-200 Printer use D2T2 processes. In 2001, Polaroid announced two new thermal printing processes, internally code named Onyx and Opal. Onyx is a thermal
monochrome process based on leuco dyes. Opal is a two-sheet process in which thermally activated dyes are transferred to a microporous receiving sheet (117,118). The Opal process uses a ‘‘frozen ink’’ comprising the dye and a thermal solvent. This frozen ink permits the use of lower processing temperature and shorter printing time than with D2T2 processes.
DIGITAL/INSTANT FILM IMAGING SYSTEMS

In addition to the photothermographic and thermographic printing systems described above, there are systems that couple digital cameras or digital records with instant film printers. Fuji's Digital In-Printer Camera uses a vacuum fluorescent printhead to expose Instax Mini film from a digital image record stored in a Smart Media card. The same print engine is used in the Fuji portable, stand-alone Digital Instax Mini Printer. The Polaroid ColorShot Digital Photo Printer uses Polaroid Spectra film to print directly from a computer. A linear liquid crystal shutter printhead illuminated by red, green, and blue LEDs is moved over the stationary film,
one line at a time. The PhotoMax Digital Photo Printer is a consumer version of this printer; it also uses Polaroid Spectra film. The Polaroid P-500 Digital Photo Printer is a compact, completely portable, tabletop or handheld printer that can be used anywhere. This printer uses Polaroid Type 500 instant color film to produce high resolution color prints directly from the Compact Flash data storage cards and Smart Media memory cards used in many digital cameras. The digital data are converted by the printhead to light that exposes the film. The printer automatically reformulates the digital image up to 3 megapixels for optimum printing. The printer is powered by the battery in the Polaroid Type 500 film pack. The Olympus Camedia C-211 digital photo printing camera integrates a digital camera with a Polaroid instant photo print engine that prints onto Polaroid Type 500 film. Polaroid’s SPd360 Studio Express system combines digital capture and film printout in a choice of six print formats, from one to nine images on a single sheet. The system uses Polacolor Pro100 film to provide color images or Polapan Pro 100 for black-and-white images for passports and official documents.
Figure 22. Block diagram of the Fuji Pictrostat Digital 400 processing machine. The image input unit incorporates scanners to accommodate reflective material, slides, and negatives, as well as interfaces for inputting digital still images and template images. The image processing unit comprises a color-mapping module and an image processing module. The image output unit comprises a display unit and a printer engine that exposes the film, moistens it with water, and applies heat to implement thermal development and dye transfer from donor sheet to receiver sheet.

Figure 23. Image processing unit component of the Fuji Pictrostat Digital 400 machine. This unit comprises a color mapping module that provides color correction and a processing module that permits adjustments of sharpness, tone, and image composition. The processing unit interfaces with a PC and with a VGA monitor.

APPLICATIONS OF INSTANT PHOTOGRAPHY
Instant film formats and corresponding equipment have been developed to fit a variety of specialized needs. Many laboratory instruments and diagnostic machines include built-in instant-film camera backs. Digital film recorders produce color prints, slides, or overhead transparencies from computer output and from cathode-ray tube displays. Video image recorders, such as the discontinued Polaroid FreezeFrame, provide color prints and transparencies from a variety of video sources, including VCRs, laser disks, and video cameras.

An important aspect of many professional applications is the rapid on-site completion of color images. For example, in photomicrography, work can continue without interruption, results can be documented quickly, and successive images of specimens that are undergoing rapid change can be compared immediately. Integral films and apparatus have been certified for use in clean room environments, where both photography and photomicrography are important for documentation and diagnostic work. Instant color slides and overhead transparencies provide a way to present data and information in lectures and business meetings.

In the instant reprographic field, Copycolor materials have been used extensively for making large format seismological charts and maps for the oil industry, mapmaking, and reproducing large graphs, charts, and engineering drawings. The films are also used for small color stats and for position proofs in layout work.

Instant photographs have been widely used for identification purposes, such as on driver's licenses and other identification cards. Most passport photographs made in the United States are instant photographs, many of them in color. Special cameras for identification photography include two-lens and four-lens cameras that make two or four passport-size photographs of the same subject or individual photographs of several different subjects on a single sheet of instant film. Such cameras are available from Polaroid, Fuji, and other manufacturers. The Polaroid ID-104 camera line is typical of such cameras. The Fuji Fotorama series, available in Japan and other Asian markets, includes two-lens and four-lens identification cameras. There also are special cameras that reproduce a passport-size photograph together with relevant text or other data on a single sheet to provide a complete identification card. The Polaroid ID-200 camera produces two such ID cards on a single sheet of 4 × 5 instant film.

Several instant films provide authentication features incorporated in the film during manufacture. Polaroid Polacolor ID Ultraviolet film contains an invisible security pattern preprinted into the film. This random pattern instantly appears when the ID card is scanned by ultraviolet light, making it virtually impossible to forge or alter the photograph. Polaroid Identifilm incorporates a preprinted, customized color or invisible ultraviolet pattern, or both, in the image sheet that is unique to the card issuer. Additional security features may be
incorporated into laminating materials used to form the final ID card. In addition to cameras designed for use in large-scale production of ID photographs, there are relatively simple and inexpensive cameras designed for smaller scale applications. The Polaroid BadgeCam, used in combination with custom transparent templates placed over the exposure aperture of a Captiva (Type 500) film cassette, quickly produces complete identification cards, for example, for use as visitor identification. Polaroid and Avery Dennison introduced a kit for making identification cards using the Polaroid Pocket ID camera, pocket sticky-back film, and special badge blanks made by Avery (119).

Medical and scientific fields in which instant color films are used include photography of the retina, using fundus cameras equipped with instant film holders (120); dental imaging with the Polaroid CU-5 close-up camera (121); chromatography; diagnostic imaging with radiopharmaceuticals; photomicrography; and Doppler blood flow studies.

Professional photographers use instant films as proof material to check composition and lighting. Large format Polacolor films are often used directly for exhibition prints. On a still larger scale, full-size Polacolor replicas of works of art provide high-quality images for display and for professional study and documentation.

Instant films are frequently used by photographers and artists to generate uniquely modified images (122). The image transfer technique begins with the transfer of a positive image from a partially developed Polacolor negative to a material other than the usual receiving sheet — for example, plain paper, fabric, or vellum (123). The transferred image may then be modified for artistic effects by reworking with watercolors or other dyes. The final transfer image is either displayed directly or reproduced for further use. Image manipulation is a creative art form based on Time Zero/SX-70 film images. Here, the artist uses a fine stylus or a blunt object to draw on the surface of an integral film unit during processing and for several hours afterward, while the image-forming layers are still soft enough to be affected (122). The altered image may be scanned, further modified electronically, printed, or transmitted digitally.

ECONOMIC ASPECTS

In 2000, 3.9 million instant cameras were sold in the United States, up 80% from the previous year. In the same period, digital still camera sales in the United States increased 123% from 1999, and a total of 4.1 million digital cameras was sold in the year 2000 (124). Polaroid reported worldwide sales of 13.1 million instant cameras and 1.3 million digital cameras in 2000 (125). Instant film printers incorporated into digital cameras, portable digital printers, and portable scanners for transforming instant photographs into digital records all indicate a growing trend toward integrating the two technologies.
BIBLIOGRAPHY 1. J. S. Arney and J. A. Dowler, J. Imaging Sci. 32, 125 (1988). 2. B. Fischer, G. Mader, H. Meixner, and P. Kleinschmidt, siemens Forsch u. Entwickkl.-Ber. 17, 291 (1988); G. Mader, H. Meixner, and G. Klug, J. Imaging Sci. 34, 213 (1990). 3. Eur. Pat. 349, 532, July 26, 2000, M. R. Etzel, (to Polaroid Corp); Eur. Pat. Appl. 90890217 Mar, 6, 1991), M. Etzel (to Polaroid Corp.). 4. E. H. Land, J. Opt. Soc. Am. 37, 61 (1947); E. H. Land, Photogr. J. 90A, 7 (1950). 5. E. H. Land, H. G. Rogers, and V. K. Walworth, in J. M. Sturge, ed., Neblette’s Handbook of Photography and Reprography, 7th ed., Van Nostrand Reinhold, New York, pp. 258–330, 1977. 6. A. Rott and E. Weyde, Photographic Silver Halide Diffusion Processes, Focal Press, London, New York, 1972. 7. T. T. Hill, in J. M. Sturge, ed., Neblette’s Handbook of Photography and Reprography, 7th ed., Van Nostrand Reinhold, New York, pp. 247–255, 1977. 8. Four Decades of Image Diffusion Transfer. Milestones in Photography, Sterckshof Provinciaal Museum voor Junstambachten, Deurne-Antwerpen, 1978. 9. W. T. Hanson Jr., Photogr. Sci. Eng. 24, 155 (1976); W. T. Hanson Jr., J. Photogr. Sci. 25, 189 (1977). 10. C. C. Van de Sande, Angew. Chem. Int. Ed. Engl. 22, 191 (1983). 11. E. Ostroff, ed., Pioneers of Photography: Their Achievements in Science and Technology, SPSE (now IS&T), Springfield, VA, 1987. 12. S. H. Mervis and V. K. Walworth, in Kirk-Othmer Encyclopedia of Chemical Technology, vol. 6, 4th ed., John Wiley & Sons, Inc., pp. 1,003–1,048, 1993. 13. V. K. Walworth and S. H. Mervis, in N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., IS&T, Springfield, VA, 1997. 14. M. McCann, ed., Edwin H. Land’s Essays, IS&T, Springfield, VA, 1997. 15. U.S. Pat. 2,543,181, Feb. 27, 1951, E. H. Land (to Polaroid Corp.). 16. U.S. Pat. 3,415,644, Dec. 10, 1968, E. H. Land (to Polaroid Corp.). 17. U.S. Pat. 3,594,165, July 20, 1971, H. G. Rogers (to Polaroid Corp.); U.S. Pat. 3,689,262, Sept. 5, 1972, H. G. Rogers (to Polaroid Corp.). 18. U.S. Pat. 2,603,565, July 15, 1952, E. H. Land (to Polaroid Corp.). 19. U.S. Pat. 2,435,717, Feb. 10, 1948, E. H. Land (to Polaroid Corp.); U.S. Pat. 2,455,111, Nov. 30, 1948, J. F. Carbone and M. N. Fairbank (to Polaroid Corp.). 20. U.S. Pat. 2,495,111, Jan. 17, 1950, E. H. Land (to Polaroid Corp.); U.S. Pat. 3,161,122, Dec. 15, 1964, J. A. Hamilton (to Polaroid Corp.); U.S. Pat. 3,079,849, Mar. 5, 1963, R. R. Wareham (to Polaroid Corp.). 21. U.S. Pat. 2,933,993, April 26, 1960, A. J. Bachelder and V. K. Eloranta (to Polaroid Corp.). 22. S. H. Liggero, K. J. McCarthy, and J. A. Stella, J. Imaging Technol. 10, 1 (1984). 23. E. H. Land, Photogr. Sci. Eng. 16, 247 (1972). 24. E. H. Land, Photogr. J. 114, 338 (1974).
25. U.S. Pat. 3,671,241, June 20, 1972, E. H. Land (to Polaroid Corp.); U.S. Pat. 3,772,025, Nov. 13, 1973, E. H. Land (to Polaroid Corp.).
49. U.S. Pat. 3,433,939, May 13, 1969, S. M. Bloom and R. K. Stephens (to Polaroid Corp.); U.S. Pat. 3,751,406, Aug. 7, 1973, S. M. Bloom (to Polaroid Corp.).
26. U.S. Pat. 3,362,961, Jan. 9, 1968, M. Green, A. A. Sayigh, and C. Uhrich (to Polaroid Corp.).
50. Fr Pat. 2,154,443, Aug. 31, 1972, L. J. Fleckenstein and J. Figueras (to Eastman Kodak Co.); Brit. Pat. 1,405,662, Sept. 10, 1975, L. J. Fleckenstein and J. Figueras (to Eastman Kodak Co.); U.S. Pat. 3,928,312, Dec. 23, 1975, L. J. Fleckenstein (to Eastman Kodak Co.); U.S. Publ. Pat. Appl. B351,673, Jan. 28, 1975, L. J. Fleckenstein and J. Figueras (to Eastman Kodak Co.).
27. U.S. Pat. 2,857,274, Oct, 21, 1958, E. H. Land et al., (to Polaroid Corp.). 28. U.S. Pat. 4,304,835, Dec. 8, 1981, S. M. Bloom and B. Levy (to Polaroid Corp.). 29. U.S. Pat. 2,968,554, Jan. 17, 1961, E. H. Land (to Polaroid Corp.). 30. U.S. Pat. 3,345,163, Oct. 3, 1967, E. H. Land and H. G. Rogers (to Polaroid Corp.). 31. S. Fujita, K. Koyama, and S. Ono, Nippon Kagaku Kaishi 1991, 1 (1991). 32. U.S. Pat. 3,148,062, Sept. 8, 1964, K. E. Whitmore, C. R. Staud, C. R. Barr, and J. Williams (to Eastman Kodak Co.); U.S. Pat. 3,227,554, Jan. 4, 1966, C. R. Barr, J. Williams, and K. E. Whitmore (to Eastman Kodak Co.); U.S. Pat. 3,243,294, Mar. 29, 1966, C. R. Barr (to Eastman Kodak Co.).
51. S. Fujita, SPSE 35th Annu. Conf., Rochester, N.Y., p. J-1; S. Fujita, Sci. Publ. Fuji Photo Film Co,. Ltd. 29, 55 (1984) (in Japanese). 52. T. Shibata, K. Sato, and Y. Aotsuka, Advance Printing of Paper Summaries, SPSE 4th Int. Congr. Adv. Non-Impact Printing Technol., New Orleans, LA, 1988, p. 362. 53. U.S. Pat. 3,725,062, Apr. 3, 1973, A. E. Anderson and K. K. Lum (to Eastman Kodak Co); U.S. Pat. 3,728,113, Apr. 17, 1973, R. W. Becker, J. A. Ford Jr., D. L. Fields, and D. D. Reynolds (to Eastman Kodak Co.).
33. U.S. Pat. 2,983,606, May 9, 1961, H. G. Rogers (to Polaroid Corp.).
54. U.S. Pat. 3,222,169, Dec. 7, 1965, M. Green and H. G. Rogers (to Polaroid Corp.); 3,230,086, Jan. 18, 1966 M. Green (to Polaroid Corp.); 3,303,183, Feb. 7, 1976, M. Green (to Polaroid Corp.).
34. U.S. Pat. 3,255,001, June 7, 1966, E. R. Blout and H. G. Rogers (to Polaroid Corp.).
55. Belg Pat. 834,143, Apr. 2, 1976, D. L. Fields and co-workers (to Eastman Kodak Co.).
35. S. M. Bloom, M. Green, E. M. Idelson, and M. S. Simon, in K. Venkataraman, ed., The Chemistry of Synthetic Dyes, vol. 8, Academic Press, New York, pp. 331–387, 1978.
56. U.S. Pat. 3,628,952, Dec. 21, 1971, W. Puschel and coworkers (to Agfa-Gevaert Aktiengesellschaft); Brit. Pat. 1,407,362, Sept. 24, 1975, J. Danhauser and K. Wingender (to Agfa-Gevaert Aktiengesellschaft).
36. U.S. Pat. 3,201,384, Aug. 17, 1965, M. Green (to Polaroid Corp.); U.S. Pat. 3,246,985, Apr. 19, 1966, M. Green (to Polaroid Corp.). 37. U.S. Pat. 3,209,016, Sept. 28, 1965, E. R. Blout et al., (to Polaroid Corp.). 38. H. G. Rogers, E. M. Idelson, R. F. W. Cieciuch, and S. M. Bloom, J. Photogr. Sci. 22, 138 (1974).
57. Ger. Offen. 2,335,175, Jan. 30, 1975, M. Peters and coworkers (to Agfa-Gevaert Aktiengesellschaft). 58. Brit Pat. 840,731, July 6, 1960, K. E. Whitmore and P. Mader (to Kodak Limited); U.S. Pat. 3,227,550, Jan. 4, 1966, K. E. Whitmore and P. M. Mader (to Eastman Kodak Co.).
41. U.S. Pat. 4,264,701, Apr. 28, 1981, L. Locatell Jr. et al., (to Polaroid Corp.).
59. U.S. Pat. 4,232,107, Nov. 4, 1980, W. Janssens (to N. V. Agfa-Gevaert); U.S. Pat. 4,371,604, Nov. 4, 1983, C. C. Van De Sande, W. Janssens, W. Lassig, and E. Meier (to N. V. Agfa-Gevaert); U.S. Pat. 4,396,699, Aug. 3, 1983, W. Jannsens and D. A. Claeys (to N. V. Agfa-Gevaert); U.S. Pat. 4,477,554, Oct. 16, 1984, C. C. Van de Sande and A. Verbecken (to N. V. Agfa-Gevaert).
42. U.S. Pat. 4,264,704, Apr. 28, 1981, A. L. Borror, L. Cincotta, E. M. Mahoney, and M. Feingold (to Polaroid Corp.).
60. K. Nakamura and K. Koya, SPSE/SPSTJ Int. East-West Symp. II, Kona, Hawaii, p. D-24, 1988.
43. Reference 12, p. 1011.
61. Eur. Pat. Appl. 0,220,746, May 6, 1987, K. Nakamura (to Fuji Photo Film Co., Ltd.).
39. U.S. Pat. 3,857,855, Dec. 31, 1974, E. M. Idelson (to Polaroid Corp.). E. M. Idelson, Dyes and Pigments 3, 191 (1982). 40. E. M. Idelson, I. R. Karaday, B. H. Mark, D. O. Rickter, and V. H. Hooper, Inorg. Chem. 6, 450 (1967).
44. U.S. Pat. 3,307,947, Mar. 7, 1967, E. M. Idelson and H. G. Rogers (to Polaroid Corp.); U.S. Pat. 3,230,082, Jan. 18, 1966, E. H. Land and H. G. Rogers (to Polaroid Corp.). 45. U.S. Pat. 3,854,945, Dec. 17, 1974, W. M. Bush and D. F. Reardon (to Eastman Kodak Co.). 46. U.S. Pat. 3,880,658, Apr. 29, 1975, G. J. Lestina and W. M. Bush (to Eastman Kodak Co.); U.S. Pat. 3,935,262, Jan. 27, 1976, G. J. Lestina and W. M. Bush (to Eastman Kodak Co.); U.S. Pat. 3,935,263, Jan. 27, 1976, G. J. Lestina and W. M. Bush (to Eastman Kodak Co.). 47. U.S. Pat. 3,649,266, Mar. 14, 1972, D. D. Chapman and L. G. S. Brooker (to Eastman Kodak Co.); U.S. Pat. 3,653,897, Apr. 4, 1972, D. D. Chapman (to Eastman Kodak Co.). 48. U.S. Pat. 3,245,789, Apr. 12, 1966, H. G. Rogers (to Polaroid Corp.); U.S. Pat. 3,443,940, May 13, 1969, S. M. Bloom and H. G. Rogers (to Polaroid Corp.).
62. M. Sawada and K. Nakamura, Advance Printing of Paper Summaries, IS&T 44th Annu. Conf., St. Paul, MN, 1991, p. 510. 63. R. Lambert, J. Imaging Technol. 15, 108 (1989); U.S. Pat. 4,740,448, Apr. 26, 1988, P. O. Kliem (to Polaroid Corp.). 64. U.S. Pat. 3,719,489, Mar. 6, 1973, R. F. W. Cieciuch, R. R. Luhowy, F. Meneghini, and H. G. Rogers (to Polaroid Corp.); U.S. Pat. 4,060,417, Nov. 29, 1977, R. F. W. Cieciuch, R. R. Luhowy, F. Meneghini, and H. G. Rogers (to Polaroid Corp.). 65. F. Meneghini, J. Imaging Technol. 15, 114 (1989). 66. A. Ehret, J. Imaging Technol. 15, 97 (1989). 67. U.S. Pat. 4,690,884, Sept. 1, 1987, F. F. DeBruyn Jr. and L. J. Weed (to Polaroid Corp.). 68. U.S. Pat. 4,631,251, Dec. 23, 1986, T. Komamura (to Konishiroku Photo Industry, Ltd.); U.S. Pat. 4,650,748,
Mar. 17, 1987, T. Komamura (to Konishiroku Photo Industry, Ltd.); U.S. Pat. 4,656,124, Apr. 7, 1987, T. Komamura (to Konishiroku Photo Industry, Ltd.); T. Komamura, SPSE/SPSTJ International East-West Symposium II, Kona, Hawaii, 1988, p. D-35.
69. U.S. Pat. 3,087,817, Apr. 30, 1963, H. G. Rogers (to Polaroid Corp.). 70. U.S. Pat. 3,537,850, Nov. 3, 1970, M. S. Simon (to Polaroid Corp.). 71. U.S. Pat. 2,909,430, Oct. 20, 1959, H. G. Rogers (to Polaroid Corp.).
92. Can. Pat. 951,560, July 23, 1974, E. H. Land (to Polaroid Corp.). 93. U.S. Pat. 3,647,437, Mar. 7, 1972, E. H. Land (to Polaroid Corp.). 94. S. M. Bloom, Abstract L-2, SPSE 27th Annu. Conf., Boston, Mass., 1974. 95. U.S. Pat. 3,702,244, Nov. 7, 1972, S. M. Bloom, A. L. Borror, P. S. Huyffer, and P. Y. MacGregor (to Polaroid Corp.); U.S. Pat. 3,702,245, Nov. 7, 1972, M. S. Simon, and D. F. Waller (to Polaroid Corp); M. S. Simon, J. Imaging Technol. 16, 143 (1990);
72. U.S. Pat. 3,185,567, May 25, 1965, H. G. Rogers (to Polaroid Corp.).
R. Cournoyer, D. H. Evans, S. Stroud, and R. Boggs, J. Org. Chem. 56, 4,576 (1991).
73. U.S. Pat. 3,362,819, Jan. 9, 1968, E. H. Land (to Polaroid Corp.).
96. U.S. Pat. 3,573,043, Mar. 30, 1971, E. H. Land (to Polaroid Corp.).
74. U.S. Pat. 3,362,821, Jan. 9, 1968, E. H. Land (to Polaroid Corp.). 75. H. G. Rogers, M. Idelson, R. F. W. Cieciuch, and S. M. Bloom, J. Photogr. Sci. 22, 183 (1974).
97. U.S. Pat. 4,298,674, Nov. 3, 1981, E. H. Land, L. D. Cerankowski, and N. Mattucci (to Polaroid Corp.); U.S. Pat. 4,294,907, Oct. 13, 1981, I. Y. Bronstein-Bonte, E. O. Lindholm, and L. D. Taylor (to Polaroid Corp.).
76. P. Duffy, ed., Storing, Handling and Preserving Polaroid Photographs: A Guide, Focal Press, New York, 1983.
98. U.S. Pat. 4,202,694, May 13, 1980, L. D. Taylor (to Polaroid Corp.);
77. R. F. W. Cieciuch, SPSE Int. Symp.: The Stability and Preservation of Photographic Images, Ottawa, Ontario, 1982.
L. D. Taylor, H. S. Kolesinski, D. O. Rickter, J. M. Grasshoff, and J. R. DeMember, Macromolecules 16, 1,561 (1983).
78. H. Wilhelm and C. Brower, The Permanence and Care of Color Photographs: Traditional and Digital Color Prints, Color Negatives, Slides, and Motion Pictures, Preservation Publishing Co., Grinnell, IA, 1993.
99. U.S. Pat. 3,756,814, Sept. 4, 1973, S. F. Bedell (to Polaroid Corp.); U.S. Pat. 3,770,439, Nov. 6, 1973, L. D. Taylor (to Polaroid Corp.).
79. ANSI IT9.9-1990, American National Standards for Imaging Media — Stability of Color Images — Methods for Measuring, American National Standards Institute, Inc., NY, 1991. 80. U.S. Pat. 3,625,685, Dec. 7, 1971, J. A. Avtges, J. L. Reid, H. N. Schlein, and L. D. Taylor (to Polaroid Corp.). 81. U.S. Pat. 3,192,044, June 29, 1965, H. G. Rogers and H. W. Lutes (to Polaroid Corp.).
100. U.S. Pat. 6,006,036, Dec. 21, 1999, L. M. Douglas (to Polaroid Corp.). 101. U.S. Pat. 5,870,633, Feb. 9, 1999, P. R. Norris (to Polaroid Corp.); U.S. Pat. 5,888,693, Mar. 30, 1999, J. E. Meschter, P. R. Norris, and H. R. Parsons (to Polaroid Corp.); U.S. Pat. 5,981,137, Nov. 9, 1999, J. E. Meschter, P. R. Norris, and H. R. Parsons (to Polaroid Corp.). 102. E. H. Land, Photogr. Sci. Eng. 21, 225 (1977).
82. U.S. Pat. 3,039,869, June 19, 1962, H. G. Rogers and H. W. Lutes (to Polaroid Corp.).
103. U.S. Pat. 3,284,208, Nov. 6, 1966, E. H. Land (to Polaroid Corp.).
83. U.S. Pat. 3,411,904, Nov. 19, 1968, R. W. Becker (to Eastman Kodak Co.).
104. S. H. Liggero, K. J. McCarthy, and J. A. Stella, J. Imaging Technol. 10, 1 (1984).
84. U.S. Pat. 3,625,685, Dec. 7, 1971, J. A. Avtges, J. L. Reid, H. N. Schlein, and L. D. Taylor (to Polaroid Corp.).
105. U.S. Pat. 2,861,885, Nov. 25, 1958, E. H. Land (to Polaroid Corp.); U.S. Pat. 3,894,871, July 15, 1975, E. H. Land (to Polaroid Corp.).
85. U.S. Pat. 3,148,061, Sept. 8, 1964, H. C. Haas (to Polaroid Corp.). 86. U.S. Pat. 3,421,893, Jan. 14, 1969, L. D. Taylor (to Polaroid Corp.); U.S. Pat. 3,433,633, Mar. 18, 1969, H. C. Haas (to Polaroid Corp.); U.S. Pat. 3,856,522, Dec. 24, 1974, L. J. George and R. A. Sahatjian (to Polaroid Corp); L. D. Taylor and L. D. Cerankowski, J. Polymer Sci., Polym. Chem. Ed. 13, 2,551 (1975). 87. U.S. Pat. 4,740,448, Apr. 26, 1988, P. O. Kliem (to Polaroid Corp.). 88. E. S. McCaskill and S. R. Herchen, J. Imaging Technol. 15, 103 (1989). 89. E. H. Land, Photogr. Sci. Eng. 16, 247 (1972); E. H. Land, Photogr. J. 114, 338 (1974). 90. U.S. Pat. 3,793,022, Feb. 19, 1974, E. H. Land, S. M. Bloom, and H. G. Rogers (to Polaroid Corp.). 91. U.S. Pat. 4,304,833, Dec. 8, 1981, J. Foley (to Polaroid Corp.); U.S. Pat. 4,304,834, Dec. 8, 1981, R. L. Cournoyer and J. W. Foley (to Polaroid Corp.); U.S. Pat. 4,258,118, Mar. 24, 1981, J. Foley, L. Locatell Jr., and C. M. Zepp (to Polaroid Corp.); U.S. Pat. 4,329,411, May 11, 1982, E. H. Land (to Polaroid Corp.).
106. U.S. Pat. 3,704,126, Nov. 28, 1972, E. H. Land, S. M. Bloom, and L. C. Farney (to Polaroid Corp.); U.S. Pat. 3,821,000, June 28, 1974, E. H. Land, S. M. Bloom, and L. C. Farney (to Polaroid Corp.). 107. U.S. Pat. 3,615,440, Oct. 25, 1971, S. M. Bloom and R. D. Cramer (to Polaroid Corp.). 108. S. Inbar, A. Ehret, and K. N. Norland, SPSE 39th Ann. Conf., Minneapolis, MN, 1986, p. 14. 109. Photokina News 1998 (1998); Fujifilm World 113, 113 (1998). 110. S. Ishimaru et al., Annu. Meet. SPSTJ’ 99, p. 41, 1999 (in Japanese); S. Ishimaru et al., J. Soc. Photogr. Sci. Technol. Japan 62, 434 (1999) (in Japanese); H. Hukuda et al., Res. Dev. 44, 1 (1999) (in Japanese); Fujifilm Data Sheet AF3-077E, 2000 (in English); Fujifilm Product Information Sheets AF3-078E and AF3079E, 2000 (in English). 111. D. A. Morgan, in A. S. Diamond, ed., Handbook of Imaging Materials, Marcel Dekker, NY, 1991, pp. 43–60; M. R. V. Sahyun, J. Imaging Sci. Technol. 42, 23 (1998); D. R. Whitcomb and R. D. Rogers, J. Imaging Sci. Technol. 43, 517 (1999);
P. Cowdery-Corban and D. R. Whitcomb, in D. Weiss, ed., Handbook of Imaging Materials, 2nd ed., Marcel Dekker, NY, 2001. 112. U.S. Pat. 6,040,131, Mar. 21, 2000, L. M. Eshelman and T. W. Stoebe (to Eastman Kodak Co.). 113. U.S. Pat. 4,704,345, Nov. 3, 1987, H. Hirai and H. Naito (to Fuji Photo Film Co., Ltd.); H. Seto, K. Nakauchi, S. Yoshikawa, and T. Ohtsu, Advance Printing of Paper Summaries, SPSE 4th Int. Cong. Adv. Non-Impact Printing Technol., New Orleans, LA, 1988, p. 358; T. Shibata, K. Sato, and Y. Aotsuka, ibid., p. 362. 114. Y. Yabuki, K. Kawata, H. Kitaguchi, and K. Sato, SPSE/SPSTJ Int. East-West Symp. II, Kona, HI, 1988, p. D-32. 115. T. Hirashima, N. Jogo, and M. Takahira, IS&T's 10th Int. Symp. Photofinishing Technol., Las Vegas, NV, 1998, p. 44; K. Nagashima, S. Oku, and T. Hirashima, Fujifilm Res. & Dev. 44, 60 (1999). 116. J. F. Komerska, in A. S. Diamond, ed., Handbook of Imaging Materials, Marcel Dekker, New York, 1991, pp. 487–526.
117. U.S. Patent 6,054,246, April 25, 2000, J. C. Bhatt et al., (to Polaroid Corporation). 118. Polaroid press release, May 31, 2001. 119. Polaroid press release, April 10, 2001. 120. Ophthalmic Photography, Polaroid Guide to Instant Imaging PID 780806, Polaroid Corp., Cambridge, MA, 1990. 121. Dental Imaging, Polaroid Guide to Instant Imaging PID 780791, Polaroid Corp., Cambridge, MA, 1990. 122. T. Airey, Creative Photo Printmaking, Watson-Guptill, NY, 1996. 123. Polacolor Image Transferring, Polaroid Guide to Instant Imaging PID 780805, Polaroid Corporation, Cambridge, MA, 1990; Advanced Image Transferring, Polaroid Guide to Instant Imaging PID 780953, Polaroid Corporation, Cambridge, MA, 1991; E. Murray, Painterly Photography: Awakening the Artist Within, Pomegranate Artbooks, San Francisco, 1993. 124. 1999–2000 PMA U.S. Industry Trends Report, Photo Marketing Association International, Jackson, MI, 2001. 125. Polaroid Corporation 2000 Annual Report, Polaroid Corp., Cambridge, MA, 2001.
LASER-INDUCED FLUORESCENCE IMAGING

STEPHEN W. ALLISON
WILLIAM P. PARTRIDGE
Engineering Technology Division
Oak Ridge National Laboratory
Knoxville, TN

INTRODUCTION

Fluorescence imaging is a tool of increasing importance in aerodynamics, fluid flow visualization, and nondestructive evaluation in a variety of industries. It is a means for producing two-dimensional images of real surfaces or fluid cross-sectional areas that correspond to properties such as temperature or pressure. This article discusses three major laser-induced fluorescence imaging techniques:

• Planar laser-induced fluorescence
• Phosphor thermography
• Pressure-sensitive paint

Since the 1980s, planar laser-induced fluorescence (PLIF) has been used for combustion diagnostics and to characterize gas- and liquid-phase fluid flow. Depending on the application, the technique can determine species concentration, partial pressure, temperature, flow velocity, or flow distribution/visualization. Phosphor thermography (PT) is used to image surface temperature distributions. Fluorescence imaging of aerodynamic surfaces coated with phosphor material for thermometry dates back to the 1940s, and development of the technique continues today. Imaging of fluorescence from pressure-sensitive paint (PSP) is a third diagnostic approach to aerodynamic and propulsion research discussed here that has received much attention during the past decade. These three methodologies are the primary laser-induced fluorescence imaging applications outside medicine and biology. As a starting point for this article, we will discuss PLIF first because it is more developed than the PT or PSP applications.

PLANAR LASER-INDUCED FLUORESCENCE

Planar laser-induced fluorescence (PLIF) in a fluid medium is a nonintrusive optical diagnostic tool for making temporally and spatially resolved measurements. For illumination, a laser beam is formed into a thin sheet and directed through a test medium. The probed volume may contain a mixture of various gaseous constituents, and the laser may be tuned to excite fluorescence from a specific component. Alternatively, the medium may be a homogeneous fluid into which a fluorescing tracer has been injected. An imaging system normal to the plane of the imaging sheet views the laser-irradiated volume. Knowledge of the laser spectral characteristics, the spectroscopy of the excited material, and other aspects of the fluorescence collection optics is required for quantifying the parameter of interest.

A typical PLIF setup is shown schematically in Fig. 1. In this example, taken from Ref. 1, an ultraviolet laser probes a flame. A spherical lens of long focal length and a cylindrical lens together expand the beam and form it into a thin sheet. The spherical lens is specified to achieve the desired sheet thickness and depth of focus. This relates to the Rayleigh range, to be discussed later. An alternate method for planar laser imaging is to use the small diameter, circular beam typically emitted by the laser and scan it. Alternate sheet formation methods include combining the spherical lens with a scanned-mirror system and other scanning approaches. Fluorescence excited by the laser is collected by a lens or lens system, sometimes by intervening imaging fiber optics, and is focused onto a camera's sensitive surface. In the example, this is performed by a gated intensified charge-coupled device (ICCD).

Figure 1. Representative PLIF configuration.

Background

Since its conception in the early 1980s, PLIF has become a powerful and widely used diagnostic technique. The PLIF diagnostic technique evolved naturally out of early imaging research based on Raman scattering (2), Mie scattering, and Rayleigh scattering along with 1-D LIF research (3). Planar imaging was originally proposed by Hartley (2), who made planar Raman-scattering measurements and termed the process Ramanography. Two-dimensional LIF-based measurements were made by Miles et al. (4) in 1978. Some of the first applications
of PLIF, dating to the early 1980s, involved imaging the hydroxyl radical, OH, in a flame. In addition to its use for species imaging, PLIF has also been employed for temperature and velocity imaging. General reviews of PLIF have been provided by Alden and Svanberg (3) and Hanson et al. (5). Reference 6 also provides recent information on this method, as applied to engine combustion. Overall, it is difficult to state, with a single general expression, the range and limits of detection of the various parameters (e.g., temperature, concentration, etc.) because there are so many variations of the technique. Single molecules can be detected, and temperature can be measured from cryogenic to combustion ranges, depending on the specific application.

General PLIF Theory

The relationship between the measured parameter (e.g., concentration, temperature, pressure) and the fluorescent signal is unique to each measured parameter. However, the most fundamental relationship between the various parameters is provided by the equation that describes LIF or PLIF concentration measurements. Hence, this relationship is described generally here to clarify the different PLIF measurement techniques that derive from it. The equation for the fluorescent signal in volts (or digital counts on a per-pixel basis for PLIF measurements) is formulated as

    S_D = (V_C f_B N_T) \cdot (\Gamma_{12,L} B_{12} I_{\nu}^{0}) \cdot \Phi \cdot \left( \eta \, \frac{\Omega}{4\pi} \right) G R t_L ,    (1)

where

    \Phi = \frac{A_{N,F}}{A_N + Q_e + W_{12} + W_{21} + Q_P}

and where

S_D : Measured fluorescent signal
V_C : Collection volume, i.e., the portion of the laser-irradiated volume viewed by the detection system
f_B : Boltzmann fraction in level 1
N_T : Total number density of the probe species
\Gamma_{12,L} : Overlap fraction (i.e., energy-level line width divided by laser line width)
B_{12} : Einstein coefficient for absorption from energy level 1 to level 2
I_{\nu}^{0} : Normalized laser spectral irradiance
\Phi : Fluorescent quantum yield
\eta : Collection optics efficiency factor
\Omega : Solid angle subtended by the collection optics
G : Gain of the camera (applicable if it is an intensified CCD)
R : CCD responsivity
t_L : Temporal full width at half maximum of the laser pulse
A_{N,F} : Spectrally filtered net spontaneous emission rate coefficient
A_N : Net spontaneous emission rate coefficient
Q_e : Fluorescence quenching rate coefficient
Q_P : Predissociation rate coefficient
W_{12} : Absorption rate coefficient
W_{21} : Stimulated emission rate coefficient
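The grouping in Eq. (1) maps directly onto a step-by-step calculation. The following sketch (Python, with purely illustrative numbers; none of the parameter values below come from the text or from any experiment) simply multiplies the grouped factors in the order described, which can be a convenient sanity check when estimating relative signal levels.

```python
import math

def fluorescent_signal(V_C, f_B, N_T,            # molecules available for excitation
                       Gamma, B12, I_nu0,        # excitation probability per unit time
                       A_NF, A_N, Q_e, Q_p, W12, W21,  # rate coefficients entering the yield
                       eta, Omega, G, R, t_L):   # collection, camera gain, responsivity, gate
    """Evaluate Eq. (1) group by group for a set of user-supplied values."""
    n_available = V_C * f_B * N_T                  # first group: probe molecules in the lower level
    excitation = Gamma * B12 * I_nu0               # second group: absorption probability per unit time
    phi = A_NF / (A_N + Q_e + W12 + W21 + Q_p)     # fluorescent quantum yield
    collection = eta * Omega / (4.0 * math.pi)     # fraction of emitted photons reaching the detector
    return n_available * excitation * phi * collection * G * R * t_L

# Assumed, illustrative inputs only; in practice all quantities must be in consistent units.
S_D = fluorescent_signal(V_C=1e-12, f_B=0.05, N_T=1e22,
                         Gamma=0.4, B12=1e9, I_nu0=1e-6,
                         A_NF=1e6, A_N=1.4e6, Q_e=5e8, Q_p=0.0, W12=1e5, W21=1e5,
                         eta=0.6, Omega=0.05, G=500, R=0.2, t_L=10e-9)
print(f"relative fluorescent signal: {S_D:.3e}")
```

Setting W12, W21, and Q_p to negligible values in this sketch recovers the quenching-limited, linear-regime yield discussed next.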
The individual terms in Eq. (1) have been grouped to provide a clear physical interpretation of the actions represented by the individual groups. Moreover, the groups have been arranged from left to right in the natural order in which the fluorescent measurement progresses. The first parenthetical term in Eq. (1) is the number of probe molecules in the lower laser-coupled level. This is the fraction of the total number of probe molecules that are available for excitation. The second parenthetical term in Eq. (1) is the probability per unit time that one of the available molecules will absorb a laser photon and become electronically excited. Hence, following this second parenthetical term, a fraction of the total number of probed molecules has become electronically excited and has the potential to fluoresce. More detailed explanation is contained in Ref. 1.

The fluorescent quantum yield \Phi represents the probability that one of the electronically excited probe molecules will relax to the ground electronic state by spontaneously emitting a fluorescent photon within the spectral bandwidth of the detection system. This fraction reflects the fact that spectral filtering is applied to the total fluorescent signal and that radiative as well as nonradiative (e.g., spontaneous emission and quenching, respectively) decay paths are available to the excited molecule. In the linear fluorescent regime and in the absence of other effects such as predissociation, the fluorescent yield essentially reduces to

    \Phi \approx \frac{A_{N,F}}{A_N + Q_e} ,

so that the fluorescent signal is adversely affected by the quenching rate coefficient. The factor \eta within the third parenthetical term in Eq. (1) represents the net efficiency of the collection optics. This term accounts for reflection losses that occur at each optical surface. The next term, \Omega/4\pi, is the fraction of the fluorescence emitted by the electronically excited probe molecules that impinges on the detector surface (in this case, an ICCD); \Omega is the finite solid angle of the collection optics. This captured fluorescence is then passed through an optical amplifier where it receives a gain G. The amplified signal is then detected with a given spectral responsivity R. The detection process in Eq. (1) produces a time-varying voltage or charge (depending on whether a PMT or ICCD detector is used). This time-varying signal is then integrated over a specific gate time to produce the final measured fluorescent signal.

Using Eq. (1), the total number density N_T of the probed species can be determined via a PLIF measurement of S_D, provided that the remaining unknown parameters can be calculated or calibrated. Investigation of the different terms of Eq. (1) suggests possible schemes for PLIF measurements of temperature, velocity, and pressure. For a given experimental setup (i.e., constant optical and timing parameters) and total number density of probe molecules, all of the terms in Eq. (1) are constants except for f_B, \Gamma_{12,L}, and Q_e. The Boltzmann fraction f_B varies in a known manner with temperature. The degree and type of variation
with temperature is unique to the lower laser-coupled level chosen for excitation. The overlap fraction \Gamma_{12,L} varies with changes in the spectral line shape(s) of the absorption transition and/or the laser. Changes in velocity and pressure produce varying degrees of Doppler and pressure shift, respectively, in the absorption spectral profile (7–9). Hence, variations in these parameters will, in turn, produce changes in the overlap fraction. The electronic quenching rate coefficient varies with temperature, pressure, and major species concentrations. Detailed knowledge of the relationship between the variable of interest (i.e., temperature, pressure, or velocity) and the Boltzmann fraction f_B and/or the overlap fraction \Gamma_{12,L} can be used in conjunction with Eq. (1) to relate the PLIF signal to the variable of choice. Often ratiometric techniques can be used to allow canceling of terms in Eq. (1) that are constant for a given set of experiments. Specific examples of different PLIF measurement schemes are given in the following review of pertinent literature.

PLIF Temperature Measurements

The theory behind PLIF thermometric measurements is the same as that developed for point LIF. Laurendeau (10) gives a review of thermometric measurements from a theoretical and historical perspective. Thermometric PLIF measurement schemes may be generally classified as monochromatic or bichromatic (two-line). Monochromatic methods employ a single laser. Bichromatic methods require two lasers to excite two distinct molecular rovibronic transitions simultaneously. In temporally stable environments (e.g., laminar flows), it is possible to employ bichromatic methods with a single laser by systematically tuning the laser to the individual transitions. In bichromatic PLIF thermometric measurements, the ratio of the fluorescence from two distinct excitation schemes is formed pixel by pixel. If the two excitation schemes are chosen so that the upper laser-coupled level (i.e., excited state) is the same, then the fluorescent yields (Stern–Volmer factors) are identical. This is explained by Eckbreth in Ref. 11, an essential reference book for LIF and other laser-based flow and combustion diagnostic information. Hence, as evident from Eq. (1), the signal ratio becomes a sole function of temperature through the ratio of the temperature-dependent Boltzmann fractions for the two lower laser-coupled levels of interest.

Monochromatic PLIF thermometry is based on either the thermally assisted fluorescence (THAF) or the absolute fluorescence (ABF) method. In THAF-based techniques, the temperature is related to the ratio of the fluorescent signals from the laser-excited level and from another, higher level collisionally coupled to the laser-excited level. Implementing this method requires detailed knowledge of the collisional dynamics that occur in the excited level (9). In ABF-based techniques, the field of interest is uniformly doped or seeded, and fluorescence is monitored from a single rovibronic transition. The temperature-independent terms in Eq. (1) (i.e., all terms except f_B, \Gamma_{12,L}, and \Phi) are determined through calibration. The temperature field may then be determined from the fluorescent field by assuming a known dependence of
the Boltzmann fraction, the overlap fraction, and the quenching rate coefficient on temperature.

PLIF Velocity and Pressure Measurements

PLIF velocity and pressure measurements are based on changes in the absorption line-shape function of a probed molecule under the influence of variations in velocity, temperature, and pressure. In general, the absorption line-shape function is Doppler-shifted by velocity, Doppler-broadened (Gaussian) by temperature, and collisionally broadened (Lorentzian) and shifted by pressure (10). These influences on the absorption line-shape function, and consequently on the fluorescent signal via the overlap fraction of Eq. (1), provide a diagnostic path for velocity and pressure measurements.

The possibility of using a fluorescence-based Doppler-shift measurement to determine gas velocity was first proposed by Measures (12). The measurement strategy involved seeding a flow with a molecule that is excited by a visible, narrow-bandwidth laser. The Doppler shift could be determined by tuning the laser over the shifted absorption line and comparing the spectrally resolved fluorescence to static cell measurements. By probing the flow in two different directions, the velocity vector along each propagative direction could be determined from the resulting spectrally resolved fluorescence. In another early development, Miles et al. (4) used photographs to resolve spatially the fluorescence from a sodium-seeded, hypersonic nonreacting helium flow to make velocity and pressure measurements. The photographs of the fluorescence at each tuning position of a narrow-bandwidth laser highlighted those regions of the flow that had a specific velocity component. Although this work used a large diameter beam rather than a sheet for excitation, it evidently represents the first two-dimensional, LIF-based imaging measurement. Another important method that is commonly used for visualizing flow characteristics involves seeding a flow with iodine vapor. The spectral properties of iodine are well characterized, enabling pressure and velocity measurements (13).

PLIF Species Concentration Measurements

The theory for PLIF concentration measurements is similar to that developed for linear LIF using broadband detection. The basic measurement technique involves exciting a specific rovibronic transition of a probe molecule (seeded or naturally occurring) and determining the probed molecule concentration from the resulting broadband fluorescence. Unlike ratiometric techniques, the fluorescent signal from this single-line method retains its dependence on the fluorescent yield (and therefore on the electronic quenching rate coefficient). Hence, the local fluorescent signal depends on the local number density of the probe molecule, the Boltzmann fraction, the overlap fraction, and the electronic quenching rate coefficient. Furthermore, the Boltzmann fraction depends on the local temperature; the overlap fraction depends on the local temperature and pressure; and the electronic quenching rate coefficient depends on the local temperature, pressure,
and composition. This enhanced dependence of the fluorescent signal complicates the determination of probed species concentrations from PLIF images. The difficulty in accurately determining the local electronic quenching rate coefficient, particularly in reacting environments, is the primary limitation to realizing quantitative PLIF concentration imaging (5). Nevertheless, methodologies for PLIF concentration measurements in quenching environments, based on modeling (1) and secondary measurements (2), have been demonstrated.

Useful fundamental information can be obtained from uncorrected, uncalibrated PLIF ‘‘concentration’’ images. Because of the species specificity of LIF, unprocessed PLIF images can be used to identify reaction zones, mixing regimes, and large-scale structures of flows. For instance, qualitative imaging of the formation of pollutant in a combustor can be used to determine optimum operating parameters. The primary utility of PLIF concentration imaging remains its ability to image relative species distributions in a plane, rather than providing quantitative field concentrations. Because PLIF images are immediately quantitative in space and time (due to the high temporal and spatial resolution of pulsed lasers and ICCD cameras, respectively), qualitative species images may be used effectively to identify zones of species localization, shock wave positions, and flame-front locations (5). The major experimental considerations limiting or pertinent to the realization of quantitative PLIF are

1. spatial cutoff frequency of the imaging system;
2. selection of imaging optics parameters (e.g., f number and magnification) that best balance spatial resolution and signal-level considerations;
3. image corrections implemented via postprocessing to account for nonuniformities in experimental parameters such as pixel responsivity and offset and laser sheet intensity; and
4. spatial variation in the fluorescent yield due to the electronic quenching rate coefficient.

Laser Beam Control

A distinctive feature of planar LIF is that the imaging resolution is controlled not only by the camera and its associated collection optics but also by the laser beam optics. For instance, the thinner a laser beam is focused, the higher the resolution. This section is a simple primer for lens selection and control of beam size. The most important considerations for the choice of lenses are as follows. A simple lens will process light, to a good approximation, according to the thin-lens equation,

    \frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f} ,    (2)

where s_o is the distance from an object to be imaged to the lens, s_i is the distance from the lens to where an image is formed, and f is the focal length of the lens, as shown in Fig. 2. In practice, this relationship is useful for imaging laser light from one plane (such as the position of an aperture or template) to another desired position in space, whose magnification is M = −s_i/s_o. If two lenses are used, the image formed by the first lens becomes the object for the second. For a well-collimated beam, the object distance is considered infinite, and thus the image distance is simply the focal length of the lens. There is a limit on how small the beam may be focused, termed the diffraction limit. This minimum spot size w is given in units of length as w = 1.22fλ/D, where λ is the wavelength of the light and D is the collimated beam diameter. If the laser beam is characterized by a divergence α, then the minimum spot size is w = fα.

Figure 2. Simple lens imaging.

To form a laser beam into a sheet, sometimes termed ‘‘planarizing,’’ a combination of two lenses, one spherical and the other cylindrical, is used. The spherical lens controls the spread, and the cylindrical lens controls the sheet thickness. The result is illustrated in Fig. 3. A laser sheet may be formed by combining spherical and cylindrical lenses; the cylindrical lens is used to achieve the desired sheet height, and a spherical lens is used to achieve the desired sheet thickness and Rayleigh range. The Rayleigh range, a term that describes Gaussian beams (e.g., see Ref. 9), is the propagative distance required on either side of the beam waist to achieve a radius of √2 times the waist radius. The Rayleigh range z_o is defined as πw_o²/λ, where w_o is the waist radius; it is used as a standard measure of the waist-region length (i.e., the length of the region of minimum and uniform sheet thickness). In general, longer focal length lenses produce longer Rayleigh ranges. In practice, lens selection is determined by the need to make the Rayleigh range greater than the lateral imaged distance. In general, because longer focal length lenses produce wider sheet-waist thicknesses, the specified sheet thickness and lateral image extent must be balanced.

Figure 3. Line focus using a cylindrical lens.
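Because lens selection for sheet formation comes down to the few formulas above, a short numerical helper is often handy. The sketch below (Python) evaluates the thin-lens equation, the diffraction-limited spot size w = 1.22fλ/D, and the Rayleigh range z_o = πw_o²/λ; the wavelength, beam diameter, and focal length used are assumed example values only, not taken from the text.

```python
import math

def image_distance(s_o, f):
    """Thin-lens equation 1/s_o + 1/s_i = 1/f, solved for the image distance s_i."""
    return 1.0 / (1.0 / f - 1.0 / s_o)

def diffraction_limited_spot(f, wavelength, beam_diameter):
    """Minimum focused spot size w = 1.22 * f * lambda / D for a collimated beam."""
    return 1.22 * f * wavelength / beam_diameter

def rayleigh_range(waist_radius, wavelength):
    """z_o = pi * w_o**2 / lambda for a Gaussian beam."""
    return math.pi * waist_radius ** 2 / wavelength

# Assumed example: 266-nm beam, 6-mm diameter, 1-m focal length spherical lens.
lam, D, f = 266e-9, 6e-3, 1.0
w = diffraction_limited_spot(f, lam, D)     # ideal sheet thickness at the waist
z0 = rayleigh_range(w / 2.0, lam)           # treat half the spot size as the waist radius
print(f"waist (spot size) = {w * 1e6:.1f} um, Rayleigh range = {z0 * 100:.1f} cm")
print(f"image of an object 2 m away forms at s_i = {image_distance(2.0, f):.2f} m")
```

In practice one would iterate on the focal length until the computed Rayleigh range exceeds the lateral extent to be imaged, as recommended above, while keeping the sheet-waist thickness acceptable.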
PHOSPHOR THERMOGRAPHY

Introduction

As conceived originally, phosphor thermography was intended foremost to be a means of depicting two-dimensional temperature patterns on surfaces. In fact, during its first three decades of existence, the predominant use of the technique was for imaging applications in aerodynamics (14). The method was termed ‘‘contact thermometry’’ because the phosphor was in contact with the surface to be monitored. The overall approach, however,
MCP intensifier
410 nm
400
490 nm
440
480 520 560 600 Emission wavelength (nm)
680
0.25
0.20
0.15
0.10 60
80
100 120 140 Surface temperature (°F)
Figure 5. Intensity ratio versus temperature.
CW VU lamp Launch optics Hard-copy device
Sync/timing electronics Image processing PC Digitizing hardware
640
Figure 4. Gd2 O2 S:Tb spectrum.
Selectable filter wheel
Nd:YAG pulsed laser
865
Gd202S:Tb
Phosphorcoated sample
Imaging optics
CCD camera
26 24 22 20 18 16 14 12 10 8 6 4 2 0 360
Corrected image ratio I410.5 /I489.5
has largely been overshadowed by the introduction of modern infrared thermal imaging techniques, several of which have evolved into commercial products that are used in a wide range of industrial and scientific applications. Yet, phosphor thermography (PT) remains a viable method for imaging and discrete point measurements. A comprehensive survey of fluorescence-based thermometry is provided in Ref. 14 and 15. The former emphasizes noncontact phosphor applications, and the latter includes the use of fluorescent crystals, glasses, and optical fibers as temperature sensors, as well as phosphors. Phosphor thermography exploits the temperature dependence of powder materials identical or similar to phosphors used commercially in video and television displays, fluorescent lamps, X-ray scintillating screens, etc. Typically, a phosphor is coated onto a surface whose temperature is to be measured. The coating is illuminated by an ultraviolet source, which induces fluorescence. The emitted fluorescence may be captured by either a nonimaging or an imaging detector. Several fluorescent properties are temperature-dependent. The fluorescence may change in magnitude and/or spectral distribution due to a change in temperature. Figure 4 shows a spectrum of Gd2 O2 S:Tb, a representative phosphor. The emission from this material originates from atomic transitions of the rare-earth activator Tb. At ambient temperatures, the ratio of emission intensities at 410 and 490 nm changes drastically with temperature from ambient to about 120 F. The other emission lines in the figure do not change until much higher temperatures are achieved. Thus the ratio indicates temperature in the said range, as shown in Fig. 5. Figure 6 shows a typical setup that depicts illumination either with laser light emerging from a fiber or an ultraviolet lamp. If the illumination source is pulsed, fluorescence will persist for a period of time after the illumination is turned off. The intensity I decreases, ideally according to I = e−t/τ , where the time required for decreasing by 1/e
Figure 6. A phosphor imaging system (Nd:YAG pulsed laser or CW UV lamp with launch optics, phosphor-coated sample, imaging optics, selectable filter wheel, MCP intensifier, CCD camera, sync/timing electronics, digitizing hardware, image-processing PC, RGB display, and hard-copy device).
Figure 7. False-color thermographs (a)–(c) of a heated turbine blade.
is termed the characteristic decay time τ, also known as the lifetime. The decay time is strongly temperature-dependent, and in most nonimaging applications the decay time is measured to ascertain temperature. For imaging, it is usually easier to implement the ratio method (16). Figure 7 shows false-color images of a heated turbine blade (17). Temperature can be measured from about 12 K to almost 2,000 K. In some cases, a temperature resolution of better than 0.01 K has been achieved.

Applications

Why use phosphor thermometry when infrared techniques work so well for many imaging applications? As noted by Bizzak and Chyu, conventional thermometric methods are not satisfactory for temperature and heat transfer measurements that must be made in the rapidly fluctuating conditions peculiar to a microscale environment (18). They suggested that thermal equilibrium on the atomic level might be achieved within 30 ns and that, therefore, the instrumentation system must have a very rapid response time to be useful in microscale thermometry. Moreover, its spatial resolution should approach the size of an individual
phosphor particle. Particle size can be specified and may range from less than 1 µm to about 25 µm. In their effort to develop a phosphor imaging system based on La2O2S:Eu, they used a frequency-tripled Nd:YAG laser. The image was split, and the individual beams were directed along equal-length paths to an intensified CCD detector. The laser pulse duration was 4 ns, and the ratio of the 5D2 to 5D0 line intensities yielded the temperature. A significant finding is that they were able to determine how the measurement accuracy varied with measurement area. The maximum error found for a surface the size of a single pixel in their video system was 1.37 °C, and the average error was only 0.09 °C. A particularly clever conception by Goss et al. (19) illustrates another instance where the method is better suited than infrared emission methods. It involved visualization, through a flame, of the condensed-phase combustion of solid rocket propellant. They impregnated the fuel under test with YAG:Dy and used the ratio of an F-level band at 496 nm to a G-level band at 467 nm as the signal of interest. Because of thermalization, the intensity of the 467-nm band increased
in comparison with the 496-nm band across the range from ambient to 1,673 K, the highest temperature they were able to attain. At that temperature, blackbody emission introduces a significant background component into the signal, even within the narrow passband of the spectrometer that was employed. To mitigate this, they used a Q-switched Nd:YAG laser (frequency-tripled to 355 nm). The detectors in this arrangement included an intensified (1,024-element) diode array and also an ICCD detector, both gated for a period of 10 µs. To simulate combustion in the laboratory, the phosphor was mixed with a low-melting-point (400 K) plastic, and the resulting blend was heated with a focused CO2 laser, which ignited a flame that eroded the surface. A time history of the disintegrating plastic surface was then obtained from the measurement system. Because of the short duration of the fluorescence, the power of the laser, and the gating of the detector, they were able to measure temporally resolved temperature profiles in the presence of the flame. Krauss, Laufer, and colleagues at the University of Virginia have used laser-induced phosphor fluorescence imaging at temperatures as high as 1,100 °C. Their efforts during the past decade have included a simultaneous temperature and strain sensing method, which they pioneered (20,21). For this, they deposit closely spaced thin stripes of phosphor material on the test surface. A camera views laser-induced fluorescence from the stripes. An image is acquired at ambient, unstressed conditions and subsequently at temperature and under stress. A digital moiré pattern of the stripes is produced by comparing the images before and after. The direction and magnitude of the moiré pattern indicate the strain. The ratio of the two colors of the fluorescence yields temperature.
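As a schematic illustration of the intensity-ratio approach described above, the sketch below converts a pair of background-corrected phosphor images into a temperature map through an assumed calibration function. The polynomial form and its coefficients are placeholders chosen for illustration; a real system would be calibrated against a reference thermometer.

```python
import numpy as np

def ratio_to_temperature(img_a, img_b, coeffs=(20.0, 400.0, -150.0)):
    """Map the per-pixel ratio r = I_a / I_b of two emission-line images to
    temperature using an assumed quadratic calibration T = c0 + c1*r + c2*r**2.
    The coefficients are illustrative placeholders, not a published calibration."""
    r = img_a / np.clip(img_b, 1e-6, None)   # guard against division by zero
    c0, c1, c2 = coeffs
    return c0 + c1 * r + c2 * r ** 2

# Tiny synthetic example: two background-corrected "images"
i410 = np.array([[0.12, 0.18], [0.22, 0.25]])
i490 = np.array([[1.00, 0.95], [0.92, 0.90]])
print(ratio_to_temperature(i410, i490))   # temperature map in the calibration's units
```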
PRESSURE-SENSITIVE PAINT

Background

Pressure-sensitive paints are coatings that use luminescing compounds that are sensitive to the presence of oxygen. References 22–26 are reviews of the subject. There are several varieties of PSPs discussed in the literature. Typically, they are organic compounds complexed with a metal. A pressure-sensitive paint usually consists of a PSP compound mixed with a gas-permeable binder. On the molecular level, a collision of an oxygen molecule with the compound quenches the fluorescence. Thus, the greater the oxygen concentration, the less the fluorescence. This application of fluorescence is newer than planar laser-induced fluorescence and phosphor thermography, but it is a field of rapidly growing importance. The utility of imaging pressure profiles of aerodynamic surfaces in wind tunnels and inside turbine engines, as well as in flight in situ, has spurred this interest. For the isothermal case, the luminescent intensity I and decay time τ of a pressure-sensitive paint depend on
oxygen pressure as

I0/I = τ0/τ = 1 + K · PO2 = 1 + kq τ0 PO2, (3)
where K = kq · τ0 and the subscript 0 refers to the respective values at zero oxygen pressure (vacuum). In practice, rather than performing the measurements under vacuum, a reference image is taken at atmospheric conditions where pressure and temperature are well established. The common terminology is ''wind-on'' and ''wind-off,'' where the latter refers to reference atmospheric conditions. The equations may then be rearranged to obtain

IREF/I = A(T) + B(T) (P/PREF), (4)

where A(T) and B(T) are functions of temperature.
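Equation (4) suggests a simple per-pixel recipe for reducing wind-off/wind-on image pairs; the sketch below applies it with assumed calibration constants A and B, which in practice come from a calibration of the particular paint and are temperature-dependent (an isothermal case is assumed here).

```python
import numpy as np

def psp_pressure(i_wind_on, i_ref, p_ref, A=0.18, B=0.82):
    """Invert Eq. (4):  I_ref / I = A + B * (P / P_ref)
    =>  P = P_ref * ((I_ref / I) - A) / B
    A and B are illustrative, paint-specific calibration constants."""
    ratio = i_ref / np.clip(i_wind_on, 1e-9, None)
    return p_ref * (ratio - A) / B

# Example: dimmer paint (more oxygen quenching) corresponds to higher pressure
i_ref = np.array([1.00, 1.00, 1.00])        # wind-off image
i_on  = np.array([1.00, 0.80, 0.60])        # wind-on image
print(psp_pressure(i_on, i_ref, p_ref=101.3))   # pressures in kPa
```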
Pressure-sensitive paint installations typically use an imaging system that has a CCD camera connected to a personal computer and some type of frame-digitizing hardware. Temperature also plays a role in the oxygen quenching because the collision rate is temperature-dependent. The precise temperature dependence may be further complicated because there may be additional thermal quenching mechanisms, and the absorption coefficient can be somewhat affected by temperature. Therefore, for unambiguous pressure measurement, the temperature should either be uniform or measured by other means. One way to do this is to incorporate the fluorescent material in an oxygen-impermeable host. The resulting coating is termed a temperature-sensitive paint (TSP) because oxygen quenching is prevented in this case. Another means is to mix a thermographic phosphor with the PSP. There are advantages and disadvantages to these methods. For example, the fluorescent emissions of the PSP and of the thermographic phosphor or TSP should be distinguishable from each other. This is an active area of research, and other approaches are being investigated. A PSP may be illuminated by any of a variety of pulsed or continuous mercury or rare-gas discharge lamps. Light sources that consist of an array of bright blue LEDs are of growing importance for this application. Laser beams, expanded to illuminate the object fully, may also be used. Alternatively, the lasers are scanned, as described earlier in the planar LIF section.

Applications

This is a young but rapidly changing field. Some of the most important initial work using oxygen-quenched materials for aerodynamic pressure measurements was done by Russian researchers in the early 1980s (26). In the United States, the method was pioneered by researchers at the University of Washington and collaborators (28,29). In the early 1990s, PSP became a topic for numerous conference presentations that reported a wide range of low-speed, transonic, and supersonic aerodynamic
applications. Measurement of rotating parts is important and is one requirement that drives the search for short-decay-time luminophors; the other is the desire for fast temporal response of the sensor. Laboratory setups can be compact, simple, and inexpensive. On the other hand, a 16-ft diameter wind tunnel at the Air Force's Arnold Engineering Development Center has numerous cameras that view the surface from a variety of angles (30). In this application, significant progress has been achieved in computational modeling to remove various light-scattering effects that can be significant for PSP work (31). The collective experience from nonimaging applications of PSPs and thermographic phosphors shows that either decay-time or phase measurement usually presents the best option for determining pressure (or temperature), because of immunity from various noise sources and inherently better sensitivity. Therefore, phase-sensitive imaging and time-domain imaging are approaches that are being explored and could prove very useful (32,33). Research into improved PSP materials is proceeding in a variety of directions. Various biluminophor schemes are being investigated for establishing temperature as well as pressure. Phosphors and laser dyes are receiving attention because they exhibit temperature dependence but no pressure dependence. Not only is the fluorescing material important, but the host matrix is as well. The host's permeability to oxygen governs the time response to pressure changes. Thus, there is a search for new host materials to enable faster response. At present, the maximum time response rate is about 100 kHz. This is a fast-moving field, as evidenced by the fact that only two years ago one of the authors was informed that the maximum response rate was about 1 kHz. In aerodynamic applications, the method for coating surfaces is very important, especially for scaled studies of aerodynamic heating and other effects that depend on model surface properties. Work at NASA Langley on scaled models uses both phosphors and pressure-sensitive paints (34,35). Scaling considerations demand a very smooth surface finish.
FUTURE ADVANCES FOR THESE TECHNIQUES

Every advance in spectroscopy increases the number of applications for planar laser-induced fluorescence. One of the main drivers for this is laser technology. The variety of lasers available to the user continues to proliferate. As a given type of laser becomes smaller and less expensive, its utility in PLIF applications expands, sometimes facilitating the movement of a technique out of the lab and into the field. New laser sources enable new types of spectroscopy, which produce new information on various spectral properties that can be exploited by PLIF. Improvements in producing narrower linewidths, shorter pulse lengths, higher repetition rates, better beam quality, and wider frequency range, as the case may be, will aid PLIF. In contrast, phosphor thermometry and pressure-sensitive paint applications usually require a laser only
for situations that require high intensity because of remote distances or the need to use fiber optics to access difficult-to-reach surfaces. Those situations usually do not involve imaging. Improvements to incoherent light sources are likely to have a greater impact on PT and PSP. For example, blue LEDs are sufficiently bright to produce useful fluorescence and are available commercially in arrays for spectroscopic applications. The trend will be toward increased output and shorter wavelengths. However, one area of laser technology that could have a significant impact is the development of inexpensive blue and ultraviolet diode lasers. The field of PSP application is the newest of the technologies discussed here, and it has been growing the fastest. Currently, applications are limited to pressures of a few atmospheres. Because the PSPs used to date are organic materials, the temperatures at which they can operate and survive are limited to about 200 °C. Development of inorganic salts and other materials is underway to increase the temperature and pressure range accessible to the technique. No single PSP material will serve all possible needs. There may eventually be hundreds of PSP materials that will be selected on the basis of (1) chemical compatibility in the intended environment, (2) the need to match excitation and emission spectral characteristics with available light sources, (3) decay-time considerations that are important for moving surfaces, (4) pressure and temperature range, (5) the frequency response required, and (6) the specifics of adhesion requirements in the intended application. The PSP materials of today are, to our knowledge, all based on oxygen quenching. However, materials will likely be developed that are sensitive to other substances as well.

Acknowledgments

Oak Ridge National Laboratory is managed by UT-Battelle, LLC, for the U.S. Dept. of Energy under contract DE-AC05-00OR22725.
BIBLIOGRAPHY

1. W. P. Partridge, Doctoral Dissertation, Purdue University, 1996.
2. D. L. Hartley, M. Lapp, and C. M. Penney, eds., Laser Raman Gas Diagnostics, Plenum Press, NY, 1974.
3. M. Alden and S. Svanberg, Proc. Laser Inst. Am. 47, 134–143 (1984).
4. R. B. Miles, E. Udd, and M. Zimmerman, Appl. Phys. Lett. 32, 317–319 (1978).
5. R. K. Hanson, J. M. Seitzman, and P. H. Paul, Appl. Phys. B 50, 441–454 (1990).
6. P. H. Paul, J. A. Gray, J. L. Durant, and J. W. Thoman, AIAA J. 32, 1,670–1,675 (1994).
7. P. Partridge and N. M. Laurendeau, to be published in Appl. Phys. B.
8. A. M. K. P. Taylor, ed., Instrumentation for Flows with Combustion, Academic Press, London, England, 1993.
9. W. Demtroder, Laser Spectroscopy: Basic Concepts and Instrumentation, Springer-Verlag, NY, 1988.
10. N. M. Laurendeau, Prog. Energy Combust. Sci. 14, 147–170 (1988).
11. A. C. Eckbreth, Laser Diagnostics for Combustion Temperature and Species, Abacus Press, Cambridge, CA, 1988.
12. R. M. Measures, J. Appl. Phys. 39, 5,232–5,245 (1968).
13. Flow Visualization VII: Proc. Seventh Int. Symp. Flow Visualization, J. P. Crowder, ed., 1995.
14. S. W. Allison and G. T. Gillies, Rev. Sci. Instrum. 68(7), 1–36 (1997).
15. K. T. V. Grattan and Z. Y. Zhang, Fiber Optic Fluorescence Thermometry, Chapman & Hall, London, 1995.
16. K. W. Tobin, G. J. Capps, J. D. Muhs, D. B. Smith, and M. R. Cates, Dynamic High-Temperature Phosphor Thermometry, Martin Marietta Energy Systems, Inc., Report No. ORNL/ATD-43, August 1990.
17. B. W. Noel, W. D. Turley, M. R. Cates, and K. W. Tobin, Two-Dimensional Temperature Mapping Using Thermographic Phosphors, Los Alamos National Laboratory Technical Report No. LA-UR-90-1534, May 1990.
18. D. J. Bizzak and M. K. Chyu, Rev. Sci. Instrum. 65, 102 (1994).
19. P. Goss, A. A. Smith, and M. E. Post, Rev. Sci. Instrum. 60(12), 3,702–3,706 (1989).
20. R. P. Marino, B. D. Westring, G. Laufer, R. H. Krauss, and R. B. Whitehurst, AIAA J. 37, 1,097–1,101 (1999).
21. A. C. Edge, G. Laufer, and R. H. Krauss, Appl. Opt. 39(4), 546–553 (2000).
22. B. C. Crites, in Measurement Techniques Lecture Series 1993–05, von Karman Institute for Fluid Dynamics, 1993.
23. B. G. McLachlan and J. H. Bell, Exp. Thermal Fluid Sci. 10(4), 470–485 (1995).
24. T. Liu, B. Campbell, S. Burns, and J. Sullivan, Appl. Mech. Rev. 50(4), 227–246 (1997).
25. J. H. Bell, E. T. Schairer, L. A. Hand, and R. Mehta, to be published in Annu. Rev. Fluid Mech.
26. V. Mosharov, V. Radchenko, and S. Fonov, Luminescent Pressure Sensors in Aerodynamic Experiments, Central Aerohydrodynamic Institute and CW 22 Corporation, Moscow, Russia, 1997.
27. M. M. Ardasheva, L. B. Nevshy, and G. E. Pervushin, J. Appl. Mech. Tech. Phys. 26(4), 469–474 (1985).
28. M. Gouterman, J. Chem. Ed. 74(6), 697–702 (1997).
29. J. Kavandi and J. P. Crowder, AIAA Paper 90-1516, 1990.
30. M. E. Sellers and J. A. Brill, AIAA Paper 94-2481, 1994.
31. W. Ruyten, Rev. Sci. Instrum. 68(9), 3,452–3,457 (1997).
32. C. W. Fisher, M. A. Linne, N. T. Middleton, G. Fiechtner, and J. Gord, AIAA Paper 99-0771.
33. P. Hartmann and W. Ziegler, Anal. Chem. 68, 4,512–4,514 (1996).
34. G. M. Buck, Quantitative Surface Temperature Measurement Using Two-Color Thermographic Phosphors and Video Equipment, US Pat. 4,885,633, December 5, 1989.
35. G. M. Buck, J. Spacecraft Rockets 32(5), 791–794 (1995).

LIDAR

P. S. ARGALL
R. J. SICA
The University of Western Ontario, London, Ontario, Canada

INTRODUCTION

Light detection and ranging (lidar) is a technique in which a beam of light is used to make range-resolved remote measurements. A lidar emits a beam of light that interacts with the medium or object under study. Some of this light is scattered back toward the lidar. The backscattered light captured by the lidar's receiver is used to determine some property or properties of the medium in which the beam propagated or the object that caused the scattering. The lidar technique operates on the same principle as radar; in fact, it is sometimes called laser radar. The principal difference between lidar and radar is the wavelength of the radiation used. Radar uses wavelengths in the radio band, whereas lidar uses light, which is usually generated by lasers in modern lidar systems. The wavelength or wavelengths of the light used by a lidar depend on the type of measurements being made and may be anywhere from the infrared through the visible and into the ultraviolet. The different wavelengths used by radar and lidar lead to the very different forms that the actual instruments take. The major scientific use of lidar is for measuring properties of the earth's atmosphere, and the major commercial use of lidar is in aerial surveying and bathymetry (water depth measurement). Lidar is also used extensively in ocean research (1–5) and has several military applications, including chemical (6–8) and biological (9–12) agent detection. Lidar can also be used to locate, identify, and measure the speed of vehicles (13). Hunters and golfers use lidar-equipped binoculars for range finding (14,15). Atmospheric lidar relies on the interactions (scattering and absorption) of a beam of light with the constituents of the atmosphere. Depending on the design of the lidar, a variety of atmospheric parameters may be measured, including aerosol and cloud properties, temperature, wind velocity, and species concentration. This article covers most aspects of lidar as it relates to atmospheric monitoring. Particular emphasis is placed on lidar system design and on the Rayleigh lidar technique. There are several excellent reviews of atmospheric lidar available, including the following. Lidar for Atmospheric Remote Sensing (16) gives a general introduction to lidar; it derives the lidar equation for various forms of lidar, including Raman and differential absorption lidar (DIAL). This work includes details of a Raman and a DIAL system operated at NASA's Goddard Space Flight Center. Lidar Measurements: Atmospheric Constituents, Clouds, and Ground Reflectance (17) focuses on the differential absorption and DIAL techniques as well as their application to monitoring aerosols, water vapor, and minor species in the troposphere and lower stratosphere. Descriptions of several systems are given, including the results of measurement programs using these systems. Optical and Laser Remote Sensing (18) is a compilation of papers that review a variety of lidar techniques and applications. Lidar Methods and Applications (19) gives an overview of lidar that covers all areas of atmospheric monitoring and research, and emphasizes
the role lidar has played in improving our understanding of the atmosphere. Coherent Doppler Lidar Measurement of Winds (20) is a tutorial and review article on the use of coherent lidar for measuring atmospheric winds. Lidar for Atmospheric and Hydrospheric Studies (21) describes the impact of lidar on atmospheric, and to a lesser extent oceanic, research, particularly emphasizing work carried out during the period 1990 to 1995. This review details both the lidar technology and the environmental research and monitoring undertaken with lidar systems. Laser Remote Sensing (22) is a comprehensive text that covers lidar. This text begins with chapters that review electromagnetic theory, which is then applied to light scattering in the atmosphere. Details, both theoretical and practical, of each of the lidar techniques are given, along with many examples and references to operating systems.
Figure 1. Field-of-view arrangements for the lidar laser beam and detector optics (monostatic coaxial, monostatic biaxial, and bistatic configurations).
HISTORICAL OVERVIEW

Synge in 1930 (23) first proposed the method of determining atmospheric density by detecting scattering from a beam of light projected into the atmosphere. Synge suggested a scheme in which an antiaircraft searchlight could be used as the source of the beam and a large telescope as a receiver. Ranging could be accomplished by operating in a bistatic configuration, where the source and receiver were separated by several kilometers. The receiver's field of view (FOV) could be scanned along the searchlight beam to obtain a height profile of the scattered light's intensity from simple geometric considerations. The light could be detected by using a photoelectric apparatus. To improve the signal level and thus increase the maximum altitude at which measurements could be made, Synge also suggested that a large array of several hundred searchlights could be used to illuminate the same region of the sky. The first reported results obtained using the principles of this method are those of Duclaux (24), who made a photographic recording of the scattered light from a searchlight beam. The photograph was taken at a distance of 2.4 km from the searchlight using an f/1.5 lens and an exposure of 1.5 hours. The beam was visible on the photograph to an altitude of 3.4 km. Hulbert (25) extended these results in 1936 by photographing a beam to an altitude of 28 km. He then made calculations of atmospheric density profiles from the measurements. A monostatic lidar, the typical configuration for modern systems, has the transmitter and receiver at the same location (Fig. 1). Monostatic systems can be subdivided into two categories: coaxial systems, where the laser beam is transmitted coaxially with the receiver's FOV, and biaxial systems, where the transmitter and receiver are located adjacent to each other. Bureau (26) first used a monostatic system in 1938. This system was used for determining cloud base heights. As is typical of a monostatic system, the light source was pulsed, thereby enabling the range at which the scattering occurred to be determined from the round-trip time of the scattered light pulse, as shown in Fig. 2. By refinements of technique and improved instrumentation, including electrical recording of backscattered light
Figure 2. Schematic showing determination of lidar range: the round-trip time is ttotal = tup + tdown = 2z/c, so the range is z = (ttotal · c)/2.
intensity, Elterman (27) calculated density profiles up to 67.6 km. He used a bistatic system in which the transmitter and receiver were 20.5 km apart. From the measured density profiles, Elterman calculated temperature profiles using the Rayleigh technique. Friedland et al. (28) reported the first pulsed monostatic system for measuring atmospheric density in 1956. The major advantage of using a pulsed monostatic lidar is that, for each light pulse fired, a complete altitude-scattering profile can be recorded, although commonly many such profiles are required to obtain measurements that have a useful signal-to-noise ratio. For a bistatic lidar, scattering can be detected only from a small layer in the atmosphere at any one time, and the detector must be moved many times to obtain an altitude profile. The realignment of the detector can be difficult due to the large separations and the strict alignment requirements of the beam and the FOV of the detector system. Monostatic lidar inherently averages the measurements at all altitudes across exactly the same period, whereas a bistatic system takes a snapshot of each layer at a different time. The invention of the laser (29) in 1960 and the giant pulse or Q-switched laser (30) in 1962 provided a powerful new light source for lidar systems. Since the invention of the laser, developments in lidar have been closely linked to advances in laser technology. The first use of a laser in a lidar system was reported in 1962 by Smullin and Fiocco (31), who detected laser light scattered from the lunar surface using a ruby laser that fired 0.5-J pulses at 694 nm. In the following year, these same workers
Figure 3. Block diagram of a generic lidar system. The transmitter (laser and optional beam expander) sends light into the atmosphere; backscattered light is collected by the receiver (light-collecting telescope with optical filtering for wavelength, polarization, and/or range) and passed to the detector (optical-to-electrical transducer and electrical recording system).
reported the detection of atmospheric backscatter using the same laser system (32).

LIDAR BASICS

The first part of this section describes the basic hardware required for a lidar. This can be conveniently divided into three components: the transmitter, the receiver, and the detector. Each of these components is discussed in detail. Figure 3, a block diagram of a generic lidar system, shows how the individual components fit together. In the second part of this section, the lidar equation, which gives the signal strength measured by a lidar in terms of the physical characteristics of the lidar and the atmosphere, is derived.

Transmitter

The purpose of the transmitter is to generate light pulses and direct them into the atmosphere. Figure 4 shows the laser beam of the University of Western Ontario's Purple Crow lidar against the night sky. Due to the special characteristics of the light they produce, pulsed lasers are ideal sources for lidar systems. Three properties of a pulsed laser (low beam divergence, extremely narrow spectral width, and short, intense pulses) provide significant advantages over white light as the source for a lidar. Generally, it is an advantage for the detection system of a lidar to view as small an area of the sky as possible, as this configuration keeps the background low. Background is light detected by the lidar that comes from sources other than the transmitted laser beam, such as scattered or direct sunlight, starlight, moonlight, airglow, and scattered light of anthropogenic origin. The larger the area of the sky that the detector system views, that is, the larger the FOV, the higher the measured background. Therefore, it is usually preferable for a lidar system to view as small an area of the sky as possible. This constraint is especially true if the lidar operates in the daytime (33–35), when scattered sunlight becomes the major source of background. Generally, it is also best if the entire laser beam falls within the FOV of the detector system, as
Figure 4. Laser beam transmitted from the University of Western Ontario’s Purple Crow lidar. The beam is visible from several kilometers away and often attracts curious visitors. See color insert.
this configuration gives maximum system efficiency. The divergence of the laser beam should be sufficiently small, so that it remains within the FOV of the receiver system in all ranges of interest. A simple telescope arrangement can be used to decrease the divergence of a laser beam. This also increases the diameter of the beam. Usually, only a small reduction in the divergence of a laser beam is required in a lidar system, because most lasers have very low divergence. Thus, a small telescope, called a beam expander, is usually
all that is required to obtain a sufficiently well-collimated laser beam for transmission into the atmosphere. The narrow spectral width of the laser has been used to advantage in many different ways in different lidar systems. It allows the detection optics of a lidar to spectrally filter incoming light and thus selectively transmit photons at the laser wavelength. In practice, a narrowband interference filter is used to transmit a relatively large fraction of the scattered laser light (around 50%) while transmitting only a very small fraction of the background white light. This spectral selectivity means that the signal-to-background ratio of the measurement will be many orders of magnitude greater when a narrowband source and an interference filter in the detection system are used in a lidar system. The pulsed properties of a pulsed laser make it an ideal source for a lidar, as this allows ranging to be achieved by timing the scattered signal. A white light source or a continuous-wave (cw) laser can be mechanically or photoelectrically chopped to provide a pulsed beam. However, the required duty cycle of the source is so low that most of the energy is wasted. To achieve ranging, the length of the laser pulses needs to be much shorter than the required range resolution, usually a few tens of meters. Therefore, the temporal length of the pulses needs to be less than about 30 ns. The pulse-repetition frequency (PRF) of the laser needs to be low enough that one pulse has time to reach a sufficient range that it no longer produces any signal before the next pulse is fired. This constraint implies a maximum PRF of about 20 kHz for a lidar working at close range. Commonly, much lower laser PRFs are used because decreasing the PRF reduces the active observing time of the receiver system and, therefore, reduces the background. High-PRF systems do have the distinct advantage that they can be made ''eye-safe'' because the energy transmitted in each pulse is reduced (36). Using the values cited for the pulse length and the PRF gives a maximum duty cycle for the light source of about 0.06%. This means that a chopped white light source or cw laser used in a lidar would have an effective power of less than 0.06% of its actual power. However, for some applications, it is beneficial to use cw lasers and modulation-code techniques for range determination (37,38). The type of laser used in a lidar system depends on the physical quantity that the lidar has been designed to measure. Some measurements require a very specific wavelength (i.e., resonance fluorescence) or wavelengths (i.e., DIAL) and can require complex laser systems to produce these wavelengths, whereas other lidars can operate across a wide wavelength range (i.e., Rayleigh, Raman, and aerosol lidars). The power and pulse-repetition frequency of a laser must also match the requirements of the measurements. There is often a compromise among these quantities, in addition to cost, in choosing from the types of lasers available.
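The pulse-length, PRF, and duty-cycle figures quoted above follow from simple timing arguments; the sketch below reproduces them. The 7.5-km maximum range is an assumed figure consistent with the 20-kHz close-range estimate in the text.

```python
c = 3.0e8  # speed of light, m/s

def max_prf(max_range_m):
    """A pulse must clear the maximum range and return before the next pulse fires."""
    round_trip_time = 2.0 * max_range_m / c
    return 1.0 / round_trip_time

def duty_cycle(pulse_length_s, prf_hz):
    """Fraction of time the source is actually emitting."""
    return pulse_length_s * prf_hz

# ~7.5-km unambiguous range -> ~20 kHz; 30-ns pulses at 20 kHz -> ~0.06% duty cycle
print(f"maximum PRF : {max_prf(7.5e3) / 1e3:.0f} kHz")
print(f"duty cycle  : {duty_cycle(30e-9, 20e3) * 100:.3f} %")
```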
Receiver

The receiver system of a lidar collects and processes the scattered laser light and then directs it onto a photodetector, a device that converts the light to an electrical signal. The primary optic is the optical element that collects the light scattered back from the atmosphere and focuses it to a smaller spot. The size of the primary optic is an important factor in determining the effectiveness of a lidar system. A larger primary optic collects a larger fraction of the scattered light and thus increases the signal measured by the lidar. The size of the primary optic used in a lidar system may vary from about 10 cm up to a few meters in diameter. Smaller aperture optics are used in lidar systems that are designed to work at close range, for example, a few hundred meters. Larger aperture primary optics are used in lidar systems that are designed to probe the middle and upper regions of the Earth's atmosphere, where the returned signal is a much smaller fraction of the transmitted signal (39,40). Smaller primary optics may be lenses or mirrors; the larger optics are typically mirrors. Traditional parabolic glass telescope primary mirrors more than about half a meter in diameter are quite expensive, and so some alternatives have been successfully used with lidar systems. These alternatives include liquid-mirror telescopes (LMTs) (36,41) (Fig. 5), holographic elements (42,43), and multiple smaller mirrors (44–46). After collection by the primary optic, the light is usually processed in some way before it is directed to the detector system. This processing can be based on wavelength, polarization, and/or range, depending on the purpose for which the lidar has been designed. The simplest form of spectral filtering uses a narrowband interference filter that is tuned to the laser wavelength. This significantly reduces the background, as described in the previous section, and blocks extraneous signals. A narrowband interference filter that is typically around 1 nm wide provides sufficient rejection of background light for a lidar to operate at nighttime. For daytime use, a much narrower filter is usually employed (47–49). Complex spectral filtering schemes
Figure 5. Photograph of the 2.65-m diameter liquid mercury mirror used at the University of Western Ontario's Purple Crow lidar. See color insert.
are often used for Doppler and high-spectral-resolution lidar (50–54). Signal separation based on polarization is a technique that is often used in studying atmospheric aerosols, including clouds, with lidar systems (55–58). Light from a polarized laser beam backscattered by aerosols will generally undergo a degree of depolarization; that is, the backscattered light will not be plane polarized. The degree of depolarization depends on a number of factors, including the anisotropy of the scattering aerosols. Depolarization of backscattered light also results from multiple scattering of photons. Processing of the backscattered light based on range is usually performed in order to protect the detector from the intense near-field returns of higher power lidar systems. Exposing a photomultiplier tube (PMT) to a bright source such as a near-field return, even for a very short time, produces signal-induced noise (SIN) that affects the ability of the detection system to record any subsequent signal accurately (59,60). This protection is usually achieved either by a mechanical or electro-optical chopper that closes the optical path to the detector during and immediately after the laser fires, or by switching the detector off during this time, called gating. A mechanical chopper used for protecting the detector is usually a metal disk that has teeth on its periphery and is rotated at high speed. The laser and chopper are synchronized so that light backscattered from the near field is blocked by the chopper teeth but light scattered from longer ranges is transmitted through the spaces between the teeth. The opening time of the chopper depends on both the diameter of the optical beam that is being chopped and the speed at which the teeth move. Generally, opening times of around 20–50 µs, corresponding to a lidar range of between a few and several kilometers, are required. Opening times of this order can be achieved by using a beam diameter of a few millimeters and a 10-cm diameter chopper rotating at several thousand revolutions per minute (61).

Signal Detection and Recording

The signal detection and recording section of a lidar takes the light from the receiver system and produces a permanent record of the measured intensity as a function of altitude. The signal detection and recording system in the first lidar experiments was a camera and photographic film (24,25). Today, the detection and recording of light intensity is done electronically. The detector converts the light into an electrical signal, and the recorder is an electronic device or devices that process and record this electrical signal. Photomultiplier tubes (PMTs) are generally used as detectors for incoherent lidar systems that use visible and UV light. PMTs convert an incident photon into an electrical current pulse (62) large enough to be detected by sensitive electronics. Other possibilities for detectors (63) for lidar systems include multianode PMTs (64), MCPs (65), avalanche photodiodes (66,67), and CCDs (68,69). Coherent detection is covered in a later section.
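The chopper opening time discussed above can be estimated as the beam diameter divided by the tooth speed at the beam position, and converted to the lidar range that remains blocked while the chopper opens. The sketch below does this for an assumed 2-mm beam near the rim of a 10-cm diameter chopper spinning at 9,000 rpm; these numbers are illustrative, not a description of a specific instrument.

```python
import math

def chopper_opening_time(beam_diameter_m, chopper_radius_m, rpm):
    """Time for the tooth edge to sweep across the beam: d / (tooth speed)."""
    tooth_speed = 2.0 * math.pi * chopper_radius_m * rpm / 60.0   # m/s at the rim
    return beam_diameter_m / tooth_speed

def blocked_range(opening_time_s, c=3.0e8):
    """Range below which backscatter is still blocked while the chopper opens."""
    return c * opening_time_s / 2.0

# Assumed values: 2-mm beam at the rim of a 10-cm diameter chopper at 9,000 rpm
t_open = chopper_opening_time(2e-3, 0.05, 9000)
print(f"opening time : {t_open * 1e6:.0f} us")           # a few tens of microseconds
print(f"blocked range: {blocked_range(t_open) / 1e3:.1f} km")
```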
The output of a PMT has the form of current pulses that are produced both by photons entering the PMT and by the thermal emission of electrons inside the PMT. The output due to these thermal emissions is called dark current. The output of a PMT can be recorded electronically in two ways. In the first technique, photon counting, the pulses are individually counted; in the second technique, analog detection, the average current due to the pulses is measured and recorded. The most appropriate method for recording PMT output depends on the rate at which the PMT produces output pulses, which is proportional to the intensity of the light incident on the PMT. If the average rate at which the PMT produces output pulses is much less than the reciprocal of the average pulse width, then individual pulses can be easily identified, and photon counting is the more appropriate recording method.
Photon Counting

Photon counting is a two-step process. First, the output of the PMT is filtered using a discriminator to remove a substantial number of the dark counts. This is possible because the average amplitude of PMT pulses produced by incident photons is higher than the average amplitude of the pulses produced by dark counts. A discriminator is essentially a high-speed comparator whose output changes state when the signal from the PMT exceeds a preset level, called the discriminator level. By setting the discriminator level somewhere between the average amplitude of the signal count and dark count levels, the discriminator can effectively filter out most of the dark counts. Details of operating a photomultiplier in this manner can be found in texts on optoelectronics (62,70,71). The second step in photon counting involves using a multichannel counter, often called a multichannel scaler (MCS). An MCS has numerous memory locations that are accessed sequentially, each for a fixed time, after the MCS receives a signal indicating that a laser pulse has been fired into the atmosphere. If the output from the discriminator indicates that a count should be registered, then the MCS adds one to the number in the currently active memory location. In this way, the MCS can count scattered laser photons as a function of range. An MCS is generally configured to add together the signals detected from a number of laser pulses. The total signal recorded by the MCS across the averaging period of interest is then stored on a computer. All of the MCS memory locations are then reset to zero, and the counting process is restarted. If a PMT produces two pulses that are separated by less than the width of a pulse, they are not resolved, and the output of the discriminator indicates that only one pulse was detected. This effect is called pulse pileup. As the intensity of the measured light increases, the average count rate increases, pulse pileup becomes more likely, and more counts are missed. The loss of counts due to pulse pileup can be corrected (39,72), as long as the count rate does not become excessive. In extreme cases, many pulses pile up, and the output of the PMT remains above the discriminator level, so that no pulses are counted.
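The pileup corrections cited above (39,72) can be implemented in several ways; one common form is the nonparalyzable dead-time correction sketched below, where the pulse-pair resolution is treated as an assumed instrument constant (10 ns here, chosen for illustration).

```python
def correct_pileup(observed_rate_hz, dead_time_s=10e-9):
    """Nonparalyzable dead-time correction: N_true = N_obs / (1 - N_obs * tau).
    'dead_time_s' is the effective pulse-pair resolution of the PMT,
    discriminator, and MCS chain (assumed 10 ns here). The correction is only
    reliable while the fractional loss N_obs * tau is well below 1."""
    loss = observed_rate_hz * dead_time_s
    if loss >= 0.9:
        raise ValueError("count rate too high for a reliable correction")
    return observed_rate_hz / (1.0 - loss)

# A 20-MHz observed rate with a 10-ns dead time corrects to ~25 MHz
print(f"{correct_pileup(20e6) / 1e6:.1f} MHz")
```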
Analog Detection

Analog detection is appropriate when the average count rate approaches the pulse-pair resolution of the detector system, usually of the order of 10 to 100 MHz, depending on the PMT type, the speed of the discriminator, and the MCS. Analog detection uses a fast analog-to-digital converter to convert the average current from the PMT into digital form suitable for recording and manipulation on a computer (73). Previously, we described a method for protecting a PMT from intense near-field returns using a rotating chopper. An alternative method for protecting a PMT is called blanking or gating (74–76). During gating, the PMT is effectively turned off by changing the distribution of voltages on the PMT's dynode chain. PMT gating is simpler to implement and more reliable than a mechanical chopper system because it has no moving parts. However, it can produce unpredictable results because gating can cause gain variations and a variable background that complicate the interpretation of the lidar returns.

Coherent Detection

Coherent detection is used in a class of lidar systems designed for velocity measurement. This detection technique mixes the backscattered laser light with light from a local oscillator on a photomixer (77). The output of the photomixer is a radio-frequency (RF) signal whose frequency is the difference between the frequencies of the two optical signals. Standard RF techniques are then used to measure and record this signal. The frequency of the measured RF signal can be used to determine the Doppler shift of the scattered laser light, which in turn allows calculation of the wind velocity (78–82). Coherent lidar systems have special requirements for laser pulse length and frequency stability. The advantage of coherent detection for wind measurements is that the instrumentation is generally simpler and more robust than that required for incoherent optical interferometric detection of Doppler shifts (20).

An Example of a Lidar Detection System

Many lidar systems detect light at multiple wavelengths and/or at different polarization angles. The Purple Crow lidar (39,83) at the University of Western Ontario detects scattering at four wavelengths (Fig. 6). An Nd:YAG laser operating at the second harmonic (532 nm) provides the light source for the Rayleigh (532 nm) and the two Raman-shifted channels, N2 (607 nm) and H2O (660 nm). The fourth channel is a sodium resonance-fluorescence channel that operates at 589 nm. Dichroic mirrors are used to separate the light collected by the parabolic mirror into these four channels before the returns are filtered by narrowband interference filters and imaged onto the PMTs. A rotating chopper is incorporated into the two high-signal-level channels, Rayleigh and sodium, to protect the PMTs from intense near-field returns. The chopper operates at a high speed, 8,400 rpm, and comprises a rotating disk that has two teeth on the outside edge. This chopper blocks all scatter from below 20 km
Figure 6. Schematic of the detection system of the Purple Crow lidar at the University of Western Ontario. Dichroic mirrors and narrowband interference filters separate the light from the telescope focus into Rayleigh (532 nm), sodium (589 nm), nitrogen (607 nm), and water vapor (660 nm) PMT channels; a chopper protects the two high-signal-level channels.
and is fully open by 30 km. The signal levels in the two Raman channels are sufficiently small that the PMTs do not require protection from near-field returns. The two Raman channels are used for detecting H2O and N2 in the troposphere and stratosphere and thus allow measurement of water vapor concentration and temperature profiles. Measurements from the Rayleigh and sodium channels are combined to provide temperature profiles from 30 to 110 km.

THE LIDAR EQUATION

The lidar equation is used to determine the signal level detected by a particular lidar system. The basic lidar equation takes into account all forms of scattering and can be used to calculate the signal strength for all types of lidar, except those that employ coherent detection. In this section, we derive a simplified form of the lidar equation that is appropriate for monostatic lidar without any high-spectral-resolution components. This equation is applicable to simple Rayleigh, vibrational Raman, and DIAL systems. It is not appropriate for Doppler or pure rotational Raman lidar, because it does not include the required spectral dependencies. Let us define P as the total number of photons emitted by the laser in a single laser pulse at the laser wavelength λl and τt as the transmission coefficient of the lidar transmitter optics. Then the total number of photons transmitted into the atmosphere by a lidar system in a single laser pulse is given by

P τt(λl). (1)

The number of photons available to be scattered in the range interval r to r + dr from the lidar is

P τt(λl) τa(r, λl) dr, (2)

where τa(r, λl) is the optical transmission of the atmosphere at the laser wavelength, along the laser path to the range r. Note that range and altitude are equivalent only for a vertically pointing lidar. The number of photons backscattered, per unit solid angle due to scattering of type i, from the range interval R1 to R2, is

P τt(λl) ∫_{R1}^{R2} τa(r, λl) σπ^i(λl) N^i(r) dr, (3)

where σπ^i(λl) is the backscatter cross section for scattering of type i at the laser wavelength and N^i(r) is the number density of scattering centers that cause scattering of type i at range r. Range resolution is most simply and accurately achieved if the length of the laser pulse is much shorter than the length of the range bins. If this condition cannot be met, the signal can be deconvolved to obtain the required range resolution (84,85). The effectiveness of this deconvolution depends on a number of factors, including the ratio of the laser pulse length to the length of the range bins, the rate at which the signal changes over the range bins, and the signal-to-noise ratio of the measurements. The number of photons incident on the collecting optic of the lidar due to scattering of type i is

P τt(λl) A ∫_{R1}^{R2} (1/r²) τa(r, λl) τa(r, λs) ζ(r) σπ^i(λl) N^i(r) dr, (4)

where A is the area of the collecting optic, λs is the wavelength of the scattered light, and ζ(r) is the overlap factor that takes into account the intensity distribution across the laser beam and the physical overlap of the transmitted laser beam on the FOV of the receiver optics. The term 1/r² arises in Eq. (4) due to the decreasing illuminance of the telescope by the scattered light as the range increases. For photon counting, the number of photons detected as pulses at the photomultiplier output per laser pulse is

P τt(λl) A τr(λs) Q(λs) ∫_{R1}^{R2} (1/r²) τa(r, λl) τa(r, λs) ζ(r) σπ^i(λl) N^i(r) dr, (5)

where τr(λs) is the transmission coefficient of the reception optics at λs and Q(λs) is the quantum efficiency of the photomultiplier at wavelength λs. For analog detection, the current recorded can be determined by replacing the quantum efficiency of the photomultiplier Q(λs) by the gain G(λs) of the photomultiplier combined with the gain of any amplifiers used. In many cases, approximations allow simplification of Eq. (5). For example, if none of the range-dependent terms, τa(r, λl), τa(r, λs), N^i(r), and ζ(r), varies significantly throughout individual range bins, then the range integral may be removed, and Eq. (5) becomes

P τt(λl) A τr(λs) Q(λs) τa(R, λl) τa(R, λs) (1/R²) ζ(R) σπ^i(λl) N^i(R) δR, (6)

where R is the range of the center of the scattering volume and δR = R2 − R1 is the length of the range bin. This form of the lidar equation can be used to calculate the signal strength for Rayleigh, vibrational Raman lidar, and DIAL as long as the system does not incorporate any filter whose spectral width is of the same order or smaller than the width of the laser output or the Doppler broadening function. For high-resolution spectral lidar, where a narrow-spectral-width filter or tunable laser is used, the variations in the individual terms of Eq. (6) with wavelength need to be considered. To calculate the measurement precision of a lidar that measures the Doppler shift and broadening of the laser line for wind and temperature determination, computer simulation of the instrument may be necessary.
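To make the simplified form in Eq. (6) concrete, the sketch below evaluates the expected photon count from a single range bin. Every numerical value is an illustrative assumption chosen only to show the bookkeeping, not a description of any particular instrument.

```python
def lidar_signal(P, tau_t, A, tau_r, Q, tau_up, tau_down, zeta,
                 sigma_back, N_i, R, dR):
    """Simplified lidar equation, Eq. (6): photons detected per laser pulse from
    a bin of length dR centered at range R. sigma_back is the backscatter cross
    section (m^2 sr^-1) and N_i the number density of scatterers (m^-3)."""
    return (P * tau_t * A * tau_r * Q * tau_up * tau_down
            * zeta * sigma_back * N_i * dR / R ** 2)

# Assumed, illustrative values: 0.5 J of 532-nm photons per pulse, a 1-m^2
# telescope, a 100-m bin at 30-km range, and Rayleigh scattering from air
# at a number density of roughly 3.8e23 m^-3.
photon_energy = 6.626e-34 * 3.0e8 / 532e-9           # J per photon
counts = lidar_signal(P=0.5 / photon_energy, tau_t=0.9, A=1.0, tau_r=0.5,
                      Q=0.4, tau_up=0.99, tau_down=0.99, zeta=1.0,
                      sigma_back=5.9e-32, N_i=3.8e23, R=30e3, dR=100.0)
print(f"~{counts:.0f} photon counts per pulse from the 30-km bin")
```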
LIGHT SCATTERING IN THE ATMOSPHERE AND ITS APPLICATION TO LIDAR

The effects of light scattering in the Earth's atmosphere, such as blue skies, red sunsets, and black, grey, and white clouds, are easily observed and reasonably well understood (86–89). Light propagating through the atmosphere is scattered and absorbed by the molecules and aerosols, including clouds, that form the atmosphere. Molecular scattering takes place via a number of different processes and may be either elastic, where there is no exchange of energy with the molecule, or inelastic, where an exchange of energy occurs with the molecule. It is possible to calculate, to at least a reasonable degree of accuracy, the parameters that describe these molecular scattering processes. The theory of light scattering and absorption by spherical aerosols, usually called Mie (90) theory, is well understood, though the application of Mie theory to lidar can be difficult in practice. This difficulty arises due to computational limits encountered when trying to solve atmospheric scattering problems in which the variations in size, shape, and refractive index of the aerosol particles can be enormous (91–97). However, because aerosol lidars can measure average properties of aerosols directly, they play an important role in advancing our understanding of the effect of aerosols on visibility (98–101) as well as on climate (102,103). Molecules scatter light by a variety of processes; there is, however, an even greater variety of terms used to describe these processes. In addition, researchers in different fields have applied the same terms to different processes. Perhaps the most confused term is Rayleigh scattering, which has been used to identify at least three different spectral regions of light scattered by molecules (104–106).
RAYLEIGH SCATTER AND LIDAR

Rayleigh theory describes the scattering of light by particles that are small compared to the wavelength of the incident radiation. This theory was developed by Lord Rayleigh (107,108) to explain the color, intensity distribution, and polarization of the sky in terms of scattering by atmospheric molecules. In his original work on light scattering, Rayleigh used simple dimensional arguments to arrive at his well-known equation. In later years, Rayleigh (109,110) and others (22,87,111,112) replaced these dimensional arguments with a more rigorous mathematical derivation of the theory. Considering a dielectric sphere of radius r in a parallel beam of linearly polarized electromagnetic radiation, one can derive the scattering equation. The incident radiation causes the sphere to become an oscillating dipole that generates its own electromagnetic field, that is, the scattered radiation. For this derivation to be valid, it is necessary for the incident field to be almost uniform across the volume of the scattering center. This assumption leads to the restriction of Rayleigh theory to scattering by particles that are small compared to the wavelength of the incident radiation. It can be shown (113) that when r < 0.03λ, the differences between results obtained with Rayleigh theory and the more general Mie (90) theory are less than 1%. Rayleigh theory gives the following equation for the scattered intensity from a linearly polarized beam by a single molecule:

Im(φ) = E0² (9π²ε0c/(2N²λ⁴)) ((n² − 1)/(n² + 2))² sin²φ, (7)
where r is the radius of the sphere, n is the index of refraction of the sphere relative to that of the medium, that is, n = nmolecule/nmedium, N is the number density of the scattering centers, φ is the angle between the dipole axis and the scattering direction, and E0 is the maximum value of the electric field strength of the incident wave (22,87). From Eq. (7), we see that the intensity of the scattered light varies as λ−4. However, because the refractive index may also have a small wavelength dependence, the scattered intensity is in fact not exactly proportional to λ−4; Middleton (114) gives a value of λ−4.08 for wavelengths in the visible. A useful quantity in discussion is the differential-scattering cross section (22), which is also called the angular scattering cross section (87). The differential-scattering cross section is the fraction of the power of the incident radiation that is scattered, per unit solid angle, in the direction of interest. The differential-scattering cross section is defined by

(dσ(φ)/dΩ) I0 = I(φ), (8)
where I0 = (1/2)cε0E0² is the irradiance of the incident beam. By substituting Eq. (7) into Eq. (8), it can be seen that the differential-scattering cross section for an individual molecule illuminated by plane polarized light is

dσm(φ)/dΩ = (9π²/(N²λ⁴)) ((n² − 1)/(n² + 2))² sin²φ. (9)
If we assume that n ≈ 1, then Eq. (9) can be approximated as

dσm(φ)/dΩ = (π²(n² − 1)²/(N²λ⁴)) sin²φ. (10)

For a gas, the term (n² − 1) is approximately proportional to the number density N (115), so Eq. (10) has only a very slight dependence on N. For air, the ratio (n² − 1)/N varies by less than 0.05% over the range of N between 0 and 65 km in altitude. When Rayleigh theory is extended to include unpolarized light, the angle φ no longer has any meaning because the dipole axis may lie along any line in the plane perpendicular to the direction of propagation. The only directions that can be uniquely defined are the direction of propagation of the incident beam and the direction in which the scattered radiation is detected; we define θ as the angle between these two directions. The differential-scattering cross section for an individual molecule that is illuminated by a parallel beam of unpolarized light is

dσm(θ)/dΩ = (π²(n² − 1)²/(2N²λ⁴)) (1 + cos²θ). (11)
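Equation (11) gives the familiar 1 + cos²θ angular dependence; the short sketch below evaluates the relative scattered intensity at a few angles (normalized to θ = 90°), which is the pattern plotted in Fig. 7. The angles chosen are illustrative.

```python
import math

def rayleigh_phase(theta_deg):
    """Relative angular dependence of Eq. (11): proportional to 1 + cos^2(theta)."""
    return 1.0 + math.cos(math.radians(theta_deg)) ** 2

for theta in (0, 45, 90, 135, 180):
    rel = rayleigh_phase(theta) / rayleigh_phase(90)
    print(f"theta = {theta:3d} deg: {rel:.2f} x the 90-degree value")
```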
Figure 7 shows the intensity distribution for Rayleigh-scattered light from an unpolarized beam. The distribution has peaks in the forward and backward directions, and the light scattered at right angles to the incident beam is plane polarized. Because of the anisotropy of molecules, which moves the molecule's dipole moment slightly out of alignment with the incident field, scattering by molecules causes some depolarization of the scattered light. This results in some light whose polarization is parallel to that of the incident beam being detected at a scattering angle of 90°.
Figure 7. Intensity distribution pattern for Rayleigh scatter from an unpolarized beam traveling in the x direction. The perpendicular component refers to scattering of radiation whose electric vector is perpendicular to the plane formed by the direction of propagation of the incident beam and the direction of observation.
The depolarization ratio δnt is defined as

δnt = I∥/I⊥, (12)
where the parallel and perpendicular directions are taken with respect to the direction of the incident beam. The subscript n denotes natural (unpolarized) incident light, and the superscript t denotes total molecular scattering. The depolarization is sometimes defined in terms of polarized incident light and/or for different spectral components of molecular scattering. There is much confusion about which is the correct depolarization to use under different circumstances, a fact evident in the literature. The reader should take great care to understand the terminology used by each author. Young (104) gives a brief survey of depolarization measurements for dry air and concludes that the effective value of δnt is 0.0279. He also gives a correction factor for the Rayleigh differential-scattering cross section, which, when applied to Eq. (11), gives

dσm(θ)/dΩ = (π²(n² − 1)²/(2N²λ⁴)) [1 + δnt + (1 − δnt) cos²θ] / [1 − (7/6)δnt]. (13)
Most lidar applications work with direct backscatter, i.e., θ = π, and the differential-backscatter cross section per molecule for scattering from an unpolarized beam is

dσm(θ = π)/dΩ = (π²(n² − 1)²/(2N²λ⁴)) (12/(6 − 7δnt)). (14)
The correction factor for backscatter is independent of the polarization state of the incident beam (111). This means that the correction factor, and thus the backscatter cross section per molecule, is independent of the polarization characteristics of the laser used in a backscatter lidar. The Rayleigh molecular-backscatter cross section for altitudes less than 90 km and without the correction factor is given by Kent and Wright (116) as 4.60 × 10−57/λ⁴ m² sr−1. When the correction factor is applied, with δnt = 0.0279, this result becomes

dσm(θ = π)/dΩ = 4.75 × 10−57/λ⁴ m² sr−1. (15)
Collis et al. (117) give a value of the constant in Eq. (15) of 4.99 × 10−57 m⁶ sr−1. Fiocco (118) writes Eq. (15) in the form

dσm(θ = π)/dΩ = 4.73 × 10−57/λ^4.09 m² sr−1. (16)
Here, the wavelength exponent takes into account dispersion in air. Equations (15) and (16) are applicable to the atmosphere at altitudes less than 90 km. Above this altitude, the concentration of atomic oxygen becomes significant and changes the composition and thus the refractive index. Equations (15) and (16), used in conjunction with the lidar equation [Eq. (6)], can be used to determine the backscatter intensity of a particular Rayleigh lidar.
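Equation (15) is straightforward to evaluate numerically; the sketch below tabulates the corrected cross section at a few common laser wavelengths (chosen for illustration) and compares it with the uncorrected Kent and Wright constant. Equation (16), with its slightly different wavelength exponent, is not included to avoid mixing the two normalizations.

```python
def sigma_back(wavelength_m, constant=4.75e-57):
    """Eq. (15): Rayleigh backscatter cross section per molecule, m^2 sr^-1,
    with lambda in meters. Pass constant=4.60e-57 for the uncorrected value."""
    return constant / wavelength_m ** 4

for lam in (355e-9, 532e-9, 1064e-9):    # common Nd:YAG harmonics, illustrative
    corrected = sigma_back(lam)
    uncorrected = sigma_back(lam, constant=4.60e-57)
    print(f"{lam * 1e9:6.0f} nm: {corrected:.2e} m^2/sr "
          f"({corrected / uncorrected - 1:+.1%} vs. uncorrected)")
```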
Rayleigh Lidar

Rayleigh lidar is the name given to the class of lidar systems that measure the intensity of the Rayleigh backscatter from an altitude of about 30 km up to around 100 km. The measured backscatter intensity can be used to determine a relative density profile; this profile is used to determine an absolute temperature profile. Rayleigh scattering is by far the dominant scattering mechanism for light above an altitude of about 30 km, except in the rare case where noctilucent clouds exist. At altitudes below about 25–30 km, light is elastically scattered by aerosols in addition to molecules. Only by using high-spectral-resolution techniques can the scattering from these two sources be separated (119). Thus, most Rayleigh lidar systems cannot be used to determine temperatures below the top of the stratospheric aerosol layer. The maximum altitude of the stratospheric aerosol layer varies with the season and is particularly perturbed after major volcanic activity. Above about 90 km, changes in composition, due mainly to the increasing concentration of atomic oxygen, cause the Rayleigh backscatter cross section and the mean molecular mass of air to change with altitude. This leads to errors in the temperatures derived by using the Rayleigh technique that range from a fraction of a degree at 90 km to a few degrees at 110 km. For current Rayleigh systems, the magnitude of this error is significantly smaller than the uncertainties from other sources, such as the photocount statistics, in this altitude range. Low photocount rates give rise to large statistical uncertainties in the derived temperatures at the very top of Rayleigh lidar temperature profiles (Fig. 8a). Additional uncertainties in the temperature retrieval algorithm, due to the estimate of the pressure at the top of the density profile that is required to initiate the temperature integration (120), can be significant and are difficult to quantify. The operating principle of a Rayleigh lidar system is simple. A pulse of laser light is fired up into the atmosphere, and any photons that are backscattered and collected by the receiving system are counted as a function of range. The lidar equation [Eq. (6)] can be directly applied to a Rayleigh lidar system to calculate the expected signal strength. This equation can be expressed in the form

Signal strength = K (1/R²) Na δR,
(17)
where K is the product of all of the terms that can be considered constants between 30 and 100 km in Eq. (6) and Na is the number density of air. This result assumes that there is insignificant attenuation of the laser beam as it propagates from 30 to 100 km, that is, the atmospheric transmission τa(r, λl) is a constant for 30 < r < 100 km. If there are no aerosols in this region of the atmosphere and the laser wavelength is far from the absorption lines of any molecules, then the only attenuation of the laser beam is due to Rayleigh scatter and possibly ozone absorption. Using Rayleigh theory, it can be shown that the transmission of the atmosphere from 30 to 100 km is greater than 99.99% in the visible region of the spectrum.

Equation (17) shows that after a correction for range R, the measured Rayleigh lidar signal between 30 and 100 km is proportional to the atmospheric density. K cannot be determined due to the uncertainties in atmospheric transmission and instrumental parameters [see Eq. (6)]. Hence, Rayleigh lidar can typically determine only relative density profiles. A measured relative density profile can be scaled to a coincident radiosonde measurement or model density profile, either at a single altitude or across an extended altitude range. This relative density profile can be used to determine an absolute temperature profile by assuming that the atmosphere is in hydrostatic equilibrium and applying the ideal gas law. Details of the calculation and an error analysis for this technique can be found in both Chanin and Hauchecorne (120) and Shibata (121). The assumption of hydrostatic equilibrium, the balance of the upward force of pressure and the downward force of gravity, can be violated at times in the middle atmosphere due to instability generated by atmospheric waves, particularly gravity waves (122,123). However, sufficient averaging in space (e.g., 1 to 3 km) and in time (e.g., hours) minimizes such effects.

Calculating an absolute temperature profile begins by calculating a pressure profile. The first step in this process is to determine the pressure at the highest altitude range-bin of the measured relative density profile. Typically, this pressure is obtained from a model atmosphere. Then, using the density in the top range-bin, the pressure at the bottom of this bin is determined using hydrostatic equilibrium. This integration is repeated for the second-to-top density range-bin and so on, down to the bottom of the density profile. Because atmospheric density increases as altitude decreases, the choice of pressure at the top range-bin becomes less significant in the calculated pressures as the integration proceeds. A pressure profile calculated in this way is a relative profile because the density profile from which it was determined is a relative profile. However, the ratio of the relative densities to the actual atmospheric densities will be exactly the same as the ratio of the relative pressures to the actual atmospheric pressures:

$$N_{\mathrm{rel}} = K\,N_{\mathrm{act}} \quad\text{and}\quad P_{\mathrm{rel}} = K\,P_{\mathrm{act}}, \qquad (18)$$

where Nrel is the relative density and Nact is the actual atmospheric density, similarly for the pressure P, and K is the unknown proportionality constant. The ideal gas law can then be applied to the relative density and pressure profiles to yield a temperature profile. Because the relative density and relative pressure profiles have the same proportionality constant [see Eq. (18)], the constants cancel, and the calculated temperature is absolute. The top of the temperature profile calculated in this scheme is influenced by the choice of initial pressure. Figure 8 shows the temperature error as a function of altitude for a range of pressures used to initiate the pressure integration algorithm. Users of this technique are well advised to ignore temperatures from at least the uppermost 8 km of the retrieval because the uncertainties introduced by the seed pressure estimate are not easily quantified, unless an independent determination of the temperature is available.

Figure 8. The propagation of the error in the calculated temperature caused by a (a) 2%, (b) 5%, and (c) 10% error in the initial estimate of the pressure.
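The pressure integration and ideal-gas-law steps just described can be put in code form. The sketch below is only an illustration of the scheme, not the retrieval of any particular instrument; the seed temperature, gravity, mean molecular mass, and the synthetic relative density profile are all assumed values.

```python
import numpy as np

# Sketch of the Rayleigh temperature retrieval [Eqs. (17) and (18)]:
# downward hydrostatic integration of a relative density profile, then the
# ideal gas law. All numerical values here are assumed, for illustration only.

K_B = 1.380649e-23       # Boltzmann constant (J/K)
M_AIR = 4.81e-26         # mean molecular mass of air (kg), assumed constant
G = 9.5                  # gravitational acceleration (m/s^2), assumed constant

z = np.arange(30e3, 100e3 + 1.0, 1e3)     # altitude grid, 30-100 km (m)
n_rel = np.exp(-z / 7.0e3)                # synthetic relative number density

# Seed the pressure at the top range-bin (in practice, from a model atmosphere);
# expressing it through an assumed seed temperature keeps the relative scale.
T_SEED = 200.0                            # K, assumed
p_rel = np.empty_like(n_rel)
p_rel[-1] = n_rel[-1] * K_B * T_SEED

# Hydrostatic equilibrium, integrated downward: dP = n * M * g * dz.
for i in range(len(z) - 2, -1, -1):
    dz = z[i + 1] - z[i]
    p_rel[i] = p_rel[i + 1] + 0.5 * (n_rel[i] + n_rel[i + 1]) * M_AIR * G * dz

# Ideal gas law; the unknown proportionality constant K of Eq. (18) cancels.
temperature = p_rel / (n_rel * K_B)
print(np.round(temperature[::10], 1))     # discard roughly the top 8 km in practice
```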
Figure 9. The top panel shows the average temperature (middle of the three solid lines) for the night of 13 August 2000 as measured by the PCL; the two outer solid lines represent the uncertainty in the temperature. Measurements are summed across 288 m in altitude and 8 hours in time. The temperature integration algorithm was initiated at 107.9 km; the top 10 km of the profile has been removed. The dashed line is the temperature from the Fleming model (289) for the appropriate location and date. The bottom panel shows (a) the rms deviation from the mean temperature profile for temperatures calculated every 15 minutes at the same vertical resolution and (b) the average statistical uncertainty in the individual temperature profiles used in the calculation of the rms, based on the photon counting statistics.

The power–aperture product is the typical measure of a lidar system's effectiveness. The power–aperture product is the mean laser power (watts) multiplied by the collecting area of the receiver system (m²). It is, however, a crude metric because it ignores both the variation of the Rayleigh-scatter cross section and atmospheric transmission with transmitter frequency and the efficiency of the system. The choice of a laser for use in Rayleigh lidar depends on a number of factors, including cost and ease of use. The best wavelengths for a Rayleigh lidar are in the blue–green region of the spectrum. At longer wavelengths, for example, in the infrared, the scattering cross section is smaller, and thus the return signal is reduced. At shorter wavelengths, for example, in the ultraviolet, the scattering cross section is higher, but the atmospheric transmission is lower, leading to an overall reduction in signal strength. Most dedicated Rayleigh lidars use frequency-doubled Nd:YAG lasers that operate at 532 nm (green light). Other advantages of this type of laser are that it is a well-developed technology that provides a reliable, ''turnkey'' light source that can produce pulses of short duration with typical average powers of 10 to 50 W. Some Rayleigh lidar systems use XeF excimer lasers that operate at about 352 nm. These systems enjoy the higher power available from these lasers, as well as a Rayleigh-scatter cross section larger than that for Nd:YAG systems, but the atmospheric transmission is lower at these wavelengths. In addition, excimer lasers are generally considered more difficult and expensive to operate than Nd:YAG lasers.

An example of a temperature profile from The University of Western Ontario's Purple Crow Lidar (PCL) Rayleigh system (40) is shown in Fig. 9. The top panel of the figure shows the average temperature during the night's observations, including statistical uncertainties due to photon counting. The bottom panel shows the rms deviation of the temperatures calculated at 15-minute intervals. The rms deviations are a measure of the geophysical variations in temperature during the measurement period. Also included on the bottom panel is the average statistical uncertainty due to photon counting in the individual 15-minute profiles.

Rayleigh lidar systems have been operated at a few stations for several years, building up climatological records of middle atmosphere temperature (60,124,125). The lidar group at the Service d'Aeronomie du CNRS, France, has operated a Rayleigh lidar at the Observatory of Haute-Provence since 1979 (120,125–128). The data set collected by this group provides an excellent climatological record of temperatures in the middle and upper stratosphere and in the lower mesosphere. Lidar systems designed primarily for sodium and ozone measurements have also been used as Rayleigh lidar systems for determining stratospheric and mesospheric temperatures (129–131). Rayleigh-scatter lidar measurements can be used in conjunction with independent temperature determinations to calculate molecular nitrogen and molecular oxygen mixing ratios in the mesopause region of the atmosphere (132).
Rayleigh lidar systems cannot operate when clouds obscure the middle atmosphere from their view. Most Rayleigh systems can operate only at nighttime due to the presence of scattered solar photons during the day. However, the addition of a narrow band-pass filter in the receiver optics allows daytime measurements (35,133).

Doppler Effects

Both random thermal motions and bulk mean flow (e.g., wind) contribute to the motion of air molecules. When light is scattered by molecules, it generally undergoes a change in frequency due to the Doppler effect that is proportional to the molecule's line-of-sight velocity. If we consider the backscattered light and the component of velocity of the scattering center in the direction of the scatter, then the Doppler shift, that is, the change in frequency Δν of the laser light, is given by (134)

$$\Delta\nu = \nu' - \nu \approx 2\nu\,\frac{v}{c}, \qquad (19)$$

where ν is the frequency of the incident photon, ν′ is the frequency of the scattered photon, and v is the component of the velocity of the scattering center in the direction of scatter (e.g., backscatter). The random thermal motions of the air molecules spectrally broaden the backscattered light, and radial wind causes an overall spectral shift. The velocity distribution function due to thermal motion of gas molecules in thermal equilibrium is given by Maxwell's distribution. For a single direction component x, the probability that a molecule has velocity vx is (135)

$$P(v_x)\,dv_x = \left(\frac{M}{2\pi k T}\right)^{1/2} \exp\!\left(-\frac{M v_x^2}{2 k T}\right) dv_x, \qquad (20)$$

where M is the molecular mass, k is Boltzmann's constant, T is the temperature, and vx is the component of velocity in the x direction. Using Eqs. (19) and (20), it can be shown that when monochromatic light is backscattered by a gas, the frequency distribution of the light is given by

$$P(\nu') = \frac{1}{(2\pi)^{1/2}\,\sigma}\,\exp\!\left[-\frac{1}{2}\left(\frac{\nu'-\nu}{\sigma}\right)^2\right], \qquad (21)$$

where

$$\sigma = \frac{\nu}{c}\left(\frac{2kT}{M}\right)^{1/2}. \qquad (22)$$

The resulting equation for P(ν′) is a Gaussian distribution whose full width at half maximum is equal to 2σ√(2 ln 2). Equations (21) and (22) are strictly true only if all the atoms (molecules) of the gas have the same atomic (molecular) weight. However, air contains a number of molecular and atomic species, and therefore the frequency distribution function for Rayleigh-backscattered light, Pa(ν′), is the weighted sum of Gaussian functions for each constituent. The major constituents of air, N2 and O2, have similar molecular masses, which allows the function Pa(ν′) to be fairly well approximated by a single Gaussian calculated for a gas whose molecular mass is equal to the mean molecular mass of air.
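To give a feel for the magnitudes in Eqs. (19)–(22), the sketch below evaluates the thermal (Doppler) width of Rayleigh backscatter and the spectral shift produced by a radial wind; the wavelength, temperature, and wind speed are assumed example values, not numbers taken from the text.

```python
import math

# Doppler broadening and wind shift of Rayleigh backscatter, Eqs. (19)-(22).
# The 532-nm wavelength, 240-K temperature, and 10 m/s wind are assumed values.

C = 2.998e8               # speed of light (m/s)
K_B = 1.380649e-23        # Boltzmann constant (J/K)
M_AIR = 4.81e-26          # mean molecular mass of air (kg)

nu = C / 532e-9           # laser frequency (Hz)
T = 240.0                 # temperature (K)
v_radial = 10.0           # radial wind component (m/s)

sigma = (nu / C) * math.sqrt(2.0 * K_B * T / M_AIR)       # Eq. (22)
fwhm = 2.0 * sigma * math.sqrt(2.0 * math.log(2.0))       # FWHM of the Gaussian, Eq. (21)
shift = 2.0 * nu * v_radial / C                           # Eq. (19)

print(f"thermal width: sigma = {sigma / 1e9:.2f} GHz, FWHM = {fwhm / 1e9:.2f} GHz")
print(f"shift for a {v_radial:.0f} m/s radial wind: {shift / 1e6:.1f} MHz")
```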
Wind, the bulk motion of the air, causes the distribution function Pa(ν′) to shift in frequency while maintaining its shape. The frequency shift can be calculated directly from Eq. (19), which shows that the shift is directly proportional to the component of the wind velocity in the direction of scattering, the radial wind velocity. Figure 10 shows how the spectrum of a narrow-bandwidth laser is changed by scattering from molecules in the atmosphere.

Figure 10. The frequency distribution function for Rayleigh backscattering from a clean, dry atmosphere (i.e., no water vapor or aerosols) for monochromatic incident radiation of frequency ν. The broadening is due to random thermal motions, and the shift is due to wind.

In principle, it is possible to determine both the radial wind velocity and the temperature by measuring the spectral shape of the light backscattered from air molecules in the middle atmosphere. However, the signal-to-noise ratio requirements of this Doppler technique are much higher for temperature measurement than for measuring winds (136), and so, in practice, Rayleigh–Doppler temperature measurements are quite difficult. The advantage of this method of temperature determination is that the true kinetic temperature of the atmosphere is obtained without the need for the assumptions required by the Rayleigh technique. The group at the Observatory of Haute-Provence (54,137) has demonstrated the Doppler technique for measuring middle atmosphere winds. They used a Fabry–Perot interferometer as a narrowband filter to measure the intensity of the lidar returns in a pair of wavelength ranges centered on the laser wavelength (54). Tepley et al. used a scanning interferometer to make similar measurements (136).

AEROSOL SCATTERING AND LIDAR

The theory of scattering that was developed by Mie (90) in the early 1900s is a general solution that covers the scattering of electromagnetic radiation by a homogeneous sphere for all wavelengths of radiation and spheres of all sizes and refractive indexes. A parameter basic to the Mie theory is the size parameter α, a measure of the size of the scattering particle relative to the wavelength of the radiation:

$$\alpha = \frac{2\pi a}{\lambda}, \qquad (23)$$
where a is the radius of the scattering particle and λ is the wavelength of the incident radiation. When the particle size is small compared to the wavelength of the incident radiation (i.e., α is small), Mie theory reduces to Rayleigh theory. Mie theory is general enough to cover the range of α's for which Rayleigh and geometrical optics also apply, but it is mathematically more complex than Rayleigh theory and geometrical optics. This complexity has led to the common use of Mie scattering to imply scattering from particles larger than those to which Rayleigh theory applies and smaller than those to which geometrical optics applies. Mie theory solves Maxwell's equations for the boundary conditions imposed by a homogeneous sphere whose refractive index is different from that of the surrounding medium. Since Mie first published the solution to this problem, others have extended the calculations to include different shapes (e.g., infinite cylinders and paraboloids) and have provided methods for finding solutions for irregular shapes and nonhomogeneous particles (112,138–140). The atmosphere contains particles that have an infinite variety of shapes, sizes, and refractive indexes. The measurement of the properties of atmospheric aerosols is also complicated by the composition and size of these particles (87,141–143). Evaporation, condensation, coagulation, absorption, desorption, and chemical reactions change the atmospheric aerosol composition on short timescales. Care must be taken with direct sampling methods to ensure that the sampling process allows correct interpretation of the properties of the aerosols collected. Aerosol concentrations in the atmosphere vary widely with altitude, time, and location. The vertical structure of aerosol concentration profiles is complex and ever changing (144–148). There is a layer of aerosols in the atmosphere from about 15 to 23 km that is known as the stratospheric aerosol layer or the Junge (149) layer. The Junge layer is primarily volcanic in origin. Lidar measurements have shown that the altitude range and density of the aerosols in this layer vary widely depending on recent volcanic activity (150–154). Extinction cross sections given by the Mie theory for size parameters corresponding to atmospheric aerosols and visible light are generally larger than extinction cross sections due to molecular scattering (87). In the atmospheric boundary layer, where the aerosol concentrations are high, the extinction of a beam of visible light is much greater than that due solely to Rayleigh scattering. Tropospheric aerosols can be a mixture of natural and anthropogenic aerosols. The effects of clouds are difficult to quantify due to the great variability they exhibit in their optical properties and in their distribution in time and space. Atmospheric aerosols, including clouds, play an important role in the earth's radiation budget. A full understanding of the role of aerosols is important for improving weather forecasting and understanding climate change. Aerosols scatter and absorb both incoming solar radiation and outgoing terrestrial radiation. The amount of radiation that is scattered and the directions of scatter, as well as the amount of radiation absorbed, vary with aerosol
composition, size, and shape. Thus, the physical properties of aerosols determine whether they contribute net heating or cooling to the Earth’s climate. Lidar provides a method of directly measuring the optical properties of atmospheric aerosol distributions and is playing an important role in current work to better quantify the atmospheric radiation budget (148,155–160).
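Returning to Eq. (23), the snippet below evaluates the size parameter for a few particle radii at a visible wavelength; the radii, the 532-nm wavelength, and the rough regime boundaries are assumptions chosen only to illustrate the qualitative discussion above.

```python
import math

# Size parameter, Eq. (23): alpha = 2 * pi * a / lambda.
# Radii, wavelength, and regime thresholds are illustrative assumptions.
WAVELENGTH = 532e-9   # m

for radius in (1e-9, 0.1e-6, 1e-6, 100e-6):   # molecule-scale up to drizzle-drop scale
    alpha = 2.0 * math.pi * radius / WAVELENGTH
    if alpha < 0.1:
        regime = "Rayleigh regime (alpha << 1)"
    elif alpha < 50.0:
        regime = "Mie regime"
    else:
        regime = "approaching geometrical optics (alpha >> 1)"
    print(f"a = {radius * 1e6:9.3f} um -> alpha = {alpha:9.2f}  {regime}")
```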
Aerosol Lidar

Since the early 1960s, a large number of lidar systems have been built that are designed to study aerosols, including clouds, in the troposphere and lower stratosphere (161,162). Instruments using multiple-wavelength transmitters and receivers (55,145,154,163–168) and polarization techniques (55,56,58,169–173) have been used to help quantify aerosol properties. A review of aerosol lidar studies is given by Reagan et al. (174). Lidars have been used to study polar stratospheric clouds (PSCs) (175–181) to help understand the role they play in ozone depletion (182–184).

In September 1994, NASA flew a space shuttle mission, STS-64, that included the LITE experiment (185–187). LITE was a technology development and validation exercise for future space lidar systems. The scientific potential of LITE was recognized early in its development, and a science steering committee was established to ensure that the scientific potential of the experiment was exploited. LITE used an Nd:YAG laser operating simultaneously at three frequencies: the fundamental at 1,064 nm, the second harmonic at 532 nm, and the third harmonic at 355 nm. It also incorporated a system for automatically aligning the laser beam into the FOV of the detector system. The science objectives of LITE were to study the following atmospheric properties:

1. tropospheric aerosols, including scattering ratio and its wavelength dependence, planetary boundary layer height, structure, and optical depth;
2. stratospheric aerosols, including scattering ratio and its wavelength dependence and averaged integrated backscatter, as well as stratospheric density and temperature;
3. the vertical distribution, multilayer structure, fractional cover, and optical depth of clouds;
4. the radiation budget, via measurements of surface reflectance and albedo as a function of incidence angle.

Figure 11 shows a sample of the LITE measurements. This figure clearly shows regions of enhanced scatter from cloud and dust from the Saharan Desert in Northwest Africa. A worldwide correlative measurement program was undertaken for validation and intercomparison with the LITE measurements; this program included more than 60 ground-based and several aircraft-based lidar systems (188–190).

Figure 11. LITE observations of Saharan dust, 12 September 1994. Elevated dust layers exceeding 5 km above the Saharan Desert in Northwest Africa were observed by the Lidar In-Space Technology Experiment (LITE). The intensity plot for the 532-nm wavelength shows an aerosol layer associated with wind-blown dust from the Saharan Desert. This image is composed of individual lidar profiles sampled at 10 Hz and extends 1,000 km along the Space Shuttle Discovery orbit track during nighttime conditions. Weaker signals due to molecular backscatter are in blue, moderate backscatter signals from the dust layer are in yellow and red, and the strongest backscatter signals from clouds and the surface are in white. Opaque clouds, shown in white, prevent LITE from making observations at lower altitudes and create a shadowing effect beneath the cloud layer. The Atlas Mountain range is seen near 31°N, 6°W (David M. Winker, NASA Langley Research Center, and Kathleen A. Powell, SAIC). See color insert.

Atmospheric aerosols have the same average velocity as atmospheric molecules; thus, the average Doppler shift of their distributions is the same (see the earlier section on Doppler effects). The spectral broadening of the
light backscattered from aerosols is much narrower than that backscattered from molecules because the mass of aerosols is much greater than that of air molecules. Light backscattered from aerosols can be separated from that backscattered from molecules using this difference in Doppler width (119,191); however, spectral separation is not necessary if only wind is to be measured because the average Doppler shift is the same for both molecular and aerosol scattering. Wind lidar using incoherent detection has been used in the troposphere (51,137); however, coherent detection techniques are more commonly used.
Coherent Doppler Lidar

Because of the stronger signal levels in the lower atmosphere, measurement of the Doppler shift via coherent detection techniques becomes viable there. Coherent Doppler lidar is used extensively in wind field mapping from the ground (192,193) and from the air (194–196), and it has been suggested as a possible method for global wind measurement from space platforms (194,197).
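For scale, the sketch below converts radial wind speeds into the Doppler shifts that a coherent receiver must resolve, using the backscatter relation of Eq. (19) written in terms of wavelength; the 2-µm wavelength and the wind speeds are assumed example values.

```python
# Doppler shift of backscattered light for a coherent (heterodyne) wind lidar.
# From Eq. (19) with nu = c / lambda:  delta_f = 2 * v / lambda.
# The 2.0-um wavelength and wind speeds are assumed, illustrative values.

WAVELENGTH = 2.0e-6                    # m
for v in (0.5, 5.0, 25.0):             # radial wind speeds (m/s)
    delta_f = 2.0 * v / WAVELENGTH     # Hz
    print(f"v = {v:5.1f} m/s -> Doppler shift = {delta_f / 1e6:6.2f} MHz")
```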
Differential Absorption Lidar (DIAL)

In 1964, Schotland (198) suggested using a lidar technique now known as differential absorption lidar (DIAL). DIAL is useful for measuring the concentration of trace species in the atmosphere. The method relies on the sharp variation in optical transmission near an absorption line of the species to be detected. A DIAL transmits two closely spaced wavelengths, one of which coincides with an absorption line of the constituent of interest while the other lies in the wing of this absorption line. During the transmission of these two wavelengths through the atmosphere, the emission that is tuned to the absorption line is attenuated more than the emission in the wing of the absorption line. The intensities of the two wavelengths that are backscattered to the DIAL instrument can then be used to determine the optical attenuation due to the species and thus the concentration of the species.

The first use of a DIAL system was for measuring atmospheric water vapor concentration (199). The DIAL technique has been used extensively for pollution monitoring (200–206). It is also used very successfully in the lower atmosphere for measurements, at high spatial and temporal resolution, of species such as NO (207), H2O (208–210), O3 (211–213), SO2 (214,215), and CH4 (216–218). Atmospheric temperature measurement is possible by the DIAL technique if the absorption line selected is temperature-dependent (219–221).

Use of the DIAL technique in the middle atmosphere has been restricted mainly to measuring ozone profiles (211,222–227). DIAL ozone measurements have extended as high as 50 km, with integration times of at least a few hours required. These same lidar systems can obtain profiles up to 20 km in approximately 15 min due to the much higher ozone densities and available scatterers at the lower levels. Typically, a stratospheric ozone DIAL uses a XeCl laser that operates at 308 nm for the ''on-line'' or absorbed wavelength and a frequency-tripled Nd:YAG at 355 nm for the ''off-line'' or reference wavelength. The spectral separation between the wavelengths means that when large stratospheric aerosol loading events occur (such as after a large volcanic eruption), the measurements become difficult to interpret due to the optical effects of the aerosols. This shortcoming has been addressed by recording the Raman-shifted backscatter from N2 at both of the transmitted wavelengths (228).

The DIAL technique has also been used with hard targets (229,230), in which case it is called differential optical absorption spectroscopy (DOAS). DOAS measurements are an average across the entire path from the instrument to the target, so a DOAS system is not strictly a lidar because it does not perform any ranging. DOAS has been used to monitor large areas from aircraft, using the ground as the target or reflector, and has been used for monitoring chemical (6–8) and biological (9–12) weapons agents.
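In the standard textbook formulation (not derived in this article), the ratio principle above yields the trace-gas number density from the range derivative of the logarithm of the off-line/on-line signal ratio. The sketch below illustrates that relation on synthetic signals; the differential absorption cross section, density profile, and range grid are all assumed values.

```python
import numpy as np

# Standard DIAL retrieval sketch (textbook form, not taken from this article):
#   n(R) = 1 / (2 * dsigma) * d/dR [ ln( P_off(R) / P_on(R) ) ],
# with dsigma = sigma_on - sigma_off. All numbers below are assumed.

DSIGMA = 1.0e-23                              # differential cross section (m^2), assumed
R = np.arange(500.0, 3000.0, 50.0)            # range bins (m)
n_true = 1.0e17 * np.exp(-R / 1500.0)         # synthetic trace-gas density (m^-3)

# Synthetic returns: same geometry/backscatter factor, different absorption.
geometry = 1.0 / R ** 2
column = np.concatenate(
    ([0.0], np.cumsum(0.5 * (n_true[1:] + n_true[:-1]) * np.diff(R))))
p_off = geometry                                  # negligible absorption off-line
p_on = geometry * np.exp(-2.0 * DSIGMA * column)  # two-way absorption on-line

# Retrieval from the measured ratio.
n_retrieved = np.gradient(np.log(p_off / p_on), R) / (2.0 * DSIGMA)
print(np.round(n_retrieved[:5] / n_true[:5], 3))   # close to 1 where self-consistent
```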
RAMAN LIDAR

When monochromatic light, or light of sufficiently narrow spectral width, is scattered by a molecular gas or liquid, the spectrum of the scattered light can be observed to contain lines at wavelengths different from those of the incident radiation (231). Raman first observed this effect (232), which is due to the interaction of radiation with the quantized vibrational and rotational energy levels of the molecule. Raman scattering involves a transfer of energy between the scattered light and a molecule and is, therefore, an inelastic process. The cross sections due to Raman scattering are included in the Rayleigh scattering theory (106), although Raman spectroscopists use the term Rayleigh line to indicate only the unshifted central component of the scattered light.

Each type of molecule has unique vibrational and rotational quantum energy levels, and therefore Raman scattering from each type of molecule has a unique spectral signature. This allows the identification of molecules by their scattered light spectra. Scattered radiation that loses energy during interaction with a molecule, and so decreases in frequency, is said to have a Stokes shift, whereas radiation that gains energy and increases in frequency is said to have an anti-Stokes shift. In general, Stokes radiation is more intense than anti-Stokes because Stokes scattering can always occur, subject to selection rules, whereas anti-Stokes scattering also requires that the molecule be initially in an excited state. The quantum numbers v and J describe the vibrational and rotational states of a molecule, respectively. The Q branch, ΔJ = 0, contains a number of degenerate lines, leading to a higher intensity for light scattered in this branch. The Δv = +1 frequency shifts and backscatter cross sections for a number of atmospheric molecules are given in Fig. 12. Measures (22) gives a comprehensive list for atmospheric molecules.

The pure rotational Raman spectrum (PRRS), which occurs when there is no vibrational transition, that is, Δv = 0, is more difficult to measure because the spectral shift of the lines is quite small. This small shift leads to technical difficulties in blocking the nearby elastic scatter from entering the detector. The PRRS of the N2 molecule is shown in Fig. 13. The intensities of the individual lines, and thus the shape of the envelope of the lines, are temperature-dependent.

The term Raman lidar is generally used to refer to a lidar system that uses the Raman-shifted component where Δv = ±1, that is, a transition that involves a change in vibrational energy level. In practice, the Δv = +1 transition is commonly used because it has higher intensity. The spectral selection of the Δv = +1 line in the receiver system of a lidar can be achieved by using a high-quality narrowband interference filter. It is necessary to ensure that the blocking of the filter at the laser wavelength is sufficiently high that the detected elastic backscatter from molecules and aerosols is insignificant compared to the Raman scattering. Generally, special-order filters are required to meet this specification.

In the mid-1960s, Cooney (233) and Leonard (234) demonstrated the measurement of the Raman-shifted component of N2 in the troposphere by lidar. The Raman lidar technique has been used most often for measuring atmospheric water vapor (34,235–240). Clouds (241–243) and aerosols (148,156,244,245) have also been studied by this technique. The use of Raman lidar is restricted to the more abundant species in the atmosphere because of the small backscatter cross sections involved.
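Two small pieces of bookkeeping behind the water vapor measurement are sketched below: converting the laser wavelength and a Raman shift into the receiver-channel wavelength, and forming a mixing ratio from the ratio of the H2O and N2 signals. The Raman shifts are standard spectroscopic values; the calibration constant and photon counts are assumed numbers (in practice, the calibration comes from comparison with an independent sensor such as a radiosonde).

```python
# Raman-channel wavelengths and a water vapor mixing ratio (illustrative sketch).
# Approximate Raman shifts: N2 ~ 2331 cm^-1, H2O ~ 3654 cm^-1 (standard values).
# The calibration constant and photon counts are assumed, illustrative numbers.

def raman_wavelength_nm(laser_nm, shift_cm1):
    """Stokes-shifted wavelength for a given laser wavelength and Raman shift."""
    return 1.0e7 / (1.0e7 / laser_nm - shift_cm1)

LASER_NM = 532.0
print("N2  channel:", round(raman_wavelength_nm(LASER_NM, 2331.0), 1), "nm")
print("H2O channel:", round(raman_wavelength_nm(LASER_NM, 3654.0), 1), "nm")

# Mixing ratio w = C * S_H2O / S_N2, where C lumps cross sections, filter
# transmissions, and the external calibration.
C_CAL = 120.0      # g/kg per unit signal ratio, assumed
S_H2O = 1.8e3      # background-subtracted H2O counts in a range bin, assumed
S_N2 = 4.5e4       # background-subtracted N2 counts in the same bin, assumed
print("water vapor mixing ratio ~", round(C_CAL * S_H2O / S_N2, 2), "g/kg")
```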
Figure 12. Vibrational Raman frequency shifts (cm⁻¹) and cross sections (10⁻³⁰ cm² sr⁻¹) for a number of molecules found in the atmosphere.

Figure 13. Intensity distribution of the PRRS of N2 at three temperatures (350, 290, and 210 K).

The measurement of atmospheric water vapor concentration by Raman lidar requires measuring the Raman backscatter from both water vapor and molecular nitrogen. The nitrogen signal is used as a reference to determine the water vapor mixing ratio from the lidar's Raman water vapor signal.

There are two methods by which Raman lidar can be used to determine atmospheric temperature. In the upper troposphere and throughout the stratosphere, the Rayleigh lidar temperature retrieval algorithm can be applied to appropriately corrected Raman N2 measurements. Due to its spectral shift, the Raman component of the scatter from N2 is free from the contamination of scattering from aerosols. However, aerosols affect the optical transmission of the atmosphere, an effect for which the Raman N2 signal must be corrected before it is used for temperature calculations (246–248). Unlike the Rayleigh temperature retrieval, here the transmission is not constant with altitude. The characteristics of the background stratospheric aerosol layer are known well enough that the correction for atmospheric transmission leads to an acceptable uncertainty in calculated temperatures. However, this correction cannot be made with sufficient accuracy lower in the atmosphere and during increased loading of the stratospheric aerosol layer.

Cooney (249) was the first to propose temperature measurement based on the shape of the PRRS of molecular nitrogen. This method uses the variation of the population of the rotational levels of a molecule with temperature; at higher temperature, the probability that a higher level is populated is greater. Figure 13 shows the envelope of the PRRS lines of a nitrogen molecule at three temperatures. Thus, temperature measurements can be made by measuring the intensity of some or all of the PRRS lines. This differential technique determines the temperature from the intensity of the Raman backscatter across a very narrow wavelength range. Changes in atmospheric transmission due to changes in aerosol properties and loading are insignificant across such a small wavelength range, making the technique almost independent of aerosols. Separation of the central Rayleigh line from the PRRS has proved to be very difficult, even though the backscatter cross section for the PRRS is much greater than that for vibrational–rotational Raman scattering. For example, for the N2 molecule, the backscatter cross sections for vibrational, pure-rotational, and elastic scattering are 3.5 × 10⁻³⁰, 1.1 × 10⁻²⁸, and 3.9 × 10⁻²⁷ cm² sr⁻¹, respectively. The spectral separation of the PRRS and the central unshifted line is quite small, and this leads to technical difficulties when trying to separate these two signals. Nevertheless, a number of Raman lidar systems have been constructed that infer temperature from rotational Raman spectra (250–255).

Resonance Lidar

Resonant scattering occurs when the energy of an incident photon is equal to the energy of an allowed transition within an atom. This is an elastic process; the atom absorbs the photon and instantly emits another photon at the same frequency. As each type of atom and molecule
has a unique absorption and, hence, fluorescence spectrum, these measurements may be used to identify and measure the concentration of a particular species. A description of the theory of fluorescence and resonance can be found in both Chamberlain (256) and Measures (22). The constant ablation of meteors in the earth's upper atmosphere leads to the existence of extended layers of alkali metals in the 80 to 115 km region (257). These metals have low abundances but very high resonant-scattering cross sections. Because resonant scattering involves an atomic transition between allowed energy levels, the probability that this process occurs is much greater than that for Rayleigh scattering. For instance, at 589 nm, the resonance-fluorescence cross section for sodium is about 10¹⁵ times larger than the cross section for Rayleigh scattering from air. This means that the lidar signal from 85 km measured by a sodium resonance-fluorescence lidar is about the same as the Rayleigh scatter signal measured by the same lidar at about 30 km.

Sodium. Atmospheric sodium is the most widely used of the alkali metal layers in the atmosphere because it is relatively abundant and the transmitter frequency is easy to generate. Several research groups have measured the climatology of sodium abundance, parameters related to gravity wave dynamics, temperatures, and winds (83,258–265). The sodium layer exists in the earth's atmosphere between about 80 and 105 km in altitude, a region that covers the upper part of the mesosphere and the lower part of the thermosphere. This sodium layer is sometimes referred to as the mesospheric sodium layer, although it extends well above the top of the mesosphere. The first reported use of a resonance lidar to study sodium was in 1969 (266). The existence of the mesospheric sodium layer had been known for many years before these first lidar measurements, due to the bright, natural airglow emission that was extensively studied using passive spectroscopy (267). These passive instruments could resolve the height structure of the region only during sunrise and sunset.

The spectral shape of the sodium line at 589 nm, the D2a line, is temperature-dependent, and the scattering cross section is proportional to the line shape. Using this information allows the temperature of the sodium atoms, and of the atmosphere surrounding them, to be measured from the spectral shape of the backscattered intensity. Figure 14 shows the shape of the sodium D2a line for three temperatures that are within the range of temperatures expected around the mesopause region. The sodium D2a shape has been measured by lidar in a number of ways (268,269). Usually, this measurement is achieved by transmitting narrow-bandwidth laser pulses at two or three well-known frequencies within the sodium D2a line and recording the backscatter intensity at each of the transmitted frequencies separately. By knowing the frequency of the transmitted laser pulses and the intensity of the backscatter at each of the transmitted frequencies, the atmospheric temperature can be determined. A technique known as Doppler-free saturation spectroscopy is used to set the frequency of the laser transmitted into the atmosphere very precisely.
Figure 14. Shape of the sodium D2a line at three temperatures (150, 200, and 250 K), plotted against frequency offset (GHz); the frequencies fa, fb, and fc are marked.
Counterpropagating a sample of the laser output through a laboratory cell that contains sodium vapor generates the Doppler-free saturation spectrum. Under the right conditions, the fluorescence from the cell contains sharp spectral features (270) (Fig. 15). Measurements of these Doppler-free features are used in a feedback loop to control the output frequency of the laser and to lock the laser's output frequency to the frequency of the spectral feature (83,271). The Doppler-free spectrum of sodium provides three features that offer the possibility of locking the laser: fa, fb, and fc. The atmospheric temperature can be determined from the ratio of the backscattered intensity at any two of the three available frequencies. The pair of frequencies that has the largest change in ratio with temperature is fa and fc, and so these two frequencies are commonly used. This method of temperature measurement is a direct spectral measurement and has associated errors several orders of magnitude lower than those associated with Rayleigh temperature measurements in this altitude range. A slight drawback of this method is that it typically takes 5 to 10 seconds to switch the laser from one frequency to the other, fa to fc, or back again. To obtain a reasonable duty cycle, it is therefore necessary to operate the laser at each frequency for typically 30 to 60 seconds. The temperature is then determined from the ratio of measurements taken at slightly different times. The variability of the sodium and the atmosphere over this short timescale leads to some uncertainty in the temperatures measured using this technique (270). Improvements in transmitter technology during the last decade have allowed winds as well as temperatures to be measured using narrowband sodium lidar systems (270,272,273) incorporating an acousto-optic (AO) modulator. The AO modulators are used to switch the transmitted frequency several hundred MHz to either side of a selected Doppler-free feature. This tuning enables measuring the Doppler shift and the width of the backscattered light simultaneously. Acousto-optic modulators can be turned on and off very quickly; this feature allows frequency switching between transmitted laser pulses. Typically, a sodium temperature–wind lidar operates at three frequencies: fa, and fa plus and minus the AO offset. Today, such systems have been extended to a large scale, for example, the sodium lidar operated at the Starfire
Optical Range (SOR). Figure 16 shows an example of temperature measurements made at SOR. By simultaneously measuring temperature and vertical wind velocity, measurements at SOR have been used for the first determinations of the vertical flux of heat due to gravity waves in the mesopause region (40).

Figure 15. The Doppler-free saturation spectra for the sodium D2a line, showing the locations of the spectral features fa, fb, and fc. (a) The D2a line. (b) Close-up of fa; the solid line is the modeled spectrum, and the '+' symbols are measured values. (c) Close-up of fc.

Figure 16. Temperature in the mesopause region of the atmosphere measured by the University of Illinois Sodium Wind and Temperature Lidar over the Starfire Optical Range (35.0°N, 106.5°W), near Albuquerque, New Mexico, USA, on 27 October 2000. The local time is UT (Universal Time) − 7 hours. Measurements shown in this image have been smoothed by about 0.5 hour in time and 0.5 km in altitude. The downward phase progression of the atmospheric tidal structure is clearly shown as the temperature structure moves downward with time (courtesy of the University of Illinois lidar group). See color insert.
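The temperature retrieval from the ratio of backscatter at two probe frequencies can be illustrated with a deliberately simplified model. The sketch below approximates the D2a line by a single Doppler-broadened Gaussian (the real line has hyperfine structure, so this is qualitative only), and the probe offsets, temperatures, and "measured" ratio are assumed values.

```python
import math

# Qualitative sketch of the two-frequency ratio technique for sodium temperatures.
# The D2a line is modeled as a single Doppler-broadened Gaussian; hyperfine
# structure is ignored. Probe offsets and temperatures are assumed values.

M_NA = 3.82e-26        # mass of a sodium atom (kg)
K_B = 1.380649e-23     # Boltzmann constant (J/K)
NU_0 = 5.09e14         # approximate frequency of the sodium D2 line (Hz)
C = 2.998e8            # speed of light (m/s)

def line_value(offset_hz, temperature):
    """Unnormalized Doppler-broadened profile at a frequency offset from line center."""
    sigma = (NU_0 / C) * math.sqrt(K_B * temperature / M_NA)
    return math.exp(-0.5 * (offset_hz / sigma) ** 2)

def ratio(temperature, f1=0.0, f2=-650e6):
    """Ratio of backscatter at two probe offsets (assumed values, in Hz)."""
    return line_value(f2, temperature) / line_value(f1, temperature)

# Tabulate ratio versus temperature, then invert a pretend "measured" ratio.
table = [(T, ratio(float(T))) for T in range(140, 261, 5)]
measured = ratio(200.0)
retrieved = min(table, key=lambda tr: abs(tr[1] - measured))[0]
print(f"measured ratio {measured:.3f} -> retrieved temperature ~ {retrieved} K")
```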
Other Metallic Species. Other metal layers, including calcium (Ca and Ca⁺) (274,275), potassium (276,277), lithium (278,279), and iron (280,281), that have resonance lines at visible and near-visible wavelengths have also been used to study the mesopause region of the Earth's atmosphere. Thomas (282) reviews the early work in this field. Resonance lidar requires laser transmission at the precise frequency of an absorption line of the species being studied. Traditionally, dye lasers have been used successfully to probe many of these species, though working with these dyes is difficult in the field environment. Recently, solid-state lasers have been applied to resonance lidar systems (283).

SUMMARY

Lidar has established itself as one of the most important measurement techniques for atmospheric composition and dynamics from the surface to the upper atmosphere. It also has important uses in mapping, bathymetry, defense, oceanography, and natural resource management. Lidar solutions offer themselves for a wide range of environmental monitoring problems. Except for the LITE experiment (184,185), present lidar systems are primarily located on the surface or, for campaign use, on aircraft. The next decade promises the launch of several significant space-based lidar systems to study the Earth's atmosphere. These systems include experiments to measure clouds on a global scale, for example, the GLAS (284,285), ATLID (286), and ESSP3–CENA (287) instruments, as well as ORACLE (288), a proposed instrument to measure global ozone distribution. These space-based missions will complement existing ground-based systems by increasing global coverage. A new, ground-based, multitechnique lidar called ALOMAR (261) promises to provide measurements of air density, temperature, 3-D wind vector, momentum fluxes, aerosols, cloud particles, and selected trace gases at high vertical and temporal resolution. The new millennium will bring synergistic combinations of space- and ground-based radar and lidar facilities that will greatly enhance our ability to predict weather and climatic changes by making available measurements of wind, temperature, composition, and cloud properties.
ABBREVIATIONS AND ACRONYMS

ATLID      atmospheric lidar
ALOMAR     arctic lidar observatory for middle atmosphere research
AO         acousto-optic
CCD        charge coupled device
CNRS       centre national de la recherche scientifique
cw         continuous wave
DIAL       differential absorption lidar
DOAS       differential optical absorption spectroscopy
ESSP3      earth system science pathfinder 3
FOV        field of view
GLAS       geoscience laser altimeter system
Lidar      light detection and ranging
LITE       lidar in space technology experiment
LMT        liquid mirror telescope
MCP        microchannel plate
MCS        multichannel scaler
NASA       national aeronautics and space administration
Nd:YAG     neodymium:yttrium-aluminum garnet
ORACLE     ozone research with advanced cooperative lidar experiment
PCL        purple crow lidar
PMT        photomultiplier tube
PRRS       pure rotational Raman spectrum
PRF        pulse repetition frequency
RF         radio frequency
SIN        signal induced noise
SOR        starfire optical range
STS        space transport system
UT         universal time
BIBLIOGRAPHY

1. D. A. Leonard, B. Caputo, and F. E. Hoge, Appl. Opt. 18, 1,732–1,745 (1979).
20. R. Frehlich, in Trends in Optics: Research, Development and Applications, A. Consortini, ed., Academic Press, London, England, 1996, pp. 351–370. 21. W. B. Grant, in Tunable Laser Applications, F. J. Duarte, ed., Marcel Dekker, NY, 1995, pp. 213–305. 22. R. M. Measures, Laser Remote Sensing: Fundamentals and Applications, John Wiley & Sons, Inc., New York, NY, 1984. 23. E. H. Synge, Philos. Mag. 52, 1,014–1,020 (1930). 24. Duclaux, J. Phys. Radiat. 7, 361 (1936). 25. E. O. Hulbert, J. Opt. Soc. Am. 27, 377–382 (1937). 26. R. Bureau, Meteorologie 3, 292 (1946). 27. L. Elterman, J. Geophys. Res. 58, 519–530 (1953). 28. S. S. Friedland, J. Katzenstein, and M. R. Zatzick, J. Geophys. Res. 61, 415–434 (1956). 29. T. H. Maiman, Nature 187, 493 (1960). 30. F. J. McClung and R. W. Hellworth, J. Appl. Phys. 33, 828–829 (1962). 31. L. D. Smullins and G. Fiocco, Nature 194, 1,267 (1962). 32. G. Fiocco and L. D. Smullins, Nature 199, 1,275–1,276 (1963). 33. H. Chen et al., Opt. Lett. 21, 1,093–1,095 (1997). 34. S. E. Bisson, J. E. M. Goldsmith, and M. G. Mitchell, Appl. Opt. 38, 1,841–1,849 (1999). 35. D. Rees, U. von Zahn et al., Adv. Space Res. 26, 893–902 (2000). 36. J. D. Spinhirne, IEEE Trans. Geosci. Remote 31, 48 (1993) 37. C. Nagasawa et al., Appl. Opt. 29, 1,466–1,470 (1990).
2. J. L. Irish and T. E. White, Coast. Eng. 35, 47–71 (1998). 3. R. Barbini et al., ICES J. Mar. Sci. 55, 793–802 (1998).
38. Y. Emery and C. Flesia, Appl. Opt. 37, 2,238–2,241 (1998).
4. I. M. Levin and K. S. Shifrin, Remote Sensing Environ. 65, 105–111 (1998).
40. C. S. Gardner and W. M. Yang, J. Geophys. Res. 103, 8,699–8,713 (1998).
5. J. H. Churnside, V. V. Tatarskii, and J. J. Wilson, Appl. Opt. 37, 3,105–3,112 (1998).
41. E. F. Borra and S. Thibault, Photon Spectra 32, 142–146 (1998).
6. N. S. Higdon et al., Proc. Nineteenth ILRC, NASA, Hampton, Va., 1998, p. 651.
42. G. V. Guerra et al., J. Geophys. Res. 104, 22,287–22,292 (1999).
7. D. R. Alexander, M. L. Rohlfs, and J. C. Stauffer, Proc. SPIE 3,082, 22–29 (1997).
43. D. M. Chambers and G. P. Nordin, J. Opt. Soc. Am. 16, 1,184–1,193 (1999).
8. G. Klauber, C. Sini, P. M. Brinegar II, and M. M. Williams, Proc. SPIE 3,082, 92–103 (1997).
44. http://control.cass.usu.edu/lidar/index.htm.
39. R. J. Sica et al., Appl. Opt. 43, 6,925–6,936 (1995).
45. S. Ishii et al., Rev. Sci. Instrum. 67, 3,270–3,273 (1996).
9. R. A. Mendonsa, Photon Spectra 31, 20 (1997). 10. [ANON], Laser Focus World 32, 13 (1996).
46. J. L. Baray et al., Appl. Opt. 38, 6,808–6,817 (1999).
11. W. B. Scott, Aviat. Week Space Technol. 143, 44 (1995). 12. B. T. N. Evans, E. Yee, G. Roy, and J. Ho, J. Aerosol Sci. 25, 1,549–1,566 (1994). 13. A. V. Jelalian, W. H. Keene, and E. F. Pearson, in D. K. Killinger and A. Mooradian, eds., Optical and Laser Remote Sensing, Springer-Verlag, Berlin, 1983, pp. 341–349.
48. Z. L. Hu et al., Opt. Commun. 156, 289–293 (1998).
14. www.bushnell.com. 15. www.leica-camera.com.
52. S. T. Shipley et al., Appl. Opt. 22, 3,716–3,724 (1983).
16. U. N. Singh, in Optical Measurement Techniques and Application, P. K. Rastogi, ed., Artech House, Norwood, MA, 1997, pp. 369–396. 17. C. Weitkamp, in Radiation and Water in the Climate System, E. Raschke, ed., Springer-Verlig, Berlin, Germany, 1996, pp. 217–247.
54. M. L. Chanin et al., J. Geophy. Res. 16, 1,273–1,276 (1989).
18. D. K. Killinger and A. Mooradian, eds., Optical and Laser Remote Sensing, Springer-Verlag, Berlin, 1983. 19. L. Thomas, in Spectroscopy in Environmental Science, R. J. H. Clark and R. E. Hester, eds., Wiley, Chichester, England, 1995, pp. 1–47.
47. K. W. Fischer et al., Opt. Eng. 34, 499–511 (1995). 49. J. A. McKay, Appl. Opt. 38, 5,851–5,858 (1999). 50. G. Beneditti-Michelangeli, F. Congeduti, and G. Fiocco, JAS 29, 906–910 (1972). 51. V. J. Abreu, J. E. Barnes, and P. B. Hays, Appl. Opt. 31, 4,509–4,514 (1992). 53. G. Fiocco and J. B. DeWolf, JAS 25, 488–496 (1968). 55. A. I. Carswell, in D. K. Killinger and A. Mooradian, eds., Optical and Laser Remote Sensing, Springer-Verlag, Berlin, 1983, pp. 318–326. 56. K. Sassen, R. P. Benson, and J. D. Spinhirne, Geophys. Res. Lett. 27, 673–676 (2000). 57. G. P. Gobbi, Appl. Opt. 37, 5,505–5,508 (1998). 58. F. Cairo et al., Appl. Opt. 38, 4,425–4,432 (1999). 59. F. Cairo et al., Rev. Sci. Instrum. 67, 3,274–3,280 (1996). 60. J. P. Thayer et al., Opt. Eng. 36, 2,045–2,061 (1997).
LIDAR 61. P. S. Argall and F. Jacka, Appl. Opt. 35, 2,619–2,629 (1996). 62. F. L. Pedrotti and L. S. Pedrotti, Introduction to Optics, 2nd ed., Prentice-Hall, Englewood Cliffs, NJ, 1993, pp. 24–25. 63. E. L. Dereniak and D. G. Crowe, Optical Radiation Detectors, John Wiley & Sons, Inc., New York, NY, 1984, pp. 116–121. 64. M. J. McGill and W. R. Skinner, Opt. Eng. 36, 139–145 (1997). 65. W. C. Priedhorsky, R. C. Smith, and C. Ho, Appl. Opt. 35, 441–452 (1996). 66. T. Erikson et al., Appl. Opt. 38, 2,605–2,613 (1999). 67. N. S. Higdon et al., Appl. Opt. 33, 6,422–6,438 (1994). 68. M. Wu et al., Appl. Spectrosc. 54, 800–806 (2000). 69. A. M. South, I. M. Povey, and R. L. Jones, J. Geophys. Res. 103, 31,191–31,202 (1998). 70. R. W. Engstrom, Photomultiplier Handbook, RCA Corporation, USA, 1980. 71. J. Wilson and J. F. B. Hawkes, Optoelectronics, An Introduction, 2nd ed., Prentice-Hall, Cambridge, 1989, pp. 265–270. 72. D. P. Donovan, J. A. Whiteway, and A. I. Carswell, Appl. Opt. 32, 6,742–6,753 (1993). 73. A. O. Langford, Appl. Opt. 34, 8,330–8,340 (1995). 74. M. P. Bristow, D. H. Bundy, and A. G. Wright, Appl. Opt. 34, 4,437–4,452 (1995). 75. Y. Z. Zhao, Appl. Opt. 38, 4,639–4,648 (1999). 76. C. K. Williamson and R. J. De Young, Appl. Opt. 39, 1,973–1,979 (2000). 77. J. M. Vaughan, Phys. Scripta T78, 73–81 (1998). 78. R. M. Huffaker and P. A. Reveley, Pure Appl. Opt. 7, 863–873 (1998). 79. R. Targ et al., Appl. Opt. 35, 7,117–7,127 (1996). 80. R. M. Huffaker and R. M. Hardesty, Proc. IEEE 84, 181–204 (1996). 81. S. M. Hannon and J. A. Thomson, J. Mod. Opt. 41, 2,175–2,196 (1994). 82. V. M. Gordienko et al., Opt. Eng. 33, 3,206–3,213 (1994). 83. P. S. Argall et al., Appl. Opt. 39, 2,393–2,400 (2000). 84. A. Ben-David, Appl. Opt. 38, 2,616–2,624 (1999). 85. Y. J. Park, S. W. Dho, and H. J. Kong, Appl. Opt. 36, 5,158–5,161 (1997). 86. K. L. Coulson, Solar and Terrestrial Radiation, Academic Press, NY, 1975. 87. E. J. McCartney, Optics of the Atmosphere, John Wiley & Sons, Inc., New York, NY, 1976. 88. P. N. Slater, Remote Sensing, Optics and Optical Systems, Addison-Wesley, Toronto, 1980. 89. V. V. Sobolev, Light Scattering in Planetary Atmospheres, Pergamon Press, Oxford, 1975. 90. G. Mie, Ann. Physik 25, 377–445 (1908). 91. D. Muller et al., Appl. Opt. 39, 1,879–1,892 (2000). 92. J. P. Diaz et al., J. Geophys. Res. 105, 4,979–4,991 (2000). 93. F. Masci, Ann. Geofis. 42, 71–83 (1999). 94. D. Muller, U. Wandinger, and A. Ansmann, Appl. Opt. 38, 2,358–2,368 (1999). 95. C. Erlick and J. E. Frederick, J. Geophys. Res. 103, 23,275–23,285 (1998). 96. I. N. Sokolik, O. B. Toon, and R. W. Bergstrom, J. Geophys. Res. 103, 8,813–8,826 (1998). 97. A. A. Kokhanovsky, J. Atmos. Sci. 55, 314–320 (1998). 98. W. C. Conant, J. Geophys. Res. 105, 15,347–15,360 (2000).
99. G. M. McFarquhar et al., J. Atmos. Sci. 57, 1,841–1,853 (2000). 100. R. M. Hoff et al., J. Geophys. Res. 101, 19,199–19,209 (1996). 101. J. L. Brenguier et al., Tellus B 52, 815–827 (2000). 102. J. Redemann et al., J. Geophys. Res. 105, 9,949–9,970 (2000). 103. M. Minomura et al., Adv. Space Res. 25, 1,033–1,036 (2000). 104. A. T. Young, Appl. Opt. 19, 3,427–3,428 (1980). 105. A. T. Young, J. Appl. Meteorol. 20, 328–330 (1981). 106. A. T. Young, Phys. Today 35, 42–48 (1982). 107. Rayleigh (J. W. Strutt), Philos. Mag. 41, 274–279 (1871). 108. Rayleigh (J. W. Strutt), Philos. Mag. 41, 447–454 (1871). 109. Rayleigh (J. W. Strutt), Philos. Mag. 12, 81 (1881). 110. Rayleigh (J. W. Strutt), Philos. Mag. 47, 375–384 (1899). 111. J. A. Stratton, Electromagnetic Theory, McGraw-Hill, NY, 1941. 112. M. Kerker, The Scattering of Light and Electromagnetic Radiation, Academic Press, NY, 1969. 113. R. Penndorf, J. Opt. Soc. Am. 52, 402–408 (1962). 114. W. E. K. Middleton, Vision Through the Atmosphere, University of Toronto Press, Toronto, 1952. 115. M. Born and E. Wolf, Principles of Optics, Pergamon Press, Great Britain, Oxford, 1970. 116. G. S. Kent and R. W. H. Wright, J. Atmos. Terrestrial Phys. 32, 917–943 (1970). 117. R. T. H. Collis and P. B. Russell, in E. D. Hinkley, ed., Laser Monitoring of the Atmosphere, Springer-Verlag, Berlin, 1976. 118. G. Fiocco, in R. A. Vincent, ed., Handbook for MAP, vol. 13, ICSU, SCOSTEP, Urbana, IL, 1984. 119. G. Fiocco et al., Nature 229, 79–80 (1971). 120. A. Hauchecorne and M. L. Chanin, Geophys. Res. Lett. 7, 565–568 (1980). 121. T. Shibata, M. Kobuchi, and M. Maeda, Appl. Opt. 25, 685–688 (1986). 122. C. O. Hines, Can. J. Phys. 38, 1,441–1,481 (1960). 123. R. J. Sica and M. D. Thorsley, Geophys. Res. Lett. 23, 2,797–2,800 (1996). 124. T. J. Duck, J. A. Whiteway, and I. A. Carswell, J. Geophys. Res. 105, 22,909–22,918 (2000). 125. T. Leblanc et al., J. Geophys. Res. 103, 17,191–17,204 (1998). 126. M. L. Chanin and A. Hauchecorne, J. Geophys. Res. 86, 9,715–9,721 (1981). 127. M. L. Chanin and A. Hauchecorne, in R. A. Vincent, ed., Handbook for MAP, vol. 13, ICSU, SCOSTEP, Urbana, IL, 1984, pp. 87–98. 128. M. L. Chanin, N. Smires, and A. Hauchecorne, J. Geophys. Res. 92, 10,933–10,941 (1987). 129. C. S. Gardner, M. S. Miller, and C. H. Liu, J. Atmos. Sci. 46, 1,838–1,854 (1989). 130. T. J. Beatty, C. A. Hostetler, and C. S. Gardner, J. Atmos. Sci. 49, 477–496 (1992). 131. A. I. Carswell et al., Can. J. Phys. 69, 1,076 (1991). 132. M. M. Mwangi, R. J. Sica, and P. S. Argall, J. Geophys. Res. 106, 10,313 (2001). 133. R. J. States and C. S. Gardner, J. Geophys. Res. 104, 11,783–11,798 (1999). 134. E. A. Hyllerass, Mathematical and Theoretical Physics, John Wiley & Sons, Inc., New York, NY, 1970.
135. E. H. Kennard, Kinetic Theory of Gases, McGraw-Hill, NY, 1938.
136. C. A. Tepley, S. I. Sargoytchev, and R. Rojas, IEEE Trans. Geosci. Remote Sensing 31, 36–47 (1993).
137. C. Souprayen et al., Appl. Opt. 38, 2,410–2,421 (1999).
138. H. C. Van de Hulst, Light Scattering by Small Particles, John Wiley & Sons, Inc., New York, NY, 1951.
139. C. E. Bohren and D. R. Huffman, Absorption and Scattering of Light by Small Particles, John Wiley & Sons, Inc., New York, NY, 1983.
140. L. P. Bayvel and A. R. Jones, Electromagnetic Scattering and its Applications, Applied Science, England, London, 1981.
141. C. N. Davies, J. Aerosol. Sci. 18, 469–477 (1987).
142. L. G. Yaskovich, Izvestiya, Atmos. Oceanic Phys. 22, 640–645 (1986).
LIGHTNING LOCATORS
HAMPTON N. SHIRER Penn State University University Park, PA
WILLIAM P. ROEDER MS 7302, 45 WS/SYR Patrick AFB, FL
HAMPTON W. SHIRER University of Kansas Lawrence, KS
DAVID L. D’ARCANGELO Delta Air Lines Hartsfield International Airport Atlanta, GA
JOBY HILLIKER Penn State University University Park, PA
JOE KOVAL The Weather Channel Atlanta, GA
NATHAN MAGEE Penn State University University Park, PA
INTRODUCTION Locating lightning in real time is an old problem (1). Radio techniques developed in the early to mid-twentieth century used crossed-loop cathode-ray direction finders (CRDF) that provide the bearing but not the range to the lightning source (2–4). Direction-finding (DF) systems typically sense the radio signal, known as atmospherics, spherics, or ‘sferics, that is emitted by lightning and that most listeners of AM radios interpret as interference, static, or radio noise (5, p. 351). Quite generally, lightning radiates electromagnetic pulses that span an enormous range of frequencies. In this article, the radio signal refers to the portion of the electromagnetic spectrum that covers frequencies less than 3 × 10⁸ kilohertz (denoted kHz), or 300 GHz, and the optical signal refers to frequencies greater than 3 × 10⁸ kHz. Also, when not qualified here, electromagnetic radiation refers to the radio signal. Modern real-time lightning locating systems have their origins in the work of Krider, Noggle, Uman, and Weidman, who published several important papers between the mid-1970s and early 1980s describing the unique characteristics of the electromagnetic waveforms radiated by both cloud–ground and intracloud lightning and their components (6–10). The initial application of their locating method was to identify where cloud–ground strokes might have initiated forest fires in the western United States and Alaska (11). Today, their method provides the basis for the North American Lightning Detection Network (NALDN) (12) operated by Global Atmospherics, Inc. (GAI) of Tucson, Arizona, the combination of the National Lightning Detection Network™ (NLDN) (13–15) in the United States and the Canadian Lightning Detection Network (CLDN) (12). Similar networks, noted in Table 1, are
installed in Europe, South America, and Asia. A smaller scale network, the Cloud to Ground Lightning Surveillance System (CGLSS), is operated by the 45th Weather Squadron of the United States Air Force (USAF) at the Cape Canaveral Air Force Station (CCAFS) and by the John F. Kennedy Space Center (KSC) at Cape Canaveral, Florida (16). These lightning location networks are by no means the only ground-based systems operating in either real time or for research. As listed in Table 1, there are numerous networks, including the long-range Arrival Time Difference (ATD) network operated by the British Meteorological Office at Bracknell in the United Kingdom (17–19); the Long-Range Lightning Detection Network (LRLDN) operated in North America by GAI (20); the Global Position and Tracking Systems (GPATS) network operated by Global Position and Tracking Systems Pty. Ltd. in Ultimo, New South Wales, Australia (21), that uses an electric field (E-field) sensor similar to that in the Lightning Position and Tracking System (LPATS) (22, pp. 160–162), which was incorporated into the NLDN in the mid-1990s (13); the E-field Change Sensor Array (EDOT) operated by the Los Alamos National Laboratory (LANL) in Los Alamos, New Mexico (23); the Surveillance et Alerte Foudre par Interférométrie Radioélectrique (SAFIR), a direction-finding system that is marketed by Vaisala Dimensions SA of Meyreuil, France, and is used in several locations in Europe (24,25), Japan (26), and Singapore; the research version of SAFIR, the ONERA three-dimensional interferometric mapper (27), operated by the French Office National d’Études et de Recherches Aérospatiales (ONERA); the Lightning Detection and Ranging (LDAR) system operated by the USAF and the United States National Aeronautics and Space Administration (NASA) at CCAFS/KSC (28–31); the deployable Lightning Mapping Array (LMA) or Lightning Mapping System (LMS) operated by the New Mexico Institute of Mining and Technology in Socorro, New Mexico (32–34); networks of electric field mills, among them the Launch Pad Lightning Warning System (LPLWS) operating at the CCAFS/KSC (35–38) and the Electric Field Measurement System (EFMS) operating at the Wallops Flight Facility at Wallops Island, Virginia; and networks of flash counters such as the Cloud–Ground Ratio 3 (CGR3) (39–41) and the Conférence Internationale des Grands Réseaux Électriques (CIGRE) (42). Other systems listed in Table 1 include past and current satellite-mounted sensors such as the Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS), which provided data from 1973 to 1996 (43,44); NASA’s Optical Transient Detector (OTD) on the MicroLab-1 satellite, which provided data from 1995 to 2000 (45–47); NASA’s Lightning Imaging Sensor (LIS) on the Tropical Rainfall Measuring Mission (TRMM) satellite, which has been providing data since 1997 (31,34,48); the instruments on the Fast On-Orbit Recording of Transient Events (FORTÉ) satellite, which has been providing data since 1997 (49–51) and is operated by
Long-Range Lightning Detection Network [http://ghrc.msfc.nasa.gov/uso/readme/gailong.html]
NL
North American Lightning Detection Network National Lightning Detection NetworkTM [http://www.glatmos.com/products/data/nldnproducts.html]∗ [http://ghrc.msfc.nasa.gov/uso/readme/gai.html] Canadian Lightning Detection Network [http://www.weatheroffice.com/lightningnews/] NLDN is combination of Lightning Location and Protection, Inc. (LLP) MDF network with the Atmospheric Research Systems, Inc. (ARSI) TOA network, Lightning Position and Tracking System
NL NL
Other Lightning Detection Networks using same instruments as NALDN: NL Bureau of Land Management Alaska Lightning Network [http://www.nwstc.noaa.gov/d.HMD/Lightning/Alaska.htm] NL Austrian Lightning Detection & Information System [http://www.aldis.at/english/index.html] NL Brazilian LDN [http://physics.uwstout.edu/staff/jtr/msfc− final− report− 99.html] NL Central European (sensors in Switzerland, Germany, France, Netherlands, Czech Republic, Slovakia, Slovenia, Italy, Hungary, Poland, and Austria) [http://www.aldis.at/english/index.html]
NL
Los Alamos E-field Change Sensor Array [http://edot.lanl.gov/edot− home.htm]
NL
Global Position and Tracking Systems [http://gpats.com.au/]∗
Cloud to Ground Lightning Surveillance System
NL
NL
Arrival Time Difference Network [http://www.torro.org.uk/sfinfo.htm]
NL
Instrument/Network and Class of System NL: Network Locator; NM: Network Mapper; SSC: Single-Station Counter; SSL: Single-Station Locator; WD: Warning Device ∗ Manufacturer’s literature for commercial product
ALDIS
VLF/LF
VLF/LF
VLF/LF
VLF/LF
VLF/LF
LPATS
VLF/LF VLF/LF
VLF/LF
ULF/ VLF/LF
VLF/LF
VLF/LF
VLF/LF
VLF
Electromagnetic Radiation
CLDN
NALDN NLDN
EDOT
GPATS
LRLDN
CGLSS
ATD
Acronym
Electric Field Only (E-field)
Lightning Property
Other
(114)
(Continued)
(22, pp. 160–162)
(12)
(12) (13–15)
(23)
(21)
(20)
(16)
(17,18)
Reference
Table 1. Many of the Current Lightning Sensing Systems, Organized by Class. Details of Each System are Provided in Table 2. The Radio-Frequency Bands Under Lightning Property are ULF (ultralow frequency, 300 Hz to 3 kHz), VLF (very low frequency, 3 to 30 kHz), LF (low frequency, 30 to 300 kHz), and VHF (very high frequency, 30 to 300 MHz).
VHF
New Mexico Tech Lightning Mapping Array or Lightning Mapping System [http://ibis.nmt.edu/nmt− lms/]
NM
Other Lightning Detection Networks using same instruments as SAFIR SAFIRKEPCO Network in Kansa¨ı, and Tokyo, Japan SAFIRJWA Network in Tokyo, Japan SAFIR 3,000 JMA National Network for Meteorological Service, Japan SAFIR 3,000 IRM National Network for Meteorological Service, Belgium SAFIR KNMI National Network for Meteorological Service, Netherlands NM/ SAFIR 3,000 HMS National Network for Meteorological Service, Hungary NL SAFIR 3,000 SHMI National Network for Meteorological Service, Slovakia SAFIR MSS National Network for Meteorological Service, Singapore SAFIR 3,000 IMGW National Network for Meteorological Service, Poland SAFIR DIRIC Network in Paris area for Meteo France, France SAFIR 3,000 Sud-Est, Network in southeast France SAFIR CSG Network European Space Center (CSG), Kourou, French Guiana SAFIR DGA/CEL Network Defense Test Center (CEL), France SAFIR 3,000 Network for Institute of Meteorology, Hanover University, Germany SAFIR
SAFIR
NM/ NL
Surveillance et Alerte Foudre par Interférométrie Radioélectrique [http://www.eurostorm.com/]∗ [http://www.vaisala.com/]∗
ONERA-3D
NM/NL French National Agency for Aerospace Research (ONERA) Interferometric Three-Dimensional Mapper [http://www.onera.fr/]∗
LMA/ LMS
VHF/LF
VHF/LF
VHF/LF
VHF
NCLR
VLF/LF
VLF/LF
ALDS
LDAR
VLF/LF VLF/LF VLF/LF VLF/LF VLF/LF
Lightning Detection and Ranging [http://ghrc.msfc.nasa.gov/uso/readme/ldar.html] [http://www.nasatech.com/Briefs/Apr98/KSC11785.html]
Electromagnetic Radiation
NM
Acronym
Electric Field Only (E-field)
Lightning Property
Other Lightning Detection Networks using same instruments as NALDN (continued): NL French LDN [http://www.meteorage.com/ang/Accueil.htm] NL LPATS network in Israel NL Italian LDN [http://www.cesi.it/services/sirf/index.cfm] NL Japanese LDN [http://www.sankosha.co.jp/topics− n/frank.html] NL Nevada Automated Lightning Detection System [http://www.wrcc.sage.dri.edu/fire/ALDS.html] NL Nordic Center for Lightning Research (Denmark, Finland, Iceland, Norway, Sweden) http://thunder.hvi.uu.se/NCLR/NCLR.html] NL Slovenian LDN [http://observer.eimv.si/]
Instrument/Network and Class of System NL: Network Locator; NM: Network Mapper; SSC: Single-Station Counter; SSL: Single-Station Locator; WD: Warning Device ∗ Manufacturer’s literature for commercial product
Table 1. (Continued)
Other
(24,25)
(27)
(Continued)
(32–34)
(29–31)
(216,217)
(218) (219)
Reference
Aircraft Total Lightning Advisory System [http://ae.atmos.uah.edu/AE/ams− 1999b.html]
BLACKBEARD VHF Sensor on ALEXIS satellite [http://nis-www.lanl.gov/nis-projects/blackbeard/]
Defense Meteorological Satellite Program (DMSP) Operational Linescan System (OLS) [http://thunder.msfc.nasa.gov/ols/]
Fast On-orbit Recording of Transient Events satellite [http://forte.lanl.gov/]
Great Plains-1 [http://bub2.met.psu.edu/default.htm]
SSL
SSL
SSL
SSL
Thunder Bolt [http://www.spectrumthunderbolt.com/]∗
SSC
SSL
StrikeAlert Personal Lightning Detector [http://www.strikealert.com/]∗
SSC
SSC
Guardian Angel (also Sky ScanTM ) Lightning Detector [http://lightningdetector.com/angel/index.html]∗
SSC
Lightning Alert Lightning Detector [http://www.stormwise.com/]∗
Electrical Storm Identification Device/ASOS Lightning Sensor (TSS 924) [http://www.glatmos.com/products/local/localarea.html]∗
SSC
SSC
Cloud–Ground Ratio network of Flash Counters
Conference Internationale des Grands Reseaux Electriques Lightning Flash Counters [http://retd.edf.fr/gb/futur/mois/foudre/detect/a.en.htm]
SSC
Thunder Recording Employed in Mapping Branched Lightning Events [http://www.cs.unc.edu/∼stotts/145/OLD99/homes/tremble/]∗
NM
Instrument/Network and Class of System NL: Network Locator; NM: Network Mapper; SSC: Single-Station Counter; SSL: Single-Station Locator; WD: Warning Device ∗ Manufacturer’s literature for commercial product
Table 1. (Continued)
Electrostatic
VLF/LF
VHF/ Light GP-1
FORTÉ
VHF
VHF
VLF
VLF
VLF
Light
Light
VLF/LF
VLF
Event Trigger
Electromagnetic Radiation
DMSP/ OLS
BLACKBEARD
ATLAS
ESID/ALS
CIGRE
CGR3
TREMBLE
Acronym
Electric Field Only (E-field)
Lightning Property
Sound
Other
(56,57) (Continued)
(49–51)
(43,44)
(52,53)
(55)
(54), (155)
(42)
(39–41)
Reference
WD
WD
WD
WD
Thor Guard [http://www.thorguard.com/default1.asp]∗
SAFe Lightning Warning Network E-field network at European Space Center (CSG), Kourou, French Guiana
Lightning Warning System (SAFe) [http://www.eurostorm.com/]∗
SAFe
SAFe
EFMS
Electrostatic
Electrostatic
Electrostatic
ELF/ULF
Weather radars
SSL
Field Mill Networks The Launch Pad Lightning Warning System at CCAFS/KSC [http://www.tstorm.com/lplws.html] [http://ghrc.msfc.nasa.gov/uso/readme/kscmill.html] Electric Field Measurement System, Wallops Island [http://www.tstorm.com/wffefms.html]
VLF/LF
WD
VLF/LF
ELF/VLF
Strike Finder [http://www.strikefinder.com/stk.html]∗
ELF
Electrostatic
SSL
LPLWS
SOLLO
VLF/LF
SSL
Light
StormTracker Lightning Detection System [http://www.boltek.com/]∗
Long-Range Shipborne Detector
SOnic Lightning LOcation system [http://www.nasatech.com/Briefs/July00/KSC11992.html]
SSL
OTD
Light
SSL
Optical Transient Detector [http://thunder.msfc.nasa.gov/otd/] [http://ghrc.msfc.nasa.gov/uso/readme/otd.html]
SSL
LIS
VLF/LF
Electromagnetic Radiation
Storm Scope [http://www.gis.net/∼aviaelec/stormscope.htm]∗
Lightning Imaging Sensor [http://thunder.msfc.nasa.gov/lis/] [http://ghrc.msfc.nasa.gov/uso/readme/lis.html]
SSL
LDARS
Acronym
Electric Field Only (E-field)
Lightning Property
SSL
Lightning Detection and Ranging System [http://www.chinatech.com/lightning.htm]∗
SSL
Instrument/Network and Class of System NL: Network Locator; NM: Network Mapper; SSC: Single-Station Counter; SSL: Single-Station Locator; WD: Warning Device ∗ Manufacturer’s literature for commercial product
Table 1. (Continued)
Channel Ionization
Sound
Other
(35,37,38)
(60)
(59)
(58)
(45–47)
(31,34,48,163)
Reference
Locations given by TOA differences Several 1,000 km From <2 km in UK to from 5 receivers in UK, 1 in 15 km at service area Gibraltar, and 1 in Cyprus. limit; error increases Analysis of complete 2 to 23 kHz with range (VLF) waveforms performed
Continental-sized footprint ∼250 m
Aircraft-mounted VHF MDF receiver. Senses radiation given by initial breakdown of leader step into virgin air. Radiated power assumed to be same in each leader, so range given by attenuation of radiated power
ALEXIS-satellite-mounted VHF Instrument on (28 to 95 or 108 to 166 MHz) polar-orbiting detector. Operational 1993–1997 satellite
Locations given by MDF ∼60 km triangulation using VLF/LF peak signals from network of 6 IMPACT sensors. Time-domain waveform analysis used to separate CG from IC strokes
ATLAS (SSL)
BLACKBEARD (SSL)
CGLSS (NL)
Not provided
Not provided
Varies from 1.5% of the range at 100 km to 3% at 400 km
ATD (NL)
Network coverage over Nevada; mean range 400 km for each DF sensor
Locations given by MDF triangulation of VLF/LF peak signals from a network of receivers
<1 km owing to sensor spacing
Location Accuracy
ALDS (NL)
Network coverage over Austria; mean range 400 km for each DF sensor
Maximum Range
Locations given by MDF triangulation of VLF/LF peak signals from a network of 8 receivers
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
ALDIS (NL)
Lightning System (class)
Possible forest fire initiation
∼85% of CG strokes
∼98% of flashes; up to 74 strokes per minute
Not provided
Not provided
Paired pulses in IC
Initial leader step
CG, primarily
CG and IC distinguished; only CG reported, with individual stroke polarity within a flash identified
CG and IC distinguished; only CG reported, with individual stroke polarity within a flash identified
(Continued)
Rocket launch support CG and IC and personnel distinguished; only safety at Cape CG reported Canaveral Air Force Station/ Kennedy Space Center (CCAFS/KSC)
Study of transionospheric pulse pairs (TIPPs)
Identification of the origin of lightning flashes in cloud, a hazard to aviation
>90% of storm areas; maximum Locate thunderstorms of 400 flashes/h over ranges of several 1,000 km
Operational weather forecasting support; locations of potential lightning damage
Applications
>90% of flashes, but appreciably <90% for individual CG strokes
Detection Efficiency
Stroke Type CG: cloud– ground IC: intracloud
Table 2. Approach, Application, Detected Stroke Type, Maximum Range, Accuracy, and Detection Efficiency for the Lightning Sensing Systems and Classes Listed in Table 1. The Data Presented are From Peer-Reviewed Work Where Possible. As in Table 1, If Marked with an Asterisk (∗ ), then the Information for a Commercial Product is Uncritically Provided Based on Documentation Given by the Manufacturer
CLDN (NL)
Locations given by sensors using MDF triangulation and TOA differences of VLF/LF peak signals. Time-domain waveform analysis used to separate CG from IC strokes. Network has 55 LPATS-IV TOA E-field sensors and 26 IMPACT ESP hybrid MDF/TOA sensors
Locations not given
Locations not given
Location Accuracy Determination of lightning climatology, as given by density of lightning strikes
Determination of lightning climatology, as given by density of lightning strikes
For 500-Hz counter, ∼80% within 40 km; for 10-kHz counter, 95% within 20 km
Applications
Essentially 100% within effective area
Detection Efficiency
Median position accuracy Flash detection efficiency 80% to Operational weather Coverage over 90% for peak currents >5 kA. ∼500 m over network. forecasting support; 9 million km2 area of Canada; Subsequent stroke detection Bearing error for locations of each IMPACT efficiency 50%. Up to 15 individual IMPACT potential lightning ESP sensor strokes per flash reported. ESP sensors <1° damage; lightning detects in 90 to More than 98% of strokes warning for outdoor 1,000 km range correctly characterized as CG events. Added to or IC. Stroke polarity NLDN to form assigned OK for 99.5% of LRLDN and strokes out to 600 km. Some NALDN indication that positive return strokes < 10 kA are misidentified IC strokes
Detects a lightning discharge if Effective range electric field changes sufficiently; (in which all omnidirectional 500-Hz or flashes that 10-kHz antenna used occurred are detected) is 20 to 30 km; maximum range up to 3 times greater
CIGRE (SSC)
Maximum Range
Detects a lightning discharge if Effective area (in electric field changes sufficiently; which all omnidirectional antenna used flashes that occurred are detected) depends on height of cloud top, latitude, and stroke polarity; varies from 315 to 442 km2
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
CGR3 (SSC)
Lightning System (class)
Table 2. (Continued)
(Continued)
CG and IC distinguished; individual CG stroke polarity within a flash identified. LPATS-IV sensor can distinguish inter- and intracloud strokes
CG and IC, combined
CG and IC, combined
Stroke Type CG: cloud– ground IC: intracloud
Wallops Island, Virginia, launch pad area
Nominally, up to 50 km; up to 80 km for intense CG or IC strokes
Network of 10 to 16 field mills sampling from dc to 10 Hz and electric field change meters sampling from 10 Hz to 2 kHz. Their time-varying values over the region alert forecaster to possible lightning hazard
Optical and VLF/LF radiation received within milliseconds by single omnidirectional instrument triggers lightning detection; range within 1 of 3 intervals given by comparison of optical and radio signals
Satellite-mounted VHF and optical sensors; simultaneous observations by these sensors best identifies a lightning flash occurrence. Operational 1997 to date
EFMS (WD)
ESID/ALS (SSC)∗
FORTÉ (SSL)
Detection Efficiency
Instrument on polar-orbiting satellite for nuclear test ban treaty verification
Detection within field of view of diameter 1,200 km; samples an area for ∼15 min
Range resolved into 0 to 8 km, 8 to 16 km, and 16 to 50 km intervals
Similar to LPLWS
Not yet determined, as correlating optical and VHF signals is made difficult by light scattering in cloud
Reports more than local human observers. Near 100% flashes reported within 16 km when at least 3 discharges; near 90% for CG within 16 km. Thunderstorm reported by ALS if 2 strokes in 15-minute period; thunderstorm detection has < 0.5% false alarm rate
Similar to LPLWS
Few hundred km, Within a few km for Near 100% for storms in within network storms within the array network coverage area coverage area
Measures electric field change in 300-Hz to 300-KHz (ULF through LF) band. When above trigger level, locations given by TOA differences. Networks of 5 sensors in New Mexico/Texas, 5 in Florida, 1 in Nebraska
EDOT (NL)
Location Accuracy
Footprint on the Up to 100 km error due to Often < 2% (dawn, dusk and earth of 700 to manual digitization, local midnight only) 1,360 km, with navigation, and width global coverage of lightning streak; imagery of 2.7 km resolution
Maximum Range
Optical instrument mounted on DMSP polar-orbiting satellites that identifies a lightning flash from optical signature. Data available from 1973, 1977, 1978 and 1985–1996
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
DMSP/OLS (SSL)
Lightning System (class)
Table 2. (Continued)
All types; not separated
CG and IC, combined
All types, combined
Development of lightning climatologies and study of lightning processes in clouds
(Continued)
Stepped leaders, dart leaders, as well as CG and IC, distinguished
ESID used for CG and IC, personnel safety distinguished and equipment protection; ALS part of Automated Surface Observing System (ASOS) in United States
Provides detection of flashes or warning of possible triggering of lightning during rocket launches
Ground truth for lightning detections by the FORTÉ satellite
Research studies of NOx budgets and transport, as lightning is a significant natural source of NOx , and of the natural variability of thunderstorms
Applications
Stroke Type CG: cloud– ground IC: intracloud
Location up to 400 km, depending on bearing; bearing up to 500 km
Covers the 1,000,000 km2 area of Australia
Up to 70 km
Network of 6 VLF/LF (2 kHz-500 kHz) LPATS-III E-field sensors in Australia that locate lightning via TOA techniques
Single portable omnidirectional receiver that estimates range based on received signal at pairs of VLF/LF frequencies
Locations given by TOA differences Up to 300 km in of discrete broadband VHF (63 to horizontal, but 69 MHz) peak signals from a from 2 km to network of 7 receivers. Channel 16 km in mapped using tips of negative vertical leader steps above 2 km; ground strike point given by CGLSS
GPATS (NL)∗
Guardian Angel/ Sky ScanTM (SSC)∗
LDAR (NM)
Maximum Range
Single VLF/LF (1 to 400 kHz) receiver. Bearing via MDF and range via application of signal propagation model relating peak of received magnetic wave intensity to distance. Time-domain waveform analysis used to separate CG from IC strokes
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
GP-1 (SSL)
Lightning System (class)
Table 2. (Continued)
Detection Efficiency
Operational weather forecasting support; locations of potential lightning damage; lightning warning for outdoor events Warning of approaching thunderstorms for outdoor events
Similar to other LPATS TOA systems: >95% of CG strokes detected within a specified range
∼97% in the 0 to 5 km bin
Detection of coherent lines and regions of thunderstorms and electrically active convective cells
Applications
CG and IC, combined
CG and IC distinguished
CG and IC distinguished; only CG used
(Continued)
3-D locations of up to 100 pulses Rocket launch support Negative leader steps Median error of 3-D in each flash. At < 25 km and personnel produced in impulsive locations 50 to 100 m range, >99%. In a separate safety at Cape VHF radiation; within the sensor study, estimates of flash Canaveral Air Force cannot detect positive baseline of 10 km when detection efficiency >90% in Station/Kennedy leaders. CG and IC stepped leader 90–100 km range, but < 25% Space Center stroke locations originates >3 km, at 200 km range (CCAFS/KSC) in inferred from channel increasing steadily to Florida. Some orientation 900 m at 40 km. warning of CG Standard deviations of strokes possible, as source locations detects precursor IC increase from 4 km at strokes 60 km range to 30 km at 260 km range, and of flash locations are ∼1 km at 50 km range and remain within ±7 km out to 250 km range. Maximum of 104 samples per second
Range resolved into 0 to 5 km, 5 to 15 km, 15 to 35 km and 35 to 70 km intervals
Mean location error better than 200 m
Best ranging between 100 Not provided and 300 km from receiver. Mean absolute location error of ∼50 km
Location Accuracy
Stroke Type CG: cloud– ground IC: intracloud
Satellite-mounted optical Satellite covers instrument on Tropical Rainfall only half the Measuring Mission (TRMM) earth centered satellite that identifies a on equator lightning flash from optical signature. Operational from 1997 to date
Locations given by TOA differences Up to 100 km in of VHF (60 to 66 MHz) peak horizontal, but signals from a deployable above 2 km in network of 10 sensors over a vertical 60 km diameter area
Single-station system that finds 1,000 to 3,000 km 5% uncertainty for range, Not provided range using spectral components few degrees for bearing of ELF (< 1 kHz) and VLF (∼1 to 10 kHz) signals and finds bearing using Poynting vector based on covariances between temporally varying horizontal magnetic and vertical electric fields
LMA/LMS (NM)
Long-Range Shipborne Locator (SSL)
50 to 100 m rms errors over the 60 km diameter area of network
Detection within field of view that spans a 600 × 600 km area; samples one storm ∼90 s. Spatial resolution 4 to 7 km. Location accuracy better than 10 km. Near 100%, as with LDAR
∼90% owing to excellent filtering of sun glint and reflected visible light
Up to 600 km; Range estimates reflected Not provided; will vary with range increases in rate at which antenna siting with antenna warning light flashes height above ground
LIS (SSL)
Not provided
Detection Efficiency
Single omnidirectional VLF receiver triggers on signal above a threshold; both portable and fixed models available
Bearing within 1 ° and range within 1 km
Location Accuracy
Lightning Alert Detector (SSC)∗
Maximum Range
Single VLF/LF receiver that locates Up to 250 km for radiation source using phase and CG, 60 km for time differences of the received IC electromagnetic waves
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
LDARS (SSL)∗
Lightning System (class)
Table 2. (Continued)
CG and IC, combined
Lightning detection from a ship for thunderstorm climatological studies
Field programs studying lightning characteristics
(Continued)
CG and IC; not separated
Leader steps; CG and IC strokes inferred from channel orientation
Lightning climatology; All types, combined comparison of cloud lightning signature with that given by ground-based sensors
Warning of approaching thunderstorms
Lightning warning for CG and IC aviation and other distinguished, with regional individual stroke transportation polarity within a flash authorities identified
Applications
Stroke Type CG: cloud– ground IC: intracloud
(Continued)
CG and IC distinguished; only CG reported over network, with individual stroke polarity within a flash identified. CLDN can distinguish intercloud and intracloud strokes using LPATS-IV sensors
Locations given by sensors using Coverage over Median position accuracy Flash detection efficiency 80% to Operational weather MDF triangulation and TOA 90% for peak currents >5 kA. United States forecasting support; ∼500 m over network. differences of VLF/LF peak Subsequent stroke detection (NLDN) & locations of Bearing error for signals. Time-domain waveform efficiency 50%. Up to 15 Canada potential lightning individual IMPACT analysis is used to separate CG strokes per flash reported. (CLDN); each damage; lightning ESP sensors < 1 ° from IC strokes. NLDN has 59 More than 98% of strokes IMPACT ESP warning for outdoor LPATS-III TOA E-field sensors correctly characterized as CG sensor detects events and 47 IMPACT hybrid or IC. Stroke polarity in 90 to MDF/TOA sensors. CLDN has 55 assigned OK for 99.5% of 1,000 km range LPATS-IV TOA E-field sensors strokes out to 600 km. Some and 26 IMPACT ESP hybrid indication that positive return MDF/TOA sensors strokes < 10 kA are misidentified IC strokes
NALDN/ NLDN (NL)∗
All types; not separated
CG and IC distinguished. LPATS-IV sensor can distinguish inter- and intracloud strokes
CG and IC distinguished; only CG reported, with individual stroke polarity within a flash identified
Provides detection of flashes or warning of possible triggering of lightning during rocket launches at CCAFS/KSC
Operational weather forecasting support; locations of potential lightning damage; lightning warning for outdoor events
Applications
Long-range network operated by Up to 2,000 to Median location accuracy Only lightning flashes with peak Lightning detection GAI. Uses data from the 4,000 km. ∼5 km when lightning currents >30 kA detected. over oceans in combined MDF/TOA NALDN Location is located between Overall peak detection support of aviation sensors, reprocessed to use accuracy and subsets of sensors and efficiency in a test region over sky-wave signals for lightning detection propagation paths United States 10–25% during detection over the North Atlantic efficiency between 1,200 and the night and a few percent and Pacific Oceans studied over 1,600 km. Limited to 16 during the day. For peak 1,200 to to 32 km over 2,000 to currents >45 kA, detection 1,600 km range 4,000 km range efficiency in a 24-hour period approaches 50%
Flash detection ∼90%
>95% CG detection within a specified range
Detection Efficiency
Stroke Type CG: cloud– ground IC: intracloud
LRLDN (NL)
Median charge location errors of 2 to 20 km
Network of 31 field mills over a ∼20 km outside 20 × 20 km area sampling network electric field potential gradient at 50 Hz. Contours of gradient in coverage area alerts forecaster to possible lightning hazard. Flash inferred if sufficient E-field gradient
LPLWS (WD)
Location Accuracy Median accuracy better than 500 m
Maximum Range
Network of VLF/LF (2 kHz to Coverage over 500 kHz) E-field sensors that network; each locate lightning using TOA sensor detects techniques. Part of NALDN; up to several portion in Canada (CLDN) uses hundred km series IV sensors that distinguish intercloud and intracloud strokes. Series III sensors used in NLDN and in GPATS
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
LPATS (NL)
Lightning System (class)
Table 2. (Continued)
Maximum Range
Location Accuracy
Scattering of radar beam off ionized Nearby radar up Resolution depends on lightning channel; from to a few 100 km sampling strategy nonscanning or dual polarization radar
Radar (SSL)
Detection within field of view that spans a 1, 300 × 1, 300 km area; 8 km nadir resolution. Ground range errors typically 20 to 40 km, with median of 50 km
Satellite-mounted optical instrument on MicroLab-1 satellite that identifies lightning flash from optical signature. Operational 1995–2000 Low-earthorbiting satellite in a 70° -inclination orbit covers any spot on earth twice per day
ITF triangulation of narrowband Nominal Varies with receiver VHF (at a selectable center maximum spacing: from 500 m frequency in the 110 to 118 MHz range rms for 30-km spacing band in 1 MHz bandwidths) ∼100 km. Plan to 750 m rms for signals from pairs of receivers view depiction 120 km spacing. provides phase differences and so at large ranges, Temporal resolution direction to source; triangulation 3-D mapping at 10 µs. Maximum of 4,000 samples per using these radials allows 3-D closer ranges second. Bearing error mapping of channel. Stroke type < 0.1 ° for each and ground strike points given by interferometer in LF radiation techniques network
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
OTD (SSL)
ONERA-3D (NM/NL)∗
Lightning System (class)
Table 2. (Continued)
Varies with beam width and amount of precipitation
Cloud–ground detection efficiency 40% to 69% based on comparison with NLDN observations
Detection rate ∼95% within 200 km range for low-elevation sources
Detection Efficiency
Case studies relating occurrence of flash to rain gush
(Continued)
All types, combined
Lightning climatology; CG and IC, combined global flash rate found to be 40 strokes per sec; lightning mostly over land
Originally, launch All stages of breakdown support and in lightning process, personnel safety for especially negative European space leader steps. Best program, followed detects continuous by many commercial VHF noise associated and weather service with K-processes or users. Some dart leaders. CG and warning of CG IC stroke type and strokes possible, as polarity inferred from detects precursor IC channel orientation strokes and LF wave analysis
Applications
Stroke Type CG: cloud– ground IC: intracloud
ITF using five acoustic sensors Up to 1.6 km provides 3-D direction to source. TOA differences between electric and acoustic waves provide range to source; one electric field sensor.
SOLLO (SSL)
Detection rate >90% within ∼250 km range for all lightning types, or within 400 km for high-elevation sources
Close to 100% up to 10 km
Detection Efficiency
Median position accuracy Not provided <5 m for strokes within 1 km
ITF triangulation of narrowband Nominal Varies with receiver VHF (at a selectable center maximum spacing: from 500 m frequency in the 110 to 118 MHz range rms for 30-km spacing band in 1 MHz bandwidths) ∼250 km, but to 750 m rms for signals from pairs of receivers up to ∼400 km, 120-km spacing. provide phase differences and so depending on Temporal resolution direction to source; triangulation sensor baseline 10 µs. Maximum of 4,000 samples per using these radials allows 2-D or and elevation. second. Bearing error 3-D mapping of channel. Stroke Plan view < 0.1 ° for each type and ground strike points depiction at interferometer in given by LF radiation techniques large ranges; network 3-D mapping at closer ranges
No location provided
Location Accuracy
SAFIR (NM/NL)∗
15 km for cumulonimbus without lightning; 30 km for cumulonimbus with lightning
Maximum Range
Analysis of cumulonimbus electrostatic field evolutions for estimation of lightning hazard
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
SAFe (WD)∗
Lightning System (class)
Table 2. (Continued)
Static field; both IC and CG
Provides strike point on launch pad at CCAFS/KSC to determine if inspections for damage needed
(Continued)
CG and IC strokes producing thunder detected; CG prime interest
Originally, launch All stages of breakdown support and in lightning process, personnel safety for especially negative European space leader steps. Best program in French detects continuous Guiana, followed by VHF noise associated many commercial with K-processes or and weather service dart leaders. CG and users (France, IC stroke type and Belgium, polarity inferred from Netherlands, channel orientation Germany, Hungary, and LF wave analysis Slovakia, Poland, Italy, Singapore, Japan). Some warning of CG strokes possible, as detects precursor IC strokes
Lightning hazard warning
Applications
Stroke Type CG: cloud– ground IC: intracloud
Up to 400 km
Up to 500 km
Up to 70 km
Up to 400 km
Single VLF/LF receiver. Bearing via MDF and range via application of signal propagation model relating received signal intensity to distance
Single portable omnidirectional receiver triggers on VLF/LF radiation above a threshold; estimates whether storm is approaching or receding
Aircraft-mounted sensor that finds bearing via MDF and range using a function of received electromagnetic signal intensity
Storm Tracker (SSL)∗
StrikeAlert Lightning Detector (SSC)∗
Strike Finder (SSL)∗
Maximum Range
Aircraft-mounted sensor that finds bearing via MDF and range using a function of received electromagnetic signal in a narrow frequency band and the correlations between the electric and magnetic field signals.
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
Storm Scope (SSL)∗
Lightning System (class)
Table 2. (Continued)
Not provided
Range resolved into 0 to 10 km, 10 to 20 km, 20 to 40 km and 35 to 70 km intervals
Not provided
Not provided
Location Accuracy
Not provided
Not provided
Not provided
Not provided
Detection Efficiency
Identification of lightning flashes in cloud, a hazard to aviation
Warning of approaching thunderstorms for outdoor events
Warning of approaching thunderstorms
Identification of lightning flashes in cloud, a hazard to aviation
Applications
(Continued)
CG and IC, combined
CG and IC, combined
CG
CG and IC, combined
Stroke Type CG: cloud– ground IC: intracloud
Up to 100 km
Single portable omnidirectional receiver triggers on VLF/LF radiation above a threshold; estimates whether storm is approaching or receding
Locations given by TOA differences Within 15 km from a network of 3 to radius of each 6 microphones; event timing recording triggered by radio detection of station lightning signal. 3-D mapping of lightning channel provided
Thunder Bolt Lightning Detector (SSC)∗
TREMBLE (NM)∗
Maximum Range Up to 8 to 20 minutes warning; proximity sensor
Approach ITF: interferometry MDF: magnetic direction finding TOA: time of arrival
Electric field change meter that uses the character of these changes to estimate lightning hazard probability
Thor Guard (WD)∗
Lightning System (class)
Table 2. (Continued)
∼100% flash detection if storm occurs within 5 minutes; unit displays strike probability
Not provided
Detection Efficiency
Not provided; will depend Not provided; will depend on on microphone spacing number of simultaneous detections by at least four microphones
Within 5% of range at 50 km and within 10% of range at 100 km
Not provided
Location Accuracy
Visualization of lightning channel for thunderstorm research
Warning of approaching thunderstorms for outdoor events
Warning of lightning risk for outdoor events
Applications
Leader steps; CG and IC strokes inferred from channel orientation
CG and IC, combined
CG and IC, combined
Stroke Type CG: cloud– ground IC: intracloud
the United States Department of Energy (DOE); and the BLACKBEARD instrument on the Array of Low Energy X-ray Imaging Sensors (ALEXIS) satellite (52,53) that provided data from 1993 to 1997 and is operated by LANL. Limited ranging may be accomplished by using functions of the electromagnetic wave radiated by a lightning stroke, as done by a host of single-station detectors (54–58); by combining measurements of the electromagnetic signal with those of the acoustic signal (thunder), as done, for example, in the SOLLO system currently under development at CCAFS/KSC (59); or by using the fact that the heated and highly ionized lightning channel reflects a radar signal (60). As summarized in Table 2, estimates of the range to the lightning that are given by a particular system may vary from local to global according to both the particular aspect of the radio or optical signal that is detected and the intended application of the information. Lightning sensing systems have important applications in meteorology. Certainly, these systems can help to detect and forecast the arrival of thunderstorms by monitoring their positions, motions, and intensities. Such systems are also important for lightning safety and for protecting natural resources in a diverse set of real-time applications, including fire and personnel safety, aircraft routing, missile launch decisions, power grid operation, and weather forecasting (1,11,13). Lightning is a significant threat to people (Fig. 1) but is far too often underrated. Lightning is ranked the #2 source of direct weather deaths in the United States and kills an average of about 100 people each year, more than hurricanes and tornadoes combined. Lightning inflicts lifelong severe
injuries on many more, about 750 people each year in the United States. Lightning also causes about $5 billion in damage in the United States each year (61–63). More information on lightning safety is available at the Lightning Safety web page of the 45th Weather Squadron, USAF (64). Lightning sensing systems, especially those that detect intracloud lightning, are being investigated as tools for predicting severe thunderstorms, including tornadoes, microbursts, and hail (47,65–67). Lightning data are also being investigated as possible input for smallscale numerical weather forecast models to supplement the usual observations of pressure, temperature, humidity, and winds. This article is not an exhaustive survey of all real-time and research lightning location systems, but it provides a comprehensive overview of the topic. It is split into two major parts; the first reviews the many important concepts and defines the terminology used to describe them, and the second covers the different major types of operational lightning location systems by describing an example of each in detail. These sections refer to one another but can be read separately, as many readers may wish to learn about certain aspects of the lightning location problem quickly and so may not want to read the entire article. This approach to presenting the material has led to redundancy in some cases, but the repeated information is retained to benefit the reader. There are many excellent books that describe lightning characteristics and lightning location methods in greater detail than presented here (22,68–77). Many of the lightning facts herein are documented thoroughly in these books and in the articles they reference. Other important sources cited here include papers from conference preprint volumes as supplements to the numerous papers published in the reviewed literature. Finally, the methods and detection systems reviewed in this article are also described on various World Wide Web (www) sites. Table 1 lists the addresses for many of these sites, enclosed in brackets, recognizing that www addresses change frequently or that some web sites are not kept as current as users would like. CLASSES OF LIGHTNING SENSING SYSTEMS
Figure 1. Cloud–ground lightning stroke near Lane Stadium in Blacksburg, Virginia, just before the start of the Black Coaches Association Bowl on 27 August 2000. More than 56,000 fans were in the stadium at the time. They were evacuated and the game was postponed to a later date (Photograph by Eric Brady reproduced with the permission of The Roanoke Times in Roanoke, Virginia). See color insert.
Real-time lightning sensing systems fall into four broad classes: locators, direction finders, flash counters, and warning systems (Tables 1 and 2). Lightning locating systems may themselves be separated into two major types: network locators (NL) that provide a two-dimensional plan view that gives the coordinates at the earth’s surface at which cloud–ground strokes are occurring and network mappers (NM) that provide a mapping view that gives the spatial coordinates within the atmosphere for the entire lightning channel in three dimensions. Depending on the selected radio receiver bandwidth, or breadth of accepted frequency content, plan view systems can provide coverage varying from local to global (Table 2). In contrast, ground-based mapping systems are restricted to local coverage, because they depend on line-of-sight reception of very high frequency (VHF, 30 MHz to 300 MHz) emitted electromagnetic radiation.
Lightning Direction Finders
Lightning direction finders (DF), the oldest type of electronic lightning sensing systems (1), give only the bearing from which the lightning signals originate. DF systems use magnetic direction finding (MDF), time of arrival (TOA), or interferometric (ITF) techniques. When used in concert, DF systems form networks of lightning locators or lightning mappers (see later). Magnetic Direction Finding (MDF). As illustrated in Fig. 2, the magnetic direction finding technique finds the tangent of the bearing angle θ that is given by the ratio of the peaks ew and ns of the two horizontal components EW and NS of the magnetic field (B-field) signal. These components are received by two vertical and orthogonal, crossed-loop antennas, or B-field antennas (1;5, p. 370; 6,11), that are typically aligned in the east–west (EW) and north–south (NS) planes. Because the value of θ determined this way from the B-field has an inherent bearing ambiguity of π radians (180°), an electric field (E-field) sense antenna, often called simply an E-field antenna, is included with the crossed-loop antennas to determine stroke polarity (see later); by knowing the polarity given by the E-field measurement, one can easily determine whether the signal arrived from quadrants 1 or 4 or from quadrants 2 or 3 (11).
Generally, MDF systems have ranges up to a few hundred kilometers and, depending on the passband of the receiver chosen, can produce bearing errors ranging from 1° to more than 10°. Large bearing errors are found predominantly in narrowband receiving systems (78–80;81, pp. 184–185), which were the type first used for radio direction finding in the 1920s (2). Because these systems detect the signal from all parts of the lightning channel, which can branch considerably above ground (74, p. 357), the bearing errors are related to multipath propagation caused by nonvertical channel orientation (5, p. 370) that leads to sky-wave reflections off the ionosphere (81, pp. 184–185) (see later). In contrast, wideband receiving systems that resolve the initial peak b of the return-stroke magnetic field and that use time-domain waveform analysis to separate cloud–ground from intracloud discharges (11) (see later) have accuracies of the order of 1° because they use only the single ground-wave signal radiated from the vertically oriented cloud–ground stroke channel near where it contacts the ground (7,82).
Figure 2. Plan view of the crossed-loop antenna array used for magnetic direction finding (MDF). The radiation pattern of each loop provides a signal level proportional to the cosine of the bearing θ to the source indicated by the lightning bolt. If the reference plane is assigned to one loop, such as the north–south loop that is appropriate for navigational bearings used by meteorologists, then the east–west loop signal level is proportional to the sine of the bearing relative to the reference plane. Because the loop signal voltage is proportional to the rate of change of the intercepted magnetic field, integrators in each loop-processing channel provide the level of the loop signal. NS and EW provide the corresponding loop signal levels that are used to calculate the incoming time-domain magnetic field B using standard trigonometry. Unless the electromagnetic field polarity is known, the signal bearing is ambiguous. In networked systems, however, only the bearings given by intersecting radials are the correct ones (Fig. 3). In single-station locators (see later), the field polarity can be found from the polarity of the initial peak of the voltage field by using a separate voltage field (E-field) sense antenna. The initial peak levels, ns and ew, of each vector are used for bearing estimation. The navigational bearing to the source is found as the arctangent of the ratio of the pair of peak signal levels, ew/ns. The stroke waveforms from Fig. 6, south south-east (SSE) of Lawrence, Kansas are used in this example. With the value of θ increasing clockwise from due north, the NE quadrant is 1, the SE quadrant is 2, the SW quadrant is 3 and the NW quadrant is 4. As the SE quadrant is quadrant 2 because ew > 0 and ns < 0, π radians (180 ° ) must be added to the calculated arctangent to give a bearing of 152 ° . Finally, the peak level b of B that can be used for range estimation via a signal propagation model is given by b = ns cos(θ) + ew sin(θ), once the value of θ is calculated.
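The bearing arithmetic in Figure 2 is easy to express in code. The following minimal Python sketch is illustrative only and is not drawn from any system described in this article; the function name and the sample peak values ns and ew are invented, chosen so that the quadrant-2 case works out to roughly the 152° bearing of the caption's example.

```python
import math

def mdf_bearing_and_peak(ns, ew):
    """Estimate the navigational bearing (degrees clockwise from north) and
    the peak field level b from the peak crossed-loop signal levels ns and ew,
    whose signs are assumed already resolved by the E-field sense antenna."""
    # atan2(ew, ns) is equivalent to arctan(ew/ns) plus the quadrant correction
    # (adding pi when the source lies in quadrants 2 or 3).
    theta = math.atan2(ew, ns)                 # radians, clockwise from north
    bearing = math.degrees(theta) % 360.0
    # Peak level b of the incoming field B, usable for range estimation
    # via a signal propagation model.
    b = ns * math.cos(theta) + ew * math.sin(theta)
    return bearing, b

# Invented peak values chosen to mimic the SSE example in Figure 2
# (ew > 0 and ns < 0, i.e., quadrant 2).
bearing, b = mdf_bearing_and_peak(ns=-0.88, ew=0.47)
print(f"bearing ~ {bearing:.0f} degrees, peak level b ~ {b:.2f}")
```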
Time of Arrival (TOA). The time of arrival (TOA) method for locating lightning is based on the fact that, if the source is on or near the earth’s surface, then the difference in times at which a signal is detected by a pair of stations, when multiplied by the speed of light, is described spatially
[Figure 2 diagram: magnetic crossed-loop direction finder. North–south loop signal ∼ B cos θ; east–west loop signal ∼ B sin θ; θ = arctan(ew/ns) in quadrants 1 and 4, and θ = π + arctan(ew/ns) in quadrants 2 and 3.]
Figure 3. Illustration of magnetic direction finding (MDF) triangulation and time-of-arrival (TOA) methods for locating where a cloud–ground lightning stroke has occurred, as given by the lightning bolt in the center. Stations A and B to the west and north of the strike location use the MDF technique to define the radials, given by the dashed lines, along which the electromagnetic signal from the lightning travels (Fig. 2). The TOA method applied at each station pair (C, D) and (D, E) to the south of the strike location is based on the fact that a hyperbola defines the locus of points from which a ground-based signal could have originated when the difference in the arrival times at a station pair is the same; this fact relies on the good assumption that radio waves emanating from lightning propagate at the speed of light. Each station is at a focus of a hyperbola. Because the pair of foci defines two possible hyperbolas, the station that receives the signal first gives the particular hyperbola used. Here, stations C and E detect the lightning signal before station D, giving hyperbolas closer to C and E. In both methods, the intersection of the radials or the hyperbolas provides an estimate of the strike location; combining the two methods via a least-squares optimization approach, as done in a hybrid lightning location network (see later), provides the best estimate.
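The hyperbolic TOA geometry in Figure 3 can also be illustrated numerically. The Python sketch below is a toy example under stated assumptions: three hypothetical stations on a flat plane, an invented strike point, and a coarse grid search in place of the geodetic least-squares solutions used by operational networks. It simulates the arrival-time differences a network would measure and then recovers the location whose predicted differences fit best, which is the point near the intersection of the hyperbolas.

```python
import math

C_KM_PER_S = 299_792.458   # speed of light; sferic ground waves travel at ~c

# Hypothetical station coordinates (x, y) in kilometres on a flat plane.
STATIONS = [(0.0, 0.0), (120.0, 10.0), (60.0, 140.0)]

def arrival_times(x, y):
    """Time (s) for a ground-wave signal from (x, y) to reach each station."""
    return [math.hypot(x - sx, y - sy) / C_KM_PER_S for sx, sy in STATIONS]

# Simulate a stroke at a known point; keep only the measurable quantities,
# the differences in arrival time relative to the first station (TDOAs).
true_x, true_y = 70.0, 45.0
t = arrival_times(true_x, true_y)
measured_tdoa = [ti - t[0] for ti in t[1:]]

# Each TDOA constrains the source to a hyperbola with the station pair at its
# foci; a coarse grid search finds the point nearest their intersection.
best_err, best_xy = float("inf"), None
for xi in range(0, 201):            # 1-km grid, 0 to 200 km in each direction
    for yi in range(0, 201):
        p = arrival_times(float(xi), float(yi))
        predicted_tdoa = [pi - p[0] for pi in p[1:]]
        err = sum((a - b) ** 2 for a, b in zip(predicted_tdoa, measured_tdoa))
        if err < best_err:
            best_err, best_xy = err, (xi, yi)

print("estimated strike point:", best_xy, "km; true point:", (true_x, true_y))
```

The brute-force grid is used only to keep the illustration self-contained; real networks combine many stations and solve the equivalent minimization analytically or by iterative least squares, often blending TOA with MDF radials as described in the caption.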
by a two-dimensional hyperbola, with the stations at its foci (1,83). A hyperbola describes the locus of points whose difference in distances from the foci is constant, as for the station pairs (C, D) and (D, E) in Fig. 3 south of the lightning strike point. The sensors in the original TOA network in the United States, operated by Atmospheric Research Systems, Inc. (ARSI) (13,14), as well as those in other networks currently operated throughout the globe by Global Position and Tracking Systems Pty. Ltd. (GPATS) (21), are commonly called Lightning Position and Tracking Systems (LPATS) sensors (Table 2). Interferometry (ITF). An interferometer is an example of a DF system employing antennas that combine both the magnetic and voltage component of the incoming field (Fig. 4) (84). Interferometry (ITF) measures the direction angle θ of an incoming signal using the difference in phase φ between the radio signals received by closely spaced pairs of receivers. When a pair of receivers is spaced at a distance D that is equal to half the wavelength λ
of the sampled radiation, φ is given simply by π cos θ (Fig. 4). A direction angle θ perpendicular to the baseline of the interferometer, θ = π/2, is indicated by a phase φ = 0. A phase of φ = π occurs when θ = 0 and a phase of φ = −π occurs when θ = π . Because of the unavoidable consequence of having to calculate a bearing from the phase using the arccosine function, a signal arriving from a bearing in the upper half plane of the figure (+θ ) produces the same phase as a signal arriving from the lower half plane (−θ ). Additional bearing ambiguities occur when the sensor spacing D > λ/2 (see later) (22, p. 155). Because interferometers are used with triangulation in network systems, however, this uncertainty does not pose a problem when the interferometers are distributed wisely or use different frequencies. Depending on the separation between the receiver pairs, systems such as SAFIR (see later) that determine bearing using interferometry can operate in either plan view NL or three-dimensional NM modes (22, p. 155; 24). Single-Station Systems Lightning warning systems (WD) measure the thunderstorm electric field or lightning precursors within a few tens of kilometers of the instrument. An example of such a system is the Thor Guard instrument that warns of imminent lightning when the magnitudes of the temporal variations in the electric field near the earth’s surface exceed their fair-weather values. Vaisala Dimensions SA also manufactures an early lightning-warning system called SAFe that uses electric field measurement; a network of SAFe units is operated at the European Space Center in Kourou, French Guiana, to aid launch decisions. Single-station flash counters or proximity sensors (SSC) indicate that lightning is occurring within a few tens or hundreds of kilometers of the receiver but are omnidirectional because they provide no bearing to the source. Some of these instruments may provide limited distance information, usually divided into three or four range-bins (Table 2). For example, both the solarpowered Electrical Storm Identification Device (ESID) (see later) and the similar Thunderstorm Sensor Series (TSS924) device, which is also called the ASOS Lightning Sensor (ALS) because it is the instrument used on some Automated Surface Observing System (ASOS) weather stations operated by the United States National Weather Service (NWS), are proximity sensors that report the occurrence of lightning in three distance ranges from the station (54). These instruments use both the received radio and optical signals from a possible flash to identify it unambiguously as lightning. An early single-station proximity sensor marketed by the A.D. Little corporation is used at CCAFS/KSC, but it has low detection efficiency, and so has been supplanted by more advanced systems such as CGLSS (see later) and LDAR (see later). Many inexpensive detectors, some of which are listed in Table 2, are aimed at the consumer market to warn of nearby or approaching lightning. Higher precision flash counters such as CGR3 and CIGRE are often used in thunderstorm research projects, and such instruments have provided much of what is known about lightning phenomenology (39–42;74, Chap. 2).
Figure 4. Simplified very high frequency (VHF) interferometer (ITF) direction finder. In this special case, the pair of receiving antennas is spaced apart a distance D = λ/2, where λ is the wavelength of the incoming signal. Here, λ is the wavelength of the center frequency chosen by the pair of selective receivers, usually in the VHF range of 30 to 300 MHz; this passband is indicated schematically by the middle of the three sine waves at each receiver. The pair uses a common local oscillator to maintain phase coherency. The receiver pair output phase difference φ is at a maximum of π radians (180°) when the signal bearing is from the right, parallel to the vertical antenna pair plane, falls progressively to zero as the bearing reaches the normal to the plane, then decreases to −π when the bearing is again parallel to the plane but with signals arriving from the left, the opposite direction. Like magnetic direction finders, interferometers experience directional ambiguity because the phase difference does not discriminate between bearings from either side of the antenna plane. The phase detector driven by the receiver pair provides a signal voltage proportional to φ. This voltage is then a function of the direction angle θ. The magnitude V of the output voltage for a given direction angle depends on antenna gain, receiver sensitivity, and phase detector characteristics, grouped together as a signal transfer constant G. Because the wavelengths λ are quite short at VHF, a fraction of a meter to a few meters, it is practical to construct and orient the antenna arrays for three-dimensional direction finding and signal source mapping (see later). [The annotations in the figure give, for D = λ/2, φ = π cos θ and V = G cos φ = G cos(π cos θ).]
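The relations quoted in the caption can be summarized in a few lines of code. The sketch below is a simplified illustration, not the signal processing of any actual interferometer; the gain G and the sample bearings are arbitrary choices. It evaluates φ = π cos θ and V = G cos φ for the D = λ/2 case and shows the ±θ ambiguity that arises when the phase is inverted with the arccosine.

```python
# Minimal sketch of the half-wavelength interferometer relations in Fig. 4:
# phi = pi*cos(theta) and V = G*cos(phi).  G = 1 and the bearings below are
# arbitrary example values.
import math

def phase_difference(theta_rad, spacing_over_wavelength=0.5):
    """Receiver-pair phase difference (rad) for a signal arriving at bearing theta."""
    return 2.0 * math.pi * spacing_over_wavelength * math.cos(theta_rad)

def detector_voltage(theta_rad, gain=1.0):
    """Phase-detector output V = G cos(phi) for the D = lambda/2 case."""
    return gain * math.cos(phase_difference(theta_rad))

def bearing_from_phase(phi_rad):
    """Invert phi = pi*cos(theta); the +/-theta pair reflects the half-plane ambiguity."""
    theta = math.acos(phi_rad / math.pi)
    return theta, -theta

for theta_deg in (0, 45, 90, 135, 180):
    phi = phase_difference(math.radians(theta_deg))
    recovered = [round(math.degrees(t), 1) for t in bearing_from_phase(phi)]
    print(f"theta = {theta_deg:3d} deg -> phi = {phi / math.pi:+.2f}*pi, "
          f"V = {detector_voltage(math.radians(theta_deg)):+.2f}, "
          f"bearings consistent with this phase: {recovered}")
```

As the text notes, spacings larger than λ/2 wrap the phase and add further ambiguities, which networked interferometers resolve by careful siting or by using different frequencies.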
Plan view, ground-based single-station location (SSL) systems generally obtain the bearing to the lightning source via the MDF technique. Such systems, for example, the Great Plains-1 (GP-1) at Penn State (see later), estimate range using signal propagation models that relate the statistical properties of the received magnetic signal level to the distance to the lightning source (56,57). Using this means for estimating range relies on the assumption that there is only one region of lightning in the bearing of interest. If there are multiple regions that are widely separated, then such an algorithm based on peak radiated field magnitude, or peak signal level b (Fig. 2), can provide only a single intermediate range that may not apply to any of the actual regions. Satellite-based SSL platforms are also used. They identify the occurrence of lightning from above by using optical signatures (34,43,46,47) (see later) and/or VHF fields radiated by lightning (49–53). Finally, a new hybrid single-station locating system, SOnic Lightning LOcation (SOLLO), is being tested at KSC (59). It combines electric field and acoustic signal processing to give extremely accurate lightning locations over short ranges (see later).

Lightning Detection Networks

Networks of electric field mills measure the electric field strength near the ground, and so can provide an
indication of impending lightning. Networks of field mills, such as the Electric Field Mill System (EFMS) at the Wallops Flight Facility at Wallops Island, Virginia, and the 31 field mills in the Launch Pad Lightning Warning System (LPLWS) at CCAFS/KSC in Florida (see later), measure the electric field potential gradient to identify the locations of charge centers (38); these centers are related to the point of origin of the lightning channel and so may indicate the threat of lightning triggered by a rocket launch (85). The LPLWS can also detect all types of lightning, albeit over a short range ∼20 km (35) and with only moderate detection efficiency and poor location accuracy (Table 2). Plan view multistation systems or network locators (NL) use spatially distributed arrays to detect near-simultaneous lightning signals at multiple receivers; the information from all receivers is then combined by a central processor to deduce the lightning location. Lightning locations can be obtained via triangulation to the point of intersection of radials along which the radiated signal traveled toward one or more receivers. The MDF technique illustrated in Fig. 2 provides a radial for each station, as indicated by the dashed lines that pass through stations A and B in Fig. 3 that are west and north of a lightning strike point in the center. In such networks, only crossed-loop B-field antennas are required because E-field measurements are not needed to determine
stroke polarity to eliminate any π-radian (180°) ambiguity. Before an upgrade completed in the mid-1990s, the NLDN was operated by the company Lightning Location and Protection, Inc. (LLP), whose wideband sensors used only the MDF technique to locate lightning (11,13,14). As indicated in Fig. 3, the lightning location can be obtained from either the intersection of the radials given by the MDF triangulation technique or by the intersection of the hyperbolas given by the TOA technique (see earlier). Normally, systems obtain a least-squares optimum location of a stroke by combining the outputs of two or more MDF stations or three or more TOA stations. Both methods can be applied in the same system to reduce the location error further, as in the current upgraded NLDN that was created by Sankosha Corporation of Tokyo, Japan, when the LLP and ARSI networks were combined (13). This error is reduced by adjusting the initial and subsequent locations iteratively via an objective procedure that accounts for the uncertainties in the estimates given by the separate methods (see later) (22, p. 162). Three-dimensional lightning mapping systems or network mappers (NM) use spatially dispersed arrays of instruments to measure either the time of arrival of or the direction to bursts of radio-frequency (RF) signals at a rate rapid enough to resolve the entire lightning channel, thereby depicting its development in three dimensions. The locations of the RF sources can be obtained from either intersecting hyperboloids of two sheets given by the TOA technique (86,87) using at least four, but typically at least five, receivers (22, pp. 152–154) (see later), or can be determined from intersecting radials given by the interferometric technique.

Other Methods

Lightning forecasting — predicting the initial occurrence of lightning many minutes in advance — often must be done without the aid of direct precursors such as electric field strength because it must rely on instruments that are routinely available to the weather forecaster. An example of this approach relates the probable occurrence of imminent cloud–ground lightning to the temporal and vertical changes of reflectivity given by precipitation radars, as done, for example, by operational forecasters at CCAFS/KSC (16,88,89). Rapid increases in reflectivity in certain layers within the cloud can be associated with large vertical velocities in the regions where charge separation and lightning initiation primarily occur. The large current in the lightning channel heats it to temperatures near 30,000 K and generates acoustic shock waves that produce thunder. An old technique for locating lightning is to use networks of microphones to detect where thunder originates (74, pp. 306–307; 90). Such acoustic mapping techniques require an accompanying electric field detector to sense when lightning has occurred (22, pp. 149–151), as used by the TREMBLE system (Table 2). By knowing the time of the lightning stroke, one can obtain the range to the lightning channel by using the time difference between the electromagnetic and acoustic signals received. The location of the lightning channel is deduced from the intersection of each range circle given by
the signal time from each microphone (22, p. 149). Ranges are limited to 20 km or so, and errors are of the order of 10 to 15% of the estimated range. Because the propagation of acoustic waves depends strongly on wind speed and direction, the strong wind shear common to thunderstorm environments is a significant source of this error. Acoustic mapping is used rarely any longer because in many cases, better mapping techniques using a VHF electromagnetic signal are available (see later). One exception occurs when special applications require high accuracy lightning location over short ranges. For example, knowing exactly where lightning attaches to launch pads at CCAFS/KSC is important to determine if the payload, launch vehicle, or support electronics need to be inspected for electromagnetic pulse (EMP) damage. KSC is testing a new hybrid system, the SOnic Lightning LOcation system (SOLLO), that combines both electric field and acoustic detectors (59). SOLLO is a single-station lightning locator (SSL) system that uses one electric field detector and five acoustic sensors. Four of the acoustic sensors are placed in a horizontal circle of 2 or 4 m radius with acoustic sensors 90° apart; the electric field sensor is in the middle of the circle, and the fifth acoustic sensor is 2 or 4 m above the electric field sensor. A sufficiently strong and rapid change in the electric field indicates that a nearby lightning strike has occurred. Waveform analysis (see later) filters out lightning signals that originate more than one mile away. The time difference between the arrival of the electric and acoustic pulses gives the range, and interferometry (see earlier) using the five acoustic sensors gives the three-dimensional bearing to the lightning strike; these two values uniquely define the lightning strike location. SOLLO provides lightning location to within about 5 m for a range of up to 1 km (Table 2). Finally, precipitation radars also have been used for the direct study of lightning because the highly ionized, overdense lightning channel effectively reflects the radiation emitted by the radar (60). The channel can be detected only for short periods of time, hundreds of milliseconds, because the ions in the channel diffuse quickly (22, p. 147). Moreover, operational radars use rotating antennas, meaning that the channel would be detected for only a brief interval during each scan. Radar is thus used more in research to learn more about lightning channel properties than to locate lightning operationally (60) (Table 2).

RADIATED FIELDS DUE TO LIGHTNING

Lightning is a brief, half-second or less (74, p. 21), electrical discharge that carries a large current and whose pathlength is typically many kilometers (74, p. 8). Thunderstorm clouds, often called cumulonimbus clouds, are the most common sources of lightning, and lightning strokes produced by them are the focus of locating systems (see earlier). To understand how these different systems detect various aspects of lightning, the lightning process and its signal characteristics are reviewed here and in the next section.
Figure 5. The global electric circuit. The electrosphere is a highly conductive layer of the atmosphere at an altitude of 50 to 75 km. The values of this circuit are estimated by Uman (74, p. 30) as follows: The potential of the electrosphere is about +300,000 volts (denoted V) relative to the earth's surface and establishes the fair-weather electrostatic field of 100 volts per meter (denoted V/m) at the surface. The electrosphere and the earth's surface form the global capacitor Cglobal that is charged to about 10⁶ coulombs (denoted C). The electrosphere–surface capacitance is then about 3 C/V or 3 farads (denoted F). The global fair weather leakage current Ifair wx is estimated at ∼1,000 amperes (denoted A), and the capacitor is shunted by a leakage resistance Rglobal of approximately 300 ohms (denoted Ω). The time constant of the global resistance–capacitance combination is thus about 900 seconds (denoted s) and would discharge Cglobal in something more than an hour. In contrast, Volland (94) estimates a slightly smaller electrosphere potential of 240,000 V and a charge of about 7 × 10⁵ C but the same leakage current Ifair wx of ∼1,000 A. Thus a slightly smaller leakage resistance Rglobal of 240 Ω and a capacitance Cglobal of 2.9 F produces a somewhat shorter discharge time constant of 700 s. On a daily basis, thousands of thunderstorms scattered over the earth's surface — up to 1,500 to 2,000 simultaneously (96, p. 211) — maintain a rather constant charging current Iglobal equal to the leakage current. The thundercloud current generators are connected into the global circuit by cloud–ground negative return strokes that act as intermittent switches, Slightning.
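The circuit values quoted in the caption can be cross-checked with a few lines of arithmetic. The sketch below simply recomputes the derived quantities (leakage current, stored charge, RC time constant, and the fraction of charge remaining after an hour of undriven decay) from the quoted potentials, resistances, and capacitances; it introduces no data beyond those in the caption.

```python
# Consistency check of the global-circuit values quoted in the Fig. 5 caption
# (Uman's estimates and Volland's alternative estimates).
import math

def circuit_summary(name, potential_v, r_ohm, c_farad):
    i_leak = potential_v / r_ohm              # fair-weather leakage current, A
    q = c_farad * potential_v                 # stored charge, C
    tau = r_ohm * c_farad                     # RC time constant, s
    frac_after_1h = math.exp(-3600.0 / tau)   # charge fraction left after one hour
    print(f"{name}: I_fair = {i_leak:.0f} A, Q = {q:.1e} C, "
          f"tau = {tau:.0f} s, charge left after 1 h = {frac_after_1h:.1%}")

circuit_summary("Uman",    300e3, 300.0, 3.0)   # ~1000 A, ~10^6 C, 900 s
circuit_summary("Volland", 240e3, 240.0, 2.9)   # ~1000 A, ~7 x 10^5 C, ~700 s
```

Both sets of values give the ∼1,000 A fair-weather current and a capacitor that, if not recharged by thunderstorms, would lose essentially all of its charge in somewhat more than an hour, as the caption states.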
The Global Circuit Model

A vertically varying electric field that has a magnitude of about 100 volts per meter (denoted V/m) = 0.1 kV/m at the earth's surface (22, p. 29) is present in the atmosphere at all times and it is called the fair-weather, fine-weather, or clear-air field. The fair-weather field also varies horizontally; it is influenced in part by natural ionizing radiation from different rock types and soil permeability. Globally, the charge of the earth's surface is negative, whereas the charge of the electrosphere, whose base is 50 to 75 km above the surface, is positive (74, p. 30; 91, p. 236). The earth–electrosphere combination can be modeled as a capacitor (22, p. 31), whose plates are at the earth's surface and in the electrosphere (Fig. 5). The annual average voltage difference between these two plates is approximately 300 kV, and they maintain a charge of about 10⁶ coulombs (denoted C) (74, p. 30; 92,93). The capacitance Cglobal of this electrosphere–surface component is then about 3 C/V or 3 farads (denoted F). The global leakage current Ifair wx, also called the fair-weather current, and the air-earth conduction current, is of the
order of 1,000 amperes (denoted A) (22, p. 29; 73, p. 30; 91, p. 236), which results from a global leakage resistance Rglobal of about 300 ohms (denoted Ω). The global circuit time constant, Rglobal Cglobal, is then 900 seconds (denoted s) and would discharge the capacitor almost completely in a bit more than an hour. Volland (94, pp. 15–20) estimates slightly different values for the electrosphere charge (6.8 × 10⁵ C) and potential difference (240 kV) but still finds about the same capacitance (2.9 F), although with a slightly lower resistance Rglobal (240 Ω) and a shorter time constant Rglobal Cglobal (700 s) that would imply a discharge time of about an hour. Estimates of Rglobal closer to 200 Ω are also found in the literature [e.g., (91, p. 252)]. As proposed by Wilson (95), numerous simultaneous thunderstorms across the earth — as many as 1,500 to 2,000 (96, p. 211) — provide a mechanism for maintaining the charge in this capacitor by supplying an average thunderstorm current or supply current Iglobal equal to the global leakage current Ifair wx (97). Together, the two currents form the global circuit (22, p. 29; 96). The thunderstorm generators are connected effectively to the global circuit via switches Slightning that are closed intermittently by cloud–ground negative return strokes. Uman (74, pp. 9–10) cautions that the net charge redistribution by a lightning stroke is well known but is expressed only as a difference in charge between centers in the cloud and at the ground. The observed difference can be equally well described by deposits of positive charge in the cloud or by deposits of negative charge at the ground, a process labeled by Uman as effective lowering of charge. The positive current resulting from the effective lowering of negative charge to the ground is Iglobal, and schematically it extends upward through the thunderstorms to the electrosphere (Fig. 5). For a lightning discharge from a thunderstorm to effectively lower negative charge to the earth's surface, separate regions of net positive and net negative charge must occur within the cloud itself. Various charge separation mechanisms have been proposed, as recently reviewed in MacGorman and Rust's book (22, Chap. 3). Essentially, the descending heavier hydrometeors in the cloud tend to accumulate negative charge, and the smaller and lighter convectively rising hydrometeors accumulate positive charge. This leads to the dipole/tripole model of thunderstorm electrification, a somewhat oversimplified description of the typical charge distributions within a cumulonimbus cloud (22, pp. 49–53; 98). In this model, generally a lower positive charge center occurs near cloud base, a main negative charge center in the middle portion, and a main positive charge center in the upper portion of a cumulonimbus cloud [see Fig. 8.1 in (92), Fig. 1 in (97), or Fig. 3 in (99)]. This distribution results in a net positive charge within the thunderstorm anvil that occurs downwind of the main cell and is near the tropopause at altitudes of 10 kilometers (denoted km) or more above the ground. Depending on the storm type (99), the main negative charge centers occur at altitudes from 5 to 9 km or so where the temperature is between −10 and −25 °C [Fig. 8.5 in (92); (100,101)]. The main positive centers occur 1 to 5 km above the negative centers where the temperatures are in the range
−30 to −60 °C (22, pp. 193–194; 38,92,97). These altitudes are well above a typical cloud base, which is 1 to 4 km above the ground where temperatures are usually higher than 0 °C. Potential gradients near the main negative charge centers are of the order of 100 kV/m and at cloud base are of the order of 50 kV/m (22, p. 51; 97). Pointed objects at the ground respond to the electric field produced by the cloud and so emit charge. As a result, the potential gradient near the surface is reduced to values between 1 and 10 kV/m (92). Imminent lightning is thus expected when the magnitude of the potential gradient measured near the earth's surface exceeds a threshold of about 1 kV/m, which is about 10 times greater than the typical fair-weather field. The field mills in the LPLWS and EFMS networks (Tables 1 and 2) are designed to measure electric fields from ±15 kV/m (35,38) up to ±32 kV/m (102), and so they can easily capture the surface field of an intense thunderstorm overhead that could be as great as 20 kV/m (22, p. 121). Lightning discharges may effectively transfer charge from cloud to ground, via cloud–ground strokes, and may also redistribute charge within the atmosphere, via cloud discharges. Such discharges may occur entirely within the cloud itself (intracloud lightning), between two separate clouds (cloud-to-cloud lightning), or between the cloud and the surrounding clear air (cloud-to-air lightning). Commonly, and throughout this article, the term intracloud stroke is used synonymously with the term cloud discharge. Globally, estimates given by flash counters of the average ratio of the number of cloud discharges to the number of cloud–ground discharges vary from 3.5 (40) to 5.7 (103). Estimates from the OTD satellite-based sensor (see later) range from 2.5 to 5.7 (46). Moreover, this ratio is not constant throughout the globe; it varies in the Northern Hemisphere from about 2 at latitude 60° N to between 6 and 9 at the equator (103,104). Results obtained from efficient total lightning detection systems, however, now show that ratios are typically much greater, between 10 and 30 (105) (Fig. 35). Although not as frequent as cloud discharges, cloud–ground lightning has been a greater focus of lightning studies and real-time location systems, owing to its critical roles in maintaining the global circuit, compromising human safety (63), and producing economic impacts via, for example, disruption of electrical power, initiation of forest fires, and damage to residences (61;74, p. 8). Interest has focused recently on using the frequency of intracloud lightning to forecast the onset of cloud–ground lightning or of other thunderstorm hazards (24,65) (see later).

Radiation and Electrostatic Components

For most applications, the far-field or radiation components of the electric (E-) and magnetic (B-) fields propagate from their source at the speed of light (73, p. 62) and are of importance to lightning location methods that use radio techniques (5, p. 353). To detect the rates of change in these components on the timescale of a microsecond (75, pp. 194–197), fast E-field sense antenna systems, called fast antennas, are typically used (11;22, pp. 106–107). In contrast, slow antennas measure the total change in the
field, or the near-field or electrostatic component, produced by an entire flash over a timescale of a second (75, pp. 191–193). (Both "fast antennas" and "slow antennas" may use the same physical antenna; the difference is simply based on the following receiver bandwidth; see Lightning Signal Discrimination.) A sufficiently large temporal rate of change of the current I in a flash generates a radiated field about a certain frequency f. This frequency is proportional to the inverse of the quarter wavelength λ/4 of the radiated field, or equivalently to the inverse of the length l ∼ λ/4 of the lightning channel. The far-field components dominate at ranges r from the radiator greater than a few wavelengths λ, or equivalently at ranges r that are much greater than l; this latter relationship is usually expressed as the inequality r > 10l. Thus, for a typical cloud–ground lightning channel of length l = 5 km, the far-field components would dominate at ranges r greater than 50 km. Pierce (5, p. 353) notes that the far-field components dominate at ranges r > c/2πf, where c is the speed of light equal to 3 × 10⁸ m/s. At distances greater than 50 km, broadband receivers using frequencies f at or above 1 kHz, as is typical in systems using the magnetic direction finding (MDF) technique (11; Fig. 2; see later), would detect primarily the far-field components. Equivalently, receivers that measure signal magnitudes or signal levels at frequencies of 10 kHz or greater need be concerned only with these components at ranges greater than 5 km (106, p. 545). Restriction to reception of the far-field component of the lightning signal is important in SSL systems, because theoretically the received amplitude of this field varies at a rate proportional to 1/r (5, p. 353; 106, p. 545). Ranging algorithms used by many single-station locator (SSL) systems are based on this relationship between peak B-field signal level b and range r, although the power to which b must be raised may differ from −1 (56) (see later). Finally, single-station flash counter (SSC) systems designed to detect lightning at ranges less than 5 km from the receiver must contend with both the far-field and near-field components, making the determination of accurate range estimates challenging (see later). The earliest single-station locating (SSL) and plan view network (NL) systems for locating lightning concentrated on cloud–ground strokes (1) because the direction to these strokes tends to be easier to locate unambiguously from a distant receiver than the direction to cloud discharges. The cloud–ground stroke channel tends to be vertically oriented near the ground where the initial magnetic field peak is radiated (82), and so at the far field — at distances much greater than the channel length (14) — the channel appears as a vertical radiating antenna at a point source (106, p. 544). In contrast, intracloud strokes have channels that typically span 5 to 10 km in the horizontal (74, p. 21), but these channels may extend much farther, up to many tens of kilometers (107–109). At relatively close range, definition of an unambiguous bearing using a radio receiver is made difficult by such long horizontal channels for intracloud strokes but is relatively simple for the near-surface vertical channels of cloud–ground strokes. For example, a 10-km long intracloud stroke oriented orthogonal to a radial extending
from the receiver spans 11° of azimuth at a 50-km range, about 5 1/2° at 100 km, but only about 1° at 500 km. At ranges greater than 100 km or so, the bearing to an intracloud stroke can be hard to determine accurately for a different, perhaps somewhat subtle reason. The predominantly horizontal intracloud channel acts as a horizontally polarized radiation source (58;81, p. 184). Such sources propagate electromagnetic waves via two pathways, one via a ground wave and the other by a longer path sky wave (75, pp. 224–228). In contrast, vertically polarized sources, such as cloud–ground strokes near the earth's surface, propagate only via the shorter path ground wave. The apparent bearing of the combined sky wave and ground wave from an intracloud stroke is ambiguous owing to phase distortion and yields an elliptical pattern on an oscilloscope display (4,7;81, p. 185), as used in the original cathode ray direction finding (CRDF) technique (2). In contrast, the apparent bearing of a return stroke is quite discrete and appears as a spike on such a display (11), which allows MDF systems to produce bearing accuracies of a degree or so (7). As described by ray or reflection theory, the ground wave or ground pulse is a signal that propagates to a receiver along the earth's surface, whereas the sky wave arrives at a receiver after one or more ionosphere–earth reflections. The first-hop sky wave after one reflection is normally the one of concern for lightning location (75, pp. 224–228), although some very long-range lightning detection systems have begun to employ multiple-hop sky waves (20). The difference in time between the arrival of the later first-hop sky wave and the earlier ground wave is t1 = (r1 − r)/c, where as before c = 3 × 10⁸ m/s is the speed of light and r is the range from the receiver to the source. Here r1 is the phase path of the first-hop wave, and it is related to r and the virtual height h1 of the first-hop sky wave by r1² = r² + 4h1². During the day, the virtual height h1 ∼ 70 km, and at night h1 ∼ 90 km because the ionosphere becomes less active after sunset. As a result, the height of the ionosphere effectively rises and so leads to the common occurrence of more distant nighttime AM radio receptions due to the sky wave. For unambiguous separation between the ground and sky waves, the arrival-time difference t1 must be greater than the duration τ of the stroke, often called its pulse length. The condition here that t1 > τ can be written as

r < (4h1² − τ²c²) / (2τc).    (1)
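A quick numerical check of Eq. (1), using the virtual heights quoted above (h1 ≈ 70 km by day and 90 km at night) and return stroke pulse lengths of 100 to 200 µs, reproduces the daytime and nighttime limits cited in the next paragraph; the 300-km and 500-km figures correspond to the shorter, 100-µs pulses, and longer pulses reduce the unambiguous range further.

```python
# Numerical check of Eq. (1) with the values quoted in the text.
C = 3.0e8  # speed of light, m/s

def max_unambiguous_range(h1_m, tau_s):
    """Largest range r (m) at which the ground wave and the first-hop sky wave
    are separated in arrival time by more than the pulse length tau (Eq. 1)."""
    return (4.0 * h1_m**2 - (tau_s * C)**2) / (2.0 * tau_s * C)

for label, h1 in (("day", 70e3), ("night", 90e3)):
    for tau in (100e-6, 200e-6):
        r = max_unambiguous_range(h1, tau)
        print(f"{label:5s} h1 = {h1 / 1e3:.0f} km, tau = {tau * 1e6:.0f} us -> "
              f"r < {r / 1e3:.0f} km")
```

For h1 = 70 km and τ = 100 µs the limit is roughly 310 km, and for h1 = 90 km it is roughly 525 km, consistent with the ∼300 km (day) and ∼500 km (night) values given below.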
For typical pulse lengths τ of 100 to 200 microseconds (denoted µs) for return strokes, separation of the ground and sky waves based on arrival-time considerations alone can be done for ranges only up to 300 km during daytime and up to 500 km at night. Because the bearing to a return stroke is determined from the ratio of the peak values of the two B-field components (ew and ns in Fig. 2), and because these peaks occur less than 10 µs after the signal level rise begins, bearing accuracies are maximized (82). Thus, many plan view network locator (NL) systems such as the NLDN focus on locating cloud–ground strokes based on the detected ground wave and in so doing can attain
reasonably good bearing accuracies for ranges up to several hundred kilometers (13).

Time-Domain Waveform Analysis

Fortunately, the electromagnetic waveforms of intracloud and cloud–ground strokes graphed as functions of time are noticeably different (Figs. 6, 7, and 8), meaning that the stroke type can be determined by using suitable time-domain waveform analysis of measurements of radio signals received by very low frequency (VLF, 3 kHz to 30 kHz)/low frequency (LF, 30 kHz to 300 kHz) systems (11). Negative return strokes have an initial positive E-field polarity, seen in the graph as an increase of the E-field to a positive peak value (Fig. 6), and positive return strokes have an initial negative polarity, seen as a rapid decrease to a negative peak value. The waveforms for either the electric or magnetic waves can be used in this analysis, because the temporal values of the radiated electric and magnetic components in the far field follow exactly the same-shaped curve (6,110). In time-domain waveform analysis, the characteristics of the waves are distinguished by calculating various parameters obtainable when the radio wave is graphed as a function of time. One useful wave-shape parameter is the ratio of the first peak value to the largest peak value of opposite sign — known as the overshoot tolerance (14) — that separates the distinct bipolar shape of the intracloud wave,
Figure 6. Magnetic field (B-field) waveform (vertical axis in units of 10⁻⁸ Wb/m², horizontal axis in µs) acquired by the VLF/LF GP-1 system at Lawrence, Kansas, at 23:14 UTC on 18 August 2000 from lightning in the region of Little Rock, Arkansas, about 500 km distant at a bearing of 150°. This first return stroke wave shows the characteristic multiple peaks separated by ∼10 microseconds (denoted µs) on the descending limb of the initial excursion created by the stroke branches. The rise time, or the time necessary for the signal level to increase from 10% to 90% of its peak value, is ∼4 µs and the waveform width, or the time over which the value remains positive after the peak is attained, is ∼25 µs; both values are typical for return strokes. The negative-going undershoot between 80 and 150 µs and the following overshoot between 150 and 220 µs are somewhat exaggerated by high-pass filtering necessary to remove excessive interference from the local 117 V AC power service. (NLDN data for stroke location are cited with permission of Global Atmospherics, Inc.)
Figure 7. As in Fig. 6, but from an intracloud stroke at 22:48 UTC. Although such strokes can be many kilometers long, they are frequently far shorter. Such a short stroke generates a somewhat steeper rise to the initial peak and a far more rapid fall in the descending limb than a cloud–ground return stroke. The rise time here is ∼1 µs, which is noticeably shorter than the value in Fig. 6 for the return stroke, and the waveform width is ∼4 µs, much less than that for the return stroke. These values are typical of those for intracloud strokes. A sharp undershoot, whose peak occurs about 10 µs after the initial peak, is also a common feature. (NLDN data for stroke location are cited with permission of Global Atmospherics, Inc.)
in which the positive peak is often about equal in magnitude to that of the negative peak, from the monopolar shape of the negative return stroke, in which the positive peak level is significantly greater than the level for the negative peak. As noted by Weidman and Krider (9), the monopolar shape of the return stroke wave, with a rapid rise to a peak value, is caused by the current surge up the predominantly vertical channel that is formed by the stepped leader as it nears the earth’s surface (see next section). In contrast, Weidman and Krider (10) suggest that the bipolar shape of the intracloud stroke wave arises from the superposition of two types of currents: A series of fast pulses associated with the formation of the lightning channel produces the positive peak; this series is followed by a slower surge of current giving the second peak that occurs either during the establishment of the channel or just after it is formed. A characteristic of a lightning radio wave is that there is a short E- or B-field rise time, the time during which the absolute value of the signal level increases from 10 to 90% of its first peak (111). Apparent rise times vary, depending on whether the electromagnetic wave propagates over salt water or over land. For return strokes that propagate over salt water, which is a good conductor, rise times of the order of 0.2 µs or less are observed (9); but for waves traveling 200 km over land, rise times increase to 1 µs or so (112). Such fast rise times are used to separate the electromagnetic waveforms of lightning from other received signals, but to do so requires lightning signal processors that have at least a 100-nanosecond (denoted ns) resolution. In the sample waveforms from the GP-1
Figure 8. Magnetic field (B-field) waveform showing the last few stepped leader pulses (St) preceding a negative return stroke in a flash acquired by the VLF/LF GP-1 system at Lawrence, Kansas, at 2:11 UTC on 20 August 2000. This stroke, reported by the NLDN to be about 75 km northeast of Lawrence, had an extremely large peak current of 41 kA. This value was the largest reported between 2:00 UTC and 3:00 UTC, over which the median value was the much lower value of 9 kA. Large currents of many tens of kA are rare but can produce sizable signal levels even at ranges of many tens of kilometers. Small stepped leader waves are difficult to discern in recordings from strokes beyond 100 km, because on average, even the largest peak from any leader pulse has a radiated signal level that is less than 10% of that of the return stroke peak. For the return stroke that occurred after 200 µs, the rise time is ∼2 µs, and the waveform width >25 µs; both are values typical of return strokes. (NLDN data are cited with permission of Global Atmospherics, Inc.)
system at Lawrence, Kansas, rise times for first return strokes (see next section) over land increase from ∼2 µs at a 75-km range (Fig. 8) to ∼4 µs at a 500-km range (Fig. 6). To distinguish signals from lightning and other nonlightning sources such as arcing when motors start or light switches are thrown, the original gated magnetic direction finders used an upper limit of 8 µs (7); in contrast, the direction finders used in the first lightning location networks used rise times as long as 20 µs (11). Finally, the B-field rise time for the intracloud stroke, ∼1 µs in Fig. 7, is substantially less than the rise time for the return stroke, ∼4 µs in Fig. 6, a characteristic in addition to the overshoot tolerance that aids in distinguishing intracloud from return strokes. Return strokes also have a sufficiently long waveform width, or pulse width, the time during which the B-field signal value has the same sign after the first peak is attained. Thus, differences in the waveform width can also be used to aid in distinguishing intracloud from return strokes. The positive B-field signal has a waveform width of about 25 µs for the negative return stroke, but it is only about 4 µs for the intracloud stroke. For wideband MDF networks used in the 1970s to detect return strokes, pulse widths greater than 15 µs (11) were proposed. The minimum value of the waveform width applied to the sensors used in the NLDN was decreased from 11 to 7 µs
as part of the system upgrade in 1995 (14,15,113,114), and the overshoot tolerance was increased from 0.85 to 1.2 (14,113). These changes may have led to an increase in the number of positive return strokes detected by the system (13), an increase apparently contaminated by misdiagnosis of some strokes that were actually a type of intracloud pulse (15) known as the narrow positive bipolar pulse (NPBP) (115) (see later). Finally, the characteristic values of parameters such as the rise time and waveform width for the return stroke waveforms vary seasonally and with the location of the thunderstorms on the globe (116). The stepped leader that precedes the return stroke (see next section) appears as a series of monopolar pulses of significantly smaller amplitude than that of the return stroke (Fig. 8). Indeed, a nearby stepped leader can produce pulses whose peak amplitudes are as large as that for a more distant return stroke. Time-domain waveform analysis can be used, however, to distinguish the stepped leader pulses from the return stroke pulse. Smaller values of these wave-shape parameters characterize waves sensed by wideband very high frequency (VHF) systems. For example, the waveform measured by one 10- to 500-MHz interferometric system has the same shape as that measured by wideband VLF systems, but has much shorter rise times (∼5 ns), much smaller waveform widths (∼10 ns), and much smaller amplitudes (27). Measurements using vertically polarized dipole antennas have produced similar results (117). Thus, ground-based VHF lightning-mapping systems such as LDAR, LMA/LMS, or SAFIR (see later) also use the ground wave radiated either by stepped leaders and K-processes or dart leaders (see later section) that have much shorter, 1 to 20 µs, pulse lengths (75, p. 228; 86), or by the streamer–leader transition that has a much shorter duration, less than 10 ns (27). For unambiguous discrimination of ground and sky waves, such short-duration, ∼20 µs, pulse lengths lead via Eq. (1) to maximum ranges greater than 1,500 km during the day and 3,000 km at night (75, p. 228). Thus, the limited detection range ∼250 km for VHF systems is not tied to distinguishing the ground and sky waves but instead is tied to the much lower signal level than in the VLF (see later sections) and to line-of-sight propagation of the VHF radiation produced by the discharge process (8).

Stroke Polarity Conventions

The conventions used to relate the positive and negative polarities of a lightning stroke to the deflection of the measured electric field (E-field) vary with author, and so the reader must remain wary. In general, strokes of negative polarity effectively lower negative charge to the earth's surface, and strokes of positive polarity raise negative charge away from the earth's surface (see next section). The literature on return stroke waveforms normally uses the convention used in this article, that strokes of negative polarity produce a positive deflection on an instrument measuring the E-field [e.g., Fig. 2 in (9), Figs. 6 and 8, and Fig. 10], and strokes of positive polarity produce a negative deflection. Unfortunately, this convention is not followed uniformly in studies of intracloud strokes. Consistent with
the convention used in studies of cloud–ground strokes, some authors state that intracloud strokes of negative polarity produce a positive deflection, and strokes of positive polarity produce a negative deflection, of the E-field intensity (10,118,119). The opposite, so-called physics convention, is used by others, however, whereby a stroke of positive polarity is said to produce a positive E-field deflection (115,120). The physics convention is followed primarily in articles that discuss a particular short-duration bipolar pulse, upon which substantial high-frequency components are superimposed, that is labeled the narrow positive bipolar pulse (NPBP) or narrow negative bipolar pulse (NNBP). These pulses have a significantly shorter duration — only 10 to 20 µs (115,119,121) — than those of the intracloud strokes discussed in (10) or illustrated in Fig. 7, have peak amplitudes comparable to those of return strokes (120), tend to occur early in the flash (118), and produce the largest signal of any discharge process in the VHF (119). Generally, NNBPs occur much more rarely than NPBPs (53). These narrow bipolar pulses are confined to the main negative and upper positive charge regions of the thunderstorm dipole and are the initial event of otherwise normal intracloud strokes (32,118). Recently, these pulses have been associated with transionospheric pulse pairs (TIPPs), which are pairs of powerful high frequency (HF, 3 MHz to 30 MHz)/VHF signals that last ∼10 µs, that occur ∼50 µs apart, and that were first detected by the BLACKBEARD VHF radio instrument on the ALEXIS satellite (52,53,122).

LIGHTNING MORPHOLOGY

Understanding the design requirements of a lightning locating system requires thorough knowledge of the morphology of a lightning flash (Fig. 9). Intracloud flashes have many components similar to those in cloud–ground lightning. Many intracloud flashes have a bilevel, horizontally branching structure that extends within the main positive charge regions (32,120); in the decaying stage of a thunderstorm, these branches can extend horizontally many tens of kilometers in what is known as spider lightning (108,109). In its active phase, an intracloud flash typically discharges the main negative charge region of a thunderstorm by effectively transporting negative charge upward via nondiscrete, continuing currents in a channel that joins the main positive and negative centers. In contrast, a cloud–ground lightning flash, which is described in detail in the following paragraphs, is composed of one or more discrete strokes that effectively lower charge from either the main positive or negative charge centers in a thunderstorm to the earth's surface. Both types of lightning flashes produce the bright light that can be seen and the thunder that can be heard.

The Stepped Leader

When the potential difference between the main negative center in the middle portion of the cloud and the positive center below is large enough, preliminary breakdown is said to occur, and a small charge begins to be lowered
Figure 9. The sequence of events in a typical negative cloud–ground flash, where time increases from left to right (time axis in ms since the start of the stepped leader). A stepped leader emerging from a region near cloud base initiates the flash. Each leader step is approximately 50 m long. The next step occurs about 50 µs after the first, and the process repeats until a leader step nears the surface in some 10 to 30 milliseconds (denoted ms). This near-ground step raises the local electric field gradient and initiates an upward streamer. When the upward streamer meets this last step of the stepped leader a few tens of meters above the surface, the return stroke begins. The return stroke travels up the ionized channel established by the stepped leader at about one-half the speed of light. The stepped leader generally displays many branches as it meanders toward the surface. The duration of this, and subsequent, return strokes, is 50 to 100 µs, depending on a stroke length of one to a few kilometers. If sufficient charge becomes available at the top of the lightning channel via J- and K-processes, then after a few tens of milliseconds, a dart leader descends continuously from the cloud, follows the first return stroke channel, and nears the surface in 1 to 2 ms. On reaching the surface, a subsequent return stroke is initiated that follows the channel of the dart leader. These subsequent return strokes are not branched. When they contain continuing currents, these subsequent strokes may contain impulsive brightenings known as M-components, which some have erroneously labeled as additional subsequent strokes. Each subsequent return stroke may be followed by another dart leader, in turn, initiating another return stroke. Thus a flash may consist of as many as 26 strokes (140) and has a flickering flash duration of up to two seconds [after Fig. 1.6 in (74)].
toward the ground. This lowering proceeds in several discrete intervals or steps in the process termed the stepped leader. Each step is tens of meters long, lasts about 1 µs, carries currents of the order of 1,000 A, and propagates at speeds that range from 1 × 10⁵ m/s to 2.5 × 10⁶ m/s, and at an average speed whose estimates range from 2 × 10⁵ m/s to 3 × 10⁵ m/s (22, pp. 89, 93; 74, pp. 11, 82; 68, p. 10; 123). Each leader step produces a small broadband electromagnetic pulse that has a distinct waveform (denoted St in Fig. 8) that precedes the much larger pulse produced by the return stroke. The average peak signal level, or magnitude, of a stepped leader pulse is ∼10% of the peak signal level of the following return stroke field and has a significant frequency content ranging from 200 kHz to 1 GHz (5, p. 356; 8,28,117,124–126). The orientation of the stepped leader is not solely vertical and forms a lightning channel that has many branches or forks (Fig. 9). The horizontal extent of this channel is typically limited by the size of the thunderstorm (22, pp. 199–203), but it can extend a significant distance horizontally. If it remains aloft, then it will lead to a cloud discharge; but if it nears the ground, then it will lead to a cloud–ground discharge.
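As a rough order-of-magnitude check on these numbers, the sketch below combines the ∼50-m step length from the Fig. 9 caption with the average leader speeds quoted above and cloud-base heights of 1 to 4 km from the earlier discussion of thunderstorm charge structure. Taking the channel length equal to the cloud-base height understates a real, meandering, branched channel, so the estimates are lower bounds rather than definitive values.

```python
# Rough stepped-leader descent estimate using only figures quoted in the text:
# step length ~50 m, average downward speed 2-3 x 10^5 m/s, cloud base 1-4 km.
def descent_time_ms(channel_length_m, avg_speed_m_s):
    """Time (ms) for the leader tip to cover the channel length at the average speed."""
    return 1e3 * channel_length_m / avg_speed_m_s

def number_of_steps(channel_length_m, step_length_m=50.0):
    """Approximate number of ~50-m steps needed to span the channel."""
    return channel_length_m / step_length_m

for cloud_base_km in (1.0, 4.0):
    length = cloud_base_km * 1e3  # assume channel length = cloud-base height
    steps = number_of_steps(length)
    for speed in (2e5, 3e5):
        t = descent_time_ms(length, speed)
        print(f"cloud base {cloud_base_km:.0f} km: ~{steps:.0f} steps, "
              f"descent in ~{t:.0f} ms at {speed / 1e5:.0f}e5 m/s")
```

For the 4-km cloud base the estimate is roughly 13 to 20 ms, consistent with the 10 to 30 ms quoted in the Fig. 9 caption; the shorter 1-km case falls below that range, as expected for an idealized straight channel.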
In some cases, the horizontal channel can extend so far horizontally before becoming vertical that the resulting stroke can appear as the occasionally reported bolt-out-of-the-blue (32) (Fig. 33).

The First Return Stroke

After approximately 20 milliseconds (denoted ms), the lightning channel defined by downward propagation of the stepped leader is close enough to the ground that upward streamers or connecting leaders from the surface, which are initiated by electric field changes at sharp objects, attach to the channel at striking distances of several tens of meters above the ground (74, p. 107); this attachment process initiates the return stroke (Fig. 9) (74, p. 12). As many as 50% of the negative return strokes have multiple attachments to the ground (127,128), making the location of the contact point by networks of sensors challenging. This stroke, which lasts 70 to 100 µs, propagates up the leader channel at speeds that approach one-third to one-half of the speed of light and that decrease as elevation above the ground increases (74, p. 13; 123,129,130). Typically, several of the many tens of coulombs of available negative charge are effectively transferred to ground by
Figure 10. As in Fig. 6, but illustrating the difference in waveforms acquired for first return (a) and subsequent return strokes (b). The waveform in part (a) is the same one as in Fig. 6, but the waveform in part (b) was obtained from a different flash in the same region at 23:45 UTC. Subsequent return strokes are seldom branched and thus show a relatively smooth descending limb in the initial wave excursion. The rise time for the first stroke in (a) is ∼4 µs, and for the subsequent stroke in (b) is ∼5 µs. Owing to the lack of subsidiary peaks, the waveform width of the subsequent stroke (∼20 µs) is a bit narrower than that of the first stroke (∼25 µs). Although the peak levels shown are typical of those for return strokes, normally the peak level for a subsequent return stroke is less than that for the associated first return stroke. The two return stroke waveforms shown here are from two different flashes and so only coincidentally have similar peak levels.
the stroke. Table 3.1 in (22) provides a summary of several published estimates of the available charge in the main negative center of the thunderstorm dipole; these charges vary from 20 to 340 C, and the majority is in the range of 20 to 50 C. Table 7.2 in (74), adapted from the work of Berger et al. (131), reports that the median value of the charge transfer by a first return stroke is approximately 5 C. The typical peak current near the ground in this first return stroke is about 30 kA but can vary from 10 to 200 kA or more (131). Within 50 µs or so, the current in the lightning channel falls to 50% of its peak level, but currents of the order of 100 A may continue to flow
in the channel for up to several hundred ms (74, p. 13) (see later). Owing to field attenuation with distance, the observed peak levels of the electromagnetic signals are often range-normalized to levels that would be observed if the lightning were at a fixed distance from the receiver. The standard distance used is 100 km (74, p. 111; 132,133) (see later), although 10 and 50 km are also used (106, p. 565; 125). The first return stroke generally has a forked appearance because the stepped leader creates a branching pattern as it propagates toward the ground to define the return stroke channel (Fig. 9). These ionized branches in the channel cause the first return stroke to produce an electromagnetic waveform that has three or four local maxima, or subsidiary peaks, of substantial amplitude that occur every 10 to 30 µs after the initial largest peak (Fig. 10a) (9). Subsequent return strokes (see later) that might occur within the flash have much smaller subsidiary peaks, because these return strokes tend to be unbranched (Figs. 9 and 10b). This characteristic difference between the first and subsequent return stroke waveforms can be used by locating systems to identify the first return stroke that typically generates the largest signal level because it has the greatest current (11). Generally speaking, the longer the radiating channel, the larger the peak current, and the lower the frequencies at which the peak amplitude of the generated electromagnetic signal occurs (106, p. 544). With a large-current discharge spanning a kilometer or more vertically, the return stroke thus has a peak amplitude at relatively low frequencies near 5 kHz (134). The fine spatial structure within the return stroke channel and the rapidity of the discharge ensure significant higher frequency components, creating a broadband spectrum whose power varies proportionally to 1/f through frequencies f of 1 MHz or so and to 1/f² for larger frequencies (5, p. 369; 125,135). Design uncertainties in early narrowband receivers may have led to the suggestion of a slower, 1/f, fall-off of amplitude at all frequencies tested above 1 MHz (136, pp. 196–197). These narrowband systems measure the net effect of several individual waveforms and so give misleading spectral results. Nevertheless, those using narrowband VHF systems to study the electromagnetic emissions from all stages of the lightning process (24) cite the 1/f decrease in amplitude with frequency. That the occurrence of radiated power at these higher frequencies is related to the forks in the lightning channel created by the stepped leader is suggested by the fact that the spectra for both the leader step waves and the first return strokes have similar shapes and radiated power distribution in the frequency range of 300 kHz to 20 MHz (125).

Subsequent Return Strokes

As noted earlier, the first return stroke effectively lowers several coulombs of the available charge (131). If an insufficient amount of charge remains to initiate a second stroke, then the lightning flash is a single-stroke flash. After the first return stroke has occurred, charge may become available at the top of the channel, moving within the cloud via relatively slow J- and fast K-processes, and about 40 ms after the return stroke ends may initiate a
continuous dart leader that, unlike the discrete steps of a stepped leader, propagates down the channel continuously (Fig. 9). Interferometric (ITF) mapping techniques, used, for example, by SAFIR/ONERA-3D, that detect extremely short duration (<10 ns), continuously pulsed events within the lightning discharge process (27,126) work well in detecting negative dart leaders and K-processes (86,137) (see later). In contrast, TOA techniques, used, for example, by LDAR (see later section), focus primarily on longer duration (a few µs), discretely pulsed negative stepped leader steps and junction processes in virgin air and so do not track dart leaders well (86). The dart leader propagates down the first-return stroke channel, generally without branching, at speeds ranging from about 0.2 × 10⁷ to 4 × 10⁷ m/s [Table 5.3 in (22); Table 8.1 in (74), which is adapted from (130); p. 10 of (68);(123)]. When the dart leader nearly reaches the ground, a second return stroke occurs as part of a multistroke flash. According to Table 7.2 in (74), which is adapted from the work of Berger et al. (131), the median value of the charge transfer by a subsequent return stroke is approximately one-quarter that of the first return stroke, a bit over 1 C. For two cells in Florida, Nisbet et al. (38) found that the charge transfer by the entire flash was in the range of 1 to 27 C; values up to 150 C have been measured [see Table 2 in (35)]. Because the human eye discerns only the light created by each return stroke, multistroke flashes appear to flicker. The second and any additional strokes in a flash follow the well-defined channel upward, are usually not forked (Fig. 9), and so produce a smoother electromagnetic waveform that has few or no subsidiary peaks (Fig. 10b) (9). As a result, only a relatively small amount of radiation above 1 MHz is generated (74, p. 119; 138). Because the current in a subsequent stroke is usually less than that in the first return stroke, typically about one-third [see Table 7.2 in (74)], the signal level produced by a subsequent stroke is generally much less than that of the first return stroke (131). Signals from first return strokes are the most reliable return stroke signals to describe the peak signal level of a lightning flash and to provide range estimation using signal propagation models; from their waveform shape, they can easily be distinguished from the subsequent stroke. Additional dart leaders may occur to initiate subsequent return strokes. Although claims exist in the literature of more than 40 [e.g., (77, p. 65)], such a high number is likely to be given by erroneously counting the numerous impulsive increases in luminosity that may occur when the tip of the upwardly moving subsequent return stroke streamer reaches a branching point of the channel that was produced originally by the forked stepped leader (71, p. 18). Such impulsive increases in the brightness of the subsequent return stroke that occurs when the forks are illuminated are called M-components (71, pp. 24–26). More reliable maximum values for the total numbers of strokes in a flash are 17 (139), 18 (127), and 26 (140). Operationally, the NLDN limits the number of strokes per flash to 15, provided that they all occur within 1 s and 10 km of the first return stroke (13). The global mean value of strokes per ground flash is about
3.5 (141), which is about twice the number detected by the NLDN (12,13). Not all flashes have only discrete episodes of current flow, however, because some channels may remain attached to an object such as a tree on the ground for hundreds of milliseconds. Such continuing currents can maintain flows of hundreds of amperes, comparable to those of an arc welder, and these may be enough to bring the object to a combustible temperature (22, pp. 106–107). Between one-quarter and one-half of all cloud–ground strokes contain continuing currents (74, p. 14) that generate a substantial amount of energy in the extremely low frequency (ELF, less than 300 Hz) range; in one study, 60% of single-stroke and 50% of multiple-stroke flashes had long continuing currents (140). Indeed, M-components are impulsive brightenings of the continuing current channel (22, p. 107; 74, pp. 14, 17). The fact that continuing currents can initiate combustion provided the original rationale for creating a lightning location network using VLF/LF techniques for the timely detection of forest fires (11).

Return Stroke Types

Cloud–ground strokes need not effectively carry negative charge to ground and have stepped leaders originating in the cloud. Berger (142) notes that there are four types of cloud–ground strokes, although Mazur and Ruhnke (143) propose a modification of Berger's types based on the argument that a return stroke can move only from the ground upward to the cloud (22, p. 97). Type 1 strokes originate in the cloud as negatively charged stepped leaders that effectively carry negative charge to ground and are called negative return strokes; these generate electric fields whose initial peaks have positive polarity (Figs. 6 and 8 earlier and Fig. 10) (144). Globally, more than 90% of all cloud–ground strokes are of this negative type (74, p. 8). Positively charged stepped leaders originating in a cloud are also possible, leading to Type 3 positive return strokes that effectively lower positive charge to ground. Such strokes often come from the anvil that spreads downwind of the main part of the thunderstorm at heights of 10 km above the earth's surface (145,146). Positive strokes also appear with greater relative frequency in highly sheared and/or shallower thunderstorms, such as occur more frequently in winter. In the spring of 1998, a significantly greater relative number of positive return strokes were also observed in the Central Plains of the United States when numerous biomass fires burning in Central America produced aerosol particles that were advected over the United States (147). Although the vast majority of return strokes effectively carry negative charge to ground as part of negative flashes, an increase in the relative number of positive flashes is a signature of severe thunderstorms (22, p. 253). Although Berger et al. (131) found that the median value of peak current for positive and first-return negative strokes is about the same, 30 kA, positive return strokes have a greater number of peak current values that are substantially larger than the median value. As seen in Figs. 11 and 12 of (13), the distribution of peak currents measured by the NLDN
Figure 11. Maximum lightning location range for a few selected locating systems (Table 2) spanning ultralow to very high frequencies (ULF to VHF) (plan view locator, (2-d); channel mapper, (3-d); interferometry, (ITF); magnetic direction finding, (MDF); time-of-arrival, (TOA); ratio of spectral amplitudes, (spectral)), together with the frequency spectra for first-return strokes as given by measurements using wideband (MDF) systems (125,134,135,151). The intermediate frequencies are chosen for the wideband observing systems. Note that systems using lower frequencies have a greater detection range. The return-stroke spectra are expressed in terms of decibels (dB), which logarithmically quantify the E-field reduction relative to the standard of 1 V/m or the dE/dt-field reduction relative to the standard of 1 V m^−1 s^−1. From a peak value near 5 kHz, the spectral amplitudes fall as f^−1 through frequencies of 1 MHz, but fall more rapidly, as f^−2, for frequencies greater than 1 MHz. Note that the decrease in detection range for the lightning location systems roughly follows the decrease of the spectral amplitudes. Also shown via the horizontal bars in the lower part of the figure is a rough approximation of the frequency content of first and subsequent return strokes, intracloud (including cloud–cloud and cloud–air) strokes, and stepped leader pulses. Lower frequency systems sample the relatively long return strokes well but must discriminate them from intracloud strokes. In contrast, higher frequency systems sample the relatively short stepped leader pulses well but primarily infer the location of return and intracloud strokes from the channel orientation that is given by a coherent sequence of these short-duration pulses.
As seen in Figs. 11 and 12 of (13), the distribution of peak currents measured by the NLDN is more strongly skewed to larger current magnitudes (right-skewed) than the distribution of peak currents from negative return strokes. This more strongly skewed distribution is consistent with the origin of many positive return strokes in the anvil, whose lightning channels are much longer than those for negative return strokes that begin typically in the middle portion of the cloud (101). The other two types of cloud–ground flashes defined in (142), Types 2 and 4, are relatively rare. They involve leaders that begin at the earth's surface and propagate upward toward the cloud; generally, cloud–ground strokes of this type are initiated at mountaintops or tall buildings. According to Mazur and Ruhnke (143), as summarized by MacGorman and Rust (22, p. 97), Type 2 strokes are characterized initially by a positive leader that moves upward into the cloud. It is followed by a dart leader that propagates from cloud to ground and that initiates the return stroke that effectively lowers negative charge to ground. Type 4 strokes begin via a descending positive leader that initiates a negative leader that moves upward from the ground. When these two leaders meet, a return stroke that effectively lowers positive charge to ground occurs.

SAMPLING CONSIDERATIONS FOR LIGHTNING LOCATION SYSTEMS

The different types of lightning location systems introduced earlier exploit the different aspects of the lightning discharge process that are reviewed in the previous sections.
Figure 12. Map of the 31 field mill locations used in the Launch Pad Lightning Warning System (LPLWS) at the Kennedy Space Center. (Courtesy USAF, 45th Weather Squadron, used with permission.) See color insert.
These systems provide various information about the discharge, such as the location of the ground strike point or the orientation of the lightning channel, and provide vastly different stroke classification and ranging capabilities. Throughout the discussion here, emphasis is placed on application of the radio-frequency characteristics of lightning.

Stroke Classification

The electromagnetic field frequency spectrum of first cloud–ground strokes is quite broad and spans frequencies from at least 1 kHz to 10 GHz (136, p. 196), over which the signal level, or magnitude, normalized to the 10-km range, decreases as frequency increases, from a peak typically near 5 kHz (134) to a value seven orders of magnitude smaller. The peak in the spectrum is in the VLF portion of the radio-frequency spectrum; thus the origin of the common statement that ‘‘VLF sferics are produced predominantly by return strokes’’ (22, p. 249). One must be careful when applying this statement, however, because bipolar waveforms associated with intracloud strokes are commonly observed when using VLF systems (Fig. 7); so, as discussed earlier, when detection of cloud–ground strokes is desired, time-domain waveform analysis must be performed to separate the cloud–ground and intracloud (including cloud–cloud and cloud–air) signals (11) (Figs. 6 and 7). The wideband character of the electromagnetic radiation in first return strokes is caused by branching effects commonly seen in their channels (125,138). Thus network locating (NL) systems such as the NLDN and CGLSS that focus on cloud–ground strokes (see later) use frequencies predominantly in the VLF/LF range; they detect cloud–ground lightning in the far field using the peak signal radiated near the surface, where the channel is predominantly vertical. In contrast, network lightning mapping (NM) systems such as LDAR, LMA/LMS, and SAFIR/ONERA-3D, which locate the lightning channel in three dimensions by tracking the position of negative-polarity streamer/leader transitions, stepped leaders, dart leaders, or other in-cloud breakdown processes (24,27,29,86,137), use VHF frequencies higher than 30 MHz (Table 2) (see later). Thus, VHF ground-based systems do not detect cloud–ground strokes or intracloud strokes directly, because there is little VHF radiation emitted by current flow in existing channels (137), but instead provide the stroke classification by tracking the formation of the channel given by a large number of detections of a coherent sequence of extremely short pulses (24,27,29,32,33,86). Recent work on data from the satellite-mounted FORTÉ VHF and optical sensors suggests, however, that cloud–ground and intracloud flashes can be distinguished from their optical waveforms and their properties in VHF spectrograms (50). The electromagnetic fields produced by lightning propagate by line of sight in the VHF (see earlier). Consequently, ground-based VHF mapping systems cannot identify the strike point on the surface directly, because intervening ground due to the earth's curvature or other obstructions blocks the signal, whereas VLF systems can. Thus, the VLF/LF CGLSS network supplements the LDAR network
to provide such points (16), and VLF/LF receivers are used in the SAFIR network for the same purpose.

Stroke Ranging

The effective range of a locating system is related to the portion of the electromagnetic spectrum sampled. As noted before, the radiated power drops by several orders of magnitude between the VLF peak near 5 kHz and that measured at microwave frequencies near 10 GHz. Such drops in power are often described by using power ratios that are conveniently expressed in decibels (dB); the decibel value is 10 times the base-10 logarithm of the ratio of the radiated powers. Because radiated power varies as the square of the electric field intensity, the decibel value is also 20 times the base-10 logarithm of the ratio of the electric field E, or equivalently the magnetic field B, intensities. A common reference E-field intensity is 1 V m^−1, and for its temporal derivative dE/dt, the common reference is 1 V m^−1 s^−1 (125). Using decibels defined by these reference values allows comparing both the E and dE/dt spectra on one figure and so extends the frequency range over which the lightning properties can be characterized. For example, the E-field spectra are used by Weidman et al. (125) to characterize the frequency content of return strokes from frequencies of 100 kHz through 1 MHz, and the dE/dt spectra are used for frequencies in the range of 1 to 20 MHz (Fig. 11). For lightning sources range-normalized to a 50-km distance, a typical spectral value of the E-field peak occurs at frequencies near 5 kHz and has dB values near −70 based on the reference intensity of 1 V m^−1 (134). At a 50-km range, the broadband signal generated by a cloud–ground stroke varies by more than 80 dB between the VLF peak and the HF signal level at 10 MHz (125,134,135). Although a 20-dB decrease represents an order of magnitude drop in intensity, an 80-dB decrease represents a signal level only 0.01% of the magnitude of the peak level. Stepped leader waves that have average peak signal levels that are 10% of those of the following return strokes (8) thus have peak signal levels that are about 20 dB lower than those of the return strokes (125). The stepped leader waves in Fig. 8 have peak levels about 10% of the 11 × 10^−8 weber per square meter (denoted Wb/m^2) peak level of the return stroke; these steps are spaced at 20 to 25 µs, which is consistent with a medium frequency (MF, 300 kHz to 3 MHz) of 400 to 500 kHz and which is on the lower end of the frequency band for negative stepped leader waves given in Fig. 11. From this figure, it is apparent that the power at these frequencies is about 20 dB below that for return strokes whose frequencies are near 100 kHz, at which the NLDN samples. For frequencies greater than 50 MHz used by LDAR and SAFIR, much lower signal levels than those associated with the MF frequencies of the stepped leader step peaks would be expected. In the frequency range of 100 kHz to 1 MHz, the power at all frequencies for intracloud electromagnetic waves is typically about 10 dB lower than that for cloud–ground waves, indicating a significantly weaker signal level (125).
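To make the decibel arithmetic used above concrete, the short Python sketch below converts field-amplitude ratios to decibels with the 20 log10 rule just described; the numerical checks simply echo the examples given in the text (an 80-dB drop corresponding to 0.01% of the peak amplitude, a 10% stepped-leader amplitude corresponding to about −20 dB, and a 12-dB drop corresponding roughly to a factor of four). It is a minimal illustration, not part of any operational system.

```python
import math

def field_ratio_to_db(ratio):
    """Decibel value of a field-amplitude ratio: 20 * log10(ratio)."""
    return 20.0 * math.log10(ratio)

def db_to_field_ratio(db):
    """Field-amplitude ratio corresponding to a decibel value."""
    return 10.0 ** (db / 20.0)

# An 80-dB decrease corresponds to an amplitude of 10**(-80/20) = 1e-4,
# that is, 0.01% of the peak level.
print(db_to_field_ratio(-80.0))      # 0.0001

# Stepped-leader peaks averaging 10% of the return-stroke peak sit about
# 20 dB below the return stroke.
print(field_ratio_to_db(0.10))       # -20.0

# A 12-dB decrease is roughly a factor-of-four drop in amplitude,
# consistent with 1/r attenuation over a fourfold range increase.
print(db_to_field_ratio(-12.0))      # ~0.25
```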
Subsequent return strokes in the flash generally do not have branches and, as noted in Fig. 11, also produce a small signal level above 1 MHz (74, p. 119). Although VLF/LF systems such as the NLDN and CGLSS can detect these subsequent return strokes, VHF interferometric systems such as SAFIR can provide some information about them only by detecting the preceding K-processes and dart leaders (86). In general, the mean E-fields produced by subsequent strokes are one-half to three-quarters of those produced by the first strokes in a flash [Table 7.1 in (74) and (144,148,149)], and so only the peak magnitude of the first strokes should be used for distance ranging. Finally, continuing currents in subsequent return strokes emit a substantial amount of ELF radiation at frequencies less than 300 Hz (5, pp. 362–363), and so can be detected at a great distance. For example, a shipborne system that uses the spectral components of received signals over ELF and VLF frequencies can provide ranging between 1,000 and 3,000 km (58). Additionally, measurements can be made on a truly global scale by applying magnetic direction finding techniques to received signal levels in the ULF/ELF transition range of 4 to 200 Hz (150).

The received signal level is affected by the attenuation rate over the distance propagated and also by the frequency-dependent propagation characteristics of the atmosphere itself (5, p. 352). At frequencies less than 300 kHz, which are primarily used by plan view SSC and SSL locating systems at ranges >100 km (Tables 1 and 2), the electromagnetic wave is a ground wave (see earlier). To a good approximation, the amplitude of this ground wave varies inversely with the distance propagated, and in the VLF this provides the basis for lightning detection and ranging at distances up to several hundred kilometers (133). For example, as seen by comparing the stars and pluses in Fig. 11, the signal decreases by 12 dB, that is, by a factor of four in amplitude, as the range increases fourfold from 50 to 200 km (134,151). Using signals in frequencies between 300 kHz and 30 MHz can produce ambiguity because they may arrive at a ground-based receiver by multiple paths due to both ground and sky waves; for this reason, locating system receivers predominantly avoid this range. Very high frequency/ultrahigh frequency (VHF/UHF, 30 MHz to 3,000 MHz) signals penetrate the ionosphere, removing the multipath ambiguity, but the line-of-sight propagation limits ground-based receivers to ranging of only local storms (5, p. 367). Satellite-based systems, however, can use VHF signals to great advantage (49–53).

Taken together, the effects of radiated power and signal attenuation imply that the desired range to the lightning controls which radio-frequency band should be used by a ground-based system (Fig. 11). To detect lightning on the large scale of several thousand kilometers, one should choose receivers sampling predominantly in the VLF region near the 5-kHz peak in the spectrum, as is done by the British ATD system (17,18) (Tables 1 and 2) and by Füllekrug (150), who used simultaneous global observations of the earth–ionosphere cavity, or Schumann, resonances. The shipborne ELF/VLF system of Rafalsky et al. (58) combines signal levels whose frequencies are less than 1 kHz with those between 2
and 23 kHz to detect lightning in the range of 1,000 to 3,000 km. The broadband MDF and TOA receivers used by the NLDN, CGLSS, GPATS, and GP-1 systems measure signal levels predominantly in the VLF and LF ranges and so detect lightning up to distances of 400 to 600 km (13). The effectiveness of VLF/LF methods becomes degraded at close ranges, within 100 km or so, in part because stepped leaders in intense lightning may lead to premature triggering of the system, as may well happen in the example in Fig. 8, and in part because near-field components of the electromagnetic radiation are no longer negligible (5, p. 353). The LDAR 6-MHz bandwidth system uses signal levels centered on 66 MHz, and the LMA/LMS 6-MHz bandwidth system uses signal levels centered on 60 MHz, to map the lightning channel given by negative leader steps associated with discrete pulsed breakdown processes at ranges up to 100 to 150 km or so. In contrast, the SAFIR/ONERA-3D narrowband VHF ITF systems sample more continuous bursts of radio noise from most phases of a lightning discharge at frequencies between 100 and 118 MHz using 1-MHz bandwidths (24,27,86,137) up to ranges of 100 to 400 km (22, p. 157; 65), depending on the elevation and separation of the receiving stations in the network. Note from Fig. 11 that the decrease in maximum range as the ground-based receiver frequency increases closely follows the decrease in lightning electric field spectral magnitude as frequency increases. This decrease is in part a direct result of the relatively uniform attenuation rate of the signal with propagation range over land. The weaker the original signal, the shorter the distance it can propagate and still be discernible from sources other than lightning. In theory, the radiation, or far-field, component of the electric field of a thunderstorm decays inversely with range r (5, p. 353; 106, p. 545). In practice, however, the relationship between r and the received signal level is a bit more complicated and may be given by a signal propagation model of the form

y = (r/D)^−p,        (2)
in which y represents the received signal level for an individual stroke, as for the NLDN (13) (see later), and D is a site- or bearing-dependent parameter. For single-station SSL systems, y is a statistical parameter, such as the median, that describes an aspect of the distribution of strokes in a thunderstorm over a period of time, such as 30 minutes (56,57) (see later). Doing so assumes that the set of strokes that composes the distribution originates from a region at a well-defined distance from the sensor. If there are multiple regions of lightning, however, then the range estimate given by this simple model will correspond to an average distance to these regions, weighted by the number of detected strokes from each region. A 12-dB decrease in the received signal levels for lightning at the 50- and 200-km ranges is consistent with the theoretical value of 1 for the attenuation rate coefficient p in Eq. (2) (151).
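To make the use of Eq. (2) concrete, the short sketch below evaluates the model at a given range and then inverts it, r = D y^(−1/p), to recover the range from a received signal level. The parameter values are placeholders chosen only for illustration (the exponent echoes the NLDN value quoted below); they are not taken from any operational network.

```python
def received_level(r_km, D, p):
    """Signal propagation model of Eq. (2): y = (r / D) ** (-p)."""
    return (r_km / D) ** (-p)

def estimate_range(y, D, p):
    """Invert Eq. (2) for the range: r = D * y ** (-1 / p)."""
    return D * y ** (-1.0 / p)

# Hypothetical site parameters, for illustration only.
D, p = 100.0, 1.13

y_200 = received_level(200.0, D, p)         # level expected at 200 km
print(round(estimate_range(y_200, D, p)))   # recovers 200 (km)
```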
Generally, however, p > 1 is found due to site error effects that are related, in part, to reradiation of the lightning signals by man-made structures, to topographic variations along the propagation path (3; 22, p. 160), and to distortion of the magnetic field in the vicinity of the direction-finding antenna (152). For example, the NLDN uses p = 1.13 in a more complicated version of Eq. (2), which was determined by Orville (132) to estimate the peak current I_peak from the peak level of the received signal (see later). In an earlier study of NLDN observations, Idone et al. (133) found p = 1.09 for a power law of the form of Eq. (2); in contrast, Mazeroski et al. (56) found bearing-dependent values of p < 1 for power laws of this form for the GP-1 SSL installation at Lawrence, Kansas (see later).

As noted earlier, accurate location of the source of lightning depends on accurate determination of the bearing. Although broadband magnetic direction finders can, in principle, provide accuracies of a degree or less (7), site errors also can significantly affect the bearing calculated at one receiving site. Because these bearing errors generally do not change with time, they can be accounted for during calibration of the instrument, and corrections can be applied before the signal propagation model, Eq. (2), is used for ranging (56,152). Multistation networks use triangulation or time-of-arrival methods to obtain a best position by finding the location that minimizes the variability in the estimates given by pairs of receiving stations (22, pp. 160–162) (see earlier). The best estimates are obtained by combining two independent estimates via least-squares optimization, as is done by the Improved Accuracy from Combined Technology (IMPACT) sensors used in the NLDN network (13) (see later).

A description of several examples of real-time, operational lightning detection, direction finding, and locating systems follows in the next sections. Of the many systems summarized in Tables 1 and 2, those chosen for description sample different characteristics of the lightning signal to obtain information tailored to the particular applications of that system. The accuracies and ranges vary with each system and are constrained in part by the particular lightning characteristics sampled. Some systems use more than one characteristic, and this strategy generally improves the location or detection accuracy noticeably.

THE LAUNCH PAD LIGHTNING WARNING SYSTEM: A LIGHTNING WARNING DEVICE

The United States Air Force (USAF) 45th Weather Squadron uses a network of 31 electric field mills, over an area of roughly 20 × 20 km, to measure the electric field potential gradient at the surface, in support of the space program at Cape Canaveral Air Force Station (CCAFS) and Kennedy Space Center (KSC). This network of field mills, communication, analysis, and display equipment is known as the Launch Pad Lightning Warning System (LPLWS). A map of the field mill locations is shown in Fig. 12. The electric field potential gradient is sampled at 50 Hz, and 1-minute averages are displayed and isoplethed in kV/m (Fig. 13). The electric field potential gradient has the same magnitude as the electric field but has the
opposite sign. Thus LPLWS displays the fair-weather field as about +200 V/m, that is, as a positive value. Strip charts that show the time series of the field mills are also available. The primary purpose of the LPLWS is to help evaluate the lightning launch commit criteria (153) to guard against rocket-triggered lightning and to determine when potentially hazardous conditions have become safe. The LPLWS also helps in lightning forecasting: the building electric fields from locally developing thunderstorms provide some warning of the onset of lightning. The decaying phase of the thunderstorm can sometimes be seen on the field mill display, especially in the slowing of the lightning rate on the strip charts, and the end-of-storm oscillation (92,109) can sometimes be detected on the strip charts to help forecast the end of lightning. In addition, the electric charge in anvil and debris clouds and their lightning threat can be detected, although screening layers at the thunderstorm tops must be considered.

The two cases of anvil cloud and debris cloud lightning are especially difficult forecast challenges. Although much less frequent than lightning from cellular thunderstorms, cloud–ground lightning from an anvil can be surprising, given the long distances it can strike from the parent thunderstorm. Cloud–ground lightning from anvil clouds has been documented more than 90 km from the parent thunderstorm, and in-cloud lightning more than 130 km. Lightning from a debris cloud is even less frequent than that from an anvil cloud but can be surprising, given the long time it can strike after the parent thunderstorm has dissipated. Cloud–ground lightning more than 1 1/4 hours after the last flash in the parent thunderstorm has been documented.

The LPLWS also provides lightning detection and has limited lightning location capability. The LPLWS detects and locates all types of lightning, including lightning aloft. If a field mill changes by 200 V/m or more, positively or negatively, between successive 50-Hz measurements, then the LPLWS defines that a lightning event has occurred and activates its lightning algorithm. Every field mill that detected the instantaneous pulse, regardless of magnitude, is used to isopleth the overall electric field change. Note that because the isopleth maximum is used, the location does not necessarily correspond to the field mill reporting the largest change; that is, the interpolation scheme can produce greater electric field changes not collocated with the field mills themselves. The location of the maximum field change, as determined by the isoplething, is taken to represent the center of mass of the total change in the electric charge in the lightning flash, that is, the middle of the lightning flash projected onto the ground. A 32 × 32 one-kilometer grid is used to plot the inferred position of the lightning flash. Letters in alphabetical sequence are used to plot the lightning; the first lightning flash is plotted as ‘‘A,’’ the next as ‘‘B,’’ and so forth (Fig. 14). The lightning is plotted at the closest grid point, not at the actual maximum in the isopleths. This approach introduces an average positioning error of 354 m for purely vertical lightning, and the error for an individual flash, although unknowable, can be anywhere from 0 to 707 m. But a much greater location problem occurs for lightning that has any horizontal extent.
Figure 13. LPLWS example of a lightning event at 21 : 35 UTC 28 December 1998. The electric field potential gradient is sampled at 50 Hz, and one-minute averaged electric field potential gradient (kV/m) values are isoplethed. Fair-weather field values are displayed as +0.2 kV/m because the electric field potential gradient and the electric field have opposite signs. The field mill locations are denoted by the integers and their observations by decimal values. (Courtesy USAF, 45th Weather Squadron, used with permission.) See color insert.
Consider a flash of anvil lightning at a 9-km height, more than 50 km long, over the middle of the network. LPLWS will plot that flash as a single point in the middle of the network. The three-dimensional nature of the lightning is completely lost; its altitude and horizontal extent are unknown. The plotting grid also displays strong lightning outside the LPLWS network erroneously on the edge of the network. LPLWS was meant to detect and locate lightning only over the field mill network but typically detects lightning as far as ∼20 km outside the network. Fortunately, users of LPLWS are accustomed to this problem and know how to discount the erroneously plotted lightning locations. The LPLWS is a reasonably good detector of all types of lightning and has a detection efficiency of 90%, but its location accuracy is considered poor. Since the implementation of the Lightning Detection and Ranging system (see later) and other lightning systems at CCAFS/KSC, the lightning detection and location features of LPLWS have been considered merely a third backup capability by the 45th Weather Squadron.
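A minimal sketch of the LPLWS trigger and plotting logic described above follows: a lightning event is declared when the potential gradient at any field mill changes by 200 V/m or more between successive 50-Hz samples, and the inferred flash center is snapped to the nearest point of a 1-km grid. The threshold and grid spacing come from the text; the data structures and function names are hypothetical.

```python
def detect_event(previous_kv_m, current_kv_m, threshold_kv_m=0.2):
    """Return True if any field mill changed by >= 200 V/m (0.2 kV/m),
    positively or negatively, between successive 50-Hz samples."""
    return any(abs(cur - prev) >= threshold_kv_m
               for prev, cur in zip(previous_kv_m, current_kv_m))

def snap_to_grid(x_km, y_km, spacing_km=1.0):
    """Plot the inferred flash center at the closest 1-km grid point,
    which introduces a positioning error of up to about 707 m."""
    return (round(x_km / spacing_km) * spacing_km,
            round(y_km / spacing_km) * spacing_km)

# Example: one mill jumps from 1.1 to 1.4 kV/m between samples -> event.
print(detect_event([1.1, 0.3], [1.4, 0.3]))   # True
print(snap_to_grid(12.6, 7.2))                # (13.0, 7.0)
```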
ELECTRICAL STORM IDENTIFICATION DEVICE: A RADIO-OPTICAL FLASH COUNTER

The Electrical Storm Identification Device (ESID) and the similar ASOS (Automated Surface Observing System) Lightning Sensor (ALS) developed by Global Atmospherics, Inc. (GAI) (Tables 1 and 2) are two of the most widely used single-station flash counter (SSC) systems. The system has been specifically designed for ease of installation and operation to protect people and assets. The ESID is a portable, lightweight unit that measures two feet high and one foot in diameter, and it contains a solar-powered sensor as well as a battery backup within the display unit to ensure uninterrupted operation of the system during power outages. The ESID has appealed to a broad group of users, including private companies, individuals, and the United States Golf Association (USGA) for real-time use in its golf championships. The Thunderstorm Sensor Series (TSS) versions of the ESID have undergone extensive testing for certification on ASOS platforms (154).
Figure 14. Sample of LPLWS display at 14 : 13 UTC 28 June 2001. The red letters show where LPLWS infers the center of a lightning flash of all types projected onto the ground. One-minute averaged electric field potential gradient (kV/m) values are isoplethed, and dashed curves indicate negative values. The field mill locations are denoted by the integers and their observations by the decimal values (kV/m). (Courtesy USAF, 45th Weather Squadron, used with permission.) See color insert.
The results have been published in reports available from the ASOS Program Office of the United States National Weather Service (155).

Lightning Signal Discrimination

Several components of the ESID enable sensing both return and intracloud (including cloud–cloud and cloud–air) lightning strokes (54). Both radio and optical signals from a stroke must be received for lightning detection, a requirement that greatly reduces the false alarm rate, the ratio of the number of incorrect detections to the total number of detections (154; 156, pp. 240–241). The device is omnidirectional and so cannot resolve the bearing of a given stroke. Rather, emphasis is placed on range estimation of return strokes. Range estimation of intracloud strokes, however, is not available from the earlier TSS520 model ESID; the newer model TSS924 is used on ASOS platforms as the ALS. Because the ESID is a
single-point sensor, range cannot be estimated by triangulation but is derived from an analysis of the optical and radio signals generated by the stroke. First, the optical sensor on the ESID detects and processes the waveforms from the light emitted by a lightning stroke (154). When compared with the inverse-range attenuation rate 1/r of the electric field (see earlier) (5, p. 353; 106, p. 545), the optical field attenuates more rapidly with range r and obeys the inverse-squared relationship in which the acquired optical signal level is proportional to 1/r^2. This attenuation rate limits the range of the ESID to line-of-sight measurements when the atmospheric visibility is not seriously degraded by rainfall. The system response also degrades as the flash rate increases owing to processor dead time, the time when the system is analyzing a recently received signal and so cannot detect another (154).
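The contrast between the 1/r falloff of the radiated E-field and the 1/r^2 falloff of the optical signal can be made explicit with a short sketch; the reference range and unit signal levels here are arbitrary and chosen only for illustration.

```python
def e_field_level(r_km, level_at_ref=1.0, ref_km=10.0):
    """Far-field E-field amplitude, assumed to fall off as 1/r."""
    return level_at_ref * (ref_km / r_km)

def optical_level(r_km, level_at_ref=1.0, ref_km=10.0):
    """Optical signal, assumed to fall off as 1/r**2."""
    return level_at_ref * (ref_km / r_km) ** 2

# Doubling the range halves the E-field signal but quarters the optical one.
print(e_field_level(20.0))   # 0.5
print(optical_level(20.0))   # 0.25
```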
Second, the instrument senses the electric field. Because the ESID detects lightning at close range, both the near-field, or electrostatic, and the far-field, or radiation, components of the emitted electromagnetic radiation must be considered (73, pp. 61–63). To do so, the decay time constant is changed to allow the antenna to measure both of these components of the electric field. The electrostatic component has lower temporal frequency, of the order of milliseconds to seconds, and essentially reproduces the total field change from an entire flash (22, pp. 106–107). When the decay time constant of the circuit signal is reduced, forcing the field signal level to decay faster, the electric field waveform, or radiation component, is measured (75, p. 191) (see earlier section). The E-field antenna now serves as a fast antenna, of the type used in combination with gated, magnetic direction finders (5,11; 117, p. 221), which allows observation of the higher frequency components to distinguish the type of flash using time-domain waveform analysis (see earlier section). The simultaneous sensing of both components of the E-field by one antenna allows the ESID to acquire the rapidly occurring field variations of a complete lightning flash with adequate time resolution. The ESID registers a lightning stroke only if the optical threshold is exceeded within 100 µs of the electric field threshold time. The patented signal processing circuits in the ESID, which reject background noise, also aid in virtually eradicating false alarms for properly installed sensors, as affirmed by the manufacturer, GAI (157).

Once a return stroke is detected, its range is then estimated by data-processing electronics within the unit, and data are transmitted to the display unit via a fiber-optic communications link. Range estimates are given in one of three intervals: 0–5, 5–10, and 10–30 miles (0–8, 8–16, 16–50 km) from the device. The ESID can also automatically issue audible and visual alarms according to a user-selected criterion. Flash count information is also displayed on the unit. The capacity of the ESID to estimate the proximity of lightning, as well as its ability to withstand occasionally harsh outdoor weather elements, made it a viable choice as an operational lightning sensor in the early 1990s. During this time, a network of Automated Surface Observing Systems (ASOS) was being installed as the result of the modernization plan of the United States National Weather Service. These instruments, positioned at hundreds of airports across the country, relay critical surface weather observations to ground operations, eliminating the need for human observers. The ALS reports that a thunderstorm is in the vicinity of the station (using the designator TS in the coded message) if the sum of the cloud–ground flashes in a 10-statute-mile radius and cloud discharges at any range is two or greater. A flash cannot be older than 15 minutes to be included in this count. This so-called two-stroke algorithm (158) provides more reliable reports than simply requiring one stroke, as demonstrated in a performance test conducted by the Raytheon STX Corporation in the spring and summer of 1997.

Sensor Performance

Studies in the mid-1990s compared the performance of ESID with that of human observers and with that of
magnetic direction finder networks that locate lightning via triangulation (see earlier section). A study at an airport in 1993 concluded that the ESID could detect a greater number of return strokes than humans because humans use thunder to detect lightning (154). Ambient aircraft noise may have occasionally muffled distant thunder that the human observer would have otherwise detected. In this early study, the ESID performed quite well with respect to both human and network detection, particularly within the first range bin of 0 to 5 miles. The more extensive 1997 study by Raytheon STX used 10 separate installations and found that the two-stroke algorithm produced ESID thunderstorm reports 88% of the time that humans reported them and 89% of the time that the National Lightning Detection Network (NLDN) reported them. ESID reported 25% more thunderstorm minutes than humans and almost 2 1/2 times more minutes than the NLDN, whose event coverage was low during the study. Only 0.8% of the ESID-reported events were defined as false alarms because neither humans nor the NLDN reported lightning when an ESID did. Most of the events missed by the ESID were due to the conservative two-stroke requirement for logging a thunderstorm report. Although humans and ESID reported thunderstorms about the same number of times as the NLDN, there were many events when a human or an ESID detected lightning and the NLDN did not. This discrepancy is explained by the fact that the NLDN reports only cloud–ground strokes (see later), whereas humans and ESID report both cloud–ground and cloud discharges. Tests such as these provide strong evidence that the ESID sensor provides a viable means for automatically detecting thunderstorms. Currently, 27 ASOS sites have such ALS detectors (158).

An automated system that can report both range and direction in at least the eight octants remains a requirement of the FAA because this amount of detail is expected in the standard hourly reports of atmospheric conditions produced by any surface reporting station. A more advanced ESID that can resolve the bearing of cloud–ground strokes and estimate the range of intracloud lightning strokes is planned. To help meet this FAA requirement, an alternative method for reporting thunderstorms at ASOS sites is being implemented currently to supplement the observations from the ALS. Dubbed the Automated Lightning Detection and Reporting System (ALDARS), this approach reports when the NLDN detects cloud–ground lightning within 30 nautical miles of an ASOS site and relays the location to that site (158,159). As of 6 October 2000, 785 of the 886 ASOS sites had operational version 2.60 ALDARS installations; for the current status of these installations, see (160).
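A minimal sketch of the two-stroke reporting rule described above may help fix the idea: a thunderstorm (TS) is reported when the sum of cloud–ground flashes within 10 statute miles and cloud discharges at any range, counting only flashes less than 15 minutes old, reaches two. The flash-record format below is hypothetical, not the actual ALS data structure.

```python
def report_thunderstorm(flashes, now_min,
                        cg_radius_mi=10.0, max_age_min=15.0):
    """flashes: list of dicts with keys 'type' ('CG' or 'cloud'),
    'range_mi', and 'time_min' (observation time in minutes).
    Returns True if the two-stroke criterion for a TS report is met."""
    count = 0
    for f in flashes:
        if now_min - f['time_min'] > max_age_min:
            continue  # a flash older than 15 minutes is not counted
        if f['type'] == 'CG' and f['range_mi'] <= cg_radius_mi:
            count += 1
        elif f['type'] == 'cloud':
            count += 1  # cloud discharges count at any range
    return count >= 2

flashes = [{'type': 'CG', 'range_mi': 6.0, 'time_min': 50.0},
           {'type': 'cloud', 'range_mi': 22.0, 'time_min': 58.0}]
print(report_thunderstorm(flashes, now_min=60.0))  # True
```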
THE GREAT PLAINS-1: A SINGLE-STATION LOCATOR

The Great Plains-1 (GP-1) system is a gated, single-station magnetic direction finder of the type used in the National Lightning Detection Network (NLDN) (6,7,11,56,57,161) (see later). The GP-1 consists of the usual crossed-loop magnetic field (B-field) and electric field (E-field) antennas that detect the electromagnetic signal from a lightning stroke. Time-domain waveform analysis is used to identify which received electromagnetic signals are from negative return strokes (see earlier section and Figs. 6 and 7). For each stroke, the bearing calculation is based on the peak levels ew and ns of the two components of the B-field signals (Fig. 2). The system estimates the locations of active areas of cloud–ground lightning using a signal propagation model of the form of Eq. (2) that relates measures of the received peak signal to the estimated range to that region. Of course, this approach can give reasonable results only when there is a single well-defined region of lightning in the bearing of interest.

Signal Propagation Model

Because a typical distribution of lightning peak signal levels, or magnitudes, is right-skewed, and peak signal levels from a given storm often vary by an order of magnitude (13,162), a viable signal propagation model must be based on a set of observations from a region of cloud–ground lightning rather than on the peak signal level from an individual stroke (56,57). To relate a given set of observations to the range to the source, the changes in the shape of the lightning peak signal distribution must be known as the range changes. Once this has been done, lightning range is estimated by matching the observed distribution over a fixed time interval, such as 30 minutes, to a reference distribution of known distance from the receiver. Such reference distributions are found by analyzing a developmental database, in which the GP-1 observations in each 1° bearing are sorted into range bins, distance intervals sufficiently wide to provide an adequate sample for an entire lightning season. The ranges to the lightning are provided by the NLDN observations for the same time and region. Separate range-estimating equations of the form of Eq. (2) are needed for each 1° bearing because site errors lead to varying signal attenuation rates that are produced when the electromagnetic waves propagate over surfaces that have different conductivities or varying topographic relief (3; 22,
p. 160) or when the magnetic field becomes distorted in the vicinity of the direction-finding antenna (152). Site errors also are manifested as bearing errors that must be corrected using a two-cycle sinusoidal function determined for each GP-1 site (152). Signal propagation models for each 1° bearing are determined that relate the decile values of the lightning peak signal level distribution to the range. To find the 10 decile values d_i, i = 1, …, 10, the set of observed peak signal levels over the sampling interval of interest is sorted from smallest to largest value (156, pp. 22–23). The first decile d_1 is the value in the list that is one-tenth of the way up from the first, smallest value. The fifth decile d_5 is the same as the median because it is the value at the midpoint in the list, and the tenth decile d_10 is the largest value. Deciles conveniently standardize the description of the distribution for samples of varying size. Deciles also provide a convenient, detailed way to express how the distribution changes with range. Such changes can be well approximated by the Weibull distribution, an idealization whose parameter values can be varied to produce distributions ranging from Gaussian to highly skewed (38,162). To illustrate how the received peak signal level distributions vary with range, two possible Weibull distributions representing lightning peak signal levels from sources at two different ranges are shown in Fig. 15. Here the distribution of the B-field peak magnitude that represents the closer source (solid curve) has a broader shape and is shifted toward larger values than the distribution that represents the more distant source (dashed curve). These particular distributions are chosen based on the expectation that, owing to additional signal attenuation as range increases, the more distant case would yield fewer strokes that have relatively large peak levels but would yield more strokes that have relatively small peak levels. These two distributions are expressed as probability density functions, which are normalized so that the area under each function is 1.
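The decile description of the Weibull idealization can be reproduced with a few lines of code. The scale and shape parameters below are arbitrary choices meant only to mimic a 'nearby' (broader, shifted toward larger values) and a 'distant' (weaker) distribution; they are not the values behind Figs. 15 and 16.

```python
import math

def weibull_decile(i, scale, shape):
    """i-th decile (i = 1..10 maps to probabilities 0.1..1.0) of a Weibull
    distribution with the given scale and shape parameters."""
    q = min(i / 10.0, 0.999)  # cap the 10th decile so it stays finite
    return scale * (-math.log(1.0 - q)) ** (1.0 / shape)

# Hypothetical parameters: the nearby source yields a broader distribution
# shifted toward larger B-field peak levels than the distant one.
nearby = [weibull_decile(i, scale=1.5, shape=1.8) for i in range(1, 11)]
distant = [weibull_decile(i, scale=0.8, shape=2.5) for i in range(1, 11)]

# Every decile of the nearby distribution exceeds the distant one, and the
# separation grows with decile index, as in Fig. 16.
print([round(n - d, 2) for n, d in zip(nearby, distant)])
```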
Figure 15. Two idealized distributions of received B-field peak signal levels given by the Weibull probability density function. Note that the more distant sources (dashed curve) produce fewer strong, but more weak, signals than the nearby sources (solid curve), as would be expected owing to greater attenuation of the original lightning signal as range increases. The cumulative distribution functions for these two cases are shown in Fig. 16.
Figure 16. The cumulative distribution function for the B-field peak signal level as a function of the decile index i for each of the two Weibull distributions in Fig. 15 for the nearby (solid curve) and distant (dashed curve) cases. Also shown are the decile values d_i for two sets of points given by the signal propagation model at a bearing of 150° for the GP-1 locator at Lawrence, Kansas, one for the 100-km range (diamonds) and the other for the 200-km range (triangles). Note that both the slope and magnitude of the decile values are greater for the closer sources and that the cumulative distribution function from the Weibull distribution models the received peak signal level fairly well. Equations of the form of Eq. (2) model these variations better using coefficient D_i and exponent p_i values that are functions of the decile index i and bearing.
Here, the 10 decile values d_i, i = 1, …, 10, are given by the values of B for which the area under the curve between 0 and B is i/10. When these decile values are graphed as a function of the decile index i (Fig. 16), the curves are called cumulative distribution functions. It is apparent from Fig. 16 that for each decile index i, the decile value d_i for the more distant stroke (dashed curve) is less than the decile value for the nearby stroke (solid curve). Owing to the different shapes of the distributions in Fig. 15, the difference between decile values increases with the decile index i and is manifested in Fig. 16 by the greater slope and magnitude of the curve for the nearby distribution that represents the closer source. Also shown in Fig. 16 are two sets of decile values for the signal propagation model at a bearing of 150°, one for a range of approximately 100 km from the GP-1 installation in Lawrence, Kansas (diamonds), and the other at a range of approximately 200 km (triangles). Note that the signal propagation model at each range follows the changes in magnitude and shape of each idealized distribution fairly well; in particular, the two models follow the expected trends as range increases. The deviations from the Weibull distribution are sufficient, however, to warrant use of the signal propagation model. Once the decile variations are found for each bearing and range bin, equations of the form of Eq. (2) are found but are rewritten as r_i = D_i y_i^(−1/p_i). Here y_i is the decile value for index i obtained from the distribution of the received peak signal level, converted from V to Wb/m^2 by using Eq. (1) in (6), and r_i is the range estimate for that index, expressed in kilometers (161). The signal propagation
model is expressed in terms of the values of D_i and p_i that are bearing- and decile-dependent, where typically p_i < 2/3; these equations yield useful ranges between roughly 40 and 360 km if there is a sufficient sample over a reasonable time interval (56,57). How well the signal propagation model tracks the decrease of peak signal levels with range is expressed by the coefficient of determination, or squared correlation coefficient, R^2 (156, p. 167); the better the representation of the variation of decile value with range, the closer to 1 the value of R^2. For the GP-1 site in Lawrence, Kansas, the average value of R^2 is 0.78 across all bearings and decile values but is greater than 0.93 for deciles 6 through 9 that describe the upper half of the distribution. The first two decile values, which represent the weakest peak signal levels that contribute to the distribution, provide the poorest set of equations, and so their range estimates are often ignored. All of these results are consistent with the results in Fig. 16 that reveal an increasing separation between decile values as the decile index increases.

Estimating Range

Given that the rate of movement of thunderstorm areas is as high as 40 km/h, sampling periods of the order of 20 or 30 minutes must be used to determine the peak signal level distribution, so that the storms remain at approximately the same range. Although the signal propagation model is developed by using a large database of observations from an entire lightning season, it is applied to samples that are considerably smaller in size, varying from only 15 to
40 strokes per 1° bearing. For any given 20- to 30-minute sample containing thunderstorms in various portions of their life cycle, the resulting distribution is likely to deviate from that in the longer term developmental distribution that is uniformly composed of thunderstorms in all stages of their life cycles. Therefore, to obtain the best possible range estimate, care must be taken to ensure that the small-sample distribution follows the form of a large-sample distribution. As can be seen from Fig. 16, the decile values from the long-term distributions represented by the triangles and diamonds each follow a cumulative distribution function that has a nonzero slope. In contrast, short-term distributions can have lengthy portions of zero slope, as illustrated by the 30-minute distribution denoted by diamonds and labeled ‘‘Duplicates’’ in Fig. 17. Those intervals of zero slope are caused by duplicate values in the distribution that overemphasize some portions of the long-term distribution for that range, which means that other portions are underemphasized. During some portions of the life cycle of a lightning-producing cell, duplicate peak signal levels are common and can be separated in time by many minutes; this means that they do not come from the same lightning flash (56,57). The distortion of the distribution due to retention of the duplicate peak signal
levels can lead to range estimation errors that in some cases are unacceptably large (56). One way to adjust the short-term distribution and thereby obtain distribution estimates whose cumulative distribution functions have nonzero slopes is to eliminate the duplicate peak signal levels; duplicate peak signal levels are defined as those that have the same value within the resolution of the receiver. Experience has shown that when there are many duplicate peak signal levels to eliminate, doing so allows the sample distribution to replicate the long-term one better (Fig. 17) and, consequently, to yield better ranging results. When the ranging algorithm is implemented, a minimum of 11 samples in a bearing is required to have a sufficiently long sorted list that defines the decile values and so produces a potentially usable range estimate. Even when the duplicate peak signal levels are eliminated to form a corrected distribution, not all decile values from the adjusted distribution will necessarily yield acceptable range estimates. Other quality control measures must be implemented to lower the uncertainty of the range estimate to an acceptable level. One such measure is to require that a decile value fall between its smallest and largest value in the developmental database for that 1° bearing before it is used to produce a range estimate from the equation r_i = D_i y_i^(−1/p_i). These smallest and largest values are given by the dashed and solid curves in the example in Fig. 17, and they define an envelope within which a decile value must fall to be acceptable.
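A minimal sketch, under assumed parameter tables, of the quality-control and ranging steps just described: duplicate peak levels are removed, deciles are computed from the sorted sample, deciles outside the developmental-database envelope are discarded, surviving deciles are converted to ranges with r_i = D_i y_i^(−1/p_i), and at least three acceptable estimates are required before they are averaged. The envelope limits and the D_i, p_i values would come from the developmental database; here they are hypothetical placeholders.

```python
def decile(sorted_vals, i):
    """i-th decile (i = 1..10) of an ascending list, by simple index lookup."""
    idx = min(int(round(i / 10.0 * (len(sorted_vals) - 1))), len(sorted_vals) - 1)
    return sorted_vals[idx]

def range_from_sample(peaks, model, min_estimates=3):
    """peaks: received B-field peak levels for one 1-degree bearing over a
    20- to 30-minute window.  model: dict keyed by decile index with entries
    {'D': ..., 'p': ..., 'min': ..., 'max': ...} taken from a developmental
    database (hypothetical values).  Returns the averaged range in km, or
    None if too few acceptable estimates are available."""
    levels = sorted(set(peaks))   # drop duplicates (exact ties here; the real
                                  # system uses the receiver resolution)
    if len(levels) < 11:          # need a sufficiently long sorted list
        return None
    estimates = []
    for i, c in model.items():
        y = decile(levels, i)
        if c['min'] <= y <= c['max']:                  # envelope check
            estimates.append(c['D'] * y ** (-1.0 / c['p']))
    if len(estimates) < min_estimates:
        return None
    return sum(estimates) / len(estimates)
```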
Figure 17. The decile values for the 30-minute period ending at 09 : 00 UTC 27 August 1999 at a bearing of 180 ° when all 35 samples are included in the sorted distribution (diamonds) and when all duplicates are removed, giving 23 samples in the sorted distribution (asterisks and Xs). Note that in contrast to the case including duplicate values, the case without duplicates follows the values from the signal propagation model quite well at a 330-km range (triangles), the range given by the average of all NLDN locations at the 180 ° bearing during this period. Clearly, removing the duplicates from the short-term distribution produces a better representation of the long-term distribution for the given range. The minimum values (dotted curve) and maximum values (solid curve) of the deciles that span the ranges between 40 and 360 km in the developmental database are also shown. A decile value from a distribution obtained during a 20- or 30-minute period is used to provide a range estimate from the corresponding signal propagation model power law of the form of Eq. (2) if that value falls within this envelope. All but one of the decile values for the case with duplicates fall outside this envelope and so are not used to give a range estimate, whereas three for the case without duplicates fall within the envelope (decile indices 6, 7, and 7.5, denoted by X). These three give the corresponding range estimates of 332, 315, and 333 km that are averaged to produce the estimated range of 327 km, which is very nearly the same value of 330 km that is given by the NLDN. (NLDN data used with permission of Global Atmospherics, Inc.)
Because each decile index i in a 1° bearing has its own model equation, as many as 10 independent range estimates r_i can be created from each short-term sample. Only those estimates given by acceptable deciles are averaged to produce the final estimate, and they can be updated with each new observation. For the example based on a 20-minute sample given in Fig. 17, the actual range to the lightning at the 180° bearing was given as 330 km by the NLDN. At this range, the equations for the first and second deciles are not reliable because they have quite low R^2 values. Consequently, a fair comparison of the results is best done using the eight range estimates given by the third through ninth deciles and the third quartile, which has a decile index of 7.5. None of these decile values fall within the envelope of acceptable values when all observations are retained (diamonds in Fig. 17); but if they were used, the eight range estimates r_i would produce an average of 391 ± 50 km, a value considerably greater than the actual range of 330 km. Once the duplicates are removed from the short-term distribution, the range estimates provided by the eight decile values give a much improved average range of 336 ± 32 km, which is consistent with the fact that the cumulative distribution function for the distribution without duplicates (asterisks and Xs in Fig. 17) closely follows the cumulative distribution function given by the signal propagation model at a range of 330 km (triangles). If only the decile values that are within the envelope are used to calculate the average, in this case the sixth and seventh deciles and the third quartile (large Xs in Fig. 17), then the estimate is improved a bit further to 327 ± 10 km.

Across all 1° bearings, the mean absolute error in locating lightning at a range between 40 and 360 km from the receiver is reduced to 52 ± 14 km once the following three quality control procedures are applied to the original short-term samples: the duplicate peak signal levels are eliminated, only the decile values that occur within their corresponding developmental database ranges are used to provide individual range estimates, and at least three of these acceptable range estimates must be found before the average of these preliminary estimates is computed to produce the final range estimate (56,57). An example from 1 June 1999 is provided in Fig. 18, based on observations obtained by the GP-1 locator at Lawrence, Kansas, which is at the center of the map, as indicated by the black cross. The locations of negative return strokes south of Lawrence in southeastern Kansas (black dots), at a range of about 125 km from the receiver, are placed well when compared with NLDN observations (white dots). The ranges to the more distant strokes in western Kansas, Oklahoma, and Nebraska are not estimated because the GP-1 receiver acquires an insufficient number of high-quality signals. At these ranges, however, the system still functions well as a magnetic direction finder.

One significant limitation in applying the preceding algorithm comes from the assumption that all lightning at a bearing occurs in a well-defined region at a fixed distance from the GP-1 site. This assumption is often poor, because multiple regions of lightning often exist in a bearing.
Figure 18. A typical screen display from the GP-1 locator in Lawrence, Kansas, based on a 20-minute sample. The locations given by the signal propagation model (black dots) agree well with those given by the NLDN (white dots) in a region south of Lawrence (black cross) on 1 June 1999 at 05 : 21 UTC. The NLDN data in Oklahoma, Nebraska, and western Kansas are beyond the range of the GP-1 installation. The GP-1 signal propagation model of the form of Eq. (2) generally provides good ranging between 100 and 300 km. (NLDN data used with permission of Global Atmospherics, Inc.)
For example, parallel squall lines of thunderstorms separated by one to two hundred kilometers are often found ahead of cold fronts in the Great Plains; in addition, widely separated areas of isolated thunderstorms typically occur in warm, unstable air masses in the spring and summer. In such situations, the range to the lightning given by the signal propagation model will be a weighted average of the detections from the various regions. The resulting plots of estimated lightning locations will be unsatisfactory in these cases.

SATELLITE-BASED SINGLE-STATION LOCATORS

In the mid- to late 1990s, two satellites carried instruments developed by the United States National Aeronautics and Space Administration (NASA) for detecting lightning from space using the visible light emitted by a lightning flash; one instrument, the Lightning Imaging Sensor (LIS), is still operational. The data from these satellites have been vital in establishing a consistent worldwide lightning climatology, especially over data-sparse areas such as oceans, sparsely populated land, and developing countries. Although scattering from water droplets and ice crystals more strongly attenuates the light emitted by lightning channels in the lower portions of a cloud, the vast majority of lightning flashes originate in channels extending through the middle and upper portions of a cloud. Such bright flashes are detected easily from space (44,163).
Figure 19. A comparison of lightning observations during July 1995 given by the Operational Linescan Sensor (OLS) and the newer Optical Transient Detector (OTD) (166). Each color represents the number of detections at each pixel during the entire month. The OLS measurements are obtained at local midnight by a human analyst who must manually determine the latitude, longitude, and time of each streak produced by a lightning flash on a visible image. The OTD observations at any location between 75 ° N and 75 ° S were obtained automatically twice in each 24-hour period, both day and night. Note that both sensors captured the fundamental fact that lightning is more likely to occur over land than over the ocean. Because these observations were taken during the Northern Hemisphere summer, the lightning flashes occur primarily in the tropics and Northern Hemisphere. (Figures provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission.) See color insert.
These two satellites are successors to the first optical instrument used for detecting lightning from space, the Operational Linescan System Sensor (OLS) (43,164,165). The OLS could distinguish the signature of lightning only at nighttime, and thus lightning was detectable only during the local orbit crossings that occurred at local midnight or near dawn and dusk (44) (Fig. 19). The OLS sensor continues to fly on United States Defense Meteorological Satellite Program (DMSP) satellites. A lightning data archive derived from the smoothed 2.5-km resolution visible imagery filmstrips is available for some years from 1973 to 1996 at the Global Hydrology and Climate Center (166). The NASA Optical Transient Detector (OTD) was the first optical instrument on a satellite, the MicroLab-1, that could detect lightning during both day and night (45–47). The OTD was a rapidly developed prototype of the LIS, having a lower spatial resolution of 8 km at nadir from its altitude of 750 km (versus 4 km for LIS from its 350-km altitude) and a lower sensitivity (LIS sensitivity is 1.6 times that of OTD). LIS is still operational aboard the Tropical Rainfall Measuring Mission (TRMM) satellite (31,34,48). These two sensors are in precessing orbits (55-day repeat cycle for OTD; 24-day repeat cycle for LIS) on low earth-orbiting satellites. The BLACKBEARD VHF sensor on board the ALEXIS satellite operated by the Los Alamos National Laboratory (52,53) and the instrument on the FORTÉ satellite operated by the Department of Energy, which uses a combination of optical (based on the LIS/OTD design) and VHF sensors to distinguish lightning flashes (49–51), are also aboard low earth-orbiting satellites (Tables 1 and 2). The snapshot view from low earth orbit limits their utility in certain research and application areas. For that reason, NASA's next-generation optical detector, the Lightning
Mapping Sensor (LMS), has been developed (43,163,167) for geosynchronous earth orbit, and the NASA instrument development team anticipates a flight opportunity for it in June 2004. Optical Transient Detector The OTD was a flight-qualified engineering prototype of the LIS and was developed at NASA’s Marshall Space Flight Center in Huntsville, Alabama (45). Its development and delivery required only about 9 months, which is an extremely short period for a satellite-mounted instrument. It was launched aboard the MicroLab-1 satellite on 3 April 1995 into a low earth orbit at an altitude of 750 km with a 70° inclination that permitted observations of lightning between latitudes 75° S and 75° N (163,168) (Figs. 19 and 20). The MicroLab-1 completed an orbit of the earth once every 100 minutes. OTD exceeded its expected one- to two-year design life, and data collection ended after five years in April 2000 (46). Instrument Characteristics. Two distinct but integral units composed the OTD instrument. The sensor, which was essentially a camera, was approximately 20 cm in diameter and 40 cm tall. The accompanying processing electronics equipment was about 10 cm high by 25 cm in length and width. The two modules weighed 18 kilograms, which was about one-quarter of the total weight of the MicroLab-1 satellite. In many ways, the optical equipment on the OTD was highly conventional. The optical sensor was similar to a television camera with lenses, a narrowband interference filter to reject the bright sunlit daytime background, a detector array, and a mechanism to convert electronic output into useful data.
Figure 20. Lightning observations made globally by the OTD during 8 August, 1998 (172). Each location was sampled as often as twice a day, once during northward-moving, ascending passes and once during southward-moving, descending passes; the blue regions were sampled on this day. The local and UTC times of each pass are indicated, respectively, above and below the maps. On this day, the ascending passes were at about 1 P.M. local time, and the descending passes were at about 1 A.M. local time. Because thunderstorms are more likely to be created by heating of the ground during daytime, more lightning discharges were detected during the ascending passes in the upper map. The graph at the bottom of the figure summarizes the daily flash activity in 3-minute intervals and gives data quality flags; the maximum activity, approaching 1,000 flashes per 3-minute period, occurred during the ascending pass over South America. (Figure provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission.) See color insert.
Because it was aboard a low earth-orbiting satellite, the instantaneous field of view of the OTD was rather limited compared with that possible from a geosynchronous orbit. The field of view of the OTD, 1,300 × 1,300 km, was about 1/300 of the area of the earth’s surface (169). The nadir spatial resolution of 8 km (46) is consistent with the average area visibly illuminated by a lightning flash (163), and the temporal resolution of the sensor was 2 ms (46) to minimize the contribution of the sunlit background (i.e., noise). Location errors typically were between 20 and 40 km and had a median value of 50 km; 25% of the errors were greater than 100 km (46). Capable of detecting lightning during both day and night, the OTD represented a significant advance in lightning imaging technology compared with previous satellite-based platforms such as the OLS. The detection efficiency was 40 to 65%, including both cloud–ground and intracloud (including cloud–cloud and cloud–air) strokes (169); as determined by comparison with the National Lightning Detection Network (NLDN) (see later), the detection efficiency of cloud–ground strokes has been estimated to be 46 to 69% (46). There are four major sources of variance in the detection efficiency:
1. the instrument sensitivity, by design, is an implicit function of the background scene variance; this design allows daytime detection while preserving the signal-to-noise ratio;
2. there is minor (10–15%) variability in the instrument lens/filter response across its angular field of view;
3. there is variability in the pixel ground footprint size across the field of view, which couples to detection efficiency; and
4. the direct line of sight from the sensor to the cloud varies from 0° to about 50° across the field of view, and less light may be scattered from clouds at angles off-normal from cloud top (170).
These four sources do not explain the large range of values cited in (46), however. That large variance was due to uncertainty in the empirical validation technique. As described in a paper in preparation by Boccippio, the actual instantaneous uncertainty or variability in detection efficiency (e.g., one standard deviation) from items (1–4) in the list above has been modeled as 1–20%. The detected lightning was reported in several data products (47,169,171). The OTD compared the difference in luminance of adjacent frames of optical data. Should the difference in luminance be large enough, an event, the finest resolution, was recorded. One or more adjacent events in a 2-ms time frame composed a group. Subsequently, one or more groups in a small time period were classified as a flash. Flashes were then grouped into areas, the lowest resolution, if there was sufficient space between existing flashes. The OTD far exceeded its predicted lifetime of about 2 years. Data were transmitted each day from the OTD to an intermediate station in Fairmount, West Virginia, followed by transmission to the Global Hydrology and Climate Center in Huntsville, Alabama, where the data were processed, analyzed, and distributed (172).
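The event, group, flash, and area products described above form a spatiotemporal clustering hierarchy. The sketch below illustrates the idea in Python; the adjacency rule, the grouping time gap, and the sample data are hypothetical, and the operational OTD/LIS clustering is considerably more elaborate.

```python
"""Toy illustration of the event -> group -> flash clustering used by OTD/LIS.
All thresholds and data are hypothetical; the operational algorithm differs in detail."""
from dataclasses import dataclass

@dataclass
class Event:
    t_ms: float   # time of the 2-ms frame in which the luminance change was detected
    x: int        # pixel column
    y: int        # pixel row

def group_events(events):
    """Events in the same 2-ms frame that touch (8-connected pixels) form a group."""
    groups = []
    for ev in sorted(events, key=lambda e: e.t_ms):
        for g in groups:
            if any(abs(ev.t_ms - e.t_ms) < 2.0 and
                   abs(ev.x - e.x) <= 1 and abs(ev.y - e.y) <= 1 for e in g):
                g.append(ev)
                break
        else:
            groups.append([ev])
    return groups

def groups_to_flashes(groups, max_gap_ms=330.0):
    """Groups separated by less than max_gap_ms are merged into a single flash."""
    flashes, current = [], []
    for g in sorted(groups, key=lambda g: g[0].t_ms):
        if current and g[0].t_ms - current[-1][-1].t_ms > max_gap_ms:
            flashes.append(current)
            current = []
        current.append(g)
    if current:
        flashes.append(current)
    return flashes

events = [Event(0.0, 10, 10), Event(0.0, 11, 10),  # two adjacent pixels, same frame
          Event(40.0, 12, 11),                     # a later group in the same flash
          Event(900.0, 40, 5)]                     # an isolated flash elsewhere
print(len(groups_to_flashes(group_events(events))), "flashes")   # prints: 2 flashes
```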
Applications of Observations. Several important discoveries and lightning databases resulted from analysis of OTD data, including the creation of the most comprehensive maps of global lightning distribution (Fig. 19). In addition, analysis of the OTD data resulted in a revision of the estimate of the global lightning flash rate to approximately 40 flashes per second, a significant decrease from the previous estimate of 100 flashes per second that had been accepted since the mid-1920s (168,169). This flash rate is based on the estimate of 1.2 billion flashes across the globe in the period from September 1995 to August 1996, a value determined by adjusting the number of flashes observed by the OTD to account for the known detection efficiency of the instrument (168). Moreover, the flash rate in the Northern Hemisphere is considerably greater than that in the Southern Hemisphere; rates during the Northern Hemisphere summer can approach 54 flashes per second. Approximately 75% of the lightning flashes occur in the tropics and subtropics between 30° S and 30° N (168). OTD data analysis also demonstrated that there is far less lightning over the ocean than over the continents (Fig. 19); the global average rate over all of the oceans is only seven flashes per second, but the rate over all of the land is 24 to 49 flashes per second, depending on latitude (168). Although this discrepancy had been hypothesized for some time, based, for example, on studies in the 1950s showing that land is preferred to the ocean for the location of nighttime radio noise at 1 MHz (81, p. 178), the OTD datasets provided the first conclusive evidence for this hypothesis. One strong contributor to the disparity in lightning frequencies is the higher updraft speed over land that results from stronger surface heating by the sun; these stronger updrafts lead to greater charge separation in convection over land than that in convection over the ocean. OTD data also have been used to ascertain a lightning-discharge signature indicative of continuing current in cloud–ground discharges (see earlier). This signature is useful for predicting lightning strike thresholds for the ignition of wildfires (173). From an operational meteorological standpoint, OTD data analysis has laid the groundwork for what may eventually be an effective tornado warning system based on lightning detected from space (173). In one of many similar examples, on 17 April 1995, the OTD observed that lightning rates in a supercell storm over Oklahoma first rose rapidly, then dropped sharply. Several minutes later, the storm produced a tornado (47). A similar result was observed using the surface-based SAFIR interferometric network that detects intracloud lightning by mapping the three-dimensional orientation of the lightning channel (65) (see later). Strong updrafts result in vertical stretching and an increase in vertical vorticity, which is a measure of the spin of the air about a vertical axis; this increase is a key element in tornado formation because a tornado is an intense vertical vortex. Strong updrafts result in heavy precipitation loading, which, in turn, leads to increased charge separation, and so to increased lightning rates. The strong updraft then weakens, the precipitation loading weakens, and the lightning rate decreases; at this time, the vertical
vorticity has reached a maximum, and the conditions are then optimal for tornado formation. If this hypothesis is validated, then the variation in the frequency of lightning may provide up to ten minutes additional tornado warning time beyond what is currently available by using Doppler radar. Other recent work using ground-based sensors is confirming this tornado signature (174). Lightning flash rate time lines, especially when they include lightning aloft, may also be useful in downburst, and even hail, nowcasting (see also later section). Despite its advantages, the OTD dataset did have several limitations. Its 40 to 69% detection efficiency was somewhat low and varied between day and night (168). The Lightning Imaging Sensor has improved detection efficiency and sensitivity (48). Also, the OTD was onboard the MicroLab-1, a low earth-orbiting satellite. Because the elevation of the orbit was quite low, the instantaneous viewing area was rather small. The small viewing area
and rapid orbit of the satellite made the OTD unsuitable for real-time forecasting. The large location error of 20 to 40 km or more made it an unsuitable instrument for ground-truth studies (31). Consequently, OTD datasets have been applied to lightning climatological research rather than to operational forecasting. Lightning Imaging Sensor The LIS was launched from the Tanegashima Space Center in Japan aboard the Tropical Rainfall Measuring Mission (TRMM) satellite platform on 28 November 1997 (175). It is one of five instruments aboard the TRMM platform, which is intended to study tropical rainfall patterns and variability, especially over the ocean where conventional observations are unavailable. As a result, lightning is detected from an altitude of 350 km between 35 ° N and 35 ° S (48) (Fig. 21).
Figure 21. As in Fig. 20, except for the Lightning Imaging Sensor (LIS) (172). The ascending orbit passing over the Gulf of Mexico and Florida is shown in Fig. 22, and the lightning observations near the Kennedy Space Center are shown in Fig. 24a. Note that some parts of the globe were sampled at nearly the same time by both the OTD (Fig. 20) and the LIS. (Figure provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission.) See color insert.
Instrument Characteristics. The LIS optical equipment is essentially the same as that in the OTD: a staring imager that has a diameter of about 30 cm and a height of about 50 cm (48,163). The sensor can detect total lightning, day and night, and uses an optical lens and detection array, as well as a signal processor. Perhaps the most important element of both the OTD and LIS sensors is the Real Time Event Processor (RTEP). The RTEP is a sophisticated electronic unit that allows the instruments to filter sun glint and diffuse solar radiation reflected from clouds. By comparing the duration of the reflected sunlight (long) with the duration of a lightning flash (short), the RTEP can remove the background signal level (reflected sunlight) and identify lightning flashes as superimposed transients. Because both sensors use RTEPs, the increased sensitivity of the LIS is due to other hardware changes. The altitude of the TRMM satellite on which the LIS flies is 350 km, compared with 750 km for the MicroLab-1 satellite that carried the OTD. With an instantaneous viewable area of 600 × 600 km, the LIS detects only a small portion of the earth at any given time. Spatial resolution is between 4 and 7 km, the former at nadir and the latter at the edge of the field of view near 35° N and 35° S (34,48,175).
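The background-removal idea behind the RTEP can be illustrated with a simple running-background filter: the slowly varying sunlit-cloud signal is tracked, and brief excursions above it are flagged as optical transients. The constants, array sizes, and update rule below are purely illustrative and are not the flight algorithm.

```python
"""Illustrative transient detection in the spirit of the Real Time Event Processor:
a slowly varying background (sunlight reflected from clouds) is tracked with an
exponential average, and brief excursions above it are flagged as lightning events.
All constants and data are hypothetical."""
import numpy as np

def detect_transients(frames, alpha=0.05, k=6.0):
    """frames: (n_frames, ny, nx) array of pixel radiances sampled every 2 ms."""
    background = frames[0].astype(float)           # slowly updated background estimate
    noise = np.full(frames[0].shape, 1.0)          # running estimate of background scatter
    events = []
    for i, frame in enumerate(frames[1:], start=1):
        residual = frame - background
        hits = residual > k * noise                # short, bright excursions = candidate events
        for y, x in zip(*np.nonzero(hits)):
            events.append((i, x, y))
        quiet = ~hits                              # update the background only where quiet
        background[quiet] += alpha * residual[quiet]
        noise[quiet] = (1 - alpha) * noise[quiet] + alpha * np.abs(residual[quiet])
    return events

rng = np.random.default_rng(0)
frames = 100.0 + rng.normal(0, 1.0, size=(50, 8, 8))   # steady sunlit background + noise
frames[20, 3, 4] += 40.0                               # one 2-ms optical transient
print(detect_transients(frames))                       # likely just the injected transient at frame 20
```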
This marks an improvement over the 8-km nadir spatial resolution of the OTD (46). Estimated location accuracy is better than 10 km (176). The LIS detection efficiency has been empirically estimated as 1.6 times greater than OTD (177), a value consistent with modeling of the instrument response. Because the TRMM satellite is not geostationary, the LIS cannot view a single storm for more than 90 s before the storm leaves its field of view (Fig. 22). Although the TRMM satellite orbit is restricted to the tropics and subtropics (35° N to 35° S), this is the region in which the majority of lightning does occur (168). Applications of Observations. Data from the LIS are being used to generate more complete lightning climatologies in the tropics and subtropics. A comparison of the total number of flashes observed in 1999 by the OTD and LIS instruments is given in Fig. 23 (172). Note that the color scales on the high end for the LIS detections in Fig. 23b are double those for the OTD in Fig. 23a. In addition to the fact that the absolute numbers from the LIS are higher than those for the OTD, the LIS detected relatively more flashes in South America than the OTD, primarily due to the amount of viewing time at midlatitudes. LIS datasets also confirmed that rainfall rates are
Figure 22. LIS background images taken between 20:23:11 and 20:33:21 UTC on 8 August 1998 showing the cloud cover in white, ocean glint, and the lightning events detected by the LIS in color; the color bar indicates the number of events that occurred at each pixel. This image is obtained at the website (172) by clicking the mouse on Florida on the background image that is displayed after clicking on Florida on the global image in Fig. 21. Although more than 39,000 events are detected during this portion of the orbit, these events are confined to an extremely small percentage of the total area covered by clouds. The region of activity in central Florida is depicted in Fig. 24a. (Figure provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission.) See color insert.
Figure 23. Total number of flashes per unit area in 1999 observed by the OTD (a) and LIS (b) instruments (172). Note that the color scales on the low end are the same in both figures, but that those on the high end are about twice as large for the LIS as they are for the OTD. Although the overall flash density pattern is the same for both instruments, there are notable differences, such as in South America where the LIS detected far more flashes than the OTD. (Figures provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission.) See color insert.
highest in areas of frequent lightning (48,173) and have helped document changes in geographical distributions of wintertime lightning frequencies along the United States Gulf Coast during an El Niño (178). Because of its value in providing reliable total lightning observations, the LIS data are being used in many cross-sensor research studies (31,34). In particular, an extensive program is under way at the Kennedy Space Center (KSC) to compare the lightning observed by the LIS with that by the National Lightning Detection Network (NLDN) network locator (see later) and the Lightning Detection and Ranging (LDAR) network mapper (31,176) (see later). Indeed, NASA maintains an excellent browse gallery on the World Wide Web (179) where contemporaneous images from these sensors are posted, together with a corresponding WSR-88 precipitation radar image. Figure 24 shows a set of these four images centered on KSC that were taken during an LIS overpass at approximately 20:30 UTC on 8 August 1998; this is the same overpass as depicted in Fig. 22, where it is clear that only a small region covered by clouds is electrically active. Although both the LIS visible wavelength and LDAR VHF electromagnetic wavelength total lightning systems detect all high-frequency stages of the lightning event, the NLDN displays only the cloud–ground strokes (see later). Thus the number of observations reported by the total lightning systems, LIS and LDAR, is larger than the number given by the cloud–ground system, NLDN. Nevertheless, it is clear from Fig. 24 that the three different lightning detection systems agree quite well in placing the lightning locations. In particular, note that the most intense activity detected by the LIS instrument (Fig. 24a) coincides with the locations of the cloud–ground strokes given by the NLDN (Fig. 24b). It is clear that lightning channels span regions larger than those revealed by the locations of the cloud–ground strokes alone. For example, both LIS and LDAR detect lightning west-northwest of the KSC at a 60-km range, but the NLDN does not (Figs. 24a,b,c). LIS validation with a New Mexico Tech-developed Lightning Mapping Array deployed in northern Alabama will continue through the remaining duration of TRMM. Certainly the LIS sensor represents a marked improvement in lightning detection efficiency from space, but its inability to be used for real-time forecasting and analysis underscores the need for real-time, large-scale detection from space. To this end, the Lightning Mapping Sensor has been developed for launch aboard a geostationary GOES satellite when the opportunity arises (43,163,167,180). THE NATIONAL LIGHTNING DETECTION NETWORK™: A PLAN VIEW LOCATING SYSTEM Although networks of cathode-ray direction finders (CRDF) were used during the middle of the twentieth century for estimating the location of cloud–ground lightning using triangulation techniques based on magnetic direction finding (MDF) (Fig. 3), the National Lightning Detection Network™ (NLDN) operated by Global Atmospherics, Inc. (GAI) is the first modern lightning location network to offer lightning locations to a broad set of users in real time (1). This system detects the location
of cloud–ground strokes based on the peak signal levels received in the VLF/LF portion of the electromagnetic spectrum (Fig. 11). Cloud–ground and intracloud (including cloud–cloud and cloud–air) strokes are distinguished by applying time-domain waveform analysis (Figs. 6 and 7), and the polarity of the cloud–ground, or return, strokes is determined (see earlier sections). As reviewed in (13) and (14), the NLDN began real-time operation in 1989 when gated, wideband regional MDF networks in the western and midwestern United States (11,181) were merged with a network operated in the eastern United States by the State University of New York at Albany (182). Originally the network was created in the 1980s to support the needs of electric utilities and forestry agencies, but since that time, the user base has broadened considerably to span a wide variety of commercial and government applications (1,13). Optimizing Stroke Locations In the mid-1990s, the MDF network was combined with the time-of-arrival (TOA) network known as the Lightning Position and Tracking System (LPATS) (183) (Fig. 3), which had been operated since the late 1980s by Atmospheric Research Systems, Inc. (ARSI), to form the modern NLDN (14). Finally, in the late 1990s, the network was merged with the Canadian Lightning Detection Network (CLDN) to form the North American Lightning Detection Network (NALDN) (12). Today, many similar VLF/LF MDF or TOA networks are in operation throughout the world (Tables 1 and 2). These networks achieve similar location accuracies and detection efficiencies for both negative and positive return strokes in cloud–ground lightning flashes; because the NLDN is the first and perhaps the most extensively studied of these networks, it is described here as an example of a network locating (NL) system. The hybrid sensors in the NLDN that combine both the MDF and TOA approaches are called Improved Accuracy from Combined Technology (IMPACT) sensors (13,14). Site error is the largest contributor to erroneous measurements using the MDF triangulation method (see earlier section). Both distortions of the magnetic field near the antenna (152) and man-made and natural features that affect the propagation of electromagnetic signals from lightning cause site error, which is both site-specific and bearing-dependent. Once the site error for a specific location is determined, it generally remains constant, thereby allowing routine adjustments of observations to account for it (13;22, p. 160; 56). The most recent IMPACT ESP (Enhanced Sensitivity and Performance) model of the sensor yields bearing uncertainties of less than 1° (184). As discussed in earlier sections, TOA location techniques compare the time when three or more sensors detect an electromagnetic signal from the same source such as a lightning discharge. The majority of errors associated with the TOA technique are due to imprecise time synchronization of the sensors. Thus, an extremely accurate and well-calibrated timing system is required for the detection network to implement the TOA technique effectively. Clock synchronization errors must be less than 10⁻⁶ s, because a timing error of 1 µs corresponds to a position error of roughly 300 m at the speed of light. To achieve this accuracy, the NLDN uses the global
Figure 24. A set of four contemporaneous depictions of the lightning near the Kennedy Space Center (KSC) in Florida during an LIS overpass between 20:27:46 and 20:30:00 UTC on 8 August 1998 (179). KSC is at the center of each image; latitude and longitude grids and range circles are provided every 40 km. LIS data, confined by the satellite track to the green region, are shown in (a), NLDN data in (b), LDAR data in (c), and National Weather Service WSR-88 data from Melbourne, Florida, in (d). The total lightning sensors LIS and LDAR show excellent agreement west-northwest of KSC; the cloud–ground sensors in the NLDN show activity in the same region, although this activity appears to be less only because the NLDN instruments do not detect the cloud discharges. The intensity of the radar echo in (d) is indicated by the reflectivity (dBZ). Note the close correspondence between the locations of the greatest amount of lightning and the locations of highest reflectivity given in red (>50 dBZ). (Figures provided by the NASA Lightning Imaging Sensor science team at Marshall Space Flight Center, AL, used with permission; NLDN data shown with permission of Global Atmospherics, Inc.) See color insert.
positioning system (GPS) for timing. Additional site errors leading to inaccurate placement of the TOA hyperbola (Fig. 3) arise from changes in the electromagnetic signal as it travels from the source to the sensor location (22, pp. 161–162). Communication errors between the sensor and the satellite relaying the observations to the central processor may be caused by attenuation due to heavy rain or by system saturation during high data rates; these errors can also degrade the performance of the system in real time. These problems are corrected when the data are reprocessed for use in system recalibration and meteorological research studies (13). When the network was upgraded in 1995 using instruments that had better ranging, the total number of sensors was reduced from 130 to 106 (13) (Fig. 25). Today, 59 LPATS-III TOA sensors and 47 IMPACT MDF/TOA sensors across the United States compose the NLDN (12). The Canadian CLDN adds 55 of the newer LPATS-IV and 26 of the newer IMPACT ESP sensors to those in the NLDN to form the NALDN (12). Using these newer models, GAI reports that more than 98% of the detected strokes are correctly identified as cloud–ground or intracloud strokes, and the return stroke polarity is assigned correctly 99.5% of the time for lightning ranges out to 600 km (184). Achieving such high rates is important because the focus of the network is on accurate reporting of the location and polarity of cloud–ground strokes. Network estimates of 85% flash detection efficiency and 500-m location accuracy for cloud–ground strokes are supported by ground validation studies based on independent video recording of lightning flashes within the network (12,185,186). This is a marked improvement over the 65 to 80% detection efficiency achieved by the MDF triangulation version of the network before 1995 (13). To achieve these network accuracies, the NLDN determines the final location estimate of a lightning stroke by combining the location estimates given separately by several sets of MDF and TOA sensors (Figs. 3 and 26). A chi-square test provides the nonlinear least-squares technique for determining the optimum stroke location (13). This location comes from the optimal combination of MDF triangulation bearings and TOA times that are given
by minimizing the value of an appropriate chi-square function, defined by (22, pp. 160–162)

\[ \chi^2 = \sum_i \frac{(\phi_i - \phi_{mi})^2}{\sigma_{az,i}^2} + \sum_j \frac{(t_j - t_{mj})^2}{\sigma_{tj}^2}. \qquad (3) \]
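The symbols in Eq. (3) are defined in the paragraph that follows; in brief, the first sum penalizes the misfit between trial-solution and measured bearings at the MDF stations, and the second the misfit between trial-solution and measured arrival times at the TOA stations, each weighted by its expected measurement error. A minimal sketch of such a combined least-squares location, using entirely hypothetical station coordinates and measurements on a flat x-y plane and SciPy for the minimization, might look like the following.

```python
"""Sketch of the combined MDF/TOA least-squares stroke location of Eq. (3).
Station layout, measurements, and error estimates are hypothetical, and a flat
x-y plane in km replaces latitude/longitude for simplicity."""
import numpy as np
from scipy.optimize import least_squares

C_KM_PER_US = 0.2998          # speed of light, km per microsecond

# MDF stations: (x, y) in km, measured bearings (radians clockwise from north),
# and expected azimuthal errors (radians)
mdf_sites = np.array([[0.0, 0.0], [200.0, 0.0], [100.0, 180.0]])
mdf_bearing = np.array([0.983, 5.498, 2.944])
sigma_az = np.full(3, 0.02)

# TOA stations: (x, y) in km, measured arrival times (microseconds), timing errors
toa_sites = np.array([[0.0, 150.0], [250.0, 100.0], [150.0, -50.0], [-80.0, 60.0]])
toa_time = np.array([463.4, 438.7, 445.0, 670.4])
sigma_t = np.full(4, 1.0)

def residuals(params):
    """Weighted bearing and arrival-time residuals for a trial stroke (x, y, t0)."""
    x, y, t0 = params
    res = []
    for (sx, sy), phi_m, s in zip(mdf_sites, mdf_bearing, sigma_az):
        phi = np.arctan2(x - sx, y - sy) % (2 * np.pi)      # bearing of trial point
        dphi = (phi - phi_m + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
        res.append(dphi / s)
    for (sx, sy), t_m, s in zip(toa_sites, toa_time, sigma_t):
        t = t0 + np.hypot(x - sx, y - sy) / C_KM_PER_US     # predicted arrival time
        res.append((t - t_m) / s)
    return res

# the chi-square of Eq. (3) is the sum of squared residuals, which least_squares minimizes
fit = least_squares(residuals, x0=[100.0, 50.0, 300.0])
print("estimated strike point (km):", np.round(fit.x[:2], 1))
# the synthetic measurements above were generated from a source near (120, 80) km
```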
For the ith MDF triangulation station, φi is the bearing of the calculated stroke location in the trial solution, φmi is the bearing measured by the station, and σ²az,i is the expected azimuthal error in the measurement. For the jth TOA station, tj is the time at which the lightning signal arrives in the trial solution, tmj is the time of signal arrival measured at the station, and σ²tj is the expected error in the time measurement. Combining the measurements from the two independent systems by minimizing the value of χ² produces a least-squares optimal location estimate that overcomes many of the problems inherent in systems that use only one location technique (13). This location process is illustrated in Fig. 26, in which three bearing and seven TOA estimates are combined to provide a strike point at latitude 30.5148° N and longitude 97.8550° W in central Texas on 2 November 2000. The bearing estimates given by stations Z6, Be, and Z0 are indicated by the intersecting radials, and the time-of-arrival estimate given by each station is indicated by a circle centered on the station whose radius is given by the product of the speed of light and the difference between the time of the stroke and the time of the sensor measurement. Determining Stroke Current The NLDN reports peak currents Ipeak for cloud–ground strokes; negative values are assigned to negative return strokes, and positive values to positive return strokes. The method for determining Ipeak is based on the theoretical transmission line model (TLM) that describes the time-invariant waveform of the far-field radiation produced by a vertical lightning channel (14,112,133). Soon after the upward streamer attaches to the stepped leader, a typical first return stroke produces a peak E-field signal level Epeak of 5 to 10 V/m at a range r of 100 km (187), with an upward propagation velocity v in the channel of roughly one-half the speed of light c (129). For µ0 equal to the permeability of free space, µ0 = 4π × 10⁻⁷ Wb A⁻¹ m⁻¹ (188, p. 316), the TLM relating Epeak and Ipeak is given by (189)

\[ E_{peak} = \frac{-\mu_0\, v\, I_{peak}}{2\pi r}. \qquad (4) \]

Figure 25. Distribution of IMPACT MDF/TOA hybrid (triangles) and LPATS-III TOA (circles) sensors that form the National Lightning Detection Network (NLDN) in the United States. (Used with permission of Global Atmospherics, Inc.)
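Equation (4) can be inverted to estimate the peak current from a measured peak field; the short sketch below reproduces the worked example given in the following text.

```python
"""Invert the transmission line model of Eq. (4), Epeak = -mu0 * v * Ipeak / (2 * pi * r),
to estimate the peak return-stroke current from a measured peak E-field."""
import math

MU0 = 4 * math.pi * 1e-7          # permeability of free space, Wb A^-1 m^-1
C = 3.0e8                         # speed of light, m s^-1

def peak_current(e_peak, r, v=0.5 * C):
    """Ipeak (A) from peak E-field e_peak (V/m) at range r (m) and channel speed v (m/s)."""
    return -2 * math.pi * r * e_peak / (MU0 * v)

print(f"{peak_current(8.0, 100e3) / 1e3:.0f} kA")   # 8 V/m at 100 km -> about -27 kA
```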
For example (1), an E-field peak level Epeak = 8 V/m at r = 100 km when v = (1/2)c = 1.5 × 10⁸ m/s yields Ipeak = −27 kA. To apply the TLM for determining Ipeak, first the value of the measured signal is converted to a range-normalized signal strength (RNSS), given by the signal propagation model (13,133)

\[ \mathrm{RNSS} = a \exp\!\left(\frac{r - 100\ \mathrm{km}}{10^{5}\ \mathrm{km}}\right)\left(\frac{r}{100\ \mathrm{km}}\right)^{p} y, \qquad (5) \]

where y is the received E-field or B-field signal, in either V m⁻¹ or Wb m⁻², r is the range in km, and a and p are
Figure 26. An example from 2 November 2000 of an optimal cloud–ground lightning strike point given by combining measurements from seven sensors in the NLDN network. The location of the lightning stroke, whose parameter values are summarized in the table above the map, is indicated by the intersection of radials and circles at latitude 30.5148° N and 97.8550° W in central Texas. Recent lightning locations across the mapped region are indicated by the green symbols, + for positive and − for negative polarity. The locations of the measuring stations in the network are indicated by two-letter codes. The seven that were used here are denoted in bold letters on the map, but their hexadecimal equivalents are used in the table (Z6 and 2B, Stephenville, TX; Be and 27, Beeville, TX; Z0 and 2C, Overton, TX; Uv and EC, Uvalde, TX; Mc and E0, McCarney, TX; Bo and F4, Booneville, AR; and EA, Post Isabel, TX, which is to the south of the map). Three olive-colored bearing radials given by MDF stations Z6, Be, and Z0 and the seven brown time-of-arrival circles intersect at the estimated strike location. Each time-of-arrival circle is centered on the measuring station and has a radius given by the product of the speed of light and the difference between the time of the stroke and the time of the sensor measurement. From left to right, the parameter values in the table are given by GAI’s LP2000 manual as stroke summary in the first line — date; corrected time to the nearest nanosecond; latitude, degrees; longitude, degrees; average range-normalized signal strength (RNSS), LLP units (1 LLP = 1.5 × 10⁻¹⁰ Wb m⁻² = 4.5 × 10⁻² V m⁻¹) and stroke polarity (− for negative and + for positive); maximum multiplicity of the stroke; semiminor axis of the confidence ellipse, km; semimajor axis of the confidence ellipse, km; eccentricity of the ellipse; angle of orientation of the ellipse, degrees; reduced chi-square; number of sensors used; stroke type, G for cloud–ground and C for cloud; type of start position used in the iteration to minimize location errors, H for hyperbolic and T for triangulated; information used in optimizing the stroke location, A for angle, S for signal, T for time, ∗ if not used; and quality check indicator, OK if all checks met. Each remaining line is for one of the sensors used — hexadecimal ID; sensor report time, decimal fraction of a second; difference in time between the actual stroke time and the sensor’s measured time, microseconds; rise time, microseconds; bearing if an MDF station, navigational degrees; corrected bearing, navigational degrees; angle deviation between the corrected bearing and the bearing derived from the calculated stroke position, degrees; measured signal (corrected), LLP units; RNSS, given by product of the measured signal and [(range in km)/100]^1.13, LLP units; difference between station RNSS and average RNSS, LLP units; multiplicity; range to stroke, km; configured usable information, A for angle, S for signal, T for time, underscore if not used; information used to determine location, A for angle, S for signal, T for time, underscore if not used; and last two columns other descriptive codes. (Used with permission of Global Atmospherics, Inc.) See color insert.
positive constants; the value of 100 km in the denominator of the term raised to the pth power is used as the standard range normalization. Over a range of 500 km, the exponential term in Eq. (5) represents a small, less than 0.5%, correction to the power-law term. Normally, RNSS is expressed in LLP units, which are related to the
magnetic and electric field strengths via 1 LLP = 1.5 × 10⁻¹⁰ Wb m⁻² = 4.5 × 10⁻² V m⁻¹ (14) (LLP units derive their name from Lightning Location and Protection, the original company operating the NLDN). Presently, all stations within 625 km of a stroke calculate the value of RNSS using the same parameter values of p = 1.13
and 10⁵ km in Eq. (5), although work is ongoing to find site-specific values for these two parameters (13,132). The value of RNSS given by Eq. (5) for each station is then averaged over the reporting stations to give a mean RNSS, expressed in LLP units (see table in Fig. 26). Next, this average is converted to Ipeak in units of kiloamperes (kA) via the simple linear relation Ipeak = 0.185 × RNSS, which is based on a regression analysis (13); this relation produces an estimate of 14.9 kA in magnitude for the negative return stroke in Fig. 26. This relation also revises the one given in the study by Idone and collaborators (133) that was created for the NLDN before the TOA sensors were added to the network. The NLDN detects the individual cloud–ground lightning strokes that compose a lightning flash (see earlier sections). It reports both the first and subsequent strokes in a flash, although the detection efficiency for subsequent strokes is much lower than that for first strokes (13). Summaries of lightning occurrences, such as those plotted in Figs. 24b and 27, are also available in which the latitude, longitude, time, stroke polarity, and peak current Ipeak of the first strokes are reported; normally this first stroke produces the largest peak current, and so characterizes the flash well (13;74, p. 121; 131). The algorithm that groups detected strokes into flashes uses both the time of detection of the strokes and the distance these strokes are from each other. As noted earlier, a flash as defined by the NLDN consists of those strokes, of either polarity, that occur within one
Figure 27. An example of real-time cloud–ground lightning stroke locations given by the NLDN in the United States for the period 19:00 to 21:00 UTC 27 August 2000. Yellow pixels indicate strike locations that are current; red, locations that are about an hour old; and blue, locations that are about two hours old. Global Atmospherics, Inc. provides these data through the World Wide Web-based Lightning Explorer service that is accessed via http://www.lightningstorm.com. This example is made available free and is updated every 15 minutes; more detailed times and locations are available for a fee. A line of thunderstorms is clearly seen stretching down the East Coast of the United States; scattered thunderstorms are occurring elsewhere, particularly in the forest-fire-prone western United States. (Used with permission of Global Atmospherics, Inc.) See color insert.
second of the initial stroke, are within 10 km of the initial stroke, and are no more than 0.5 s apart in time (13). Separations of this magnitude are considered because as many as 50% of the negative return strokes can have multiple attachments to the ground (127,128). An NLDN flash may contain as many as 15 strokes. In part due to the 50–60% detection efficiency of subsequent strokes, the average multiplicity found is only about 2 (13), which is approximately half the expected number based on previous studies (141). Strokes that have higher peak currents are more likely to be part of multistroke flashes; for example, strokes whose mean peak current is 40 kA have a mean multiplicity of 3.3 (12). When more than 15 strokes occur that meet the single-flash criteria, the additional strokes are arbitrarily assigned as part of a second flash. The location of the flash is given by the latitude and longitude of the first stroke in the flash (13). Applications of Observations The NLDN has been used extensively to study lightning climatology in the United States (12,190,191). A total of 13.4 million flashes was reported by the early version of the NLDN in the United States in 1989, and the peak of 3.6 million ground flashes occurred in July (190). After the system was upgraded in 1994, the number of observed flashes increased from a pre-upgrade mean of 16.7 million ground flashes per year to 20.6 million flashes in 1995 (14). Furthermore, the positive polarity flash count increased from 700,000 flashes per year to 2.1 million in 1995. These changes have been attributed to changes in the cloud–ground stroke selection algorithm, which is reviewed later (15). From the upgraded system in 1998, the maximum ground flash density value of 16 flashes/km²/yr occurs in Florida; the minimum value is much less than 1 flash/km²/yr in the northwestern United States and Canada (12). There is considerable year-to-year variation in ground flash density, but in general there are many more ground flashes per unit area per year in the midwestern United States and central Florida than in other portions of the country; in some years, the density values in the Midwest have approached those seen in Florida (191). Moreover, positive cloud–ground strokes are far more likely in the central United States and Florida than elsewhere, and their relative numbers are affected by the types of aerosols in the air in which the thunderstorms form (147). As a percentage of total cloud–ground strokes, positive cloud–ground strokes are found relatively more frequently in the winter months, perhaps approaching 20% of the total, and in the central United States (191). Overall, about 10% of the cloud–ground strokes are positive, although this percentage may be high by a factor of 2 or more, owing to the suspected acceptance of some intracloud strokes as positive return strokes by the algorithm used by the NLDN for time-domain waveform analysis (13–15). This overly high acceptance rate of positive ground flashes is indicated, for example, by work at the Kennedy Space Center that compared results given by the CGLSS MDF triangulation network, a VLF/LF system described at the end of this section that is a small-scale version of the NLDN (Tables 1 and 2), with those obtained by
the Lightning Detection and Ranging (LDAR, see later) system; LDAR can map the entire lightning channel and so can indicate if the stroke remains aloft or approaches the ground. In addition, as part of the 1995 system upgrade, the minimum waveform width for acceptance of a return stroke was decreased from 11 µs before 1995 to 7.4 µs after (113), the sensor gain was increased by 50%, and the field overshoot tolerance (see earlier) was increased from 0.85 times the initial peak value to 1.2 times (14). These changes in the cloud–ground stroke acceptance algorithm led to an increase in the number of relatively weak positive return strokes (13,15); some are apparently misdiagnosed large-amplitude intracloud pulses known as narrow positive bipolar pulses (NPBP) (114,115). Because intracloud strokes tend to have much smaller waveform widths — of the order of 5 µs — than return strokes (Figs. 6 and 7), setting the minimum waveform width criterion too low will undoubtedly lead to the misclassification of some intracloud strokes as return strokes. Indeed, apparent positive return strokes whose peak currents are less than
Figure 28. Map of the region around Kennedy Space Center (KSC) showing the six sites at which there are VLF/LF IMPACT sensors composing the MDF triangulation CGLSS network, which is a small-scale version of the NLDN. The sensors form a circle of approximately 20 km in radius, and so the system loses effectiveness beyond roughly 100 km. The ground strike point locations given by CGLSS are used to supplement the lightning observations given by the LDAR system that locates the lightning channel aloft via locating bursts of short-duration radio-frequency (RF) sources (Courtesy USAF, 45th Weather Squadron, used with permission.) See color insert.
10 kA are reclassified during reprocessing as intracloud strokes (13). Many practical forecast applications are emerging from the NLDN archived databases. Flow regime stratified climatology is being used in Florida by the United States Air Force (USAF) 45th Weather Squadron and the National Weather Service office in Melbourne, Florida, to predict local thunderstorm probability (66). In addition, studies of lightning strike distance distributions are influencing lightning safety designs. Finally, NLDN data provide an important source for cross-sensor studies that assess the quality of lightning detection and location by other systems (Fig. 24) (31,34,47). CGLSS — A Small-Scale Version of the NLDN Around the Cape Canaveral Air Force Station (CCAFS) and the John F. Kennedy Space Center (KSC) at Cape Canaveral, Florida, the USAF operates a small lightning location network that is a small-scale version of the NLDN (16). Known as the Cloud to Ground Lightning
Surveillance System (CGLSS), this network is used to estimate the electromagnetic pulse (EMP) hazard to payloads, launch vehicles, and support electronics. If lightning strikes closely enough or strongly enough, then the affected electronics must be inspected for possible damage. The CGLSS is also used to help 45th Weather Squadron forecasters issue lightning watches and warnings for personnel safety and resource protection. CGLSS consists of six IMPACT sensors (Fig. 28) that are extremely well maintained and calibrated to remove local site effects. At CCAFS/KSC, there is a large network of video cameras and some lightning protection devices that together allow extremely precise location and timing of cloud–ground lightning and so allow calibration of the sensors. Consequently, location accuracy close to 250 m and detection efficiency close to 98% are achieved within the CGLSS network. This compares favorably with the 500-m accuracy and 90% detection efficiency given by the NLDN (13). Owing to the placement and number of sensors in the network, the CGLSS loses effectiveness beyond 100 km, in contrast to the much larger NLDN, which provides coverage throughout the United States (13). LIGHTNING DETECTION AND RANGING: A TOA THREE-DIMENSIONAL NETWORK MAPPER A lightning discharge has various components that are detectable across many frequencies and across a wide range of distances (Fig. 11). The systems reviewed in earlier sections use the electromagnetic radiation emitted by lightning in the very low frequency/low frequency (VLF/LF) bands that span the approximate range of 1 to 400 kHz. The ability to use such sferics has been well established for more than a half century (1), and is best suited for detecting cloud–ground lightning at ranges from 100 to as much as 1,000 km (13,84,184,192). The effectiveness of wideband magnetic VLF methods is seriously degraded at close ranges (within 100 km), in part because of the possibility that the system is prematurely triggered by stepped leaders in nearby lightning (Fig. 8). Arnold and Pierce (193) found that an intracloud K discharge associated with dart leaders at 20 km also produces a peak signal level that is comparable in size to that of a return stroke discharge at 100 km. Wideband VLF/LF techniques focus on detecting cloud–ground strokes, but many applications such as space operations require knowing whether any intracloud (including cloud–cloud and cloud–air) or prestroke discharges are occurring that commonly precede the first cloud–ground strokes by several minutes. Knowledge of these cloud discharges is needed in this case because such discharges pose a hazard to missiles launched in environments free of cloud–ground strokes (16). Locating such cloud discharges requires mapping the entire lightning channel. Thus this operational requirement for locating lightning channels within thunderstorms in real time and at close range has necessitated the development of sensing systems that use very high frequency (VHF) radiation associated with prestroke sferics. As summarized earlier, these initial prestroke discharges, which are best detected
across a frequency range that spans a few to several hundred MHz, are the negative stepped leader processes within a cloud (Fig. 11) that are the precursors to both cloud–ground and intracloud lightning (86). Indeed, such prestroke discharges within a cloud may occur between seven and 20 minutes before the first cloud–ground flash (65,81,194–196). One study (197) found an average lead time of only 4.5 min, however, and a lead time of less than one minute — the Lightning Detection and Ranging (LDAR) system display rate — in 23% of the cases. Thus, in nearly one-quarter of the cases, there is essentially no warning of a cloud–ground stroke. Nevertheless, the sources of VHF radiation, which can be detected at line-of-sight distances (84), when mapped in three-dimensional space, may be used to help warn of the possibility of cloud–ground discharges. In this section, the long-baseline time-of-arrival (TOA) VHF technique is reviewed (22, pp. 152–153) as it is applied by the LDAR system that is operated by the Kennedy Space Center (KSC) at the Cape Canaveral Air Force Station (CCAFS), Florida, and is used by the 45th Weather Squadron, United States Air Force (USAF), Patrick Air Force Base, Florida, to support space operations (28–31,87). A similar, but deployable, VHF ground-based mapping system, the Lightning Mapping Array (LMA) or Lightning Mapping System (LMS), is operated by New Mexico Tech (32,33) and is used in a variety of field programs that study lightning (34,107). Finally, an interferometric ground-based VHF mapping system, Surveillance et Alerte Foudre par Interférométrie Radioélectrique (SAFIR) (24,25), is discussed later. VHF TOA Systems Zonge and Evans (194) were among the first people to use an array of antennas operating at frequencies from a few megahertz to more than 100 MHz to detect electromagnetic radiation from a growing rain shower 10 to 15 minutes before the first discharge. The development of accurate time interval counters, whose resolution is of the order of microseconds, enabled Oetzel and Pierce (192) to use pairs of receivers spaced less than 300 m apart, giving a baseline length of 300 m, to detect VHF emissions. Multiple receivers are needed in a TOA system, because the difference in the time of arrival of a signal between any pair of receivers produces a locus of points traced by a hyperbola (87) (Fig. 3). Ambiguity in the origin of the discharge is usually eliminated when a third receiver is employed, because for most cases, the location is given by the unique intersection of the hyperbolas. A fourth receiver is often included in the system to resolve those cases where multiple intersections are possible (22, pp. 160–161). With five receivers in the shape of a cross, the location in three-dimensional space can be ascertained (22, pp. 152–153; 198,199). With more sensors, multiple solutions allow improvements in data quality control, internal consistency checks, and location accuracy via statistical techniques (87). The sensor configuration chosen by Oetzel and Pierce (84) used a short baseline length of 300 m because they could achieve a time difference resolution of only about 10 µs. As a result, their method was limited
primarily to detecting overhead discharges at a range greater than the baseline length. This requirement did not present a problem because VHF discharges originate at 5 to 10 km above the surface, in the middle or upper levels of clouds (101). Further developments in accurate time measurement enabled researchers to expand the baseline length to tens of kilometers (198) and to use five antennas. Expanding the baseline length enabled researchers to map stepped leaders over a wider range (up to several hundred kilometers), rather than only those overhead. In addition, further studies (200,201) found that the optimum frequency for detection is between 30 and 85 MHz. This bandwidth captures the distinctive spike in amplitude given by the stepped leader and has the least interference from local television stations and other man-made sources. Further research relating VHF emissions to the origins of the lightning process led to development of VHF detection networks in South Africa (199,202), Ontario, Canada (200,203) and KSC, Florida (201), in the middle 1970s. Observations from all locations showed that the
Figure 29. Map of the region around the Kennedy Space Center (KSC) showing the seven sites where there are VHF antennas in the LDAR network. The sensors form a circle approximately 8 km in radius. Reception of a lightning waveform at site 0 triggers the opening of a 100-µs window during which the other six sites report whether and when they receive the same signal, offset perhaps in time (Fig. 30). (Courtesy USAF, 45th Weather Squadron, used with permission.) See color insert.
highest concentration of stepped leader pulses, up to 2,500 discharges per single cloud–ground stroke (199), is above the freezing level and most concentrated between −5 and −15 °C (199,202,203). The high number of stepped leader pulses found to occur in the mixed-phase portion of a cloud — where high concentrations of large liquid drops, rime-coated particles, and graupel occur — is an essential ingredient in the early stages of the lightning process and so in the genesis of a flash. The LDAR System LDAR is one of the earliest VHF detection networks still in operation; it is presently providing support at CCAFS/KSC to the United States space program. This system detects VHF radiation in the 6-MHz bandwidth centered at 66 MHz (204) and provides a real-time view of the lightning process to forecasters. The location of the KSC is ideal for testing the limits of an LDAR system (30,31), because central Florida has an average of 8–16 cloud–ground strikes per km² per year (12,205), the highest concentration in the United States.
Figure 30. Illustration of the time-tagging process used by LDAR to determine the time of arrival (TOA) of the lightning waveform. Site 0 in the center of the array (Fig. 29) detects a lightning waveform, whose peak occurs at time t0, and then opens a 100-µs window during which the other sites report detections. The waveforms are offset in time; site 1 reports a peak at time t1, site 2 at t2, etc. The differences between these peak times at the sites on the array perimeter and the time t0 provide the time tags used in the TOA calculations that determine the coordinates (x, y, z) of the source by finding where the hyperbolic surfaces intersect (Fig. 3). (Excerpt from LDAR Computer-Based Training, USAF/NASA/NWS Applied Meteorology Unit, ENSCO, Inc., used with permission.) See color insert.
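The time-tagging procedure in Fig. 30 amounts to finding the source coordinates (x, y, z) and the emission time that best reproduce the measured arrival times at the seven sites. The sketch below illustrates such a retrieval with a hypothetical sensor ring and synthetic arrival times; it is not the operational LDAR solver.

```python
"""Sketch of a VHF time-of-arrival source retrieval in the spirit of LDAR: solve for the
(x, y, z) position and emission time that minimize arrival-time residuals at seven
sensors. The sensor layout and the synthetic pulse are hypothetical."""
import numpy as np
from scipy.optimize import least_squares

C = 0.2998                                   # speed of light, km per microsecond

# central site 0 at the origin plus six sites on a ring of 8-km radius (coordinates in km)
angles = np.deg2rad(np.arange(0, 360, 60))
sensors = np.vstack([[0.0, 0.0, 0.0],
                     np.column_stack([8 * np.cos(angles), 8 * np.sin(angles),
                                      np.zeros(6)])])

def arrival_times(src, t0):
    """Predicted arrival times (microseconds) for a pulse emitted at src at time t0."""
    return t0 + np.linalg.norm(sensors - src, axis=1) / C

# synthesize 'measured' times from a stepped-leader tip 12 km up and 20 km to the east,
# with 10-ns timing noise comparable to the GPS time-tagging resolution quoted in the text
true_src = np.array([20.0, 5.0, 12.0])
measured = arrival_times(true_src, 0.0) + np.random.default_rng(1).normal(0.0, 0.01, 7)

def residuals(p):
    return arrival_times(p[:3], p[3]) - measured

# constrain the source to lie above the ground (z >= 0), which excludes the mirror solution
fit = least_squares(residuals, x0=[0.0, 0.0, 8.0, -100.0],
                    bounds=([-np.inf, -np.inf, 0.0, -np.inf], np.inf))
print("retrieved source (km):", np.round(fit.x[:3], 2))   # close to (20, 5, 12)
```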
System Characteristics. The arrangement of seven sensors around the KSC, shown in Fig. 29 from (196), is akin to the multiple-sensor arrangement of Proctor (199). Six outlying sensors form a circle whose radius is approximately 8 km around the central site (site 0) and so provide a web of coverage around the KSC complex (87,196,204). Because the power received by the VHF emissions decreases 10 dB for every 71 km (30,206), the KSC network is optimized to track thunderstorms in the immediate vicinity of KSC, up to 150 km away in some cases (Fig. 24c). VHF emissions are received and processed at the six outlying sites and then transmitted to the central site. When site 0 receives a VHF emission exceeding a predetermined threshold, the system is triggered, and the central site opens a 100-µs data window to determine the times relative to the trigger time, known as time tagging, of LDAR events between all pairs of receivers (Fig. 30); time tagging can be accomplished at 10-ns resolution using the global positioning system (GPS) (29,30,196). During this window, the system determines the peak signal level at each of the seven sites, tags it, and adjusts for the time delay during transmission to the central site. In principle, detection at only four sites is required, three outlying sites and the central site 0, to map an event (204). In practice, detection at all seven sites is required for better quality results (87). The antennas are divided into two optimal “Y” configurations, antennas (0, 1, 3, 5) and (0, 2, 4, 6), and the central site is common to both. If the solutions given by these two configurations agree to within 5% or 350 m, whichever is greater, then the mean of the two solutions is used as the stepped leader location. If this double solution fails, then a computationally more expensive solution is used, using all 20 possible combinations of four antennas (29,87). If these 20 solutions satisfy the quality control criteria, then a weighted solution is used for the stepped leader location (204). If the 20 solutions do not satisfy the criteria, then no solution is displayed. The system can process up to 10⁴ pulses per second (22, p. 153). Flash detection efficiency and discharge location accuracy have been examined in a number of cross-sensor studies. One study that compared detections by a network of field mills with those by LDAR found that both systems reported a comparable number of flashes (38). The location accuracy of the LDAR system was tested by using aircraft that had a 66-MHz pulse generator as ground truth and by matching the LDAR location estimates to the coordinates of the aircraft by using the GPS (29). Figure 31 shows the three-dimensional accuracy with range that was obtained. Within the sensor baseline length of 10 km, the median errors are less than 100 m when the signal originates above 3 km, and are slightly higher when the signal originates below 3 km; at a 40-km range, the median error is ∼900 m. Much of this error is along a radial from the central site, and considerably less error is in the azimuthal component. Similar location error estimates have been obtained theoretically (87). Other test aircraft were used to examine the detection rates of LDAR. In an aircraft data set of more than 300,000 events, 97.4% were detected by the LDAR system. For those events within 25 km of the central site, the detection rate was more than 99%. Finally, flash detection efficiencies in the 90- to 200-km range were studied using observations from the satellite-based Lightning Imaging Sensor and the National Lightning Detection Network. LDAR flash detection efficiency in the medium to far range (50 to 300 km) decreases from greater than 90% in the 90–100 km range to less than 25% at a 200-km range. Flash location errors in the medium to far range increase from 1 km at a 50-km range to ±7 km between the 175- and 250-km range (31).
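The double-solution quality control described above can be sketched as follows. Here locate is a stand-in for a four-antenna TOA solver such as the retrieval sketched earlier; the 5% and 350-m thresholds follow the text, while the assumption that the 20 four-antenna combinations are those containing the central site, the weighting, and the consistency test are illustrative guesses.

```python
"""Sketch of the LDAR dual-'Y' quality control: solve with antennas (0, 1, 3, 5) and
(0, 2, 4, 6); if the two positions agree to within 5% or 350 m (whichever is greater),
use their mean. Otherwise fall back to the 20 four-antenna subsets (assumed here to be
those that include the triggering central site) and use a weighted solution if they are
mutually consistent, or report nothing. 'locate' is a stand-in TOA solver."""
from itertools import combinations
import numpy as np

def ldar_position(locate, times, frac=0.05, floor_m=350.0):
    y1 = np.asarray(locate(times, sites=(0, 1, 3, 5)))     # first optimal 'Y'
    y2 = np.asarray(locate(times, sites=(0, 2, 4, 6)))     # second optimal 'Y'
    tol = max(floor_m, frac * np.linalg.norm((y1 + y2) / 2))
    if np.linalg.norm(y1 - y2) <= tol:
        return (y1 + y2) / 2                                # mean of the two solutions
    combos = [(0,) + c for c in combinations(range(1, 7), 3)]   # 20 subsets of 4 antennas
    sols = np.array([locate(times, sites=c) for c in combos])
    spread = np.linalg.norm(sols - sols.mean(axis=0), axis=1)
    if spread.max() <= max(floor_m, frac * np.linalg.norm(sols.mean(axis=0))):
        return np.average(sols, axis=0, weights=1.0 / (1.0 + spread))   # weighted solution
    return None                                             # no solution is displayed

# toy usage: a dummy solver that returns the same point (in meters) for any antenna subset
dummy = lambda times, sites: np.array([20000.0, 5000.0, 12000.0])
print(ldar_position(dummy, times=None))
```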
Figure 31. Three-dimensional location accuracy achieved by the LDAR network as a function of range from the central site 0. Within the sensor baseline of 10 km, the median three-dimensional error is ∼100 m when the source is above 3 km and increases to ∼900 m at a 40-km range. Most error is in the radial direction. (Excerpt from LDAR Computer-Based Training, USAF/NASA/NWS Applied Meteorology Unit, ENSCO Inc., used with permission.)
Aircraft flying in clouds in the vicinity of the network sometimes generate false alarms. Corona discharge from triboelectrification of the aircraft (the production of electrostatic charges by friction) can be detected by LDAR. Fortunately, the footprint of an aircraft on the display terminal has a distinctive three-dimensional, temporal evolutionary pattern that is easily discernible from natural events to the practiced eye. When these detections are
removed from the dataset, the false alarm rate, which is the ratio of the number of incorrect detections to the total number of detections (156, pp. 240–241), is of the order of 0.01%. Display Examples. A real-time example of two thunderstorms, as seen on the LDAR terminal at KSC, is shown in Fig. 32. In this figure, the lower left panel is a zoomed
Figure 32. LDAR display illustrating a cloud–ground stroke. The lower left panel is a zoomed x,y depiction of part of the KSC complex approximately 15 nautical miles south of the central site, where x increases to the east and y increases to the north. The upper left panel is a view in the x,z plane as seen from the south, and the lower right panel is a cross-sectional view in the y,z plane as seen from the west; here z is the altitude above the ground in kilofeet. Each dot represents a pulse from a radio-frequency (RF) source, and a coherent collection of dots maps the lightning channel. Quasi-horizontal collections indicate cloud discharges, as seen between −4 and 0 nautical miles in the upper left panel; vertical collections indicate cloud–ground strokes, as seen at −7 nautical miles. The cloud–ground stroke here does not extend completely to the surface because the near-surface elevations are below the line of sight of the antennas in the network. The upper right panel provides a 1-minute count of LDAR events. (Excerpt from LDAR Computer-Based Training, USAF/NASA/NWS Applied Meteorology Unit, ENSCO, Inc., used with permission.) See color insert.
x, y (or plan view) depiction of part of the KSC complex approximately 15 nautical miles south of the central site, where x increases to the east and y increases to the north. The upper left panel is a cross-sectional view in the x, z plane as seen from the south, and the lower right panel is a cross-sectional view in the y, z plane as seen from the west; here z is the altitude above the ground. Conceptually, the upper left and lower right panels can be folded upward out of the page, so as to create a box in the lower left corner, in which the LDAR detections are projected onto the three sides. Each dot represents an event, which is normally taken as the tip of a negative stepped leader pulse within the lightning channel (86). The collection of dots maps the lightning channel; quasi-horizontal collections indicate cloud discharges and vertical collections indicate cloud–ground strokes. The upper right panel provides a 1-minute count of the number of LDAR events detected. In the example in Fig. 32, there are two convective areas. The one to the west near −7 nautical miles is more concentrated in the x, y view (lower left panel) and has a distinctive cloud–ground flash seen by the alignment of the pixels in the x, z view (upper left). The signals originate in the mixed-phase region near 40,000 feet and then form a cone as the channel propagates toward the surface. Note the lack of data below 6,000 feet, a consequence of the network's inability to detect events toward the horizon farther away from the central site. The CGLSS VLF/LF network of sensors (Fig. 28) that is collocated with the LDAR network identifies the strike point of the cloud–ground flashes, and this point is included on the real-time displays now in use. The cloud event in the right portion of the panels in Fig. 32 is diffuse in all projections, indicative of a decaying storm that has no concentration of signal and so no well-defined stroke channel. A unique observation by the LDAR system at CCAFS/KSC is one in which a cloud–air discharge descended, well away from the storm edge, to the surface as a cloud–ground stroke (Fig. 33). In this figure, the cloud–air discharge, seen in the upper x, z panel, emanates from the concentrated region of signals approximately at the 25,000-foot level. After propagating westward for approximately 5 nautical miles, the discharge forks into a southward- and upward-moving cloud–air branch and a northwestward- and downward-moving cloud–ground stroke. Such bolt-out-of-the-blue strokes have also been seen by using the similar LMA/LMS deployable research system operated by New Mexico Tech (32,107). The LDAR system enables forecasters and researchers to analyze the origins of all types of lightning flashes and to visualize their development and propagation in three-dimensional space. Although the CCAFS/KSC network is the only VHF network in continuous real-time use in the United States, its location in a region of Florida that has a high thunderstorm frequency allows the development and examination of large data sets. The KSC plans several upgrades to LDAR within the next several years. First, an additional antenna will be deployed near Orlando, Florida, approximately 40 nautical miles to the west of the KSC complex. This eighth antenna, it is hoped, will provide better warnings of lightning over the peninsula of Florida, where the concentration
Figure 33. LDAR display of a bolt-out-of-the-blue. The x,z cross section depicted in the top panel and the x,y plan view in the bottom panel show a lightning channel extending horizontally about 5 nautical miles from the main cluster of events, where it splits; one branch extends southward and upward as a cloud–air flash, and the other branch extends northwestward and downward as a cloud–ground stroke. (Excerpt from LDAR Computer-Based Training, USAF/NASA/NWS Applied Meteorology Unit, ENSCO, Inc., used with permission.)
of storms is higher. This region is upstream of the KSC in the southwesterly flow regime that is predominant in the summertime (207). Second, studies are under way to attempt to link LDAR flash rates to the likelihood of severe weather, as has been done with OTD and SAFIR. Finally, Global Atmospherics, Inc., the NLDN provider, has expressed interest in developing LDAR commercially (30). Perhaps the future of LDAR is to provide public warnings of lightning. SAFIR: AN INTERFEROMETRIC THREE-DIMENSIONAL NETWORK MAPPER A second approach for mapping a lightning channel in three dimensions uses interferometric analysis of the radiated VHF signals to determine the bearing to the source (24,25,27). In contrast to the TOA mapping technique used by the Lightning Detection and Ranging (LDAR) system, which detects discrete events lasting from 1 to 20 µs associated with negative stepped leaders, the interferometric (ITF) mapping technique detects the more continuous bursts of radio noise, composed of pulses of <10-nanosecond (denoted ns) duration (27), that can last for tens to hundreds of microseconds; these bursts are associated with stepped leaders, dart leaders, and K-processes or Q-trains (86,126,137). Interferometric techniques have been applied throughout the VHF
frequency band (30–300 MHz) in a number of independent efforts, and many have produced highly successful location results (24,25,27,65,126,137,208–212). Radio Interferometry Interferometric analysis requires at least one pair of antennas to determine the direction angle θ of the radial to the source of the lightning signal; two pairs of two antennas that have perpendicular baselines are needed to determine both θ and the angle of elevation of the radial in three dimensions (22, p. 155). As reviewed earlier in the section ‘‘Interferometry’’ and illustrated in Fig. 4, the outputs from the two antennas separated by a distance D, called the baseline length, are sent to a phase detector, which in turn produces a signal that is a function of the phase difference φ between the two signals (212). The classic phase detector acts by amplifying and mixing the output of each antenna using a local oscillator before multiplying these signals together (22, p. 156). Assuming that the radiation propagates as a plane wave, one finds that the phase difference φ is a simple function of the direction angle θ, the wavelength of the radiation λ, and the baseline length D (22, p. 155; 137):
φ = (2πD/λ) cos θ.    (6)
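A short Python sketch of this relation shows how a long baseline makes the measured phase ambiguous; the 100-MHz frequency, the D = 4λ baseline, and the 70° source bearing below are assumptions chosen only for illustration.

# Bearing ambiguity implied by Eq. (6) for a long interferometer baseline.
import numpy as np

c = 2.998e8                     # speed of light, m/s
lam = c / 100e6                 # wavelength of a 100-MHz emission, about 3 m
D = 4.0 * lam                   # long baseline; 2D/lam = 8 fringes

def phase_difference(theta_deg):
    # Phase difference between the two antennas for a plane wave, Eq. (6).
    return 2.0 * np.pi * D / lam * np.cos(np.radians(theta_deg))

# A phase detector reports the phase only modulo 2*pi.
measured = phase_difference(70.0) % (2.0 * np.pi)

# Every angle whose phase differs from the measurement by a whole number of
# cycles is an equally valid bearing; these repeats are the fringes.
k = np.arange(-int(D / lam), int(D / lam) + 1)
cos_theta = (measured + 2.0 * np.pi * k) / (2.0 * np.pi * D / lam)
bearings = np.degrees(np.arccos(cos_theta[np.abs(cos_theta) <= 1.0]))
print(np.round(np.sort(bearings), 1))   # eight direction angles share one phase

Pairing such a long baseline with a λ/2 baseline, which has only a single fringe, removes this ambiguity in the manner of the clock-hand analogy discussed next.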
The antenna configuration described in Fig. 4 is for the special case when D = λ/2, for which φ = π cos θ. Because bearings of either +θ or −θ are associated with the same phase angle φ, bearing ambiguity is inevitable. In addition, phases that necessarily vary only between 0 and 2π will not be associated with a unique value of |θ| if the baseline length D of the interferometer is large relative to the wavelength λ of the sampled radiation. Indeed, such ambiguities occur if D > λ/2, as is often the case in antenna configurations in interferometers. For 100 MHz VHF radiation traveling at the speed of light, c = 3 × 10⁸ m/s, the wavelength λ is 3 m. If the baseline length is D = 4λ = 12 m, then the equal phases φ of ±2π, ±4π, ±6π, and ±8π are given by eight direction angles θ: 0°, 41.4°, 60°, 75.5°, 104.5°, 120°, 138.6°, and 180°. These cycles of repeating phase differences are referred to as fringes (22, p. 155; 137); in general there are 2D/λ such fringes, or eight in the preceding example, and one for the case illustrated in Fig. 4. Because it is necessary to unravel the preceding direction angle ambiguities produced by baseline lengths that are much longer than the wavelength of the radiation, one might expect that short baselines would be optimal. It can be demonstrated, however, that baseline length is inversely proportional to the angular resolution that can be achieved from a measurement (22, p. 155). Reading the hands of a clock to tell the time provides a good analogy. The hour hand unambiguously gives the time of day; in principle, if it could be read with sufficient accuracy, then it would yield the exact time. Although the minute hand does not uniquely reveal the time by itself, the hand allows greater resolution of it. Using the two hands simultaneously provides the required temporal accuracy, and this combination suggests an appropriate baseline
configuration for interferometric measurements (137,211). Two collinear baselines are used; one has a baseline length D = λ/2, resulting in one fringe and no ambiguities, but poor resolution, and one has a longer baseline length D = 4λ, resulting in one complete fringe over the resolution of the short-baseline antenna (22, pp. 156–157; 137,213). Other combinations of short and long baselines have been used to good advantage (108). Once accurate direction angles and angles of elevation are established to define a radial to the radiation source, a second array of antennas located some distance away is used to obtain a second radial. These two radials establish the lightning source location in three dimensions by triangulation. Because the absolute uncertainty in location increases with the range from the antennas, it can prove difficult to achieve sufficient vertical resolution of the source locations for lightning channel mapping unless the antenna arrays are separated no more than a few tens of kilometers from each other and from the radiation source (22, p. 157). For example, threedimensional mapping of spider intracloud lightning using interferometers is described in (108,109), as well as in (214), in which an array of two interferometers that are separated by 40 km were used. Plan view radiation location can be established over much greater ranges, sometimes as far as hundreds of kilometers (65). The capabilities of an interferometric system are largely defined by parameters related to the configuration of their antenna arrays. The arrangements and baseline lengths are adjusted to best suit the type of, and the range to, the lightning phenomena under study. Also of importance are the choice of center frequency and bandwidth over which the receivers operate. Electromagnetic radiation at different frequencies corresponds to different physical phenomena and is also affected differently by local sources of noise. Because the operating frequencies f are related to the necessary dimensions D of the antenna baselines, which are a few multiples, or near-multiples (108), of the sampled wavelength λ = c/f , where c is the speed of light, systems at the higher end of the VHF range (300 MHz) will have shorter baselines (on the order of several meters) than systems operating near the lower end (30 MHz). The bandwidth over which the signal is analyzed affects the accuracy of the results. Averaging results over broader frequency ranges should result in more accurate measurements of the phase differences, much as could be achieved by using multiple baselines (215). In addition, the time interval over which an incoming signal is averaged should help to improve the accuracy in determining the direction angles. Increasing the bandwidth and the sampling time interval, however, introduces uncertainties. Increasing the bandwidth can contribute to signal decorrelation across the array, which is not a problem when signals arrive at right angles to the baseline, but becomes an ever increasing problem as signals arrive at angles more closely parallel to the baseline (137). Emissions at different frequencies f are associated with different wavelengths λ and so yield different phase angles φ, except when they are perpendicular to the baseline (Fig. 4). As a result, at different sampling frequencies, waves appear to come from
different directions θ [Eq. (6)], even if their arrivals at the two receivers are nearly simultaneous. Increasing the time interval over which signals are averaged improves accuracy, but also diminishes the temporal resolution of the system and so increases the likelihood of averaging two distinct signals as one (126). Interferometric techniques are also somewhat limited by the electronic techniques required to calculate the phase difference between two signals (22, pp. 156–157). Interferometric lightning location techniques have some unique advantages over those used by the other types of lightning location systems reviewed in this article. Interferometric techniques depend only on the phase of the incoming signal, not on the amplitude of the waveform (24). Thus, because propagation effects and noise can easily lead to significant difficulties in identifying the cause of particular signal amplitudes and waveforms (137), eliminating these variables greatly reduces errors introduced by local noise and propagation effects (212). VHF interferometric systems can typically achieve high temporal resolutions, of the order of tens of nanoseconds or less (27,213), and can thereby resolve nearly all stages of a lightning discharge. Using such high sampling rates, an interferometric system would have no problem distinguishing two separate storms in the same direction angle. Moreover, true three-dimensional characterization of the lightning sources is possible for systems that have high spatial resolution and short distances between detection stations. Even when vertical resolution cannot be established, the combination of high temporal resolution and the ability to detect many different types of lightning emissions can enable an interferometric system to obtain the plan view geometry of a lightning stroke that might extend horizontally for many kilometers. The SAFIR System Surveillance et Alerte Foudre par Interf´erom´etrie Radio´electrique (SAFIR) is an operational, real-time interferometric system that samples over 1-MHz bandwidths at a selectable center frequency between 110 and 118 MHz (22, p. 157). The temporal resolution of the system is high enough (∼10 µs) to eliminate the possibility of unwanted interference between disparate sources of radiation. SAFIR, originally developed by the French Office National d’Etudes et de Recherches A´erospatiales (ONERA), is now a product of the company Dimensions SA; the research version of the SAFIR system is called the ONERA three-dimensional interferometric mapper, or ONERA-3D (27,108,214). In February 2000, Dimensions SA was purchased by Vaisala, an international company based in Finland that specializes in producing environmental and meteorological measuring equipment. The original purpose of SAFIR was to provide assistance to the European space program (22, p. 157), but in the last decade, Vaisala Dimensions SA has developed SAFIR into a commercially available system for detecting lightning discharges. SAFIR is used widely in Europe by many weather agencies and forecasting centers, the European Space Center in French Guiana, numerous hydrology offices, the aviation industry, the military,
and several power companies (65), and it has also been used in Japan (26) and the United States (214) for fundamental lightning research and for comparisons with the performance of other lightning location techniques (86). Real-time lightning networks that use the same instruments as SAFIR are summarized in Table 1. System Characteristics. The combination of high temporal resolution and three-dimensional radiation source location allows SAFIR to offer a unique set of detection parameters (65). SAFIR maps the three-dimensional structure of lightning channels that sometimes extend for tens of kilometers (24,32,33). SAFIR can detect all types of lightning phenomena that emit VHF radiation, in particular negative stepped and dart leaders, cloud–cloud recoil streamers, streamer–leader transitions, and possibly even stratospheric discharges (27,86,137). Cloud–ground and intracloud (including cloud–cloud and cloud–air) strokes are identified preliminarily via the orientation of the lightning channel. Time-domain waveform analysis allows distinguishing between intracloud and cloud–ground strokes by using low-frequency (LF) sampling techniques. Such analysis is used to provide an independent position analysis, peak current value, peak current derivative, stroke polarity, wave rise time, wave decay time, stroke energy estimate, and a neutralized charge estimate. As for the LDAR/CGLSS system, combining the LF and VHF techniques yields higher detection efficiencies, stroke-type classification, and a more precise location of cloud–ground strokes than either technique alone. A wide input dynamic range (100 dB) achieved at the detection stations allows detection of all lightning discharges over a broad area (65). Electrostatic field analyses have estimated that all of the SAFIR systems currently installed can achieve a 95% detection rate over a 200-km range from each detection station for low-elevation sources and over a 400-km range for high-elevation sources. The total lightning detection range that can be achieved by three or four detection stations is shown in Fig. 34. As for LDAR, the accuracy of the source locations given by the SAFIR system has been verified using aircraft (65). This verification used a helicopter equipped with a VHF transmitter radiating pulsed broadband VHF signals that simulate lightning radiation. The helicopter-mounted radiation source was located by the SAFIR system, and this location was compared with that given by tracking radar. Each interferometric detection station achieved a verified angular error of 0.5° in placing the radials in three dimensions (65), and the resulting three-dimensional location error was 500 m rms for a short-baseline SAFIR system (30 km between detection stations) and 750 m rms for a medium-baseline standard system (120 km between detection stations). Application Examples. One example of the utility of the system is demonstrated by its ability to distinguish between cloud–ground and intracloud lightning phenomena (65). Figures 35a,b,c depict the lightning strokes from a storm as it progressed from the southwest to the northeast over the Netherlands (65). Each color represents a 1-hour increment; blue indicates the oldest, and
Figure 34. Detection ranges available for two SAFIR systems; that on the left uses three detection stations at the vertices of an equilateral triangle whose sides are 200 km long, and the other on the right uses four stations at the vertices of a square whose sides are 200 km long. The gray shading indicates how the location accuracy decreases as the range from the array of stations increases. [From (65)]. (Used with the permission of VAISALA DIMENSIONS SA.)
red the newest, observations. Figure 35a shows the complete locations of all lightning discharges. Note that some lightning is mapped as linear streaks, rather than just points in space, indicating that SAFIR has captured some of the horizontal extent of individual discharges. Figure 35b displays only the locations and polarity of the cloud–ground strokes that were detected and located by the combined interferometric/LF system. Figure 35c shows how the numbers of total lightning detections and
cloud–ground strokes varied throughout the storm. The combination of these three figures reveals that a large majority of the lightning during the first four hours of the storm consisted of cloud discharges. Shortly after the intracloud lightning strokes reached a maximum value, the rate of occurrence of ground strokes increased rapidly. These results suggest that observations of intracloud frequency can be used to help guide predictions of cloud–ground strokes, which are the type of
Figure 35. SAFIR plan view displays of total lightning detections (a), cloud–ground detections (b), and total number of strokes of both types (c). The colors in (a) and (b) indicate the time of occurrence of the lightning, in 1-hour increments; dark blue to the southwest in the lower left is the oldest, and red to the northeast in the upper right is the newest. Initially, there were few cloud–ground strokes. Between the third and fourth hour, however, the number of intracloud strokes decreased and presaged the increase in the number of cloud–ground strokes during hour 4. Using the trends in the number of intracloud strokes together with a projection of the storm movement based on the rate of advance of the lightning areas may lead to improved warning of cloud–ground strokes. [From (65)]. (Used with the permission of VAISALA DIMENSIONS SA.) See color insert.
lightning hazard detected by VLF/LF systems such as the NLDN. Based on the information SAFIR revealed about this storm, it would have been possible to identify a dangerous and growing threat several hours in advance and then to predict the locations most likely to be severely affected by intense cloud–ground strokes. A second and somewhat similar case study confirmed the utility of SAFIR as a predictor of other sorts of thunderstorm hazards (65). The study compared the number of intracloud strokes during a storm with the location of damaging downbursts from thunderstorms, which are regions of air that rapidly move downward until they reach the ground and then spread horizontally to cause straight-line wind damage. This study found that the peak in intracloud lightning precedes the most intense portion of the downburst by an average of 7 minutes. This result is consistent with similar studies using OTD data (47). ABBREVIATIONS AND ACRONYMS A ac ALDARS ALEXIS ALS AM ARSI ASOS ATD B-field c C CCAFS CG CGLSS CGR3 CIGRE CLDN cm CRDF dB dBZ dc DF DMSP DOE EDOT E-field EFMS ELF EMP ESID ESP EW F FAA ´ FORTE GAI GHz GP-1
ampere alternating current automated lightning detection and reporting system array of low energy X-ray imaging sensors ASOS lightning sensor amplitude modulation Atmospheric Research Systems, Inc. automated surface observing system arrival time difference magnetic field speed of light coulomb Cape Canaveral Air Force Station cloud–ground cloud to ground lightning surveillance system cloud–ground ratio 3 Conference Internationale des Grands Reseaux Electriques Canadian Lightning Detection Network centimeter cathode ray direction finder; cathode ray direction finding decibels decibels relative to 1 mm6 /m3 ; unit of radar reflectivity direct current direction finder; direction finding defense meteorological satellite program department of energy E-field change sensor array electric field electric field measurement system extremely low frequency electromagnetic pulse electrical storm identification device enhanced sensitivity and performance east/west farads federal aviation administration fast on-orbit recording of transient events satellite Global Atmospherics, Inc. gigahertz Great Plains-1
GPATS GPS h HF Hz IC IMPACT ITF kA kHz km KSC kV LANL LDAR LF LIS LLP LMA LMS LPATS LPLWS LRLDN m MDF MF MHz min ms NALDN NASA NL NLDN nm NM NNBP NPBP ns NS NWS OLS ONERA OTD RF RNSS RTEP s SAFe SAFIR SOLLO SSC SSL TIPPs TLM TOA TREMBLE TRMM TSS UHF
global position and tracking systems global positioning system hours high frequency hertz intracloud improved accuracy from combined technology interferometer, inteferometry kiloamperes kilometers kilometers Kennedy Space Center kilovolts Los Alamos National Laboratory lightning detection and ranging low frequency lightning imaging sensor Lightning Location and Protection, Inc. lightning mapping array lightning mapping system lightning position and tracking system launch pad lightning warning system long-range lightning detection network meters magnetic direction finder; magnetic direction finding medium frequency megahertz minutes milliseconds North American Lightning Detection Network National Aeronautics and Space Administration network locator National Lightning Detection NetworkTM nanometer network mapper narrow negative bipolar pulse narrow positive bipolar pulse nanosecond north/south National Weather Service operational linescan system Office National d’Etudes et de Recherches A´erospatiales optical transient detector radio frequency range-normalized signal strength real time event processor seconds electric field measuring system marketed by dimensions SA Surveillance et Alerte Foudre par Interf´erom´etrie Radio´electrique sonic lightning location single-station flash counter single-station locator transionospheric pulse pairs transmission line model time of arrival thunder recording employed in mapping branched lightning events tropical rainfall measurement mission satellite thunderstorm sensor series ultrahigh frequency
ULF USAF USGA UTC V VHF VLF Wb WD WWW yr µs Ω 2-d 3-d
ultralow frequency United States Air Force United States Golf Association Universal Time Coordinated volts very high frequency very low frequency webers lightning warning systems World Wide Web year microseconds ohms two-dimensional three-dimensional
21. A. J. Sharp, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 234–237. 22. D. R. MacGorman and W. D. Rust, The Electrical Nature of Storms, Oxford University Press, NY, 1998. 23. R. S. Massey et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 684–687. 24. P. Richard, Preprints, 16th Conf. Severe Local Storms and Conf. Atmos. Electr., American Meteorological Society, Boston, 1990, pp. J21–J26. 25. P. Richard, Proc. 9th Int. Conf. Atmos. Electr., vol. III, International Commission on Atmospheric Electricity, St. Petersburg, Russia, 1992, pp. 925–928. 26. K. Kawasaki et al., Geophys. Res. Lett. 21, 1,133–1,136 (1994).
BIBLIOGRAPHY 1. E. P. Krider, in J. R. Fleming, ed., Historical Essays on Meteorology 1919–1995, American Meteorological Society, Boston, 1996, pp. 321–350. 2. R. A. Watson-Watt and J. F. Herd, J. Inst. Electr. Eng. 64, 611–622 (1926). 3. F. Horner, Proc. Inst. Electr. Eng. 101, Part III: 383–390 (1954). 4. F. Horner, Proc. Inst. Electr. Eng. 104B, 73–80 (1957). 5. E. T. Pierce, in R. H. Golde, ed., Lightning, vol. 1, Academic Press, NY, 1977, pp. 351–384. 6. E. P. Krider and R. C. Noggle, J. Appl. Meteorol. 14, 252–256 (1975). 7. E. P. Krider, R. C. Noggle, and M. A. Uman, J. Appl. Meteorol. 15, 301–306 (1976).
27. G. Labaune, P. Richard, and A. Bondiou, Electromagnetics 7, 361–393 (1987). 28. C. Lennon and L. Maier, Proc. Int. Aerospace and Ground Conf. Lightning Static Electr., National Aeronautics and Space Administration, CP 3106, Cocoa Beach, FL, 1991, vol. II, pp. 89-1–89-10. 29. L. Maier, C. Lennon, T. Britt, and S. Schaefer, Proc. 6th Conf. Aviation Weather Syst., American Meteorological Society, Boston, 1995, pp. 305–309. 30. D. J. Boccippio, S. Heckman, and S. J. Goodman, J. Geophys. Res. 106D, 4,769–4,786 (2001). 31. D. J. Boccippio, S. Heckman, and S. J. Goodman, J. Geophys. Res. 106D, 4,787–4,796 (2001). 32. W. Rison, R. J. Thomas, P. R. Krehbiel, T. Hamlin, and J. Harlin, Geophys. Res. Lett. 26, 3,573–3,578 (1999).
8. E. P. Krider, C. D. Weidman, and R. C. Noggle, J. Geophys. Res. 82, 951–960 (1977).
33. P. R. Krehbiel et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 376–379.
9. C. D. Weidman and E. P. Krider, J. Geophys. Res. 83, 6,239–6,247 (1978).
34. R. J. Thomas et al., Geophys. Res. Lett. 27, 1,703–1,706 (2000).
10. C. D. Weidman and E. P. Krider, J. Geophys. Res. 84C, 3,159–3,164 (1979).
35. E. A. Jacobson and E. P. Krider, J. Atmos. Sci. 33, 103–117 (1976).
11. E. P. Krider, R. C. Noggle, A. E. Pifer, and D. L. Vance, Bull. Am. Meteorol. Soc. 61, 980–986 (1980).
36. D. E. Harms, B. F. Boyd, R. M. Lucci, and M. W. Maier, Preprints, 10th Symp. Meteorol. Obs. Instrum., American Meteorological Society, Boston, 1998, pp. 317–322.
12. K. L. Cummins, R. B. Pyle, and G. Fournier, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 218–221.
38. J. S. Nisbet et al., J. Geophys. Res. 95D, 5,417–5,433 (1990).
13. K. L. Cummins et al., J. Geophys. Res. 103D, 9,035–9,044 (1998).
40. D. MacKerras et al., J. Geophys. Res. 103D, 19,791–19,809 (1998).
14. R. S. Wacker and R. E. Orville, J. Geophys. Res. 104D, 2,151–2,157 (1999).
41. Y. Yair, Z. Levin, and O. Altaratz, J. Geophys. Res. 103D, 9,015–9,025 (1998).
15. R. S. Wacker and R. E. Orville, J. Geophys. Res. 104D, 2,159–2,162 (1999).
42. V. Cooray, J. Geophys. Res. 91D, 2,835–2,842 (1986).
16. W. P. Roeder, B. F. Boyd, and D. E Harms, Proc. Conf. Lightning and Static Electr., National Interagency Coordination Group, Orlando, FL, 2000. 17. A. C. L. Lee, J. Atmos. Oceanic Technol. 3, 630–642 (1986). 18. A. C. L. Lee, Q. J. R. Meteorol. Soc. 115, 1,147–1,166 (1989). 19. N. Daley et al., Preprints, 2000 Int. Lightning Detect. Conf., Global Atmospherics, Inc., Tucson, AZ, 2000; http://www.glatmos.com/news/ildc− schedule.htm. 20. J. A. Cramer and K. L. Cummins, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 250–253.
37. E. P. Krider, J. Geophys. Res. 94D, 13,145–13,149 (1989). 39. D. Mackerras, J. Geophys. Res. 90D, 6,195–6,201 (1985).
43. H. J. Christian, R. J. Blakeslee, and S. J. Goodman, J. Geophys. Res. 94D, 13,329–13,337 (1989). 44. S. J. Goodman and H. J. Christian, in R. J. Gurney, J. L. Foster, and C. L. Parkinson, eds., Atlas of Satellite Observations Related to Global Change, Cambridge University Press, Cambridge, England, 1993, pp. 191–219. 45. H. J. Christian et al., Proc. 10th Int. Conf. Atmos. Electr., Osaka, Japan, 1996, pp. 368–371. 46. D. J. Boccippio et al., J. Atmos. Oceanic Technol. 17, 441–458 (2000). 47. E. Buechler, K. T. Driscoll, S. J. Goodman, and H. J. Christian, Geophys. Res. Lett. 27, 2,253–2,256 (2000).
48. H. J. Christian et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 746–749. 49. A. R. Jacobson, S. O. Knox, R. Franz, and D. C. Enemark, Radio Sci. 34, 337–354 (1999).
70. R. H. Golde, ed., The Physics of Lightning, vol. 1, Academic Press, NY, 1977. 71. D. J. Malan, Physics of Lightning, The English Universities Press, London, 1963.
50. D. M. Suszcynsky et al., J. Geophys. Res. 105D, 2,191–2,201 (2000).
72. V. A. Rakov and M. A. Uman, Lightning: Physics and Effects (Encyclopedia of Lightning), Cambridge University Press, NY, 2002.
51. A. R. Jacobson et al., J. Geophys. Res. 105D, 15,653–15,662 (2000).
73. M. A. Uman, Lightning, McGraw-Hill, NY, 1969; republished by Dover, NY, 1984.
52. D. N. Holden, C. P. Munson, and J. C. Davenport, Geophys. Res. Lett. 22, 889–892 (1995).
74. M. A. Uman, The Lightning Discharge, Academic Press, Orlando, FL, 1987.
53. R. S. Zuelsdorf, R. C. Franz, R. J. Strangeway, and C. T. Russell, J. Geophys. Res. 105D, 20,725–20,736 (2000).
75. H. Volland, Handbook of Atmospherics, vol. 1, CRC Press, Boca Raton, FL, 1982.
54. G. Sampson, ASOS Lightning Sensor Assessment, Western Region Technical Attachment No. 97-31, NOAA, Salt Lake City, Utah, 1997; http://www.wrh.noaa.gov/wrhq/97TAs/ TA9733/TA97-33.html.
76. H. Volland, Handbook of Atmospherics, vol. 2, CRC Press, Boca Raton, FL, 1982. ˚ 77. L. Wahlin, Atmospheric Electrostatics, Research Studies Press, Letchworth, England, 1986.
55. R. Markson and L. Runkle, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 188–191.
78. M. Yamshita and K. Sao, J. Atmos. Terrestrial Phys. 36, 1,623–1,632 (1974).
56. D. W. Mazeroski, MS Thesis, Penn State University, 2000.
79. M. Yamshita and K. Sao, J. Atmos. Terrestrial Phys. 36, 1,633–1,641 (1974).
57. D. W. Mazeroski, H. N. Shirer, and H. W. Shirer, Preprints, 2000 Int. Lightning Detection Conf., Global Atmospherics, Inc., Tucson, AZ, 2000, http://www.glatmos.com/news/ ildc− schedule.htm. 58. V. A. Rafalsky, A. P. Nickolaenko, and A. V. Shvets, J. Geophys. Res. 100D, 20,829–20,838 (1995). 59. P. J. Medelius, System Locates Lightning Strikes to Within Meters, NASA Technical Briefs Online, KSC11785, Associated Business Publications, NY, 2000; http://www.nasatech.com/Briefs/July00/KSC11992.html. 60. E. R. Williams, S. G. Geotis, and A. B. Bhattacharya, in D. Atlas, ed., Radar in Meteorology, American Meteorological Society, Boston, 1990, pp. 143–150. 61. R. Kithil, Proc. Int. Aerospace and Ground Conf. Lightning Static Elec., US. Navy, Naval Air Warfare Center, Aircraft Division, NAWCADPAX — 95-306-PRO, Williamsburg, VA, 1995, pp. 9-1–9-10 (70–79). Also published on the National Lightning Safety Institute web site at http://www.lightningsafety.com/nlsi− lls/sec.html. 62. R. Kithil, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 204–206. 63. R. L. Holle and M. A. Cooper, Preprints, 2000 Int. Lightning Detection Conf., Global Atmospherics, Inc., Tucson, AZ, 2000; http://www.glatmos.com/news/ildc− schedule.htm. 64. W. P. Roeder, Lightning Safety web page, 45th Weather Squadron, USAF, Patrick AFB and Cape Canaveral AFS, Florida, 1999; http://www.patrick.af.mil/45og/45ws/ LightningSafety/index.htm. 65. Vaisala Dimensions SA, SAFIR web page, Vaisala Company, Meyreuil, France, 2000; http://www.eurostorm.com/. 66. East-Central Florida Meteorological Research web site, National Weather Service, Melbourne, Florida, 2000, http://WWW. SRH.NOAA.GOV/MLB/rsrchamu.html. 67. S. J. Goodman et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 515–518.
80. M. A. Uman, Y. T. Lin, and E. P. Krider, Radio Sci. 15, 35–39 (1980). 81. F. Horner, in J. A. Saxton, ed., Advances in Radio Research, vol. 2, Academic Press, NY, 1964, pp. 121–204. 82. B. D. Herrman, M. A. Uman, R. D. Brantley, and E. P. Krider, J. Appl. Meteorol. 15, 402–405 (1976). 83. E. A. Lewis, R. B. Harvey, and J. E. Rasmussen, J. Geophys. Res. 65, 1,879–1,905 (1960). 84. N. Cianos, G. N. Oetzel, and E. T. Pierce, J. Appl. Meteorol. 11, 1,120–1,127 (1972). 85. D. E. Harms et al., Preprints, 28th Conf. Radar Meteorol. Obs. Instrum., American Meteorological Society, Boston, 1997, pp. 240–241. 86. V. Mazur et al., J. Geophys. Res. 102D, 11,071–11,085 (1997). 87. W. J. Koshak and R. J. Solakiewicz, J. Geophys. Res. 101D, 26,631–26,639 (1996). 88. W. P. Roeder and C. S. Pinder, Preprints, 16th Conf. Weather Anal. Forecasting, American Meteorological Society, Boston, 1998, pp. 475–477. 89. S. Heckman, Proc. 11th Inter. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 719–721. 90. A. A. Few and T. L. Teer, J. Geophys. Res. 79, 5,007–5,011 (1974). 91. R. H. Holzworth, in H. Volland, ed., Handbook of Atmospheric Electrodynamics, vol. 1, CRC Press, Boca Raton, FL, 1995, pp. 235–266. 92. P. R. Krehbiel, in Geophysics Study Committee, ed., The Earth’s Electrical Environment, National Academy Press, Washington, D.C., 1986, pp. 90–113. ¨ 93. R. Muhleisen, in H. Dolezalek and R. Reiter, eds., Electrical Processes in Atmospheres, Dr. Dietrich Steinkopff, Darmstadt, 1977, pp. 467–476.
68. E. M. Bazelyan and Y. P. Raizer, Lightning Physics and Lightning Protection, Institute of Physics, Bristol, 2000.
94. H. Volland, in L. J. Lanzerotti and M. Hill, eds., Atmospheric Electrodynamics, Physics and Chemistry in Space, vol. 11, Springer-Verlag, Berlin, 1984.
69. R. L. Gardner, ed., Lightning Electromagnetics, Hemisphere, NY, 1990.
95. C. T. R. Wilson, Philos. Trans. R. Soc. London, Ser. A 221, 73–115 (1920).
LIGHTNING LOCATORS 96. R. G. Roble and I. Tzur, in Geophysics Study Committee, ed., The Earth’s Electrical Environment, National Academy Press, Washington, D.C., 1986, pp. 206–231. 97. T. C. Marshall and N. Stolzenburg, J. Geophys. Res. 106D, 4,757–4,768 (2001). 98. E. R. Williams, J. Geophys. Res. 94D, 13,151–13,167 (1989). 99. M. Stolzenburg, W. D. Rust, and T. C. Marshall, J. Geophys. Res. 103D, 14,097–14,108 (1998). 100. P. R. Krehbiel, M. Brook, and R. A. McCrory, J. Geophys. Res. 84C, 2,432–2,456 (1979). 101. D. E. Proctor, J. Geophys. Res. 96D, 5,099–5,112 (1991).
126. P. Richard, A. Delannoy, G. Labaune, and P. Laroche, J. Geophys. Res. 91D, 1,248–1,260 (1986). 127. V. A. Rakov and M. A. Uman, J. Geophys. Res. 95D, 5,447–5,453 (1990). 128. M. Ishii, K. Shimizu, J. Hojo, and K. Shinjo, Proc. 24th Int. Lightning Protection Conf., Birmhingham, UK, 1998, vol. 1, pp. 11–16. 129. D. M. Mach and W. D. Rust, J. Geophys. Res. 98D, 2,635–2,638 (1993). 130. V. P. Idone and R. E. Orville, J. Geophys. Res. 87C, 4,903–4,915 (1982).
102. Thunderstorm Technology, NASA/KSC Launch Pad Lightning Warning System web page, Huntsville, AL, 1997, http://www.tstorm.com/lplws.html.
131. K. Berger, R. B. Anderson, and H. Kroniger, Electra 80, 23–37 (1975).
103. S. A. Pierce, J. Appl. Meteorol. 9, 194–195 (1970).
133. V. P. Idone et al., J. Geophys. Res. 98D, 18,323–18,332 (1993).
104. S. A. Prentice and D. MacKerras, J. Appl. Meteorol. 16, 545–550 (1977). 105. I. I. Kononov, I. A. Petrenko, and I. E. Yusupov, Proc. 25th Int. Lightning Protection Conf., Rhodos, Greece, 2000, 7 pp.; Also published at http://thunderstorm.newmail.ru/publications/iclp2000/ICLP2000.htm. 106. G. N. Oetzel and E. T. Pierce, in S. C. Coroniti and J. Hughes, eds., Planetary Electrodynamics, Gordon and Breach, NY, 1969, pp. 543–571. 107. P. R. Kriebel et al., EOS 81, 21, 22, 25 (2000). 108. X. M. Shao and P. R. Krehbiel, J. Geophys. Res. 101D, 26,641–26,668 (1996). 109. V. Mazur, X. M. Shao, and P. R. Krehbiel, J. Geophys. Res. 103D, 19,811–19,822 (1998). 110. M. A. Uman et al., J. Geophys. Res. 80, 373–376 (1975). 111. E. P. Krider, in Geophysics Study Committee, ed., The Earth’s Electrical Environment, National Academy Press, Washington, D.C., 1986, pp. 30–40. 112. M. A. Uman et al., Radio Sci. 11, 985–990 (1976). 113. C. Lucas and R. E. Orville, Mon. Weather Rev. 124, 2,077–2,082 (1996). 114. O. Pinto Jr., et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 62–64. 115. J. C. Willett, J. C. Bailey, and E. P. Krider, J. Geophys. Res. 94D, 16,255–16,267 (1989). 116. M. Ishii and J. -I. Hojo, J. Geophys. Res. 94D, 13,267–13,274 (1989). 117. M. Le Boulch, J. Hamelin, and C. Weidman, in R. L. Gardner, ed., Lightning Electromagnetics, Hemisphere, NY, 1990, pp. 211–255.
132. R. E. Orville, J. Geophys. Res. 96D, 17,135–17,142 (1991).
134. G. I. Serhan, M. A. Uman, D. G. Childers, and Y. T. Lin, Radio Sci. 15, 1,089–1,094 (1980). 135. C. D. Weidman and E. P. Krider, Radio Sci. 21, 964–970 (1986). 136. J. E. Nanevicz, E. F. Vance, and J. M. Hamm, in R. L. Gardner, ed., Lightning Electromagnetics, Hemisphere, NY, 1990, pp. 191–210. 137. C. T. Rhodes et al., J. Geophys. Res. 99D, 13,059–13,082 (1994). 138. D. Levine and E. P. Krider, Geophys. Res. Lett. 4, 13–16 (1977). 139. E. M. Thomson et al., J. Geophys. Res. 89D, 4,910–4,916 (1984). 140. V. A. Rakov and M. A. Uman, J. Geophys. Res. 95D, 5,455–5,470 (1990). 141. E. M. Thomson, J. Geophys. Res. 85C, 1,050–1,056 (1980). 142. K. Berger, in R. H. Golde, ed., Lightning, vol. 1, Academic Press, NY, 1977, pp. 119–190. 143. V. Mazur and L. Ruhnke, 12,913–12,930 (1993).
J. Geophys. Res. 98D,
144. Y. T. Lin et al., J. Geophys. Res. 84C, 6,307–6,324 (1979). 145. W. D. Rust, W. L. Taylor, D. R. MacGorman, and R. T. Arnold, Bull. Amer. Meteorol. Soc. 62, 1,286–1,293 (1981). 146. W. D. Rust, W. L. Taylor, D. R. MacGorman, and R. T. Arnold, Geophys. Res. Lett. 8, 791–794 (1981). 147. N. D. Murray, R. E. Orville, and G. R. Huffines, Geophys. Res. Lett. 27, 2,249–2,252 (2000). 148. T. B. McDonald, M. A. Uman, J. A. Tiller, and W. H. Beasley, J. Geophys. Res. 84C, 1,727–1,734 (1979).
118. Y. Villanueva, V. A. Rakov, and M. A. Uman, J. Geophys. Res. 99D, 14,353–14,360 (1994).
149. M. J. Master, M. A. Uman, W. H. Beasley, and M. Darveniza, IEEE Trans. PAS PAS-103, 2,519–2,529 (1984).
119. D. M. LeVine, J. Geophys. Res. 85C, 4,091–4,905 (1980). 120. D. A. Smith et al., J. Geophys. Res. 104D, 4,189–4,212 (1999).
150. M. Füllekrug, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 709–711.
121. J. R. Bils, E. M. Thomson, M. A. Uman, and D. Mackerras, J. Geophys. Res. 93D, 15,933–15,940 (1988).
151. J. Preta, M. A. Uman, and D. G. Childers, Radio Sci. 20, 143–145 (1985).
122. R. S. Zuelsdorf et al., Geophys. Res. Lett. 25, 481–484 (1998).
152. R. M. Passi and R. E. López, J. Geophys. Res. 94D, 13,319–13,328 (1989).
123. T. Ogawa, in H. Volland, ed., Handbook of Atmospheric Electrodynamics, vol. 1, CRC Press, Boca Raton, FL, 1995, pp. 93–136. 124. M. Brook and N. Kitagawa, J. Geophys. Res. 69, 2,431–2,434 (1964). 125. C. D. Weidman, E. P. Krider, and M. A. Uman, Geophys. Res. Lett. 8, 931–934 (1981).
153. W. P. Roeder et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 238–241. 154. J. M. Cook, Preprints, 9th Symp. Meteorol. Obs. Instrum., American Meteorological Society, Boston, 1995, pp. 110–112.
155. ASOS Automated Surface Observing System web page, National Weather Service, NOAA, Silver Spring, Maryland, 2000, http://www.nws.noaa.gov/asos/. 156. D. S. Wilks, Statistical Methods in the Atmospheric Sciences, Academic Press, San Diego, California, 1995. 157. Local Area Products web page, Global Atmospherics, Inc., Tucson, Arizona, 2000, http://www.glatmos.com/products /local/localarea.html. 158. ALDARS: Automated Lightning Detection And Reporting System web page, National Weather Service, NOAA, Pueblo, Colorado, 2000, http://www.crh.noaa.gov/pub/ltg /Aldars.html. 159. K. A. Kraus, T. A. Seliga, and J. R. Kranz, Preprints, 16th Int. Conf. Interactive Inf. Process. Syst. (IIPS) Meteorol. Oceanic Hydrol., American Meteorological Society, Boston, 2000, pp. 106–109. 160. ASOS Version 2.60 Installation Status web page, National Weather Service, NOAA, Silver Spring, Maryland, 2000, http://www.nws.noaa.gov/aomc/vers260.htm. 161. H. W. Shirer and H. N. Shirer, Great Plains-1 Lightning Locator web page, Penn State University, University Park, Pennsylvania, 1998, http://bub2.met.psu.edu/default.htm. ¨ 162. T. Shutte, O. Salka, and S. Israelsson, J. Climatology Appl. Meteorol. 26, 457–463 (1987). 163. H. J. Christian, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 715–718. 164. S. J. Goodman and H. J. Christian, in R. J. Gurney, J. L. Foster, and C. L. Parkinson, eds., Atlas of Satellite Observations Related to Global Change, Cambridge University Press, NY, 1993, pp. 191–219. 165. B. N. Turman, J. Geophys. Res. 83C, 5,019–5,024 (1978). 166. Operational Linescan System (OLS) web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/ols/. 167. Space Research and Observations web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/research.html. 168. H. J. Christian et al., Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 726–729. 169. Optical Transient Detector (OTD) web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/otd/. 170. L. W. Thomson and E. P. Krider, J. Atmos. Sci. 39, 2,051– 2,065 (1982). 171. H. J. Christian, R. J. Blakeslee, S. J. Goodman, and D. M. Mach, Algorithm Theoretical Basis Document (ATBD) for the Lightning Imaging Sensor (LIS), Marshall Space Flight Center, NASA, Huntsville, Alabama, 2000, http://eospso.gsfc.nasa.gov/atbd/listables.html. 172. OTD, LIS, and LDAR Data Browse Gallery web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 2000, http://thunder.msfc.nasa.gov/data/. 173. K. Driscoll, White Paper on Lightning Detection from Space, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/bookshelf/docs/white− paper− driscoll.html. 174. E. Williams et al., Atmos. Res. 51, 245–265 (1999). 175. Lightning Imaging Sensor (LIS) web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/lis/.
176. T. Ushio et al., Proc. 11th Inter. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999209261, Guntersville, AL, 1999, pp. 738–741. 177. D. J. Boccippio, S. J. Goodman, and S. Heckman, J. Appl. Meteo. 39, 2,231–2,248 (2000). 178. S. J. Goodman et al., Geophys. Res. Lett. 27, 541–544 (2000). 179. LIS Ground Truth Browse Calendar web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 2000, http://thunder.msfc.nasa.gov/lightning-cgi-bin/lisgt/ lisgt− ref.pl. 180. Lightning Mapping Sensor (LMS) web page, Marshall Space Flight Center, NASA, Huntsville, Alabama, 1999, http://thunder.msfc.nasa.gov/lms/. 181. D. M. Mach, D. R. MacGorman, W. D. Rust, and R. T. Arnold, J. Atmos. Oceanic Technol. 3, 67–74 (1986). 182. R. E. Orville, R. W. Henderson, and R. B. Pyle, Preprints, 16th Conf. Severe Local Storms Conf. Atmos. Electr., American Meteorological Society, Boston, 1990, pp. J27–J30. 183. W. A. Lyons, R. B. Bent, and W. F. Highlands, Preprints, Int. Conf. Interactive Inf. Process. Sys. Meteorol. Oceanic Hydrol., American Meteorological Society, Boston, 1985, pp. 320–327. 184. IMPACT ESP sensor web page, Global Atmospherics, Inc., Tucson, Arizona, 2000, http://www.glatmos.com/products /wide/impactESP.html. 185. V. P. Idone et al., J. Geophys. Res. 103D, 9,045–9,055 (1998). 186. V. P. Idone et al., J. Geophys. Res. 103D, 9,057–9,069 (1998). 187. M. A. Uman, J. Geophys. Res. 90D, 6,121–6,130 (1985). 188. M. Ference, H. B. Lemon, and R. J. Stephenson, Analytical Experimental Physics, 2nd rev. ed., University of Chicago Press, Chicago, 1956. 189. J. C. Willett, E. P. Krider, and C. Leteinturier, J. Geophys. Res. 103D, 9,027–9,034 (1998). 190. R. E. Orville, Mon. Weather Rev. 119, 573–577 (1991). 191. Lightning Climatology web page, in Operational Uses of Lightning Data, AWIPS Informational Series, National Weather Service Training Center, NOAA, Silver Spring, Maryland, http://www.nwstc.noaa.gov/d.HMD/Lightning/ climo.htm. 192. G. N. Oetzel and E. T. Pierce, Radio Sci. 4, 199–201 (1969). 193. H. R. Arnold and E. T. Pierce, Radio Sci. J. Res. 68D, 771–776 (1964). 194. K. L. Zonge and W. H. Evans, J. Geophys. Res. 71, 1,519–1,523 (1966). 195. E. T. Pierce, in F. Horner, ed., Radio Noise of Terrestrial Origin, Elsevier, Amsterdam, 1962, pp. 55–71. 196. T. O. Britt, C. L. Lennon, and L. M. Maier, Lightning Detection and Ranging System, NASA Technical Briefs Online, KSC-11785, Associated Business Publications, New York, New York, 1998; http://www.nasatech.com/Briefs/Apr98 /KSC11785.html. 197. G. S. Forbes and S. G. Hoffert, in E. R. Hosler and G. Buckingham, eds., 1993 Research Reports, NASA/ASEE Summer Faculty Fellowship Program, National Aeronautics and Space Administration, CR-199891, 1995, pp. 195–224. 198. D. E. Proctor, J. Geophys. Res. 76, 1,478–1,489 (1971). 199. D. E. Proctor, J. Geophys. Res. 86C, 4,041–4,071 (1981). 200. R. C. Murty and W. D. MacClement, J. Appl. Meteorol. 12, 1,401–1,405 (1973).
LIQUID CRYSTAL DISPLAY TECHNOLOGY 201. W. L. Taylor, J. Geophys. Res. 83, 3,575–3,583 (1978). 202. D. E. Proctor, R. Uytenbogaardt, and B. M. Meredith, J. Geophys. Res. 93D, 12,683–12,727 (1988). 203. W. D. MacClement and R. C. Murty, J. Appl. Meteorol. 17, 786–795 (1978). 204. GHRC User Services, Lightning Detection and Ranging (LDAR) Dataset Summary, Global Hydrology and Climate Center, Huntsville, Alabama, 2000; http://ghrc.msfc.nasa. gov/uso/readme/ldar.html. 205. L. M. Maier, E. P. Krider, and M. W. Maier, Mon. Weather Rev. 112, 1,134–1,140 (1984). 206. D. J. Boccippio, S. Heckman, and S. J. Goodman, Proc. 11th Inter. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 254–257. 207. C. J. Neumann, J. Applied Meteorol. 10, 921–936 (1971). 208. P. Richard and G. Auffray, Radio Sci. 20, 171–192 (1985). 209. C. O. Hayenga, J. Geophys. Res. 89D, 1,403–1,410 (1984). 210. V. Mazur, P. R. Krehbiel, and X. -M. Shao, J. Geophys. Res. 100D, 25,731–25,753 (1995). 211. X. -M. Shao, P. R. Krehbiel, R. J. Thomas, and W. Rison, J. Geophys. Res. 100D, 2,749–2,783 (1995). 212. J. W. Warwick, C. O. Hayenga, and J. W. Brosnahan, J. Geophys. Res. 84C, 2,457–2,463 (1979). 213. C. Rhodes and P. R. Krehbiel, Geophys. Res. Lett. 16, 1,169–1,172 (1989). 214. E. Defer, C. Thery, P. Blanchet, and P. Laroche, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 14–17. 215. X. -M. Shao, D. N. Holden, and C. T. Rhodes, Geophys. Res. Lett. 23, 1,917–1,920 (1996). ´ V. Cooray, T. G¨otschl, and V. Scuka, Proc. 11th 216. A. Galvan, Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 162–165. 217. T. J. Tuomi, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 196–199. 218. S. Soula, G. Molini´e, S. Chauzy, and N. Simond, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 384–387. 219. Y. Yair, O. Altaratz, and Z. Levin, Proc. 11th Int. Conf. Atmos. Electr., National Aeronautics and Space Administration, CP-1999-209261, Guntersville, AL, 1999, pp. 460–463.
LIQUID CRYSTAL DISPLAY TECHNOLOGY GREGORY P. CRAWFORD MICHAEL J. ESCUTI Brown University Providence, RI
INTRODUCTION As the display in most imaging systems is the final medium through which an image is rendered for manipulation and verification, an understanding of display technologies is essential to the imaging process. Because individuals
working in the field of imaging science and technology may spend more time looking at a display screen than at anything else in their office or laboratory, it is imperative that it be comfortable to use and appropriate for the particular context. Twenty years ago, system manufacturers often integrated the electronic display directly into the system to provide a complete package for a specified imaging application. Although this approach does not afford much flexibility, the display could be closely matched to a specific application because user requirements were well defined. This custom design approach enabled optimizing the graphics controller, system software, and user interface for the display, user, and application requirements. Despite the positive attributes of this ‘‘black-box’’ approach, such as high performance and superior application-specific image quality, closed-architecture platforms tend to be more expensive and suffer from incompatibility with peripheral add-ons and software packages not supported by the system manufacturer. Today, the situation is dramatically different due to the continual evolution of the graphics controller interface. By mixing images with text and graphics, software developers require more from the display to support moving images without diminishing display performance for static images. The graphical capability of today's standard computer platforms has now made it unprofitable for vendors of imaging systems to develop their own displays for system-specific tasks. End users now typically purchase a computer platform, display, and a variety of other peripherals from multiple vendors and integrate them with ease (i.e., a plug-and-play philosophy). In such a marketplace, one must be well educated to match display technology to application needs. This article provides the reader with a fundamental knowledge of working principles of liquid crystal displays (LCDs), their capabilities, and their limitations.
ADDRESSING DISPLAYS Before we delve into the operation of a LCD, it is important to understand how these displays are addressed and their impact on resolution, refresh rates, and image fidelity. Many treatises begin with material and device configurations, but we will first develop a basic understanding of electrical addressing schemes that apply to all LCDs. Our hope is that the reader will be better prepared to recognize the capabilities and limitations of the various display configurations presented afterward. A LCD with high-information content (e.g., computer or television screen) consists of a two-dimensional array of pixels, where a pixel is defined as the smallest switching element of the array. If the two-dimensional array has a total of N rows and M columns (N × M pixels), then in principle, there can be N × M electrical connections to control each pixel independently. This is known as direct addressing and is practical only for very low-resolution displays. For medium and higher resolution displays,
addressing is accomplished through passive- and active-matrix techniques. Both of these approaches require only N + M electrical connections, thereby greatly simplifying the electronics and making higher resolution possible. Luminance–voltage plots for three hypothetical displays are depicted in Fig. 1. This display characteristic ultimately dictates the type of addressing that can be used to create images by using a LCD. Luminance is the physical measure of brightness of a display or any surface and most commonly has units of candelas per square meter (cd/m²), nits, or footlamberts (fL). The two measurable quantities from the luminance–voltage curve that have the greatest impact on display addressing are the threshold voltage VTH (the voltage at which the luminance begins to increase) and a parameter Δ (the additional voltage beyond VTH needed to cause the display to approach or reach its highest luminance). If a liquid crystal (LC) material does not start to respond to an electronic stimulus until it has reached a well-defined voltage, then it is said to have a threshold; otherwise, if the display material responds to all voltages, then it is said to be thresholdless (1). For simple direct addressing schemes, like the seven-segment digit electrodes shown in Fig. 2, the threshold(less) nature of the material is irrelevant because the segmented electrodes (or pixels) are independent of each other. The appropriate combinations of segments are addressed by dedicated logic circuitry (i.e., every pixel is independently driven by its own external voltage source), and the screen refresh time is only as long as that needed for a single pixel to switch. Direct addressing is practical only for low-resolution displays (<50 pixels). In a passive addressing scheme (also known as multiplexing), one substrate has row electrodes, and the other substrate has column electrodes, as shown in Fig. 3.
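The saving in interconnect is easy to quantify; the short Python sketch below simply compares the two lead counts for a few resolutions, which are chosen here only as examples.

# Lead counts for direct addressing (one connection per pixel) versus
# matrix addressing (shared row and column electrodes).
for name, rows, cols in [("seven-segment digit", 1, 7),
                         ("VGA panel", 480, 640),
                         ("XGA panel", 768, 1024)]:
    print(f"{name:>19}: direct {rows * cols:>7,} leads, matrix {rows + cols:>5,} leads")

For a handful of segments, direct wiring is actually the simpler choice, which is why it survives in very low-resolution displays, but the N + M count grows far more slowly than N × M as resolution increases.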
Figure 2. An example of direct addressing, where the segments are independently driven to create low-resolution information.
Every pixel is uniquely determined by the region of overlap of a row and a column electrode, making it possible to access N × M pixels with only N + M connections. A display that has a passive matrix is driven one line at a time; that is, one row is selected for addressing, and all of the columns are addressed using voltages associated with the image for that row. At some time interval later, the next row receives an appropriate voltage pulse, and the columns again are addressed using the information required for that row. The net result is that a pixel is influenced sufficiently to produce an optical effect only when the time-averaged voltage [called the root-mean-square (rms) voltage] across the row and column electrodes is beyond the threshold (VON ≥ VTH). All of the rows not being updated are driven by row voltages that will not affect the image information already present. The catch is that there will always be a voltage on the nonselected rows, and therefore that voltage must satisfy the relationship VOFF ≤ VTH so as not to induce an optical change. Therefore, a well-defined threshold in the luminance–voltage characteristic of the LCD material is necessary to prevent the nonselected rows from being addressed (i.e., cross talk). LC materials typically exhibit threshold behavior, which allows many displays to be multiplexed; display materials that are thresholdless, such as electrophoretics (2) and gyricon (3), cannot. An expression can be derived that completely specifies the maximum number of rows NMAX that can be addressed in terms of the voltage threshold and the parameter Δ, a measure of the nonlinearity of the LC material:
Δ/VTH ≤ 1/√NMAX .    (1)

Figure 1. Luminance–voltage graph depicting the difference between materials that have well-defined thresholds and those materials that have no threshold (thresholdless). VTH is defined as the voltage at which the luminance begins to increase, and the parameter Δ is defined as the additional voltage beyond VTH needed to cause the display to approach or reach its highest luminance.
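Equation (1) is easy to evaluate numerically. The Python sketch below uses the representative (Δ, VTH) pairs quoted later in this article for TN-like and STN-like materials; the specific values are illustrative.

# Maximum number of multiplexed rows from Eq. (1): NMAX <= (VTH / delta)**2
def n_max(delta, v_th):
    return int((v_th / delta) ** 2)

print(n_max(0.7, 1.8))   # TN-like material: ~6 rows
print(n_max(0.1, 2.0))   # STN-like material: ~400 rows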
To maximize the number of addressable rows for higher resolution, one usually uses a material that has a very nonlinear and steep luminance–voltage response (small Δ), rather than a material that has a large VTH (which increases power consumption). Another important equation relates the rms voltages of the ON-state, VON, and the OFF-state, VOFF, to the maximum number of rows (for NMAX ≫ 1):

VON/VOFF ≅ 1 + 1/√NMAX ,    (2)
where VON/VOFF is referred to as the selection ratio. When NMAX is large, this ratio approaches unity, leading to poor contrast because the difference between the luminance in the ON- and OFF-states would decrease. Therefore, Eq. (2) dictates the optical contrast ratio (CR) of the display, which is defined as the ratio of the luminance in the ON-state to the luminance in the OFF-state (CR = LON/LOFF). Equations (1) and (2) were first derived by Alt and Pleshko in 1974 and remain the fundamental expressions governing multiplexing (4). To increase resolution using passive addressing techniques, a technique known as ''dual'' scan can be implemented. In the dual-scan approach, two column drivers are used to address the upper N/2 rows and lower N/2 rows. Examination of Eq. (1) shows that the dual-scan approach improves the ratio Δ/VTH by a factor of √2 (5). Although some LC configurations that use the multiplexing technique have reasonable contrast for large values of N, other limitations enter into consideration, such as response time (the time taken for a pixel to be switched to the fully ON-state plus the time needed to relax to a completely OFF-state), viewing angle (the maximum polar angle for which a display maintains reasonable contrast and minimal color degradation), and gray-scale issues (those that involve the operation of a pixel at a luminance which is intermediate between the ON- and OFF-states). The frame rate in a passively addressed display is severely limited by the time it takes for a single row to switch, as well as by the number of rows in the display, because the frame time must be at least as long as the product of the two. With only a few exceptions, displays that use passive addressing will not support full video frame rates. However, this problem can be essentially eliminated using an active-matrix addressing scheme, where a nonlinear element is used for pixel isolation. Active-matrix addressing is recognized by the display industry as the ultimate solution for high fidelity, high
Figure 3. A simple example of multiplexing on an N-row matrix showing the row addressing waveforms, column addressing waveforms, and the corresponding pixel waveforms. This type of multiplexing, known as amplitude modulation, varies the amplitude of the pixel waveform to achieve the desired image. The normally black pixel is ‘‘ON’’ if the rms average pixel voltage exceeds VTH . See color insert.
information content, full color, and significant grayscale applications. Additionally, this addressing approach can be used with thresholdless and large-Δ materials because a discrete nonlinear switch is integrated into each pixel structure. An active-matrix LCD incorporates a two-dimensional circuit array (or matrix) to provide the electrical addressing of individual pixels (6). This matrix incorporates an active device in each pixel, usually a thin-film transistor (TFT), positioned at one of the corners. Almost all LCD pixels are essentially dielectric capacitors whose leakage is minimal once a charge is placed on the electrodes through the transistor. Due to the electrical isolation afforded by the transistor, the voltage on one pixel remains constant while other pixel elements are subsequently addressed; therefore, the Alt–Pleshko limitation expressed by Eq. (2) does not constrain the contrast ratio, as it does in passive-addressing schemes. A schematic diagram of an active-matrix circuit is shown in Fig. 4, where each pixel element is defined by the overlap of row and column bus lines. The circuit diagram shows that each pixel has one TFT and a LC capacitor formed between a top conducting surface [typically indium tin oxide (ITO)] and the active-matrix substrate. This approach is significantly more complex than passive addressing, as can be seen from the cross section of an active-matrix TFT display also shown in Fig. 4. The intricate underpinnings of active-matrix addressing and the complex processing required to create one are beyond our scope, but the basic operation can be understood as follows. The display is addressed one line at a time, as in passive addressing. When a row (called the scan or gate line) is addressed, a positive voltage pulse of duration T/N (where N is again the number of rows and T is the frame time) is applied to the line and turns on all transistors in the row. The transistors act as switches that transfer electrical charges to the LC cells from the columns (called data or source lines). When subsequent
Figure 4. The driving circuit of an active-matrix thin-film transistor next to the cross section of an active-matrix pixel. As is apparent from the figure, the pixel structure becomes substantially more complex when active-matrix schemes are used. See color insert.
rows are addressed, a negative voltage is applied to the gate line that turns off all of the transistors in the row and holds the electrical charges in the LC capacitors for one frame until the line is addressed again. Alternating row-select voltages are required for most LCD materials, and the polarity of the data voltage is usually switched in alternate frames. The refresh rate in this addressing scheme is not limited by the number of rows but only by the response time of the LC. A nonlinear pixel element can be implemented by a variety of approaches, but we will mention only the two most prominent active-matrix types: amorphous silicon (α-Si) and polycrystalline silicon (poly-Si). Both of these involve complex fabrication of thin films of silicon structures (<300 nm) on glass or quartz substrates. The details of active-matrix development are sufficiently complex to be far beyond the scope of this review, but we include a simple introduction to familiarize the reader with the basics. Pioneering work on the active matrix began in the early 1970s (7), and the now conventional α-Si approach was first proposed in 1979 (8), where the deposited thin film of silicon has a small grain structure and is randomly configured (9). In this approach, inexpensive glass substrates are typically processed at ∼300 °C, using well-established processes, including sputtering, photolithography (typically 5–8 photomasks), plasma-enhanced chemical vapor deposition (PECVD), wet chemical etching, and reactive ion etching (REI). The resulting electron mobility is low (∼0.5 cm2/V·s), but it is more than adequate to form a useful switch for the pixel (9). However, there are two prominent limitations. First, the aperture ratio (the transmissive area of the pixel divided by the total pixel area) of the display is decreased to 50–75% due to the use of an absorbing light shield to overcome the photoconductivity of the α-Si TFT. Second, because the supporting row- and column-drivers must be mounted on additional IC chips bonded to the display substrate, the connections
(>4,000) increase fabrication complexity and decrease reliability. Nonetheless, α-Si active-matrix displays have been made in all sizes and are most popular as laptop monitors and other large-area LCDs. Current research issues include reducing the number of photomasks (10), increasing display resolution (11), and increasing the aperture ratio (12). In contrast, a poly-Si substrate can yield TFTs whose electron mobility is much higher (∼440 cm2/V·s), but it must be processed at significantly higher temperatures (6). Fabrication typically involves the same α-Si process described before on a more expensive quartz substrate. Additional processing at the higher temperatures leads to recrystallization in a furnace or by laser annealing (13,14). An intriguing alternative uses a laser ablation/annealing approach where CMOS-TFTs are fabricated at high temperatures using a quartz substrate and are subsequently transferred to flexible plastic substrates without any noticeable deterioration in poly-Si performance (15). In all approaches, the silicon grain becomes larger and more uniform and allows electrons to flow much more freely. The greatest benefits of these poly-Si substrates are the ability to fabricate row- and column-drivers directly on the periphery of the glass substrates and the reduction of the TFT size to ∼5 × 5 µm. Additionally, the aperture ratio can be made substantially higher. Disadvantages include the high process temperature, increased fabrication complexity (higher accuracy required in photolithography and ion implantation), and the higher off-leakage current. Integration of driver electronics onto the substrate, increased display brightness, lower power consumption, and the ability to form smaller pixels at higher densities (>200 dpi) make these poly-Si displays particularly useful for microdisplays and medium-size displays. Display addressing directly impacts resolution and optical performance. Because most imaging applications require high resolution, active-matrix addressing is the most prominent addressing approach in the imaging field. Because of the complexity of the substrate, active-matrix
addressing always involves more cost, but it enables operation in the high-resolution regime. A rule of thumb for LCD technology is that passive addressing can achieve NMAX ≤ 400, whereas active addressing can achieve NMAX ≥ 1,000.

PROPERTIES OF LIQUID CRYSTAL MATERIALS

To understand the nuts and bolts of a LCD, it is worthwhile to review briefly the material properties that make LCs functional for display applications. The liquid crystal phase is a state of matter that is intermediate between a solid crystal and an isotropic liquid; in fact, it has properties of both. The geometry of a LC molecule is highly anisotropic; these molecules can be four to six times longer than they are wide and are often modeled as rigid rods (16). The nematic phase is the simplest LC phase and is the most used in commercial display applications. This phase possesses only orientational order along the long axes of the elongated molecules, but no positional or bond orientational order, as in a conventional crystal. We discuss the ferroelectric smectic C∗ liquid crystal phase later, so we will limit our discussion here to the nematic phase. The elastic properties of LCs are their most characteristic feature. At the display level, elastic theory is used to predict stable configurations and electric field-induced elastic deformations of the material that are responsible for the image. The elastic theory expression is often written in the following form:

f = ½ {K11 (∇ · n)² + K22 (n · ∇ × n)² + K33 (n × ∇ × n)² − ε0 Δε (E · n)²}.    (3)
Here, f is the free energy density; n is the nematic director; E is the applied electric field; K11, K22, and K33 are known as the splay, twist, and bend elastic constants, respectively; and Δε is the dielectric anisotropy. The nematic director is denoted as n and represents the average orientational symmetry axis of an ensemble of LC molecules. The elastic constants are typically of the order of 10−11 N, the dielectric anisotropy Δε is typically ∼5–15 for most display materials, and E is typically <1 V/µm for conventional transmissive LCD technology. Display performance parameters can be predicted by minimizing Eq. (3) with respect to the boundary conditions imposed by the surfaces of the display substrates to obtain the configuration that has the lowest free energy. A number of mathematical representations can be used to calculate the device properties of LCDs (17). Equation (3) is used to predict the details of the field-aligned configuration of the LC material and its threshold voltage. The switching dynamics can be determined by balancing the elastic and electric field torques derived from Eq. (3) against the viscous torque of the material, −γ1(dθ/dt), where γ1 is the rotational viscosity. The rotational viscosity γ1 is typically 1–2 poise for display materials. The switching time is highly dependent on the choice of LC material and configuration but is typically in the 1–30 ms range. The reader is referred to the following references to explore basic LC properties further (18–30).
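The article quotes the elastic constants, the rotational viscosity, and the typical switching times but does not write out the estimate that connects them. A commonly used order-of-magnitude expression for the field-off relaxation of a nematic layer is τ ≈ γ1 d²/(π²K); the Python sketch below evaluates it with representative values (the 5-µm cell gap is the one quoted later for TN cells) as a rough consistency check, not a design formula.

import math

gamma1 = 0.1     # rotational viscosity, Pa·s (1 poise, lower end of the 1–2 poise range)
d = 5e-6         # cell gap, m (typical TN value quoted in the text)
K = 1e-11        # representative elastic constant, N

tau = gamma1 * d**2 / (math.pi**2 * K)   # field-off relaxation time estimate
print(round(tau * 1e3, 1), "ms")         # ~25 ms, consistent with the quoted 1–30 ms range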
The techniques involved in transforming the configurations derived from elastic theory into the optical properties of LCDs have been treated extensively and will only be summarized here. In addition to their geometrical anisotropy, nematic LCs used in displays also possess optical anisotropy, or birefringence, that is responsible for the optical changes in contrast when one configuration is electrically switched to another configuration. The birefringence Δn for LCs is in the range of 0.05–0.3; Δn ∼ 0.1 is often used for display applications because it leads to a high-contrast design known as the first minimum or maximum (see later). The 2 × 2 Jones matrix technique (31) can be used to calculate the transmission of light through a LCD at normal incidence but neglects Fresnel diffraction and multiple reflection effects in thin layers. The more sophisticated 4 × 4 matrix method (32) represents a complete solution of Maxwell's equations and is used for the general case of light transmission through a LCD at any angle. Nematic LC configurations pertinent to displays are most often modeled by this latter method by the display community (33).

LIQUID CRYSTAL DISPLAY CONFIGURATIONS

Having presented the basics of display addressing and the basic properties of nematic LCs, we can now introduce the various configurations relevant to displays. It would be impossible to describe, even briefly, every type of current LCD configuration, so we will discuss the operational principles of a few that are most relevant to imaging. Broadly speaking, LCDs can be classified as transmissive or reflective. A transmissive display is a light shutter that modulates a powerful backlight. Reflective displays take advantage of ambient lighting to reflect light back to the viewer effectively and therefore do not require a backlight. Transflective displays operate in both modes. Removing the backlight can result in significant power saving and therefore longer battery life. These are largely being developed for portable applications rather than high-resolution imaging. Therefore, we will focus on transmissive displays and end by presenting a promising low-power reflective display technology.

Twisted Nematic (TN) Liquid Crystal Displays

The most commercially successful LCD configuration, known as the twisted nematic (TN), uses crossed polarizers and a molecular orientation in which the long axes of the molecules twist through a 90° angle between the orienting substrates. A cross section of a TN-LCD, presented in Fig. 5, includes two glass substrates that have polarizers laminated on the outer surfaces and an ITO conducting layer on the inner surfaces covered by a polymer (e.g., polyimide). The polyimide layer is mechanically rubbed with a cloth to create microgrooves on the surface that uniformly align the long axis of the LC molecules at the surface (34). The rub direction of each substrate is parallel to the transmission axis of its laminated polarizer, and the two rub directions are perpendicular to each other. The surface alignment mechanisms introduced by the polyimide then result in a LC layer that
Figure 5. The operation of the twisted nematic (TN) configuration, which is currently the most common LCD mode. The center illustration shows the optical stack of all of the components of the TN display cell. The adjacent insets show the field-on and field-off states of the normally white (NW) configuration. Note that the cross-section is not drawn to scale. See color insert.
has a 90° twist sandwiched between crossed polarizers. A 5-µm cell gap is used for most TN displays. In the figure, the extraordinary axis (long axis) of the LC molecules at the surfaces is parallel to the transmission axes of the polarizers (called the e-mode). However, the o-mode, when the ordinary axis is parallel instead, is also used. These two configurations lead to slightly different transmission and viewing-angle characteristics (35). Finally, placing a red, green, and blue color filter array between the top substrate and ITO layer creates a full color display, known as spatial color synthesis. The TN principle of operation is as follows. After the first polarizer, linearly polarized light will follow the LC twist and waveguide to an orthogonal linear polarization state. This process, sometimes called adiabatic following, enables light to escape through the second polarizer. This is valid if the twist angle (in this case π/2) is much smaller than the retardation of the nematic (2πΔnd/λ), known as the Mauguin condition. Using common LC values, birefringence Δn = 0.10, cell thickness d = 5 µm, and wavelength λ = 510 nm, this ratio is 1 : ∼4. If this limit is not satisfied, then the light exiting the LC layer is elliptically polarized, and the output from the top polarizer will be reduced. This configuration is called normally white (NW) and describes the zero-voltage state as the transmissive one. This is most common in today's technology. The polarizers can also be arranged in a parallel configuration, so that the zero-voltage state is black and the voltage-on state is the transmissive one.

The transmission of a NW-mode TN-LCD can be derived by using the Jones formalism (35):

T = ½ − ½ sin²[(π/2)√(1 + u²)]/(1 + u²).    (4)

Here, u = 2dΔn/λ, and it is assumed that the director n at the display substrates is parallel to the transmission axis of the polarizers. The ½ term indicates that the maximum transmission, as shown in Fig. 6, can be ideally 50% of the input light due to the loss in the first polarizer. Note that the maximum transmission occurs when the argument
Figure 6. The transmission of the NW-TN cell in the field-off state as a function of the Mauguin parameter.
of sin²[. . .] is an integral multiple of π, corresponding to u = √3, √15, √35, etc. The first maximum, which occurs at u = √3, corresponds to the maximum brightness and contrast ratio. Optimizing around other maxima is also done (at √15, the second maximum, for example), but this is not nearly as common. Nematic domains can sometimes appear in the TN mode, which degrade the contrast but can be overcome by using a polyimide alignment layer which induces an out-of-plane pretilt (typically in the range of 2–10°). When a voltage is applied to the pixel, an electric field is created perpendicular to the substrates. Because the dielectric anisotropy of the LC is positive (Δε > 0), the molecules tend to align parallel to the field. Equation (5) can be used to predict the actual threshold voltage of the twisted nematic configuration, which is the point where the molecules just begin to realign. The threshold voltage for a TN display is given by the following expression:
VTH = π √{(K11/ε0Δε)[1 + (K33 − 2K22)/(4K11)]} .    (5)
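A short numerical sketch (in Python) of the two TN design relations just given may be useful: it evaluates the transmission of Eq. (4) at its maxima and the threshold of Eq. (5) with the representative material constants quoted in the surrounding text. Treat the specific values as illustrative rather than prescriptive.

import math

def t_nw(u):
    # Eq. (4): field-off transmission of the NW twisted nematic cell
    return 0.5 - 0.5 * math.sin(0.5 * math.pi * math.sqrt(1 + u**2))**2 / (1 + u**2)

for u in (math.sqrt(3), math.sqrt(15), math.sqrt(35)):
    print(round(t_nw(u), 3))   # 0.5 at each maximum, i.e., 50% of the unpolarized backlight

# Eq. (5): TN threshold voltage with the representative constants from the text
eps0, d_eps = 8.854e-12, 10.5
K11, K22, K33 = 1.0e-11, 5.4e-12, 15.9e-12
v_th = math.pi * math.sqrt((K11 / (eps0 * d_eps)) * (1 + (K33 - 2 * K22) / (4 * K11)))
print(round(v_th, 2), "V")     # ~1.1 V, the value quoted below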
Using typical values K11 = 10−11 N, K22 = 5.4 × 10−12 N, K33 = 15.9 × 10−12 N, and Δε = 10.5, then VTH = 1.1 V, a common threshold for most nematic mixtures. Note that this threshold is the voltage where the LC starts to align due to the applied voltage and does not say anything about the director profile. Above VTH, the out-of-plane reorientation breaks the symmetry of the twist; well above threshold, the nematic molecules align perpendicular to the substrates, and light passes through the LC layer without any change in polarization. In the NW configuration, the output polarizer then absorbs the light. Grayscale levels are attainable when intermediate voltages are used. In summary, the TN-LCD simply modulates the intensity of a powerful backlight by acting on the polarization state of the incident light. Notice that the display shown in Fig. 5 is addressed by an active matrix, which is common because the luminance–voltage curve for the twisted nematic LCD is not very steep and is not conducive to multiplexing schemes (36,37); therefore, an active matrix is exclusively used to address the twisted nematic for high-end, high-resolution imaging. For example, if we consider Eq. (1) and substitute typical values for a TN material (Δ = 0.7 and VTH = 1.8 V), the maximum number of rows that can be addressed is approximately six; therefore, only very low resolution displays of this mode are possible using multiplexing schemes. The switching time (typically in the range of 10–30 ms) of the TN configuration is proportional to the viscosity and the square of the cell gap and therefore is very sensitive to the cell gap. Although thinner cells enable faster switching, they often compromise the Mauguin condition, which can reduce brightness and the contrast ratio. The art of display design consists, in large part, of balancing these parameters for maximum benefits.

Supertwisted Nematic (STN) Liquid Crystal Displays

While the TN-TFT display configuration has become the standard in applications that require high pixel densities
and gray-scale resolution, the supertwisted nematic (STN) configuration (38,39) has been very successful in the low- to medium-resolution realm. The reasons for this primarily surround the ability of this display mode to be multiplexed (passive-matrix addressing). As discussed earlier, an N × M-pixel display requires N + M electrical connections using this passive scheme (the same number as for active-matrix addressing) but does not require a TFT matrix. And even though there are some optical performance trade-offs involving contrast and switching times, multiplexing simplifies the manufacturing process tremendously and enables the fabrication of inexpensive medium-resolution displays. In this mode, as in the TN, color is achieved through spatial color synthesis. An expanded view of the basic STN display that has a twist of 270° is shown in Fig. 7. As for the TN, the substrates are coated by a polyimide layer that uses orthogonal rubbing for alignment. Notice that the illustrated twist in the field-off state is greater than the 90° that would be the minimum energy configuration resulting from surface alignment alone; a regular nematic in this configuration would immediately relax to the TN mode. The twist angle in the STN is maintained at >90° using materials with an inherently twisted structure, usually a chiral nematic, which exhibits intrinsic handedness that manifests as a macroscopic twist. This structure is identical to the cholesteric phase described later and can be characterized by the pitch p0 and the sense (left- or right-handedness) of the twist. When an ordinary nematic is doped with a chiral nematic, adjusting the concentration ratio modifies the pitch. The primary electro-optic benefit of the supertwist is an increase in the steepness of the transmission curve (a decrease in Δ). Using typical values for the STN (Δ = 0.1 and VTH = 2.0 V), the maximum number of addressable rows is ∼400; therefore, medium resolution is possible. Because of the larger twist angle, this mode does not meet the Mauguin condition and does not exhibit the waveguiding property of the TN. The transmission equation for this mode is much more complex, but an illustration of midlayer tilt can give some insight. As is apparent from Fig. 8, the reorientation of the midlayer, which is responsible for the optical switching in twist-mode cells, is much more sudden for higher twist angles and leads to a much sharper voltage threshold. Additionally, the two polarizers in the STN mode are generally not crossed or parallel, nor is the actual twist of the LC necessarily a multiple of 90°. These angles are now included as parameters over which optimization for a particular application must be performed (38). The trade-offs of this mode include slightly higher drive voltages, increased sensitivity to the pretilt angle, color leakage in both the on- and off-pixel states, and reduced contrast. Although the analysis is somewhat more involved [the twist term in Eq. (3) becomes K22(n · ∇ × n + 2π/p0)², which now includes the pitch of the twist p0], the voltage threshold can be found through elastic theory considerations (for zero pretilt):
Figure 7. The operation of a supertwisted nematic (STN) configuration. The center illustration shows the optical stack of all of the components of the supertwisted nematic display configuration that is multiplexed. The adjacent insets show the field-on and field-off states of the NW configuration. Note that the cross-section is not drawn to scale. See color insert.
Figure 8. Example of the way midlayer tilt of a LC display changes as a function of twist. The increasingly nonlinear voltage response of twists greater than 90° is the reason that higher degrees of multiplexing are possible in the STN mode. The reduced voltage is defined as the ratio of the drive voltage and the voltage threshold of the 90° twist.
VTH = π √{(K11/ε0Δε)[1 + ((K33 − 2K22)/K11)(Φ/π)² + 4(K22/K11)(d/p0)(Φ/π)]} .    (6)
The extra term arises from the presence of the inherent twisting strength and slightly increases the threshold [note that this reduces to Eq. (5) when p0 → ∞ and Φ → π/2]. Using typical values K11 = 10−11 N, K22 = 5.4 × 10−12 N, K33 = 15.9 × 10−12 N, Δε = 10.5, Φ = 270°, and d/p0 = 0.75, then VTH = 2.2 V, higher than the TN. A result of the larger twist angles is the appearance of striped textures that can occur within pixels and destroy contrast and color performance. These defects, as for the TN domain defects described before, can be overcome by an appropriate pretilt angle, which can range from 4–8°. Two factors negatively impact contrast in a STN. First, as previously mentioned, the smaller Δ parameter of a multiplexed display leads to a lower selection ratio [Eq. (2)]. Second, the lack of adiabatic waveguiding in this mode leads to a moderate sensitivity to wavelength. The chromatic anisotropy of the transmission of a NW cell leads to a yellowish-green appearance, an ON-state that is not really white. Furthermore, the substantial leakage of long and short wavelengths in the NB cell results in dark pixels that are not actually black. However, a solution to this leakage is to use an additional passive STN layer that has a reversed twist (40). The switching times of the STN mode are proportional to the viscosity of the LC and to the square of the cell gap and are longer than those of a comparable TN cell. This is due to the typically larger viscosity and additional chiral twist.
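The d/p0 = 0.75 quoted above follows from choosing the chiral pitch so that the natural twist across the cell gap, 360° × d/p0, matches the imposed twist angle. A small Python sketch; the 6-µm cell gap is an assumed value used only for illustration.

def pitch_for_twist(cell_gap_um, twist_deg):
    # Natural twist across the cell is 360° * d / p0; set it equal to the desired twist.
    d_over_p0 = twist_deg / 360.0
    return cell_gap_um / d_over_p0, d_over_p0

p0, d_over_p0 = pitch_for_twist(cell_gap_um=6.0, twist_deg=270.0)
print(d_over_p0)   # 0.75, the value quoted in the text
print(p0)          # 8.0 µm of chiral pitch required for this (assumed) cell gap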
In-Plane Switching Liquid Crystal Displays

The in-plane switching (IPS) mode (41,42) is another LCD mode that has been increasingly successful in large-area desktop applications due to its inherently wide field of view (>100°). As will be discussed later, both the TN and STN modes are limited in their viewing-angle performance primarily because of out-of-plane reorientation of LC molecules in response to an electric field. In the IPS configuration, this out-of-plane tilt is avoided by using an in-plane electric field, generated by a patterned electrode structure. Spatial color synthesis is also used to generate full color displays. Figure 9 shows the operation of a normally black (NB) IPS display using crossed polarizers. The interdigitated ITO fingers of the electrode structure lie entirely on the bottom substrate. Without an electric field, the parallel-aligned polyimide-coated substrates lead to uniform alignment of the nematic through the depth of the cell. Because the transmission axis of the input polarizer is aligned parallel to the rub direction, as shown in Fig. 9, no birefringence is apparent, and all of the light is absorbed by the second polarizer. However, when the in-plane electric field is applied at 45° to the rub direction, a twist configuration appears that changes the polarization state of the incident light. Except for a small pretilt designed to minimize defects, there is no out-of-plane tilt. Loosely speaking, when the field is strong enough to reorient
most of the LC molecules, this layer can be approximately modeled as a birefringent waveplate, and the normalized transmission through the cell in Fig. 9 is the following:

T = ½ sin²(πΔnd/λ).    (7)

At low fields, a simple transmission expression is not available, and the analysis must be done numerically. Because there is no splay or bend in this mode, the energy density involves only the twist and electric field terms. A voltage threshold can be derived as

VTH = (πl/d) √[K22/(ε0|Δε|)] ,    (8)

where l is the distance between the interdigitated electrodes, d is the cell thickness, and Δε can be either negative or positive. Using typical values K22 = 5.0 × 10−12 N, Δε = 10, and l/d = 4, then VTH = 3 V, slightly higher than both the TN and the STN. Unlike the TN configuration, the electro-optic response in this mode is very sensitive to the cell gap and the electrode spacing. And although this mode has a voltage threshold, an active matrix must be used for addressing due to the large Δ parameter. Although the wide viewing angle is the acclaimed feature of the IPS mode, several issues are currently
Figure 9. The operation of a normally black (NB) in-plane switching (IPS) configuration. The IPS mode is very different from the TN and STN technology in that the electric field is in the plane of the substrates. Note that the cross-section is not drawn to scale. See color insert.
being investigated. First, and most critical, the switching time continues to be somewhat longer than for the TN, making it difficult for this display to support true video refresh rates (43). Second, the pixel architecture generally leads to a comparatively smaller pixel aperture, and the LC configuration leads to an ideal transmission less than 50%. Both of these contribute to decreased overall transmission and can be overcome by a brighter backlight, but this leads to greater power requirements; however, this is not much of a problem for the desktop market. Third, the drive electronics for this mode typically demand more power, adding further to power consumption. Finally, the transmission of the IPS-LCD cell does not respond uniformly to all wavelengths, as can be seen from Eq. (7). In spite of these challenges, the IPS mode continues to be used in large-area desktop displays.

Ferroelectric Liquid Crystals

Ferroelectric liquid crystals (FLCs) are an important addition to the previous discussion of nematic LCDs
because extremely fast switching and bistable displays are possible. Our treatment here is introductory, so we suggest further reading on FLCs and chiral liquid crystals in these thorough sources (44,45). Most broadly, both nematic and ferroelectric molecules exhibit shape anisotropy. However, FLCs are distinguished by inherent chirality and the presence of a permanent dipole oriented perpendicularly to the long axis of the molecule. On the macroscopic level, this transverse dipole leads to the smectic C∗ phase of the bulk LC, as seen in Fig. 10a. The LC molecules become organized into layers within which no 2-D positional order exists, while orientational order is maintained at some characteristic angles (θ, φ). The polar tilt angle θ is consistent throughout each layer and can be as large as ∼45° and as small as a few degrees (46,47). Furthermore, though the azimuthal angle φ is approximately uniform throughout each layer, the symmetry axis from one layer to the next rotates about the normal vector by a small dφ. The net polarization of the helix is zero due to averaging of the in-plane polarization of each smectic layer. However, when the helix is unwound
Figure 10. Ferroelectric liquid crystals (FLC): (a) the structure of the smectic C∗ phase; (b) the surface-stabilized (SSFLC) mode, where the helix is unwound and FLC molecules align to form a uniaxial birefringent layer; (c) bistable switching states in the SSFLC controlled by the polarity of the electric field. See color insert.
to form the ferroelectric phase (by the influence of fields or surfaces), a net spontaneous polarization exists, typically in the range of PS ∼1–200 nC/cm2. It is important to note that both the spontaneous polarization and the tilt angle are highly temperature dependent, usually ∝ √(TC − T), where TC is the Curie temperature (the transition into the higher temperature phase, usually smectic C∗ → smectic A) (48). The unique ordering and electro-optic response of FLCs can be used to produce a fast-switching and bistable display through the Clark–Lagerwall effect (49). This effect is seen when a FLC is confined between substrates whose surface alignment is parallel and whose cell gap is much less than the helical pitch, as shown in Fig. 10b. In this geometry, known as the surface-stabilized FLC (SSFLC) (50), the helix is completely unwound by the substrate anchoring influence, the FLC symmetry axis lies along the substrate alignment direction, and the smectic layers are perpendicular to the substrates (called the quasi-bookshelf texture). When a voltage is applied, the permanent dipoles uniformly align parallel to the electric field because rotation of the FLC is constrained to the azimuthal cone. In this configuration, the mesogenic cores (the LC body) align parallel to the plane of the substrates and form a uniform birefringent medium. However, when the polarity of the electric field is reversed, the FLCs rotate to the opposite side of the cone. The molecular ordering will persist ''indefinitely'' in both of these cases, even when the electric field is removed. This bistable memory is a much sought after property because it can result in low-power operation. However, this bistability is particularly sensitive to temperature and physical shock (such as deformations of the substrates), and defects are a substantial device issue. A display can be formed when this cell is placed between crossed polarizers where one axis of symmetry is parallel to a polarizing axis. Both the ON- and OFF-states are illustrated in Fig. 10c. Linearly polarized light leaving the first polarizer encounters no birefringence in the FLC layer and remains linearly polarized; the second polarizer then absorbs all of the light. However, when a voltage of opposite polarity is applied, the FLCs rotate by 2θ. In this case, the birefringence of the FLC layer leads to a phase shift in the linearly polarized light that enters the layer, and the transmission of the LCD can be modeled using a waveplate model (51), where θ is the FLC tilt angle and T0 is the unpolarized intensity entering the first polarizer:

T = ½ T0 sin²(4θ) sin²(πΔnd/λ).    (9)

Maximum contrast occurs when θ = 22.5° and 2dΔn/λ = 1, and usually requires very small cell gaps (d ∼ 2 µm) for visible wavelengths. As a result of the FLC reorientation occurring only along the azimuthal cone, switching times are substantially less than those in the TN configuration; the azimuthal viscosity of a FLC is usually substantially less than the rotational viscosity of a nematic. And contrary to most nematic LCD modes, the electric field directly influences both the rise and fall times in the FLC, which are inversely proportional to the spontaneous polarization and the electric field. Both of these factors lead to switching times in the range of ∼10–200 µs.
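A brief numerical illustration of the optimum conditions just stated for Eq. (9) is sketched below in Python; the birefringence value is an assumption chosen only to show that the resulting cell gap lands near the d ∼ 2 µm quoted in the text.

import math

def flc_transmission(theta_deg, d_um, delta_n, wavelength_nm, t0=1.0):
    # Eq. (9): waveplate model of the SSFLC cell between crossed polarizers
    theta = math.radians(theta_deg)
    retard = math.pi * delta_n * (d_um * 1e3) / wavelength_nm
    return 0.5 * t0 * math.sin(4 * theta)**2 * math.sin(retard)**2

delta_n = 0.14                      # assumed FLC birefringence (illustrative)
lam = 550.0                         # nm, mid-visible
d_opt = lam / (2 * delta_n) / 1e3   # cell gap (µm) satisfying 2 d Δn / λ = 1
print(round(d_opt, 2), "µm")        # ~1.96 µm, close to the d ~ 2 µm quoted
print(round(flc_transmission(22.5, d_opt, delta_n, lam), 2))   # 0.5: half of the unpolarized input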
The voltage threshold of the SSFLC mode involves both the spontaneous polarization and the nonpolar anchoring energy Wd (usually in the strong-anchoring regime):

VTH ≈ 4Wd/PS .    (10)
For example, Wd = 10−4 J/m2 and PS = 20 nC/cm2 leads to VTH ∼2 V, well within the requirements for display systems. The limitations of these devices include the onset of defects due to temperature or shock, the difficulty of producing gray-scale images, and the challenge of maintaining small cell gaps in displays of any substantial size. Nonetheless, FLCs can be found in many applications including field-sequential color displays (52) and small flexible displays (53). A second electro-optic effect used as a display mode is known as the deformed helix ferroelectric (DHF) (54). In this case, the helical pitch is much smaller than the cell gap and shorter than the wavelength of visible light. As a result, the helical axis lies along the plane of the substrates perpendicularly to the electrodes. An externally applied field distorts the helix and results in the rotation of the optical axis (the average direction of the molecules) away from the helix. Although this mode is not bistable and requires an active matrix, it does enable gray-scale imaging with very fast switching times (55).

Reflective Displays from Cholesteric Liquid Crystals

There is a color-reflective, bistable, LC-based display that is on the verge of significant commercialization. The properties of a cholesteric LC material allow it to form two stable textures that persist even when the drive electronics are inactive: a reflective planar texture that has a helical twist whose pitch p can be tuned to reject a portion of visible light, or focal conic textures (scrambled helices) that are relatively transparent. Figure 11 shows the basic structure of a cholesteric display that is backed by a black substrate. In the planar texture case (Fig. 11a), the periodicity of the helices enables them to Bragg reflect a narrow range of colors, whereas all of the others pass through and are absorbed by a black background. The viewer sees a brilliant color reflection whose bandwidth, in the perfect planar texture, is ∼100 nm, governed by Δλ = pΔn. Ideally, this reflection peak can be at most 50% efficient because cholesteric displays reflect either the right-handed component or the left-handed component of circularly polarized light, depending on the intrinsic twist of the material itself (56). Upon application of a voltage (∼10–15 V), the planar structure transforms into the focal conic texture that is nearly transparent to all wavelengths in the visible, as shown in Fig. 11b. The viewer sees the black background, thereby creating an optical contrast between reflecting color pixels and black pixels. In this state, the voltage can be removed, and the focal conic state will remain indefinitely, so there is bistable memory between the reflecting planar state and the transparent focal conic state. To revert to the planar reflecting texture from the focal conic state, the pixel must go through the highly aligned state (also known as the homeotropic
Figure 11. The operation of a bistable, cholesteric LCD that does not require a backlight; it operates in the reflective mode and therefore uses ambient illumination. See color insert.
state), as shown in Fig. 11c. The transformation requires 30–35 V. An abrupt turn-off of the voltage after the aligned state results in the planar texture. The Bragg-type reflection of cholesterics is far from Lambertian but has a more specular (mirror-like) nature; unique alignment techniques that slightly ''fracture'' the planar texture are therefore employed to spread the Bragg reflection across a broader viewing angle at the expense of on-axis reflection. Grayscale is achieved in cholesteric technology by controlling the focal conic domains using different levels of voltage. Because these devices are transparent, vertical integration is possible, as shown in Fig. 12, to create a true color-addition scheme. Although stacking can create complicated drive circuitry, it does preserve resolution and brightness because the pixels are vertically integrated rather than spatially arranged across the substrate plane. The photopic white reflectance of the vertical stack is >40%. The dynamic response times of cholesteric materials are of the order of 30–40 ms. By implementing unique addressing schemes, video is possible using cholesteric LC technology. The most attractive feature of the cholesteric display is that its reflectance–voltage curve has a well-defined threshold, which enables the use of inexpensive passive-addressing schemes, even for high resolution (57). A full color VGA image from this reflective display
Figure 12. For cholesteric reflective displays, color is created by stacking red, green, and blue panels, thereby preserving brightness and resolution. This is in contrast to the spatial patterning of red, green, and blue color filter arrays used in TN, STN, and IPS technology (Photograph courtesy of J. W. Doane of Kent Displays). See color insert.
configuration is shown in Fig. 12. Cholesteric LC materials are being developed for document viewers, electronic newspapers and books, and information signs — portable applications where bistability is extremely beneficial.
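The reflection bandwidth relation Δλ = pΔn quoted above is simple to evaluate; the Python sketch below also uses the standard expression λ0 ≈ n_avg · p for the center of the reflection band (n_avg is the average refractive index), which is not stated explicitly in this article. All numerical values are assumptions chosen only to give a visible-band example.

# Cholesteric reflection band: center wavelength and bandwidth (illustrative values)
pitch_nm = 330.0    # helical pitch p (assumed)
n_avg = 1.6         # average refractive index (assumed)
delta_n = 0.2       # birefringence (assumed)

center_nm = n_avg * pitch_nm        # ~528 nm, a green reflection
bandwidth_nm = delta_n * pitch_nm   # ~66 nm, same order as the ~100 nm quoted in the text
print(center_nm, bandwidth_nm)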
DISPLAY PERFORMANCE PARAMETERS
Operation of important display technologies has been discussed before, so we will now briefly summarize three important parameters that are commonly used for display characterization.
Contrast Ratio

The optical contrast ratio (CR) is a simple performance measure that captures how clearly an image can be seen (58,59). It is defined as the ratio of the luminance of the ON-state to the luminance of the OFF-state: CR = LON/LOFF. A high CR demands high transmission in the bright state and is particularly sensitive to the dark-state brightness, which means that very high ratios are possible, even from a dim display. Unfortunately, the unbounded nature of this equation often makes display comparisons difficult, and consequently, it is not uncommon to find completely different image quality on two competing displays that have the same CR. The contrast of the TN and IPS modes is superior to that of the STN due to their addressing scheme and the respective nematic configuration. The multiplexing technique used by the latter leads to a slight increase in the dark-state luminance, and the color leakage inherent in the STN mode further limits the contrast (<15 : 1). Both TN and IPS displays typically exhibit similar contrast (>100 : 1) due to good dark states when viewed on-axis, but the TN will often appear brighter.
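A tiny numerical illustration of why the unbounded CR can mislead: two hypothetical displays (the luminance values below are invented for the example) can share the same contrast ratio while one is far dimmer than the other.

def contrast_ratio(l_on, l_off):
    # CR = L_ON / L_OFF, both luminances in cd/m^2
    return l_on / l_off

print(contrast_ratio(300.0, 1.0))   # bright display: 300:1
print(contrast_ratio(30.0, 0.1))    # much dimmer display: also 300:1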
Viewing Angle

The field of view is one of the most critical performance parameters of a high information content LCD panel, primarily due to the size of modern flat-panel displays (typically >13 in.). A viewing cone of at least 30° is needed just to view the extreme corners if the observer is seated 20 in. away. Most LCD modes maintain excellent on-axis characteristics but exhibit poor contrast and grayscale inversions when viewed off-axis. This can be seen clearly in the isotransmittance curves for a NW-TN cell shown in Fig. 13. These problems arise primarily from a combination of the off-axis leakage of the polarizers and the out-of-plane tilt of the nematic molecules in the center of the cell (60). This results in an angularly dependent birefringence and in elliptical, rather than linear, off-axis polarization states incident on the second polarizer. A tremendous amount of effort has gone into improving this for the TN-TFT-LCD, and a variety of approaches have been successful: compensation films (20,35,38,61), multidomain approaches (62,63), and novel nematic configurations [vertical alignment (64), optically compensated bend (67)]. As previously mentioned, the LC molecules in the IPS mode remain substantially in the plane, so only the off-axis leakage from the polarizers remains a significant concern.
Figure 13. Examples of isotransmittance curves of an uncompensated NW-TN cell for (a) the bright state (V = 0 V) and (b) the 10% gray-level state (V = 2.10 V). These plots show constant-transmittance contours on a polar plot that has a 360° azimuthal span and a 60° polar range measured from the vector normal to the display cell. Notice only a small anisotropy of the contours in the bright state, which indicates that this display would appear just as bright from all polar angles as large as 60°. However, when the gray-scale voltage is applied, the variation in transmission over the azimuthal angles and the gray-scale inversions at 90° are apparent and would result in a very poor image. Also note that at best only 50% of the unpolarized backlight can be transmitted.

Color

Unlike the previous two display measures, the color performance of a display necessarily involves the physiology
of the eye. Because of this, two realms exist for characterizing displays: radiometric and photometric. Radiometry involves measuring actual optical power in terms of radiant flux (watts), combined with units of area, distance, solid angle, and time. However, because the human visual system does not have a flat spectral response, photometry was developed. In this case, radiometric data undergo a linear transformation to take into account the average photopic response of the eye and are expressed in terms of luminous flux (lumens). Depending
on the context, units of radiance [watts/(steradian·m2)] and luminance [candelas/m2, lumens/(steradian·m2), nits, and footlamberts] are commonly used for radiometric and photopic measures, respectively. One particularly useful aspect of photometry is the development of a color space wherein any optical power spectrum can be mapped onto a two-parameter chromaticity space that accounts for the photopic response of the eye. We refer the reader to the extensive treatment of display color performance in (58,63). The properties of the human visual system allow numerous possibilities of color synthesis with LCDs. Full-color commercial displays almost always use a spatial synthesis approach, in which a broad color gamut can be perceived using three spatially distinct, independently addressed subpixels of differing primary colors (usually red, green, and blue). Because these subpixels usually use absorbing color filters, one consequence of this scheme is a reduction in the ideal brightness of a display by at least two-thirds in most situations. In addition to this color-addition method, a pixel can also be formed using three primary-color layers that are vertically stacked. This enables maximum ideal throughput of the backlight and highest pixel resolution, but drawbacks include the necessity of three LC layers, their associated addressing matrices, and additional substrates, all of which dramatically increase manufacturing complexity. Nonetheless, this approach is ideal for reflective displays to maximize efficiency. In transmissive displays, a color-subtraction scheme using dichroic dyes can also be devised using the vertically stacked pixel. In another possibility, a sequence of primary colors flashed quickly will be integrated by the eye and will be perceived as a different color. This is known as field-sequential color synthesis (64,65).

SUMMARY

The development of LCDs is continuing at a feverish pace because of the market need for high-fidelity displays in numerous imaging applications. Perhaps many innovative and profitable careers are supported by the modern maxim: ''You can image all content on some LCDs and some content on all LCDs, but you cannot image all content on all LCDs''! Since imaging applications are ever-changing and display technology is continually evolving, we have chosen in this article to target the fundamental aspects of the technology in order to provide the reader with a flexible, yet foundational, understanding.

ABBREVIATIONS AND ACRONYMS

α-Si  amorphous silicon
CR  contrast ratio
DHF  deformed helix ferroelectric
FLC  ferroelectric liquid crystal
ITO  indium tin oxide
IPS  in-plane switching
LC  liquid crystal
LCD  liquid crystal display
NB  normally black
NW  normally white
PECVD  plasma-enhanced chemical vapor deposition
Poly-Si  polycrystalline silicon
REI  reactive ion etching
STN  supertwisted nematic
SSFLC  surface-stabilized ferroelectric liquid crystal
TFT  thin-film transistor
TN  twisted nematic
BIBLIOGRAPHY 1. G. P. Crawford, IEEE Spectrum 37, 40–46 (1992). 2. B. Comiskey, J. D. Albert, H. Yoshizawa, and J. Jacobson, Nature 394, 253–255 (1997). 3. N. K. Sheridon, J. of Soc. for Information Display 7, 141–144 (1999). 4. P. M. Alt and P. Pleshko, IEEE Trans. Elec. Dev. ED-21, 146–155 (1974). 5. T. Scheffer, in P. Collings and J. Patel, eds., Handbook of Liquid Crystal Research, Oxford University Press, New York, 1997, pp. 445–471. 6. S. Kobayashi, H. Hiro, and Y. Tanaka, in P. Collings and J. Patel, eds., Oxford University Press, New York, 1997, pp. 415–444. 7. T. P. Brody, J. A. Asars, and G. D. Dixon, IEEE Trans. Elec. Dev. ED-20, 995–1101 (1973). 8. P. G. Le Comber, Electron. Lett. 15, 179–181 (1979). 9. V. G. Chigrinov, Liquid Crystal Devices, Artech House, NY, 1999, pp. 238–246. 10. D. E. Mentley and J. A. Castellano, Liquid Crystal Display Manufacturing, Stanford Resources, Inc., San Jose, 1994. 11. C. W. Kim et al., SID Digest 31, 1,006–1,009 (2000). 12. H. Kinoshita et al., SID Digest 30, 736–739 (1999). 13. S. Nakabu et al., SID Digest 30, 732–735 (1999). 14. T. Sameshima, M. Hara, and S. Usui, Jpn. J. Appl. Phys. 28, 2,131–2,133 (1989). 15. A. T. Voutsas, D. Zahorski, and S. Janicot, SID Digest 30, 290–293 (1999). 16. S. Utsunomiya, S. Inoue, and T. Shimoda, SID Digest 31, 916–919 (2000). 17. R. A. Pelcovits, in P. Collings and J. Patel, eds., Handbook of Liquid Crystal Research, Oxford University Press, New York, 1997, pp. 71–95. 18. L. M. Blinoff and V. G. Chigrinov, Electrooptic Effects in Liquid Crystal Materials, Springer, New York, 1996. 19. P. G. de Gennes and J. Prost, The Physics of Liquid Crystals, Oxford Science Publications, Oxford Press, New York, 1993. 20. S. Chandraskehar, Liquid Crystals, Cambridge University Press, Cambridge, 1994. 21. I. C. Khoo and F. Simoni, Physics of Liquid Crystalline Materials, Gordon and Breach Science Publishers, Philadelphia, 1991. 22. S. A. Pikin, Structural Transformations in Liquid Crystals, Gordon and Breach Science Publishers, Philadelphia, 1991. 23. H. Stegemeyer, Liquid Crystals, Springer, New York, 1994. 24. A. A. Sonin, The Surface Physics of Liquid Crystals, Gordon and Breach Science Publishers, Philadelphia, 1995. 25. D. Demus et al., Physical Properties of Liquid Crystals, WileyVCH, Weinheim, 1999.
26. P. J. Collings and M. Hird, Introduction to Liquid Crystals, Taylor and Francis, London, 1997. 27. S. Elston and R. Sambles, The Optics of Thermotropic Liquid Crystals, Taylor and Francis, London, 1998. 28. R. C. Jones, J. Opt. Soc. A 31, 488–499 (1941). 29. S. Teitler and J. Henvis, J. Opt. Soc. Am. 60, 830–840 (1970). 30. D. W. Berreman, Phil. Trans. R. Soc. 309, 203–216 (1983).
31. M. Schadt, Annu. Rev. Mater. Sci. 27, 305–379 (1997). 32. P. Yeh and C. Gu, Optics of Liquid Crystal Displays, Wiley Interscience, New York, 1999. 33. J. A. Castellano, Handbook of Display Technology, Academic Press Inc., San Diego, 1992. 34. D. E. Mentley and J. A. Castellano, Flat Information Displays Market and Technology Trends, Stanford Resources Inc., San Jose, 1993,1994. 35. T. Scheffer and J. Nehring, in B. Bahdur, ed., Liquid Crystals: Applications and Uses, World Scientific, Singapore, 1993. 36. T. J. Scheffer and J. Nehring, J. Appl. Phys. 58, 3,022–3,031 (1985). 37. I. C. Khoo and S. T. Wu, Optics and Non-linear Optics of Liquid Crystals, World Scientific, Singapore, 1993. 38. M. Oh-e, M. Ohta, S. Aratani, and K. Kondo, Digest Asia Display ’95 577–580 (1995). 39. H. Wakemoto et al., SID Digest 28, 929–932 (1997). 40. M. Hasegawa, SID Digest 28, 699–702 (1997). 41. I. Musevic, R. Blinc, and B. Zeks, The Physics of Ferroelectric and Antiferroelectric Liquid Crystals, World Scientific, Singapore, 2000. 42. H.-S. Kitzerow and C. Bahr, eds., Chirality in Liquid Crystals, Springer, New York, 2001. 43. J. S. Patel and J. W. Goodby, J. Appl. Phys. 59, 2,355–2,360 (1986). 44. T. Geelhaar, Ferroelectrics 84, 167–181 (1988). 45. A. W. Hall, J. Hollingshurst, and J. W. Goodby, in P. Collings and J. Patel, eds., Handbook of Liquid Crystal Research, Oxford University Press, New York, 1997, pp. 41–70.
46. N. A. Clark and S. T. Lagerwall, Appl. Phys. Lett. 36, 899–901 (1980). 47. N. A. Clark and S. T. Lagerwall, Ferroelectrics 59, 25–67 (1984). 48. J. Z. Xue, M. A. Handschy, and N. A. Clark, Liq. Cryst. 2, 707–716 (1987). 49. T. Yoshihara, T. Makino, and H. Inoue, SID Digest 31, 1,176–1,179 (2000). 50. M. Muecke et al., SID Digest 31, 1,126–1,129 (2000). 51. A. G. H. Verhulst, G. Cnossen, J. Funfschilling, and M. Schadt, International Display Research Conference 94, 377–380 (1994). 52. J. Funfschilling and M. Schadt, J. Appl. Phys. 66, 3,877–3,882 (1989). 53. D. K. Yang, L. C. Chien, and Y. K. Fung, in G. P. Crawford and S. Zumer, eds., Liquid Crystals in Complex Geometrics, Taylor Francis, London, 1996, pp. 103–142. 54. X. Y. Huang, N. Miller, and J. W. Doane, SID Digest 28, 899–902 (1997). 55. P. A. Keller, Electronic Display Measurement, John Wiley & Sons, Inc., New York, NY, 1997. 56. L. W. MacDonald and A. C. Lowe, eds., Display Systems Design and Applications, John Wiley & Sons, Inc., New York, NY, 1997, chap. 14–17. 57. P. J. Bos and K. Werner, Information Display 13, 26–30 (1997). 58. H. Mori et al., Jpn. J. Appl. Phys. 36, 143–147 (1997). 59. M. S. Nam et al., SID Digest 28, 933–936 (1997). 60. J. Chen et al., Appl. Phys. Lett. 67, 1,990–1,992 (1995). 61. K. Ohmuro, S. Kataoka, T. Sasaki, and Y. Koike, SID Digest 28, 845–48 (1997). 62. T. Uchida and T. Miyashita, IDW’95 Digest 39–42 (1995). 63. R. G. Kuehni, Color: An Introduction to Practice and Principles, John Wiley & Sons, Inc., New York, NY, 1997. 64. T. Yoshihara, T. Makino, and H. Inoue, SID Digest 31, 1,176–1,179 (2000). 65. T. R. H. Wheeler and M. G. Clark, in H. Widdel and D. L. Post, eds., Color in Electronic Displays, Plenum Press, NY, 1992, pp. 221–281.
M MAGNETIC FIELD IMAGING
K. N. HENRICHSEN
CERN, Geneva, Switzerland
INTRODUCTION Magnetic field mapping is the production of maps or images of magnetic fields in space. Magnetic field maps are needed for designing and optimizing of magnets used in particle accelerators, spectrometers (mass, nuclear magnetic resonance, and electron paramagnetic resonance), and magnetic resonance imaging systems. Magnetic field maps are also used in geologic exploration where the variations in the magnitude and direction of the earth’s magnetic field are indicative of subsurface features and objects. Field mapping relies on various methods of measuring the magnetic field, generally one point at a time. These measurement methods are the main focus of this article. It is curious to note that most measurement methods have remained virtually unchanged for a very long period, but the equipment has been subject to continual development. In the following, only the more commonly used methods will be discussed. These methods are complementary and a wide variety of the equipment is readily available from industry. For the many other existing measurement methods, a more complete discussion can be found in two classical bibliographical reviews (1,2). An interesting description of early measurement methods can be found in (3). Much of the following material was presented at the CERN Accelerator School on Measurement and Alignment of Accelerator and Detector Magnets (4). Those proceedings contain a recent compendium of articles in this field and form a complement to the classical reviews.
Choice of Measurement Method

The choice of measurement method depends on several factors. The field strength, homogeneity, variation in time, and the required accuracy all need to be considered. The number of magnets to be measured can also determine the method and equipment to be deployed. As a guide, Fig. 1 shows the accuracy that can be obtained in an absolute measurement as a function of the field level, using commercially available equipment. An order of magnitude may be gained by improving the methods in the laboratory.

Figure 1. Measurement methods: accuracies and ranges (accuracy in ppm versus field in tesla for NMR, ESR, the induction method, fluxgate and magnetoinduction sensors, and ac and dc Hall plates).

Magnetic Resonance Techniques

The nuclear magnetic resonance technique is considered the primary standard for calibration. It is frequently used both for calibration and for high-precision field mapping. The method was first used in 1938 (5,6) for measuring the nuclear magnetic moment in molecular beams. A few years later, two independent research teams observed the phenomenon in solids (7–9). Since then, the method has become the most important way of measuring
magnetic fields with very high precision. Because it is based on an easy and precise frequency measurement, it is independent of temperature variations. Commercially available instruments measure fields in the range from 0.011 T up to 13 T at an accuracy better than 10 ppm. Commercial units are also available for measuring weaker magnetic fields, such as the earth's magnetic field (30 to 70 µT), but at lower accuracy. In practice, a sample of water is placed inside an excitation coil, powered from a radio-frequency oscillator. The precession frequency of the nuclei in the sample is measured either as nuclear induction (coupling into a detecting coil) or as resonance absorption (10). The measured frequency is directly proportional to the strength of the magnetic field; the proportionality constants are 42.57640 MHz/T for protons and 6.53569 MHz/T for deuterons. The magnetic field is modulated with a low-frequency signal to determine the resonance frequency (11). The advantages of the method are its very high accuracy, its linearity, and the static operation of the system. The main disadvantage is the need for a rather homogeneous field to obtain a sufficiently coherent signal. A small compensation coil that is formed on a flexible printed circuit board and provides a field gradient may be placed around the probe when used in a slightly inhomogeneous field. A correction of the order of 0.2 T/m may be obtained (11). The limited sensitivity and dynamic range also set limits to this method's suitability. It is, however, possible to use several probes with multiplexing equipment if a measurement range of more than half a decade is needed. Pulsed NMR measurements have been practiced for various purposes (12,13), even at cryogenic temperatures (14), but equipment for this type of measurement is not yet commercially available. Finally, it should be mentioned that a rather exotic method of NMR measurement using water flowing in a small tube has given remarkably good results in low fields (15–17). It fills the gap in the measurement range up to 11 mT, for which NMR equipment is not yet commercially available. In addition, it provides a method of measurement in strong ionizing radiation such as in particle accelerators. It was tested for measurements in the bending magnets installed in the CERN Large Electron Positron collider (LEP). A resolution of 0.0001 mT was reached in the range from the remanent field of 0.5 mT up to the maximum field of 112.5 mT, and corresponding reproducibility was observed (18). The remarkable sensitivity and resolution of this measurement method make it suitable for absolute measurements in low fields. In fact, it was even possible to detect the earth's magnetic field outside the magnet, corresponding to an excitation frequency of about 2 kHz. However, the operation of this type of equipment is rather complicated due to the relatively long time delays in the measurement process. Electron spin resonance (ESR) (19–22) is a related and very precise method for measuring weak fields. It is now commercially available in the range from 0.55 to 3.2 mT, has a reproducibility of 1 ppm, and is a promising tool in geology applications.
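Because the NMR reading is a frequency, converting it to a field value is a single division by the proton constant quoted above. The following sketch illustrates that conversion only; the example frequency is arbitrary, and the code is not taken from any instrument's software.

    # Minimal illustration of the frequency-to-field conversion for a proton NMR probe.
    GAMMA_PROTON = 42.57640e6  # Hz per tesla, the proton constant quoted in the text

    def field_from_proton_frequency(f_hz):
        """Convert a measured proton precession frequency (Hz) to field strength (T)."""
        return f_hz / GAMMA_PROTON

    # A probe reading of 63.8646 MHz corresponds to a field of 1.5 T.
    print(field_from_proton_frequency(63.8646e6))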
Magnetic resonance imaging (MRI) has been proposed for accelerator magnet measurements (23). It is a very promising technique, which has proven its quality in other applications. However, the related signal processing requires powerful computing facilities, which were not so readily available in the past.

The Fluxmeter Method

This method is based on the induction law. The change of flux in a measurement coil will induce a voltage across the coil terminals. It is the oldest of the currently used methods for magnetic measurements, but it can be very precise (24). It was used by Wilhelm Weber in the middle of the nineteenth century (25) when he studied the variations in the strength and direction of the earth's magnetic field. Nowadays, it has become the most important measurement method for particle accelerator magnets. It is also the most precise method for determining the direction of magnetic flux lines; this is of particular importance in accelerator magnets. The coil geometry is often chosen to suit a particular measurement. One striking example is the Fluxball (26), whose complex construction made it possible to perform point measurements in inhomogeneous fields. Measurements are performed either by using fixed coils in a dynamic magnetic field or by moving the coils in a static field. The coil movement may be rotation through a given angle, continuous rotation, or simply movement from one position to another. Very high resolution may be reached in field mapping by using this method (27). Very high resolution may also be reached in differential fluxmeter measurements using a pair of search coils connected in opposition, where one coil moves and the other is fixed, thus compensating for fluctuations in the magnet excitation current and providing a much higher sensitivity when examining field quality. The same principle is applied in harmonic coil measurements, but both coils move. A wide variety of coil configurations is used, ranging from the simple flip-coil to the complex harmonic coil systems used in fields of cylindrical symmetry.

Induction Coils

The coil method is particularly suited for measurements with long coils in particle accelerator magnets (28,29), where the precise measurement of the field integral along the particle trajectory is the main concern. Long rectangular coils were usually employed and are still used in magnets that have a wide horizontal aperture and limited gap height. In this case, the geometry of the coil is chosen to link with selected field components (30). The search coil is usually wound on a core made from a mechanically stable material to ensure a constant coil area, and the wire is carefully glued to the core. Special glass or ceramics that have low thermal dilatation are often used as core materials. During coil winding, the wire must be stretched so that its residual elasticity assures well-defined geometry and mechanical stability of the coil. Continuously rotating coils that have commutating polarity were already employed in 1880 (3). The harmonic
coil method has now become very popular for use in circular cylindrical magnets, in particular, superconducting beam transport magnets. The coil support is usually a rotating cylinder. This method has been developed since 1954 (31,32). The induced signal from the rotating coil was often transmitted through slip rings to a frequency selective amplifier (frequency analyzer), thus providing analog harmonic analysis. The principle of a very simple harmonic coil measurement is illustrated in Fig. 2. The radial coil extends through the length of the magnet and is rotated around the axis of the magnet. As the coil rotates, it cuts the radial flux lines. Numerous flux measurements are made between predefined angles. This permits precise and simultaneous determination of the strength, quality, and geometry of the magnetic field.

Figure 2. Harmonic coil measurement.

A Fourier analysis of the measured flux distribution results in a precise description of the field parameters in terms of the harmonic coefficients:

$$B_r(r,\varphi) = B_0 \sum_{n=1}^{\infty} \left(\frac{r}{r_0}\right)^{n-1} \left(b_n \cos n\varphi + a_n \sin n\varphi\right),$$
where B0 is the amplitude of the main harmonic, r0 is a reference radius, and bn and an are the harmonic coefficients. In this notation, b1 describes the normal dipole coefficient, b2 the normal quadrupole coefficient, and so on; the corresponding skew field components are described by the coefficients a1, a2, etc. Due to the advent of modern digital integrators and angular encoders, harmonic coil measurements have improved considerably and are now considered the best choice for most types of particle accelerator magnets, in particular those designed with cylindrical symmetry (33). In practice, the coil is rotated one full turn in each angular direction, and the electronic integrator is triggered at the defined angles by an angular encoder connected to the axis of the coil.
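As a sketch of the numerical step alone, the fragment below estimates harmonic content from n equally spaced flux readings taken over one coil revolution. The flux values are made up, and the scaling from flux harmonics to the field coefficients bn and an (which depends on the coil radius, length, and number of turns) is deliberately left out.

    import numpy as np

    # Hypothetical accumulated flux (arbitrary units) at n equally spaced coil angles.
    n = 512
    theta = 2 * np.pi * np.arange(n) / n
    flux = 1.0e-3 * np.cos(theta) + 2.0e-6 * np.cos(2 * theta)  # dipole plus a small quadrupole term

    # Discrete Fourier analysis of the measured flux distribution.
    c = np.fft.rfft(flux) / (n / 2)   # complex amplitude of each harmonic
    normal = c.real[1:]               # components varying as cos(n*theta)
    skew = -c.imag[1:]                # components varying as sin(n*theta)

    # Higher harmonics expressed relative to the main (dipole) component.
    print(normal[:3] / normal[0], skew[:3] / normal[0])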
To speed up the calculation of the Fourier series, it is an advantage to choose n equally spaced measurement points, where n is a power of 2 (e.g., 512). A compensating coil, connected in series and rotated with the main coil, may be used to suppress the main field component and thus increase the sensitivity of the system for measuring field quality. Dynamic fields are measured with a static coil linking to selected harmonics (34). The harmonic coil measurement principle and its related equipment are described in detail in (35). A thorough description of the general theory, including detailed error analysis, can be found in (36). The practical use of the harmonic coil method for large-scale measurements in superconducting magnets is described in (37,38), and more recent developments are reported in (39–43). Another induction measurement method consists of moving a stretched wire in a magnetic field, thus integrating the flux cut by the wire (44). It is also possible to measure the flux change while varying the field and keeping the wire in a fixed position. Tungsten is often selected as a wire material if the wire cannot be placed in a vertical position. The accuracy is determined by the mechanical positioning of the wire. Sensitivity is limited but can be improved by using a multiwire array. This method is well suited to geometric measurements, to the absolute calibration of quadrupole fields, and in particular to measurements in strong magnets that have very small apertures. The choice of geometry and method depends on the useful aperture of the magnet. The sensitivity of the fluxmeter method depends on the coil surface and on the quality of the integrator. The coil–integrator assembly can be calibrated to an accuracy of a few tens of ppm in a homogeneous magnetic field by reference to a nuclear magnetic resonance probe, but care must be taken not to introduce thermal voltages in the related cables and connectors. Induced erratic signals from wire loops exposed to magnetic flux changes must also be avoided. One must measure the equivalent surface of the search coil and also its median plane, which often differs from its geometric plane due to winding imperfections. In long measurement coils, it is important to ensure very tight tolerances on the width of the coil. If the field varies strongly over the length of the coil, it may be necessary to examine the variation of the effective width. The main advantage of search coil techniques is the possibility of very flexible coil design. The high stability of the effective coil surface is another asset. The linearity and the wide dynamic range also play important roles. The technique can be easily adapted to measurements at cryogenic temperatures. After calibration of the coils at liquid nitrogen temperature, only a minor correction has to be applied for use at lower temperatures. On the other hand, the need for relatively large induction coils and their related mechanical apparatus, which is often complex, may be a disadvantage. Finally, measurements with moving coils are relatively slow.

Flux Measurement
Induction coils were originally used with ballistic galvanometers and later with more elaborate fluxmeters (45). The coil method was improved considerably by the
development of photoelectric fluxmeters (46), which were used for a long time. The measurement accuracy was further improved by the introduction of the classic electronic integrator, the Miller integrator. It remained necessary, however, to employ difference techniques for measurements of high precision (47). Later, the advent of digital voltmeters made fast absolute measurements possible, and the Miller integrator has become the most popular fluxmeter. Due to the development of solid-state dc amplifiers, this integrator has become inexpensive and is often used in multicoil systems. Figure 3 shows an example of such an integrator. It is based on a dc amplifier that has a very low input voltage offset and a very high open-loop gain. The thermal variation of the integrating capacitor (C) is the most critical problem. Therefore, the integrating components are mounted in a temperature-controlled oven. Another problem is the decay of the output signal through the capacitor and the resetting relay, so careful protection and shielding of these components is essential to reduce the voltages across the critical surface resistances. The dielectric absorption of the integrating capacitor sets a limit to the integrator precision. A suitable integrating resistor is much easier to find. Most metal-film resistors have stabilities and temperature characteristics that match those of the capacitor. The sensitivity of the integrator is limited by the dc offset and the low-frequency input noise of the amplifier. A typical value is 0.5 µV, which must be multiplied by the measurement time to express the sensitivity in terms of flux. Thermally induced voltages may cause a problem, so care must be taken in the choice of cables and connectors. In tests at CERN, the overall stability of the integrator time constant proved to be better than 50 ppm during a period of three months.

Figure 3. Analog integrator: a dc amplifier with integrating resistor R and capacitor C mounted in a temperature-controlled oven (ΔT < 0.1 °C).

A few electronic fluxmeters have been developed by industry and are commercially available.
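At its core, every fluxmeter forms the time integral of the induced coil voltage. A minimal numerical analog of that operation is sketched below; it assumes the voltage has already been digitized and ignores drift correction, thermal voltages, and the coil calibration factor.

    import numpy as np

    def flux_change(voltage, dt):
        """Integrate digitized coil voltage samples (V) over time step dt (s); the result is in V*s = Wb."""
        return np.trapz(voltage, dx=dt)

    # Hypothetical example: a constant 1 mV induced for 0.1 s links 1e-4 Wb of flux.
    v = np.full(1001, 1.0e-3)
    print(flux_change(v, dt=1.0e-4))  # ~1.0e-4; dividing by the coil's effective area gives the field change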
In more recent years, a new type of digital fluxmeter has been developed, which is based on a high-quality dc amplifier connected to a voltage-to-frequency converter (VFC) and a counter. The version shown in Fig. 4 was developed at CERN and is now commercially available. The input of the VFC is given an offset of 5 V to provide a true bipolar measurement. This offset is balanced by a 500-kHz signal, which is subtracted from the output of the VFC. Two counters are used to measure with continuously moving coils and to provide instant readings of the integrator. One of the counters can then be read and reset while the other is active. In this way, no cumulative errors will build up. The linearity of this fluxmeter is 50 ppm. Its sensitivity is limited by the input amplifier, as in the case of an analog amplifier. This system is well adapted to digital control but imposes limits on the rate of change of the flux because the input signal must never exceed the voltage level of the VFC. To obtain reasonable resolution, the minimum integration period over the full measurement range must be of the order of one second.

Figure 4. Digital integrator: a dc amplifier feeding a 0–1 MHz voltage-to-frequency converter; a +5 V input offset balanced by a 500-kHz subtraction provides bipolar operation, and two counters are read alternately by a microprocessor.

The Hall Generator Method

In 1879, E. H. Hall discovered that a very thin metal strip that is immersed in a transverse magnetic field and carries a current develops a voltage mutually at right angles to the current and field that opposes the Lorentz force on the electrons (48). In 1910, the first magnetic measurements were performed using this effect (49). It was, however, only around 1950 that suitable semiconductor materials were developed (50–52), and since then the method has been used extensively. It is a simple and fast measurement method that provides relatively good accuracy, and therefore it is the most commonly used method in large-scale field mapping (53–55). The accuracy can be improved at the expense of measurement speed.

Hall Probe Measurements

The Hall generator provides an instant measurement, uses very simple electronic measurement equipment, and offers a compact probe that is suitable for point measurements. A large selection of this type of gaussmeter is now commercially available. The probes can be mounted on relatively light positioning gear (55). Considerable measurement time may be gained by mounting Hall generators in modular multiprobe arrays and applying multiplexed voltage measurement (56). Simultaneous measurements in two or three dimensions may also be carried out by using suitable probe arrays (57,58). The wide dynamic range and the possibility of static operation are other attractive features. However, several factors set limits on the accuracy obtainable. The most serious limitation is the temperature coefficient of the Hall voltage. Temperature stabilization is usually employed to overcome this problem (59), but it increases the size of the probe assembly. The temperature coefficient may also be taken into account in the probe calibration by monitoring the temperature during measurements (60). It also depends, however, on the level of the magnetic field (60), so relatively complex calibration tables are needed. Another complication can be that of the planar Hall effect (61), which makes measuring a weak field component normal to the plane of the Hall generator problematic if a strong field component is parallel to this
plane. This effect limits the use in fields of unknown geometry and, in particular, its use for determining field geometry. Last but not least is the problem of the nonlinearity of the calibration curve, because the Hall coefficient is a function of the field level. The Hall generator of the cruciform type (62) has better linearity and a smaller active surface than the classical rectangular generator. Therefore, its magnetic center is better defined, so it is particularly well suited for measurements in strongly inhomogeneous fields. Special types that have smaller temperature dependence are available on the market, but they have lower sensitivity. The measurement of the Hall voltage sets a limit of about 20 µT on the sensitivity and resolution of the measurement if conventional dc excitation is applied to the probe. This is caused mainly by thermally induced voltages in cables and connectors. The sensitivity can be improved considerably by applying ac excitation (63,64). Good accuracy at low fields can then be achieved by employing synchronous detection techniques for measuring the Hall voltage (65). Special Hall generators for use at cryogenic temperatures are also commercially available. Although they have very low temperature coefficients, they unfortunately reveal an additional problem at low temperatures. The so-called Shubnikov–de Haas effect (66,67) shows up as a field-dependent oscillatory effect of the Hall coefficient, which may amount to about 1% in high fields, depending on the type of semiconductor used for the Hall generator. This adds a serious complication to calibration. The problem may be solved by locating the Hall generator in a heated anticryostat (68). The complications related to the planar Hall effect are less important at cryogenic temperatures and are discussed in detail in (69). Altogether, the Hall generator has proved very useful for measurements at low temperature (70).
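As a first-order illustration of the temperature dependence discussed above (the coefficient and calibration temperature below are purely hypothetical, and a real probe needs the field-dependent calibration tables already mentioned), a reading can be corrected using the monitored probe temperature:

    ALPHA = -5.0e-4   # hypothetical temperature coefficient of the probe sensitivity, per kelvin
    T_CAL = 25.0      # temperature at which the probe was calibrated, degrees C

    def corrected_field(b_raw, temperature):
        """Apply a first-order temperature correction to a Hall-probe reading (T)."""
        return b_raw / (1.0 + ALPHA * (temperature - T_CAL))

    print(corrected_field(1.5000, temperature=31.0))  # ~1.5045 T for this hypothetical probe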
Calibration

Hall generators are usually calibrated in a magnet in which the field is measured simultaneously by the nuclear magnetic resonance technique. The calibration curve is most commonly represented as a polynomial of relatively high order (7 or 9) fitted to a sufficiently large number of calibration points. This representation has the advantage of a simple computation of magnetic induction from a relatively small table of coefficients. A physically better representation is a piecewise cubic interpolation through a sufficient number of calibration points, which were measured with high precision. This can be done as a simple Lagrange interpolation or, even better, with a cubic spline function. The advantage of the spline function comes from its minimum curvature and its "best approximation" properties (71). The function adjusts itself easily to nonanalytic functions and is very well suited to interpolation from tables of experimental data. The function is defined as a piecewise polynomial of the third degree that passes through the calibration points so that the derivative of the function is continuous at these points. Very efficient algorithms can be found in the literature (72). The calculation of the polynomial coefficients may be somewhat time-consuming but need only be done once at calibration time. The coefficients (typically about 60 for the bipolar calibration of a cruciform Hall generator) can be easily stored in a microprocessor (59,65), and the subsequent field calculations are very fast. The quality of the calibration function can be verified from field values measured between the calibration points. A well-designed Hall-probe assembly can be calibrated to a long-term accuracy of 100 ppm. The stability may be considerably improved by powering the Hall generator permanently and by keeping its temperature constant (56).
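A minimal sketch of such a spline-based calibration, using SciPy's cubic-spline interpolation and a handful of invented calibration pairs (a real probe would use many more points and a full bipolar table):

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Hypothetical calibration pairs: Hall voltage (V) versus NMR-measured field (T).
    v_cal = np.array([-0.30, -0.15, 0.00, 0.15, 0.30])
    b_cal = np.array([-1.52, -0.74, 0.00, 0.76, 1.55])

    hall_to_field = CubicSpline(v_cal, b_cal)  # piecewise cubic through the calibration points

    # Field corresponding to a measured Hall voltage of 0.22 V.
    print(float(hall_to_field(0.22)))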
Fluxgate Magnetometer

The fluxgate magnetometer (73) is based on a thin linear ferromagnetic core on which detection and excitation coils are wound. The measurement principle is illustrated in Fig. 5.

Figure 5. Fluxgate magnetometer.

In its basic version, it consists of three coils
wound around a ferromagnetic core: an ac excitation winding A, a detection winding B that indicates the zero field condition, and a dc bias coil C that creates and maintains the zero field. In practice, the coils are wound coaxially in successive layers. The core is made from a fine wire of Mumetal or a similar material that has an almost rectangular hysteresis curve. The method was introduced in the 1930s and was also named ‘‘peaking strip.’’ It is restricted to use with low fields but has the advantage of offering a linear measurement and is well suited for static operation. As a directional device of very high sensitivity, it is suitable for studying weak stray fields around magnets and mapping the earth’s magnetic field. Much more complex coil configurations are wound for precision measurements and where the measured field should not be distorted by the probe. The most interesting application is now in space research; important developments of this technique have taken place over the last decades (74–76). The use of modern materials for magnetic cores has improved the sensitivity to about 20 pT and can assure a wide dynamic range. The upper limit of the measurement range is usually of the order of a few tens of mT, but it can be extended by applying water cooling to the bias coil. Fluxgate magnetometers that have a typical range of 1 mT and a resolution of 1 nT are commercially available from several sources. They have many other practical applications, for example, in navigation equipment.
Magnetoresistivity Effect

Magnetoresistivity was discovered by W. Thomson in 1856 (77). It was exploited quite early, and a commercial instrument already existed at the end of the nineteenth century. Technical problems were, however, significant (78). Dependence on temperature and mechanical stress, combined with difficulties of manufacture and problems with electrical connections, caused a general lack of reliability in this measurement method. As with the Hall generator, it was only when semiconductor materials became available that the method turned into a success. Then, inexpensive magnetoresistors came on the market and were also used for magnetic measurements (79). A more recent application for field monitoring was implemented in one of the large LEP spectrometers at CERN (80).

Visual Field Mapping

The best-known visual field mapper is made by spreading iron powder on a horizontal surface placed near a magnetic source, thus providing a simple picture of the distribution of flux lines. Another very classical way of observing flux-line patterns is to place a free-moving compass needle at different points in the volume to be examined and note the direction of the needle. This compass method was applied, long before the discovery of electromagnetism, to studies of the variations in the direction of the earth's magnetic field. Another visual effect may be obtained by observing the light transmission through a colloidal suspension of diamagnetic particles subject to the field (81,82).

Faraday Effect

The magneto-optical rotation of the plane of polarization of polarized light (Faraday effect) is a classical method for visualizing magnetic fields. A transparent container filled with a polarizing liquid and placed inside the magnet gap may reveal, for example, the field pattern in a quadrupole by observation through polarization filters placed at each end of the magnet. The rotation of the plane is proportional to the field strength and the length of the polarizing medium and may give a certain indication of the field geometry. This measurement principle has proved useful for measuring transient magnetic fields (83,84). It is less convincing when applied to the precise determination of magnet geometry, even though modern image processing techniques might improve the method substantially.

Floating Wire Method

Floating wire measurements were quite popular in the past (85). If a current-carrying conductor is stretched in a magnetic field, it will curve subject to the electromagnetic force and describe the path of a charged particle whose momentum corresponds to the current and the mechanical tension in the wire. A flexible annealed aluminium wire was used to reduce the effects of stiffness and gravity. This method has now been entirely replaced by precise field mapping and simulation of particle trajectories by computer programs.

Measurements Based on Particle Beam Observation

A method for precisely measuring the beam position with respect to the magnetic center of quadrupole magnets installed in particle accelerators has been developed during the last decade (86,87). The procedure consists of modulating the field strength in individual lattice quadrupoles while observing the resulting beam orbit oscillations. Local dc orbit distortions are applied in the search for the magnetic center. This so-called K-modulation provides perfect knowledge of the location of the particle beam with respect to the center of a quadrupole. In addition, it may provide other very useful observations for operating and adjusting the accelerator (88). This is obviously of particular importance for superconducting accelerators (89). It is very difficult to provide a superconducting quadrupole magnet that has a direct optical reference to its magnetic center, so errors caused by changes of temperature profiles and other phenomena may build up as time passes. The method may be further improved by synchronous detection of the oscillation, so that its phase can be identified. The sensitivity of the detection is impressive. Experience from LEP (90) showed that an absolute accuracy of 0.05 mm could be obtained in both the vertical and horizontal planes. Furthermore, it was observed that modulation of the quadrupole field by about 300 ppm could be clearly detected, which means that measurements may be carried out on colliding beams while particle physics experiments are taking place. This measurement method also played an important role in adjusting the so-called Final Focus Test Beam at the Stanford Linear Accelerator Center (SLAC) (91,92).
Magnetoinductive Technology

Magnetic field sensors have been developed based on a change in inductance (L) caused by an applied magnetic field (93). These sensors, referred to as magnetoinductive sensors, contain an alloy whose permeability changes linearly over the sensors' useful range in an applied magnetic field. When the alloy is incorporated into an inductor, the inductance will change as the applied field changes. Magnetoinductive sensors contain a resonant LC circuit; as the applied field changes, so does the resonant frequency of the circuit. These devices have a dynamic range of ±1,000 µT and an accuracy of ±0.4 µT.

CONCLUDING REMARKS

Proven measurement methods and powerful equipment are readily available for most of the measurement tasks related to beam-guiding magnets as well as for spectrometer magnets. Therefore, it is prudent to examine existing possibilities carefully before launching the development of a more exotic measurement method. Many unnecessary costs and unpleasant surprises can be avoided by choosing commercially available instruments. The measurement methods described are complementary, and a combination of two or more of them will certainly meet most requirements. In the field of new technologies, two methods merit consideration. Magnetic resonance imaging is a promising technique that could find lasting application. The use of superconducting quantum interference devices (SQUIDs) might also, in the long run, become an interesting alternative as an absolute standard and for measuring weak fields (94,95). The complexity of these methods still prevents their routine laboratory use.

BIBLIOGRAPHY

1. J. L. Symonds, Rep. Progr. Phys. 18, 83–126 (1955).
2. C. Germain, Nucl. Instr. Meth. 21, 17–46 (1963).
3. L. W. McKeehan, J. Opt. Soc. Am. 19, 213–242 (1929).
4. S. Turner, ed., Proceedings, CERN Accelerator School, Measurement and Alignment of Accelerator and Detector Magnets, CERN 98-05, Geneva, Switzerland, 1998.
5. J. J. Rabi, J. R. Zacharias, S. Millman, and P. Kusch, Phys. Rev. 53, 318 (1938).
6. J. J. Rabi, S. Millman, P. Kusch, and J. R. Zacharias, Phys. Rev. 55, 526–535 (1939).
7. E. M. Purcell, H. C. Torrey, and R. V. Pound, Phys. Rev. 69, 37–38 (1946).
8. F. Bloch, W. W. Hansen, and M. Packard, Phys. Rev. 69, 127 (1946).
9. F. Bloch, W. W. Hansen, and M. Packard, Phys. Rev. 70, 474–485 (1946).
10. N. Bloembergen, E. M. Purcell, and R. V. Pound, Phys. Rev. 73, 679–712 (1948).
11. K. Borer and G. Fremont, CERN 77, 19–42 (1977).
12. W. G. Clark, T. Hijmans, and W. H. Wong, J. Appl. Phys. 63, 4,185–4,186 (1988).
13. R. Prigl et al., Nucl. Instr. Meth. A374, 118–126 (1996).
14. W. G. Clark, J. M. Moore, and W. H. Wong, Proc. of the 2nd Int. Ind. Symp. on the Supercollider, Miami Beach, 1990, pp. 405–414.
15. G. Suryan, Proc. Indian Acad. Sci. A33, 107–111 (1951).
16. C. Sherman, Rev. Sci. Instr. 30, 568–575 (1959).
17. J. M. Pendlebury et al., Rev. Sci. Instr. 50, 535–540 (1979).
18. L. Jansak et al., Proceedings of the Int. Conf. on Measurement, Measurement 97, Smolenice, Slovakia, May 1997, pp. 328–331.
19. D. Bourdel, J. Pescia, and P. Lopez, Rev. Phys. Appl. 5, 187–190 (1970).
20. F. Hartmann, IEEE Trans. Magn. MAG-8(1), 66–75 (1972).
21. D. Duret et al., Rev. Sci. Instr. 62(3), 685–694 (1991).
22. N. Kernevez, D. Duret, M. Moussavi, and J.-M. Leger, IEEE Trans. Magn. 28(5), 3,054–3,059 (1992).
23. D. A. Gross, Proc. of the ICFA Workshop on Superconducting Magnets and Cryogenics, Brookhaven National Lab., Upton, May 1986, pp. 309–311.
24. J. H. Coupland, T. C. Randle, and M. J. Watson, IEEE Trans. Magn. MAG-17, 1,851–1,854 (1981).
25. W. Weber, Ann. Physik 2, 209–247 (1853).
26. W. F. Brown and J. H. Sweer, Rev. Sci. Instr. 16, 276–279 (1945).
27. A. Daël et al., Int. J. Mod. Phys. A 2B (HEACC'92), 650–652 (1993).
28. E. A. Finlay, J. F. Fowler, and J. F. Smee, J. Sci. Instr. 27, 264–270 (1950).
29. B. C. Brown, Proc. of the ICFA Workshop on Superconducting Magnets and Cryogenics, Brookhaven National Lab., Upton, May 1986, pp. 297–301.
30. B. de Raad, Thesis, Delft, 1958, pp. 55–67.
31. W. C. Elmore and M. W. Garrett, Rev. Sci. Instr. 25, 480–485 (1954).
32. I. E. Dayton, F. C. Shoemaker, and R. F. Mozley, Rev. Sci. Instr. 25, 485–489 (1954).
33. C. Wyss, Proc. 5th Int. Conf. on Magnet Technology (MT-5), Frascati, Italy, 1975, pp. 231–236.
34. G. H. Morgan, Proc. 4th Int. Conf. on Magnet Technology, Brookhaven National Lab., Upton, 1972, pp. 787–790.
35. L. Walckiers, CERN Accelerator School, Montreux, Switzerland, CERN 92-05, 138–166 (1992).
36. W. G. Davies, Nucl. Instr. Meth. A311, 399–436 (1992).
37. H. Brück, R. Meinke, and P. Schmüser, Kerntechnik 56, 248–256 (1991).
38. P. Schmüser, CERN Accelerator School, Montreux, Switzerland, CERN 92-05, 240–273 (1992).
39. M. I. Green, R. Sponsel, and C. Sylvester, Proc. of the 5th Int. Ind. Symp. on the Supercollider, San Francisco, 1993, pp. 711–714.
40. R. Thomas et al., Proc. of the 5th Int. Ind. Symp. on the Supercollider, San Francisco, 1993, pp. 715–718.
41. J. Billan et al., IEEE Trans. Magn. 30(4) (MT-13), 2,658–2,661 (1994).
42. J. Buckley, D. Richter, L. Walckiers, and R. Wolf, IEEE Trans. Appl. Supercond. 5(2) (ASC'94), 1,024–1,027 (1995).
43. J. Billan, S. De Panfilis, D. Giloteaux, and O. Pagano, IEEE Trans. Magn. 32(4) (MT-14), 3,073–3,076 (1996).
44. D. Zangrando and R. P. Walker, Nucl. Instr. Meth. A376, 275–282 (1996).
45. M. E. Grassot, J. Phys. 4, 696–700 (1904).
46. R. F. Edgar, Trans. Am. Inst. Elect. Eng. 56, 805–809 (1937).
47. G. K. Green, R. R. Kassner, W. H. Moore, and L. W. Smith, Rev. Sci. Instr. 24, 743–754 (1953).
48. E. H. Hall, Amer. J. Math. 2, 287–292 (1879).
49. W. Peukert, Elektrotechn. Zeitschr. 25, 636–637 (1910).
50. G. L. Pearson, Rev. Sci. Instr. 19, 263–265 (1948).
51. H. Welker, Z. Naturforschung 7a, 744–749 (1952).
52. H. Welker, Elektrotechn. Zeitschr. 76, 513–517 (1955).
53. E. Acerbi et al., IEEE Trans. Magn. MAG-17 (MT-7), 1,610–1,613 (1981).
54. C. Bazin et al., IEEE Trans. Magn. MAG-17, 1,840–1,842 (1981).
55. D. Swoboda, IEEE Trans. Magn. MAG-17, 2,125–2,128 (1981).
56. M. Tkatchenko, private communication.
57. S. Kawahito, S. O. Choi, M. Ishida, and T. Nakamura, Sensors and Actuators A 40, 141–146 (1994).
58. J. Kvitkovic and M. Majoros, 6th European Magn. Mater. Appl. Conf., Vienna, Austria, 1995.
59. K. Brand and G. Brun, CERN 79, 02–24 (1979).
60. M. W. Poole and R. P. Walker, IEEE Trans. Magn. MAG-17, 2,129–2,132 (1981).
61. C. Goldberg and R. E. Davis, Phys. Rev. 94, 1,121–1,125 (1954).
62. J. Hauesler and H. J. Lippmann, Solid State Electron. 11, 173–182 (1968).
63. J. J. Donoghue and W. P. Eatherly, Rev. Sci. Instr. 22, 513–516 (1951).
64. C. D. Cox, J. Sci. Instr. 41, 695–691 (1964).
65. K. R. Dickson and P. Galbraith, CERN 85, 13–42 (1985).
66. J. Babiskin, Phys. Rev. 107, 981–992 (1957).
67. H. P. R. Frederikse and W. R. Hosler, Phys. Rev. 110, 880–883 (1958).
68. M. Polak, Rev. Sci. Instr. 44, 1,794–1,795 (1973).
69. M. Polak and I. Hlasnik, Solid State Electron. 13, 219–227 (1970).
70. J. Kvitkovic and M. Polak, Eur. Conf. on Appl. Superconductivity (EUCAS), Göttingen, Germany, 1993, pp. 1,629–1,632.
71. J. L. Walsh, J. H. Ahlberg, and E. N. Nilson, J. Math. Mech. 11, 225–234 (1962).
72. A. Ralston and H. Wilf, eds., Mathematical Methods for Digital Computers, vol. 2, John Wiley & Sons, Inc., New York, NY, 1967, pp. 156–168.
73. J. M. Kelly, Rev. Sci. Instr. 22, 256–258 (1951).
74. D. I. Gordon and R. E. Brown, IEEE Trans. Magn. MAG-8, 76–82 (1972).
75. F. Primdahl, J. Phys. E: Sci. Instr. 12, 241–253 (1979).
76. O. V. Nielsen, T. Johansson, J. M. Knudsen, and F. Primdahl, J. Geophys. Res. 97, 1,037–1,044 (1992).
77. W. Thomson, Philos. Trans. 146, 736–751 (1856).
78. P. Kapitza, Proc. R. Soc. A 119, 358 (1928).
79. E. Welch and P. R. Mace, Proc. 3rd Int. Conf. on Magn. Technol., Hamburg, Germany, 1970, pp. 1,377–1,391.
80. G. Brouwer et al., Nucl. Instr. Meth. A313, 50–62 (1992).
81. J. K. Cobb and J. J. Muray, Nucl. Instr. Meth. 46, 99–105 (1967).
82. D. Trbojevic et al., 1995 Particle Accelerator Conf., Dallas, May 1995, pp. 2,099–2,021.
83. J. Malecki, M. Surma, and J. Gibalewicz, Acta Phys. Polon. 16, 151–156 (1957).
84. J. L. Robertson, D. T. Burns, and D. Hardie, Nucl. Instr. Meth. 203, 87–92 (1982).
85. L. G. Ratner and R. J. Lari, Proc. Int. Symp. on Magn. Technol., Stanford, 1965, pp. 497–504.
86. D. Rice et al., IEEE Trans. Nucl. Sci. NS-30, 2,190–2,192 (1983).
87. P. Rojsel, Nucl. Instr. Meth. A343, 371–382 (1994).
88. R. Brinkmann and M. Boge, 4th Euro. Particle Accelerator Conf., London, 1994, pp. 938–940.
89. J. Deregel et al., CERN LHC Project Report 4, March 1996.
90. I. Barnett et al., 6th Beam Instrumentation Workshop, Vancouver, Canada, 1994.
91. F. Bulos et al., 1991 Particle Accelerator Conf., San Francisco, May 1991, pp. 3,216–3,218.
92. P. Tenenbaum et al., 1995 Particle Accelerator Conf., Dallas, May 1995, pp. 2,096–2,098.
93. N. H. Kim and T. Hawks, Digital Compass and Magnetometer Having a Sensor Coil Wound on a High Permeability Isotropic Core, US Pat. 4,851,775.
94. G. L. Romani, Proc. 9th Int. Conf. on Magn. Technol. (MT-9), Zurich, Switzerland, 1985, pp. 236–242.
95. D. Drung, Eur. Conf. Appl. Superconductivity (EUCAS), Göttingen, Germany, 1993, pp. 1,287–1,294.

MAGNETIC RESONANCE IMAGING

ROBERT W. PROST
Medical College of Wisconsin
Milwaukee, WI
INTRODUCTION

Imaging of materials that contain nuclei of nonzero spin has become a valuable tool in science, medicine, and industry. The principal use of MRI is to create an image of the distribution of certain physical properties of a substance, whether in the fluid-filled rock surrounding an oil well (1–4), soil (5), food processing (6–11), industrial process control (12), or a human subject (13,14). The physical properties imaged include the chemistry of the material (15), its fluidity (9), its interactions with other materials in the molecular-level neighborhood (8), and even the flow of liquids within the object (16). Magnetic resonance phenomena occur on a timescale that is short relative to many chemical reactions and can often be used to investigate the kinetics of chemical reactions (17,18). Many different elements are amenable to investigation by magnetic resonance, but the sensitivity of MR is poor. Typically, only one out of 100,000 nuclei can be detected in a magnetic field of 15,000 Gauss or 1.5 Tesla (19). This is analogous to detecting a single voice in the crowd at a University of Michigan football game (capacity 109,000)! As a result of this insensitivity, magnetic resonance is not suited to detecting chemical moieties at less than millimolar concentrations. For the same reason, gases are largely unsuitable for investigation by magnetic resonance, except for the hyperpolarized gases 3He and 129Xe (20,21). All solid or liquid objects that bear suitable nuclei can be imaged. However, spins within solid materials
lose their signal on a very short timescale. This brevity of signal lifetime makes imaging quite difficult, yet not impossible (22). Examples of nuclei that bear a magnetic moment, or nonzero spin, include hydrogen, helium-3, xenon-129, phosphorus, lithium, fluorine, carbon-13, and sodium-23. An important and recent use of the MR phenomenon is in medical imaging. As of 1999, more than 4,500 MRI systems were installed in the United States (23). The reason for the success of MRI in medical imaging is fourfold. The first is the abundance of hydrogen in the human body. Hydrogen nuclei in the brain are about 70 molar in concentration (24), which is much higher than that of any other material that can be detected by NMR. The second is that the hydrogen is distributed in the organs in a way that allows the MR signal to be used to create images. The third is that imaging of hydrogen produces exquisitely detailed images of internal organ structure. The fourth is the functional data that can be extracted. These data include molecular diffusion (25), distribution of chemical moieties (26), flow (27), flow in the capillary bed (28), oxygen utilization (29,30), magnetization transfer rates (31,32), and disruption of the blood–brain barrier (33).
Figure 1. (a) Cartoon of nuclei showing alignment parallel and antiparallel to the main magnetic field B0. (b) Quantum mechanical description of the spin system in the degenerate (B0 = 0) and nondegenerate (B0 > 0) states, where the energy-level difference between states, ΔE = γħB0, is proportional to B0.
BASIC MRI PHYSICS

A constant magnetic field applied to a group or ensemble of spins creates a slight polarization, where the ratio of spins antiparallel to B0 to spins parallel to B0 is given by (34)

$$\frac{N_-}{N_+} = e^{-\gamma \hbar B_0 / kT}, \qquad (1)$$

where N− is the number of spins antiparallel to B0, N+ is the number of spins parallel to B0, γ is the gyromagnetic ratio of the nucleus, ħ is the reduced Planck constant, k is Boltzmann's constant, and T is the sample temperature. This effect can be visualized as a tendency to align parallel or antiparallel to B0. A classical conception suggests that this is caused by the torque exerted on the intrinsic magnetic field of the spin by the constant magnetic field B0 (Fig. 1a). The polarization proportional to B0 is very slight and accounts for only one out of 100,000 spins for hydrogen nuclei in a 1.5 Tesla magnet at room temperature. It is only this minuscule fraction of the spin ensemble from which the MR signal can be created. The reason that the polarization is so slight is that the magnetic field competes with the randomizing effect of the background thermal vibrations. Lowering the sample temperature can dramatically increase the polarization of the spin ensemble. Usually, this is not an option in intact biological systems.
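The one-in-100,000 figure quoted in the introduction can be checked directly from Eq. (1) using standard physical constants; the room-temperature value assumed below is an assumption of this sketch.

    import math

    gamma = 2.675e8    # proton gyromagnetic ratio, rad s^-1 T^-1
    hbar = 1.0546e-34  # reduced Planck constant, J s
    k = 1.381e-23      # Boltzmann constant, J/K
    B0, T = 1.5, 300.0 # 1.5 T field, room temperature in kelvin

    ratio = math.exp(-gamma * hbar * B0 / (k * T))  # N-/N+ from Eq. (1)
    print(1.0 - ratio)  # ~1e-5: roughly one spin in 100,000 contributes to the net polarization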
The correct description of the magnetic resonance phenomenon requires extensive use of quantum mechanics. The quantum mechanical description of the effect of B0 is to split the energy levels of the spin system (Fig. 1b) (34). All further processes can be mathematically described by the evolution of the spin states (35). However, the quantum mechanical description does not provide an intuitive feeling for magnetic resonance phenomena. In 1946, Felix Bloch developed a phenomenological description of magnetic resonance, which visualizes the sum of the spins as a rotating vector (Fig. 2) that precesses around the direction of B0 (36). The rate of precession f is proportional to B0, as described by the Larmor equation:

$$f = \gamma B_0. \qquad (2)$$
The constant of proportionality γ is known as the gyromagnetic ratio. The value of γ is a property of the magnetic moment of the nucleus and thus is different
for all nuclei. The gyromagnetic ratio of hydrogen is 42.5759 MHz/Tesla. Therefore, if the main magnetic field is 1.5 Tesla, the frequency of precession is 63.86 MHz. This frequency is the resonance condition. Nuclei that have zero spin, such as carbon-12, have γ = 0 and thus do not exhibit the resonance condition.

Figure 2. In the phenomenological model proposed by Bloch, the spin precesses about the B0 vector; the smaller circle illustrates the rotation of the sphere of charge that is the spin, which gives rise to its intrinsic magnetic moment. The coordinate system used in this diagram is arbitrary but will be used throughout this article for consistency.

Excitation and Reception

To produce a detectable signal, the spin vector M0 must be rotated at least partially into the transverse plane (Fig. 3), so that it creates an induced signal that can be detected by an antenna.

Figure 3. The B1 (RF transmission) field in the laboratory frame of reference.

To simplify the explanation of the way this is done, consider the rotating frame of reference. If one imagines the observer rotating around the z axis at the Larmor frequency, then the precession of the aligned spins becomes stationary (Fig. 4).
The rotating frame of reference will be used for all further description of spin motion in this article. In the rotating frame, the B0 field appears to have vanished because all of the precession is provided by the rotation of the frame. This ability to "ignore" B0 in the rotating frame is one of the principal values of this approach. To effect the rotation of M0, a second magnetic field, denoted B1, is created in the transverse plane (37) in the rotating frame. Because the B1 field has to rotate at the Larmor frequency, it is created by a radio-frequency pulse whose frequency is at or near the resonant frequency f = γB0. The B1 field is also known as an RF pulse because the duration of the pulse is usually quite limited, typically 1–10 ms. In the rotating frame, it is easy to see why this B1 field will rotate the spins into the transverse plane. Because the B0 field has vanished in the rotating frame, the spins simply precess about B1. After a short time, they have rotated perpendicularly to their original direction, at which time the B1 field can be switched off.

Figure 4. Effect of the B1 field on the magnetic moment (Mz) in the rotating frame of reference. The torque provided by the B1 field rotates the Mz vector into the transverse (x, y) plane, where it becomes Mxy.

The B1 field is created by transmitting the radio-frequency pulse into an antenna. Antennas used in radio communication are designed to radiate power by creating electromagnetic waves. The antennas used in magnetic resonance are designed to produce a pure magnetic field (38). This is important because the nuclear spins do not respond to an electric field such as that produced by a dipole communication antenna. In many instances, the antenna used for excitation is also used to receive the MR signal emitted by the sample. For historical reasons, MR antennas are referred to as coils. A coil positioned so that its normal vector is perpendicular to the B0 field, as shown in Fig. 5, has a current induced in it by the transverse component of the magnetization vector Mxy (39).

Figure 5. Spin system in the rotating frame of reference showing the induction of a signal in an RF coil whose normal vector is in the transverse plane.

The graph in Fig. 6a illustrates the received signal when the resonant frequency of the spins (= γB0) is set the same as the operating frequency of the transceiver. If the operating frequency of the transceiver is changed by 100 Hz, the resulting signal will be as shown in Fig. 6b. Importantly, the same effect can be created by changing the B0 field by (100 Hz/γ) gauss.

Figure 6. Transverse magnetization decreases with time in (a) the rotating frame at resonance and (b) the rotating frame 100 Hz off-resonance.

It can be seen in this example that setting the frequency of operation to that of the spin system allows observing the spins in the "rotating frame of reference," where the magnetization vector no longer precesses about B0.
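The rotation angle produced by such a pulse is not given explicitly here; for a simple rectangular pulse it is the product of γ (in rad s^-1 T^-1), B1, and the pulse duration, a standard relation assumed for this sketch. The example below estimates the B1 amplitude needed to rotate protons by 90° with a 1-ms pulse.

    import math

    gamma = 2.675e8     # proton gyromagnetic ratio, rad s^-1 T^-1
    tau = 1.0e-3        # pulse duration, s (within the 1-10 ms range mentioned above)
    flip = math.pi / 2  # desired rotation, 90 degrees in radians

    b1 = flip / (gamma * tau)  # required B1 amplitude for a rectangular pulse, T
    print(b1)                  # roughly 6e-6 T, i.e., a few microtesla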
Slight frequency changes then make the magnetization vector appear to move forward or back. Though spins are not always on resonance, this picture is instructive and makes understanding the spin system simple within the framework of the phenomenological Bloch model. The power of the excitatory signal may be of the order of 10,000 watts for MR of the body at 1.5 Tesla, but the received signal is tiny, of the order of 10^-12 watts. A signal this small is easily overwhelmed by stray fields from broadcast television, radio, and computer sources. Therefore, the MR procedure is always conducted within an electrically shielded enclosure like a Faraday cage.

The Free Induction Decay

The signal shown in Fig. 6, usually referred to as a free induction decay signal (FID), is the signal produced by the transverse component of the spins, Mxy, immediately after excitation. The frequency of precession for a spin is determined only by its local magnetic field. That field is the
sum of the imposed B0 field and typically far smaller fields. The smaller fields are the result of the nonuniformity of the B0 field and other objects that distort the main field. The field can be distorted by magnetic objects, be they ferromagnetic, paramagnetic, or diamagnetic. The field can also be distorted by surfaces between regions of strongly different magnetic susceptibility such as that between the sample and the surrounding air. The main magnetic fields used in MR must be highly spatially homogeneous. A typical MR magnet used in clinical imaging may have an inhomogeneity of less than one part per million (ppm) across a 20-cm diameter spherical volume. The 1 ppm inhomogeneity translates to a difference of 0.015 Gauss on a 1.5 T magnet. In a group of spins within a sample, some spins will precess over a range of frequencies. In the rotating frame then, some spins will appear to precess in a forward direction, and others appear to precess backward. On a timescale from microseconds to seconds, the spins will spread out to form a uniform disk in the transverse plane, as in Fig. 7.
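The numbers quoted above can be verified in a few lines, which also express the 1-ppm inhomogeneity as the spread of precession frequencies that actually dephases the spins (the conversion uses the hydrogen gyromagnetic ratio given earlier).

    B0 = 1.5              # main field, T
    ppm = 1.0e-6          # specified inhomogeneity
    gamma_hz = 42.5759e6  # hydrogen gyromagnetic ratio, Hz per tesla

    delta_b_gauss = B0 * ppm * 1.0e4  # 1 T = 10,000 gauss
    delta_f = gamma_hz * B0 * ppm     # spread of Larmor frequencies, Hz

    print(delta_b_gauss, delta_f)     # 0.015 gauss and about 64 Hz across the volume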
Figure 7. Mxy versus time for a group of spins on resonance. Increasing time allows the spins to disperse: some spins precess faster and others slower, causing the sum vector Mxy to decrease exponentially with time. This is known as transverse relaxation.

When the spins are uniformly spread in the transverse plane, no signal is induced in the coil because there is no change in the net magnetic vector Mxy with time.
Echo Formation
The Gradient Echo. The FID is not the only method to receive a signal from the sample. The spin system may be manipulated to change the time at which Mxy is maximum. The signal thus formed is referred to as an echo, similar to the echo received in a radar system (40). For a gradient echo, the B1 pulse tips the M0 into the transverse plane as for the FID, but then a gradient magnetic field is applied that produces a strong dispersion of precessional frequencies in the sample, as in Fig. 8. So dispersed, no net Mxy vector remains, and the signal vanishes. The dispersion is a linear function of the distance from the origin in the direction of the gradient. By later reversing the sign of the magnetic gradient field which was applied, coherence is restored (41). At long sequence repetition times Tr, the amplitude of the signal is governed by a characteristic decay rate of the spin system called T2*. At short Tr times, where the gradient echo is often used, image contrast can become a mixture of T1 and T2* contrast. At short Tr times, significant Mxy may remain that will then interact with the following B1 pulse and generate a signal that interferes with the desired signal, causing image ghosting. To discard this signal, a spoiling gradient may be inserted at the end of the pulse sequence. More effectively, the phase of the B1 pulse can be varied. Alternatively, the interfering signal can be collected instead of the gradient echo. This signal, formed by the interaction of successive B1 pulses, is strongly T2 weighted. The family of pulse sequences based on this method is called steady-state free precession (SSFP) (42) or fast imaging with steady precession (FISP) (43). The available signal-to-noise of the technique is low due to the low flip angle of the B1 pulses used.
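The dephasing and rephasing described above can be illustrated with a toy one-dimensional simulation (relaxation is ignored, and the gradient strength, positions, and timings are arbitrary): spins at different positions accumulate phase under the dephasing lobe and return to coherence once the reversed readout lobe has cancelled it.

    import numpy as np

    gamma = 2.675e8                  # rad s^-1 T^-1
    x = np.linspace(-0.1, 0.1, 201)  # spin positions along the gradient axis, m
    g = 10.0e-3                      # gradient strength, T/m
    dt = 1.0e-5                      # time step, s

    phase = np.zeros_like(x)
    signal = []
    for step in range(300):
        grad = -g if step < 100 else +g  # dephasing lobe, then the reversed readout lobe
        phase += gamma * grad * x * dt
        signal.append(abs(np.exp(1j * phase).mean()))

    print(int(np.argmax(signal)))  # the net signal peaks again near step 200, when the gradient areas cancel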
Figure 8. Echo formation in a gradient echo sequence. The effect of the frequency-encoding gradient (Gd) is to add signal phase to the spins proportional to the distance from the isocenter in the direction of the gradient. The readout gradient (Gr) rewinds the phase, causing the echo to peak at the center of the readout time, Te. The signal phase for three spins spaced out in the direction of the readout gradient is also plotted.

The Spin Echo. Inhomogeneities in the magnetic field are not the only source of decay in the FID or gradient echo signal. Spins that are close to one another can exchange energy and lose coherence. Additionally, spins that move between the time the system is excited and the time it is sampled will experience different local magnetic fields, either from local inhomogeneities or from imposed gradient magnetic fields. This is a diffusional motion. If one imposes another B1 pulse (Fig. 9a) at a point halfway between the first B1 pulse and the echo time, the spin system can be refocused (44,45). The second B1 pulse is a 180° pulse and is twice as strong as the first pulse, a 90° pulse. The action of the 180° pulse can be thought of as follows. Imagine that the disk of spins that are spreading out is flipped over (Fig. 9b). The spins, still traveling in the same direction, are now heading back toward the point in the transverse plane where they were an instant after the initial 90° pulse. At a time equal to that between the 90° pulse and the 180° pulse, the spins will again reach the point of coherence (Fig. 9c). The signal which was lost due to magnetic field inhomogeneity is regained because spins retarded or advanced by local errors in the magnetic field all arrive together. The only signal lost in the spin-echo method comes from those spins that move microscopically (diffusion) or macroscopically (gross motion of the sample). The method of the spin echo was first described by Erwin Hahn in 1950 (46).
Figure 9. Echo formation in a spin-echo sequence. (a) The initial RF90 pulse rotates Mz into Mx. (b) The spins lose transverse coherence. (c) The RF180 pulse reverses the direction of the diffusing spins. (d) The center of received signal coherence occurs at time Te.
The Stimulated Echo. An RF180 pulse can be used to refocus the spins, and as Hahn showed in his 1950 paper on the spin echo, a train of 90° RF pulses can produce a similar effect. If the RF180 pulse in the spin-echo sequence is replaced by two RF90 pulses, the result will be as shown in Fig. 10. The Mz magnetization is first converted to the transverse plane. The second RF90 pulse converts the Mxy magnetization to −Mz, that is, the magnetization is now directed along −z. The third RF90 pulse then rotates the −Mz magnetization back into the Mxy plane, where it can be detected. The value of this sequence is that no T2 decay effect occurs in the period between the second and third RF pulses because there is no transverse component. This period can then be used to alter the spin system in some manner. One example of this is to add a water suppression pulse in this time when stimulated echoes are used for proton spectroscopy (47).

Figure 10. Echo formation in a Hahn echo or stimulated echo sequence. (a) Spins are initially aligned along B0. (b) The first RF90 pulse rotates the spin vector to become Mx. (c) A second RF90 pulse rotates Mx along −z to become −Mz. The interval shown between (c) and (d) is a longitudinal storage time and does not count against echo time because there is no transverse component to dephase. (d) The third and final RF90 pulse rotates −Mz to become −Mx, where it can be read out.

MR Parameters

Magnetic resonance is an extremely useful tool despite its limited sensitivity. The great utility of MR is derived from the many contrast mechanisms that are contained in the MR signal. All other modalities in diagnostic medical imaging have a two-dimensional image space. For example, in computed tomography (CT), the image dimensionality is space versus atomic density. In ultrasound, it is space versus acoustic reflectivity. In nuclear medicine, it is space versus density of radionuclide decays. In MR, however, the contrast space is vast. A partial listing gives space versus spin density, T1, T2, T2*, flow, diffusion, perfusion, magnetization transfer, and metabolite density. The challenge in using MR is selecting spin system preparative methods that present advantageous contrast. Sometimes these are combinations of the various basic contrasts, and at other times, they may be a single, pure contrast.
Spin Density. Spin density is the MR contrast parameter most analogous to CT. The spin density is simply the number of spins in the sample that can be detected. The observed spin density in medical imaging is always less than the actual spin density because many spins are bound and lose signal before they can be observed.

T1 Relaxation. After the spins are tipped into the transverse plane, the spin vector along B0 (called Mz) is depleted. If the spins are not tipped into the transverse plane again, Mz regrows exponentially toward M0 with a time constant T1:

Mz = M0 · (1 − e^(−Tr/T1)),    (3)
where Tr is the time between repetitions of the excitation and T1 is the longitudinal relaxation time. The Mz component, also called longitudinal magnetization, is the source of the signal for the experiment. If Mz = 0, the next repetition of the experiment will yield no signal because there will be no Mz to rotate into the transverse plane. The Mz vector reaches 99% of its equilibrium value M0 when the Tr time is five times T1. The mechanism for relaxation is the interaction between the excited spins and the stationary spins, referred to as the lattice. The amount of interaction with the lattice is the determining factor in T1. In biological systems, water molecules come into frequent contact with the lattice and thus have relatively short T1 times (<1 second). The T1 times for biological tissues tend to increase as the inverse square of the field strength (48). For simple isotropic motion, T1 is given by

1/T1 ≈ B [ τb/(1 + ω²τb²) + 4τb/(1 + 4ω²τb²) ],    (4)
where B is a measure of the interactive strength between the spin and the lattice, ω is the resonant frequency, given by the Larmor equation, and τb is the correlation time between the spin and the lattice. In a gas, there is no lattice, so polarized gas molecules have T1 times of the order of days. In solids, the spins are relatively stationary and so do not experience frequent changes in contact with the lattice. Therefore, the T1 times in solids are also long. The T1 time of a sample increases as B0 increases.
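As a quick numerical check of Eq. (3), the short Python sketch below (illustrative only; the T1 value is an arbitrary assumption, not a tissue measurement) evaluates the recovered fraction of Mz at several Tr values and reproduces the 99% figure at Tr = 5 × T1.

    import numpy as np

    T1 = 0.9                          # assumed longitudinal relaxation time, seconds
    for n in (1, 2, 3, 5):            # repetition time as a multiple of T1
        Tr = n * T1
        frac = 1.0 - np.exp(-Tr / T1)     # Mz/M0 from Eq. (3)
        print(f"Tr = {n} x T1: Mz recovers to {frac:.1%} of M0")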
T2* Relaxation. The decay of the Mxy signal after the initial excitation is characterized by the time constant T2* and can be appreciated in the graph of Fig. 7. This decay rate governs the amplitude of the signal in gradient echo experiments, despite the fact that the signal is sampled as a delayed echo, not an FID. The signal from the sample is a function of the delay from excitation, known as the echo time, Te:

S(Te) = S(0) · e^(−Te/T2*).    (5)
The T2* time constant is a function of both the reversible losses due to magnetic field inhomogeneity and the irreversible losses, T2, from diffusional motion (49):

1/T2* = 1/T2 + γΔB0.    (6)
Here the magnetic field inhomogeneity contribution is given by the term γΔB0.
T2 Relaxation. By adding the 180° B1 pulse as previously noted, the reversible portion of the T2* decay can be recovered by refocusing it into the signal. Then, the signal decreases with time as

S(Te) = S(0) · e^(−Te/T2).    (7)
Note that T2* is always less than or equal to T2.

Localization Principles and Techniques

Localizing the origin of the MR signal is essential in any system where information is desired from less than the entire object. In a high resolution NMR spectrometer, such as is used in chemistry labs, the sample is confined to a small tube and is nominally homogeneous. The magnetic resonance of larger objects such as animals or humans requires that the signal be acquired from a subregion of the object. Localization techniques presently used fall into two categories: localization by the B1 field and localization by magnetic field gradients. In the first case, an RF coil smaller than the sample is used to acquire a signal only from the portion of the object close to the coil (50,51). This has been used in MR spectroscopy, but it is not a method for making MR images. The use of magnetic field gradients permits localization in three dimensions. The gradient is created by large diameter coils that are wound as Maxwell pairs (52,53). By passing a current through the coil pair, a magnetic field, directed along the main magnetic field, is created whose intensity is a function of position. The coil geometry is such that the physical center of the coil experiences a field intensity of zero, and the intensity of the field away from the coil center is a linear function of distance along one axis. The direction of the field created by the gradient coil is parallel to the main field, as shown in Fig. 11. Only alterations of the main magnetic field result in a change in the resonant frequency of the spin at that location. All three gradients, X, Y, and Z, produce changes in the main field that vary in intensity in the x, y, or z directions, respectively.

The first method used for localization was designated the sensitive point method (54). This was accomplished by simultaneously applying gradients in three orthogonal dimensions. By adjusting the strength of the gradients, only one small volume in space remains on resonance. After collecting the signal from the spins at that location, the gradients are adjusted to shift the resonant point, and the signal from the newly resonant spins is collected. The speed with which the voxel can be shifted is limited by the T1 times of the sample. A single image that has a resolution of 256 × 256 and a Tr time of 1 second requires 65,536 seconds (more than 18 hours) to acquire.

A second method, called line scanning, uses two gradients to create a right rectangular prism of resonant spins (55,56). Spatial localization along the length of the prism was accomplished by frequency encoding, where the origin of the signal was determined by the resonant frequency of the spin.
Figure 11. Magnetic field gradient imposed in the x direction. Notice that the resonant frequency of the spin changes linearly with the position in the x direction. A common misconception is that the gradient produces a field that is pointed in the direction of the gradient. This is not correct. The gradient field direction is along z, regardless of the direction of the gradient. Only modifications of the B0 field (in the z direction) affect spin precession.
Figure 12. A complete spin-echo sequence that uses localization gradients to image a single slice. Only a single value of a phase encoding gradient is shown. To form an image, the sequence must be repeated using a unique value of phase encoding for each line of data in the two-dimensional k-space matrix.
The gain in efficiency for the line scanning method over the sensitive point method is due to the ability to collect data in a single signal from all points along the line. Signal locations were then distinguished by using the Fourier transform.

The third method, which is by far the most widely used, is the "spin-warp" (57) or Fourier transform imaging method. This method enjoys significant improvements in efficiency over the point and line scan methods due to the way in which the image is sampled. In contrast to the sensitive point method, where a single data point samples only the spins at that physical location, data points in Fourier transform imaging sample a phase and frequency for the entire slice. In the two-dimensional version, a slice is first excited by a shaped RF pulse that is transmitted while a gradient is imposed; this gradient is known as the slice selection gradient (Fig. 12). If the slice selection gradient were missing, the entire object would be excited. To image a slice, a B1 pulse is transmitted during the time that a gradient is applied. The bandwidth of the B1 pulse is typically limited by using a modulation envelope of the form sin(x)/x (this is called a "sinc" function, and it contains only a limited band of frequencies). Therefore, only the spins within a slab will be excited. The slab thickness is a function of both the amplitude of the slice gradient and the bandwidth of the B1 pulse, either of which may be changed to alter the slab thickness. In two-dimensional imaging, no further localization is done across the slice. The slice selection process creates a phase dispersion across the slice that, if uncorrected, causes a complete loss of signal. Therefore, a
slice refocusing gradient is applied that cancels the phase accumulated across the slice. The position of the slice in the direction of the slice selection gradient can be moved by changing the frequency of the B1 pulse. The slice selection direction is entirely arbitrary (58). The normal vector of the slice may be aligned along any of the principal coordinate axes, X, Y, or Z. Additionally, the slice can be oriented obliquely by sharing the slice selection pulse between any or all of the gradients. Multiple slices can then be acquired in the same sequence. This is important because typically a number of slices must be acquired to image the entire object (59). Signals from a group of spins distributed along the frequency encoding gradient can be distinguished by their resonant frequencies. As for line scanning, the individual resonant frequencies can be extracted from a single signal by using the Fourier transform. To localize in the second transverse dimension, the phases of the spins are changed as a function of the distance from the center. This is accomplished by temporarily applying a gradient that briefly changes the frequency of the spins and then returns them to resonance. The result is that the phase of the spin is a function of the distance from the isocenter in the direction of the phase encoding axis. If a thicker slab is excited, as shown in Fig. 13, an additional direction of localization can be added to the sequence. In the third dimension, an additional phase encoding waveform is used. Reconstruction then creates slices from the slab encoding direction.
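The dependence of slab thickness on RF bandwidth and gradient amplitude can be illustrated with a small Python sketch (the bandwidth and gradient values are assumed, typical numbers; the 42.58 MHz/T proton gyromagnetic ratio is the standard value and is consistent with the Larmor frequencies quoted later in the instrumentation section). The excited thickness is the RF bandwidth divided by the frequency spread per unit distance that the gradient produces.

    gamma_bar = 42.58e6       # proton gyromagnetic ratio / 2*pi, Hz per tesla
    rf_bandwidth = 1000.0     # assumed bandwidth of the sinc B1 pulse, Hz
    G_slice = 10e-3           # assumed slice selection gradient, T/m (10 mT/m)

    hz_per_meter = gamma_bar * G_slice          # frequency spread along the slice axis
    thickness = rf_bandwidth / hz_per_meter     # meters
    print(f"excited slab thickness = {thickness * 1e3:.2f} mm")

With these assumed values the slab is about 2.3 mm thick; doubling the gradient amplitude or halving the RF bandwidth halves the thickness, as stated above.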
Figure 13. A three-dimensional spin-echo sequence where phase encoding is substituted for the slice selection gradient. Multiple slices are then made, as shown, by using one more Fourier transform on the collected three dimensional k-space data set.
A fourth method is similar to computed tomography. If the spin-warp method is used without the phase encoding gradient, a projection across the image is created. If the readout gradient is shared between the two axes that are not the slice selection axis, a group of projections from multiple angles around the object is acquired. An image is then created using the same methods as in computed tomography (60,61).

Imaging Mathematics. The amplitudes of the gradients used in phase and frequency encoding are determined by the need to prevent aliasing of the image. Aliasing results from the sampled nature of the data in both the phase and frequency encoding directions. An image that is undersampled in the phase encoding direction appears as in Fig. 14. To prevent aliasing, the sampling step size must be small enough to satisfy Nyquist's criterion (62).

Figure 14. Aliasing in a single dimension. From the left image, the field of view for the right image was halved in the phase encoding direction (from 24 cm to 12 cm). The frequency encoding direction does not show any aliasing due to the use of a hypersharp digital filter. The phase encoding direction shows aliasing due to insufficient sampling.

In the frequency encoding direction, the step size is

Δkf = (γ/2π) Gf Δt,    (8)

and the field of view (FOV) in the frequency encoding direction is

FOVf = 1/Δkf.    (9)

In the phase encoding direction, the step size is

Δkp = (γ/2π) Gp tp,    (10)

and the FOV in the phase encoding direction is

FOVp = 1/Δkp,    (11)

where Δk is the step size, measured in cm⁻¹. Image resolution is determined by the maximum value of k. This can be understood intuitively because the higher spatial frequencies are sampled farther from the origin. The physical field of view (FOV) of the image is determined by the density of samples. Because MR data are sampled data, they follow Nyquist's criterion and therefore must be sampled to a density sufficient to prevent aliasing. The requirements of k-space coverage and density of sampling have a strong influence on the trade-offs made in MRI, and these are discussed further in the section on trade-offs. The 2D acquisition then lasts

T = Tr × Nph × NEX.    (12)

The 3D acquisition lasts

T = Tr × Nslph × Nph × NEX,    (13)

where Tr is the sequence repetition time, Nph is the number of phase encoding steps or samples, Nslph is the number of phase encoding steps in the slab direction, and NEX is the number of sequence averages. Sampling in the frequency encoding direction is performed in the presence of a band-pass frequency filter to prevent noise from frequencies outside the sampled bandwidth from being aliased into the image. If the frequency encoding gradient amplitudes and sampling times are adjusted appropriately, no image aliasing will occur; the noise, however, is white and exists outside the imaging space, and it is this noise that can be aliased in and decrease the signal-to-noise ratio of the image (63).

Image resolution is defined as the field of view divided by the number of samples in that direction. This is true as long as the point-spread function (PSF) is not broadened by signal decay in that direction. The point-spread function is the Fourier transform of the product of the sampling window and any weighting functions due to uneven sample weighting in k-space. To understand image resolution, consider the image of an object that is just one pixel in width and height. The resulting image can be predicted by the convolution of the object shape with the point-spread function. If an object were sampled to an infinite distance in k-space, the point-spread function would be a delta function because the sampling window would be at a constant level (whose Fourier transform is a delta function). Real-world imaging samples data over a finite range of k-space. In this case, the sample window is a rectangular function whose transform is a sinc pulse. The wider the sample window, the closer the PSF approaches a delta function. If, however, the weighting of the samples is not constant, the PSF will be broadened. This can be the case for some fast imaging methods (64).

Signal-to-Noise Ratio

The signal-to-noise ratio (SNR) in MR imaging is the limiting factor in the choice of experimental parameters used in the imaging sequence. The SNR formula is

SNR ∝ N V √Ts (B0)^k (e^(−Te/T2)) (1 − e^(−Tr/T1)) / √bw,    (14)

where N is the number of spins in the voxel, V is the voxel volume, B0 is the main magnetic field strength, Ts is the total sampling time of the experiment, bw is the bandwidth of the receiver filter, and k is a factor that is one for conductive samples and 7/4 for nonconductive samples (discussed further in the instrumentation section) (65).

Voxel volume is a product of the voxel dimensions in the plane and the slice thickness. As such, it can be considered an indicator of resolution. As resolution improves, the volume element V decreases and decreases the SNR of the image (for constant imaging time). Imaging experiments that create a T2 or T1 based contrast will operate at less than the maximum SNR because the Te and Tr times used are less than optimum for signal preservation. Increases in imaging SNR can be obtained by increasing the B0 field and/or the total sampling time (usually by signal averaging) or by decreasing the receiver bandwidth.

IMAGE PROCESSING AND K-SPACE

The ubiquity of Fourier transform imaging in MR is due to the high SNR per unit time of the technique. This SNR efficiency results from collecting data from the entire object for each repetition of the sequence. This occurs because the gradients act to perform Fourier transforms of the object into k-space data. Then, the image reconstruction process Fourier transforms the k-space data into the image (Fig. 15). k-space is the complex matrix where acquired data are stored. The dimensions of k-space are wave number, or inverse length. The structure of k-space has the lowest spatial frequencies in the center, whereas the outer edges contain the highest spatial frequencies. For common objects, such as human subjects, the center of k-space holds most of the image energy, and the edges contain the detail. It is important to remember that a data point in k-space has no direct remapping into the final image but represents only the signal from a component of spatial frequency at a particular phase. k-space has another important property that results because the data are a Fourier transform (one created by the encoding gradients) of a real object. The transform is Hermitian across the center of k-space; that is, the complex datum at a point (x, y), call it (a + ib), is the complex conjugate of the datum at (−x, −y), which is (a − ib), where a is the real part and b the imaginary part. Thus, only half of the k-space data must be collected to make an image because the other half of k-space can be recreated from the acquired half (66). The time between excitation and the echo (the Te time) can be minimized by acquiring only the right half of k-space. Total acquisition time can be cut approximately in half by acquiring only the top half of k-space. Note that flow and motion can cause k-space data to depart from the Hermitian property.

In two-dimensional imaging, k-space is a two-dimensional matrix. Reconstruction is accomplished by first Fourier transforming along one dimension of the data set. Then, a second Fourier transform is performed along the other axis to yield an image.

Figure 15. The concept underlying Fourier transform imaging, where the object is sampled through a single wire (the wire to the receiver) by encoding object data through the Fourier transform effect of the gradients into a single signal. The image is then created by a computer-based Fourier transform.
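The following minimal NumPy sketch (illustrative only; the phantom and variable names are invented for the example) mimics the reconstruction just described: it builds a synthetic real-valued object, lets a 2D Fourier transform stand in for the encoding gradients, checks the Hermitian symmetry noted above, and recovers the image with an inverse transform.

    import numpy as np

    # Synthetic real-valued "object": a bright rectangle on a dark background.
    obj = np.zeros((128, 128))
    obj[48:80, 40:88] = 1.0

    # The encoding gradients effectively Fourier transform the object into k-space;
    # here the 2D FFT plays that role.
    kspace = np.fft.fft2(obj)

    # Hermitian symmetry of a real object: S(-kx, -ky) = conj(S(kx, ky)).
    # Negative NumPy indices address exactly the mirrored k-space location.
    print(np.allclose(kspace[-5, -9], np.conj(kspace[5, 9])))   # expect True

    # Reconstruction: the inverse 2D FFT recovers the image (magnitude taken,
    # as in routine MRI display).
    image = np.abs(np.fft.ifft2(kspace))
    print(np.max(np.abs(image - obj)))                          # expect ~1e-15

Partial-Fourier acquisitions exploit exactly this conjugate relationship to synthesize the unacquired half of k-space.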
Figure 16. Images obtainable from the complex data matrix that results from Fourier transform reconstruction: the real (I) channel image, the imaginary (Q) channel image, the phase image arctan(I/Q), and the magnitude image. By back-calculating the phase in the image, it can be removed to make the real image readable. In practice, only the magnitude image is used.
The Fourier transform of the complex raw data produces a complex image. Four images can be created from the complex data set. These are the real (I) and imaginary (Q) images, the phase image [arctan(I/Q)], and the magnitude image. Each of these images is shown in Fig. 16. Most images are displayed as magnitude images; however, additional information is available about the sign of the signal if the real channel is used, provided that the phase variation is removed (67). In 3-D imaging, a transform along the third axis is performed. No third transformation is required in 2-D imaging because the slices are acquired in separate k-space matrices. The order in which the Fourier transforms are performed is arbitrary because each encoding is orthogonal to the others.

The images produced reflect any nonlinearities of the physical gradient fields. Because the imaging process samples to a linear grid, nonlinearities show up as image stretching or compression (68). Knowledge of the actual field distribution of the gradient allows mathematical warping of the image to compensate partially for the physical imperfections of the gradients.

ENDOGENOUS IMAGE CONTRAST

The simplest mechanism of image contrast is spin density. The number of spins in a voxel is a primary determinant of the signal available from the voxel [see Eq. (14)]. If this were the only source of image contrast, however, then MRI would be little different from computed tomography, where image contrast is a function of the tissue density. The most significant discovery in NMR of intact human tissues was that different tissues had different relaxation rates (69). Further, pathological conditions caused the tissue relaxation rates T1 and T2 to change from those
of normal tissue (70). These two facts have formed the basis of most MRI sequences. Relaxation-based contrast mechanisms include T1 and T2 weighted methods, as well as magnetization transfer (MT) and time-of-flight angiography. To use these methods fully, tissue relaxation rates must be known approximately to allow choosing sequence timing parameters appropriately. The choice of parameters depends on the materials involved and also on the strength of the main magnetic field. For intact animals, the range of T1 and T2 values can be of the order of milliseconds to seconds, most strongly dependent on tissue type (71). Typically, T2 values change only slightly as field strength changes, whereas T1 and T2* values can change substantially. T2* values change due to the presence of paramagnetic (blood) and ferromagnetic substances such as ferritin (72), which cause transverse magnetization to dephase more rapidly as B0 increases. T1 values change as field strength B0 changes because the loss of longitudinal magnetization to the lattice is determined by the ratio of the correlation time of the molecule and the Larmor frequency of the spins. When the Larmor frequency of the spins is equal to the inverse of the correlation time (between the free spin and the surrounding tissue matrix), the relaxation, or spin loss, is most efficient.

Sequence parameters are thus adjusted to achieve the maximum difference between normal tissue (for anatomical visualization) and pathological tissue states (for detection of disease), not to achieve maximum SNR. In Fig. 17a, MR signal intensity is plotted against echo time for some of the tissues in the brain. The echo time that yields the maximum T2 weighted image contrast is not zero, where the signal-to-noise ratio is maximum; as the associated difference curve shows, it lies in an intermediate range of echo times. For T1 contrast, a typical set of relaxation curves is shown in Fig. 17b, where the maximum contrast for a simple spin-echo sequence is plotted in the same figure. Note that other methods of spin preparation, such as inversion, can greatly alter the contrast between tissues (73,74). Image contrast based on the combination of Te and Tr for human tissues is shown in Fig. 18. Typical ranges are 5 to 200 ms for Te and 300 to 4,000 ms for Tr. The mixed weighting condition (long Te, short Tr) is not used for patients.

OTHER CONTRAST MECHANISMS

In addition to spin density and T1, T2, and T2* contrast, spin systems, especially biological ones, have other contrast mechanisms that can be used to image aspects of system or organ function. These include the bulk flow of spins (usually blood), diffusional motion of spins, and chemical shift. The bulk flow of spins can be imaged by two methods. The first is a time-of-flight method where a short Tr sequence is used to suppress nonmoving spins by reducing the longitudinal magnetization (Mz) to near zero (75). Spins that move into the slice have full Mz (because they have not experienced an effective excitation) and thus produce strong signals. The second method is a phase contrast method where bipolar gradient lobes are played out along one gradient axis (76).
Figure 17. (a) Different spin populations that have different T2 relaxation rates show maximum contrast at a Te time that is not zero, where signal would be maximum. (b) Two spin populations different in T1 relaxation, showing relaxation rates and the difference in signal versus Tr time. Again, the maximum contrast is not obtained where the signal is maximum (5 × T1).
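The behavior plotted in Fig. 17a can be reproduced with a short Python sketch (illustrative; the two T2 values are arbitrary assumptions, not measured tissue values). It finds the echo time that maximizes the signal difference between two tissues whose signals decay as e^(−Te/T2).

    import numpy as np

    T2_a, T2_b = 90.0, 70.0                    # assumed T2 values, ms
    Te = np.linspace(0.0, 250.0, 2501)         # candidate echo times, ms
    contrast = np.exp(-Te / T2_a) - np.exp(-Te / T2_b)
    best_Te = Te[np.argmax(contrast)]
    print(f"contrast is maximal near Te = {best_Te:.0f} ms, not at Te = 0")

With these assumed values the optimum lands near 79 ms, in the intermediate range of echo times described above.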
Figure 18. A matrix that shows the effect of Te and Tr time on image contrast in spin-echo imaging:

                Short Tr             Long Tr
Short Te        T1 weighted          Proton density weighted
Long Te         Mixed weighting      T2 weighted

At long Te and short Tr, mixed contrast weighting is obtained and therefore is not used in MRI of patients.
Spins that do not move have equal amounts of phase added and subtracted, resulting in a net zero phase change. The sequence is then repeated, and the polarity of the bipolar encoding lobes is reversed. Subtraction of the two images results in an image of only the moving spins. The advantage of this method is that the velocity is encoded and thus can be measured (unlike time of flight). However, an image that is sensitized to flow in all three axes requires six repetitions of the sequence. It is possible to reduce this
number to four repetitions by suitable methods, but it is still a more time-consuming method. A comparison of the methods is shown in Fig. 19.

Spin diffusion can indicate the microstructure of a material. In biological systems, it is used to reveal defects in cellular membrane integrity and cellular swelling (77–81). The method is implemented by using very large bipolar gradients that are arranged on either side of an RF180 pulse (25). Then, the signal loss is a function of the diffusion coefficient of the spins (D) and the amplitude (g), width (δ), and spacing (Δ) of the diffusion weighting gradients:

S/S0 = e^(−γ²g²δ²(Δ − δ/3)D).    (15)
All of the terms in the exponent except D are combined and referred to as b. The b value of a diffusion weighted pulse sequence is used as a measure of the degree of diffusion weighting in the pulse sequence. At high b values, even spins that diffuse slowly move a sufficient distance to lose signal. The areas of the gradients used to create the diffusional motion sensitivity are quite large, and the amplitudes of the motions are quite small. In living biological systems, minor motions attendant to respiration and other physiological processes require that the imaging experiment be quite short. Typically, fast echo-planar imaging (EPI) methods are used, which acquire all of k-space in a single excitation. This will be explained in the section on fast imaging (Fig. 20) (82).
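A minimal Python sketch of this calculation (the gradient settings and signal values are assumed for illustration, not taken from the text) computes b from the gradient parameters of Eq. (15) and then an apparent diffusion coefficient from two signals acquired at different b values.

    import numpy as np

    gamma = 2.675e8        # proton gyromagnetic ratio, rad s^-1 T^-1 (standard value)
    g = 30e-3              # assumed diffusion gradient amplitude, T/m
    delta = 25e-3          # assumed gradient width, s
    Delta = 40e-3          # assumed gradient spacing, s

    b = (gamma * g * delta) ** 2 * (Delta - delta / 3.0)   # in s/m^2
    print(f"b = {b * 1e-6:.0f} s/mm^2")

    # ADC from two measurements S(b0) and S(b1), since S/S0 = exp(-b D):
    S0, S1, b0, b1 = 1000.0, 410.0, 0.0, b                 # assumed signal values
    adc = np.log(S0 / S1) / (b1 - b0)                      # m^2/s
    print(f"ADC = {adc * 1e6:.2e} mm^2/s")

The assumed settings give a b value of roughly 1,300 s/mm^2 and an ADC near 7 × 10^−4 mm^2/s, both in the range typically encountered in diffusion imaging of tissue.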
Figure 19. (a) Mechanics of the time-of-flight method of flow imaging. Spins in the slice that do not move have small Mz due to the short Tr of the pulse sequence. These stationary spins produce very little signal in the image. Flowing spins, however, have not been subject to B1 excitation and have Mz = M0 . The flowing spin in the slice then gives a very large signal upon excitation. (b) Mechanics of the phase contrast method of flow imaging. Spins that move in the direction of the encoding gradient acquire a phase as a function of the distance traveled in the time between encoding lobes. Stationary spins acquire and lose the same amount of phase from each gradient lobe pair.
Figure 20. Diffusion imaging pulse sequence showing the diffusion encoding gradients on all three axes at once (for simplicity of the diagram); in practice, each diffusion direction must be collected separately. The panels show an EPI image at b = 0, a T2 weighted image, an EPI image at b = 1000, and the resulting ADC image. The diffusion weighted image set is made by collecting images that have at least two different diffusion weightings and calculating the apparent diffusion coefficient (ADC) image by fitting each pixel intensity to Eq. (15).
Magnetization transfer imaging is a method of modifying T1 relaxation of a sample by eliminating one of the pathways by which mobile spins lose energy to the lattice (83). In magnetization transfer imaging, an RF pulse is used that is significantly off-resonance. In imaging biological systems, mobile water protons communicate with the protons bound to macromolecules of the background. The macromolecule associated protons do not tumble freely which means that they have very short T2 times that render them invisible. The short T2 time also implies that the bandwidth of the resonance is very broad. A strong RF pulse delivered off-resonance will excite these spins without interfering with the mobile proton signal. The excited off-resonance protons become invisible to the mobile protons and force the T1 relaxation rate to increase. Not all biological systems demonstrate this behavior. Brain tissue and blood do, but adipose (fatty) tissue does not. The process of magnetization transfer
Figure 21. Magnetization transfer imaging, showing the saturation pulse relative to the imaged mobile protons and the bound background spins (spectra are shown with the MT pulse on and off; the water, lipid, and broad bound-water resonances and the MT excitation band are plotted against offset in Hz).
imaging is shown in Fig. 21. An image of the magnetization transfer effect requires making two images, one with the saturation pulse on and the other with the saturation pulse disabled. The subtraction of the two images gives an image of the magnetization transfer effect. Magnetization transfer is now used to investigate diseases of the myelin sheath around neurons (84,85).

FAST IMAGING

In many systems, image acquisition time must be minimized due to the changing nature of the object. In biological systems, this is typified by cardiac, respiratory, and gross patient motion. For process control systems such as flowing fluids or production line systems, the object must be sampled on timescales short compared to the time available for observation. Imaging that uses pure T1, T2, or proton density contrasts requires constant Tr periods
Figure 22. Conventional multiple echo spin-echo sequence where one phase encoding is used for all echoes in a sequence. A four echo sequence (Te = 20, 40, 60, and 80 ms in the example shown) then creates four separate images that have four distinct Te times.
for each line of k-space data. Therefore, the minimum acquisition time is Tr × Np, where Np is the number of phase encoding steps. In biological systems where T1 values are of the order of 1 second, the total acquisition time is of the order of two minutes. Several approaches exist to minimize total acquisition time. Each, however, requires trade-offs.

Fractional k-Space. The Hermitian nature of k-space permits concluding the acquisition after half of the lines of k-space have been acquired. This method decreases total scan time by almost 50%. The trade-off is decreased signal-to-noise because the SNR is proportional to the square root of the total acquisition time. Additionally, artifacts may arise from areas in the sample where the magnetic field changes abruptly, such as interfaces between tissue and air.

Gradient Echo

The gradient echo sequence can operate at short Tr times and decrease acquisition time from minutes to seconds (86). The use of both a short Tr and the required short Te time results in a mixed T1 and T2* weighting. The T2* rather than T2 weighting is due to the lack of an RF180 pulse. In some instances, T2* weighting is highly desirable, as when searching for pathology that may be associated with the accumulation of ferromagnetic material such as hemorrhage from trauma (87).

Fast Spin Echo

In a spin-echo pulse sequence, the RF180 pulse may be repeated to refocus the magnetization at different times. Each of these echoes is then stored in its own k-space array to form separate images, as shown in Fig. 22. If different phase encodings are applied to each of the echoes and the resulting data are stored in the same k-space, acquisition time will decrease as the inverse of the number of echoes
(RARE) (64). The image contrast is determined principally by the echo time of the acquisitions near the center of k-space (Fig. 23). The number of echoes can be extended to equal the total number of phase encodings. By this method, the entire image may be acquired in a single Tr time (88). The utility of fast spin echo depends on sufficiently long T2 times in the object. The k-space data are weighted by the T2 decay of the spins, as shown in Fig. 24. This k-space weighting causes blurring, which worsens as T2 decreases. The T2s of the spins to be imaged dictate the maximum number of echoes (the echo train length).

Driven Equilibrium

In a conventional T2 weighted spin-echo sequence, the echo time Te is long to remove signal contributions from spins that have short T2 times, and the repetition time Tr is long to prevent T1 weighting from mixing in with the contrast (89). To eliminate the need to wait for the restoration of longitudinal magnetization through T1 relaxation, the sequence can be repeated backward, as shown in Fig. 25. The use of driven equilibrium does not restore lost transverse magnetization (T2 losses) and may create artifacts.

Echo Planar

In a manner analogous to that of fast spin echo, multiple echoes may be acquired in a gradient echo sequence (Fig. 26) simply by inverting the amplitude of successive readout gradient pulses (90).
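To see why echo trains shorten scans, the small Python sketch below (illustrative; the parameter values are typical assumptions, not taken from the text) evaluates the acquisition time of Eq. (12) for a conventional spin echo and divides it by the echo train length for the fast spin-echo case, since each excitation now fills several lines of k-space.

    Tr = 2.0       # assumed repetition time, seconds
    Nph = 256      # assumed number of phase encoding steps
    NEX = 1        # number of signal averages
    etl = 8        # assumed echo train length (phase-encoded echoes per excitation)

    t_se = Tr * Nph * NEX          # conventional spin echo, Eq. (12)
    t_fse = t_se / etl             # one k-space line per echo in the train
    print(f"conventional spin echo: {t_se / 60:.1f} min")
    print(f"fast spin echo (ETL = {etl}): {t_fse / 60:.1f} min")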
Figure 23. A fast spin-echo sequence whose echo timing is comparable to that of the conventional spin echo. Now, each echo has a different value of phase encoding applied and a single k-space matrix is filled in one-fourth the time of the conventional spin echo acquisition.
Figure 24. A fast spin-echo image data set where the PSF is increasingly broadened by filtering before reconstruction to emulate the effect of T2-related blurring on a fast spin-echo image. (a) No degradation, (b)–(e) Increasing degradation.
MAGNETIZATION PREPARATION

Gradient Echo Associated Methods

The lack of T1 contrast can be partially offset by using RF spoiling. RF spoiling varies the phase of the RF pulses randomly between Tr periods to prevent the coherent interference of echoes from successive Tr periods (91).

Inversion Recovery

Most imaging methods start by nutating the longitudinal magnetization between 0 and 90°, but the spins can
be nutated 180°. Immediately after such a pulse, the magnetization is directed along −z. Then, the path of relaxation is up along z, and no transverse component is being generated. A family of curves for a typical biological system is shown in Fig. 27. The magnetization so created can then be sampled using a spin-echo or gradient echo sequence. The image contrast is largely determined by the time between the inversion and the RF90 pulse.
Figure 25. Driven equilibrium spin-echo sequence showing the sequence repeated backward after the echo time Te . The phases of the RF pulses in the second half of the sequence are opposite those from the first part. If the initial phases of the RF90 and RF180 are +x and +y, then the second RF180 must be −y and the second RF90 must be −x.
Figure 26. Echo planar imaging (EPI) pulse sequence and a typical image. Note the distortion at the air/tissue interface in the temporal lobes (arrows).
The inversion method of spin preparation has three important features. The first is that the polarity of the sampled signals may be both positive and negative, depending on which side of the transverse plane the longitudinal magnetization was just before excitation. Most commercial MR imaging systems generate magnitude images from the complex Fourier transformed data, thus displaying only a positive image. However, it is possible to make real images that include the signal polarity. The second important feature of inversion recovery is the ability to sample the inverted magnetization when one component has Mz = 0. An image will then be produced that has no contribution from spins sampled at a time equal to 0.693 × T1 after inversion. This has been used extensively to eliminate the signal from fat tissues. The third important feature is the ability to generate larger contrast between spins of similar T1 than T1 weighted spin-echo imaging can (92,93).

Chemical Shift Saturation

The resonant frequency of a spin is a function of both the magnetic field in which it is immersed and the local magnetic fields it experiences. One of these field sources is the molecule in which it is bound. Hydrogen nuclei in water resonate 3.5 ppm downfield from hydrogen nuclei bound in lipid methylene moieties. Although small compared to the strength of the main magnetic field, the frequency difference is large enough to permit excitation of only one moiety. In practice, a single species may be excited before the imaging sequence, and its signal made incoherent by a strong gradient pulse that renders it invisible to the subsequent pulse sequence (94,95).
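The size of this water-lipid frequency difference is easy to work out; the Python sketch below is illustrative and uses only numbers quoted in this article (the 3.5 ppm shift above and the 63.86 MHz and 21.29 MHz proton frequencies given in the instrumentation section).

    shift_ppm = 3.5                       # water-lipid chemical shift, ppm
    for b0_label, f_mhz in (("1.5 T", 63.86), ("0.5 T", 21.29)):
        delta_f = f_mhz * 1e6 * shift_ppm * 1e-6
        print(f"water-fat frequency difference at {b0_label}: {delta_f:.0f} Hz")

At 1.5 tesla the two resonances are separated by roughly 224 Hz, whereas at 0.5 tesla the separation is only about 75 Hz, which is why selective excitation of a single moiety becomes harder at low field.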
Figure 27. Longitudinal magnetization recovery after inversion versus inversion time for tissues in the brain (gray matter, white matter, and cerebrospinal fluid). Note that Mz for each of the tissues crosses zero at a different inversion time.
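The zero crossings in Fig. 27 follow from the standard inversion-recovery expression Mz(t) = M0(1 − 2e^(−t/T1)) (assuming a perfect inversion and full recovery between repetitions), which is nulled at TI = ln(2) × T1 ≈ 0.693 × T1, the figure quoted above. A minimal Python sketch (the T1 values are arbitrary assumptions, not the tissues of Fig. 27) evaluates this nulling time.

    import math

    t1_values = {"tissue A": 0.25, "tissue B": 0.9, "tissue C": 4.0}   # assumed T1, seconds
    for name, t1 in t1_values.items():
        ti_null = math.log(2.0) * t1     # inversion time at which Mz crosses zero
        print(f"{name}: T1 = {t1:.2f} s -> nulled at TI = {ti_null * 1e3:.0f} ms")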
Figure 28. Proton density weighted, fast spin-echo images of a volunteer at the level of the basal ganglia collected in six separate sessions. Proton spectra were extracted from the left putamen for each of the studies. Water suppressed proton spectra were acquired at 0.5 Tesla using a spin-echo chemical shift imaging sequence (109).
CHEMICAL SHIFT IMAGING

The resonant frequency of a spin depends on the magnetic field from all contributions. This includes, most importantly, the main magnetic field. Other effects
include gradients used for localization, local variations in the magnetic field produced by alterations in the bulk magnetic susceptibility of the material (susceptibility artifacts) and the chemical binding environment of the spin. For this last reason, a nucleus experiences a slight
Figure 29. MRI image (a) is composed of scalar values. Data are available from each voxel, shown as individual spectra in (b) but are discarded in routine imaging. Note that the resonances in each voxel in (b) have much lower amplitude than water resonance (shown as truncated resonance).
shift in its resonant frequency that is characteristic of the molecule of which it is a part (the chemical shift). The chemical shift has been used to characterize chemical compounds in laboratory NMR spectrometers since the 1960s (15). Since the early 1980s, the technique has been applied in the human body within MRI scanners and is called magnetic resonance spectroscopy, or MRS. One of the strongest advantages of MRS is the ability to demonstrate chemical change in the absence of morphological change. In Fig. 28, the images and corresponding MR spectra from six repeated studies of a volunteer are shown. In the third study (upper right hand corner), it was discovered incidentally that the volunteer had been drinking late into the night before the exam. The triplet resonance of ethanol is clearly shown in the spectrum (marked ETOH), whereas the corresponding image shows no change.

The chemical shift can be exploited to create images of individual chemical moieties in an object by using chemical shift imaging (26). The information from all spins in each voxel of every MRI image is collapsed into a single scalar value to create a two-dimensional image (Fig. 29a). The readout gradient causes all of the spins in a voxel to be irretrievably mixed into a single scalar value. By using phase encoding gradients for both transverse axes instead, the chemical shift information is preserved and can be extracted by Fourier transform. An image created in this manner has three dimensions, two spatial and a third in chemical shift (Fig. 29b). Multiple images can be created from a single three-dimensional data set by binning the data in the frequency dimension. The resulting images of individual resonances reveal the spatial distribution of metabolites. Figure 30 shows images from a patient who had a brain tumor. The first image is the T2 weighted image, where the CSI sampled area is superimposed upon it in red. The lactate
image demonstrates localized hypoxia, and the choline image shows membrane proliferation and, by extension, growth of the tumor. The NAA (N-acetyl aspartate) image shows the distribution of healthy neurons, and the glutamate/glutamine images show the distribution of one of the principal neurotransmitters in the brain (96).

Trade-Offs

The limitation of MRI is the available signal. This signal is fundamentally limited by the polarization induced in the sample, which in turn is proportional to the strength of the main magnetic field and inversely proportional to the temperature of the sample. Increasing the magnetic field can cause other subtle degradations, including increased magnetic susceptibility artifacts and decreased contrast in T1 weighted scans due to the change of tissue T1 as field strength changes. The maximum available signal is obtained at long Tr (5 × T1 gives 99% of the available longitudinal magnetization) and the shortest Te times allowed by the performance of the MRI system. Operation at these parameters yields only spin density contrast. T1 weighted contrast requires Tr times less than 5 × T1 and creates signal loss as 1 − e^(−Tr/T1). Likewise, T2 weighted sequences require Te times that create signal loss as e^(−Te/T2). Image signal-to-noise is a function of the available signal (dependent on the image contrast parameters, as mentioned before), the voxel volume, and the total sampling time. In-plane spatial resolution is expensive in terms of signal-to-noise (SNR). Doubling the in-plane resolution without changing the field of view (FOV) decreases the voxel volume, and thus the SNR, by a factor of 4. The SNR limited nature of MRI means that the effects of trade-offs can be considered by referring to Eq. (14) (97).
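A back-of-the-envelope Python sketch (illustrative; it manipulates only the proportionalities of Eq. (14), not absolute units) shows the cost of doubling the in-plane resolution and how much additional averaging would be needed to buy the SNR back, given that SNR scales linearly with voxel volume and with the square root of the total sampling time.

    # Relative SNR ~ (voxel volume) * sqrt(total sampling time), per Eq. (14).
    voxel_factor = (1 / 2) * (1 / 2)   # in-plane resolution doubled in both directions
    snr_factor = voxel_factor          # unchanged sampling time: SNR falls to 1/4
    print(f"SNR after doubling in-plane resolution: {snr_factor:.2f} of the original")

    # Recovering that factor of 4 by averaging requires sampling time to grow as SNR^2:
    extra_averages = (1 / snr_factor) ** 2
    print(f"averages (NEX) needed to restore the original SNR: {extra_averages:.0f}x")

In other words, halving the pixel size in both in-plane directions at fixed FOV costs a factor of 4 in SNR, and recovering it purely by signal averaging lengthens the scan 16-fold.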
Instrumentation

An instrument for magnetic resonance imaging consists of a magnet to create the polarizing field, a gradient subsystem to create the localizing fields, a transceiver and radio frequency coils to transmit and receive the radio-frequency magnetic field, a control computer and a computer for reconstructing the data, and an image visualization system (often combined with other computers).

Magnet. The fundamental and most critical piece of instrumentation in MRI is the magnet. Three technologies exist to create the main polarizing magnetic field B0 (98–101). The first is the permanent magnet.
Figure 30. Spectrum from a single voxel in a chemical shift imaging data set of the brain. Gray regions were integrated for each voxel in the data set to produce false color maps in (b), which were then overlaid on the high resolution image from the same slice position. See color insert.
Typically made of ferrous or nickel alloys, these magnets can generate a maximum field of approximately 3,000 Gauss (0.3 Tesla). This technology is often employed in open architecture MRI systems (Fig. 31). Permanent field magnets cannot achieve the higher field strengths of superconducting magnets and are prone to field drift that is a strong function of ambient temperature.

The second method of producing the field is by resistive magnets. In these magnets, the field is produced by a large dc current flowing through windings, arranged as shown in Fig. 32. The core of the windings can be augmented by ferromagnetic alloys to improve the stability of the field. The deficiencies of this magnet technology include the requirement for a constant current supply,
Figure 31. Photograph of permanent magnet MRI system showing the direction of the B0 field and a cross section of the magnet within. Typical magnets are composed of boron–nickel hydride. See color insert.
Figure 32. Resistive (electromagnet) design showing the windings and field direction. See color insert.
large cooling requirements, poor field stability, and the inability to achieve the field strength of superconducting magnets. The third and most popular magnet technology is implemented using superconducting wire wound in an arrangement of four or more coils, as shown in Fig. 33. After the current is injected into the magnet, no further current is required. The magnet has excellent field stability, and no water cooling is required as for resistive magnets. The ambient temperature stability is much better than that of permanent magnets. However, most superconducting magnets require liquid helium to keep the magnet windings cold. Typical wire in such a magnet becomes superconducting at approximately 10° –14° K. Liquid helium at 4.2° K provides a bath that keeps the magnet superconducting. Liquid helium has a very small specific heat and boils off easily. If liquid helium boils off, the magnet can quench, lose field, and the magnet wire may be destroyed. The wire cannot support the required current when it is not superconducting and can melt when it suddenly becomes no longer superconducting. Many of the original magnet designs had a jacket of liquid nitrogen (temperature 77° K) surrounding the liquid helium bath to keep the liquid helium boil-off rate as low as possible. Now, many superconducting magnets use
multiple closed-cycle helium refrigerators to keep the magnet windings cold, eliminating the need for liquid nitrogen. The downside of this design is that a loss of line power will cause the refrigerators to shut down, allowing the magnet windings to warm; without a liquid nitrogen bath to provide a thermal buffer, the liquid helium will boil off rapidly, causing a possible wire-melting loss of superconductivity.

Gradients. The gradient fields are created by Maxwell pairs of coils that create a linear field across the volume to be imaged. Typically, three coil sets are used, one for each of the three principal coordinate axes, x, y, and z. Each coil set is connected to an amplifier that can drive tens to hundreds of amps of current into the coil. The self-inductance of the coil opposes the rapid rise in current that is desirable in an MRI system. Therefore, the amplifiers use voltages of the order of hundreds to thousands of volts during the segment of the gradient waveform where the current is rapidly increasing. When the current nearly reaches the final desired value, the voltage decreases. In this way, smaller amplifiers can be used. The figures of merit for a gradient subsystem are the maximum field that can be created,
measured in gauss/centimeter or millitesla/meter; the slew rate, measured in tesla/meter/second; and the linearity. Linearity is measured as the deviation of positional measurements in the image of a phantom whose geometry is known (102).

Transceiver. To observe the spin system in the rotating frame of reference, the spins must be communicated with at radio frequencies. At 1.5 tesla, proton spins precess at 63.86 MHz. By transmitting RF pulses whose carrier frequency is 63.86 MHz, the spins are efficiently nutated. Likewise, when receiving the signal induced in the radio-frequency coil, or antenna, the signal produced by the spins in the object is also at 63.86 MHz. A magnet of different field strength, such as 0.5 tesla, will have a different resonant frequency; at 0.5 tesla, the spins precess at 21.29 MHz. The transceiver creates the transmit field by mixing a carrier frequency (at the Larmor, or resonant, frequency of the magnet) with wave shapes created by a pulse control computer. The modulated carrier signal so produced is then fed into an RF power amplifier that provides sufficient power to nutate the spins in the sample. The power required from the RF power amplifier is a function of two variables. The first is the conductivity of the object: the more conductive the object, the more power must be applied to create the necessary nutation. The required RF power increases as the square of the increase in field strength for conductive bodies such as humans. The second variable is the size of the RF coil. A larger RF coil will dissipate more current in its own structure, and it will also usually contain a larger conductive object, and so will require more RF power (103).
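The field-to-frequency conversions used in this section follow the Larmor relation f = (γ/2π)B0. The short Python sketch below (illustrative only) uses the standard proton value of about 42.58 MHz/T, which reproduces the 63.86 MHz and 21.29 MHz figures quoted above to within rounding.

    gamma_bar = 42.58      # proton gyromagnetic ratio / 2*pi, MHz per tesla
    for b0 in (0.5, 1.5, 3.0):
        print(f"B0 = {b0:.1f} T -> proton Larmor frequency = {gamma_bar * b0:.2f} MHz")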
Figure 33. Superconducting magnet showing the direction of the main field and the arrangement of current-carrying coils. Most superconducting MRI magnets are built of four windings connected in series and arranged as shown. The variation in diameter is intended to improve the homogeneity of the field in the center of the magnet. See color insert.
RF Coils. Radio-frequency antennas for MRI, also known as RF coils, operate in an unusual manner for RF antennas. Most RF antennas can be designed by using far-field approximations of the behavior of the components. Not so for MRI: coils for MRI operate in the near field in an attempt to maximize the SNR of the MR image (104). The SNR for MRI coils is determined by the relative contributions of the noise generated by the object and the noise generated as Johnson noise by the coil itself. As the sample becomes more conductive (typically at higher field strength), it is easier to make a coil whose contribution to the noise is insignificant relative to the noise from the patient (105). RF coils for MRI are typically either volume coils, which excite and receive NMR signals from a large volume, or surface coils, which are sensitive only to local regions of a sample (106). Typical volume RF coils in an MRI system include body, head, and knee coils. The B1 field created by the volume coil may be either linear or quadrature. A quadrature coil is analogous to a circularly polarized communications antenna. The benefits of quadrature coil design include a square root of 2 increase in image SNR and a reduction in the required transmission power by a factor of 2 (107). Surface coils may be of any size or shape but most often are not used to transmit B1 RF pulses. Instead, a volume coil such as the body coil is used to create a homogeneous RF transmission field (108). The surface coil has an inhomogeneous sensitivity profile that allows it to achieve high image SNR in the vicinity of the coil but would create a wide range of transmission field amplitudes. The large range of transmission fields would make the received image even less homogeneous. The high power RF amplifier (100–20,000 watts) must be kept isolated from the RF preamplifier. The RF preamplifier, which has an extremely low noise figure, receives a signal of the order of 10^−12 watts. The preamplifier would be destroyed quickly if it were not protected. The protection, called a T/R switch, is implemented as a series of diodes that can be switched on or off to direct the current flow.

Computer Systems. Computer system operations in an MRI scanner are divided into two distinct areas. The first is nominally referred to as the host computer.
Figure 34. Block diagram of a typical MRI system.
The host computer handles user interface, image display, and image storage functions. A second computer is required for real-time generation of pulse sequences (gradient, RF, and timing waveforms), as well as for image reconstruction. A block diagram of a typical MRI system is shown in Fig. 34.

BIBLIOGRAPHY

1. M. D. Hurlimann and D. D. Griffin, J. Magn. Resonance 143, 120 (2000).
2. U. Yaramanci, Annali Di Geofisica 43, 1,159 (2000).
3. Y. M. Zhang, P. C. Xia, and Y. J. Yu, IEEE Trans. Appl. Superconductivity 10, 763 (2000).
4. D. E. Woessner, Concepts Magn. Resonance 13, 77 (2001).
5. M. H. G. Amin, L. D. Hall, R. J. Chorley et al., Progr. Phys. Geogr. 22, 135 (1998).
6. M. Appel, F. Stallmach, and H. Thomann, J. Petr. Sci. Eng. 19, 45 (1998).
7. J. R. Bows, M. L. Patrick, K. P. Nott et al., Int. J. Food Sci. Technol. 36, 243 (2001).
8. P. Cornillon and L. C. Salim, Magn. Resonance Imaging 18, 335 (2000).
9. J. Gotz, W. Kreibich, M. Peciar et al., J. Non-Newtonian Fluid Mech. 98, 117 (2001). 10. J. J. Gonzalez, R. C. Valle, S. Bobroff et al., Postharvest Biol. Technol. 22, 179 (2001). 11. K. M. Keener, R. L. Stroshine, and J. A. Nyenhuis, Trans. ASAE 40, 1,633 (1997). 12. I. L. Pykett, IEEE Trans. Appl. Superconductivity 10, 721 (2000). 13. P. C. Lauterbur, Nature 242, 190 (1973). 14. J. W. Prichard, J. R. Alger, and R. Turner, British Medical Journal (BMJ) 319, 1,302 (1999). 15. E. Becker, High Resolution NMR, Theory and Chemical Applications, Academic Press, Orlando, FL, 1980. 16. R. R. Price, J. L. Creasy, C. H. Lorenz et al., Invest. Radiol. 27, S27 (1992). 17. G. F. Mason, R. Gruetter, D. L. Rothman et al., J. Cerebral Blood Flow Metab. 15, 12 (1995). 18. B. Mora, P. T. Narasimhan, B. D. Ross et al., Proc. Natl. Acad. Sci. USA 88, 8,372 (1991). 19. A. Abragam, Principles of Nuclear Magnetism, Clarendon Press, Oxford, 1961. 20. G. A. Johnson, G. Cates, X. J. Chen et al., Mag. Resonance Med. 38, 66 (1997). 21. K. Ruppert, J. R. Brookeman, K. D. Hagspiel et al., Magn. Resonance Med. 44, 349 (2000).
56. P. Mansfield, P. G. Morris, R. J. Ordidge et al., Philos. Trans. R. Soc. London — Ser. B: Biol. Sci. 289, 503 (1980). 57. W. A. Edelstein, J. M. Hutchison, G. Johnson et al., Phys. Med. Biol. 25, 751 (1980). 58. M. A. Reicher, R. B. Lufkin, S. Smith et al., Am. J. Roentgenol. 147, 363 (1986). 59. V. M. Runge, M. L. Wood, D. M. Kaufman et al., Radiographics 8, 507 (1988). 60. J. B. Ra, S. K. Hilal, and Z. H. Cho, Magn. Resonance Med. 3, 296 (1986). 61. I. Shenberg and A. Macovski, Med. Phys. 13, 164 (1986). 62. R. N. Bracewell, The Fourier Tranform and Its Applications, McGraw-Hill, NY, 1986. 63. D. I. Hoult and P. C. Lauterbur, J. Magn. Resonance 34, 425 (1979). 64. J. Listerud, S. Einstein, E. Outwater et al., Magn. Resonance Q. 8, 199 (1992). 65. W. A. Edelstein, G. H. Glover, C. J. Hardy et al., Magn. Resonance Med. 3, 604 (1986). 66. D. A. Feinberg, J. D. Hale, J. C. Watts et al., Radiology 161, 527 (1986). 67. R. E. Hendrick, T. R. Nelson, and W. R. Hendee, Magn. Resonance Imaging 2, 279 (1984). 68. Y. P. Du and D. L. Parker, Magn. Resonance Imaging 14, 201 (1996). 69. R. Damadian, Science 171, 1,151 (1971). 70. A. F. Martino and R. Damadian, Physiol. Chem. Phys. Med. NMR 16, 49 (1984). 71. P. A. Bottomley, T. H. Foster, R. E. Argersinger et al., Med. Phys. 11, 425 (1984). 72. G. Bartzokis, M. Aravagiri, W. Oldendorf et al., Magn. Resonance Med. 29, 459 (1993). 73. M. Graif, J. M. Pennock, J. Pringle et al., Skeletal Radiol. 18, 439 (1989). 74. M. Dousset, R. Weissleder, R. E. Hendrick et al., Radiology 171, 327 (1989). 75. F. W. Wehrli, A. Shimakawa, G. T. Gullberg et al., Radiology 160, 781 (1986). 76. C. L. Dumoulin, S. P. Souza, M. F. Walker et al., Magn. Resonance Med. 9, 139 (1989). 77. Y. Ozsunar and A. G. Sorensen, Top. Magn. Resonance Imaging 11, 259 (2000). 78. N. J. Beauchamp Jr., A. M. Ulug, T. J. Passe et al., Radiographics 18, 1,269 (1998). 79. R. H. Rosenwasser and R. A. Armonda, Clin. Neurosurg. 46, 237 (2000). 80. S. L. Keir and J. M. Wardlaw, Stroke (Online) 31, 2,723 (2000). 81. H. A. Rowley, P. E. Grant, and T. P. Roberts, Neuroimaging Clin. North America 9, 343 (1999). 82. P. Bornert and D. Jensen, Magn. Resonance Imaging 12, 1,033 (1994). 83. S. D. Wolff and R. S. Balaban, Magn. Resonance Med. 10, 135 (1989). 84. M. A. van Buchem, J. Comput. Assisted Tomography 23, S9 (1999). 85. L. J. Bagley, R. I. Grossman, and J. C. McGowan, Neurology 53, S49 (1999). 86. I. R. Young, J. A. Payne, A. G. Collins et al., Magn. Resonance Med. 4, 333 (1987). 87. E. C. Unger, M. S. Cohen, and T. R. Brown, Magn. Resonance Imaging 7, 163 (1989).
1002
MAGNETOSPHERIC IMAGING
88. R. C. Semelka, N. L. Kelekis, D. Thomasson et al., J. Magn. Resonance Imaging 6, 698 (1996). 89. C. M. van Uijen and J. H. den Boef, Magn. Resonance Med. 1, 502 (1984). 90. M. K. Stehling, R. Turner, and P. Mansfield, Science 254, 43 (1991). 91. D. Chien and R. R. Edelman, Magn. Resonance Q. 7, 31 (1991). 92. I. R. Young, A. S. Hall, and G. M. Bydder, Magn. Resonance Med. 5, 99 (1987). 93. L. Kjaer and O. Henriksen, Acta Radiologica 29, 231 (1988). 94. T. W. Chan, J. Listerud, and H. Y. Kressel, Radiology 181, 41 (1991). 95. G. Sze, Can. Assoc. Radiol. J. 42, 190 (1991). 96. R. W. Prost, L. Mark, and S. -J. Li, in ISMRM, Sydney, Australia, 1998. 97. W. A. Edelstein, P. A. Bottomley, H. R. Hart et al., J. Comput. Assisted Tomography 7, 391 (1983). 98. F. Schmitt, A. Dewdney, and W. Renz, Magn. Resonance Imaging Clin. North America 7, 733 (1999). 99. L. Marti-Bonmati and M. Kormano, Eur. Radiol. 7, 263 (1997). 100. D. Hawksworth, Magn. Resonance Med. 17, 27 (1991). 101. J. H. Battocletti, Crit. Rev. Biomed. Eng. 11, 313 (1984). 102. J. W. Carlson, K. A. Derby, K. C. Hawryszko et al., Magn. Resonance Med. 26, 191 (1992). 103. G. N. Holland and J. R. MacFall, J. of Magnetic Resonance Imaging 2, 241 (1992). 104. J. H. Letcher, Magn. Resonance Imaging 7, 581 (1989). 105. D. I. Hoult and R. E. Richards, J. Magn. Resonance 24, 71 (1976). 106. J. F. Schenck, H. R. Hart, T. H. Foster et al., Magn. Resonance Annu. 123 (1986). 107. D. I. Hoult, C. N. Chen, and V. J. Sank, Magn. Resonance Med. 1, 339 (1984). 108. C. E. Hayes, W. A. Edelstein, J. F. Schenck et al., J. Magn. Resonance 63, 622 (1985). 109. R. W. Prost, in Elliptical Excitation Chemical Shift Imaging at 0.5 Tesla in Neuro Degenerative Disease, Medical College of Wisconsin, Milwaukee, WI, 1999, p. 194.
MAGNETOSPHERIC IMAGING J. L. BURCH Southwest Research Institute San Antonio, TX
INTRODUCTION The sun interacts with the earth to create a very dynamic and interesting region called the magnetosphere, that envelops the atmosphere and shields the immediate human environment from interplanetary space. As the technological age has advanced, the importance of the magnetosphere as a medium that must be monitored and whose behavior must be predicted has rapidly become apparent. The magnetosphere protects the earth’s upper atmosphere from direct exposure to the flow of charged particles from the sun, known as the solar wind, and it is also the site of disturbances — geomagnetic storms
and magnetospheric substorms — that can adversely affect human activities. For example, such disturbances generate strong electrical currents overhead in the ionosphere and below the earth’s surface, that can disrupt communications and power systems. They also accelerate magnetospheric ions and electrons to extremely high energies, creating populations of high-energy particles that can damage electronic systems in space and that pose potentially serious health risks to astronauts. The sun contributes to the formation of the earth’s magnetosphere in two important ways. First, the solar wind — the supersonic stream of plasma that flows away from the sun and carries with it the embedded magnetic field, known as the interplanetary magnetic field (IMF) — applies mechanical pressure to the geomagnetic field, compressing it on the day side and stretching it into a tail many hundreds of earth radii long on the night side (1 earth radius = 6,378 kilometers). Only a small fraction of the solar-wind plasma enters the magnetosphere, but a larger fraction of the solar-wind electric field is admitted that drives the large-scale flow of plasma within the magnetosphere. Secondly, solar X rays and ultraviolet radiation strip electrons from atoms in the upper atmosphere, creating the partially ionized region of the atmosphere, known as the ionosphere. Like all plasmas, the ionosphere is electrically conducting and thus is strongly influenced by electromagnetic forces. Accelerated by electric fields and guided by the earth’s magnetic field, ions and electrons flow continuously upward from the high-latitude ionosphere into the magnetosphere. This outflow is an important source of magnetospheric plasma. To date, the magnetosphere has been investigated largely by measurements made by single spacecraft along their orbital tracks. The understanding derived from such localized measurements is, by necessity, partial and fragmented. Magnetospheric imaging is needed to provide the missing global context that will allow space researchers, for the first time, to study the earth’s magnetosphere as a coherent global system of interacting components, driven by the highly variable input of mass, momentum, and energy from the solar wind. The many spacecraft that have explored the magnetosphere have identified its average configuration and the different populations of plasmas and energetic particles within it. Strong shear flows and pressure gradients at the boundaries that separate these populations often lead to current systems that link the outer magnetosphere with the upper atmosphere. The resulting coupling waxes and wanes owing to the occurrence of geomagnetic storms and the more impulsive and shorter lived magnetospheric substorms. The most dramatic and familiar manifestations of these disturbances are the aurora borealis and aurora australis. These polar lights are like images on a TV screen upon which is projected the global distribution of charged particles and currents in the magnetosphere. Figure 1 shows an artist’s conception of the average configuration of the magnetosphere. In situ measurements during the past 35 years have yielded a wealth of statistical information about the
MAGNETOSPHERIC IMAGING
1003
Figure 1. Artist’s drawing showing the principal plasma regions and boundaries of the earth’s magnetosphere. Different imaging techniques are used according to the properties of the plasma population to be imaged. Traditional photon imaging at ultraviolet wavelengths is used for the low-energy plasma of the plasmasphere, and the detection of energetic neutral atoms is used to image the higher energy plasmas of the ring current/radiation belts, inner plasma sheet, and cusp.
magnetosphere and its constituent plasma regimes. They have also provided many examples of dynamic changes of magnetospheric plasma parameters at specific times and places in response to changes in the solar-wind input and to internal disturbances related to substorms. Statistical global averages and individual events are, however, not sufficient to understand the dynamics and interconnection of this highly structured system. Fundamental questions concerning plasma entry into the magnetosphere, global plasma circulation and energization, and the global response of the magnetospheric system to internal and external forcing remain unanswered. These processes occur on time scales of minutes to hours, yet, currently available statistical averages are on time scales of months to years. To further our understanding of the physical processes that affect the magnetospheric plasma requires nearly instantaneous measurement of its structure, which can be obtained only by magnetospheric imaging. To image a large, complex region such as the magnetosphere, which is invisible to traditional astronomical techniques, it is necessary to image its component parts. The particular imaging technique to be employed depends to a large degree on the properties of the plasmas that
are to be imaged. One technique for magnetospheric imaging is extreme ultraviolet (EUV) imaging. This technique uses the fact that helium ions in the sun emit ultraviolet light at 30.4 nm. This light is absorbed and then rapidly reemitted by helium ions in the magnetosphere. The greatest concentration of helium ions is in the plasmasphere, the high-altitude extension of the ionosphere. Most of the ions in the plasmasphere are hydrogen, but hydrogen ions produce no light emissions. On the other hand, there is a nearly constant fraction (∼10–15%) of helium in the plasmasphere, which can be used as a tracer for the region. Recently, the first images of the plasmasphere taken from outside the region were obtained by an EUV imager on the Japanese Nozomi (Planet-B) spacecraft while it was in earth parking orbit before it was injected into an interplanetary trajectory to Mars (Fig. 2). The image in Fig. 2 took nearly 21 hours to construct because the spacecraft was not spinning and had to rely on its orbital motion to complete a raster scan of a large portion, but not all, of the plasmasphere (1). For more energetic ion populations, a different technique, energetic neutral atom (ENA) imaging, is used. Because the target ions in the earth’s inner magnetosphere are trapped by the geomagnetic field, they can be
1004
MAGNETOSPHERIC IMAGING
Polar Ceppad/IPS and VIS
Figure 2. A portion of the earth’s plasmasphere imaged at extreme ultraviolet wavelengths by the Japanese Nozomi (Planet B) spacecraft. The image was created from photons emitted from singly ionized helium at a wavelength of 30.4 nm and accumulated during a nearly 21-hour period in September 1998 while the spacecraft was in a parking orbit around the earth (1). (Image from M. Nakamura et al., Geophysical Research Letters, Vol. 27, No. 2, 141–144, January 15, 2000. Copyright 2000 American Geophysical Union. Reproduced by permission of the American Geophysical Union.)
imaged on a global scale from remote locations only if they are converted to neutral atoms and released from their magnetic confinement to travel along line-of-sight paths to a remote imager. This conversion occurs by charge exchange. In this process, a few percent of the energetic ions in a particular population pick up electrons from the neutral hydrogen atoms that occupy the outer reaches of the earth’s atmosphere (the exosphere). Charge exchange preserves the parent ion’s direction, speed, and mass so that suitable ENA cameras can obtain detailed information about the magnetospheric ion populations. ENA imaging of ions at ring-current energies has been accomplished with instruments originally designed to measure energetic ions on at least two spacecraft (2,3). Figure 3 shows an image obtained from the NASA Polar spacecraft (4). These instruments were able to form the ENA images when the spacecraft were in the region above the earth’s polar cap, where the fluxes of energetic ions were below the instrument threshold. The energetic ion instruments were essentially pinhole cameras that had solid-state detectors at the image plane. Because only a few per cent of the energetic ions in the ring current and radiation belt undergo charge exchange and the distance of the spacecraft from the ion populations was several earth radii, the ENA fluxes in the region were very low even though they still considerably exceeded the energetic ion fluxes. For example, the image in Fig. 3 took about one and a half minutes to form, and considerable smoothing was required because the pixel size of the raw images was 20° × 12° . Thus, to achieve an acceptable signal-to-noise ratio, ENA cameras need to be considerably larger than the ion detectors that typically have flown on magnetospheric satellites. A necessary component of neutral atom imaging is the translation of the neutral atom images into images of the parent ion populations (5,6). The generation of ion
November 24, 1996
Figure 3. Energetic neutral atom image of the earth’s space environment during a magnetospheric disturbance. The image was acquired with the CEPPAD instrument on NASA’s Polar spacecraft. The view is from high northern latitudes at an altitude of about 25,000 km. The earth is shown by an ultraviolet image obtained on Polar. The neutral atoms have energies between 75 and 113 keV. The image is a 10-minute average shown in a hemispheric fisheye projection. CEPPAD is a charged particle detector and was not designed for ENA imaging. However, owing to the low charged particle background at the particular segment of the spacecraft orbit where this image was acquired, CEPPAD was able to detect the energetic neutrals produced by charge exchange between the energetic radiation belt particles and exospheric neutral hydrogen (4).
images from neutral atom images requires knowledge of the hydrogen atom density of the upper atmosphere, which can be provided by imaging the hydrogen atoms of the earth’s exosphere using resonantly scattered solar Lyman-a emissions (121.6 nm). The Lyman-a ‘‘glow’’ of the exosphere is called the geocorona. The neutral atom populations being imaged are ‘‘optically thin,’’ or transparent. For such a transparent medium, the signal obtained is the integrated signal along the line of sight through the imaged region. The charge-exchange process also makes possible ultraviolet imaging of energetic protons that precipitate into the atmosphere and produce the proton aurora. Until now, the aurora is the only magnetospheric phenomenon that has been imaged routinely. These images are made at wavelengths at which nitrogen and oxygen are excited by precipitating energetic electrons. Using grating-based techniques, it has become possible to image energetic neutral hydrogen atoms produced by charge exchange of energetic protons just before they impact the upper atmosphere to produce the proton aurora (7). In this case, rather than imaging the aurora itself, the camera would actually be imaging the protons that produce the aurora. Such imaging is very challenging in that the resonantly scattered solar Lyman-a radiation (121.6 nm) from the
MAGNETOSPHERIC IMAGING
energetic hydrogen atoms has to be separated from the radiation from the ‘‘cold’’ hydrogen atoms at the same wavelength in the upper atmosphere. The separation is made possible because the energetic atom emissions are red-shifted when viewed from high altitudes in the magnetosphere. For particles whose energies are several keV, a red-shifted emission at 121.8 nm can be imaged. For the purposes of this article, magnetospheric imaging will be defined as the collection of techniques that can provide large-scale images of various magnetospheric charged particle populations on time scales that are relevant to those of magnetospheric disturbances (i.e., a few minutes). This definition will include imaging precipitating energetic protons. Radio sounding of the magnetosphere — a technique employed in NASA’s IMAGE (Imager for Magnetopause to Aurora Global Exploration) mission (8) — is beyond the scope of the article. However, the radio sounding system on IMAGE (9) has already demonstrated the capability of remotely mapping the plasmasphere, cusp, and magnetopause. The current state-ofthe-art magnetospheric imaging technology is embodied in the instrumentation developed for the IMAGE mission (launched on 25 March, 2000), which is the first mission dedicated to global imaging of the terrestrial magnetosphere. Thus the description of imaging techniques in the following sections will focus specifically on the IMAGE instruments. The discussion is ordered according to the particle populations to be imaged. EXTREME ULTRAVIOLET (EUV) IMAGING OF THE PLASMASPHERE
2. image emissions of 0.1–0.3 R in an integration time of several minutes; 3. have a wide field of view (extending out to at least a seven earth radii geocentric distance) to encompass the entire plasmasphere in a single ‘‘snapshot,’’ and 4. reject bright contaminating emissions such as 121.6 nm from the geocorona and interplanetary medium and the He 58.4-nm line emission from the upper atmosphere. The EUV imager on the IMAGE spacecraft is an instrument that meets these requirements (10). Figure 4 illustrates the optical design of the three IMAGE EUV sensor heads. Their characteristics are summarized in Table 1. The entrance aperture consists of four segments that collectively form a nearly complete annulus. Each of the four segments of the annulus contains a thin (150-nm) aluminum filter that transmits the He+ 30.4-nm line and excludes the bright geocoronal H Lymana line at 121.6 nm, as well as other emissions. Light that passes through the filter reaches the mirror. For good reflectivity at the target wavelength, the mirror has a multilayer coating formed of depleted uranium and silicon. Light reflected from the mirror is focused on the spherical focal surface of the microchannel plate (MCP) detector. To achieve the 90° field of view required for plasmaspheric imaging, three EUV sensor heads are used, and their respective 30° fields of view overlap by 3° to compensate for vignetting effects. The front of the sensor head is insulated from the main body and held at +15 V to exclude low-energy ions from the sensor head. The sensor heads are copiously vented to avoid damage to the filters
The plasmasphere is populated by the outflow of ionospheric plasma along mid- and low-latitude magnetic field lines (i.e., those that map to magnetic latitudes of ∼60° and less). The plasma of the plasmasphere ‘‘corotates’’ with the earth, unlike the plasma of the magnetosphere’s central plasma sheet, which in general ‘‘convects’’ (i.e., flows) sunward toward the dayside magnetopause. When the magnetosphere is disturbed by a magnetic storm, enhanced convection ‘‘erodes’’ the outer plasmasphere, captures plasma in the afternoon-dusk sector, and transports it outward and sunward toward the magnetopause. As geomagnetic activity abates, the ionosphere ‘‘refills’’ the plasmasphere. By imaging the helium ions from above the plasmasphere, it is possible to map the refilling of the plasmasphere during quiet times and the stripping away of its outer regions during magnetic storms. Direct measurements of the ions from spacecraft have led to intriguing, but unproven, notions that long plasma tails or detached islands are produced and are driven out through the dayside boundary of the magnetosphere by high-altitude winds. Global helium ion images can to test these theories definitively. To achieve the goal of imaging the earth’s plasmasphere, an extreme ultraviolet imager must
Annular entrance 2 aperture (25 cm )
Preamp
Filter
Spherical multilayer mirror
0
1. accommodate a maximum 30.4-nm brightness of ∼10 R (Rayleighs) in the plasmasphere as well as the much brighter, localized ionospheric source;
1005
Curved MCP detector
5
10
15
cm Figure 4. Cross-sectional view of one of the three IMAGE EUV sensor heads (10).
1006
MAGNETOSPHERIC IMAGING Table 1. EUV Sensor Characteristics
Aperture area Aperture diameter (outer) Aperture diameter (inner) Mirror diameter Mirror radius of curvature Focal ratio Detector active diameter Radius of curvature of focal surface
Value cm2
21.8 8.2 cm 6.0 cm 12.3 cm 13.5 cm f /0.8 4.0 cm 7.0 cm
25 Reflectivity (%)
Property
30
15 10 5 0
from the acoustic environment at launch. All of the vent openings are covered by a fine mesh that is also held at +15 V. Filters The filter material was chosen to provide high transmission at the target wavelength of 30.4 nm while attenuating the unwanted hydrogen Ly-a emission from the earth’s exosphere and the interplanetary medium. The filters consist of a 150-nm thick layer of aluminum supported by a nickel mesh that has a wire spacing of 0.36 mm and an open area of 93%. The aluminum material was permitted to oxidize naturally on both sides. This oxide coating stabilizes the material and provides the additional benefit of reducing the transmission of the He 58.4-nm line. This emission is expected to be weak in the plasmasphere but can be quite bright in the earth’s ionosphere. A strong response to the ionospheric He 58.4-nm emission could lead to high count rates when the earth is in the field of view and thus compromise measurements of 30.4-nm radiation. The measured transmission of the IMAGE EUV filters at 30.4 nm and 58.4 nm is 33% and 12%, respectively. At 121.6 nm (Ly-a), the transmission is essentially unmeasurable, but models show that it is approximately 10−7 . Multilayer Mirrors Almost all materials absorb highly in the extreme ultraviolet. This means that only a few interfaces in a multilayer contribute to its reflectance. Surface contamination and corrosion are particular problems because as little as 0.05 nm of an oil film or surface oxide can significantly change a mirror’s reflectance. The Lya 121.6-nm and He I 58.4-nm lines are much brighter than the 30.4-nm line. The 121.6-nm line is blocked by the Al filter, but Al is relatively transparent to 58.4-nm radiation. There is no acceptable filter for blocking 58.4nm radiation that meets the other mission requirements. The optical response of the mirrors must be retained over a 6° range of incidence angles (12° –18° from normal), which is relatively broad for multilayer mirrors. The EUV instrument uses a new kind of extreme ultraviolet mirror that retains high 30.4-nm reflectivity and low 58.4-nm reflectivity, which is challenging because most materials have much higher single-layer reflectivities at 58.4 nm than at 30.4 nm. To reflect well at 30.4 nm and poorly at 58.4 nm, the top part of the stack, which
30.4 nm
20
58.4 nm
−6
−4
−2 0 2 4 Radial position (cm)
6
Figure 5. The measured reflectivity of one of the IMAGE EUV multilayer mirrors (10).
reflects well at 30.4 nm, must act as an antireflection layer at 58.4 nm. Such mirrors that have dual optical function in the extreme ultraviolet have not been designed or built previously. Depleted uranium proved to be the only metallic material for the high-index (the socalled absorber) layer which also produced low 58.4 nm reflectance. The genetic algorithm (11) calls attention to the possibility of using oxidized uranium as the top layer material and shows the proper range of thicknesses to consider. The top layer of the mirror consists of a thin layer of UOX produced by oxidizing of 1.5 nm of uranium within a few minutes of exposure to air. (If the film oxidizes to UO3 , the layer will swell to more than three times its original thickness). This oxide functions as the top high-index layer in the multilayer when considering high reflectance at 30.4 nm. This is the first time this has been done for EUV multilayers. The oxide is also largely responsible for the low reflectance at 58.4 nm. Beneath the uranium oxide layer are 6.5 periods of bilayers that have 12.8 ± 0.01 nm of silicon and 5.3 ± 0.1 nm of uranium. The bottom layer consists of one 10.6-nm layer of uranium, twice the uranium thickness used in the other layers. This extrathick bottom layer allows the coating to be released from the glass, if recoating is necessary, without compromising optical performance. All films were deposited by dc magnetron sputtering. The detailed process for fabricating the EUV mirrors is described by Sandel et al. (10). Figure 5 shows the measured reflectivity of one of the IMAGE EUV multilayer mirrors. The aluminum filters, spherical multilayer mirrors, and spherical MCPs of the IMAGE EUV instrument provide excellent imaging of the plasmasphere. Figure 6 shows a global plasmaspheric image obtained by the EUV instrument every ten minutes during the two-year IMAGE mission. ENERGETIC NEUTRAL ATOM IMAGING AT HIGH ENERGIES: RADIATION BELTS AND RING CURRENT Charged particles that have energies from a few tens of keV up to several MeV are trapped in the earth’s magnetic field and form the well-known Van Allen belts. One of the primary replenishing mechanisms for the radiation
MAGNETOSPHERIC IMAGING
1007
4. have a wide field of view (extending out to at least a seven earth radii geocentric distance) to encompass the entire ring current and radiation belts in a single ‘‘snapshot,’’ 5. reject bright contaminating emissions such as 121.6 nm from the exosphere and interplanetary medium, and 6. reject energetic ion and electron fluxes. The High-Energy Neutral Atom imager (HENA) on the IMAGE spacecraft is an instrument that meets these requirements (12). The HENA sensor characteristics are summarized in Table 2. HENA Optics
Figure 6. Global view of the Earth’s plasmasphere on August 11, 2000, imaged with the EUV imager on NASA’s IMAGE spacecraft. The extreme ultraviolet emissions are solar 30.4-nm photons that have been resonantly scattered by singly ionized helium. The view is toward the Earth’s northern hemisphere from an altitude of roughly 26,500 km. The small circle in the upper right indicates the direction toward the Sun. The darker region in the lower left results from shadowing by the Earth. A ‘‘tail’’ of plasma can be seen extending from the dusk plasmasphere in the upper left. (Image from B. R. Sandel et al., Geophysical Research Letters, Vol. 28, No. 8, 1439–1442, April 15, 2001. Copyright 2001 American Geophysical Union. Reproduced by permission of the American Geophysical Union.) See color insert.
belts is magnetic storms, during which plasma is injected toward the inner magnetosphere from the plasma sheet (see Fig. 1) and forms a ring current that eventually surrounds the earth. The ring current is responsible for the magnetic perturbations that are observed on the surface of the earth during storms. There is also an ionospheric component to the ring current because ionospheric ions are accelerated out into the magnetosphere in association with the enhanced auroral activity that occurs during the storms. The freshly injected ring-current ions have energies between several tens of keV and a few hundred keV. The ions have been measured by numerous spacecraft flying through the region, but understanding the source and loss processes of the ring current requires global imaging, which can be done with energetic neutral atom (ENA) cameras. To achieve the goal of imaging the earth’s radiation belts and ring current, an ENA imager must 1. detect atoms whose energies range from a few tens of kev to several hundred keV and have an effective geometric factor of ∼1 cm2 sr, 2. obtain at least crude energy resolution ( E/E∼0.8) and mass resolution (separate H and O), 3. form images whose a spatial resolution in the ring current and radiation belts is 1RE or less,
A schematic view of HENA is shown in Fig. 7. The instrument is a slit camera that has a 90° × 120° field of view that operates over an energy range from 20–500 keV. A collimator that sweeps out energetic ions and electrons is in front of the entrance slit. The slit is covered by a thin foil that absorbs a large fraction of the ultraviolet flux and also provides a start signal for time-of-flight analysis by virtue of secondary electron emission from its back side. An MCP that extends the length of the foil detects the secondary electrons and, using its one-dimensional position-sensitive anode, locates the entrance position of the atom along the slit. The ENAs that have passed through the foil suffer a certain amount of scattering but continue in approximately a straight-line path to the back plane of the instrument where they encounter one of two types of detectors–a position-sensitive solid-state detector (SSD) or an MCP mounted behind a second foil. Secondary electrons produced by the arriving ENAs at either the SSD or the front of the second foil are steered by wire electrodes to trigger the coincidence/SSD-stop MCP, so-called because it provides fast stop pulses for the SSD detector and also, together with the start and stop pulses from the start and Table 2. HENA Characteristics Energy range Energy resolution [ E/E] Velocity resolution Composition Field of view Angular coverage Angular resolution ( φ θ) Angular resolution is degraded by ENA scattering in foils at low E Time resolution (statistics limited) Sensitivity, G × ε(cm2 -sr) G × ε for 3° × 3° pixel (image accumulation bin) Pixel array dimensionsa UV rejection Electron rejection Ion rejection Dynamic range a
∼20–500 keV/nuc. ≤0.25 ∼50 km/s (1 ns TOF) H, He, O, heavies 120° × 90° ∼3π sr, spinning ∼4° × 6° , high energy (>80 keV/nuc) <8° × <12° above ∼40 keV/nuc (see Fig. 9) 2.0 s., PHA events 1 spin, 2 min, images ∼1.6 for O, ∼0.3 for H 0.0027, tapered at FOV edges 4M × 8E × 40 A × 120 A <3 × 10−7 (Ly-a, I below 10 kR) ∼10−5 (E < collimator rejection) ∼10−5 (E < collimator rejection) ∼107
E ⇒ energies; M ⇒ masses; A ⇒ angles.
MAGNETOSPHERIC IMAGING
ENA, MCP-only analysis
Charged particle
ENA, SSD analysis
Charged particle sweeping collimator
1-D imaging, start MCP
Coincidence/ SSD-Stop MCP e e −
2-D imaging, stop MCP
Pixelated SSDs SSD electronics
Foils and UV Sensitivity
Preamps HV supplies Figure 7. Cross-sectional schematic view of high-energy neutral atom (HENA) imager (12).
When an ENA strikes the SSD, both position (pixel ID) and total energy (through pulse-height analysis) are derived. Secondary electrons generated at the surface are accelerated into the C/S MCP and have a travel time between the SSD and the MCP of between 6 and 10 ns, depending on which row of SSD pixels is struck. The signal produced in the C/S MCP is corrected for electron travel time and used for the SSD TOF. The TOF solution measures the velocity of the ENA, which, when combined with the total energy measurement, allows calculating the atom’s mass. A different technique may be used on the MCP side of the back plane for mass determination. The number of secondary electrons produced in each foil depends on the atomic number of the neutral atom; of the two most common neutral atoms expected (H and O), it has been found that oxygen produces several times the number of secondaries that hydrogen produces. Exploiting this phenomenon, the pulse-height of the MCP signal is used to determine the species based on that measurement. This method is optimized by pulse-height analysis (PHA) of both the signal from the front of the MCP and the signal from the back of the MCP, as shown in Fig. 8. This technique is the primary approach to mass determination on the Cassini INCA sensor (13), which has no SSD system. For HENA, it is used as a complement to the SSD system.
the
IMAGE
stop MCPs, provides triple-coincidence rejection of signals from photons or high energy electrons. The dots in Fig. 7 indicate the locations of the wire electrodes for steering the secondary electrons to their respective MCPs. Based on its 90° × 120° field of view, the HENA instrument could obtain suitable images of the inner magnetosphere from a nadir-pointing spacecraft. However, other instruments on board IMAGE require a spinning spacecraft. The spin rate is 0.5 rpm, and the spin axis is perpendicular to the plane of Fig. 7. Thus a single target pixel moves across the MCP and SSD detector planes as the spacecraft spins and results in both an MCP image and a SSD image (that has its higher resolution mass analysis) of the ring current and radiation belts every two minutes. If an ENA is directed toward the MCP portion of the focal plane, secondary electrons produced on the exit surface of the second foil are accelerated into the MCP that has a 2-D imaging anode that maps the position of impact and registers the stop timing pulse for the TOF measurement. In addition to this TOF and trajectory measurement, secondary electrons produced as the ENA enters the back foil are electrostatically accelerated and guided to the coincidence/SSD stop MCP (C/S MCP) (see Fig. 7). The electron travel time for the backscattered electrons from the back foil to the C/S MCP is constrained to <40 ns by the steering potentials.
There are two thin foils in HENA, the Si–polyimide–C foil at the entrance slit and the C–polyimide–C foil at the back plane in front of the 2-D MCP array. These foils are the source of secondary electrons for start and stop pulses used in the TOF analysis. The second foil also provides backscattered electrons for the coincidence pulse. However, electrons are also produced in the foils from photoionization by ultraviolet light, predominantly in the Ly-a line from the earth’s hydrogen exosphere. Therefore, the foil must be thick enough to attenuate the Ly-a to an acceptable level, so that the position and TOF circuitry
31-keV/nuc pulse-height
Back PH
1008
20,000 18,000 16,000 14,000 12,000 10,000 8000 6000 4000 2000 0
O H
0
2000
6000 4000 Front PH
8000
Figure 8. Separation of oxygen and hydrogen using MCP pulse-height signatures. The 31-keV O and H measurements are overplotted. The hydrogen separates as lower MCP pulse-heights. At higher energies, this separation becomes more complete. Data from calibration of Cassini/MIMI/INCA sensor (13).
MAGNETOSPHERIC IMAGING
are not swamped by false counts. Any single photon can produce a start or a stop signal, but not both. As long as the time between a false Ly-a start and a separate false Ly-a stop is long compared with the maximum valid TOF period (∼100 nsec), this background will not produce false events. The electronic design assumes nominal upper limits for Ly-a start and stop events of ≤105 s−1 , and ≤1, 000 s−1 , respectively. The Ly-a flux from the sunlit atmosphere at earth (excluding the Sun itself) is ∼10–30 kR (14), which corresponds to ∼0.32–1.0 × 1010 photons/s incident on the front foil (which has a ∼4 cm2 sr geometry factor). The forward photoelectron yield of a foil that has a C exit surface by Ly-a photons is ∼1% (15). The HENA foil comprises three layers: 6.5 µg cm−2 Si, 7.0 µg cm−2 polyimide, and 1.0 µg cm−2 C that has measured 1.5 × 10−3 Ly-a transmittance to reach a conservative design goal of ≤0.5–1.5 × 105 Ly-a induced
1009
secondary electrons per second, when the brightest portion of the earth’s atmosphere/exosphere fills the FOV. Laboratory calibrations of the heritage INCA instrument have verified this system’s capability to continue to image and reject Ly-a background at UV-induced counting rates in excess of 3 × 105 s−1 . The front foil is mounted on a 70-line/inch nickel mesh (82% transmission), supplied by Buckby-Mears. The stop foil, composed of 7.0 µg cm−2 of polyimide and 5.0 µg cm−2 carbon, will reduce the Ly-a flux to the back, 2-D imaging MCP by about another factor of 100. When the sun is in view, no practical foil composition–thickness combination could be identified to sufficiently suppress it because the quantum efficiencies for secondary electron production are so high at EUV wavelengths. Therefore, to avoid high rates a mechanical shutter is closed once per spin while the sun is in view.
Theta FWHM vs H energy Phi FWHM vs H energy
20
FWHM (Degrees)
FWHM (degrees)
25
15 10 5 0 0
20
40 60 80 100 Hydrogen energy (keV)
120
16 14 12 10 8 6 4 2 0 0
20
100 40 60 80 Hydrogen energy (keV)
120
Figure 9. Calibration data that show angular resolution of the IMAGE HENA instrument vs. hydrogen energy (12).
Figure 10. Energetic neutral atom image that shows flux at 39–60 keV during a magnetospheric substorm on 10 June 2000. The view is from the IMAGE satellite located above the North Pole of the Earth (the white circle). Magnetic field lines that have equatorial crossings at four, eight, and twelve earth radii are also shown in white. Noon and midnight field lines are denoted by S and A, respectively. The plot shows a hemispherical fish-eye view.
1010
MAGNETOSPHERIC IMAGING
Angular Resolution and Expected Results The HENA angular resolution as revealed by calibration generally respond to the measurement requirements of the radiation belt and ring current (see Fig. 9 and Table 2). Typically, the azimuth angle φ meets or exceeds those requirements (shown in Fig. 9), whereas the elevation angle θ meets or falls slightly short. The response of the HENA instrument to a ring-current event on 10 June 2000 is shown in Fig. 10. HENA returns such images every two minutes throughout the two-year IMAGE mission. ENERGETIC NEUTRAL ATOM IMAGING AT MEDIUM ENERGIES: THE PLASMA SHEET AND CUSP The plasma sheet and cusp (see Fig. 1) are the most important channels of plasma injection into the inner magnetosphere. The northern and southern cusps are the only certain locations where solar-wind plasma has direct access to the magnetosphere, whereas the plasma sheet is the great reservoir of plasma that is available for injection into the inner magnetosphere during magnetic storms and magnetospheric substorms. Occupying regions of lower magnetic field, these regions have much lower energies than the ring current and radiation belts. The cusp ions have typical energies in the range of 1–2 keV, and the plasma sheet has typical ion energies of about 5 keV. The requirements for imaging the plasma sheet and cusp are as follows: 1. detect atoms with energies from 1–30 keV whose effective geometric factor is ∼0.1 cm2 sr, 2. obtain at least crude energy resolution ( E/E∼0.8) and mass resolution (separate H and O), 3. form images whose a spatial resolution in the ring current and radiation belts is 1 RE or better, 4. have a wide field of view (extending out to at least a seven earth radii geocentric distance) to encompass the entire ring current and radiation belts in a single ‘‘snapshot,’’ 5. reject bright contaminating emissions such as 121.6 nm from the exosphere and interplanetary medium, and 6. reject energetic ion and electron fluxes. ENA imaging can be used for both of these regions, and a slit-camera approach such as used for the higher energy ions is feasible. However, the foils used to generate start pulses for 1 keV ions must be much thinner than those in HENA. These ultrathin foils (∼0.5 µ g cm−2 ) admit too much Lyman a, and the MCPs would be swamped with counts. Thus, an additional element is needed to suppress the ultraviolet radiation by at least a factor of 10−5 . This element is a gold UV-blocking grating. The medium-energy neutral atom imager (MENA) on the IMAGE mission is a grating-based slit camera that meets the requirements for imaging the cusp and inner plasma sheet, as well as the low-energy end of the ring current (16).
A schematic view of the MENA instrument is shown in Fig. 11 and the instrument performance data are summarized in Table 3. The neutral atoms first pass through a multielement fan-shaped collimator whose plates carry alternately 0 V and +10 kV to sweep out energetic ions and electrons whose energies are up to a few hundred keV. The atoms then encounter the UV grating (Fig. 12). The MENA gratings were developed and manufactured at MIT using holographic lithography techniques (17). Each grating consists of an ensemble of gold bars, each 400 nm thick and 160 nm wide. The bars lie parallel to one another at a periodic spacing of 200 nm. The ∼60-nm spacing between the gold bars is the critical parameter for absorbing of hydrogen Ly a. The gold bars are supported on nickel grids. The grating acts as a strong polarizer and a tuned lossy wave guide. The photon transmission of the gold gratings has been modeled as a function of wavelength and measured in the laboratory (18–20). The results, shown in Fig. 13, indicate a transmission of ∼3 × 10−5 at hydrogen Ly a (121.6 nm). When convolved with the Ly-a emission profile from the earth’s exosphere (21), the total background count rate can be estimated at less than 250 s−1 , as shown in Fig. 14. As discussed later, the anticoincidence of the TOF circuitry will reject nearly all of these counts, leaving a residual rate of only ∼10−3 s−1 that results from accidental Start/Stop coincidences. ENAs that pass through the gold grating impact a thin (nominally 0.6 or 0.7 µg/cm2 ) carbon foil that has been affixed to the back face of the grating. The purpose of the foil is to generate secondary electrons for a start pulse. The number of secondary electrons emitted provides information regarding the ENA species, as discussed for HENA. The entire grating/foil assembly for each sensor consists of five carbon foils, gold gratings, and stainless steel grating frames. It is clamped into a lighttight mount in the sensor aperture and electrically biased to −1 kV with respect to chassis ground. Because the secondary electrons that are emitted have a broad angular distribution and energies of a few eV, the −1 kV bias is required to accelerate electrons toward the detector, thereby retaining the z coordinate of the ENA passage through the aperture. A grounded (85% transmission) grid is placed 1 mm downstream from the carbon foil, inside the sensor. Secondary electrons are accelerated to 1 keV through this grid and pass through a nearly field-free, 3-cm region before striking the detector over the Start anode segment. Thus, they provide the TOF Start signal and the z coordinate of detector impact, which is interpreted as the z coordinate of ENA passage through the aperture, or Start position. The pulse height of the MCP charge burst, or Start Height, is measured and recorded to assist in the statistical measurement of PHA for species identification. ENAs undergo energy- and mass-dependent scattering during passage through the foil, causing some degradation in the determination of polar angle for incident ENAs. Figure 15 contains results of a laboratory study of ENA scattering in a nominal 0.5-µg/cm2 foil (22). These data may be interpreted directly as a lower bound on the scattering half angle for ENAs that pass through the
MAGNETOSPHERIC IMAGING
1011
ym en
a
Collimator ± 55° imaging plane ± 2° spin plane
Pr im ar
UV gratings
Start foil
Accel grid Start electrons
L
Shield grid Stop MCP stack
Segmented position sensitive anode
Z stop
Z a
Z start
a = cot −1[ (Z stop − Z start) / L) ]
Z=0
Figure 11. Schematic view of a MENA sensor. A neutral atom passes through the Start foil, producing secondary electrons. Secondary electrons are accelerated toward the Start segment of the detector, whereas the ENA will impact the detector Stop segment. Particles incident on the Start and Stop segments of the detector will provide TOF timing signals, which together with their respective pulse height and ID position on the detector provide the required information for polar incidence angle, energy, and species determination of the ENA (16).
Table 3. MENA Characteristics Quantity
Requirement
Performance (O)
Performance (H)
Units
360 90
360 140
360 140
Degrees Degrees
8 8 1–30 80 H, O 1.0
5 12 3–70 80 Probabilistic 0.83
5 8 1–70 80 Probabilistic 0.83
Degrees Degrees keV % Unitless Cm2
Angle FOV Azimuthal Polar Angle resolution Azimuthal Polar Energy range Energy resolution Species ID Effective areaa a
Three sensors combined.
MENA aperture and therefore as a lower bound on the (half-width) polar angle resolution achievable using the MENA imager. The angular resolution in the collimated azimuth direction is not affected because the collimator fixes the resolution at approximately 5° FWHM. An incoming ENA continues (nearly) along its original trajectory and strikes the detector over the Stop anode segment, providing the TOF Stop signal, the z coordinate of ENA impact on the detector (or Stop Position), and the pulse height of the Stop MCP charge burst, or Stop Height. Like the Start Height, the Stop Height is related
to the ENA species. The path length of the ENA from the aperture to the detector is 3/ cos α cm, where α is the measured polar incidence angle, as shown in Fig. 11. The timing of the Start and Stop signals is compared to produce a time of flight. Because of the −1-kV accelerating potential applied to the carbon foil, a secondary electron’s transit time from foil to detector is ∼1.6 ns, assuming an initial secondary electron energy of 2 eV and a 45° emission angle. During this time, an electron will have traveled less than 1 mm transverse to the aperture normal (z direction). This
1012
MAGNETOSPHERIC IMAGING 200 nm
60 nm
50
400 nm
Scattering half-width q1/2 (deg)
Gold gratings
10
O
Nickel support grid
H
Grating transmission
10−2
Theory (full)
10−3 Theory (0 th order) 10−4 Data (0 th order) 10−5 20
40
60 80 100 Wavelength (nm)
120
140
Figure 13. Model and measured Lya photon transmission through MENA prototype gratings. The solid curve represents model results, points indicate measurements.
Differential count rate (Hz/Å)
100
1 Integral
10
0.1
1
0.01 Differential
0.1
0.001 0.0001
0
50
100 150 200 Photon wavelength (ηm)
Integral count rate [Hz]
1000
10
0.01 250
Figure 14. Background noise rate due to geocoronal photons. Shown are the total (Starts + Stops) expected MCP count rates due to UV photons near the IMAGE apogee. Both differential count rates and their integral are shown and labeled (16).
(k F
1
(k F
33
)
=3
4)
=1
2.6
)
Nominal foil thickness: 0.5 µg/cm2
E q1/2 = k F Data 0.1
10−1
=1
F
He
m 4µ
Figure 12. Schematic diagram of UV transmission gratings. The drawing is not to scale. Therefore, disparate dimensions like the 60-nm gaps between gold bars and the 4-µm nickel grid period are displayed on similar scales (16).
(k
1
10
40
Energy, E (keV) Figure 15. Angular scattering of ENAs upon passage through a nominal 0.5-µg cm−2 foil (22).
displacement adds to the uncertainty in measuring of the z coordinate of the ENA as it passes through the aperture, and therefore, in the ENA polar angle which is derived from it. As in EUV and HENA, the MENA instrument also uses an MCP and position-sensitive anode system. However, because of the mechanical collimation along the slit provided by the collimator, only one-dimensional position sensing is required. Therefore, a spinning platform is required for generating magnetospheric images from MENA. As shown in Fig. 11, the MENA instrument uses the same MCP for start and stop counts, and the center part is reserved for starts. This simplification leads to a blind spot in the middle of each sensor. All counts in that portion are assumed to be start pulses. At the expected MENA count rates (hundreds of counts/s), such accidental coincidences caused by misidentifying a stop pulse as a start pulse are insignificant. To cover the blind spot will require at least two MENA sensors. For symmetry, three sensors are actually used and are offset by 30° to prevent the blind spots from overlapping. A flight image obtained on by MENA 10 June 2000 is shown in Fig. 16. Such images are obtained every two minutes throughout the IMAGE mission. ENERGETIC NEUTRAL ATOM IMAGING AT LOW ENERGIES: IONOSPHERIC PLASMA OUTFLOW The upper ionosphere is heated by resistive currents (Joule heating) and by wave-particle interactions that are excited by magnetic-field-aligned currents associated with charged particle precipitation and the associated auroral activity. The net result of these magnetosphere–ionosphere interactions is an outflow of ions (primarily oxygen and hydrogen) into the magnetosphere. This source of ions competes on an equal footing with the solar wind as a provider of plasma to the magnetosphere. It is also the major loss process for hydrogen and oxygen,
MAGNETOSPHERIC IMAGING
1013
Figure 16. Energetic neutral atom image at 8.6 keV during a magnetospheric substorm on 10 June 2000. Format and time are the same as in Fig. 10.
and hence, water from the earth’s atmosphere. It is estimated that the loss rate from ionospheric outflow for O+ is a few times 1025 ions s−1 (23,24). It is known that ionospheric outflows are highly variable in time (25), but it has been difficult to study the variability on appropriate time scales. Most studies have been statistical where time scales range from a few hours to 11 years. Improvement of these time scales requires global imaging of the ion outflow. Because the ions undergo charge exchange with upper atmospheric atoms as they emerge from the ionosphere, global imaging of ion outflows can, in principle, be realized from neutral atom imaging, as for the more energetic magnetospheric ions discussed earlier. However, the thin-foil approach described for the IMAGE HENA and MENA instruments cannot be used within the <1 keV energy range of interest for the outflowing ions because ions at these energies cannot penetrate even the thinnest available foils. Alternatively, incoming neutral atoms can be converted to ions and then analyzed by conventional ion mass spectrometric techniques. Preferably, conversion is accomplished without disturbing the energy, angle, and mass properties of incoming neutral atoms.
The minimum requirements for imaging the ionospheric outflow are as follows: 1. image neutral atoms whose energies range from 10 eV to several hundred eV, and separate high and low energies; 2. field of view of at least two earth radii from a geocentric distance of two earth radii (90° full field); 3. have at least 100 pixels within the field of view; and 4. separate hydrogen and oxygen. The low-energy neutral atom imager (LENA) on the IMAGE spacecraft meets these requirements (26). The LENA sensor depends for its operation on converting incident neutral atoms into ions that retain the general characteristics of their parent atoms. The ions are velocityanalyzed by conventional techniques of electrostatic and time-of-flight analyses, then detected and imaged by conventional charged particle techniques. The method for the neutral-to-ion conversion is charge exchange upon reflection from a solid surface. A fraction of the incoming neutral atoms is converted to negative ions that are nearly specularly reflected from the surface, then accelerated and collected by an ion extraction lens that focuses and
1014
MAGNETOSPHERIC IMAGING
disperses them in energy. They are passed through a broom magnet that removes electrons and an electrostatic analyzer that maintains energy dispersion while removing photons; then they are delivered at high energy to a timeof-flight sensor and 2-D imaging system that sorts the particles by polar angle of arrival and energy. LENA performance and resource requirements are summarized in Table 4. Exterior to the conversion surface, LENA is a pinhole camera that has a ∼1 cm2 pinhole. Figure 17 illustrates the main components of LENA and the optics path
through the instrument. An optically polished nearconical conversion surface (consisting of four facets 22.5° wide each) intercepts incoming neutrals passed by the collimator/charged particle rejecter at an incidence angle of 75° from normal. The conversion surface is composed of four flat facets, each a trapezoid of optically polished tungsten. The entire structure that supports the conversion surface is floated at a high negative potential in the range from 15–20 kV. The conversion product negative ions (and any secondary electrons produced on the surface) are accelerated away from the conversion surface support
Table 4. LENA Characteristics Parameter
Value
Energy range (incident neutral atom) Energy resolution Mass range Mass resolution Angular coverage Angular resolution/response Total field of view Pixels per image Pixel physical aperture Pixel solid angle Time resolution 1-D polar × energy Time Resolution 2-D (az × polar) × energy Time Resolution 3-D (az × polar × energy) Dynamic range
15–1250 eV E/δE = 1 at FWHM 1–20 amu (H+ and O+ ) M/δM = 4 at FWHM Sampling: 8° × 8° × 12 sectors per spin: 360° × 90° in 45 × 12 samples 8° × 8° = 0.02 steradians 2.8 π sr 3.2 k = 3 Energy × 45 Az × 12 Polar × 2 Mass 1.0 cm2 (Aeff ≤ 1 × 10−3 cm−2 ) 0.02 sr per pixel × 12 pixels 2.7 s 120 s 1 spin period (120 s) 104
2. Collimator/charged particle rejector
S1 3. Conversion surface
6. Time of flight/position sensing 1. Neutral particles 4. Extraction lens
S3 Magnet Electron collector S2
5. Spherical electrostatic analyzer Figure 17. Two-dimensional cross section of the LENA imager optics (26).
MAGNETOSPHERIC IMAGING
Angular Response LENA is designed to have a 90° polar × 8° azimuthal acceptance angle about the entrance aperture. This angle is defined geometrically by the collimator and by the dimensions of the conversion surface. The polar angle is subdivided into twelve 7.5° polar bins, and binning of azimuthal angle and imaging of the neutral atoms is achieved by rotating of the spacecraft. Conversion of neutral atoms takes place on four distinct tungsten surfaces equally spaced across the polar angle dimension. Each surface is associated with distinct microchannel plate assemblies for both the start and stop sections of the TOF system, as well as distinct carbon foils. The relative efficiency of detecting neutral atoms within any one of
Polar angle response 1.2
Normalized counts per bin
structure and collected by the ion optics system. The surface converts incident atoms to negatively charged ions of corresponding species (or in some cases sputters surface adsorbates, again as negatively charged ions). The emitted negative ions are accelerated, focused, and dispersed in energy by an ion extraction lens. The lens can be trimmed to selected energy ranges from 10–300 eV and 25–750 eV. A broom magnet removes photoelectrons and secondary electrons if they fall within the LENA energy range. Negative ions then encounter a spherical electrostatic analyzer that maintains the radial energy dispersion of the ions. The main purpose of the electrostatic analyzer is to remove residual UV photons and present the negative ions to the imaging time-of-flight (ITOF) system for mass, energy, and polar angle analysis. The ions arrive at the ITOF system with energy that corresponds to the optics high voltage, nominally 20 keV, but their original energies can still be derived from the radial dispersion, which is preserved through the spherical analyzer. The ITOF system consists of carbon foil apertures, an electrostatic mirror system that reflects secondary electrons emitted from the back sides of the foils, and four sets of start and stop detectors (one for each conversion surface facet). The detectors are chevron stacks of microchannel plates whose length/diameter = 60. The start stacks are rectangular, and the stop stacks are trapezoidal corresponding to projections of the conversion surface facets. Start event charge clouds fall directly upon a 2-D wedge and strip anode (27) that has four preamplifiers; the ratio of their pulses define radial and polar angle coordinates for the events. The preamplifiers are located within the TOF detection unit, and pulses are passed to a position sensing system. Start and stop events generate fast pulses regardless of their point of impact on the detector system. These pulses are passed directly from the fast pulse anodes to the TOF electronics unit. The response of LENA to an incident neutral atom is governed by the dimensions of the collimator, the entrance aperture slit, the conversion surface, the ion optics, the ITOF optics, and finally the response of the detectors. The response is fully described by the product of the effective aperture area A, the pixel solid angle , and the energy range E for each accumulation. In practice, these quantities may be functions of the incident particle species or energy and of internal potentials applied to the optics elements or to the MCP detectors.
Energy Response
LENA was designed to operate across neutral atom energy ranges of 10–300 eV and 25–750 eV, depending on the potential placed on the steering electrode. Within these ranges, these measured energies correspond to those of the scattered negative ion from the conversion surface. Particles incident on the surface lose energy across a broad distribution which peaks at about 80% of the incident energy for oxygen and 95% for hydrogen. In addition, low-energy negative ions, sputtered from the surface by the incident neutral particle, contribute to the signal. The ion extraction lens on LENA is designed to focus and disperse the scattered ions, according to energy, across the carbon foil detector plane on the TOF analyzer (Slit S3 in Fig. 17). The location of the incident ions on the carbon foil is determined by imaging the secondary electrons from the carbon foil onto a position-sensitive detector. In this way, measurements of both the energies and angles of the ions are made simultaneously. The nominal energy bins (at 20-kV optics potential) are centered at ∼50, 150, and 250 eV (ion energy) at the lowest steering potential. At the highest steering potential, the bins are shifted to ∼300, 500, and 700 eV, respectively. These energy ranges scale approximately with the optics potential in use at any given time.

Time-of-Flight Response
LENA includes a carbon foil TOF analyzer designed to separate converted hydrogen from converted heavy ions.
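To see why a carbon foil TOF section separates hydrogen from oxygen so cleanly, consider the flight times of singly charged ions that have been accelerated to the nominal 20-keV optics energy. The flight-path length used below is an assumed, illustrative value, not a published LENA dimension.

```python
import math

E_EV = 20_000.0            # nominal post-acceleration energy (eV)
EV_TO_J = 1.602e-19
AMU = 1.661e-27            # kg
PATH_M = 0.10              # assumed 10-cm flight path (illustrative only)

def flight_time_ns(mass_amu, energy_ev=E_EV, path_m=PATH_M):
    """Time of flight t = L / v with v = sqrt(2E/m)."""
    v = math.sqrt(2.0 * energy_ev * EV_TO_J / (mass_amu * AMU))
    return path_m / v * 1e9

print(f"H : {flight_time_ns(1):5.0f} ns")   # ~51 ns
print(f"O : {flight_time_ns(16):5.0f} ns")  # ~204 ns, i.e., 4x slower (sqrt of the mass ratio)
```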
An example of the TOF spectrum is shown in Fig. 19. The spectrum shows clear separation of the well-peaked H atom signal and the O atom signal which is more broadly distributed due to straggling in the foils. This spectrum was taken from an incident oxygen atom beam, but the spectrum includes an H atom signal due to sputtering from the tungsten surface. Because the sputtered atoms come off the surface with low energy relative to the energy of the scattered atoms, it is possible to remove most of the sputtered atoms from the detector by using the steering electrode. This is shown in Fig. 19, where the ratio of hydrogen to oxygen changes from 1.78 in the upper panel to 0.07 in the lower panel. When the sputtered ions are not removed, compositional information is lost, but the sensitivity of the instrument is improved. An example of the LENA flight data is shown in Fig. 20, in which the horizontal axis is the 90° instrument polar angle and the vertical axis is the 360° spin angle of the IMAGE spacecraft.

Figure 19. LENA time-of-flight responses to oxygen, including sputtered H atoms (top panel), and to oxygen where sputtered atoms are removed (bottom panel) (26).

Figure 20. Low-energy neutral atom image obtained by the IMAGE LENA instrument on May 25, 2000. Vertical axis is instrument polar angle. The image was acquired when the spacecraft was near perigee over the southern hemisphere. The white circle indicates the Earth; the outline of Antarctica is shown. Also shown are representative magnetic field lines (in white), local time sectors, and the nominal auroral oval (in red). The energy range of the neutral atoms is 10 eV to 1000 eV. The fluxes to the left (labeled SW) are believed to originate in the solar wind, perhaps through charge exchange in the distant part of the Earth's exosphere. The fluxes to the right are due to outflowing ionospheric ions. (Image from T. E. Moore et al., Geophysical Research Letters, Vol. 28, No. 6, 1143–1146, March 15, 2001. Copyright 2001 American Geophysical Union. Reproduced by permission of the American Geophysical Union.)

FAR-ULTRAVIOLET IMAGING OF THE PROTON AND ELECTRON AURORAS
The aurora is an invaluable diagnostic of the state of the magnetosphere. Auroral emissions are excited predominantly by precipitating energetic electrons, and it is the electron aurora that has been extensively imaged in visible, ultraviolet, and X-ray wavelengths. Precipitating protons also produce auroral emissions, which have proven far more challenging to image. Auroral Lyman-α emissions of neutral hydrogen are produced by charge exchange of energetic protons that cascade into the atmosphere (28–31). As the protons penetrate into the atmosphere and collide with atmospheric gas, they capture electrons from the major neutral constituents (N2, O2, O) through charge exchange reactions. They become fast neutral hydrogen atoms, some of which will radiate the Doppler-shifted Lyman-α emission. In subsequent collisions, the energetic H atoms may lose their electron in an electron stripping collision and reappear as protons whose motion is constrained by the magnetic field. The cycle will repeat hundreds to tens of thousands of times, depending on the primary proton energy. The difficulty in imaging the precipitating energetic protons stems from the fact that they emit photons only when they gain an electron in an excited state and then transit to the ground state, thereby radiating at the same Ly-α wavelength as the cold hydrogen atoms of the exosphere. Imaging is possible
because of the red shift of the energetic H emissions when viewed from above, but this red shift amounts to only a few tenths of a nanometer. A similar spectral resolution requirement exists for imaging the electron aurora. In this case, the actual atmospheric emissions produced by bombarding oxygen and nitrogen atoms and molecules can be imaged in the ultraviolet. However, airglow emissions at the same wavelengths are produced by atmospheric chemistry, and these emissions compete favorably with the auroral emissions, particularly on the dayside and for weak auroras (32). Although the brightest auroral and airglow line is the 130.4-nm OI emission, it is multiply scattered in the atmosphere, making it difficult to derive accurate two-dimensional distributions of the auroral emissions. On the other hand, the 135.6-nm line is scattered only to a limited degree (33) and is an excellent emission feature for imaging the aurora that can be related to the total electron precipitation. Thus, a primary measurement requirement for UV imaging is the detection and spectral separation of the 135.6-nm emission from that of 130.4 nm. This requirement necessitates using a spectrometer because narrowband filter technology cannot satisfy the requirement. The requirements for imaging the global proton aurora and the electron aurora at 135.6 nm are
1. image the entire auroral oval over a 5-minute period at a spatial resolution of <100 km;
2. spectrally separate the energetic proton precipitation (<1 kR of Doppler-shifted Lyman-α emissions near 121.8 nm, which corresponds to a proton energy of 8 keV) from the intense, cold geocorona (>10 kR of Lyman-α emissions at 121.567 nm); and
3. separate spectrally the 135.6-nm OI emission from the multiply scattered 130.4-nm emission.
These requirements are satisfied by the Far Ultraviolet Spectrographic Imager (FUV-SI) on the IMAGE spacecraft (34). The first requirement forces a relatively large field of view: 15° × 15°. The second requirement necessitates maximizing the spectral resolution within the available resources; an intrinsic spectral resolution of 0.2 nm was chosen to separate the Doppler-shifted auroral Lyman-α from the exospheric background. Because the Doppler-shifted Lyman-α signal is about 100 times less intense than the Lyman-α exospheric emissions at 121.6 nm, internal scattering in the instrument must be minimized. To be able to interpret the Doppler-shifted auroral Lyman-α measurements in terms of pure proton aurora requires rejection of nitrogen emissions near 120 nm (triplet lines at 119.955, 120.022, and 120.071 nm) that are produced by all auroras, including the electron aurora. The third requirement provides that the transmission of the 130.4-nm triplet line should be less than 1% of the 135.6-nm line, necessitating an overall spectral resolution less than 3 nm in the 130- to 135-nm spectral region.
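The spectral requirement above follows directly from the nonrelativistic Doppler relation Δλ ≈ λ0(v/c). The short sketch below uses an assumed line-of-sight velocity, chosen purely for illustration, to show why a red shift of only a few tenths of a nanometer must be resolved against the 121.567-nm geocorona.

```python
C = 2.998e8          # speed of light (m/s)
LAMBDA0 = 121.567    # rest wavelength of Lyman-alpha (nm)

def doppler_shift_nm(v_los_m_per_s):
    """Red shift of Lyman-alpha for an H atom receding along the line of sight."""
    return LAMBDA0 * v_los_m_per_s / C

# An energetic auroral H atom receding at ~6e5 m/s (assumed value)
# is shifted by roughly 0.24 nm, i.e., from 121.567 nm to ~121.8 nm.
print(f"{doppler_shift_nm(6.0e5):.3f} nm")
```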
Figure 21. Simplified schematic illustrating the operating principle of the FUV-SI as an imaging monochromator. The optical elements are represented here by lenses. Mirrors are used in the actual instrument. The top panel shows the wavelength selection; the bottom panel illustrates the imaging function (34).
The spatial resolution requirement specifies that, from the IMAGE satellite apogee of seven earth radii, the angular resolution should be 7 × 7 arcmin for each of the 128 pixels × 128 pixels within the total field of view. The FUV-SI is essentially an imaging monochromator, whose principle of operation is illustrated in Fig. 21. The top diagram in Fig. 21 shows the wavelength selection, and the bottom part shows the imaging properties of such an instrument. For simplicity, refractive lens elements are used to represent the optical components. In the actual FUV-SI instrument, all optical elements are mirrors that pass far-ultraviolet radiation and are achromatic across the wavelength range covered. Consequently, the optics is folded, resulting in a more complicated arrangement. The first element in the imaging monochromator is an input aperture (a grill of slits in the case of FUV-SI) followed by a collimating lens that produces parallel light from any point in the plane of the slits. The grating in this illustration is a transmission grating placed at a distance from the collimating lens, so that any parallel light coming through the slit is focused to a point at the grating. Thus, an image of the object at infinity in front of the instrument is formed close to the grating. The grating disperses the light according to wavelength. The wavelength-related angular dispersion of the light is the same for all points of the image. Thus, to a first order, imaging and wavelength selection are independent of each other. The grating is followed by a spectrographic camera lens that focuses parallel light from the grating at the plane of the exit slit. A spectral region defined by the position and size of the slit(s) receives an equal amount of light from any point in the image. The back imager, the last part of the system, focuses the intermediate image, located at the grating, on the detector. The location and width of the exit slit(s) define the spectral region and the spectral width of the light that passes through the
system. Therefore, the spectrometer acts as an afocal filter in the optical system. Two wavelength regions correspond to two separate exit slits. One region is for imaging the Doppler-shifted Lyman-a radiation (121.8 nm) induced by the proton aurora, and the other is the 135.6-nm oxygen emission. The two regions in Fig. 21 are denoted as λ1 and λ2 . The bottom half of Fig. 21 shows the imaging properties of the imaging monochromator. Parallel light that enters the entrance slit (grill) from an object is focused at the plane of the grating. This intermediate image is reimaged on the detectors by the back imager camera lens located behind the slit(s). The most remarkable property of this type of imaging monochromator is that its imaging and wavelength properties are independent of each other because light rays that leave from a point of the intermediate image at the grating are brought in focus on the detector, regardless of their direction or their associated wavelength. Different wavelengths are diffracted in different directions. Filtering is accomplished by the size and position of the slit(s). In this system, it is possible, for example, to have a wide exit slit that passes a large spectral region combined with a sharp focus on the detector that represents high spatial resolution. Thus, wide spectral passbands can be accommodated by high spatial imaging resolutions. Alternatively, it is possible to use very fine spectral slits and integrate over large detector pixels. It is also possible to have several separate wavelength channels using this approach. The arrangement needs separate back imager optics for each wavelength channel, and each separate back imager optics has to provide some deflection to ensure that the wavelength regions end up on separate detectors. For example, the two small reimaging lenses in Fig. 21 must be separate to ensure that two passbands λ1 and λ2 form images at different locations and, if necessary, on separate detectors. To have a fairly large throughput but high enough spectral resolution, the FUV-SI instrument uses a wide input slit that is subdivided into nine subslits to form a
slit grill (35). The output slit of the Lyman-α channel also contains a similar slit grill that has been designed to block the light at the precise wavelength of the more intense, cold, geocoronal emission but transmit efficiently in the wavelength region that contains the weaker Doppler-shifted emissions. The nine parallel slits improve light collection by a factor of 9, so that the FUV-SI is equivalent in efficiency to nine single-slit instruments of the same size. The IMAGE FUV-SI instrument is schematically illustrated in Fig. 22. The light enters the spectrometer through the entrance slit; is reflected by the collimator mirror, which produces parallel light from any point on the slit; and is focused as an intermediate image approximately at the grating. The function of the two back imagers is to transfer the intermediate image onto two spatially separated detectors for 121.8 nm and 135.6 nm. The pattern of slits for the 121.8-nm back imager is designed to be in anticoincidence with the input slit grill. The slit for the back imager labeled 135.6 is simply a wide planar slit because a 3-nm spectral bandwidth is adequate to discriminate against 130.4-nm emissions. To satisfy the spectral resolution requirements of 0.2 nm without lengthening the spectrometer, at least two surfaces are needed to provide sufficient aberrational correction for the spectrograph. The instrument uses Wadsworth mounting in which a collimating mirror forms a parallel beam for a separate grating. Minimizing the aberrations also demands using the central, or paraxial, field of the collimator. This dictates using a system configuration in which the incoming light is on axis and passes through a central hole of the grating. The SI has a 3,000-line-per-mm holographically ruled grating, 181 × 156 mm (Table 5). The IMAGE satellite is a slow spinner (2-minute spin period), so at apogee the earth is within the field of view of the imager for less than 10% of the spin. To maximize the sensitivity of the instrument, a time-delay integration (TDI) scheme is used in which multiple images taken as the earth passes through the field of view are spatially registered with respect to the earth and are overlapped in the instrument memory.
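The TDI scheme can be pictured as a shift-and-add of successive snapshots. This is a minimal sketch, assuming the image drift per snapshot has already been reduced to a known, integer number of pixels; the actual on-board registration with respect to the earth is more involved.

```python
import numpy as np

def tdi_coadd(frames, shift_per_frame):
    """Co-add a sequence of frames, shifting each one back by the known
    image motion so that the slowly drifting scene stays registered.

    frames: list of 2-D arrays taken as the scene drifts through the FOV
    shift_per_frame: integer pixel drift of the scene between frames (assumed known)
    """
    acc = np.zeros_like(frames[0], dtype=float)
    for i, frame in enumerate(frames):
        # undo the drift accumulated by frame i, then accumulate
        acc += np.roll(frame, -i * shift_per_frame, axis=1)
    return acc

# Example: 5 noisy frames of a faint feature drifting 3 pixels per frame.
rng = np.random.default_rng(0)
truth = np.zeros((8, 64)); truth[4, 10] = 1.0
frames = [np.roll(truth, 3 * i, axis=1) + 0.2 * rng.standard_normal((8, 64))
          for i in range(5)]
stacked = tdi_coadd(frames, 3)   # the feature adds coherently, the noise averages down
```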
Figure 22. Schematic of the FUV-SI instrument (34).
Table 5. FUV-SI Characteristics

Mechanical Configuration       Entrance Slits    Exit Slits
Overall slit width (µm)        391.6             356.5
Spacing (µm)                   1076.6            980.1
Curvature                      No                R = 1300 mm

Optical Equivalent (Wavelength)
Slit width            0.19 nm
Bar width             0.3323 nm
Slit spacing          0.5223 nm
Linear dispersion     1820 µm/nm

Optical Design        Radius of Curvature (mm)   Conic Constant   Offset (mm)   Size (mm), x × y
Collimator mirror     1000 (concave)             −1.727           0             181 × 156
Grating               1000 (concave)             0                0             142 × 164
Spherical mirrors     136.461 (convex)           0                0             52 × 20.25
Conical mirrors       170.725 (concave)          0.114            53.5          144 × 76
Flat 135.6 nm         —                          —                —             72 × 45.5

An advantage of the spinning platform is that the blind spot produced by the hole in the grating (a requirement of the paraxial Wadsworth configuration) is partially filled in. For ease of manufacturing, two identical imagers were used: one works at 121.8 nm, the other one works at 135.6 nm. Each imager is a two-mirror optical system. The first mirror is spherical. The second is conical (elliptical) and works in an off-axis configuration. The two imagers are perfectly symmetrical (only the back focal length is slightly different, owing to a geometric difference in the intermediate image to be relayed). Thus two identical spherical and two identical conical mirrors were manufactured. A flat folding mirror was added on the 135.6-nm imager,
so that the two detectors would fit in the available space. The imagers are not optimized independently of the Wadsworth monochromator. Astigmatism, which was not critical for spectral selection at the exit slit plane, was the main aberration to be corrected by the imagers to generate the best possible image on the detectors. The FUV-SI instrument is designed to minimize the 120- and 121.6-nm lines while transmitting the largest bandwidth in the region from approximately 121.7–122.3 nm. The grating angle and the collimator conic constant are optimized to reduce aberrations at 121 nm, that is, between the 120- and 121.6-nm lines.
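The wavelength separations discussed here follow from the grating equation mλ = d(sin θi + sin θd). A quick check with the 3,000-line/mm ruling quoted earlier, taking the incidence angle as 0° purely for illustration (the actual mounting geometry differs):

```python
import math

LINES_PER_MM = 3000.0
d_nm = 1e6 / LINES_PER_MM      # groove spacing in nm (~333.3 nm)

def diffraction_angle_deg(wavelength_nm, order=1, incidence_deg=0.0):
    """Diffraction angle from m*lambda = d*(sin(i) + sin(theta))."""
    s = order * wavelength_nm / d_nm - math.sin(math.radians(incidence_deg))
    return math.degrees(math.asin(s))

for wl in (121.8, 130.4, 135.6):
    print(f"{wl} nm -> {diffraction_angle_deg(wl):.1f} deg")
# 121.8 nm and 135.6 nm emerge roughly 2.6 degrees apart, which the two
# back imagers pick off through their separate exit slits.
```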
Figure 23. Laboratory measurement of the spectral performance of the FUV-SI instrument for the proton aurora (34).

Figure 24. Laboratory measurement of the spectral performance of the FUV-SI instrument for the electron aurora (34). Bottom curve is the response after the background (Bg) has been subtracted.

Figure 25. Auroral images obtained with the IMAGE FUV-SI instrument on November 10, 2000. The auroral emissions are mapped in magnetic latitude and magnetic local time, with local noon at the top and midnight at the bottom. The image on the left is the proton aurora; the image on the right is the electron aurora. The proton aurora image shows an emission patch equatorward of the main oval in the post-noon afternoon sector that is not seen in the image of the electron aurora. See color insert.
It was relatively easy to meet the spectral resolving power requirements of the instrument in the 130- to 135-nm region to accomplish 130.4-nm blocking for the second imager.

FUV-SI Instrument Performance
The spectral performance of the FUV-SI, as measured in the laboratory, is shown in Fig. 23 and Fig. 24 for the proton and electron aurora, respectively. A worst-case Lyman-α leak of 1.3% is derived from the data, which are limited by the fact that the calibration monochromator had a lower spectral resolution than that of the instrument. Figure 24 shows excellent rejection of 130.4-nm radiation and a peak response at 135.6 nm, as desired. The absolute response for a 5-second exposure at the two wavelengths of interest was measured at 1.6 × 10⁻² counts per Rayleigh per pixel and 1.8 × 10⁻² counts per Rayleigh per pixel for 135.6 and 121.8 nm, respectively. Flight images of the electron and proton aurora obtained by the FUV-SI instrument are shown in Fig. 25.
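The quoted responsivities translate directly into expected counts. The short calculation below is only illustrative; the input brightnesses are assumed round numbers, not measured values.

```python
RESP_1356 = 1.6e-2   # counts per Rayleigh per pixel per 5-s exposure
RESP_1218 = 1.8e-2

def counts(brightness_rayleigh, responsivity):
    """Expected counts in one pixel for a single 5-second exposure."""
    return brightness_rayleigh * responsivity

# A 1-kR electron-aurora patch at 135.6 nm and a 0.5-kR Doppler-shifted
# proton-aurora signal at 121.8 nm (assumed brightnesses):
print(counts(1000, RESP_1356))   # ~16 counts per pixel per exposure
print(counts(500, RESP_1218))    # ~9 counts per pixel per exposure
```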
SUMMARY AND CONCLUSIONS
Magnetospheric imaging is an emerging field that has only recently been fully implemented. The Imager for Magnetopause-to-Aurora Global Exploration (IMAGE) spacecraft was launched in March 2000 and since then has been sending back global images of the plasma regions in the earth's inner magnetosphere. This spacecraft carries instruments that perform all of the known techniques for magnetospheric imaging. That is why a review of the state of the art in magnetospheric imaging is essentially a review of IMAGE instrumentation. One of the techniques employed on the IMAGE mission, energetic neutral atom imaging, will also be used to image the plasma environments of other planets. The INCA instrument on the Cassini Saturn Orbiter (36), the predecessor of the IMAGE HENA imager, will provide ENA images of Saturn's inner magnetosphere after Cassini arrives in the Saturn system in the summer of 2004. ENA imaging of the interaction of the solar wind with the Martian
exosphere (37) is also a key objective of the European Mars Express mission, which is scheduled for launch in 2003. The application of ENA and other magnetospheric imaging technologies in the twenty-first century promises exciting advances in our understanding of terrestrial and planetary space environments.
BIBLIOGRAPHY
1. M. Nakamura et al., Geophys. Res. Lett. 27, 141–144 (2000).
2. E. C. Roelof, Geophys. Res. Lett. 14, 652–655 (1987).
3. M. G. Henderson et al., Geophys. Res. Lett. 24, 1,167–1,170 (1997).
4. M. G. Henderson et al., Adv. Space Res. 25, 2,407 (2000).
5. E. C. Roelof and A. J. Skinner, Space Sci. Rev. 91, 437–460 (2000).
6. J. Perez, M.-C. Fok, and T. E. Moore, Space Sci. Rev. 91, 421–436 (2000).
7. S. B. Mende et al., Space Sci. Rev. 91, 287–318 (2000).
8. J. L. Burch, Space Sci. Rev. 91, 1–14 (2000).
9. B. W. Reinisch et al., Space Sci. Rev. 91, 319–359 (2000).
10. B. R. Sandel et al., Space Sci. Rev. 91, 197–242 (2000).
11. S. Lunt and R. S. Turley, Proc. Utah Acad., submitted (1999).
12. D. G. Mitchell et al., Space Sci. Rev. 91, 67–112 (2000).
13. D. G. Mitchell et al., in R. F. Pfaff Jr., J. E. Borovsky, and D. T. Young, eds., Measurement Techniques in Space Plasmas: Fields, American Geophysical Union, Washington, DC, 1998, pp. 281–287.
14. R. L. Rairden, L. A. Frank, and J. D. Craven, J. Geophys. Res. 91, 13,613–13,630 (1986).
15. K. C. Hsieh, E. Keppler, and G. Schmidtke, J. Appl. Phys. 51, 2,242 (1980).
16. C. J. Pollock et al., Space Sci. Rev. 91, 113–154 (2000).
17. J. T. M. Van Beek et al., J. Vac. Sci. Technol. B16, 3,911–3,916 (1998).
18. M. A. Gruntman, Appl. Opt. 34, 5,732–5,737 (1995).
19. E. E. Scime, E. H. Anderson, D. J. McComas, and M. L. Schattenburg, Appl. Opt. 34, 648–654 (1995).
20. M. M. Balkey, E. E. Scime, M. L. Schattenburg, and J. van Beek, Appl. Opt. 37, 5,087–5,092 (1998).
21. R. R. Meier, Space Sci. Rev. 58, 1–186 (1991).
22. H. O. Funsten, D. J. McComas, and B. L. Barraclough, Opt. Eng. 32, 3,090–3,095 (1993).
23. C. J. Pollock et al., J. Geophys. Res. 95, 18,969–18,980 (1990).
24. T. E. Moore et al., Science 277, 349–351 (1997).
25. T. E. Moore et al., Geophys. Res. Lett. 26, 1–4 (1999).
26. T. E. Moore et al., Space Sci. Rev. 91, 155–195 (2000).
27. D. M. Walton, A. M. James, and J. A. Bowles, in R. F. Pfaff Jr., J. E. Borovsky, and D. T. Young, eds., Measurement Techniques in Space Plasmas: Particles, American Geophysical Union, Washington, DC, 1998, pp. 295–300.
28. R. H. Eather, Rev. Geophys. Space Phys. 5, 207–285 (1967).
29. B. C. Edgar, W. T. Miles, and A. E. S. Green, J. Geophys. Res. 78, 6,595–6,606 (1973).
30. B. Basu, J. R. Jasperse, D. J. Strickland, and R. E. Daniell, J. Geophys. Res. 98, 21,517–21,532 (1993).
31. D. J. Strickland, R. E. Daniell, J. R. Jasperse, and B. Basu, J. Geophys. Res. 98, 21,533–21,548 (1993).
32. D. J. Strickland, J. R. Jasperse, and J. A. Whalen, J. Geophys. Res. 88, 8,051–8,062 (1983).
33. D. J. Strickland and D. E. Andersen Jr., J. Geophys. Res. 88, 9,260–9,264 (1983).
34. S. B. Mende et al., Space Sci. Rev. 91, 287–318 (2000).
35. M.-P. Lemaitre et al., Science 225, 171–172 (1984).
36. S. M. Krimigis et al., Science, in press (2000).
37. E. Kallio, J. G. Luhmann, and S. Barabash, J. Geophys. Res. 102, 22,183–22,197 (1997).

MOTION PICTURE PHOTOGRAPHY
THOMAS W. HOPE
Hope Reports
Rochester, NY
It is not the intention of this article on motion picture photography to be all-encompassing. Motion picture photography as a professional industry has many facets. In effect, this is a compendium. Subjects pertaining more closely to imaging are covered in greater detail than others not directly involving imaging. For definitions of many terms used here, consult the glossary. Motion picture photography, dating from the 1890s, is one of the oldest modern imaging technologies that remains current today.

APPLICATIONS (1)
As a predecessor to television, motion pictures have been a medium that has provided entertainment in movie theaters for more than a century. Many prime time television programs and most national TV commercials are shot on motion picture film. For years, the family recorded its activities in home movies. Motion pictures have played a major role in education from preschool through high school and college. Business and industry have used them for training, sales, and marketing and to provide information to the public as well as to specific audiences such as stockholders. Motion pictures have been a vital element of health sciences for education as well as information. Federal, state, and local governments have used the motion picture for a wide variety of purposes similar to those of industry. Churches and synagogues, youth groups, civic organizations, clubs, retirement and nursing homes all have used movies for teaching, informing, and entertaining. Today many of the functions listed here are carried out on videotape and, more recently, by digital technology using the computer to combine multimedia programs for instruction and other applications as well as title creation and postproduction functions. Much basic program production, however, is still initiated on motion picture film.

THEORY OF MOTION PICTURE PHOTOGRAPHY
Motion picture theory is simple and clear-cut. A motion picture film is composed of a series of still pictures. When the still pictures are projected progressively and rapidly onto a screen, the eye perceives motion; hence they become a motion picture. This is termed persistence of vision. In a conventional sound motion picture, 24 individual pictures
are projected onto the screen in one second. The screen is blank (black) for 1/96th second. While picture number one is blanked out, the film moves to picture number two. The eye and mind retain the image of picture number one for that fraction of a second when the screen is black. If, for example, something in motion is being photographed, the difference between image one and image two is minute. The eye and memory record a smooth transition from picture one to picture two. This is perceived as motion.
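The timing behind this description is easy to work out. The sketch below is a simplification that follows the single 1/96-second blanking interval given in the text; real projector shutters often blank more than once per frame to reduce flicker.

```python
FPS = 24
frame_period_ms = 1000.0 / FPS          # ~41.7 ms from one picture to the next
blank_ms = 1000.0 / 96.0                # ~10.4 ms of dark screen while the film advances
display_ms = frame_period_ms - blank_ms # ~31.3 ms during which each still picture is shown

print(frame_period_ms, blank_ms, display_ms)
```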
Two Basic Components
1. A motion picture film has to be created, that is, produced. This can be relatively simple, or it can be highly involved and extend over a long period of time.
2. To be viewed, the finished motion picture is projected onto a screen.

Four Key Elements
• Motion picture film, the keystone of motion picture photography
• Motion picture camera that photographs a series of pictures in rapid sequence
• Motion picture printer in a laboratory to reproduce prints (positives from a negative or original positive)
• Motion picture projector that allows a viewer to see a series of still pictures as motion when projected in rapid sequence onto a screen

ORIGIN OF MOTION PICTURE PHOTOGRAPHY
The concept of motion picture science is based on work by at least 16 men from 1824 up to 1889 (2). The earliest evidence (1824) is Peter Mark Roget's paper, The Persistence of Vision with Regard to Moving Objects, delivered at England's Royal Society. This stimulated experimentation in England by Sir John Herschel and Michael Faraday and on the continent by Dr. Simon Ritter von Stampfer in Austria and Dr. Joseph Antoine Plateau of Belgium's University of Ghent. In 1853, another Austrian, Baron Franz von Uchatius, projected images from a disc onto a screen using a magic lantern. In the United States, Philadelphia engineer Coleman Sellers (1860) patented the kinematoscope. Using still pictures mounted on the blades of a paddle wheel, it produced a zoetropic (animation) effect. Two other men followed up on Baron Uchatius's invention, one in France and the other, Henry Heyl, in Philadelphia, who patented the plasmatrope. In 1872, the well-known Eadweard Muybridge experiment, using a rotating machine designed by Leland Stanford's engineer, John D. Isaacs (California), depicted the gait of a horse.

35 mm
In 1887, Thomas A. Edison (New Jersey) experimented with pictures using his phonograph principle. Edison's assistant, W.K.L. Dickson, actually did the experimentation and testing in developing the Kinetograph. He tried different sizes of film. It is believed that he cut an Eastman 70 mm still film in half, which he found easier to work with. In 1889, he finally settled on 35 mm film that had four perforations and a 4 : 3 aspect ratio. Essentially, both have remained the production and theater projection standards for more than a century, well into the twenty-first century (3,4).

Aspect Ratio
Aspect ratio is the relation of the width to height of the projected picture on the screen. Motion picture pioneers widely discussed and debated the subject in the early days. Paintings were carefully studied and analyzed. What would be most pleasing to a viewer (11)? The standard aspect ratio of 4 : 3 for regular movie theater showings has prevailed through the years. As the motion picture industry sought to increase viewing pleasure and excitement, various wide-screen applications were developed. The ratio most used is 1.85 : 1 in the United States, whereas 1.66 : 1 is more common in Europe.

Motion Picture Film Created
During 1887/1889, Edison went to George Eastman (Rochester, NY), who developed a 50-foot strip of film on nitrocellulose for Edison. Thus, motion picture film was conceived. Motion picture shows were made in Edison's famous Black Maria tar-paper studio to show on his kinetoscope peep show device. The kinetograph camera, weighing almost a ton, was used to shoot the films (3). Edison and Eastman also realized the need for a term to describe this new medium. After many exchanges of letters exploring Greek- and Latin-based terminology, they decided on motion picture as the best name for this new technology. George Eastman was fascinated with words; motion picture was both easy to understand and to say. In France, brothers Louis and Auguste Lumière designed the cinematographe. Their demonstration in Paris, March 22, 1895, was a huge success. Thomas Armat (Washington, DC) gets credit for the principle of the modern projector by the unveiling of his vitascope in September 1895. Others, including the Lumière brothers, made additional contributions that led to modern 35 mm motion picture photography, as we know it today.

FILM
Photographic film used for motion pictures is normally a thin strip or ribbon. The exception is wide-format films, 65 mm and 70 mm, used for wide-screen shows. Professional motion picture people such as cameramen, film directors, and film editors dealing with film refer to motion picture film as film stock.

COMPOSITION OF FILM
Base
All film has a base that is a clear, transparent substance like clear plastic. The first films used for movies from 1890
up into the 1950s had a nitrate film base, which is highly flammable.
Safety Acetate Film. In 1906, a slow-burning, safety-base film using cellulose acetate was developed, called acetate film. It is still in use, but it took almost 50 years before nitrate-based film was eliminated.
Polyester Film. More recently, a very tough base was developed from polyethylene terephthalate by E.I. duPont de Nemours & Co., trade named Cronar. This base is used for print film and color duplicate negatives. It is not used as a camera negative because it is too tough to cut and splice and, in cold weather, becomes too stiff for camera use. Eastman Kodak uses the trade name Estar™. The Fuji Photo Film name for its base is Polyester. It also has excellent chemical stability, which makes it ideal for long-term archival use.
Emulsion
In the manufacture of motion picture film, a light-sensitive compound called an emulsion is coated on the base. The emulsion (sometimes multilayered) is the key substance on which images are exposed.

TYPES OF FILMS

Black-and-White Film
Black-and-white film (B&W) comes in several types: negative, positive or print film, and reversal film. B&W intermediate film is used for duplicating negatives, for title preparation, for special effects, for sound recording, and for archival film elements.

Negative and Reversal
Negative Film. Used in a camera to photograph shots. That action is referred to as shooting a film. There are different kinds of B&W negative movie films. The most common is panchromatic, which is sensitive to all colors. Orthochromatic film is sensitive primarily to blue and green. The image recorded on negative film is the reverse of the visual image seen by the eye. It takes a skilled person to read negative film properly. Someone unfamiliar with film is usually unable to identify the picture on a negative.
Reversal Film. Processes to a positive image after camera exposure.

Exposure
Film Speed. A critical aspect of any film is its exposure speed. Film speed refers to its sensitivity to light. Slow speeds require stronger light such as sunlight or powerful indoor lights on a sound stage. The faster the film speed, the better that film can photograph in subdued light. Motion picture films use an EI number (exposure index) instead of ASA (American Standards Association) or ISO (International Standards Organization) used with still film. EI film speed ratings are half of ASA/ISO ratings. That makes EI 500T the same as ASA 1000.

Projection Film — Print (Positive)
Print Film. For projection, the image on negative or duplicate-negative film is printed on B&W positive stock that reverses the image from the negative. The resulting picture is identical to the image seen by the human eye, except without color.

Color Film
Color film is more complex than B&W. It comes as negative, reversal, intermediate, and print film. Color intermediate film is used for duplication and special effects. In shooting color film, color "temperature" is a critical factor. For example, outdoor color film is balanced for 6,500 K, the ideal temperature for color film. "Tungsten balanced" film is designed for studio lamps at 3,200 K.

Negative, Intermediate, and Reversal
Color negative film has three color emulsions (yellow, magenta, and cyan) and two filter layers, three protective layers, the support base, and a rem-jet backing layer that is removed in processing (5).
Figure 1. Cross section of Kodak Vision™ color negative film (courtesy Eastman Kodak Co.) (5).
Color intermediate film is essentially a negative film. It is used in two stages, first to make a color master positive and then a color duplicate negative. A fine grain copy of the original negative is used for quantity printing of the release prints. The release prints go to theaters or are sold to schools and other customers. Color reversal film is exposed in the camera and then developed to a positive image that is used in a projector or telecine. A common example is Kodachrome™ for amateurs. Kodak Ektachrome 16 mm film was developed
for news gathering before videotape was introduced. It is widely used for automobile crash testing.

Projection Film — Print (Positive)
Color print film has three color emulsions (magenta, cyan, and yellow), two interlayer scavengers, a scratch-resistant top layer, an antihalation layer, a subbing layer, and, below the support base, a U-coat, a process-surviving antistatic layer, and at the bottom, a protective scratch-resistant layer and lubricant. The scavenger layer absorbs or inhibits the migration of unwanted chemicals. Because of the many layers and their complexity, high-speed color film is a more difficult product to manufacture. It took years to bring color film speed up to that of black and white.

Color Film Evolution
The first successful color motion picture film came from Technicolor, a major Hollywood motion picture laboratory, when it introduced a two-color film for theaters in 1922. In 1932, Technicolor introduced a three-color, 35 mm film system called imbibition or IB for short. For the next 20 years, Technicolor's IB was the principal color film for theatrical motion picture productions. It was expensive to shoot because it required a large camera that literally exposed three separate B&W films simultaneously through red, green, and blue filters. Kodachrome color film, introduced in 1935 by Eastman Kodak Co., was the first practical 16 mm color motion picture film. It is a color reversal film also available as 35 mm still picture film. In 1936, Kodak developed it for the 8 mm motion picture market. Although initially intended for the amateur, Kodachrome film found a market in the industrial nontheatrical film field and eventually in education and other markets. Subsequently, other color reversal motion picture films were introduced. Agfa (Germany; after World War II
it was Orwo in East Germany) and Gevaert (Belgium) were widely used. Another was Anscochrome by General Aniline and Film. Finally, in 1958, a 16 mm film designed for duplication was developed by Kodak. Ektachrome Commercial replaced Kodachrome and similar films not designed for duplicating. Fuji (Japan) introduced a color reversal film in the 1970s. In 1950, Eastman Kodak introduced a negative/positive 35 mm color film system for the theatrical motion picture world. In the early 1970s, it was introduced in 16 mm, intended for nontheatrical markets. Eastman Color became the world standard. Fuji Photo Film of Japan brought out a film compatible with Eastman Color. Now, Agfa sells only color positive (print) and some recording films. General Aniline and Film no longer sells motion picture film. Currently, the most popular 35 mm camera color negative film is 500T (tungsten). Fuji finds that its 250T film is its next most popular. Eastman Kodak reports that the 800T is the next most popular, followed by 320T. Today, very fast films whose speeds are 800T can record good pictures in candlelight. New emulsion technology by Eastman Kodak resulted in its patented T-grain in 1989, which makes the silver halide crystal a more efficient gatherer of light. Likewise, it is better as a storage medium.

16 mm and Safety Film
In 1912, Pathé Frères of France introduced 28 mm film that had a nitrate base, even though acetate safety base had been available for 6 years. Alexander F. Victor, a Swede who had moved to the United States, urged manufacturers to put this 28 mm film on the new acetate base instead of the volatile nitrate base. This would let motion pictures be used safely in schools, churches, homes, and other places outside of theaters. In April 1918, the Society of Motion Picture Engineers (SMPE) adopted 28 mm film on a safety base for portable projectors (6). Some industry people continued to push for 17.5 mm film for portable projectors. George Eastman argued that unscrupulous manufacturers might split 35 mm nitrate-based film, which could pose a severe danger from fire. In 1923, Eastman Kodak introduced 16 mm film on acetate base plus a camera and projector designed for home use. Victor quickly designed a 16 mm projector for the nontheatrical education and industrial markets. The 28 mm film never went any further.
Figure 2. Cross section of Kodak Vision™ color print film (courtesy Eastman Kodak Co.) (5).

PHYSICAL CHARACTERISTICS

Film Width
From the very beginning, film width has been measured in millimeters instead of inches for more precise measurement.
35 mm film — Theatrical industry film standard for more than 100 years. This is for use in both cameras and projectors.
65 mm — Used solely for photographing a motion picture for wide screen.
Figure 3. Five basic film sizes (width) (courtesy Fuji Photo Film).
70 mm — The resulting projection film, printed from 65 mm camera film, is 70 mm wide to allow for four to six sound tracks.
16 mm — For many years, 16 mm film was highly popular, second in annual volume to 35 mm film. Introduced in 1923, it quickly became the standard format for educational subjects and, subsequently, for industrial films and documentaries. In addition, by the early 1930s, an amateur market had developed for 16 mm film.
8 mm — In an effort to reduce the cost of filming for amateurs, several narrower widths were introduced. One, 8 mm, became the amateur standard worldwide. Another size, from France, was 9.5 mm. The French format never gained worldwide popularity, but it continues to be used.

Three New Formats
As a general rule for choosing a film format, the larger the original negative, the better the picture. The larger film format reduces projected image grain and increases sharpness. It also improves picture steadiness because less magnification is needed. This recognition led to newer formats.
Super 8. In 1965, 8 mm film was changed, reducing perforation size and thus enlarging the picture area and providing room for a future sound track. Then, in 1973, sound super 8, which had a magnetic stripe for the sound track, became available.
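The claim above that a larger original needs less magnification can be made concrete. The frame widths below are approximate image widths on the film, supplied here only for illustration; they do not appear in the text.

```python
# Approximate image widths on the film (mm), used only for illustration.
FRAME_WIDTH_MM = {"super 8": 5.8, "16 mm": 10.3, "35 mm": 21.0}

SCREEN_WIDTH_M = 6.0   # a modest 6-m-wide screen (assumed)

for gauge, width_mm in FRAME_WIDTH_MM.items():
    magnification = SCREEN_WIDTH_M * 1000.0 / width_mm
    print(f"{gauge:>7}: ~{magnification:,.0f}x linear magnification")
# Super 8 must be blown up nearly twice as much as 16 mm and roughly
# 3.5 times as much as 35 mm, which is why grain and unsteadiness
# show sooner on the smaller gauges.
```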
Super 16. 16 mm film was the next to have an enlarged picture area, using more of the width for a sound track. Super 35. Eventually, a larger picture area of 35 mm was designed. The full width of the film inside of the perforations is now the picture frame, using space originally reserved for the sound track. Length Motion picture film used in cinematography is a continuous length on a spool or core. For camera use, the film normally has a specific length such as 100, 400, 1,000, or 2,000 feet, depending on the camera magazine capacity. For projection, the film can be any length. Typical lengths are 400, 800, 1,000 feet, and longer. In trade terminology, 1,000 feet of 35 mm and 400 feet of 16 mm film are referred to as a reel. Laboratories are supplied with color positive film in rolls up to 6,000 feet long. Perforations Motion picture film contains small rectangular or square holes along both or one edge called perforations. The perforations are used to move the film through the camera, printer, or projector. Most cameras use a pulldown claw that works in concert with the perforations to move the film. In the projector, a small roller that has sprockets engages the film perforations to move the film. These functions are described in the sections on cameras, printers, and projectors. Perforations, generally called perfs, vary in size and shape. They are measured by width and height in hundredths of an inch (or millimeters)
Figure 4. Enlargement of five perforation types. 35 mm DH (Dubray and Howell) perf is used for some intermediate film (courtesy Fuji Photo Film).
Figure 5. Film spools: R-90 spool (16 mm) and S-83 spool (35 mm) (7) (courtesy Eastman Kodak Co.).
Figure 6. Film cores: Type Z core (16 mm) and Type Y core (35 mm) (7) (courtesy Eastman Kodak Co.).
designed precisely by international standards to ensure worldwide compatibility. Another critical measurement is pitch, the distance from the center of one perf to the center of the next perf. In reality, the measurement is made from the bottom edge of one perf to the bottom edge of the next perf. In the early days of 35 mm motion pictures, film perforations were round. Because these perforations were more subject to wear, the shape was changed to what is now known as the Bell & Howell (BH) or ‘negative’ perforation∗ . . . . The high shrinkage of older films on nitrate base made the negative perf a problem on projection films because of the excessive wear and noise during projection as the sprocket teeth ticked the hold-back side of the perf as they left the sprocket. The sharp corners also were weak points and projection life of the film was shortened. To compensate for this, a new perforation was designed with increased height and rounded corners to provide added strength. This perf, commonly known as the KS or ‘‘positive’’ perf, has since become the world standard for 35 mm projection prints (7).
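The perforation pitch fixes how much film runs through the camera or projector per unit time. A short calculation using the standard frame counts per foot (these counts are standard industry values, not stated explicitly above):

```python
FPS = 24                       # sound speed, frames per second
FRAMES_PER_FOOT = {"35 mm": 16, "16 mm": 40}

for gauge, fpf in FRAMES_PER_FOOT.items():
    feet_per_minute = FPS * 60 / fpf
    minutes_per_1000_ft = 1000 / feet_per_minute
    print(f"{gauge}: {feet_per_minute:.0f} ft/min, "
          f"a 1,000-ft roll runs ~{minutes_per_1000_ft:.0f} min")
# 35 mm runs at 90 ft/min (a 1,000-ft reel lasts about 11 minutes);
# 16 mm runs at 36 ft/min (a 400-ft reel also lasts about 11 minutes),
# which is why those two lengths are each called a "reel" in the trade.
```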
Figure 7. 16 mm film magazine (8) (courtesy Arriflex Camera Co.).
To improve the printing of color print film, another perforation was designed by Dubray and Howell (DH perf) that had the same height as the negative perf to maintain exact registration but rounded corners for longer printing life. Because shrinkage of acetate film is greatly reduced, the DH perf is used for some intermediate films (7). A fourth perforation was designed for CinemaScope, an early wide-screen system. Square and smaller than the KS perf, it provided space on 70 mm film for as many as four magnetic sound stripes for surround sound. It is also called ‘‘Foxhole Perf,’’ because 20th Century Fox invented the cinemascope format. Perforations for 16 mm and super 8 mm film have not changed over the years. CAMERA FILM Most shorter lengths of film, 50 feet of 8 mm, 100 feet of 16 mm, and 100 feet of 35 mm, are available on a spool. Longer lengths of 16 mm and 35 mm are wound on a core. Film on a spool is loaded directly into a camera in subdued light. Film on a core must be loaded in a darkroom into a lighttight metal container called a magazine. The magazine, similar to a layer cake tin, is mounted on the camera. Usually several magazines are loaded at one time, so that additional ones are available during shooting. ∗
Used for all camera films and most intermediate films.
Figure 8. Traditional edge numbering (7).
Edge Numbering. All negative film has a series of sequential numbers printed along one edge of the film, a number every foot. This allows a film editor to identify specific scenes easily. Since the advent of video editing and the use of the computer and bar codes, in addition to edge numbers, film manufacturers preexpose numbered bar codes along the film edge that can be read electronically.
Figure 9. Eastman 35 mm edgeprint format with KEYKODE™ numbers (17). The edgeprint carries human-readable key numbers and machine-readable KEYKODE™ numbers, a zero-frame reference mark that repeats every 64 perforations, a frame marker that repeats every 4 perforations, and the film manufacturer, product code, emulsion number, roll and part number, printer number, and year code.
Referred to as edge codes, they show the manufacturer's name, product code, emulsion number, roll and part number, printer number for positive film, and year code.

MOTION PICTURE CAMERA
A motion picture camera is a lighttight precision instrument. All professional cameras, with a few exceptions, are operated by an electric motor. The power can be supplied by a battery in the field or ac power in the studio. Early motion picture cameras were hand cranked. Subsequently, cranks were replaced by spring-driven motors. Today, most amateur cameras are spring-driven.

Pull-Down Claw
In the camera and in an optical printer, a pull-down claw moves the film through the film gate. Registration pins are retracted as the film moves, then protrude when the film pauses, so that the film is exactly aligned with the aperture opening.

Shutter
The camera has a disc, called a shutter, mounted perpendicularly to the axis of the lens. The shutter has an open portion of the disc which allows the image of the subject being photographed to pass from the lens through an aperture to the film. The shutter opening can be varied
Figure 10. Simple schematic of a movie camera, showing the supply and take-up reels (1).
from a slit to an opening as wide as half the shutter. This is known as a variable shutter. The shutter moves at whatever speed the camera is set for, based on the number of revolutions per second. The pull-down claw is synchronized precisely with the moving shutter, so that the shutter opening is in direct line with the lens axis just as the film frame is momentarily still and can be exposed from the image coming through the lens.
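The exposure each frame receives follows from the shutter opening and the frame rate: exposure time = (shutter angle/360°) ÷ frames per second. A small sketch, using example opening angles of the kind shown for the Arri variable shutter:

```python
def exposure_time_s(shutter_angle_deg, fps=24):
    """Per-frame exposure for a rotating shutter of the given opening angle."""
    return (shutter_angle_deg / 360.0) / fps

for angle in (180.0, 90.0, 11.2):
    t = exposure_time_s(angle)
    print(f"{angle:>5} deg shutter at 24 fps -> 1/{1/t:.0f} s per frame")
# 180 deg gives the familiar 1/48 s; narrowing the opening to 11.2 deg
# cuts the exposure to about 1/770 s, which sharpens fast motion at the
# cost of needing far more light.
```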
Figure 11. Schematic of Panavision camera. Solid line (film) goes up to regular magazine. Dashed line (film) goes back to handheld camera magazine (courtesy Panavision).
Figure 12. Variable Arri camera shutter (8) (courtesy Arriflex Camera Co.).
Frame and Film Gate with Aperture As the film passes through the film gate and stops momentarily in front of the aperture and lens, a small portion of the film is exposed. The resulting picture is called a frame. The size of the aperture opening determines the size of the image. Aperture aspect ratio is covered in the section on Projection. Using 35 mm film as an example, the pull-down claw pulls an unexposed segment of film 22.5 mm in a fraction of a second to a position perfectly
aligned with a lens and the aperture opening. The frame is exposed. Then, the pull-down claw pulls the exposed film (usually) down, and the next unexposed frame comes into alignment. Camera Speed When that action is repeated 24 times in one second, that is termed sound speed. For slow motion, the camera runs at a faster rate, such as 64 frames per second or much
Figure 13. Film gate and aperture (courtesy Arriflex Camera Co.).
Figure 14. Aaton XTR prod Super 16/16 mm camera (courtesy Aaton).
faster. High speed for extreme slow motion is a highly technical and specialized subject. For silent speed, such as in amateur cameras, the camera usually runs at 16 frames per second. Lens The lens is another essential element of a camera. It is an optical device manufactured to extremely close tolerances. Each professional lens is tested on a lens-testing bench to ensure that it conforms precisely to the specification. High quality lenses are expensive. A camera has a standard lens that produces a screen picture that has a 4 × 3 ratio (actually, 1.371 : 1). The focal length varies depending on the size film that the camera is designed to use. Another type of lens is a zoom lens. Its focal length can be changed from wide angle to standard to telephoto while the camera is filming. Viewfinder Motion picture cameras have two kinds of viewfinders. Some are built in and bring the image to the viewfinder through the lens; it is known as a reflex finder. Others are attached to the right side of the camera body and must be adjusted for parallax to avoid positional errors in framing, especially for close-up shots.
Figure 15. Arri camera available in 35 mm and 16 mm models (courtesy Arriflex Camera Co.).
Significant Cameras
Aaton — Made in France (1971). Noted for in-camera time code. Super16/16 mm and 35 mm models used for documentaries, TV, and independent feature production.
Arri — Made in Germany by the Arriflex Corporation, it is widely used in Hollywood as well as among contract producers who make TV commercials and industrial films. Available in 35 mm, 65 mm, and 16 mm (8).
Cine Special — Eastman Kodak's professional U.S.-made 16 mm camera can be said to have launched the nontheatrical industrial film business.
Mitchell — America's Mitchell camera was the workhorse of the Hollywood film industry for more than 50 years until the last quarter of the twentieth century. It was primarily a 35 mm camera, but there was also a 16 mm model.
Moviecam — The newest (1970s) professional camera from Austria is lightweight and well designed for all kinds of production work.
Panavision — Available in 16 mm, 35 mm and 65 mm. All cameras are owned by Panavision and are rented to producers worldwide. Panavision has been a major U.S. camera supplier for feature, television, and TV commercial production since 1972.
PRO8 — A professional U.S. camera from Super8Sound. The camera is adapted from a French Beaulieu. Super 8 film dates from 1965 (9).
There are other makes of 35 mm and 16 mm cameras. Wide-screen 65 mm cameras are discussed in the section on wide screens.

Figure 16. Kodak Cine Special (10) (courtesy Alan Kattelle, Home Movies.).
Figure 17. Mitchell camera (courtesy Grant Loucks, Alan Gordon Enterprises, Inc.).
Figure 18. Moviecam (courtesy Moviecam F. G. Bauer GesmbH.).
Figure 19. Panavision Panaflex camera in 35, 65 and 16 mm models (courtesy Panavision).
Figure 20. Super Pro8 camera (courtesy Super8Sound).

VIDEO
Many cameras have integrated video cameras attached, allowing a shot to be recorded on film and videotape simultaneously. This provides immediate evaluation of a shot before the film is processed.

Accessories
Tripod — One of the most important accessories for the motion picture camera is a tripod. All heavy cameras need a tripod of some kind. The best tripods are those that have fluid heads. A good fluid head allows panning or tilting the camera so smoothly that the eventual viewer of the film will not be aware that the camera was moving. Dolly — A special truck built to hold a large studio camera and camera operator. The dolly usually runs on tracks, so that its movement is absolutely smooth. The camera can move toward or away from action, as well as parallel to a scene.
Handheld — Motion pictures shot without a tripod convey a feeling of casualness and movement. There is a noticeable difference between filming crudely done without a tripod versus intentional handheld cinematography. News footage is often shot by using a shoulder pod.
Steadicam™ — A unique camera stabilizing system for handheld camera operation. Worn as a harness, the Steadicam allows the camera to glide smoothly in any direction while the camera operator moves around.
Video Assist — A new feature has been added at the shooting stage. A video camera is used with the film camera and photographs the same scene. This allows the director to look at a scene right after it has been shot and decide on the spot to accept or reshoot it. Videotaping along with filming does not replace the need to look at the dailies the first thing the next morning. See Dailies, at end of Production.

SOUND DEVELOPMENT
In 1923, Lee DeForest showed short synchronized sound films using his phonofilm (11). Warner Bros. of Hollywood introduced sound in its 1926 film Don Juan, using a musical score on 18-inch discs. The following year Warner again made history using its "Vitaphone" for The Jazz Singer, which used spoken dialogue, the first talking picture (2).

Optical Sound
Within a few years, sound on 35 mm film had been perfected. The sound track was located between the perforations and the picture frames. The sound track is known as optical sound. Two different systems were introduced. One was variable density, called push pull, which used minute horizontal lines similar to today's bar code lines. The other, variable area, has the appearance of a jagged graph line running along the film track area. Today, all analog sound is variable area. Sound on 16 mm film came in 1930 when RCA introduced a 16 mm sound projector. Because 16 mm film is narrower than 35 mm, perforations on one side were eliminated to provide room for the sound track. Eight mm film did not get sound until 1973 when Super 8 sound was introduced. It likewise had perforations only on one side.

Digital Sound
In recent years, recorded sound has undergone a dramatic change as digital technology is used. Theater digital sound is available in three systems:
• Dolby sound
• Sony SDDS sound
• DTS (digital theater system)

WIDE SCREEN
There are basically three kinds of wide-screen systems. The most common uses 35 mm film where the picture is
masked at top and bottom to give a 1.85 : 1 aspect ratio wide-screen appearance. Theater screens for this system tend to be small, 23 to 47 feet wide. This wide-screen system is termed flat in the industry and, at the beginning of the twenty-first century, is really not considered ‘‘wide-screen.’’ Another system uses 35 mm film with an anamorphic lens during filming to squeeze the image, so that the images on the negative are extremely tall and slim. In projection, a 2× anamorphic lens spreads the image to produce a wide-screen picture whose aspect ratio is 2.39 : 1. This wide-screen system, originated by Cinemascope, is called scope. The third wide-screen system is shot on 65 mm film and is projected using 70 mm film; both have five-perf pull-down. The aspect ratio is 2.21 : 1, and the screen can vary from 30 feet wide up to 60 feet or more. Because of the expense, this system is mainly used for spectacular films such as 3-D presentations.

Though wide-screen cinema exploded in popularity in the last half of the twentieth century, it actually dates back to 1896–1910. In 1900 at the Paris Exposition, Raoul Grimoin-Sanson used his Cineorama to show a film on a 360° screen 300 × 30 ft. It involved ten 70 mm projectors to fill the giant screen (11). The use of the wide screen in movie theaters has grown in popularity, especially since television has become a major entertainment venue. Modern use of the wide screen began with Cinerama in 1952. In the following year, Cinemascope came out, followed by VistaVision, Todd-AO, and Technirama, all in the 1950s. In 1970, three more wide-screen systems were launched — Ultra Panavision, IMAX™, and Omnimax™. Three other wide-screen systems have been introduced since — iWERKS, ShowScan, and MegaSystems, using 8-, 10-, and 15-perf pull-down systems.

IMAX has become one of the most widely used wide-screen systems worldwide. It is unique because the film is photographed and also projected on a horizontal plane. The camera is IMAX’s own 65 mm design. They also have their own 70 mm projector. Because of the horizontal format, the frame is larger than that of any other system.

ShowScan is another unique technology. The film runs at 60 frames per second. Each film is short, 4 to 5 minutes. The seats move during screening so that the show is more like an amusement park experience.

PRODUCING A MOTION PICTURE

Making a professional motion picture can be relatively simple or very complex. The former can be done by a single person, although a crew of at least three or four is the more common minimum. In contrast, the production of a feature film involves hundreds of highly skilled individuals. The procedure essentially is the same for any film production.

Preproduction

Creating the story is obviously the starting point for any motion picture, be it a documentary, an educational subject or an entertainment feature film, television drama,
Table 1. Wide-Screen Technologies (Ref. 12). Apertures are in inches; c = camera aperture, p = projector aperture.

Cinerama (1952). Sound: seven-track mag. Screen: curved.
  Triple 35 mm (c): 0.996 × 1.116, 6 perfs/frame, 2.65 : 1.
  Triple 35 mm (p): 0.985 × 1.088, 6 perfs/frame, 2.65 : 1.

Cinemascope (1953). Lens: 2× anamorphic squeeze. Sound: four-track stereo mag.
  35 mm (c): 0.868 × 0.735, 4 perfs/frame, 2.35 : 1 with squeeze.
  35 mm (p): 0.839 × 0.700, 4 perfs/frame, 2.35 : 1 with squeeze.

VistaVision (1954). Lens: spherical. Sound: optical monophonic.
  35 mm (c): 1.495 × 0.991, 8 perfs/frame, 1.5 : 1.
  35 mm (p): 0.825 × 0.446, 4 perfs/frame, 1.66 : 1 to 1.85 : 1.
  35 mm (p): 1.418 × 0.7228, 8 perfs/frame, 1.85 : 1 to 2.1 : 1.

Todd-AO (1955). Lens: spherical. Sound: mag six-track stereo (70 mm); mag optical (35 mm prints).
  65 mm (c): 2.072 × 0.905, 5 perfs/frame, 2.21 : 1.
  70 mm (p): 1.912 × 0.870, 5 perfs/frame, 2.21 : 1.
  35 mm (p): 0.839 × 0.700, 4 perfs/frame, 2.35 : 1.

Technirama (1957). Lens: 2× anamorphic squeeze. Sound: mag optical four-track (35 mm); mag six-track (70 mm).
  35 mm (c): 1.496 × 0.992, 8 perfs/frame, 2.25 : 1.
  35 mm (p): 0.839 × 0.700, 4 perfs/frame, 2.35 : 1 with squeeze.
  70 mm (p): 1.912 × 0.870, 5 perfs/frame, 2.21 : 1.

Ultra Panavision (1970). Lens: 2× anamorphic squeeze. Sound: mag six-track stereo (70 mm); mag optical four-track (35 mm).
  65 mm (c): 2.072 × 0.906, 5 perfs/frame, 2.76 : 1.
  70 mm (p): 1.912 × 0.870, 5 perfs/frame, 2.76 : 1.
  35 mm (p): 0.839 × 0.700, 4 perfs/frame, 2.35 : 1 with squeeze.

IMAX (1970). Film runs horizontally. Sound: digital six-channel. Screen: curved, high gain of 2.
  65 mm (c): 2.032 × 2.772, 15 perfs/frame, 1.364 : 1.
  70 mm (p): 1.91 × 2.74, 15 perfs/frame, 1.435 : 1.

IWERKS (1989). Linear loop projectors. Sound: DTS (digital theater sound).
  65 mm horizontal (c): 2.032 × 2.772, 15 perfs/frame, 1.364 : 1.
  35 mm (p): 0.920 × 0.690, 4 perfs/frame, 1.33 : 1.
  70 mm (p): 1.912 × 0.868, 5 perfs/frame, 2.20 : 1.
  70 mm (p): 1.913 × 1.417, 8 perfs/frame, 1.35 : 1.
  70 mm (p): 1.913 × 1.838, 10 perfs/frame, 1.64 : 1.
  35 mm (p): 1.485 × 0.991, 8 perfs/frame, 1.498 : 1.

ShowScan (1993). Runs at 60 frames/s. Sound: DTS 5.1. Screen: curved.
  65 mm (c): 5 perfs/frame, 2.2 : 1.
  70 mm (p): 5 perfs/frame, 2.2 : 1.

MegaSystems (1998). Dual Geneva intermittent. Sound: digital.
  65 and 35 mm (c): aperture varies, 1.35 : 1.
  70 mm (p): 1.417 × 1.413, 8 perfs/frame, 1.35 : 1.

Note: None of the magnetic-stripe sound formats is still produced except for archival purposes.
sitcom, or TV commercial. The script is normally the next step. Probably 99% of all productions begin with a carefully written script. For the majority of motion pictures, locations must be determined and sets designed and constructed in the planning stage. Another major preproduction function is casting, choosing the actors or, if it is to be a documentary, the on-screen personalities.

Production

There are a number of elementary principles in motion picture production. They are, however, not hard and fast.
For example, a director may choose to violate one to achieve a certain effect or mood.

Pictorial Continuity (13)

The director, director of photography, and film editor are guided by a number of artistic principles developed over the years. Following are a few essentials presented in headline form.
Sequence — Long shot, medium shot, close-up. Extreme long shot. Extreme close-up.
Panning — Planned panning. Panning ahead. Caution against unnecessary camera movement.
Reestablishing shot — Pulling back. Panning. Reverse angle.
Travel shot — Truck. Moving camera on a dolly.
Directional continuity — Screen direction of the action. Consistency to avoid distraction.
Overlap and matching action — Controlled and uncontrolled action. Avoid jumpy action.
Cut-ins, Cut-aways — Head-ons. Tail-aways.
Angles — Vary the angles.
Buildup — Good use of continuity principles. Time and space. Build suspense.
General Rule — When shooting a new scene, change the size of the image, change the angle, or both.

Lighting (14)

If any element of production is underrated, it is lighting (and shadows), yet it is important to the success of cinematography. A mood can be created. Proper use of backlighting and a key light can emphasize something in a scene. It can help to create depth. Shadows can dramatize a scene. The wrong shadows on the face of an actor can be devastating. Lighting is a fine art. The lighting director has many tools today to create the proper lighting for a scene. This applies both indoors and outdoors.

Sound Recording

Audio for a sound motion picture is usually recorded by using sound equipment that is not an integral part of the camera.

Double system sound. For Hollywood feature film production and practically all other professional production, the sound recording equipment is separate from the camera. This is known as double-system sound. Skilled engineers and highly trained audio technicians work in cooperation with the camera operator, also known as the cameraman.

Sync sound. Most feature films are shot with synchronous sound recording, called sync sound. All dialog, for example, is both photographed and sound recorded at the same time, using a camera and sound recorder running at exactly the same speed using crystal-controlled motors.

Single-system sound. A major exception was the super 8 mm sound camera. That type of camera recorded the sound on a magnetic stripe placed on 8 mm negative or reversal film opposite the side that is perforated. The sound was recorded in the camera as the film was shot. Today, very little is recorded single system.

Voice Over. The commentary for a film is done by a narrator and other voices speaking off camera in conjunction with the scenes pictured. The recording is normally done after the film work print has been edited into final form.

Sound for much news coverage is recorded in the camera, usually shot on videotape. Primarily amateur filmmakers using a video camera or people doing low-budget film production might shoot videotape that records the sound in the camera.

Figure 21. IMAX 3-D camera. The film path of the 65 mm 3-D camera goes through a staircase where the film is twisted, so that the magazines can be mounted in their vertical configuration. The image from the lenses is actually folded by mirrors before it reaches the film; one eye folds up, and the other folds down. This allows a natural lens interocular for proper 3-D shooting (courtesy IMAX Corp.).

Digital Technology. Audio technology has made great strides in recent years. Digital audio, for one, has transformed sound quality. Digital is a technology whereby the audio signal is broken down into binary bits that represent a mathematical model of the original signal. This system is superior to, and is superseding, the analog electrical signal, which is a continuous variable. Digital theater sound takes one of two forms:
1. Digital bits are recorded on the film (Dolby Digital and Sony SDDS).
2. Time code is recorded on the film that controls the playback from a separate CD-ROM player (DTS).
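As a rough sketch of the digitization idea described under Digital Technology above, the following Python fragment samples a continuously varying signal and rounds each sample to a 16-bit integer. The 48 kHz sample rate, the 16-bit depth, and the 1 kHz test tone are illustrative assumptions, not specifications taken from this article.

```python
import math

SAMPLE_RATE = 48_000   # samples per second (illustrative, not from the article)
BIT_DEPTH = 16         # bits per sample (illustrative)

def digitize(analog_signal, duration_s=0.001):
    """Sample a continuous signal and quantize it to signed integers."""
    levels = 2 ** (BIT_DEPTH - 1) - 1
    samples = []
    n = int(SAMPLE_RATE * duration_s)
    for i in range(n):
        t = i / SAMPLE_RATE
        value = analog_signal(t)               # continuously variable value in [-1, 1]
        samples.append(round(value * levels))  # now a discrete binary number
    return samples

# A 1 kHz tone standing in for the continuously variable analog signal.
tone = lambda t: math.sin(2 * math.pi * 1000 * t)
print(digitize(tone)[:8])
```

The stream of integers, not a continuously varying voltage, is what gets recorded on the film or on the separate disc.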
Dailies

Whenever possible, all of the negative film shot during the day is processed each night. A one-light work print (normally in color) is made. Color corrections are not made at this stage. See Printing. These are called dailies. In the morning, the director and cameraman (director of photography) look at the dailies to decide if they got what is needed. This includes the acting as well as proper camera angles and other technical aspects.

Video dailies are common. After the film negative is processed, it is put on a telecine machine that scans the
image and records it on videotape. It can be NTSC, PAL (standard definition), or high definition.

Post Production

When all production (shooting) is finished, the film goes into postproduction. Film is processed (developed) by a motion picture film laboratory.
Moviola. For more than a half century, the Moviola, a unique machine designed as an editing tool, was the backbone of the editing process. Using a small rear-projection screen and the capacity to play back several sound tracks, an editor could review portions of a film work print, as well as synchronize and interlock picture and sound. Eventually the Moviola was displaced by flatbed editing tables, which are still used in editing today. Steenbeck and Kem are two popular makes. The flatbed has all of the features of the old Moviola and the advantage of holding several reels of film, such as A and B rolls, plus a splicer.
Laboratory Services. For purposes of this article on Motion Picture Photography, cinematography laboratory services and post-production functions are greatly simplified. A motion picture laboratory provides many services. Not every lab offers the same services. Some smaller labs specialize in production support. Others do volume printing of release prints such as theatrical feature productions, television programs, and TV commercials. Other labs may cater to the industrial market and to contract producers who often work in 16 mm and end up making videotapes for distribution and video projection for use on television. For some of these labs, videotape work has become their major business, although it originated from the basic film processing service.
Fade. Done either in the camera or in the laboratory. A fade-in is a scene that begins when the screen is totally black and gradually fades to full exposure. A fade-out is the opposite, going from full exposure to black. The fade is usually used to begin or end a sequence of scenes.
Processing. A primary service of most laboratories is film processing. Processing is the function of developing exposed film through a series of chemical baths and washes. It begins with the negative film. This same chemical process is also used to develop intermediate film. Print film uses a different process. Certain films require special handling and may use processing chemicals different from those normally used. The final stage is drying the film, after which it is wound onto a core or a reel.
Dissolve. As one scene fades out, the next scene fades in over the fade-out. The dissolve is usually made by laboratory printing using A and B rolls. It can, however, be made in the camera by making a fade-out, winding the film back, and then making a fade-in.
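The dissolve just described can be modeled numerically as a frame-by-frame blend in which the outgoing scene's exposure ramps down while the incoming scene's ramps up. The sketch below, in Python, assumes a 24-frame dissolve and a linear ramp purely for illustration; neither figure comes from this article.

```python
def dissolve(scene_a_frame, scene_b_frame, length=24):
    """Blend two scenes over `length` frames: A fades out while B fades in."""
    blended = []
    for i in range(length):
        t = i / (length - 1)          # 0.0 at the start, 1.0 at the end
        blended.append((1.0 - t) * scene_a_frame + t * scene_b_frame)
    return blended

# Treat each frame as a single brightness value for simplicity.
frames = dissolve(scene_a_frame=1.0, scene_b_frame=0.4)
print([round(f, 2) for f in frames[:6]])   # outgoing scene still dominates early on
```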
Printing. An important service of most laboratories is printing. Printing is making a copy or print from a negative or, for quantity release printing, from a duplicate internegative (to protect the very valuable original negative). The printer is a sophisticated piece of equipment. The negative film is placed in a printer, which beams light through the aperture and projects the negative onto unexposed print film.

Printing Lights. In the early years, printers literally had 21 lamps. Depending on the density of the negative, a technician preparing a film for printing judged the film to decide how many lights would be needed for a given scene. The settings were punched into a tape, hence the term tape setting. Today printers use a single light source that is split into three components (red, green, and blue) using filters. Each component’s intensity can be varied. The term printer light has remained a common lab term in print exposure. In speaking about the density of a negative, lab personnel will refer to a certain scene as requiring so many lights or a trim adjustment. Each change in setting is 0.025 log exposure.

Film Editing

The major operation in the postproduction phase is film editing, although the actual technique has changed greatly in recent years. For nearly a century, editing both picture and sound used a procedure barely changed over the years.
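A rough illustration of the printer-point arithmetic mentioned under Printing Lights above: each point is 0.025 log exposure (from the text), so twelve points amount to 0.30 log exposure, roughly a factor of 2, or one camera stop. The doubling comparison is ordinary photographic arithmetic, not a figure from this article.

```python
LOG_E_PER_POINT = 0.025  # each printer-light step, from the text above

def exposure_factor(points):
    """Relative exposure change produced by a given number of printer points."""
    return 10 ** (LOG_E_PER_POINT * points)

# One printer point is a very small correction...
print(f"1 point   -> x{exposure_factor(1):.3f} exposure")
# ...while 12 points is 0.30 log E, essentially one camera stop (a factor of 2).
print(f"12 points -> x{exposure_factor(12):.3f} exposure "
      f"({LOG_E_PER_POINT * 12:.2f} log E, about one stop)")
```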
A and B Rolls. Original film, the negative or reversal film, is divided into two rolls by a break wherever an optical effect, such as a dissolve, is to be made. Using A and B rolls, titles can be printed over backgrounds. Using a 16 mm color negative, splices can be hidden. The A and B rolls are both put on the printer.
Special Effects. Scenes in a film that are not standard camera shots, used to achieve something unusual, are broadly termed special effects. They may be done in the laboratory or shot by a company or people specially skilled in this technical art. Optical techniques are still used, but many visual effects are done digitally after scanning the film at a very high resolution — 1,920 pixels per line or higher.

Feature Film Editing

What follows is a simplified description of the traditional film editing operation. For nearly a century, all editing literally consisted of cutting the work print into pieces. Scenes would be regrouped to create the desired sequence. Each piece of film is then trimmed to the desired length to make the action flow smoothly. The individual pieces are cemented together (with a special glue called film cement) putting all of the pieces back into a single linear strip of film. Once the editing of the work print is completed and approved by the director and/or others, the negative is cut or assembled to match the approved work print. The camera film is timed using a color analyzer. At this point, depending on whether the original is 65 mm, 35 mm, or 16 mm, several options are usually available. They depend on the capability of the laboratory as well as cost factors. For purposes of this subject, let us assume that 35 mm color prints will eventually be made from a 35 mm negative. Once the negative is cut to match the work print, a color master positive (interpositive) is made. Next, using
optical printing, a color duplicate negative (also called an internegative) is made from which the final color release prints are made.

Editing On Videotape

Many feature films are edited on videotape. Once the edited videotape is completed, one of two steps is taken. For a regular feature film, the camera negative is matched to the edited videotape. That results in one of the procedures shown in Fig. 22. In small-budget pictures destined to be released on videotape, the dupes are made directly from the edited videotape when the very highest quality is not required. Film shot in 65 mm or in 16 mm goes through essentially the same process.

Nonlinear Editing

More recently, the editing process involves digital technology — the most recent state of the art. Most film is now edited in a computer using a nonlinear editing system such as those developed by Avid, Media 100, and Lightworks. The original negative ‘‘image’’ is digitized and transferred
into bins in a computer. Here, the editor is not confined to editing in typical linear fashion as outlined before. Instead, scenes can be called up and shifted around randomly as the film is edited. Once the whole film, with sound, is in final form in the computer, an ‘‘edit decision list’’ (EDL) is produced from which a negative cut list is derived using the edge code (a brief sketch of an EDL follows at the end of this section). The film negative is matched, and the process follows the same procedure outlined for making film prints.

Sound Editing

Sound is critical to the success of a film. It can be routine, or it can have a powerful effect on the finished product. Sound, beginning in the production phase and carrying through the postproduction work, is an entire discipline that has its own digital technology. During the editing phase, sound effects are brought in. Again, this is an entire subject by itself that has its own skilled engineers and technicians.

Digital Intermediate. The entire negative footage is scanned digitally (1,920 pixels per line or higher). All of the editing and special effects are done digitally before recording the images back onto color intermediate film using a laser or cathode-ray tube recorder. This is expensive, but its use will increase, predictably, as digital costs decrease.

Animation

Animation, the art of making inanimate pictures or objects move, is increasingly used for motion pictures. This includes such subjects as highly acclaimed Walt Disney films, medical subjects, and industrial training films as well as educational subjects. For decades animation was accomplished by batteries of artists designing, drawing, and then hand coloring individual cels, which are transparent triacetate sheets. Each frame would be slightly different where the elements of motion changed minutely. Background scenes can be on a different cel which would remain constant as the action element in front moved. A special animation camera would expose one frame of film at a time to photograph individual cels progressively. The result might be a cartoon film or some complex machine operation shown in slow motion. As computer software technology became more sophisticated, it was possible to develop individual cels much quicker than by hand. Animation art as a form of computer graphics is common.
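To make the edit decision list mentioned under Nonlinear Editing a little more concrete, the sketch below stores a few EDL events as source in and out timecodes and totals their length at 24 frames per second. The reel names, timecodes, and the 24 fps rate are invented for illustration and are not drawn from any real production or EDL format specification.

```python
FPS = 24  # film frame rate assumed for this illustration

def tc_to_frames(tc):
    """Convert an HH:MM:SS:FF timecode string to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * FPS + ff

# A toy edit decision list: (event number, source reel, source in, source out).
edl = [
    (1, "A001", "01:00:10:00", "01:00:14:12"),
    (2, "A002", "02:03:00:08", "02:03:02:20"),
    (3, "A001", "01:07:21:04", "01:07:25:00"),
]

total = 0
for num, reel, tc_in, tc_out in edl:
    length = tc_to_frames(tc_out) - tc_to_frames(tc_in)
    total += length
    print(f"event {num}: reel {reel}, {length} frames")
print(f"cut length: {total} frames ({total / FPS:.1f} seconds)")
```

In practice the same event list, related to the film's edge code, is what tells the negative cutter where to cut.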
Figure 22. Steps in making color prints: work print and color negative original, color master positive (interpositive), color duplicate negative (internegative), and color release prints carrying analog or any of three types of digital sound. 1. The choice of printing procedure depends on a number of factors, including the types of printing and processing facilities available and cost. Final print quality will depend on those factors. The dotted lines show alternate, less common, or less satisfactory methods. 2. The symbol indicates optical printing. In general, if you want to reduce the print size, do it at the last practical stage to obtain the highest definition. 3. All film clips shown here have the base side up. 4. Optical or step contact. 5. Continuous contact or optical. Optical may produce the best quality (courtesy Kodak Entertainment Imaging).
PROJECTION

Motion Picture Projector

Essentially, there are two types of projectors. Both operate using the same basic configuration of picture head, optics, and shutter system. The major difference is in the film supply and take-up system.
1. All 16 mm, super and regular 8 mm, plus many older style 35 mm projectors use film reels somewhat similar to the spool in a motion picture camera.
2. Most 35 mm and 70 mm projectors use two or more large flat platters; one has the film yet to be shown and another takes up the film as it leaves the central mechanism of the projector.

Figure 23. Platters for film in a modern multiplex theater (courtesy Strong International, Division of Ballantyne of Omaha, Inc.).

Theater projection operation is described following this technical portion. All projectors have three essential elements:
1. picture head
2. lamphouse
3. sound reproducer

Picture Head. The picture head, the critical part of a projector, consists of the film path, the picture aperture, the shutter, and the lens. The film moves through the projector guided by two constant-drive sprockets. One feeds the film into the film gate, and the other pulls the film away from the gate and intermittent. The intermittent is a sprocket that does not turn continuously. It makes a quarter turn, moving a frame into the trap at the gate/aperture, holds the film for 3/96th of a second, and then repeats the sequence 24 times in a second.

Sprockets. Sprockets are on small hubs, one mounted above the film gate to move the film from the supply reel or platter to the film gate. The second sprocket pulls the film from the gate to the take-up reel or platter.

Film Gate and Trap. The film gate consists of two small, rectangular indented plates, slightly wider than the film and nested together. The indentation is exactly the width of the frame, so that no metal portion touches the center of the film in the frame area. This prevents film scratches. Centered in both plates of the film gate is an aperture cut out to the exact size of the film frame. In the U.S., most theater projectors have two aperture plates, one for a 1.85 : 1 aspect ratio and the other for a 2.39 : 1 scope ratio. In Europe, 1.66 : 1 is sometimes used.

Shutter. Like the camera, the projector has a shutter. The shutter revolution is timed so that the open portion of the shutter is in line with the opening (aperture) in the film gate, which then allows light to pass from the lamp through the film frame and on through the lens to the screen. As the shutter continues to revolve, the solid portion of the shutter blocks the beam of light while the film is moved to the next frame. Normally, 35 mm and 70 mm projectors have two blades. Each frame is projected twice. Some three-bladed shutters for 35 mm reduce flicker, but light transmission is also reduced. Shutters in 16 mm projectors have two, three, or five blades. A two-bladed projector gives maximum light. A three-bladed shutter has less flicker than the two blade but projects less light onto the screen. A five-bladed projector was used for television projection in old film chains.

Lens. Almost every theater projector has two lenses, some mounted on a turret. One is for the 1.85 : 1 image, and the other is usually an anamorphic lens for a 2.39 : 1 scope image.
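The flat and scope formats named in the Lens paragraph above can be checked with a little aspect-ratio arithmetic. The sketch below assumes nominal projector apertures of about 0.825 × 0.446 in. for flat and 0.825 × 0.690 in. for scope; the flat figure appears in Table 1, while the scope figure is an assumed nominal value, since published apertures vary slightly.

```python
def projected_ratio(width_in, height_in, squeeze=1.0):
    """Aspect ratio on screen: aperture ratio times any anamorphic expansion."""
    return (width_in / height_in) * squeeze

# Flat: the image is simply masked to a wider-than-tall frame.
flat = projected_ratio(0.825, 0.446)                 # about 1.85
# Scope: a 2x anamorphic lens spreads a squeezed frame back out.
scope = projected_ratio(0.825, 0.690, squeeze=2.0)   # about 2.39
print(f"flat  ~ {flat:.2f} : 1")
print(f"scope ~ {scope:.2f} : 1")
```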
Figure 24. Two-bladed shutter used in all 35 mm and many 16 mm and Super 8 projectors for maximum light. Fins on shutter are used for balance in a 35 mm Simplex projector (courtesy Strong International, Division of Ballantyne of Omaha, Inc.).
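A quick timing check ties together the intermittent movement and the shutters described above. At 24 frames per second each frame occupies 1/24 s; the text states that the frame dwells in the gate for 3/96 s, leaving 1/96 s for the pull-down, and a two-bladed shutter flashes each frame twice, 48 flashes per second (72 with three blades).

```python
FPS = 24
frame_period = 1 / FPS            # 4/96 s per frame
dwell = 3 / 96                    # time the frame is held in the gate (from the text)
pulldown = frame_period - dwell   # time spent moving to the next frame

for blades in (2, 3):
    flashes_per_second = FPS * blades   # each frame is projected `blades` times
    print(f"{blades}-bladed shutter: {flashes_per_second} flashes/s")

print(f"dwell {dwell * 1000:.1f} ms, pull-down {pulldown * 1000:.1f} ms per frame")
```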
Projection Lamps. There are hundreds of types of projection lamps. Each is designed for a specific projector model to provide maximum light output.
1. Incandescent halogen lamps are used in smaller projectors, principally 8 mm, some 16 mm projectors, and some older 35 mm models.
2. Xenon lamps are in 95% of projectors used for large audiences in auditoriums and theaters. Essentially this includes all 35 mm and 70 mm projectors, many 16 mm, and a few 8 mm.
3. A third light source is the carbon arc, which was used in theater projectors from the early twentieth century up to the 1970s, when metal halide lamps came in. Carbon arcs are fast disappearing. Only the very oldest small theaters may still use the carbon arc.
4. Most video projectors use a metal halide lamp.
5. The newest high-intensity lamp, also used in video projectors for large audiences, is the ultra-high-pressure mercury lamp.

Figure 25. Three-bladed shutter used in Kodak Pageant 16 mm projector for minimum screen flicker (courtesy Eastman Kodak Co.).
Lens. There are a variety of projection lenses. They range from quite simple lenses for standard 35 mm, 16 mm, and 8 mm projectors to highly sophisticated lenses such as those used for wide-screen shows. Some theater projectionists use a telescope or binoculars to focus the lens: sharpness, or acutance, varies with viewing distance; the farther from the screen, the sharper the image appears, and the projectionist is always far from the screen.
Brightness. Screen luminance is measured in foot-lamberts (fL). The Society of Motion Picture and Television Engineers (SMPTE) standard for screen luminance is 16 fL at the center of a uniformly lit screen when the projector is running with an open gate projecting white light. A range of 12 fL to 22 fL is within the tolerance. Checks of theaters reveal that many are below the range, at 6 fL to 8 fL, thus projecting a dull picture.

Sound. Today, two basic types of sound are used for film projection.
Digital Sound. As described in Sound Development, there are three proprietary digital systems (Dolby, Sony SDDS, and DTS) in use in 2001.

Optical Sound. The older sound system used for films is called optical sound. A sound track is recorded photographically onto the film. All but some small theaters use digital sound. Analog sound is reserved only as a backup in case of print damage. In 35 mm and 70 mm film, the soundtrack lies between the perforations and the picture frame. In 16 mm and super 8 sound film prints, the perforations are omitted from one side of the film. The optical soundtrack is on the side of the picture frame opposite the perforations.

Magnetic Sound. A third system, magnetic sound, is nearly obsolete. A magnetic stripe was applied to the film in the same location on the print film as the optical track. A 35 mm print could have as many as four tracks or magnetic stripes for stereo sound. A 70 mm print would have as many as six stripes.

Sound Head
Figure 26. Simplex projector with Strong Console for xenon lamp (courtesy Strong International, division of Ballantyne of Omaha, Inc.).
Every projector has a sound head that reproduces the sound from the film sound track. The head is always mounted below the film gate. The sound for analog is 19
frames ahead of the picture. This distance is standard for all sizes of film from 8 mm up to 70 mm. The offset for the different digital sound systems is proprietary. The sound head or sound scanner is opposite the sound drum, which is a flat roller designed to keep the film precisely positioned at the point where a scanning beam slit scans the sound track.

THEATERS

The movie theater has been part of our culture for nearly as long as the motion picture has existed. Theaters reflect the culture of their age. Movie palaces in the Roaring Twenties were extremely ornate, reflecting the architectural styles of the day. From the very beginning in the early part of the twentieth century, 35 mm film projectors have been the industry standard. At one point, an unsuccessful attempt was made to introduce 16 mm for small theater use. At the other extreme, the addition of the wide screen using 70 mm projectors represented a significant, successful move in the last half of the twentieth century.

Until then, the typical theater used two 35 mm projectors that took 20-minute film reels. The projectionist had to be on his toes at all times. At nineteen minutes, a signal on the film would trigger a bell to alert the projectionist to be ready to switch projectors. He would light up the arc in the other projector and then watch for a dot in the corner of the picture. When he saw it, he would step on a pedal to move a douser that would put a screen in front of the lens and at the same time start the second projector. In addition to timing the projector change, the projectionist had to watch the arc lamp continually. As the arc burned, it became shorter. The projectionist periodically had to crank the arc unit so that it would be at the proper position to produce the brightest light. The arc is nearly a thing of the past.

Today’s Movie Theater

Xenon lamp technology made great strides in the latter half of the twentieth century. Probably 99% of the theaters now use a xenon lamp that emits light much brighter than
the arc of the past without the nerve-racking anxiety. In addition to the advancement in projection lamps, three other major improvements have been made in movie theaters, two obvious to the audience, the other in the projection booth.

Platter System. The platter has eased the life of the projectionist. In modern theater design, one wide projection booth can serve as many as 18 small theaters in the same complex. Furthermore, several theaters can project the same film using the platter system. When the film has gone through one projector at theater A, it travels over a series of rollers to the theater B projector and then back to the take-up platter at theater A. The platter system is used for both 35 mm and 70 mm films.

Surround Sound. All newer theaters have sound speakers placed strategically around the theater. This allows sounds from different tracks on the film to come from different speakers around the theater.

Stadium Seating. Most of the newest theaters no longer have seats sloping slightly toward the screen. Instead, each row of seats is on a step above the seats in front. In this way, even a child can see the screen easily without heads in front cutting the view line.

PRESERVATION — ARCHIVES (15)

Preserving films for posterity is vital. Many of the early Hollywood films have been lost due to improper storage that led to deterioration, especially of the film base. Shrinkage is another problem, caused by the loss of stabilizers and acetic acid (triacetate film). If that occurs, the distance lost between frames makes projection impossible. Adequate industry standards for film base degradation do not exist. Storage of completed motion picture films under controlled temperature and humidity is crucial. Properly stored (40° F at 50% relative humidity or lower) motion picture films can last at least 500 years, and shrinkage is minimal. The colder the storage (not below freezing), the longer the film will last.
Figure 27. Surround sound using digital technology. The diagram shows left, center, and right speakers and a subwoofer at the screen, with left and right surround speakers arrayed around the audience. The number of speakers varies depending on the individual theater.
Nitrate base. Nitrate degradation occurs through a chemical process. Rusting of film cans is caused by gases emitted by the degrading nitrate and triacetate films. Nitrate film becomes highly flammable even at lower temperatures once the degradation has begun. The emulsion becomes adhesive, and layers stick together. The film contains gas bubbles and emits a noxious odor. As the film becomes soft, covered with a viscous froth, it eventually degrades into a brownish, acrid powder. Surprisingly, the 1903 classic, The Great Train Robbery, is still in perfect condition, stored in the film vault of the Library of Congress.

Acetate Base. When acetate base begins to deteriorate, the chemical action causes a vinegar odor due to the evaporation of acetic acid from the film base. This is called the vinegar syndrome. There are acid detection strips (A-D strips) that can detect deterioration before the syndrome accelerates. An A-D strip placed in a film can for a day will change color. It can then be compared with colors on a chart to determine the stage of deterioration.

Polyester base. This rugged film base, when stored under the ideal conditions noted before, has a predicted life of at least 500 years.

Emulsion. The cold storage recommended for the various film bases will ensure the stability of the chemicals and gelatin in the emulsions.

Containers. Films are stored in metal cans, plastic containers, and paper (cardboard) boxes. Many film archives tend to use cans for 35 mm and 70 mm films and plastic (if ventilated) for 16 mm and 8 mm films. Acid-free cardboard is good, but some archivists are concerned about potential water damage. The key is ventilation to remove gases. Molecular sieves, which can be replaced periodically, are used in cans and in unventilated vaults.

Figure 28. Stadium seating improves viewing from any seat in the theater (courtesy IMAX Corp.).

Figure 29. Stadium seating in a theater designed for wide-screen, 3-D, dome projection (courtesy IMAX Corp.).

DIGITAL CINEMA

Motion picture photography and projection in the twenty-first century are undergoing a major evolution from film/chemical photography to electronic/digital cinematography, distribution, and exhibition. As of December 31, 2000, 30 movie theaters worldwide had been converted to digital cinema projection. Several feature films have been shot (photographed) using digital cameras. The conversion to digital involves technical, procedural, and financial factors.

Foremost are the technical aspects. The prime technology is in place. Digital cameras exist. Equipment for digital editing has been adapted from other computer systems used for high-definition television, and better and more efficient equipment will undoubtedly be designed in the future. As mentioned, projectors for showing digital pictures are installed and being tested in theaters around the world. Adaptation of the production process from systems developed over decades of film and HDTV production will have to be made to take advantage of digital technology. Editing requires a new system and, in some cases, the development of new equipment. Satellite and fiber optic delivery of a finished motion picture to theaters, with proper security to prevent piracy, is yet to be perfected.

Product development and the manufacture of new types of equipment and installation are costly. Theater owners are extremely concerned about the huge expense of converting to digital technology, which, in turn, could drive up the ticket price. Their direct competition is the home video market, where high-resolution DVD media, projectors, and displays are available. During the first 75 years of the Hollywood movie business, changes and new product development were gradual. Toward the end of the twentieth century, the pace of change increased radically. As the introduction of new systems, products, and upgrades continues to increase, the rate of obsolescence becomes a major consideration in management planning. Given the chaotic situation of incompatible video formats, preserving electronic movies for the future is not assured any more than it is for television programs. In the end, however, new technology prevails.
Acknowledgments
The author gratefully acknowledges the contributions by the following to the writing of Motion Picture Photography:
Grant Loucks, Alan Gordon Enterprises, Inc., Hollywood, CA
John Pytlak and Karen Dumont, Eastman Kodak Company, Rochester, NY
Robert Gibbons and Glenn Kimmel, Eastman Kodak Company, Hollywood, CA
Donald Roberts, Fox, WUHF-TV, Rochester, NY
Masaru Jibiki, Fuji Photo Film Co., Los Angeles, CA
James Harte, film editor and projectionist, Rochester, NY
Kenneth Baker, IMAX Ltd., Toronto, ON, Canada
Philip Hill, iWERKS Moving Entertainment, Burbank, CA
Kathy Nifield, MegaSystems, Wayne, PA
F. G. Bauer, Moviecam F. G. Bauer GesmbH, Vienna, Austria
John Walsh, NBC, WHEC-TV, Rochester, NY
Frank Kay, Panavision Corp., Woodland Hills, CA
Michael Ellis, Showscan, Culver City, CA
Phil Vigeant, Super8 Sound, Burbank, CA
Ray F. Boegner, Strong, Div. of Ballantyne of Omaha, Inc., Omaha, NB
Dr. Richard Goldman, Technicolor Inc., North Hollywood, CA
BIBLIOGRAPHY

1. Hope Reports, Inc.
2. Encyclopedia Britannica (1940).
3. Encyclopedia Britannica (2001).
4. J. Belton, SMPTE J., p. 652 (August 1990).
5. http://www.kodak/US/en/motion/support/types
6. S. G. Rose, J. SMPTE (August 1963).
7. Student Filmmaker’s Handbook, Eastman Kodak Co., Rochester (1991).
8. J. Fauer, Arriflex 435 Book, Arri USA (2000).
9. M. Mikolas and G. Hoos, Handbook of Super 8 Production, United Business, NY (1976).
10. A. Kattelle, Home Movies (2001).
11. F. E. Riley, Univ. of Kansas Master of Arts dissertation (1996).
12. http://www.widescreenmuseum.com//cinemaaspecs.htm
13. A. L. Gaskill and D. A. Englander, Pictorial Continuity, Duell, Sloan and Pearce, NY (1947).
14. R. Lowell, Matters of Light & Depth, Lowell-Light Manufacturing, Inc. (1999).
15. J. M. Reilly, Film Decay and How to Slow It, Image Permanence Institute, Rochester Institute of Technology, Rochester, NY (1993).

GLOSSARY OF MOTION-PICTURE TERMS

Student Filmmakers Handbook, 1993. Courtesy of Eastman Kodak.

A & B Cutting. A method of assembling original material in two separate rolls, allowing optical effects to be made by double printing (A and B printing).

A or B Wind. When a roll of 16 mm film, perforated along one edge, is held so that the outside end of the film leaves the roll at the top and toward the right, winding ‘‘A’’ should have the perforations on the edge of the film toward the observer, and winding ‘‘B’’ should have the perforations on the edge away from the observer. In both cases, the emulsion surface should face inward on the roll.

Academy Aperture. In projection, the aperture cutout, designed as specified by the American Academy of Motion Picture Arts and Sciences, that provides for a screen-image aspect ratio of approximately 1.37 : 1; also called ‘‘sound aperture.’’

Academy Leader. A nonprojected identification and timing countdown film leader designed to specifications of the American Academy of Motion Picture Arts and Sciences that is placed at the head end of a print reel. The countdown cuing information is related to ‘‘feet,’’ which, in the silent days, meant projection at 16 frames per second, or 1 foot per second.

Acetate. A slow-burning base material frequently used for motion picture films. Also, in sheet form, for overlay cels.

Acetate-base film. Any film that has a support made of cellulose triacetate; safety film.

Additive printing. The use of three separate colored sources — red, green, blue — combined to form the light source that exposes the film. Modern additive printers separate white light from a tungsten-halogen bulb into its red, green, and blue components by using a set of dichroic mirrors.

Advance. The separation between a point on the sound track of a film and the corresponding picture image.

Aerial image or virtual image. An image focused by a projection lens near a field or relay lens. A camera lens is then used to form a real image on the film from the aerial image. A cel or another material can be placed at the aerial-image location to combine it with the aerial image on film.

Analog. An electrical signal that is continuously variable.

Analytical density. Measurement of the amount of yellow, cyan, and magenta dye in an image.

Anamorphic image. An image that has been squeezed in one direction, usually horizontally, by an anamorphic lens.

Anamorphic lens. A lens that produces a ‘‘squeezed’’ image on film in the camera. When the film is projected on a screen, an appropriate lens reverses the effect, and the image spreads out to lifelike proportions. Designed for wide-screen movie photography and projection.

Anamorphic release print. A print in which the images are compressed horizontally.

Angle. With reference to the subject, the direction from which a picture is taken. The camera–subject relationship in terms of the immediate surroundings.

Animation. The making of inanimate objects to appear mobile. This can be done by exposing one or two frames
of movie film and then moving the objects slightly and exposing one or two more frames, etc. When the movie is projected, the objects will appear to have moved by themselves.

Animation camera. A motion picture camera that has special capability for animation work, which usually includes frame and footage counters, the ability to expose a single frame at a time, reverse-filming capability, and parallax-free viewing.

ANSI. American National Standards Institute.
Barn door. A frame that has adjustable flaps attached to the studio light to control unwanted spill light or the spread of the light beam. Barney. A lightweight padded covering that generally performs the same function as a blimp. Heated barneys are sometimes used to facilitate shooting under extremely cold outdoor conditions. Base. The transparent, flexible support, commonly cellulose acetate, on which photographic emulsions are coated to make photographic film.
Answer print. The first print (combining picture and sound, if a sound picture), in release form, offered by the laboratory to the producer for acceptance. It is usually studied carefully to determine whether changes are required before printing the balance of the order.
Bipack filming. The running of two films simultaneously through a camera or optical printer either to expose both or expose one through the other, using the one nearest to the lens as a mask. Often used in special-effects work to combine live action with animated images.
Antihalation backing (coating). A dark layer coated on or in the film to absorb light that would otherwise be reflected back into the emulsion from the base.
Black-and-white film. A film that produces a monochromatic picture in shades of gray (usually a metallic silver image).
Aperture. (1) Lens: The orifice, usually an adjustable iris, which limits the amount of light passing through a lens. (2) Camera: In motion picture cameras, the mask opening that defines the area of each frame exposed. (3) Projector: In motion picture projectors, the mask opening that defines the area of each frame projected.

Aperture plate. A metal plate containing the aperture that is inserted into a projector or camera. (Note: In some cameras, the aperture plate cannot be removed.)

Arc lamp. A lamp whose light source consists of an open carbon arc or a closed xenon arc. The light is generated in a gas ball between two electrodes.

ASA. Exposure Index or speed rating that denotes the film sensitivity, defined by the American National Standards Institute. Actually defined only for black-and-white films, but also used in the trade for color films.

Aspect ratio. Proportion of picture width to height, such as 1.37 : 1, 1.85 : 1, or 2.35 : 1.

Autoassemble. An operation in which a computer performs editing unaided, working from a previously prepared edit decision list.

Backing. (1) Antihalation backing: A temporary, dark-colored, gelatin coating that is sometimes applied to the rear side of a photographic plate or film to reduce halation by absorbing any light that may pass through the emulsion. (2) Noncurl backing: A transparent, gelatin coating, sometimes applied to the side of photographic film opposite from the emulsion to prevent curling by balancing the forces that tend to curl the film as it is wet and dried during processing. (3) Coating: (e.g., antiabrasion coating or rem-jet backing) applied to the base side of the film to improve characteristics and performance.

Balance stripe. A magnetic stripe on the edge of the film opposite from the magnetic track. Although the purpose of the stripe is to keep the film level on the reel, some projectors also use it for recording.
Black light. Ultraviolet light. Blimp. A soundproof enclosure that completely covers the camera to prevent camera-operating noise from being recorded on the sound track. Blooping. The technique of applying a special opaque ink or tape in a wide triangular or circular pattern across the sound track at the splice to prevent soundtrack clicks and other annoying sounds caused by splices. Blow up (part of frame). In transferring an image by an optical printer, it is possible to enlarge a properly proportioned fraction of the original image to full frame size in the copy or to enlarge an original 16 mm image to 35 mm size. Blue-screen. The filming or videotaping of actors, props, or objects in front of a blue screen (or green screen). In postproduction, the blue or green is replaced by another element, such as background, using digital or optical special-effects techniques. Boom. A long, adjustable arm used to position a microphone during production. Bounce light. Light that is reflected off ceilings and walls to illuminate the subject indirectly. Breakdown. The separation of a roll of camera original negative into its individual scenes. Broad light. Soft, floodlight type of illumination unit; usually not focusable. Butt splice. Film splice in which the ends come together without overlapping; ends are held together by splicing tape. Camera axis. Any imaginary line running exactly through the optical center of the camera lens. Camera log. A record sheet giving details of the scenes photographed on a roll of original negative.
Camera operator. In a feature production, the person who operates the camera under the direction of the DPDirector of Photography. Camera operator-animator. The person responsible for translating instructions on the exposure sheet into camera moves and photographing the artwork. Candela (cd). International unit of luminance measurement (1 candela per square meter = 0.2919 footlamberts). Cel animation. An animation technique in which the figures to be animated are drawn and painted on cels, placed over a background, and photographed frame by frame. Cel animation has been the standard technique for studio animation since its invention in 1915. Cellulose triacetate. Transparent, flexible material used as a base support for photographic emulsions. Cement splice. Film splice made by using a liquid solvent cement to weld the overlapping ends together. Changeover. In projection, the act of changing from one projector to another, preferably without interrupting the continuity of projection, or the points in the picture at which such a change is made. Character animation. The art of making an animated figure move like a unique individual; sometimes described as acting through drawings. The animator must understand how the character’s personality and body structure will be reflected in its movements. Chromakey. A method of electronically matting or inserting an image from one camera into the picture produced by another. Also called ‘‘keying’’, the system uses a solid color background behind the subject to be inserted and signal processing through a special-effects generator. Chrominance. The color portion of a video signal. Cinch marks. Short scratches on the surface of a motion picture film, running parallel to its length; these are caused by dust or other abrasive particles between film coils or improper winding of the roll that permits one coil of film to slide against another. Cinemascope. Trade name of a system of anamorphic wide-screen presentation. The first commercially successful anamorphic system for the presentation of wide-screen pictures combined with stereophonic sound. The 35 mm negative camera image is compressed horizontally by 50% using a special anamorphic camera lens. Upon projection, the 35 mm print image is expanded horizontally by the same amount using a similar anamorphic projection lens. Depending on the type of sound used in the print, the screen image has an aspect ratio of 2 : 35 : 1 (optical sound) or 2 : 55 : 1 (four-track magnetic sound). Cinemiracle. A wide-screen presentation, as in Cinerama, that used three separate 35 mm film strips projected on a large, deeply curved screen. One of the main differences, however, was the consolidation of the three projectors in a single booth away from the audience. This
was accomplished by using mirrors on the two outer projectors to maintain picture orientation. Cineon digital film system. A new Kodak system that transfers images originated on film to a digital format for electronic compositing, manipulation and enhancement, and outputs back to film without losing image quality. Cinerama. Originally, a wide-screen presentation using three separate 35 mm films, each containing one third of the total image (six perforations high) and projected on a deeply curved and vertically slotted screen from three projectors located in booths on the main floor of the auditorium. The sense of involvement was extraordinary, but the ever-present seams between the separate projected images were quite distracting. Current Cinerama presentations use 70 mm film containing a single image that is purposely distorted. During projection, the image distortion is corrected by the deeply curved screen, and the original Cinerama sensation is re-created. Cinex strip. A short test print in which each frame has been printed at a different exposure level. Circarama. A special presentation system used at Disneyland. The spectators stand in the middle of a circle viewing a 360° panorama on a surround screen 8 feet high and 40 feet in diameter made of eleven panels. The original negatives are made on eleven 16 mm cameras arranged in a concentric circle. The prints are projected by a ring of interlocked 16 mm projectors. Clapsticks. Two boards hinged at one end that are slapped together to indicate the start of a filming session (take). Used by editors in conjunction with a slate, which provides the corresponding visual cue, to synchronize sound and image. Claw. Mechanism used in most camera and projectors to move the film intermittently. Clay animation. An animation technique involving the use of pliable clay figures that are manipulated before each exposure. Close-up. A detail photographed from such a distance that only a small portion of the subject fills a frame of film. Coated lens. A lens covered with a very thin layer of transparent material that reduces the amount of light reflected by the surface or the lens. A coated lens usually transmits more light than an uncoated lens at the same f -stop because flare is less. Collimated. A beam of light is said to be collimated when all of its rays have been made parallel. Color analyzer. A device for determining the correct printing light ratios for printing color negatives. Color balance. The perceptual appearance of a color image of film as a function of the ratio of exposures of each of these primary color records on the film. Color correction. Altering the color balance by modifying the ratio of the printing light values.
Color duplicate (dupe) negative. Duplicate of a negative color image made from a negative color original. Typically used for making release prints. Color internegative. Negative-image color duplicate made from a positive color original. Typically used for making release prints. Color negative. A negative (opposite) record of the original scene. Colors are the complementaries of the colors in the scene; light areas are dark, and dark areas are light. Color positive. A positive record of the original scene. Color print film. Film designed for making positive prints from color originals and color duplicates. Color reversal film. Film that has a color positive image after processing. Can be an original camera film or a film on which other positive films are printed. Color reversal intermediate. Color duplicate negative made directly from an original color negative by the reversal process. Color saturation. The brilliance or purity of a color. When colors in a film image are projected at the proper screen brightness and without interference from stray light, the colors that appear bright, deep, rich, and undiluted are said to be saturated. Color separation negative. Black-and-white negative made from red, green, or blue light from an original subject or from positive color film. Color temperature. The color quality of the light source expressed in Kelvin (K). The higher the color temperature, the bluer the light; the lower the temperature, the redder the light. Combined negative. Negative film containing the picture and the sound track. Complementary color. Color that is minus one of the primary colors. Cyan is minus red — cyan and red are complementary colors; yellow is minus blue — yellow and blue are complementary colors; magenta is minus green — magenta and green are complementary colors. Produces white when mixed in equal parts with the primary color to which it is complementary. Composite print. A print of a film that contains both picture and sound track. Films regularly shown in theaters are composite prints. Also called release print. Composite video. A video signal in which the luminance and chrominance elements have been combined, as is NTSC, PAL, and SECAM. Compound. The flat, tablelike part of an animation stand, on which the artwork rests while it is being photographed.
Conform. Match the original film to the final edited work print.

Contact print. Print made by exposing the receiving material in contact with the original. Images are the same size as the original images but have a reversed left-to-right orientation.

Continuity. The smooth flow of action or events from one shot or sequence to the next.

Continuous contact printer. A printing machine where the emulsion of the negative film is in direct physical contact with the positive raw stock emulsion, and the two films move continuously across the printing aperture.

Contrast. (1) The general term for describing the tone separation in a print in relation to a given difference in the light-and-shade of the negative or subject from which it was made. Thus, ‘‘contrast’’ is the general term for the property called ‘‘gamma’’ (γ), which is measured by making an H & D curve for the process under study. (2) The range of tones in a photographic negative or positive expressed as the ratio of the extreme opacities or transparencies or as the difference between the extreme densities. This range is more properly described as ‘‘scale’’ or ‘‘latitude.’’ (3) The ability of a photographic material, developer, or process as a whole to differentiate among small graduations in the tones of the subject.

Control strip. A short length of film containing a series of densities to check on laboratory procedures.

Core (film). A plastic cylinder on which film is wound, shipped, and stored.

Correction filter. A medium enabling a color change.

Coupler. A chemical incorporated in the emulsion of color film stocks that produces a dye image associated with the developed silver image.

Crane. The mounting that supports the camera over the compound.

Credits. Titles of acknowledgment for the production.

CRI. Color reversal intermediate, a duplicate color negative prepared by reversal processing.

Cropping. To change, delete, or otherwise alter the size of an image being projected or viewed as a print. In theatrical projection, it is usually the result of ‘‘homemade’’ aperture plates, improper screen masking, wrong focal length lenses, etc.

Cut. (1) The instantaneous change from one scene to another. Successive frames contain the last frame of one scene and the first frame of the following scene. (2) To stop operation of camera, action, and/or sound recording equipment. (3) To sever or splice film in the editing process.

Cutting. The selection and assembly of the various scenes or sequences of a reel of film.

Cyan. Blue-green; the complement of red or the minus-red subtractive used in three-color processes.

D log H curve. The curve showing the relation between the logarithm of the exposure and the resultant density on processed film.

D log E (density vs. the log of exposure). The graph made by plotting the density of a film sample against the
D log E (Density vs. the log of exposure). The graph made by plotting the density of a film sample against the log of the exposure that made that density. Also known as D log H and H and D curve. D log H (H for exposure) is the technically correct term.
D log H curve. The curve showing the relation between the logarithm of the exposure and the resultant density on processed film.
Dailies. Picture and sound work prints of a day's shooting; usually an untimed one-light print, made without regard to color balance. Produced so that the action can be checked and the best takes selected; usually shown before the next day's shooting begins.
Daylight. Light consisting of a natural combination of sunlight and skylight (approximately 6,500 K).
Decibel (dB). Unit of loudness measured on a logarithmic scale. The human ear can perceive 1 dB changes in loudness in the aural range.
Definition. The clarity or distinctness with which detail of an image is rendered; fidelity of reproduction of sound or image.
Delrama. A wide-screen process compatible with CinemaScope-type presentations.
Densitometer. Instrument used to measure the optical density of an area in a processed image by transmittance or by reflectance.
Density. Light-stopping characteristics of a film or a filter. The negative logarithm to the base 10 of the transmittance (or reflectance) of the sample. A sample which transmits 1/2 of the incident light has a transmittance of 0.50, or 50%, and a density of 0.30.
Depth of field. The range of object distances within which objects are in satisfactorily sharp focus in a photograph.
Depth of focus. The range through which a photographic film or plate can be moved forward and backward with respect to the lens while retaining satisfactorily sharp focus on an object at a given distance.
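The logarithmic relation in the Density entry above is easy to check numerically. The short Python sketch below is purely illustrative (the function name is ours, not part of any standard toolkit); it reproduces the worked figure in the entry, where a transmittance of 0.50 corresponds to a density of about 0.30.

```python
import math

def density_from_transmittance(t):
    """Optical density D = -log10(T), where T is the fraction of light transmitted."""
    return -math.log10(t)

print(round(density_from_transmittance(0.50), 3))  # 0.301 -> the "density of 0.30" cited above
print(round(density_from_transmittance(0.10), 3))  # 1.0   -> a sample transmitting 10% has a density of 1.0
```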
Designer. In studio animation, the person responsible for the overall look and style of the film.
Developer. A solution used to turn the latent image on exposed films into a visible image.
Dialogue. The portion of the sound track that is recorded by the voice artists and spoken by the characters on the screen.
Diaphragm. An adjustable opening mounted behind or between the elements of a lens used to control the amount of light that reaches the film. Openings are usually calibrated in f -numbers.
Dichroic. A type of coating that, when applied to glass, can produce a so-called ‘‘cold’’ mirror for use in projector lamphouses that permits greater screen brightness without the risk of radiant energy (heat) problems. Usually, the rear surface of the mirror is treated by depositing very thin layers of a special coating designed to transmit infrared (IR) radiation effectively and reflect visible radiation. Alternatively, by selecting certain other materials for the deposit, IR radiation can be reflected and visible radiation transmitted, thus providing an efficient heat filter for arc radiation devices.
Diffraction. The spreading of light as it passes the edges of opaque objects or through narrow slits. Light also is diffracted when passing through a lens. The effects of this distortion on images are greater as the aperture becomes smaller.
Diffusion. The spreading of light rays from a rough reflecting surface or by transmission of light through a translucent material.
Digital. A system whereby a continuously variable (analog) signal is broken down and encoded into discrete binary bits that represent a mathematical model of the original signal.
Digital effects. Special effects, such as picture compression, rotation, reversal, performed by a digital effects system.
Digital recording. Sound-recording process in which sound waves are recorded as digital bits. During playback, a digital-to-analog conversion occurs that changes the digital bits back into sound waves. Digital recording produces high-quality true sound that does not contain any system noise.
Dimmer. An electrical device, normally in the form of variable resistance or load, that reduces electrical energy to a lamp, usually by reducing voltage.
Director. The person who interprets a written book or script. The director oversees all aspects of the production.
Dissolve. An optical or camera effect in which one scene gradually fades out while a second scene fades in. There is an apparent double exposure during the center portion of a dissolve sequence where the two scenes overlap.
Distributor. Firm that sells, leases, and rents films.
Dolby system. Trade name for an audio noise-reduction system.
Dolly. (1) A truck built to carry camera and camera operator to facilitate movement of the camera while shooting scenes. (2) To move the camera toward or away from the subject while shooting a scene.
Double-frame. Identical views photographed twice (two frames) instead of once. This technique cuts in half either the speed of a movement or the number of drawings required for a complete action. Sometimes called ‘‘on twos.’’ Double-system recording. Synchronous sound recording on a recorder that is separate from the camera. Recorders are typically magnetic and have sync-pulse capability. Dubbing. The combination of several sound components into a single recording. Dupe, dupe negative. A duplicate negative made from a master positive by printing and development or from an original negative by printing followed by reversal development.
Dye. In photography, the result of color processing in which the silver grains or incorporated color couplers have been converted into the appropriate dye to form part of the color image. Edge numbers (key numbers/footage numbers). Sequential numbers printed along the edge of a strip of film by the manufacturer for identification. Edit. To arrange the various shots, scenes, and sequences, or the elements of the sound track, in the order desired to create the finished film. Editor. The individual who decides what scenes and takes are to be used, how, where, in what sequence, and at what length they will appear. Edit decision list (EDL). List of edits prepared during off-line editing. Emulsion, emulsion layer. (1) Broadly, any light-sensitive photographic material consisting of a gelatin emulsion containing silver halides together with the base and any other layers or ingredients that may be required to produce a film that has desirable mechanical and photographic properties. (2) In discussions of the anatomy of a photographic film, the emulsion layer is any coating that contains light-sensitive silver halide grains, as distinguished from the backing, base, substratum, or filter layers. Emulsion side. The side of a film coated by emulsion. Emulsion speed. The photosensitivity of a film, usually expressed as an index number based on the film manufacturer's recommendations for the use of the film under typical conditions of exposure and development. Encoder. A circuit that combines the primary red, green, and blue signals into a composite video signal.
Estar base. The trade name applied to the polyethylene terephthalate film base manufactured by Eastman Kodak Company. Exchange. A depository and inspection/distribution center for theatrical release prints. Exchanges are located in approximately 35 regional areas within the United States dependent roughly on theater and population density. Existing light. Available light; strictly speaking, existing light covers all natural lighting from moonlight to sunshine. For photographic purposes, existing light represents the light that is already on the scene or project and includes room lamps, fluorescent lamps, spotlights, neon signs, candles, daylight through windows, and outdoor scenes at twilight or in moonlight. Exposure. Amount of light that acts on a photographic material; product of illumination intensity (controlled by the lens opening) and duration (controlled by the shutter opening and the frame rate). Exposure latitude. Degree to which film can be underexposed or overexposed and still yield satisfactory results.
Exposure setting. The lens opening selected to expose the film.
f -Number. A symbol that expresses the relative aperture of a lens. For example, a lens having a relative aperture of 1.7 would be marked f /1.7. The smaller the f -number, the more light the lens transmits. Fade. Exposure of motion picture film either in the camera or during subsequent operations, so that, for a fade-in, starting with no exposure and extending for a predetermined number of frames, each successive frame receives a systematically greater exposure than the frame preceding it, until full normal exposure for the scene has been attained. From this frame on, successive frames receive identical exposure for the remainder of the take. Fast. (1) Having a high photographic speed. The term may be applied to a photographic process as a whole, or it may refer to any element in the process, such as the optical system, emulsion, or developer. (2) Resistant to the action of destructive agents. For example, a dye image may be fast to light, fast to heat, or fast to diffusion. Field of view. The portion of the scene in front of the camera represented within the limits of the camera aperture at the focal plane. Area of field thus varies with the focal length of the lens and the camera-to-subject distance. Fill light. Light used to fill in shadows. Film (motion picture film). A thin, flexible, transparent ribbon perforated along one or both edges; it bears either a succession of images or a sensitive layer capable of producing photographic images. Film base. Flexible, usually transparent, support on which photographic emulsions are coated. Film can. Metal container designed to hold rolls, spools, or reels of motion picture films. Film cement. A special combination of solvents and solids used to make overlap splices on motion picture film by its solvent action and subsequent welding of the film at the junction. Film gate. Components that make up the pressure and aperture plates in a camera, printer, or projector. Film gauge. Width of the standard sizes of motion picture films. Film identification code. Letter that identifies film type. Film-to-tape transfer. The process of transferring an image captured on film to videotape. Film perforation. Holes punched at regular intervals for the length of film, intended to be engaged by pins, pegs, and sprockets as the film is transported through the camera, projector, or other equipment. Filter. A piece of glass, gelatin, or other transparent material used over the lens or light source to emphasize, eliminate, or change the color or density of the entire scene or certain elements in the scene.
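As a numerical aside on the f-Number entry earlier in this list: because the light transmitted varies inversely with the square of the f-number, the difference between two aperture settings can be expressed in stops. The Python sketch below is an illustration only (the function name is ours); it follows the convention that each full stop halves or doubles the light.

```python
import math

def stops_between(n1, n2):
    # Transmitted light is proportional to 1/N^2, so the difference in stops
    # between f-numbers N1 and N2 is 2 * log2(N2 / N1).
    return 2 * math.log2(n2 / n1)

print(round(stops_between(4.0, 5.6), 2))  # ~0.97: f/5.6 passes about one full stop less light than f/4
print(round(stops_between(1.7, 2.8), 2))  # ~1.44 stops between the f/1.7 example above and f/2.8
```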
Final cut. Last editing of a work print before conforming is done or before sound work prints are mixed. Fine grain. Emulsion in which silver particles are very small. First print. The first trial composite (married) print containing both picture and sound for checking picture and sound quality. Fixing bath (hypo). A solution that removes any unexposed silver halide crystals in the film. In addition, in color films, the silver is removed from the exposed area, leaving only the image-forming dyes. Flaking. The removal (chipping away) of emulsion particles from the edges of the film that tend to redeposit in the image area while the film is going through the projector gate. Flaking is eased by proper edgewax lubrication. Flange. The rim on a roller used for guiding the film. Also, a large disc used on a rewind to take up film on a core. A pair of flanges (disc) that screw together is called a split reel. Flashing. Technique for lowering contrast by giving a slight uniform exposure to film before processing. Flat. An image is said to be ‘‘flat’’ if its contrast is too low. Flatness is a defect that does not necessarily affect the entire density scale of a reproduction to the same degree. Thus, a picture may be ‘‘flat’’ in the highlight areas or ‘‘flat’’ in the shadow regions, or both. Flutter. In sound, rapid periodic variation of frequency caused by unsteadiness of the film or tape drive.
Foreground. The part of the scene in front of the camera, represented within the limits of the camera aperture, occupied by the object(s) nearest to the camera. Format. The size or aspect ratio of a motion picture frame. FPS. Frames per second, indicating the number of images exposed per second. Frame. The individual picture image on a strip of motion picture film. Frame-by-frame. Filming in which each frame is exposed separately, as the object being photographed must be altered before each exposure to create the illusion of movement in the finished film; as opposed to the more usual method of filming in which the film runs through the camera at a steady, prescribed rate to record action taking place before it. Frame line. The separation between adjacent image frames on motion picture film. Frame (video). A complete television picture made up of two fields, produced at the rate of approximately 29.97 Hz (color) or 30 Hz (black-and-white). Freeze frame. An optical printing effect in which a single frame image is repeated so as to appear stationary when projected. Frequency response. Ability of the photographic sound track to reproduce the full spectral range of sounds.
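The FPS and Frame (video) entries above invite a small arithmetic check: running time is simply the frame count divided by the frame rate, and the NTSC color rate is more precisely 30000/1001, or about 29.97 frames per second. The Python sketch below is illustrative only; the frame count chosen is hypothetical.

```python
def running_time_seconds(frames, fps):
    """Screen time implied by a frame count at a given frame rate."""
    return frames / fps

frames = 4320                                              # hypothetical shot length in frames
print(running_time_seconds(frames, 24.0))                  # 180.0 s at the 24 fps sound-film rate
print(round(running_time_seconds(frames, 30000 / 1001), 2))  # ~144.14 s at the ~29.97 fps NTSC color rate
```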
Fog. Darkening or discoloring of a negative or print, or lightening or discoloring of a reversal material. Causes include accidental exposure to light or X rays, overdevelopment, using outdated film, and storing film in a hot, humid place.
Gain, screen. The measure of a screen’s ability to reflect the light incident on it. A perfect screen reflects back all of the light that is incident on it at all angles. Such a screen would have a gain of 1.0. In practical use, however, most matte screens that allow wide viewing angles have a gain of about 0.85. Special metallized or directional screens can provide up to about 15 times more reflected light than a common matte screen, but their viewing angles are generally very limited, making them unsuitable for most theatrical applications.
Foley. Background sounds added during audio sweetening to heighten realism, for example, footsteps, bird calls, heavy breathing, and short gasps.
Gamma. Measurement of the contrast of an image, representing the slope of the straight-line portion of the characteristic curve.
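Since gamma is the slope of the straight-line portion of the characteristic (D log H) curve, it can be estimated from any two points read off that portion. The sketch below is a minimal illustration in Python; the two sample points are hypothetical, not taken from any published sensitometric data.

```python
def gamma(log_h1, d1, log_h2, d2):
    # Slope of the straight-line portion of the characteristic curve:
    # change in density divided by change in log exposure.
    return (d2 - d1) / (log_h2 - log_h1)

# Hypothetical points (log exposure, density) on the straight-line portion:
print(round(gamma(-1.0, 0.60, 0.0, 1.25), 2))  # 0.65, a typical camera-negative contrast
```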
Follow focus. To change the focus setting of a lens as a scene is photographed to keep a moving subject in sharp focus.
Gate. The aperture assembly at which the film is exposed in a camera, printer, or projector.
Focal plane. The area in space on which parallel rays of light refracted through a lens focus to form sharp images.
Footage. A method of measuring film length and, therefore, screen time. Because 90 feet of 35 mm film equal one minute of screen time, 35 mm footage is used in many studios as a measure of an animator's weekly output. Animators also refer to the length of scenes in feet, rather than in seconds or minutes — a 30-foot scene, rather than a 20-second one.
Footlambert. US luminance measurement unit (1 footlambert = 3.425 candelas per square meter).
Force-process. Develop film for longer than the normal time to compensate for underexposure.
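The arithmetic behind the Footage entry above follows from two constants: 35 mm film has 16 frames per foot, and sound film runs at 24 frames per second, which gives 90 feet per minute. A minimal Python sketch (illustrative only; the function names are ours):

```python
FRAMES_PER_FOOT_35MM = 16   # 4-perforation 35 mm pulldown
SOUND_SPEED_FPS = 24        # projection rate for sound film

def feet_to_seconds(feet):
    return feet * FRAMES_PER_FOOT_35MM / SOUND_SPEED_FPS

def seconds_to_feet(seconds):
    return seconds * SOUND_SPEED_FPS / FRAMES_PER_FOOT_35MM

print(feet_to_seconds(90))   # 60.0 -> 90 feet of 35 mm film is one minute of screen time
print(seconds_to_feet(20))   # 30.0 -> the "30-foot scene, rather than a 20-second one" cited above
```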
Gauge. Refers to the format of the film stock, that is, super 8, 16 mm, or 35 mm. Gelatin filter (gel). A light filter consisting of a gelatin sheet in which light-absorbing pigment or dye is incorporated. Geneva movement. A mechanical device that produces intermittent film movement in a projector. The principle behind the movement involves a rotating cam and pin that intermittently engage in a four-slotted star wheel, also known as a Geneva cross or Maltese cross. During the pin/slot engagement, the star wheel shaft containing the intermittent sprocket rotates 90° , or one frame. At
normal projection speed, this intermittent rotation occurs 24 times per second. Glove. A white, lintless, cotton glove used when handling motion picture raw stock and new release prints in the laboratory. Should be used in all film handling and changed frequently. Gobo. Panel of opaque material on a footed stand that has an adjustable arm. Used to confine the area that a light illuminates or to keep light from shining directly into the camera lens. Graininess. The character of a photographic image when, under normal viewing conditions, it appears to be made up of distinguishable particles or grains. This is due to the grouping together, or ‘‘clumping’’ of the individual silver grains, which are far too small to be perceived under normal viewing conditions by themselves. Gray card. A commercially prepared card that reflects 18% of the light that hits it. Visually, it appears neutral or a middle gray halfway between black and white. ‘‘Green’’ print. A newly processed print on which the emulsion may still be a little soft. If projected the first time without proper edgewax lubrication, perforation damage can result. Guide rails. Vertical rails located on both sides of the projector trap that restrict lateral movement of the film as it passes through the projector gate. Guide roller. Any flanged roller that is used to guide or restrict the position of motion picture film as it moves through a camera, projector, or printer. Guillotine splicer. Device used for butt-splicing film using splicing tape. Halation. A defect of photographic films and plates. Light forming an image on the film is scattered by passing through the emulsion or by reflection at the emulsion or base surfaces. This scattered light causes a local fog which is especially noticeable around images of light sources or sharply defined highlight areas. Halide. Compound of a halogen, such as chlorine, bromine, or iodine. Hard. (1) As applied to a photographic emulsion or developer, having a high contrast. (2) As applied to the lighting of a set, specular or harsh, giving sharp dense shadows and glaring highlights. Hard light. Light made up of directional rays that creates strong, hard, well-defined shadows; sometimes called specular light.
Head-recording. On a tape recorder, printer, or projector, an electromagnet across which the tape or film is drawn and which magnetizes the coating on the tape base during recording. Hertz (Hz). Unit of frequency; 1 Hz = 1 cycle per second. High-speed camera. A camera designed to expose film at rates faster than 24 frames per second. Used to obtain slow-motion effects. Highlights. Visually, the brightest, or photometrically the most luminant areas of a subject. In a negative image, areas of greatest density; in a positive image, areas of least density. HMI lights. Metal halide lamps are fundamentally mercury arcs that have metal halide additives to adjust the color balance. Usually rated at approximately 5,400 K. For daylight-balanced films. Hold. To freeze or stop the action. To achieve a hold in animation, the same cel or position of an object is photographed for several frames. Hot. Referring to too much light in an area or to an excessively bright highlight. Hue. Sensation of the color itself measured by the dominant wavelength. Humidity. The moisture content of the air. For instance, low humidity describes conditions in a desert. Conversely, high humidity is related to tropical rain forest conditions. Hypo (fixer). The name for a fixing bath made from ammonium or sodium thiosulfate, other chemicals, and water; often used as a synonym for fixing bath. Idle roller. Free turning nonsprocketed rollers for guiding film through its appropriate path. Image, latent image. The invisible image formed in a camera or printer by the action of light on a photographic emulsion. Imbibition print. A color print produced by the transfer of magenta, cyan, and yellow dyed matrix films in register on a specially prepared clear film base or paper — Technicolor process and Kodak Dye Transfer process. Intensity, light. The power (strength) of a light source; the total visible radiation produced by the light source.
Haze filter. These filters provide varying degrees of blue-light and green-light absorption.
Interlock. A system that electronically links a projector to a sound recorder; used during postproduction to view the edited film and sound track, to check timing, pacing, synchronization, etc.
HDTV. High definition television, a video format whose resolution is approximately twice that of standard television.
Intermediate. Film used only for making duplicates from which other duplicates or prints are made. Does not include camera films.
Head end, heads. The beginning of a reel where the film image is upside down when the film is threaded into a projector for showing.
Intermediate sprocket. An intermittently rotated sprocket that positions the film in the aperture of a projector and moves it after the exposure cycle.
Intermittent. Not continuous but equally spaced (sometimes random) motion, as the intermittent (24 fps) motion of film through a projector.
Internegative (dupe negative). Color negative made from a color negative. For making release prints.
Lens. An optical device designed to produce an image on a screen, on a camera film, and in a variety of optical instruments. Also used to converge, diverge, or otherwise control light rays in applications not involving images.
Interpositive. A color master positive print. In the can. Describes a scene or program which has been completed. Also, ‘‘That's a wrap.’’ Interlace. The manner in which a television picture is composed, scanning alternate lines to produce one field, approximately every 1/60 of a second in NTSC. Two fields comprise one television frame. Therefore, the NTSC television frame rate is approximately 30 fps. Intermittent movement. The mechanism of a camera, printer, or projector by which each frame is held stationary when exposed and then advanced to the next. ISO. International Standards Organization. The international counterpart of ANSI. K. Degrees kelvin, the unit of the color temperature scale. Keykode number. Kodak's machine-readable key numbers. Includes 10-digit key number, manufacturer identification code, film code, and offset in perforations. Keystoning. A geometric image distortion resulting when a projected image strikes a plane surface at an angle other than perpendicular to the axis of throw or when a plane surface is photographed at an angle other than perpendicular to the axis of the lens.
Light axis. An imaginary line running exactly through the center of intensity of a light. Light filter. A light-absorbing transparent sheet, commonly consisting of colored glass or dyed gelatin that is placed in an optical system to control the spectral quality, color, or intensity of the light passing a given plane. Light intensity. Degree of light per unit falling on subject; usually expressed in footcandles. Light meter. An electrical exposure meter for measuring light intensity. Light output. The maximum power or energy delivered by a given light, concentrated by a spotlight, or spread out by a floodlight. Lighting ratio. The ratio of the intensity of key and fill lights to fill light alone. Light valve. Device for controlling the intensity and color quality of light on additive prints. Lip sync. Simultaneous precise recording of image and sound so that the sound appears to be accurately superimposed on the image, especially if a person is speaking toward the camera.
Kinescope. A film of a videotape made by shooting the picture on a specially designed television monitor. Also referred to as Kine.
Liquid gate. A printing system in which the original is immersed in a suitable liquid at the moment of exposure to reduce the effect of surface scratches and abrasions.
Laboratory film. Film products, not intended for original photography, but necessary to complete the production process.
Live-action. The filming or videotaping of staged or documentary scenes of people, props, and locations.
Latent image. Invisible image in exposed, undeveloped film; results from exposure to light. Latent image edge numbering. Images placed on the edge of film products in manufacturing that become visible after development. Latitude. In a photographic process, the range of exposure across which substantially correct reproduction is obtained. When the process is represented by an H and D curve, the latitude is the projection on the exposure axis of that part of the curve which approximates a straight line within the tolerance permitted for the purpose at hand. Layout. A detailed drawing of a shot in which background elements, staging of the action, and camera moves are carefully worked out and plotted; the stage of production in which these are determined. Leader. Any film or strip of material used for threading a motion picture machine. A leader may consist of short lengths of blank film attached to the ends of a print to protect the print from damage while threading a projector, or it may be a long length of any kind of film that is used to establish the film path in a processing machine before using the machine for processing film.
Long shot (ls). The photographing of a scene or action from a distance or a wide angle of view, so that a large area of the setting appears on a frame of film and the scene or objects appear quite small. Loop (continuous film). A section of film spliced end-to-end for use in printing, testing, dubbing, etc. Low key. A scene is reproduced in a low key if the tone range of the reproduction is largely in the high density portion of the H and D scale of the process. Lubrication. To reduce friction, required on processed print film for optimum transport and projection life. Lumen. The measure of luminous flux (the rate at which light energy is emitted or received). For instance, one lumen falling on one square foot of surface produces an illuminance of one footcandle. Luminance. The measured value of brightness; reflected light measured on motion picture screens as footlamberts or candelas per square meter. Lux. Metric measure of illumination; one footcandle is approximately equal to 10 lux (1 footcandle = 10.764 lux).
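The photometric entries in this glossary (Light intensity, Footlambert, Lumen, Luminance, and Lux) are tied together by two conversion factors, collected in the illustrative Python sketch below. The constants are standard values (1 footcandle = 10.764 lux; 1 footlambert is approximately 3.426 candelas per square meter, rounded to 3.425 in the Footlambert entry); the function names are ours.

```python
LUX_PER_FOOTCANDLE = 10.764      # 1 footcandle = 1 lumen per square foot = 10.764 lux
CDM2_PER_FOOTLAMBERT = 3.426     # 1 footlambert is about 3.426 candelas per square meter

def footcandles_to_lux(fc):
    return fc * LUX_PER_FOOTCANDLE

def footlamberts_to_candelas_per_m2(fl):
    return fl * CDM2_PER_FOOTLAMBERT

print(round(footcandles_to_lux(100), 1))              # 1076.4 lux for a 100-footcandle incident reading
print(round(footlamberts_to_candelas_per_m2(16), 1))  # ~54.8, the "16 footlamberts (55 cd/m2)" screen brightness cited under Shutter
```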
Magazine (projector). Enclosures on a motion picture projector which hold the reels of film.
Magenta. Purplish color; complementary to green or the minus-green subtractive primary used in the three-color process. Magenta light results when red and blue light overlap.
Mix. To combine the various sound tracks — dialogue, music, sound effects — into a single track.
Magnetic sound. Sound derived from an electronic audio signal recorded on a magnetic oxide stripe or on full-coated magnetic tape. Magnetic striping. The application of magnetic material on motion picture film intended for recording sound. Magnetic tape/magnetic film. Usually (1/4)-inch plastic audiotape that has been coated with particles that can be magnetized. In film use, it is also used in various formats compatible with Super 8, 16 mm, 35 mm, and 70 mm films.
Motorboating. The distracting sound heard when the film becomes misaligned over the sound drum and causes the sound scanning beam to ‘‘read’’ the film perforations instead of the sound track. Moviola. A trademarked name for a machine that has a small rear-projection screen and the capacity to play back several sound tracks. Used in editing and for reviewing portions of a film during production. Also used to synchronize or interlock picture and sound track in editing. Newer devices called ‘‘flat-bed viewers’’ are slowly replacing upright Moviolas. Multiplexer. Device or circuit used for mixing television signals to a single video recorder.
Magnetic track. Linear path of magnetically recorded audio signal on a magnetic film stripe or tape. The number of ‘‘mag tracks’’ can vary from one to six, depending on the picture format.
Narration. The off-screen commentary for a film; often referred to as ‘‘voice-over.’’
Magoptical. Sound track that has an optical track and one or more magnetic tracks.
Negative film. Produces a negative image (black is white, white is black, and colors appear as complementaries).
Makeup table. A film handling unit that is one component of a platter system. It is used to assemble (makeup) the individual shipping reels into one large film roll on a platter for uninterrupted projection. Masking. Restricting the size of a projected image on a screen by using black borders around the screen. Also, the restriction in size of any projected image or photographic print by using undercut aperture plates or masks and borders. Master positive. Timed print made from a negative original and from which a duplicate negative is made. Master. The final negative-reversal positive or intermediate film from which subsequent prints are made. Matte. An opaque outline that limits the exposed area of a picture, either as a cutout object in front of the camera or as a silhouette on another strip of film. Medium shot. A scene that is photographed from a medium distance, so that the full figure of the subject fills an entire frame. Meter-candle. Unit of illuminance. The light received at a point one meter away from a point light source whose intensity is one candela (formerly candle). Mid-foot key number. Full key number plus bar code, including 32-perforation (35 mm) offset, positioned halfway between each footage number. Will help identify short scenes without a key number. Uses a smaller type size to distinguish from 1-foot key numbers. Use a magnifying glass to read it easily. Minimum density (D-min). Constant-density area in the toe of the characteristic curve where less exposure on negative film or more exposure on reversal film will produce no reduction in density. Sometimes called base plus fog in black-and-white film.
Negative image. A photographic image in which the values of light and shade of the original photographed subject are represented in inverse order. Note: In a negative image, light objects of the original subject are represented by high densities, and dark objects are represented by low densities. In a color negative, colors are represented by their complementary colors. Negative-positive process. Photographic process in which a positive image is obtained by developing a latent image made by printing a negative. Neutral-density filters. Used to reduce the intensity of light reaching the film without affecting colors. Neutral test card. A commercially prepared card: One side has a neutral 18% reflection that has the appearance of medium gray. The other side has a neutral reflection of 90% and the visual appearance of stark white. Newton's rings. Fuzzy, faintly colored lines in a projected image caused by high or uneven printer gate pressure. Nitrate film. A highly flammable motion picture film that has not been domestically manufactured since around 1950. It is still present in large quantities in storage vaults and archives and must be very carefully stored to prevent spontaneous combustion, explosions, or other forms of destruction. Noise. Unwanted sound in an audio pickup. Notching. Practice of making a ‘‘V’’ cut to remove damaged perforations rather than removing the damage and making a splice. Not recommended because it weakens the film even more. NTSC (National Television Systems Committee). Committee that established the color transmission system used in
the U.S. and some other countries. Also used to indicate the system itself consisting of 525 lines of information, scanned at a rate of approximately 30 frames per second.
Off-line editing. The process of creatively assembling the elements of a production, to communicate the appropriate message or story, and/or calculating the order, timing, and pace for user-friendly equipment such as film, (3/4) inch videotape, or nonlinear computer editing systems.
On-line editing. Final editing or assembly using the original master tapes to produce a finished program ready for distribution. Usually preceded by off-line editing. In some cases, programs go directly to the on-line editing suite. Usually associated with high-quality computer editing and digital effects. Opaque. Of sufficient density that all incident light is completely absorbed (the opposite of transparent). Optical effects. Trick shots prepared by using an optical printer in the laboratory, especially fades and dissolves. Optical printer. Used when the image size of the print film is different from the image size of the preprint film. Also used when effects such as skip frames, blowups, zooms, and mattes are included. Optical sound. System in which the photographic (optical) sound track on a film is scanned by a horizontal slit beam of light that modulates a photoelectric cell. The voltages generated by the cell produce audio signals that are amplified to operate screen speakers. Original. An initial photographic image or sound recording — whether photographic or magnetic — as opposed to some stage of duplication thereof. Original negative. The negative originally exposed in a camera. Orthochromatic (ortho) film. Film that is sensitive only to blue and green light. Out-take. A take of a scene which is not used for printing or final assembly in editing. Overlap splice. Any film splice in which one film end overlaps the other film end. Overlay. A technique in cel animation in which foreground elements of the setting are painted on a cel and placed over the characters to give an illusion of depth to the scene. PAL (phase alternation by line). Color television system developed in Germany and used by many European and other countries. PAL consists of 625 lines scanned at a rate of 25 frames per second. Pan. A camera move in which the camera appears to move horizontally or vertically, usually to follow the action or scan a scene. In animation, the effect is achieved by moving the artwork under the camera. Panavision 35. A 35 mm process using 35 mm negative film and photographed through a Panavision anamorphic lens with a compression of 2×. Contact 35 mm prints are compatible with anamorphic systems such as CinemaScope.
Panchromatic (PAN) film. Black-and-white film which is sensitive to all colors in tones of about the same relative brightness as the human eye sees in the original scene. Film is sensitive to all visible wavelengths.
Parallax. In camera work, the viewfinder often is mounted so that its optical axis is at an appreciable distance from the optical axis of the camera lens, commonly resulting in inadvertent positional errors in framing. Perforations. Regularly spaced and accurately shaped holes that are punched throughout the length of a motion picture film. These holes engage the teeth of various sprockets and pins by which the film is advanced and positioned as it travels through cameras, processing machines, and projectors. Persistence of vision. The ability of the eye to perceive a series of rapid still images as a single moving image by retaining each impression on the retina for a fraction of a second, thus overlapping the images. This phenomenon makes it possible to see the sequential projected images of a motion picture as lifelike continuous movement. Perspecta sound. A system of recording that produces a form of stereophonic reproduction by using a single optical sound track carrying three subaudible control tones that can shift the one-track sound source to the left, center, or right speakers by using the appropriate reproducing equipment. The system is compatible with normal singletrack sound reproducers. Photometer. An electro-optical device used to measure light intensity (a light meter). Pin registration. A film term relating to the steadiness of the image. For optical and film-to-tape transfers, a pin-registered device holds each frame in position for a perfectly registered image, critical for creating multilayered special effects. Pitch. (1) The property of sound that is determined by the frequency of the sound waves. (2) Distance from the center of one perforation on a film to the next, or from one thread of a screw to the next, or from one curve of a spiral to the next. Pixel (‘‘picture element’’). The digital representation of the smallest area of a television picture, appearing as a tiny dot on the television screen. In a full color image, each pixel contains three components — a combination of red, green, and blue signals — reflecting the trichromatic nature of human vision. The number of pixels in a complete picture differs from one system to another; the more pixels, the greater the resolution. Polarizing filter. Transparent material used to subdue reflections and control the brightness of the sky. Polyester. A name for polyethylene terephthalate developed by E.I. Dupont de Nemours & Co. (Inc.) A film base
material exhibiting superior strength and tear characteristics. Cronar is the trade name for Dupont motion picture products; Estar Base is the tradename for Kodak products. Positive film. Motion picture film designed and used primarily for making master positives or release prints. Postproduction. The work done on a film such as editing, developing and printing, and looping, once photography has been completed. Primary color. One of the colors of light — blue, red, or green — that can be mixed to form almost any color. Printer lights. Incremental steps on an additive printer. Print film. Film designed to carry positive images and sound tracks for projection. Printed edge numbers. Edge numbers (usually yellow) placed on film at the laboratory by a printing machine. Printing. Copying motion picture images by exposure to light. Processing. Procedure during which exposed film is developed, fixed, and washed to produce either a negative or a positive image. Process screen photography. The filming or videotaping of actors, props, or objects in front of a blue screen (or green screen). In postproduction, the blue or green is replaced by another element, such as a background, using digital or optical special-effects techniques. Producer. The administrative head of the film, usually responsible for budget, staff, legal contracts, distribution, scheduling, etc. Production. The general term used to describe the process involved in making all of the original material that is the basis for a finished motion picture. Loosely, a completed film. Production supervisor. An assistant to the producer, who is in charge of routine administration.
Raster. The lines forming the scanning pattern of a television system. Scanned lines comprise the active portion of a video signal displayed on a cathode-ray tube (CRT). Raw stock. Unexposed and unprocessed motion picture film; includes camera original, laboratory intermediate, duplicating, and release-print stocks. Real time. The instantaneous response of a computer or device to instructions: the normal viewing time of any film or videotape program. Reciprocity law. Expressed by (H) = Et, where E is light intensity, and t is time. When E or t are varied to the extreme, an unsatisfactory exposure can result. Recommended practice. An SMPTE recommendation specifying good technical practice for some aspect of film or television. Reduction printing. Making a copy of a film original on smaller format raw stock by optical printing; for example, printing a 35 mm original onto 16 mm stock for use in libraries, etc. Reel band. A stiff paper strip that has a string loop tie that contains the release print number, title, and reel number and is used to keep the film snug on the shipping reel. Refraction. The change of direction (deflection) of a light ray or energy wave from a straight line as it passes obliquely from one medium (such as air) to another (such as glass) in which its velocity is different. Release print. In a motion picture processing laboratory, any of numerous duplicate prints of a subject made for general theater distribution. Rem-jet backing. Antihalation backing used on certain films. Rem jet is softened and removed when processing starts.
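The reciprocity law in the entry above states that exposure is the product of illuminance and time, H = E × t, so within the normal working range halving the light and doubling the time yields the same exposure. The Python sketch below is a minimal illustration under the assumption that E is in lux and t in seconds, giving H in lux-seconds; the values are hypothetical.

```python
def exposure(illuminance_lux, time_s):
    # Reciprocity law: H = E * t (here in lux-seconds).
    return illuminance_lux * time_s

print(exposure(500, 1 / 50))  # 10.0 lux-seconds
print(exposure(250, 1 / 25))  # 10.0 lux-seconds: half the light for twice the time gives the same H
```

The entry's caveat about extreme values of E or t corresponds to reciprocity failure, where this simple product no longer predicts the photographic effect.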
Projection. The process of presenting a film by optical means and transmitted light for either visual or aural review, or both.
Rendering. The simulation of light on three-dimensional objects; determining an object’s surface characteristics, such as color and texture.
Projection speed. The rate at which the film moves through a projector; twenty-four frames per second is the standard for all sound films.
Resolution. The capacity of a medium to capture and playback distinctly fine details. 35 mm film is a high resolution storage medium; current videotape formats are low-to-moderate resolution media. Computers can perform at a wide range of resolutions, from the lowest to the highest, depending on hardware and software capabilities, and therefore are considered resolution independent.
Pull-down claw. The metallic finger that advances film one frame between exposure cycles. Push processing. A means of increasing the exposure index of film. R-190 Spool. A 4.940-inch outside diameter metal camera spool. Square hole has single keyway, two offset round drive holes, one elliptical hole in both flanges. Side one and side two markings. For 200-foot, 16 mm film loads. R-90 Spool. A 3.615-inch outside diameter metal camera spool. Square hole has single keyway in both flanges, and center hole is aligned on both flanges for 100-foot film loads. Used in 16 mm spool loading cameras.
Resolving power. Ability of a photographic emulsion or an optical system to reproduce fine detail in a film image and on a screen. Reticulation. The formation of a coarse, crackled surface on the emulsion coating of a film during improper processing. If some process solution is too hot or too alkaline, it may cause excessive swelling of the emulsion, and this swollen gelatin may fail to dry down as a smooth homogeneous layer.
Reversal film. Film that processes to a positive image after exposure in a camera or in a printer to produce another positive film. Reversal process. Any photographic process in which an image is produced by secondary development of the silver halide grains that remain after the latent image has been changed to silver by primary development and destroyed by a chemical bleach. In film exposed in a camera, the first developer changes the latent image to a negative silver image. This is destroyed by a bleach, and the remaining silver halide is converted to a positive image by a second developer. The bleached silver and any traces of halides may now be removed using hypo. RGB. Red, green and blue, the primary color components of the additive color system used in color television. Rotation. A camera move in which the camera is moved in a complete circle to give a spinning effect in the film. Partial rotation is called a tilt. Rough cut. Preliminary stage in film editing, in which shots, scenes, and sequences are laid out in an approximate relationship without detailed attention to the individual cutting points. Safety film. A photographic film whose base is fireresistant or slow burning by various fire codes. At the present time, the terms ‘‘safety base film,’’ ‘‘acetate base film’’ and ‘‘polyester base film’’ are synonymous with ‘‘safety film.’’ Saturation. Describes color brilliance or purity. When color film images are projected at the proper brightness and without interference from stray light, colors that appear bright, deep, rich, and undiluted are said to be ‘‘saturated.’’ Scene. A segment of a film that depicts a single situation or incident. ‘‘Scope’’. A diminutive term used to describe any anamorphic projection system or film. Scrim. A translucent material that makes hard light appear more diffuse or reduces, like a screen, the intensity of the light without changing its character.
Separation masters. Three separate black-and-white master positives made from one color negative; one contains the red record, another the green record, and the third the blue record. Sequence. A group of related scenes in a film that combine to tell a particular portion of the story and are usually set in the same location or time span. Set. Derived from ‘‘setting.’’ The prepared stage on which the action for three-dimensional animation takes place. A set may be as simple as a plain tabletop or as elaborate as props and decoration can make it. Sharpness. Visual sensation of the abruptness of an edge. Clarity. Short. The term usually refers to the cartoons made in the Hollywood studios during the 1930s, 1940s, and 1950s, which ran between 6 and 7 minutes long. Today, shorts range from 1 1/2 to more than 20 minutes in length and cover a variety of styles and subjects. Shot. An unbroken filmed segment; the basic component of a scene. Shutter. In theatrical projection, a two-bladed rotating device used to interrupt the light source while the film is being pulled down into the projector gate. One blade masks the pull-down while the other blade causes an additional light interruption, increasing the flicker frequency to 48 cycles per second, a rate that is not objectionable to the viewer at the recommended screen brightness of 16 footlamberts (55 candelas per square meter). Sibilance. Excessive amount of vocal hiss when consonants such as ‘‘s’’ are spoken. Single-frame exposure. The exposure of one frame of motion picture film at a time, like still photography. Commonly used in animation and time-lapse. Silver halide. Light-sensitive compound used in film emulsions. Single-perforation film. Film perforated along one edge.
Script. The text of a film that gives dialogue, action, staging, camera moves, etc.
Single-system sound. Sound on a magnetic or optical track that was recorded on the same strip of film on which the action was recorded.
Secam (Système Électronique pour Couleur Avec Mémoire). The color television system developed in France and used there and in most of the former Communist-bloc countries and a few other areas, including parts of Africa.
16 mm film. Film 16 mm wide. May have single or double perforations.
Sensitivity. Degree of responsiveness of a film to light. Sensitometer. An instrument with which a photographic emulsion is given a graduated series of exposures to light of controlled spectral quality, intensity, and duration. Depending upon whether the exposures vary in brightness or duration, the instrument may be called an intensity scale or a timescale sensitometer. Separation light. A light that helps define the outline of a subject, thereby separating it from the background. Also called edge light, top light, rim light, backlight, hair light, skimmer, or kicker.
Skip frame. An optical printing effect eliminating selected frames of the original scene to speed up the action. Slitting. The act of cutting apart a film into narrower sizes of the required width. Slow motion. The process of photographing a subject at a faster frame rate than used in projection to expand the time element. SMPTE. The Society of Motion Picture and Television Engineers.
Soft. The opposite of ‘‘hard.’’ (1) As applied to a photographic emulsion or developer that has low contrast. (2) As applied to the lighting of a set, diffuse, giving a flat scene in which the brightness difference between highlights and shadows is small.
Splicing tape. Tape designed to make overlap or butt splices without the need for film cement or a mechanical fastener. Available in a variety of sizes, with or without perforations, and can be clear, translucent, or opaque orange.
Soft light. Light made up of soft, scattered rays resulting in soft, less clearly defined shadows; also called diffuse light.
Spool. A roll that has flanges on which film is wound for general handling.
Sound drum. A flat roller in the sound head designed to keep the film precisely positioned at the point where the scanning beam slit scans the sound track. Also called the scanning drum. Sound effects (foley). Sound from a source, other than the tracks, that bears synchronized dialogue, narration, or music: sound effects commonly introduced into a master track in the rerecording step, usually to enhance the illusion of reality. Sound gate. The gate used in an optical sound head, instead of a sound drum, to keep the film sound track precisely aligned on the scanning beam slit during sound reproduction. Sound head. The optical sound reproducer mounted beneath the projector head but above the take-up reel support arm or magazine. Sound recorder. Device that may use audiotape, magnetic film, or motion picture film to record sound. Sound sprocket. Any sprocket that pulls the film past the sound scanning beam slit. Sound track. Photographic/optical sound track running lengthwise on 35 mm film adjacent to the edges of the picture frames and inside the perforations. Special effect. A term broadly applied to any of numerous results obtained in the laboratory by combining and manipulating one or more camera records to produce an imaginatively creative scene different from that in front of the main camera. Making special effects may involve techniques such as double printing, fades, mattes, and vignetting. Spectrum. Range of radiant energy within which the visible spectrum that has wavelengths from 400 to 700 mm exists. Speed. (1) Inherent sensitivity of an emulsion to light. Represented by a number derived from a film’s characteristic curve. (2) The largest lens opening (smallest f -number) at which a lens can be set. A ‘‘fast’’ lens transmits more light and has a larger opening and better optics than a ‘‘slow’’ lens. Splice. Any type of cement or mechanical fastening by which two separate lengths of film are united end to end, so that they function as a single piece of film when passing through a camera, film processing machine, or projector. Splicer. A mechanical device arranged for holding film in alignment and at the correct sprocket hole interval during the various operations required to join two pieces of film. It often includes a device for removing emulsion.
Spotlight. A lighting unit; usually has a lens and shiny metal reflector that can be focused; produces hard light. Sprocket. A toothed wheel used to transport perforated motion picture film. Static electricity. Electric field that is present primarily due to the presence of electrical charges on materials. Stereophonic. Sound recording and reproduction using multiple microphones and speakers, each of which has its own separate track; designed to simulate the actual sound and to achieve a three-dimensional effect. Stock. General term for motion picture film, particularly before exposure. Stop. The relationship between the focal length of a lens and the effective diameter of its aperture. An adjustable iris diaphragm permits using any ordinary photographic lens at any stop within its range. Sometimes used synonymously with f -number as in ‘‘f -stop’’. A unit of exposure change. Stop motion. An animation method whereby apparent motion of objects is obtained on the film by exposing single frames and moving the object to simulate continuous motion. Storyboard. A series of small consecutive drawings accompanied by captionlike descriptions of the action and sound, which are arranged comic strip fashion and used to plan a film. The drawings are frequently tacked to corkboards so that individual drawings can be added or changed in the course of development. The technique, invented at the Disney studio, is now widely used for live action films and commercials, as well as animation. Stripe, magnetic. Narrow band(s) of magnetic oxide usually coated toward the edges of the base side of motion picture film for accepting audio signal recordings in the form of magnetic impulses. Subtractive color. The formation of colors by removing selected portions of the white light spectrum by transparent filters or dye images. Super panavision. Similar to Panavision 35, but photographed flat in 65 mm. The 70 mm prints produce an aspect ratio of 2.25 : 1 using four-channel sound and a ratio of 2 : 1 using six channel sound. Superscope. A 35 mm anamorphic release print system adopted by RKO Radio Pictures that produced a screen image whose aspect ratio was 2 : 1 or 2.35 : 1 when projected by a normal anamorphic lens. The original camera negative was photographed flat, but special printing produced the anamorphic print.
Surround speakers. Speakers placed at the sides and at the rear of an auditorium to increase the realism of a stereophonic presentation or to produce other special effects. Sweetening. Audio postproduction, at which time audio problems are corrected. Music, narration, and sound effects are mixed with original sound elements. Swell. The increase in motion picture film dimensions caused by the absorption of moisture during storage and use in high humidity. Extreme humidity and subsequent swelling of the film increase the susceptibility of the film surfaces to abrasion. Synchronizer. A mechanism employing a common rotary shaft that has sprockets which, by engaging perforations in the film, pass corresponding lengths of picture and sound films simultaneously, thus effectively keeping the two (or more) films in synchrony during editing. Sync pulse. Inaudible timing reference recorded on the magnetic tape used in double-system recording. The source can be a generator in the camera cabled to the tape recorder or an oscillating crystal in the recorder when the camera also has a crystal. When the sound is transferred to magnetic film for editing, a resolver reads the reference and ensures that the tape runs at the same speed as that during shooting. In this way, the magnetic work print can be placed in sync with the images for which the original sound was recorded. Synchronization. A picture record and a sound record are said to be ‘‘in sync’’ when they are placed relative to each other on a release print, so that, the action will coincide precisely with the accompanying sound when they are projected. T-grain emulsion. Emulsion made up of tabletlike crystals, rather than conventional silver halide crystals; produces fine-grain high-speed films. Tail ends, tails. The end of a film. The film must be rewound before projection if it is tails out. Take. When a particular scene is repeated and photographed more than once to get a perfect recording of some special action, each photographic record of the scene or of a repetition of the scene is known as a ‘‘take.’’ For example, the seventh scene of a particular sequence might be photographed three times, and the resulting records would be called Scene 7, Take 1; Scene 7, Take 2; and Scene 7, Take 3. Tape splice. Film splice made using special splicing tape applied to both sides of the film. Technirama. A 70 mm release print technique developed by Technicolor and printed from a horizontal doubleframe 35 mm negative using 1.5 : 1 horizontal compression. The 35 mm reduction print image had a 1.33 : 1 compression ratio and produced a 2 : 1 aspect ratio when projected on equipment designed for CinemaScope. Some 70 mm prints were also available for showing at a 2.2 : 1 ratio.
Techniscope. A system designed to produce 35 mm anamorphic prints from a 35 mm negative whose images are approximately one-half the height of regular negative images and are produced by using a special onehalf frame (two-perforation) pull-down camera. During printing, the negative image was blown up to normal height and squeezed to normal print image width to produce a regular anamorphic print that provided a projected aspect ratio of 2.35 : 1. The system was designed primarily to conserve negative raw stock. Telecine. A device for scanning motion picture film images and converting them to standard videotape. 35 mm film. Film 35 mm wide that has four perforations on both edges of each frame. Image frame and sound-track area lie inside the perforations. Thread. To place a length of film through an assigned path in a projector, camera, or other film handling device. Also called lacing. 3-D. The common term applied to three-dimensional (stereoscopic) images projected on a screen or viewed as a print. There have been several systems shown in theaters, but the discomfort attributed to the necessary eyewear, along with other equipment limitations has, more or less, relegated the present systems to the status of novelties. Throw. In theatrical projection, the distance from the projector aperture to the center of the screen. Tight wind. Relating to film wound tightly on a core or reel to form a firm roll that can be handled and shipped safely without danger of cinch marks or other damage to the film. Time-lapse movie. A movie that shows in a few minutes or a few seconds, events that take hours or even days to occur; accomplished by exposing single frames of film at fixed intervals. Timing. A laboratory process that involves balancing the color of a film to achieve consistency from scene to scene. Also includes adjusting exposure settings in duplication. Todd-AO. A flat, 70 mm print system developed by Magna Pictures Corporation and American Optical Company to produce a 2.2 : 1 screen image of high resolution, sharpness, and brightness. The print was made from a 65 mm negative exposed in a specially designed camera. The extra width of the print film was intended to provide room for the six magnetic sound tracks contained on four magnetic oxide stripes. Todd-AO, considered the first commercially successful 70 mm film system, was introduced in 1955 with the release of Oklahoma. Toe. Bottom portion of the characteristic curve, where slope increases gradually with constant changes in exposure. Tone. That degree of lightness or darkness in any given area of a print; also referred to a value. Cold tones (bluish) and warm tones (reddish) refer to the color of the image in both black-and-white and color photographs.
Trailer. A length of film usually found on the end of each release print reel identifying subject, part, or reel number and containing several feet of projection leader. Also, a short roll of film containing coming attractions or other messages of interest. Transition. The passage from one episodic part to another. Usually, film transitions are accomplished rapidly and smoothly, without loss of audience orientation, and are consistent with the established mood of the film. Travelling matte. A process shot in which foreground action is superimposed on a separately photographed background by optical printing. Triangle. A three-sided framework of wood or metal, designed to hold the three points of a tripod to limit their spread.
Trims. Manual printer controls used for overall color correction. Also, unused portions of shots taken for a film; usually kept until the production is complete.
Truck. A camera move in which the camera seems to move toward (truck in) or away from (truck out) the subject. The same effect in live action filmmaking is called zoom-in.
Tungsten light. Light produced by an electrically heated filament that has a continuous spectral distribution.
Type C. The SMPTE standard for the 1-inch nonsegmented helical videotape recording format.
Type K Core. 3-inch outside diameter, 1-inch inside diameter plastic core. Used with 1,000, 2,000, 3,000, and 4,000-foot lengths of negative, sound, and print films.
Type T Core. 2-inch outside diameter, 1-inch inside diameter plastic core. Used with most 16 mm films up to 400 feet.
Type U Core. 2-inch outside diameter, 1-inch inside diameter plastic core. Used with various length camera negative, sound, and print films.
Type Y Core. Similar to K but of heavier construction. Used for most color print films.
Type Z Core. 3-inch outside diameter, 1-inch inside diameter plastic core. Used for camera and print films in rolls longer than 400 feet.
Ultrasonic cleaner. Device that transfers ultrasonic sound waves to a cleaning liquid or solvent that dislodges embedded dirt on objects immersed in it.
Ultraviolet light. Energy produced by the (invisible) part of the electromagnetic spectrum whose wavelengths are 100 to 400 nanometers. Popularly known as ‘‘black light.’’ UV radiation produces fluorescence in many materials.
Universal leader. A film projection leader, designed for the current projection rate of 24 frames per second (1 1/2 feet per second) and recommended for use on all release prints. It was designed to replace the Academy leader originally conceived when the motion picture projection rate was 16 frames per second.
Unsqueezed print. A print in which the distorted image of an anamorphic negative has not been corrected for normal projection.
Valve rollers. A cluster of three or four small rollers located at the entrances of the film magazines and designed to prevent fire from reaching the film reels. Because nitrate prints are now quite rare and are actually unlawful to use in some areas of the United States, the use of valve rollers, or fire rollers, is no longer essential on domestic projection equipment.
Variable-area sound track. Photographic sound track consisting of one or more variable-width transparent lines that run the length of a motion picture film within the prescribed sound track area. The most common type of track.
Variable-density sound track. Photographic sound track that has a constant width but varies in density along the length of a motion picture film within the prescribed sound track area. No longer used in motion picture productions.
Viewer. A mechanical and optical device designed to permit examining an enlarged image of a motion picture film during editing.
Vignetting. The partial masking or blocking of peripheral light rays either by intent or by accident. In theatrical projection, the blockage of peripheral light rays in a projection lens due to a lens barrel that is too long or to a lamphouse optical system that is not correctly matched to the limiting aperture of the projection lens. In photography, the intentional masking of peripheral light rays to soften and enhance a photograph.
VITC (vertical interval time code). Time code that is recorded in the vertical blanking interval above the active picture area. Can be read from videotape in the ‘‘still’’ mode.
Voice-over narration. A sound and picture relationship in which a narrator's voice accompanies the picture action.
Weave. Periodic sideways movement of the image as a result of mechanical faults in the camera, printer, or projector.
Wet-gate printer. Printer in which the film passes through fluid-filled pads just before exposure. The released fluid temporarily fills film scratches with a solution that has the same refractive index as the film base, thereby eliminating scratch refraction and ensuring that the scratches will not appear on the printed film.
Wide screen. General term for a form of film presentation in which the picture shown has an aspect ratio greater than 1.33 : 1.
Wild. Picture or sound shot without synchronous relationship to the other.
Winding. Designation of the relationship of perforation and emulsion position for film as it leaves a spool or core.
Wipe. Optical transition effect in which one image is replaced by another at a boundary edge moving in a selected pattern across the frame.
Work print. Any picture or sound track print, usually a positive, intended for use in editing to establish the finished version of a film through a series of trial cuttings. The purpose is to preserve the original intact (and undamaged) until the cutting points have been established.
Xenon arc. A short arc contained in a quartz envelope in which dc current, flowing from the cathode to the anode, forms an arc in a positive (high-pressure) atmosphere of xenon gas. The spectral distribution in the visible range closely resembles that of natural daylight.
Yellow. Minus-blue subtractive primary used in the three-color process.
Zero-frame reference mark. Dot that identifies the frame directly below as the zero-frame specified by both the human-readable key number and the machine-readable bar code.
Zoom-in. Continuous change of the camera lens focal length, which gradually narrows down the area of the picture being photographed and gives the effect of continuously enlarging the subject.
Zoom-out. Continuous change of the camera lens focal length, which gradually enlarges the area being photographed and gives the effect of a continuously diminishing subject.
N

NEUTRON IMAGING, RADIOGRAPHY, AND CT

GRAHAM C. SMITH
Instrumentation Division
Brookhaven National Laboratory
Upton, NY

INTRODUCTION

The discovery of the neutron occurred in 1932, and the first neutron diffraction experiments were performed in the late 1940s. As usage has progressively increased, the neutron has become a key probe in many fields of science, and sources around the world are now oversubscribed by experimenters. The European Spallation Source (ESS) and the Spallation Neutron Source (SNS) in the United States are new, high-intensity sources under development that attest to the importance of neutrons as a probe. Because they are uncharged, neutrons are one of the most difficult particles to detect. Only a few elements in nature have a high affinity for neutron interaction. This article is organized as follows. In the next section, the fundamental reactions upon which all neutron detection relies are described. Then a series of sections describe the main techniques of neutron detection and imaging: gas-based, scintillator-based, semiconductor-based, film and image plates, and microchannel plates. In these sections, a few applications are noted. The penultimate section focuses on neutron radiography/tomography, for which scintillator detectors and image plates are the primary imaging techniques. Some important reviews from the recent literature for imaging techniques are (1–5) and, for radiography, (6,7). Techniques for (non-imaging) neutron detection have also recently been reviewed (8), and an excellent treatise on all neutron detectors is given in (9).

Neutron Interaction Mechanisms

The creation of free charge carriers is the primary process upon which most radiation detectors rely for their operation. Neutrons have no net charge and can be detected either by direct collision with, and displacement of, nuclei, or by nuclear interactions. For thermal neutrons, there is insufficient energy available for collision displacements, and nuclear reactions are the dominant process in neutron detection methods. Figure 1, which also defines the cold, thermal, and epithermal ranges, shows the neutron absorption cross section versus energy for the most important capture reactions; these can be loosely categorized into three groups:
1. Reactions whose cross sections decrease as the inverse square root of the neutron energy (the 1/v law):

³He + n → ³H + ¹H + 0.764 MeV
⁶Li + n → ⁴He + ³H + 4.8 MeV
¹⁰B + n → ⁷Li + ⁴He + 2.3 MeV + 0.48 MeV (γ)
Figure 1. Cross sections σ (barns) for the most important neutron capture reactions (³He, ⁶Li, ¹⁰B, ¹¹³Cd, ¹⁵⁵Gd, ¹⁵⁷Gd, ²³⁵U) versus neutron energy En (meV), with the corresponding neutron wavelength λn (Å) and the cold, thermal, and epithermal regions indicated.

2. Reactions that are (n,γ) resonances in which γ-ray emission is inhibited and energy is transferred to the orbital electrons. This class of reactions gives rise to complex γ-ray and conversion electron spectra (10):
¹⁵⁵Gd + n → ¹⁵⁶Gd* → ¹⁵⁶Gd + (γ's + conversion e⁻'s; 7.9 MeV). Main electron energies are 39 and 81 keV.

¹⁵⁷Gd + n → ¹⁵⁸Gd* → ¹⁵⁸Gd + (γ's + conversion e⁻'s; 8.5 MeV). Main electron energies are 29, 71, 78, and 131 keV.
3. Fission of ²³⁵U or ²³⁹Pu:

²³⁵U + n → fission fragments + 80 MeV
²³⁹Pu + n → fission fragments + 80 MeV
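The 1/v behavior of the group-1 reactions can be made concrete with a short numerical sketch. The Python snippet below is only an illustration (all names are ours); it uses the 25-meV cross sections tabulated later in Table 1 and scales them linearly with wavelength, which is equivalent to scaling inversely with neutron speed, or with the square root of energy.

```python
# A minimal illustration of the 1/v law for the group-1 absorbers.
# Thermal (25 meV, 1.8 Å) capture cross sections in barns (see Table 1).
SIGMA_25MEV = {"3He": 5330.0, "6Li": 940.0, "10B": 3840.0}

def capture_cross_section(isotope, wavelength_angstrom):
    """Cross section in barns; for 1/v absorbers it grows linearly with
    wavelength (equivalently, as 1/sqrt(energy))."""
    return SIGMA_25MEV[isotope] * wavelength_angstrom / 1.8

for lam in (0.29, 1.8, 9.1):   # epithermal, thermal, and cold (cf. Fig. 1)
    row = {iso: round(capture_cross_section(iso, lam)) for iso in SIGMA_25MEV}
    print(f"{lam} Å: {row}")
```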
A common feature of the reactions in (1) is that the products are ejected collinearly and generally give rise to ionization within several micrometers (solid) or millimeters (gas) of the reaction point; the range in lithium is about 100 µm. The (n,γ) reactions in (2) have a high probability of internal conversion; the conversion electrons ejected are much more easily stopped than γ rays and provide much more accurate positional information. Although ¹¹³Cd has a high neutron capture cross section (Fig. 1), it has not been included in (2) because it has a low probability of internal conversion (about 4% of that for Gd), and accurate positional information is difficult to obtain. The reactions in (3) are based on nuclear fission. Their energies are very high, but the relatively low cross sections and difficulties in isotope enrichment have resulted in little use of these reactions. A compendium of neutron cross sections for many elements in the periodic table has been published in (11). Two of the most important characteristics of a neutron imaging device are positional resolution and efficiency. The (useful) isotopes in (1) and (2) above are shown in Table 1 in their most common physical forms. Information pertaining to reaction product energies and ranges, and the absorption depth for neutrons of energy 0.025 eV (1.8 Å), is also given. These numerical values are most valuable in determining an isotope's use in position-sensitive detectors.

DETECTORS USING GAS FILLING
Before gas-filled imaging detectors were developed, the usual converter in single-wire proportional counters for neutron detection was boron trifluoride (BF₃). In recent years, ³He has become the converter gas of choice because BF₃ is not an ideal gas in which to multiply electrons and, even when enriched in ¹⁰B, its absorption cross section is significantly smaller than that of ³He. The gas depth in a thermal neutron proportional chamber is typically 10–15 mm. This is a compromise between achieving high efficiency and minimizing positional error due to parallax for neutrons incident away from the normal to the detector. Figure 2 shows the calculated detection efficiency at 15 mm depth for a range of gas pressures. Extremely high detection efficiencies can be obtained for cold neutrons. Ultimately, the major factor limiting positional resolution for neutron conversion in ³He is the range of the proton and triton. Figure 3 shows the relative dE/dx profile of the proton and triton as a function of distance from the interaction point. The proton takes three-quarters of the reaction energy and also deposits a disproportionate quantity of ionization at the end of its track. The center of gravity of the resulting primary ionization is therefore weighted part way along the proton track, about 0.4 Rp from the interaction point, where Rp is the range of the proton. A key requirement for obtaining high resolution is to ensure that the proton and triton ranges are kept small. This is normally achieved by using a heavier gas, such as propane (C₃H₈) or carbon tetrafluoride (CF₄), as an admixture with the ³He. These gases also act as a quench for the electron avalanche.

Figure 2. Calculated thermal neutron conversion efficiency [³He(n,p)³H, detector depth = 1.5 cm] as a function of neutron wavelength for ³He partial pressures of 1–8 atm.

Table 1. The Most Useful Neutron Absorbing Isotopes and the Key Features of the Neutron Interactions at 25 meV

Isotope | State | σnᵃ (barns) | Neutron Absn. Length | Particle Energies (keV) | Approximate Particle Range
³He | Gas | 5,330 | 70 mm·atm | p: 573; t: 191 | 3.8 mm·atm C₃H₈
⁶Li | Solid | 940 | 230 µm | t: 2,727; α: 2,055 | 130 µm
¹⁰B | Solid | 3,840 | 20 µm | α: 1,472; ⁷Li: 840 | 3 µm
¹⁰BF₃ | Gas | 3,840 | 97 mm·atm | α: 1,472; ⁷Li: 840 | 4.2 mm·atm
Nat. Gd | Solid | 49,000 | 6.7 µm | Convⁿ electrons: ∼30–200 | 12 µm
¹⁵⁷Gd | Solid | 254,000 | 1.3 µm | Convⁿ electrons: ∼30–200 | 12 µm

ᵃ Cross sections, σn, are tabulated to the nearest 10 barns, and to the nearest 1,000 barns for Gd. This helps emphasize the huge stopping power of Gd relative to the other isotopes.
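As a rough cross-check of Figure 2, the conversion efficiency of a ³He-filled chamber can be estimated from the 70 mm·atm absorption length in Table 1 together with the 1/v scaling sketched earlier. The Python sketch below makes the usual simplifying assumptions (ideal-gas scaling with pressure, normal incidence, no window or wall losses) and is an illustration rather than the calculation behind the figure; the function and constant names are ours.

```python
import math

L0_MM_ATM = 70.0   # 1/e absorption length of 3He at 1.8 Å and 1 atm (Table 1)
LAMBDA0 = 1.8      # reference wavelength in Å

def he3_conversion_efficiency(wavelength_angstrom, pressure_atm, depth_mm=15.0):
    """Fraction of neutrons captured in a gas depth typical of a
    thermal-neutron proportional chamber (10-15 mm)."""
    absorption_length = L0_MM_ATM * (LAMBDA0 / wavelength_angstrom) / pressure_atm
    return 1.0 - math.exp(-depth_mm / absorption_length)

for p_atm in (1, 2, 4, 8):
    effs = [he3_conversion_efficiency(lam, p_atm) for lam in (1.8, 4.0, 8.0)]
    print(f"{p_atm} atm:", ", ".join(f"{e:.0%}" for e in effs))
```

Consistent with Figure 2, the efficiency at 1.8 Å and 1 atm is only about 20%, while cold neutrons at several atmospheres of ³He are converted with efficiencies approaching 100%.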
Figure 3. Representation of the sphere of centroids from many conversions. The ionization centroid is displaced from the conversion point by about 0.4 × Rp, the proton range.

Figure 4 shows the manner in which propane improves the positional resolution as a function of its pressure (12).

Figure 4. Positional resolution (FWHM) in ³He + C₃H₈ as a function of absolute propane pressure.

Multiwire Proportional Chambers (MWPCs)

The concept of the multiwire chamber was born over 30 years ago (13), but its application in a variety of radiation detection applications is ubiquitous even today; neutron detection is no exception. Figure 5 shows schematically a typical ³He-filled proportional chamber. Neutrons convert in the gas in the absorption region. The resulting primary ionization then drifts to the anode plane to undergo multiplication. Positional information in the X and Y coordinates can be found from the cross-wire cathodes by resistive charge division. Refinement of charge-division schemes has led to very good signal-to-noise ratios (14,15), so that the limit to positional resolution is solely due to the range of the reaction products shown in Fig. 3.

Figure 5. Side view of a proportional chamber that has position readout by resistive charge division. Neutrons convert in the absorption and drift region; the upper and lower cathode wires are configured into readout nodes feeding analog shaping amplifiers, pulse stretchers, and a centroid-finding filter.

Gas proportional chambers can be fabricated in a wide range of sizes and configurations, a feature that has made them particularly popular in imaging experiments at the world's reactor and spallation sources. A report (16) detailing detectors at these installations indicated that nearly all were some form of gas detector. Figures 6 and 7 show two detectors from Brookhaven National Laboratory at opposite extremes of the size scale. The first is a planar detector whose sensitive area is 5 × 5 cm, and the second is a curved detector whose sensitive area is 1.5 × 20 cm.

Figure 6. This 5 × 5 cm detector has the world's highest neutron resolution in gas, <400 µm FWHM.

Figure 7. Curved 120° detector for protein crystallography; sensitive area is 1.5 × 20 cm. See color insert.

Gas-filled detectors are extraordinarily stable for long periods of time when fabricated correctly and possess very good absolute count-rate repeatability. These features permit experiments that would otherwise be almost impossible to carry out. In a study of crystalline domains (17), a nonionic detergent–water mixture was investigated at constant temperature by time-resolved neutron scattering. Structural oscillations were observed at 5-min intervals by recording the Bragg peaks that speckle the Debye–Scherrer ring. One of the many time frames recorded during several days is shown in Fig. 8. It is the gas detector's ability to record the time dependence of both intensity and position that makes a study of this nature unique. Recent examples of gas detector results in structural biology are reported in (18).

Discrete Element Counters

Single-wire proportional counters (from which the MWPC ultimately was derived) constitute one of the most basic types of neutron detector. They comprise a cylindrical tube (of the order of 1 cm in diameter) with a central anode (which can be 1 m or longer), filled with the same gas mixture as the MWPC. An event's position is determined by resistive charge division along the complete length of the anode, using charge amplifiers at each end to measure the signal level (19). Although the readout is relatively simple, good positional resolution cannot be obtained from long counters (in contrast to the subdivided readout described for MWPCs), the counters have variable efficiency due to their circular cross section, and generally they have modest counting-rate capability. In arrays, they can be used to cover large areas, but they are generally inferior to an MWPC that has a uniformly thick gas volume.

Microstrip Gas Chambers (MSGCs)

These were developed originally at the Institute Laue Langevin (ILL), Grenoble (20). In the MSGC, the anode and cathode wires of an MWPC are replaced by metallic strips on a glass or plastic substrate. These electrodes can be registered with high precision by photolithography, potentially leading to more automated fabrication processes, better positional resolution across the anodes, and higher count rates. A section of an anode bounded on each side by a cathode is shown in Fig. 9. The anode, typically about 10 µm wide, has a field in excess of 3 × 10⁵ V/cm at its surface. Detectors based on MSGC technology have been developed extensively for neutron detection at ILL (21). Figure 10 shows one arrangement of a detector. The Micro-Strip (MS) plate is made from Schott S8900 glass, a material that offers long-term resistance to charging problems. The front side, made of a 1,500 Å chromium layer, has a pitch of 1 mm, anodes of 12 µm, and cathodes of 500 µm. Induced signals on these cathodes determine one positional coordinate of an event. On the rear side, 86 chromium electrodes, 960 µm wide, are placed perpendicularly to the front structure to determine the second positional coordinate.
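In its idealized form, the resistive charge-division readout used by the wire counters and cathode planes described above reduces to a simple ratio of the charges measured at the two ends of a resistive electrode. The sketch below is that idealization (no noise, perfectly uniform resistance); variable and function names are ours, not those of any particular readout system.

```python
def charge_division_position(q_a, q_b, electrode_length_mm):
    """Event position measured from end A of a uniform resistive electrode.
    The charge splits in inverse proportion to the resistance to each end,
    so the fraction collected at end B equals the fractional distance from A."""
    return electrode_length_mm * q_b / (q_a + q_b)

# Example: charges of 0.3 and 0.7 (arbitrary units) at the ends of a
# 200-mm electrode place the event 140 mm from end A.
print(charge_division_position(0.3, 0.7, 200.0))
```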
Figure 8. Small-angle scattering image, detailing many tiny Bragg peaks in a Debye–Scherrer ring.

Figure 9. Cathode and anode strips in an MSGC. Field lines indicate an increased electric field at the anode surface.

Figure 10. Schematic arrangement of the MSGC detector. The 15-mm reaction gap behind the window is filled with 3 bar ³He + 1.5 bar CF₄; the microstrip substrate, with rear-side electrodes, is mounted on a stainless steel DN200CF flange.

Figure 11. (a) Cadmium mask, dimensions in mm; (b) detector image of the mask using thermal neutrons.
Detectors from ILL have used CF₄ as the stopping gas, and the positional resolution achievable is comparable to that using C₃H₈. Figure 11 is the image of a cadmium mask. The timing resolution of single-wire counters, MWPCs, and MSGCs is typically of the order of 1 µs. This is primarily due to the uncertainty in the neutron conversion point within several millimeters of gas, combined with the slow neutron velocity.

MSGCs using Gd/CsI Converters

This principle is based on the original work of (22). The technique is being applied in a device that has a Gd foil in the center, a layer of CsI on each side of the foil, and a low-pressure MSGC on both sides to detect the secondary electrons emitted from the CsI (23,24). The main features of the device, illustrated in Fig. 12, are as follows. Neutrons convert in a thin Gd foil in the center of the device. Conversion electrons escape into a layer of CsI on both sides of the foil, and further secondary
electrons are emitted from this layer because a high electric field is present. These electrons drift across a low-pressure gas region to two-dimensional MSGCs on each side. This device has an inherent resolution of less than 100 µm, high count-rate capability, and excellent timing. No working devices have yet been reported. The timing resolution of this device is expected to be at least an order of magnitude smaller than that of conventional gas detectors because neutron conversion takes place in a very thin foil.

Figure 12. Principle of proposal for neutron imaging using Gd/CsI and low-pressure operation of MSGCs (¹⁵⁷Gd converter foil, 2 × 0.5–1.5 µm; CsI layers, 2 × 100–200 nm; 4.5-mm gaps to the MSGCs on either side).

SCINTILLATOR DETECTORS

Solid-state scintillators are about three orders of magnitude higher in density than gases at atmospheric pressure and generally have much greater stopping power per unit volume. ⁶Li-based materials are by far the most common. Table 2 gives the characteristics of the main solid-state scintillators (25).

Table 2. Characteristics of Solid-State Scintillators

Material | Li Concentration (cm⁻³) | Photons/Absorbed Neutron | Decay Time (ns) | λ (nm)
⁶Li-glass:Ce³⁺ | 2 × 10²² | 6,000–7,000 | 75 | 395
⁶LiI:Eu²⁺ | 1.8 × 10²³ | 51,000 | 1.4 × 10³ | 470
⁶LiF/ZnS:Ag⁺ | 1.2 × 10²² | 160,000 | 10³ | 450
⁶Li glass:Tb³⁺ | 3 × 10²² | 45,000 | 3 × 10⁶ | 550
Gd₂SiO₅:Ce³⁺ | – | 800 | 56/600 | 430

Anger Cameras

Originally conceived for X rays, the Anger camera (26) became an attractive option for neutrons through developments at Argonne National Laboratory (27). It consists of a single sheet of scintillator material, an intermediate layer to disperse the light, and an array of photomultiplier (PM) tubes (Fig. 13). Neutron conversion in the scintillator creates light that is seen by a group of neighboring PM tubes, and a center-of-gravity calculation is performed from the amplitude information from the respective tubes.

Figure 13. Schematic of the Anger camera.

The most commonly used scintillators
are those based on ⁶Li-glass:Ce³⁺, available from vendors such as Bicron. Kurz (28) describes further work on Anger cameras, and Schäfer (29) describes linear and area detectors read out both as Anger cameras and using position-sensitive photomultipliers. These devices achieve relatively high counting rates and have good timing because of the thin layers. A major disadvantage is a relatively high gamma sensitivity. A device using ⁶Li glass and a microchannel plate will be described later.
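The center-of-gravity calculation at the heart of the Anger camera can be written in a few lines. The PM-tube positions and amplitudes below are invented for illustration; a real instrument also applies thresholds and nonlinearity corrections that are omitted in this sketch.

```python
import numpy as np

# Hypothetical PM-tube centers (mm) and the amplitudes they record
# for a single neutron event in the scintillator sheet.
pm_centers = np.array([[-30.0, 0.0], [0.0, 0.0], [30.0, 0.0],
                       [-30.0, 30.0], [0.0, 30.0], [30.0, 30.0]])
amplitudes = np.array([0.05, 0.60, 0.10, 0.03, 0.18, 0.04])

def anger_centroid(centers, signals):
    """Amplitude-weighted center of gravity of the PM signals (mm)."""
    weights = signals / signals.sum()
    return weights @ centers

print(anger_centroid(pm_centers, amplitudes))   # estimated event position
```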
Scintillators Using Fiber-Optic Coding

This technique was first demonstrated about 20 years ago (30). The principle is shown in Fig. 14. The light produced by neutron capture in the scintillator is divided among three fiber-optic channels and coupled uniquely to a combination of three PM tubes. The major advantage is a reduced number of PM tubes required, compared with the number of resolution elements.

Figure 14. Fiber-optic coding of a discrete-element scintillator detector.

This technique has been further developed at the spallation facility ISIS, located at the Rutherford Appleton Laboratory. Initially used with linear detector arrays using ²Cₙ coding, devices have now been developed using ⁴Cₙ coding that yield two-dimensional information. A recent example at ISIS of a two-dimensional device based on this coding principle (31) is a detector designed for a new diffractometer, where the requirement was for a square detector of dimensions 500 × 500 mm and a pixel size of the order of 10 × 10 mm. This detector was designed as four modules; each module has an active surface area of 348 × 144 mm. A single module consists of a scintillator of ZnS/⁶Li coupled by fiber optics to an array of photomultiplier tubes. The scintillator is located in front of an aluminized Mylar™ light-guide grid that divides the module into 348 pixels, arranged in a 29 × 12 array; the pixel size is 12 × 12 mm. Figure 15 shows a single module from which the light-tight cover has been removed, and in which the scintillator has been removed (it is lying on the floor) to reveal the pixel array.

Figure 15. One module of the coded scintillator/fiber-optic device. Pixel array is 29 × 12. One PM has been removed, and its fibers illuminated (from rear). Beginning at the bottom of the right-hand column, the first pixel (lit) contains fibers from PMs 1, 2, 3 and 4; the next one up (dark) from PMs 1, 2, 3 and 5, all the way (all dark) to the ninth pixel up, PMs 1, 2, 3 and 12. The tenth pixel up (lit) contains fibers from PMs 1, 2, 4 and 5, the eleventh (lit) PMs 1, 2, 4, and 6; and so on. Thus, PM 4 is illuminated.

Each pixel is viewed by sixteen
1.5-mm diameter fiber-optic light guides. These fibers are optically bonded to PM tubes at the rear of the module, using a 4 Cn code. (Every pixel is actually a 2 × 2 array of square cells, each 6 × 6 mm2 . Centered on each cell are four fibers, and corresponding fibers from each cell join at the appropriate PM tube). For example, the first pixel is coupled to PMs 1, 2, 3 and 4; the second pixel is coupled to PMs 1, 2, 3, and 5; and so on. In this way, all 348 pixels of a module are coded by just 12 PMs. In Fig. 15, one photomultiplier tube has been removed, and its fiber optic bundle illuminated, thereby highlighting those fibers that feed that particular PM. Outputs from the photomultiplier tubes are amplified and registered by discriminators, whose outputs are fed to a decoder. This EPROM generates the appropriate binary address for each unique set of four coincident inputs. After time stamping, a histogramming memory is updated. Table 3 shows the major characteristics of this device. There is an interesting development from a new material based on Li/Gd borate (32). Initial measurements on small samples indicate a pulse height distribution for neutrons that is much more spectroscopic than existing scintillators and offers better n,γ discrimination. No imaging developments have yet been reported.
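The ⁴Cₙ coding can be reproduced with a small lookup table: each pixel is assigned a unique combination of 4 of the 12 PM tubes, and a set of four coincident PM hits is decoded back to a pixel. Enumerating the combinations in lexicographic order happens to reproduce the pixel-to-PM assignments quoted above for the first pixels, but the ordering actually programmed into the ISIS EPROM is not given here, so treat this sketch as an illustration of the principle only.

```python
from itertools import combinations

N_PMS, N_PIXELS = 12, 348                     # per module

# C(12, 4) = 495 possible code words, of which 348 are needed for the pixels.
pixel_to_pms = list(combinations(range(1, N_PMS + 1), 4))[:N_PIXELS]
pms_to_pixel = {frozenset(c): i for i, c in enumerate(pixel_to_pms)}

def decode(coincident_pms):
    """Pixel index for a set of four coincident PM hits (None if invalid)."""
    return pms_to_pixel.get(frozenset(coincident_pms))

print(pixel_to_pms[0])        # (1, 2, 3, 4)  -> first pixel
print(pixel_to_pms[1])        # (1, 2, 3, 5)  -> second pixel
print(decode({1, 2, 4, 5}))   # 9             -> the tenth pixel (index 9)
```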
Table 3. Detector Characteristics of Two-Dimensional Fiber-Coded Scintillator Detector

Active surface: 348 × 144 mm²
# of pixels: 348
Pixel dimension: 12 × 12 mm²
# of fibers/module: 5,568
Fiber diameter: 1.5 mm
Fiber code: ⁴Cₙ
# of photomultipliers: 12
Photomultiplier diameter: 51 mm
Neutron efficiency at 1 Å: 20%
Gamma sensitivity (⁶⁰Co): <10⁻⁸
Background counts (cm⁻² h⁻¹): 0.7
Pulse pair resolution: 2.5 µs

Figure 17. Detection efficiency as a function of foil thickness (solid line: ¹⁵⁷Gd; dashed line: nat Gd), for 1 Å neutrons in transmission, backscattering, and total.
SEMICONDUCTOR DETECTORS
Figure 16. Principle of foil converters in transmission mode. A neutron beam strikes a converter foil (Gd, ¹⁰B, or ⁶Li) of thickness t; escaping charged particles are registered by a charged-particle detector behind it.
The use of semiconductors in neutron detection generally requires converting the neutron in a foil. The charged particle that may escape from the foil is sensed by the semiconductor. Foils can be used in transmission, as illustrated in Fig. 16, in backscatter mode, or both. However, this technique involves a delicate compromise: the foil should be sufficiently thick for high conversion efficiency, yet thin enough to allow the secondary particles to escape. Figure 17 illustrates the efficiencies of ¹⁵⁷Gd and nat Gd for 1 Å neutrons as a function of foil thickness. Efficiency in this instance assumes that the probability of detecting an electron, once it escapes the foil, is unity. It should be noted that the foils are very thin and the efficiencies are not very high. (It is clear from this diagram why the Gd foil in the MSGC/Gd detector described previously is so thin.) These detectors generally have very good timing resolution because the secondary emission and its solid-state detection are extremely prompt.

Silicon Semiconductor Using ⁶LiF Converter

In this approach (33), a layer of LiF is attached to a reverse-biased Si diode. The reaction products are an
alpha, which has a range of only a few micrometers, and a triton, which has several times the alpha range. Even if only half of its 2.7 MeV is dissipated in the silicon, 3.8 × 10⁵ electron/hole pairs are created, a sizeable signal. Such a LiF/Si device was made into a linear position-sensitive detector by fabricating one of the two main electrodes as 24 strips at 1.25-mm spacing and connecting them to an RC line. This application of charge division permits positional determination, and a measured spatial resolution of 1.25 mm has been obtained (33). The relatively thin converter layers (16 µm) yield less than 10% neutron absorption probability; further work is necessary to render this technique more practical.

Silicon/Gd Microstrip

A Si microstrip detector, a thin wafer of fully depleted silicon that uses readout strips on one face to permit positional determination, has been coupled to a Gd converter (34), as shown in Fig. 18. In recent work on this type of instrument, a double-sided microstrip silicon sensor of area 20 × 20 mm² was used, as shown in Fig. 19. Signals from the 400-µm pitch strips are fed to low-noise, charge-sensitive preamplifier/shaping amplifier (VA) chips; the channel that has the largest signal gives the position of a specific event. The neutron converter was a 250-µm thick natural Gd foil mounted in contact with the back plane of the sensor. The less energetic line of the conversion electron spectrum following neutron capture in Gd is observed at about 70 keV, creating about 2 × 10⁴ electrons in the silicon. This is relatively easily detectable by the electronics. Figure 20 shows the positional response, which was measured by scanning a beam of neutrons from the TRIGA reactor, Italy, across the detector. By selecting monochromatic neutrons, the positional response as a function of energy was also determined, and a small dependence upon neutron energy is observed. It is likely that γ-ray background plays a limiting role in these measurements. The timing resolution of semiconductor-based detectors should be of the order of a few nanoseconds.

Figure 18. Principle of the Gd converter and silicon strip detector.

Figure 19. Silicon microstrip detector readout setup (Si sensor 20 × 20 mm, capacitor chips 16.7 × 6.6 mm, VA chips 1 × 3 mm).

Figure 20. Positional response as a function of neutron energy.

FILM AND IMAGE PLATES

Film

Thermal neutrons can be imaged at high resolution by photographic film. Diffraction experiments using film were carried out many years ago (35), where indium was the neutron-absorbing medium. Now, the secondary foil is usually Gd. An alternative approach is to use a scintillator, such as ⁶LiF/ZnS. A system has been described (36) in which film is sandwiched between two such screens, and a resolution of 100 µm was obtained. Although film consists of silver halide grains of the order of 5 µm, this intrinsic resolution is rarely achieved because the range of the electron from the foil is typically longer. Film, like the image plate (next section), has the advantages of low cost, the ability to cover large areas, flexibility, and good resolution. The disadvantages are no timing information, no sensitivity to single neutrons, and the inconvenience of a two-step readout process. Photographic film has a much smaller dynamic range, about 100 : 1, compared to image plates.

Image Plates
Image plates (IPs) were developed for diagnostic radiology during the early 1980s (37). They were soon used widely in X-ray protein crystallography (38). An IP is about 0.5 mm thick and is composed of a flexible plastic backing coated with fine storage phosphor crystals (BaFBr : Eu2+ ) combined with an organic binder. The phosphor can store part of the energy released when ionizing radiation converts in the plate. When the plate is later stimulated by visible light, photostimulated luminescence (PSL) is emitted, whose intensity is proportional to the neutron dose. Essentially the IP is a form of electronic film, in which
incoming radiation creates an integrated latent image that is subsequently read out. The image is retrieved by illuminating the plate with laser light (HeNe); this produces PSL, or characteristic fluorescent radiation, from Eu²⁺. The number of photons emitted is proportional to the converted incident radiation. A convenient method for readout is a raster scan using a laser beam, simultaneously feeding the fluorescent radiation through a fiber-optic cable to a photomultiplier. IPs have a resolution of less than 100 µm and a dynamic range much greater than that of film, typically 10⁵ : 1. These are impressive characteristics. Figure 21 illustrates the readout process for an IP. Gadolinium is a good candidate for making an IP sensitive to neutrons. The strong (n,γ) resonance produces a wide range of keV γ rays and a cascade of conversion electrons. Both can stimulate the phosphor. Conversion can be done either by placing a thin plate containing gadolinium in
front of the IP or by having the gadolinium compound inside the phosphor. It has been found that the second approach is better from the viewpoint of both resolution and signal-to-noise (39,40). A key advantage of image plates is their mechanical flexibility. For example, they can be used to surround objects cylindrically, as shown in Fig. 22. In this example, the image plate is mounted on a 30-cm drum, and the sample is inside, on-axis. Using a pixel size of 0.25 × 0.25 mm², about 6 Mpixels were read out in 5 min. A diffraction pattern from a partially deuterated tetragonal lysozyme crystal taken by using this instrument is shown in Fig. 23 (39). Recently, it has been shown (41) that mixing ⁶LiF into the image plate phosphor may improve the neutron-to-γ sensitivity by an order of magnitude.

Figure 21. Signal readout for the image plate: a laser raster-scans the plate via a rotating mirror, and the photostimulated luminescence is collected by a waveguide onto a photomultiplier whose signal is digitized.

Figure 22. Arrangement for more rapid readout of an image plate. 1: Image plate mounted on drum. 2: Drum. 3: Sample holder. 4: Crystal. 5: Belt to drive drum. 6: Carrier for read head with photomultiplier. 7: Laser. 8: Mirror to bring laser light to read head. 9: Read head with photomultiplier. 10: Encoder. 11: Cover.

Figure 23. A diffraction pattern from a partially deuterated tetragonal lysozyme crystal.

MICROCHANNEL PLATES

Microchannel plates (MCPs) are vacuum imaging devices for UV/X-ray photons and charged particles (42). A limited amount of work has been carried out to determine their viability as direct detectors of neutrons. A first approach involved measurements using MCPs made from lead oxide glass containing lithium (43), where ionization from the emitted α particle is detected. Only a tiny efficiency for neutron detection was measured and, furthermore, it was realized that incorporating large mole fractions of lithium oxide into MCP lead glass was a severe challenge. A
more practical approach is to consider incorporating boron oxide into the MCP glass (3), but no results have been published so far. A recent development of the microchannel plate, called the microsphere detector, is described in (44). The outer geometry and appearance of the two devices are essentially identical, but the newer device comprises irregularly packed, sintered glass beads in which an electron avalanche propagates in the gaps between the spheres. So far, however, there has been no development of this plate specifically for neutrons. A combination of scintillator, ⁶Li-glass:Ce³⁺, and MCP has been examined as a position-sensitive detector (45), where the readout was by a two-dimensional resistive sheet. Figure 24 illustrates the principle. In a further development of this approach, a positional resolution of 40 µm was reported for a small device (46).

Figure 24. Combination of ⁶Li glass, a microchannel plate, and a resistive sheet readout. A neutron interaction in the scintillator produces photons that reach the photocathode through a fiber-optic faceplate; the resulting electrons are multiplied in the microchannel plate, and the centroid of the current distribution is determined on the resistive anode.

RADIOGRAPHY AND TOMOGRAPHY

Neutrons have unique scattering and absorption properties for studying structures and flaws in the nondestructive testing of materials. X rays have a cross section for absorption that increases approximately as Z⁴; hence, they are strongly absorbed by heavy elements and little absorbed by light elements. Low-energy neutrons, on the other hand, can interact strongly with light elements and minimally with heavy elements, and their interaction properties can differ by several orders of magnitude between nuclei whose atomic numbers are close.
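A one-line application of the Beer–Lambert law illustrates this contrast mechanism. The macroscopic cross sections below are representative thermal-neutron textbook values, not numbers taken from this article, and are quoted only to show why a thin hydrogenous layer is visible inside a metal housing.

```python
import math

# Approximate thermal-neutron macroscopic cross sections (cm^-1);
# representative values for illustration only.
SIGMA_CM = {"aluminum": 0.10, "steel": 1.2, "water/oil (hydrogenous)": 3.5}

def transmission(material, thickness_cm):
    """Transmitted fraction of a thermal-neutron beam (Beer-Lambert law)."""
    return math.exp(-SIGMA_CM[material] * thickness_cm)

print(transmission("aluminum", 1.0))                  # ~0.90: 1 cm Al is nearly transparent
print(transmission("water/oil (hydrogenous)", 0.1))   # ~0.70: a 1-mm hydrogenous layer shows up
```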
An excellent selection of papers relating to radiography and tomography using neutrons can be found in (6,7). Planar neutron radiography was developed to take advantage of these unique neutron properties. It is ideally suited to studies of thin layers. In aluminum, for example, early corrosive changes that are rich in hydrogen-containing compounds can be detected because of the large interaction cross section with hydrogen nuclei. Neutron radiography, even though expensive compared with other nondestructive testing methods, is an indispensable part of inspection and examination in fields such as nuclear engineering, the aerospace industry, pyrotechnics, the oil industry, and archeology. Specific examples are

• welding seams,
• cooling channels in turbine blades,
• gluing of sandwich structures used in aviation,
• homogeneity of new materials or enclosures inside them, and
• detection of hydrocarbons inside metal or ceramic housings.

It has also been used in life science applications such as root activity in growing plants (47), insect research (48), and the study of microorganisms (49). However, the image in radiography is produced from the integrated interaction cross section along the entire path of the radiation through the object. Local density or material changes can be hidden because of this essential averaging. Image reconstruction from a multitude of projections permits tomography to generate three-dimensional detail of the object's internal structure. Some early advantageous features of tomography were realized in test measurements on nuclear fuel (50), but its full potential has been thwarted by the lack of efficient, high-resolution (fraction of a millimeter) detectors. Detectors for radiography have been largely based around image plates (51–54). However, image plates (and film) are not practical for tomography because a large number of projections need to be taken in a reasonable time and need to be available in digital form for further processing. Most tomographic work has been based on neutron scintillators coupled to CCDs (55–58); two are described here.

(a) McFarland et al. (55) at the Massachusetts Institute of Technology pioneered the use of a scintillator screen coupled to a CCD to demonstrate that such a system had the positional resolution and efficiency capable of yielding useful tomographs. Figure 25 shows a schematic of the experimental setup. Neutrons that have passed through the test object convert in the screen, a ⁶LiF–ZnS scintillator, whose light output is imaged onto a cooled CCD by a front-surfaced mirror and lens. This geometry allows shielding the CCD from the incident neutron beam, preventing radiation damage to the CCD and possible neutron activation. An efficiency of 15% for thermal neutrons and a resolution of about 100 µm for the screen were claimed by the manufacturer, but this specific setup had significantly lower efficiency because the blue ZnS phosphor and the lens aperture were mismatched. However, acceptable images
NEUTRON IMAGING, RADIOGRAPHY, AND CT
Scintillator screen
l
∅
x Mirror
s Neutron beam Object
y α Lens
CCD Figure 25. One of the first tomographic measurements using a CCD detector. Neutrons pass through the object and are converted in a scintillator screen mounted on the inside of a lighttight box. Optical photons reflected from the mirror are focused by the lens onto a CCD.
are absorbed at an efficiency of about 15%, and the ionizing reaction products produce approximately 1.8 × 105 photons in the ZnS(Ag) crystals at an average wavelength of 470 nm. The product particles, an α and triton, have average ranges of 10 µm and 26 µm, respectively, within the scintillator. The granular compositional of the scintillator limits the resolution to about 50 um. The efficiency is improved for subthermal neutrons because the absorption cross section is inversely proportional to the neutron velocity. The scintillator material itself is somewhat opaque to its own radiation, so the scintillator layer must not be thicker than 0.5 mm for optimum light output; a thicker layer would be required for optimum neutron absorption. Light output for gamma rays is 450 to 4,500 times smaller than for one detected neutron. Figure 27 shows the setup of the Munich instrument. The detector consists of a 28 × 28 cm2 scintillator screen, an aluminum-coated surface mirror, and a Peltier cooled CCD camera system that has a 512 × 512 pixel Thomson CCD (TH7895 that has a 19 × 19 µm2 pixel size). The driving electronics are very flexible and compact due to the use of highly integrated chips. Control of exposure, readout, and data transfer is performed by a standard PC. As in (a), the scintillator screen is viewed via the mirror by the CCD chip to prevent radiation damage to the latter. The mirror and the back side of the detector housing are transparent for neutrons, so that the nonabsorbed part of the beam leaves the detector. A zoom lens of f = 1.0 permits coverage of the whole active area of 28 × 28 cm2 . Although the CCD is not in the direct line of neutrons, some problems arise from stray gamma rays that are generated by neutron capture in the beam catcher or in the sample. The light output of the scintillator is negligible
Scintillator Neutrons
Mirror
Figure 26. Three-dimensional view of a neutron tomographic reconstruction of a brass control valve.
Zoom lens Vacuum
CCD
were obtained. Figure 26 shows a neutron tomogram of a small brass control valve (about 3 cm outside dimension) reconstructed from 90 projections at 2° increments. The total imaging time was a little under 5 minutes, and the neutron fluence was 7 × 108 n/cm2 per projection. Intrinsic resolution is close to the 100 µm of the scintillator. (b) A group at the Technical University, Munich developed a high sensitivity CCD camera for detecting the faint light of a neutron scintillator (56,57). A prototype that had an active area of up to 28 × 28 cm2 was developed. It is expected that after some further improvements, this detector will be sensitive enough to detect even a few single neutrons on an area as large as 0.5 × 0.5 m2 . As in (a), a commercially available granular scintillator consisting of ZnS(Ag) +6 LiF was used. Thermal neutrons
Peltier supply Water Readout electronics and power supply
Mains
Figure 27. Schematic setup of the detector.
for gammas compared to neutrons, but the CCD chip itself is highly sensitive to gammas. A gamma event in a CCD generates several thousand photoelectrons and forms a nearly perfect point source. These gamma events appear as isolated bright spots in the picture and can be eliminated by electronic image processing by comparing two identical exposures. In the future, the CCD will be shielded with lead glass. Neutron exposures have been taken at the Garching reactor FRM-I in a flux of the order of 3.0 × 10⁵ n/s/cm² and an average neutron energy of 6.5 meV. Because the beam was only about 2.5 cm wide and 16 cm high, large-area pictures were obtained by mounting the detector and sample on a translation table, which was scanned across the beam. A complete radiograph was taken in 10 minutes, exposing each part of the sample to the beam for 2 minutes. Figure 28 shows a radiograph of a car ignition distributor. On top is the low-pressure box for ignition time adjustment that has a steel spring and membrane, and below this is the plastic housing of the capacitor, nearly dark because of its hydrogen content.

Figure 28. The raw data from a car ignition distributor.

An example of the transformation from projection to tomograph is shown in Fig. 29. The object is a small electromotor about 20 mm in diameter. The CCD instrument was used to record many projections in 0.9° steps using a three-minute exposure each time. One such projection is shown in Fig. 29(a). From all projections, a three-dimensional reconstruction can be put together from a stack of two-dimensional reconstructions; the latter, in turn, is obtained by Fourier analysis of the projection raw data (59). The three-dimensional reconstruction was put together from 268 separately reconstructed layers of 201 × 201 pixels, forming a data cube of 201 × 201 × 268 pixels (Fig. 29b). Each 3-D pixel corresponds to a cube whose side is about 120 µm.

Figure 29. Neutron radiography/tomography of a small motor. (a) One single projection; (b) front view of motor (268 reconstruction layers).

Some upgrades of the detector system that will improve both throughput and image quality are a reduction of dark current noise by liquid nitrogen cooling; improvement of the sensitivity of the CCD to the 470-nm scintillation light by coating it with laser dyes, which act as wavelength shifters, or by using backlit CCDs; better shielding against gamma rays; and better light collection by high-aperture lenses (f = 0.75) and fiber-optic tapers.

Fast-neutron radiography is an attractive nondestructive inspection technique because of the excellent penetrating characteristics of fast neutrons in matter. The difficulty of detecting fast neutrons, however, reduces its attractiveness. Some techniques have been developed to help overcome this difficulty. Two image plates and a sheet of polyethylene as a proton emitter were examined (60); the results showed that the method is applicable to fast-neutron radiography and effectively discriminates γ rays.
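The slice-by-slice reconstruction mentioned above, obtained by Fourier analysis of the projection data (59), is the standard filtered back-projection algorithm. The following is a minimal sketch using the reference implementation in scikit-image and a synthetic phantom rather than the Munich data; the 0.9° angular step mirrors the measurement described in the text, but everything else is illustrative.

```python
import numpy as np
from skimage.transform import radon, iradon   # filtered back-projection

# Synthetic 2-D phantom standing in for one horizontal slice of the object.
phantom = np.zeros((201, 201))
phantom[60:140, 80:120] = 1.0

theta = np.arange(0.0, 180.0, 0.9)        # projection angles (degrees)
sinogram = radon(phantom, theta=theta)    # one column per projection
slice_2d = iradon(sinogram, theta=theta)  # ramp-filtered back-projection

# A full volume is obtained by stacking such slices, one per detector row.
print(sinogram.shape, slice_2d.shape)
```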
CLOSING REMARKS

In this article, we have described the principal reactions in nature that are useful for detecting neutrons, highlighted the salient features of a range of detector technologies that have been developed for imaging
purposes, and outlined an important application, namely, radiography and tomography. However, in covering quite a range of disparate technologies, it is not possible to delve too deeply into some fundamental aspects of radiation detection. For example, in describing each imaging technology, little distinction was made between single-neutron counting and the integrating mode. In fact, only three detector types described in this article are truly integrating, namely, film, the image plate, and the combination of scintillator and CCD. These are most useful in very high rate applications where an image can be recorded quickly and where no time resolution is required; hence, they are commonly used in radiography and tomography. All other detector types described register neutrons electronically, one event at a time. This offers the possibility of time-tagging events and, in experiments at spallation sources, each neutron's energy can then be determined. Thermal neutrons, particularly, move very slowly; 1-meV and 100-meV neutrons traverse 10 m in 23 ms and 2.3 ms, respectively. Flight paths between source and detector can easily be 10 m, and therefore, time differences of several milliseconds can occur between the arrival of the least and most energetic neutrons from one spallation pulse. Timing resolution, nonexistent for integrating detectors, has been indicated for counting detectors where appropriate. Development of improved neutron imaging techniques will continue to be an area of challenging research. Detectors that have submillimeter resolution, counting rates of megahertz per square centimeter, and area coverage of several square meters are essential if the world community is to make effective use of the newest spallation sources now under construction and consideration throughout the world.

Acknowledgments

The author acknowledges many informative discussions with his Brookhaven colleagues, particularly Drs. Veljko Radeka and Bo Yu, and many other colleagues at neutron facilities in the United States and abroad. This work was supported by U.S. Department of Energy Contract No. DE-AC02-98CH10886.
ABBREVIATIONS AND ACRONYMS

CCD: charge coupled device
EPROM: erasable programmable read only memory
ESS: European Spallation Source
FWHM: full width at half maximum
ILL: Institute Laue Langevin
IP: image plate
MCP: microchannel plate
MSGC: microstrip gas chamber
MWPC: multiwire proportional chamber
PM: photomultiplier
PSL: photostimulated luminescence
SNS: Spallation Neutron Source

BIBLIOGRAPHY

1. P. B. Convert and J. B. Forsyth, eds., Position-Sensitive Detection of Thermal Neutrons, Academic Press, London, 1983.
2. R. K. Crawford, Proc. SPIE 1,737, 210–223 (1992).
3. G. W. Fraser, Proc. SPIE 2,339, 287–301 (1995).
4. G. Lander, ed., J. Neutron Res. 4, 1–274 (1996).
5. J. Hastings, G. C. Smith, and B. Yu, eds., Workshop on Neutron Detectors for Spallation Sources, BNL, 1998, http://neutronworkshop.bnl.gov/.
6. C. O. Fischer, J. Stade, and W. Bock, eds., Proc. 5th World Conf. Neutron Radiogr., Berlin, 1996, Pub: DGZfP.
7. E. Lehmann, H. Pleinert, and S. Körner, Proc. 3rd Int. Topical Meet. Neutron Radiogr., Lucerne, Switzerland, 1999.
8. A. J. Peurrung, Nucl. Instrum. Methods A443, 400–415 (2000).
9. G. F. Knoll, Radiation Detection and Measurement, 3rd ed., John Wiley & Sons, Inc., New York, NY, 1999.
10. L. V. Groshev et al., Bull. Acad. Sci. USSR Phys. Ser. 26, 1,127–1,146 (1962).
11. D. I. Garber and R. R. Kinsey, Report No. BNL 325, Brookhaven National Laboratory, 1976.
12. V. Radeka, N. A. Schaknowski, G. C. Smith, and B. Yu, in B. P. Schoenborn and R. B. Knott, eds., Neutrons in Biology, Plenum Press, NY, 1996, pp. 57–67.
13. G. Charpak, R. Bouclier, T. Bressani, J. Favier, and C. Zupancic, Nucl. Instrum. Methods 62, 262–268 (1968).
14. V. Radeka and R. A. Boie, Nucl. Instrum. Methods 178, 543–554 (1980).
15. R. A. Boie, J. Fischer, Y. Inagaki, F. C. Merritt, H. Okuno, and V. Radeka, Nucl. Instrum. Methods 200, 533–545 (1982).
16. S. A. McElhaney and R. I. Vandermolen, Oak Ridge National Laboratory Report ORNL/TM-11557, 1990.
17. M. Imai, T. Kato, and D. Schneider, J. Chem. Phys. 106, 9,362 (1997).
18. B. P. Schoenborn and R. B. Knott, eds., Neutrons in Biology, Plenum Press, NY, 1996.
19. V. Radeka, IEEE Trans. Nucl. Sci. NS-21, 51–64 (1974).
20. A. Oed, Nucl. Instrum. Methods A263, 351–359 (1988).
21. N. Vellettaz, J. E. Assaf, and A. Oed, Nucl. Instrum. Methods A392, 73–79 (1997).
22. V. Dangendorf, A. Demian, H. Friedrich, V. Wagner, A. Akkerman, A. Breskin, R. Chechik, and A. Gibrekhterman, Nucl. Instrum. Methods A350, 503–510 (1994).
23. B. Gebauer, C. Schulz, and T. Wilpert, Nucl. Instrum. Methods A392, 68–72 (1997).
24. B. Gebauer, C. Schulz, T. Wilpert, and S. F. Biagi, Nucl. Instrum. Methods A409, 56–62 (1998).
25. M. J. Knitel, Ph.D. Thesis, Technical University, Delft, 1998.
26. H. O. Anger, Rev. Sci. Instrum. 29, 27–33 (1958).
27. M. G. Strauss, R. Brenner, F. J. Lynch, and C. B. Morgan, IEEE Trans. Nucl. Sci. NS-28, 800–806 (1981).
28. R. Kurz and J. Schelten, Nucl. Instrum. Methods A273, 273–282 (1988).
29. W. Schäfer, E. Jansen, G. Will, A. Szepesvary, R. Reinartz, and K. D. Müller, Physica B213 & 214, 972–974 (1995).
30. P. L. Davidson and H. Wroe, in P. B. Convert and J. B. Forsyth, eds., Position-Sensitive Detection of Thermal Neutrons, Academic Press, London, 1983, pp. 166–174.
31. N. J. Rhodes, A. G. Wardle, A. J. Boram, and M. W. Johnson, Nucl. Instrum. Methods A392, 315–318 (1997).
32. B. Czirr, G. M. MacGillivray, R. R. MacGillivray, and P. J. Seddon, Nucl. Instrum. Methods A424, 15–19 (2001).
33. J. Schelten, M. Balzhauser, F. Hongesberg, R. Engels, and R. Reinhartz, Physica B234–236, 1,084–1,086 (1997).
34. C. Petrillo, F. Sacchetti, G. Maehlum, and M. Mancinelli, Nucl. Instrum. Methods A424, 523–532 (1999).
35. E. O. Wollan, C. G. Shull, and M. C. Marney, Phys. Rev. 73, 527–528 (1948).
36. H. Holwein, in P. B. Convert and J. B. Forsyth, eds., Position-Sensitive Detection of Thermal Neutrons, Academic Press, London, 1983, pp. 379–390.
37. M. Sonoda, M. Takano, J. Miyahara, and H. Kato, Radiology 148, 833 (1983).
38. Y. Amemiya, T. Matsushita, A. Nakagawa, Y. Satow, J. Miyahara, and J. Chikawa, Nucl. Instrum. Methods A266, 645–653 (1988).
39. F. Cipriani, J.-C. Castagna, M. S. Lehmann, and C. Wilkinson, Physica B213 & 214, 975–977 (1995).
40. M. J. Knitel, V. R. Bom, P. Dorenbos, C. W. E. van Eijk, I. Berezovskaya, and V. Dotsenko, Nucl. Instrum. Methods A449, 578–594 (2000).
41. M. Thoms, Nucl. Instrum. Methods A424, 34–39 (1999).
42. J. L. Wiza, Nucl. Instrum. Methods 162, 587–601 (1979).
43. G. W. Fraser and J. F. Pearson, Nucl. Instrum. Methods A293, 569–574 (1990).
44. A. S. Tremsin, J. F. Pearson, J. E. Lees, and G. W. Fraser, Nucl. Instrum. Methods A368, 719–730 (1996).
45. R. A. Schrack, Nucl. Instrum. Methods 222, 499–601 (1984).
46. C. R. Rausch, Thesis, Technical University, Munich, 1996.
47. J. Furukawa, T. M. Nakanishi, and M. Matsubayashi, Nucl. Instrum. Methods A424, 116–121 (1999).
48. L. L. Allee, P. M. Davis, and H. C. Aderhold, Proc. 5th World Conf. Neutron Radiogr., Berlin, 1996, pp. 728–733.
49. R. Wacha, V. R. Crispim, C. Lage, and J. D. R. Lopes, Radiation Meas. 32, 159–162 (2000).
50. C. F. Barton, Trans. Am. Nucl. Soc. 28, 212–213 (1977).
51. J. Stade, J. J. Rant, and M. Kaling, Proc. 5th World Conf. Neutron Radiogr., Berlin, 1996, pp. 298–306.
52. S. Tazaki, K. Neriishi, K. Takahashi, M. Etoh, Y. Karasawa, S. Kumazawa, and N. Niimura, Nucl. Instrum. Methods A424, 20–25 (1999).
53. G. Bayon, Nucl. Instrum. Methods A424, 92–97 (1999).
54. S. Fujine, K. Yoneda, M. Kamata, and M. Etoh, Nucl. Instrum. Methods A424, 200–208 (1999).
55. E. W. McFarland, R. C. Lanza, and G. W. Poulos, IEEE Trans. Nucl. Sci. NS-38, 612–621 (1991).
56. B. Schillinger, J. Neutron Res. 4, 57–63 (1996).
57. B. Schillinger, W. Blümluber, A. Fent, and M. Wegner, Nucl. Instrum. Methods A424, 58–65 (1999).
58. M. R. Gibbons, W. J. Richards, and K. Shields, Nucl. Instrum. Methods A424, 53–57 (1999).
59. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, IEEE, NY, 1988.
60. M. Masahito, T. Hibiki, K. Mishima, K. Yoshii, and K. Okamoto, Nucl. Instrum. Methods A463, 324–330 (2001).
O

OPTICAL IMAGE FORMATION

PANTAZIS MOUROULIS
California Institute of Technology
Pasadena, CA

INTRODUCTION

For the greatest part of human history, visible light has been the only image forming medium. And until recently, visual telescopes and microscopes comprised the vast majority of image forming instruments. The image formation principles examined in this article derive from such venerable ancestry but are broader in their application. Whereas visible wavelengths span a range of less than 2 : 1 in the electromagnetic (EM) spectrum, the contents of this article can find application in devices operating across a wavelength range of ∼10⁶, from nanometers to millimeters. However, the middle part of that range, from about 100 nm to about 10 µm, contains practically all of the devices that are classified as optical instruments, and those are our main concern here.

An image is an n-dimensional map of a property of an object. An optical image, as we understand it here, is a spatial map of one or more of the following three properties of an object: emissivity (E), reflectance (R), or transmittance (T). In other words, an object may emit EM radiation, or it may reflect/transmit it in a spatially dependent way. Let A stand for any of E, R, and T, or a combination. Then, an optical image is a map or representation of A(x, y, z). The function A also depends on wavelength, so one may plot a map of A(λ). However, although the wavelength dependence is implicit in A(x, y, z), its explicit mapping is not part of an optical image. Also, we restrict the concept of an optical image to be independent of any recording device. Thus, an optical image is the image projected on the film by the camera lens, but not the image recorded by the film, because the latter depends on the characteristics of the film.

In most cases, it is sufficient or necessary to simplify this definition of an optical image by restricting it to two dimensions. This is not to say that optical instruments may not image a three-dimensional object. It is really only a matter of convenience arising from the fact that most image recording devices are inherently two-dimensional, so an image formed on a plane is of primary practical interest. Accordingly, most of the theory of optical instruments deals with planar images and, therefore, either with a projection of the three-dimensional object on a plane or with a slice of such an object.

The object property A(x, y, z) is mapped by the optical system to a function A′(x′, y′, z′), which is the image. A 1 : 1 correspondence is ideally assumed to exist between the functions A and A′, so that the amount of EM energy arriving at point (x′, y′, z′) depends only on the amount of energy leaving the corresponding object point (x, y, z). Clearly this does not hold in practice because any optical system will image a point into a small area (or volume). This implies that the energy at any image point O′ depends on the energy emitted from points in the neighborhood of the corresponding object point O. Thus, the image is, in general, an imperfect map of the object, which means that some of the information contained in the object will be lost during imaging. Nevertheless, if the imperfections are not large, we may satisfy ourselves that a one-to-one correspondence may be established between small object and image patches.

There are two fundamental ways of deriving the function A′. Let us imagine for simplicity that our object is an illuminated plane, such as a photograph that has some reflectance variation. We may scan the object, probing one small area at a time, or we may simultaneously form an image of the entire object in what is called whole-field imaging, which may be accomplished by using a simple lens. Scanning may be accomplished in two fundamental ways: (1) by moving a small aperture in close proximity to the object and (2) by moving a spot of light across the object and collecting the reflected (or scattered) radiation. These three different ways of imaging share common principles and therefore can be described by a common mathematical formulation given later. However, the specific techniques and instruments involved in the various methods are different and are covered by separate articles in this Encyclopedia. Within this article, we concentrate explicitly on whole-field imaging, although all of the principles and most of the techniques are immediately transferable to scanning using a spot of light.

The simplest device for whole-field imaging is the pinhole camera. It comprises a square or rectangular box that has a small opening on one side. The image is formed on the side opposite the pinhole, as indicated in Fig. 1.

Figure 1. The pinhole camera.

The small size of the pinhole (relative to its distance from the object and image planes) restricts light from any object point to be incident on only a small image patch, thus establishing the approximate one-to-one correspondence mentioned previously. In other words, the light cone accepted from an object point is small, and so is the overlap between the bases of such light cones that arise from different object points. As the size of the
pinhole increases, the bases of the light cones overlap significantly. Therefore, light from different object spots is incident on the same image spot, thus destroying the oneto-one correspondence and therefore the imaging ability of the pinhole camera. Therefore, it would appear desirable for the pinhole to be as small as possible, but in fact this is not so. In the first place, as the pinhole gets smaller, the amount of light passing through it will diminish, resulting in a very faint image. But an even more fundamental limitation is caused by the wave nature of light. When a wave is made to pass through an opening of a size comparable to the wavelength, the wave spreads in directions that differ from the original direction of incidence due to diffraction. If the opening is sufficiently small, the wave spreads in all forward directions, across essentially a hemisphere. Thus, light from any point in the object would end up illuminating the entire image plane simultaneously, again destroying the correspondence between object and image. The projection of the pinhole on the image plane, taken as approximately the same size as the pinhole itself, provides the resolution limit of the pinhole camera. Image details smaller than the pinhole size cannot be resolved. Any close examination of pinhole camera images reveals considerable blur. Thus, the pinhole camera has two fundamental problems: inferior resolution and a dim image. Both of these problems are resolved by introducing a lens in place of the pinhole. As Fig. 2 shows, the lens collects light from an object point and concentrates (focuses) that light onto an image point. The amount of light collected by the lens depends on its diameter and can clearly be a lot more than the light collected by a pinhole. Interestingly, for an ideal lens, the ability of the lens to focus the light down to a very small area (focal spot) is inversely proportional to its size: the larger the lens, the smaller the focal spot size (assuming that the distance to the image is fixed). So we appear to gain both image brightness and image sharpness by increasing the lens size. Unfortunately, for any nonideal lens, the focal spot size depends on the specific construction of the lens, as well as on the object and image size or the angle at which rays strike the lens. When these realistic considerations are taken into account, it is found that increasing the lens size beyond a certain point normally leads to an increase in spot size or requires a complicated lens construction. The subject of this article is a more detailed appreciation of the contents and implications of the last two paragraphs. The necessary background for much of this material is that of a first-year college
1073
physics course. Specifically, some familiarity is assumed with the elementary concepts of a thin lens and its focal length/points and the basic concepts of wave optics. In addition to that, a proper understanding of the image formation concepts in a later section assumes some familiarity with the basic mathematics of Fourier transformations. Although some basic material of Fourier theory is reviewed in the Appendix, it is primarily useful as a refresher for those who already know something about the topic. However, a reader unfamiliar with such mathematics should still be able to appreciate that material because it is largely selfcontained, although significantly compressed through omission of most mathematical proofs. A deliberate attempt has also been made to provide simple physical arguments for most equations, and many illustrations have been included. Clearly, for a detailed understanding, the reader must resort to the references given.
PARAXIAL OPTICS A basic understanding of image formation and the associated instruments may be obtained through the concept of rays. A ray is a geometrical path that corresponds roughly to the direction of propagation of EM energy. This correspondence is generally good away from a focus and away from diffracting edges but breaks down dramatically in those two cases, where, as we will see, we have to employ the methods of wave optics. Inside isotropic materials, rays may be defined more strictly as lines perpendicular to the geometrical wave fronts, where the latter are defined as surfaces of constant phase measured relative to an originating point source. Space limitations prohibit a full discussion here; a more complete introductory discussion may be found in (1,2). Along with the pinhole camera, a plane mirror may also be considered the simplest imaging device. As Fig. 3 shows, a plane mirror forms an image of a point at a distance (OA ) equal to –(AO). The image formed by a plane mirror is virtual. It is formed by a diverging bundle of rays that appear to have come from A even though they never get there. Combinations of plane mirrors allow us to perform interesting and useful image transformations such as image rotation or translation. Such combinations are often encountered as reflecting prisms, which are an essential element of many imaging systems, such as
B O
B A A
A′
A′ B′
Figure 2. Schematic of a camera where lens is in place of the pinhole.
Figure 3. Imaging through a plane mirror.
B′
1074
OPTICAL IMAGE FORMATION
and cos a = 1 + (terms of second or higher power in a).
Figure 4. Schematic of afocal or zero-power systems. In both cases, the divergence angle of the incident bundle of rays is not affected.
cameras or binoculars. Details of the forms and properties of reflecting prisms may be found in (2–4). A real image is formed by the lens shown in Fig. 2. This image is formed by a converging bundle of rays. One important difference between the plane mirror and the lens is that the plane mirror cannot alter the divergence (or convergence) of the rays, whereas the lens (or a curved mirror) can. For example, the lens of Fig. 2 transforms a diverging bundle, emanating from point A, into a converging bundle focusing on point A . When an optical system changes the divergence or convergence angle of the rays, we say that it has power. More strictly, we may define a zero-power (or afocal) system from Fig. 4: If an incident parallel bundle of rays remains parallel after passing through (or interacting with) the system, then we say that the system has zero power, or is afocal. Systems that have nonzero power may also be called focal. It should be understood that there is no fundamental difference between an afocal and a focal system. A plane mirror is the simplest afocal system. As Fig. 3 shows, although a plane mirror does not alter the divergence of the rays, nevertheless, it forms an image when a nearby object is presented. As the object recedes, so does the image. At the limit, when the object is at infinity, rays emitted from an object point will be parallel, and will remain parallel after impinging on the plane mirror. This means that they will intersect at infinity, which is also where the image will be located. In other words, parallel rays are focused at infinity. A lot of confusion can be avoided if one thinks of an object (or image) at infinity as the limiting case of a very distant object, and similarly, of an afocal system as the limiting case of a focal system. We will restrict the exposition of image forming optical systems to systems that have an axis of rotational symmetry and contain only homogeneous materials, such as a compound lens system in which all of individual lens elements have the same axis. Practically all of the critical concepts of image formation can be developed within this framework, and once those are understood, the reader can extend them to other situations whose symmetry is different. A first and important approximation to the behavior of rotationally symmetrical systems is obtained by considering only rays that remain infinitesimally close to the axis. Such rays are called paraxial. In practical terms, the paraxial domain is defined by neglecting terms of second or higher order in the power series expansion of the sine or cosine. Thus, if a is the angle that a ray forms with the axis, sin a = a + (terms of third or higher power in a)
(1)
(2)
Thus the paraxial approximation is sin a = a(= tan a), and cos a = 1. An important consequence of the paraxial approximation is that all smooth curved surfaces of revolution about the axis become indistinguishable from a sphere. When the origin of coordinates is at the apex, the equation describing such surfaces may be written as z = 12 c(x2 + y2 ) + (higher order terms),
(3)
where z is the coordinate along the axis and c = 1/R, where R is the local radius of curvature. The bracketed term in Eq. (3) can be neglected to within the same level of approximation as the corresponding terms in Eqs. (1) and (2). As a result of this restriction, paraxial image formation is perfect. This means that all wave fronts are spherical, and therefore, all rays meet at a single point, the center of the spherical wave front. Thus the image of point A in Fig. 2 is also a point. Similarly, the image B of the extra-axial point B is also a perfect point, but it is assumed that the lengths (AB) and (A B ) are infinitesimally small. Although this assumption appears at first sight extremely restrictive, the value of paraxial optics is not restricted to an infinitesimal region around the axis. Appropriate extension to a finite region can be successfully made, as shown later. The departure of the wave front from sphericity is described as aberration. Paraxial optics theory is used in performing the following four functions: (1) finding the location of the image, (2) finding the size of the image, (3) obtaining a first approximation to the brightness of the image, and (4) determining a ‘first-cut’ optical system that satisfies the previous three requirements. The fourth problem is generally considerably harder than the first three. That is the field of optical design, treated in (2–7), among several others. Within this article, we concentrate on the problem of finding the properties of a given system, without concerning ourselves how the system was designed to achieve those properties. IMAGE LOCATION AND SIZE Given an optical system and an object, how do we determine the image location and size? In this section, we show that both of these questions may be answered by tracing a single paraxial ray. An optical system may comprise any number of lenses or mirrors, keeping in mind that we have restricted our attention to systems of rotational symmetry. It is evident, however, that plane mirrors, used to fold an otherwise long system, do not destroy this rotational symmetry and are ignored from our present viewpoint. Thus, we represent an arbitrary optical system as a collection of n refracting and/or reflecting surfaces (Fig. 5). Paraxial Ray Tracing Paraxial ray tracing involves two operations: refraction through a surface and transfer to the next surface. In
OPTICAL IMAGE FORMATION
d1
1075
dk−1
h1 u ′1 = u 2
u1
...
h1
h2 l ′1
l1 n1
l2 n ′1 = n2
c1
hk −1
hk
u ′k l ′k
n ′k −1 = nk c2
Ck−1
h′k
Ck
n ′k
Figure 5. Schematic of an optical system comprising k optical surfaces that have the same axis of rotational symmetry. In the paraxial approximation, the height h and the angle u are both considered infinitesimal.
Table 1. Notation for Paraxial Ray Tracing Symbol Constructional parameters
Ray parameters
Meaning
n
Refractive index
c d h
Surface curvature (1/R) Axial surface separation Paraxial ray/surface intersection height variable Paraxial ray angle variable Object/image distance from surface Paraxial object/image height variable
u l η
the first operation, we determine the new ray angle, and in the second its point of intersection with the next surface. Table 1 gives the meaning of the symbols in Fig. 5. A subscript is used to associate a variable with its corresponding surface; primed quantities refer to the image space of a surface, that is, after refraction; unprimed quantities refer to the object space of a surface, that is, before refraction. In this sense, ‘‘refraction’’ also includes reflection, if the surface in question is a mirror. The same set of equations can handle both cases by using an appropriate convention, as shown. The following two equations describe the paraxial refraction (Eq. 4) and transfer (Eq. 5) processes: ni ui − ni ui = −hi ci (ni − ni ), hi+1 = hi + di ui ,
Determination of Image Location Sufficient background has already been provided to determine the image location for an arbitrary system and object. Clearly, the system must be fully specified in terms of refractive indexes, curvatures, and surface separations. The process is as follows:
(4) (5)
where ni ≡ ni+1 and ui ≡ ui+1 . The paraxial angle and height variables are related to the object distance through h u=− , l
of the object/image distance l. It should be clear that the quantity l is normally finite, whereas both u and h are infinitesimal (although obviously they may not be represented pictorially as such). In applying Eqs. (4)–(6), attention must be paid to the sign convention if one is to obtain correct numerical answers. The sign convention used is that of a Cartesian coordinate system where the local origin is at the apex of the surface. Thus, distances to the left and the bottom of the apex are negative, and acute angles measured from the axis to the ray are positive if counterclockwise. Light is assumed to come from the left, unless a mirror is involved. Mirrors introduce an additional consideration to the sign convention. Reflection from a mirror may be treated as refraction if the following two conditions are observed: (1) after reflection, the refractive index of the ‘‘new’’ medium changes sign (so that the refractive index of air becomes −1 after the first reflection, and 1 after the second), and (2) the sign of the axial separation d also reverses. The second condition is in fact obvious because in going from surface i to surface i + 1, we would have to move from right to left after an odd number of reflections.
(6)
and a similar equation (with primes) may be written for the image. Thus if paraxial ray tracing succeeds in establishing the height and angle variables at the last surface (after refraction), then, Eq. (6) may be used to determine the image location. It is also possible to remove the paraxial angle variable u from Eqs. (4), (5) in favor
1. Derive the initial values (at the first surface) for the paraxial height and angle variables u1 , h1 . If the object distance (l1 ) from the first surface is the only parameter known, then one specifies an arbitrary value for u1 and finds h1 through Eq. (6). An arbitrary value for u may be used because all paraxial rays from a point will be imaged at the same one point (all wave fronts are spherical). This may also be seen by the fact that the variable u drops out of Eqs. (4), (5) if u is substituted in terms of l [shown by Eq. (6)]. 2. Apply Eqs. (4), (5) to determine uk (after the last refraction) and hk (at the last surface). 3. Again apply Eq. (6) to determine lk .
1076
OPTICAL IMAGE FORMATION
Clearly, this procedure may be reversed if we desire to determine the object location from a known image location. Determination of Image Size (Magnification) It would appear at first sight that to find the image height, we would need to trace a ray from the top of the object and determine its intersection at the image plane. Although this is possible, it is strictly unnecessary. Figure 5 contains all of the information necessary for determining the image size, and it is important to understand why. First, let us define the magnification M as the ratio of the image and object heights: η (7) M = k. η1 This type of magnification is called transverse magnification, to distinguish it from other possible types such as longitudinal, angular, and so on, described in the next section. When the adjective is omitted, transverse is normally implied. Now, consider imaging by a single surface, as in Fig. 6. A ray is shown from the top of the object through the apex of the surface that intersects the image plane at the top of the image. Application of the paraxial form of Snell’s law for this ray (n sin i = n sin i ) and simple geometry gives us the following relationship: M=
nl nu η = = , η nl nu
(8)
where the last equality follows from u = −h/l. Successive application of this formula for the k surfaces of Fig. 5 gives η l ηk η η n1 l l = 1 2 ··· k = 1 2 ··· k η1 η1 η2 ηk nk l1 l2 lk n1 u1 n2 u2 nk uk = ··· n1 u1 n2 u2 nk uk
M=
(9)
because ηj = ηj+1 and nj = nj+1 . Also, because uj = uj+1 , the last equality simplifies to M=
n1 u1 , nk uk
(10)
showing the tremendous simplification that is achieved by using the paraxial angle variable u. The corresponding magnification formula utilizing the length l in Eq. (9) does not simplify further and can also cause additional
(n ′ )
(n ) A
u
O
i
h
i ′u ′
A′
h
l
h′
l′
Figure 6. Imagery of an extended object by a single surface.
computational problems if an intermediate distance is either infinite or zero. This is one of the key reasons for using the paraxial angle variable in the ray-tracing Eqs. (4), (5). Notice now that all of the quantities in Eq. (10) are associated with the single ray from the axial object point shown in Fig. 5. Thus, we have demonstrated that both the image location and the magnification can be determined by tracing a single paraxial ray from the axial object point. Other Types of Magnification Longitudinal Magnification, Image of a Volume. If the object is three-dimensional or if it is anything other than a plane perpendicular to the axis, the previous concept of linear or transverse magnification needs to be extended. We can do so by introducing the concept of longitudinal magnification. Consider two points A1 and A2 along the axis of an optical system, separated by a distance δl. Let their images be A1 and A2 , also along the axis, separated by a distance δl . M1 is the linear magnification between points A1 and A1 , and similarly for M2 . It can be shown that the longitudinal magnification is the ratio of the two axial segments (2): Mlong ≡
δl = M1 M2 . δl
(11)
Now consider a cubic volume. The face of the cube centered at K will be imaged at magnification M1 , and the face centered at L will be imaged at a magnification M2 . This shows that the images of the two faces will be of different size and implies that the image of a cubic volume is a truncated pyramid, as shown in Fig. 7. This holds only if the focal points of the lens are outside the segment (KL), so that the image of the cube is wholly real or wholly virtual. Afocal systems have the property that the linear magnification is constant, independent of the object location. For such systems, the image of a cubic volume becomes a parallelepiped. Angular Magnification. When object and image are at infinity, the concept of transverse or linear magnification is not useful and is replaced by the concept of angular magnification. An optical system that forms an image at infinity of an object at infinity must necessarily be afocal because a parallel bundle of rays remains parallel. This is illustrated in Fig. 8, where the two lenses are separated by the sum of their focal lengths. A parallel bundle of rays from the edge of the infinite object is shown that forms an angle u with the axis. The bundle emerges parallel at an angle u and is directed to the edge of the image. The angular magnification is Mang =
u . u
(12)
As mentioned in the previous section, the linear magnification is constant for an afocal system. Let the object not be at infinity but at an arbitrary point A along the axis. If the system is afocal, any ray parallel to the axis in the object space will emerge parallel in the image space.
OPTICAL IMAGE FORMATION
1077
G H C D
L
F E
K B
A O dl E′ A′
l
F′ B' L′
l′
H′ G′
K′
D′ dl ′ C′ Figure 7. Image of a cube through an ideal focal system.
A
u′
B
u A′ Figure 8. Imaging by an afocal system. The two lenses shown are assumed to be ‘‘thin.’’ A thin lens is an idealization of a real lens of infinitesimally small thickness. Therefore, all refraction is assumed to take place on a single plane, even though two distinct surfaces of different curvatures may be associated with the lens.
It follows that such a ray will pass through the top of the image, independent of the image location. Hence, the linear image size is constant. It may be shown that the linear magnification of an afocal system is the inverse of the angular: 1 (13) Mlin = Mang For a focal system, either the object or the image may be at infinity but not both together. In either case, the concept of linear magnification also breaks down. As Fig. 9 shows, an object point B at the focal plane of a lens gives rise to a parallel bundle of rays emerging from the lens (image at infinity). The magnification may then be thought of as the linear object size divided by the angular image size. From the figure, we see that this ratio is the focal length F, which therefore represents the system magnification in such a case. By reversing the direction of the rays, it is seen that a corresponding relationship holds when the object is at infinity.
b′
h F
f Figure 9. Imaging by an ideal focal lens where the object is at the focal plane. See Fig. 10 for an explanation of all symbols.
Visual Magnification. When a magnifying glass is used for visual observations, the important parameter is the gain in apparent image size between the aided and unaided eye, not the linear or the angular magnification. We set a (semiarbitrary) near point for clear vision at 25 cm away from the eye. When the object is at that distance, the maximum amount of detail may be resolved with naked eye. Let the object subtend an angle β in that case. When the object is viewed through a magnifier of focal length F, the magnified image subtends an angle β . Visual magnification is the ratio of the two angles, and it can be shown that it varies between the limits (2) 25 cm β 25 cm ≤ Mvis ≡ ≤ + 1, F β F depending on the object location.
(14)
1078
OPTICAL IMAGE FORMATION
LENS (MIRROR) ASSEMBLIES: GAUSSIAN OPTICS It is assumed that the reader is familiar with the following basic thin-lens conjugate equation: 1 1 1 − = l l F
(15)
In this equation, l and l are the object and image distances from the lens, and F is the equivalent or effective focal length (EFL). The lens is assumed to be in air. The signs are consistent with the Cartesian sign convention, established previously, which is almost universally followed in all recent advanced texts and almost never followed in early elementary texts. The quantity on the right-hand side is the power K of the lens given by K≡
1 = (n − 1)(c1 − c2 ). F
(16)
The power of the lens is its fundamental characteristic and suffices to describe all of its imaging properties that we have discussed so far. The thin-lens equation is depicted in Fig. 10. In the thin-lens approximation, the axial distance between the two surfaces is assumed to be zero, so all refraction takes place along a single plane, from which the distances l and l are measured. This is the reason for drawing the thin lens as a single line that has shaded surfaces. When the thickness of the lens cannot be ignored, it may be shown that the power is given by (n − 1)2 1 = K = (n − 1)(c1 − c2 ) + dc1 c2 , F n
(17)
where d is the axial lens thickness. Using this definition, a conjugate equation similar to Eq. (15) may be written for the thick lens, but only if the distances l and l are
c1
c2
O F
A
l
f
f′
F′
A′
l′
Figure 10. Depiction of the thin-lens conjugate equation. The object, point A, is imaged at A . The distances l and l are measured from the plane at which refraction is assumed to take place. Two more rays are also shown: one from an axial object at infinity that crosses the axis at the second focal point F , and one that emerges parallel to the axis, having originated at the first focal point F. The distances (OF) = f and (OF ) = f are the first and second focal lengths of the lens and are opposite in sign (f = −f ), irrespective of whether the lens is positive or negative. They are not to be confused with the equivalent focal length (EFL), F, whose sign is characteristic of the lens. In the simple situation depicted in this figure, the EFL happens to coincide numerically with f , but this is not generally the case. For example, if the lens is immersed in a medium of refractive index n , then f = n F. The general case is treated later in this section.
measured from the proper locations. The way this may be accomplished and the way of finding the proper locations for measuring l and l is explained later in the more general context of an arbitrary optical system. The paraxial ray-tracing techniques previously described suffice to determine the image location and size, if that is the sole concern. However, they are useful only if a detailed prescription of the system is available. But it is often necessary to understand the imaging action of a lens assembly whose internal construction is unknown. One such simple example is a compound camera lens comprising several unknown lens elements. The methods of Gaussian optics allow us to predict the behavior of such a lens in an abstract way. In effect, any focal lens/mirror assembly can be reduced to a thinlens equivalent. Clearly, the reduction cannot be perfect if we are to retain some essential information about the compound lens. Whereas a thin lens is characterized by a single parameter (its power or EFL), a compound or thick lens requires three parameters; but even so, an arbitrary system is enormously simplified when reduced to a set of only three parameters. The three parameters in question are the power or equivalent focal length of the entire system and the location of two reference planes from which object and image distances are to be measured. These are interrelated as follows: Consider an arbitrary system, for example, the compound camera lens of unknown internal construction mentioned earlier. The first and last surfaces of the lens are the only ones physically accessible, and we represent those in Fig. 11. Now, consider a ray from the axial object at infinity that crosses the axis at the second focal point F (if the system is not afocal). Inside the system, the ray is refracted in an unknown fashion, but we may determine the point Q through the intersection of the initial and final ray segments. The plane P Q is then the plane perpendicular to the axis that passes through Q . By proceeding in the opposite direction, we establish the plane PQ. These two planes are called principal planes, and they have the interesting property of unity magnification [so that the image of the segment (PQ) is the segment (P Q )]. The principal planes are the reference planes we need to apply the thin-lens equation to an arbitrary optical system. This is shown in Fig. 12. Let the image of (AB) be (A B ), and assume that we have established the location of the principal planes for the system (P, P ), following
Q
h
Q′ F′
F P
Surface 1
P′
Surface k
Figure 11. Establishing the principal planes of a focal lens system.
OPTICAL IMAGE FORMATION
Similarly, for a ray parallel to the axis from the right, or equivalently, for a ray coming from the left that emerges parallel to the axis, we obtain (Fig. 13)
B h
u′
u
A
P
F′
u ′k
P′
A′
δ=−
h′
l′
l
the previous procedure. Then, we may write an equation identical to Eq. (15) for the object and image distances, if they are measured from P and P , respectively. In the more general case, where the object and image space refractive indexes are different, Eq. 15 becomes n n − = K. l l
(18)
Now, the power of the system is defined with the aid of Fig. 12. A ray parallel to the axis at a height η emerges at an angle uk . If n is the image-space refractive index, then n uk 1 =− , F η
(19)
where again F is the EFL of the system. It should be clearly understood that the power is independent of the choice of the height η. It is also independent of the direction of propagation, in other words, whether light comes from the left, as in Fig. 12, or from the right. This holds whether or not the system has any left to right symmetry. In summary, both the thin-lens-equivalent conjugate equation (Eq. 18) and the magnification formula (Eq. 8) hold for an arbitrary system, provided that the distances l and l are measured from the principal planes. It remains to specify the principal plane locations. These are normally specified by reference to the first and last surfaces of the system, as in Fig. 13. We trace a ray parallel to the axis at a height h1 in the object space. It intercepts the last surface at a height hk and forms an angle uk with the axis. If δ is the distance between the last surface and the image-space principal plane, h1 − hk (u1 = 0). δ = uk
Ray 1
Q
(20)
Q′
h1
h1 hk
F P
(u1)
(hk ) T d
f
u ′k
F′
P′
(h1)
h1 − hk (uk = 0), u1
(21)
B′
Figure 12. Imaging by a general focal system represented by its principal planes.
K=
1079
T′ d′
f′
Figure 13. Location of the principal planes.
Ray 2
where the meanings of h1 and hk in the two cases are different (associated with a different ray). These two formulas complete the basic equations of Gaussian optics and also show its equivalence to the previous ray-tracing methods. All of the quantities in Eqs. (19)–(21) can be computed by the ray-tracing procedures of the previous section. However, the importance of the power and the principal planes is that they allow us to make general inferences about the system and they can be determined experimentally even without knowledge of the internal construction of the system. The way to achieve this is beyond the scope of this article [see (2,8)]. Afocal Systems Principal planes cannot be established for afocal systems, as the construction of Fig. 11 demonstrates (the ray would remain parallel to the axis after exiting the system). Also, Eq. (19) shows that the power of all afocal systems is zero (uk = 0), so we cannot distinguish between afocal systems by their power. Afocal systems are a special case, which nevertheless can also be treated generally in a simple way (2). Because the linear magnification for an afocal system is constant (say M0 ), we can use it instead of the power. The procedure is as follows. First, we establish any set of convenient conjugate planes by tracing a single ray through the system. Then, we use those conjugates as reference planes, from which to measure object and image distances. Thus, if l and l are the conjugate distances measured from our set of reference planes, l =
n 2 M l. n 0
(22)
DETERMINATION OF IMAGE IRRADIANCE The Marginal and Pupil Rays and the Optical Invariant Up to this point, we have shown that the image location and size can be determined by tracing a single ray, given the object and the system construction. But the significance of paraxial and Gaussian optics can be extended even further to determine a first (but often very good) approximation of the amount and distribution of EM energy that reaches the image from the object through the lens. This can be accomplished by tracing only one more paraxial ray, thus providing a striking demonstration of the economy and power of these methods. The magnification formula, Eqs. (8) or (10) may be rearranged as (23) nuη = n u η ≡ H. So the product nuη is an invariant. This holds for the entire system as well as for an individual surface. The invariant H is usually called the Lagrange invariant, but
1080
OPTICAL IMAGE FORMATION
Aperture stop
Exit pupil
Field stop PPR A
A′
u u Intermediate image PMR Figure 14. Imaging by a realistic optical system. The size of the light cones accepted into the system from any object point is limited by the aperture stop, or more generally, by the entrance pupil, which in this case coincides with the aperture stop. The maximum object size that can be imaged though the system is determined by the field stop.
it may be also called the Helmholtz invariant, or the optical invariant in (2). As we will see, it determines the amount of light accepted by the system (throughput), as well as the maximum number of resolvable spots in the system (information capacity). Up to this point, we have set no limit on the actual length of the object (η)–we have been concerned only with the magnification. The angle u will also be limited to a maximum value by the size of the lens opening, which has not been a concern until now. Although the invariance of Eq. (23) holds for any values of u and η, it acquires its fundamental significance if we interpret both u and η as the maximum values permitted by the system. This is depicted in Fig. 14. In the figure, the object at A is imaged at A at unity magnification, but this is not essential. This figure contains all of the important elements one needs to understand the transfer of EM energy through the lens system to a first approximation. We trace two rays, one from the axial object point A and one from the edge of the object. The first ray also passes through the edge of the ‘‘aperture stop,’’ which is a diaphragm that determines the size of the cone of light admitted into the system by any object point. This ray is called the marginal ray. The ray from the edge of the object is chosen to pass through the middle of the aperture stop; it is called the principal, chief, or pupil ray. The image of the aperture stop through the system is the ‘‘exit pupil,’’ as shown. The size of the exit pupil determines the angle of the light cones that converge onto image points. The corresponding size of the object-space cones is determined in this diagram by the aperture stop, but in the general case, the aperture stop may not be located at the front of the system, but somewhere inside it. In that case, the image of the aperture stop through the preceding elements is the ‘‘entrance pupil.’’ For the system shown, the aperture stop and the entrance pupil coincide. The size of the image is limited by the ‘‘field stop.’’ This is a diaphragm placed at an intermediate image location. If no intermediate image is formed by the system, the field stop may be provided by an aperture across the image itself, for example, the dimensions of the photodetector that records the image. Similarly, there may be no clearly defined diaphragm that plays the role of the aperture stop. The aperture and/or field stops may be determined by the size of those lens elements that come closest to limiting the
marginal and pupil ray angles. For example, it is evident from the figure that even if we had no physical diaphragm at the aperture stop location, the angle of the marginal ray would still be limited by the first or the third (but not the second) lens diameter. Similarly, in the absence of a field stop, the size of the image would be determined by the size of the second lens. The two rays shown in Fig. 14 determine the size and the location of the object and image, as well as the size and location of the entrance and exit pupils. These two rays are traced according to the paraxial equations shown previously, and for this reason, we designate them as the paraxial marginal ray (PMR) and the paraxial pupil ray (PPR), correspondingly. Despite the use of paraxial equations, it is assumed that the field (object) size and the aperture size correspond to the actual system values, and are thus finite, and possibly quite large. Because the paraxial equations are not valid far from the axis, it would appear that their use would lead to unacceptably large errors. However, that is not the case. The two conditions shown as equations 24 and 25 allow us to extend the validity of the paraxial equations and in fact arrive at predictions that are quite accurate for systems that do not have large aberrations. The two conditions relate the paraxial angle variable associated with the marginal and pupil rays to the actual (finite) field and aperture angles. Thus, let u be the paraxial angle variable associated with the PMR, and let a be the corresponding finite aperture angle (or angular aperture size — see Fig. 14). Similarly, let u be the paraxial angle variable associated with the PPR and a the corresponding finite field angle (or angular field size). Then, we set u = sin α
(24)
u = tan α.
(25)
and
Thus, for example, if a system is said to have a semifield of view of 45° , we set the PPR angle variable u as tan(45° ) = 1. Or, if the system is said to have an aperture angle of 30° , we would set the PMR angle variable u equal to sin 30° = 0.5. These two equations represent the link between paraxial and finite optics and allow us to extend
OPTICAL IMAGE FORMATION
1081
This quantity is more often used for a distant object than the NA, for example, for photographic objectives. In terms of the F# then, Eq. (28) becomes
∆A a h
E
E= Entrance pupil
Figure 15. Flux collected by the entrance pupil of a lens from a small Lambertian source.
the usefulness of paraxial optics to the finite domain. Full justification of this extension is given in (2).
To proceed further, we must assume familiarity with some radiometric and photometric concepts that are discussed elsewhere in this Encyclopedia [see also (2,3,8)]. The basic concept borrowed from physical optics is that of the radiant power (or flux), measured in watts, and its photometric equivalent, the luminous flux, measured in lumen. Other radiometric (photometric) quantities are then constructed by the use of areas and solid angles. Image irradiance (illuminance) is simply the flux per unit area incident on the image surface. For the source, the most important quantity is the radiance (luminance), given as flux per unit solid angle per unit area projected normally in the direction of observation. In Fig. 15, let the source have area A and a radius equal to η. At the entrance pupil of the lens, a fraction of the total flux emitted by the source is collected. This fraction is given by =
π 2 n
LH 2 ,
(26)
where L the source radiance (luminance) and H = nuη = n sin aη, the Lagrange invariant of the lens. Thus, we see that the flux collected by the lens is proportional to the square of the Lagrange invariant. If the lens has a transmission factor τ , then flux equal to τ will be distributed across the image surface π η2 . The image irradiance is then E=
τ . π η2
(27)
The advantage of using Eq. (24) is that for systems in which the aberrations are reasonably well corrected, the relationship u = sin a also holds in the image space. Therefore, H = n sin a η . Substituting in Eq. (27) and using Eq. (26) gives E=
π τ L 2 n sin2 a . n2
(28)
The quantity n sin a is called the numerical aperture (NA) of the lens. It is defined in the image space if the object is more distant than the image. More frequently, it is defined in the object space (n sin a) for a nearby object, as for a microscope objective. The quantity 1/(2n sin a ) is called the relative aperture or simply the F number (F#).
(29)
As mentioned, the last two equations are more appropriate when the object is distant. When both object and image are nearby, it may be useful to express the image irradiance in terms of the linear magnification M. By combining Eqs. (26) and (27) and using H = n sin aη, we obtain E=
Flux Collected by a Lens. Image Irradiance
πτL 1 . n2 4(F#)2
τ π L sin2 a . M2
(30)
The problem of determining the image irradiance has now been solved. Notice that pupil ray quantities do not appear explicitly anywhere in the last few equations. For example, both the angle α and the magnification can be thought of as associated with the marginal ray. Thus, it may appear that the image irradiance can be determined by using only one ray, rather than two, as we claimed at the beginning of this section. However, pupil ray quantities are implied. For example, the angle α is specified by the ratio of the pupil size (determined by the marginal ray) and the location of the pupil or the distance between the image and the pupil (determined by the pupil ray). It is best to think that the PMR determines the location of the object/image and the size of the pupils and the PPR determines the size of the object/image and the location of the pupils. This symmetry between the PMR and the PPR is exploited in aberration theory. IMAGE QUALITY LIMITATIONS Limitations due to the Dispersion of Optical Materials: Chromatic Aberrations The paraxial approximation and the image formation principles discussed so far contain one unstated assumption, namely, that the refractive index n is constant. In reality, that is correct only for a single wavelength. Optical materials exhibit dispersion, that is, variation of the refractive index with wavelength. Therefore, for any system that contains refractive elements, all of the system properties will generally vary with wavelength. In the Gaussian sense, we are concerned with the power and the principal plane location. The chromatic variation of those system properties leads to two directly observable chromatic effects or aberrations: the chromatic variation of focus (also called longitudinal or axial chromatic aberration) and the chromatic variation of magnification (also called transverse or lateral chromatic aberration). To characterize these effects, we need first to characterize the dispersion of optical materials. This is done through the so-called Abbe V number (or occasionally, constringence), defined as follows. Let n1 , n2 , n3 be the values of a material’s refractive index at wavelengths λ1 , λ2 , λ3 , respectively, where λ1 < λ2 < λ3 . These wavelengths, it is
OPTICAL IMAGE FORMATION
assumed, span the range of interest, and λ2 is approximately in the middle of that range. For the range of visible wavelengths, the most common wavelengths taken are 486 nm (F hydrogen line), 588 nm (d helium line), and 656 nm (C hydrogen line). Then, the V number is defined as n2 − 1 . (31) V= n1 − n3 The Abbe V number characterizes the dispersiveness of glass in the following sense: its inverse, 1/V, is equal to the chromatic variation of the angle by which a thin prism made from the given glass deviates light. Now, we can give the chromatic aberrations of a thin lens in air, as follows. Let l3 and l1 be the image distances from the lens for wavelengths 3 and 1, respectively; then, the chromatic variation of focus is 1 1 K − = , l1 l3 V
(32)
where K is implicitly K2 , that is, the power at the middle wavelength, and V is the Abbe value of the material from which the thin lens is made. The chromatic variation of image height is given by η1 − η3 = −hl
K , V
(33)
where K and l refer to wavelength 2 and h is the pupil ray height at the thin lens. This formula makes it clear that there is no chromatic variation of magnification (or image height) if the aperture stop is located at the (thin) lens (in which case, h = 0). Justification for Eqs. (32) and (33) can be found in (2). In a compound system that may contain many different optical materials of different refractive indexes and dispersive properties, calculating the chromatic errors is more complicated. We may note, however, that the difference in V value between different materials may be exploited to create achromatic systems, that is, systems that exhibit small chromatic errors. Further details may be found in (2,3,8–10). Limitations due to the Wave Nature of Light: Information Capacity In the geometrical approximation examined so far, we have seen that it is possible for all rays to cross at a single point, leading to perfect point image formation at least in the paraxial region and for a single wavelength. However, the geometrical approximation is valid only when the wavelength of light can be considered very small. The wave nature of light dictates that it cannot be focused to a region much smaller than its wavelength. Thus, the image of a point will have a certain minimum size, even if the system is otherwise perfect. The irradiance distribution in the image of a point is referred to as the point-spread function (PSF). For a system that suffers no other imperfections, the PSF is determined by the shape of the exit pupil (e.g., circular, elliptical, irregular) and the irradiance distribution across it (see later sections). A most commonly encountered case is that of a circular
pupil shape that is uniformly illuminated. In this case, the rotational symmetry means that the PSF is a function only of the radius ρ: G(ρ) = (const)
J1 (2πρ) πρ
2 (34)
where J1 is a Bessel function of the first kind. The form of G(ρ) is illustrated in Fig. 16. The irradiance distribution that corresponds to Fig. 16 is called the Airy pattern (after Sir George Airy, Royal Astronomer, who had many good opportunities to observe images of point objects). Diffraction theory (11–13) shows that the radius of the first zero, ρ0 , is given by ρ0 =
0.61λ , NA
(35)
where the NA is understood to refer to the image space. The images of two points separated by ρ0 are said to be just resolved, and ρ0 is traditionally thought of as the resolution limit of an optical system (also called the Rayleigh resolution limit — but see later sections for some additional discussion). The maximum number of resolvable spots in the image determines the information capacity of the optical system. If η is the image height, the number of resolvable spots across the image area is proportional to (η /ρo )2 . Using Eq. (35), we find that the information capacity I of a
(a)
1.0
(b)
Relative irradiance
1082
0.8 0.6 0.4 0.2 0.0 50
70
90 110 130 Distance (arbitrary units)
150
Figure 16. (a) Three-dimensional representation of the image of a point formed by a diffraction-limited system of circular aperture. (b) The same function in cross section, displaying the relative heights of the secondary maxima.
OPTICAL IMAGE FORMATION
system is I ∝ (NA)2 η2 = (n sin α )2 η2 = (n u η )2 = H 2 ,
(36)
using Eqs. (23) and (24). Thus the information capacity of an optical system is proportional to the square of the Lagrange invariant. Clearly, this calculation holds for a single wavelength and thus implicitly excludes chromatic aberration. Monochromatic aberrations are also common (see next section) and have the effect of further blurring the image (or increasing the resolution limit), thus limiting the information capacity. In that case, it can be seen that H 2 represents the upper limit in information capacity that is achievable by any (incoherently illuminated) system performing the given imaging task. Limitations due to Nonparaxial Geometry: Monochromatic Aberrations Here, we begin to examine the properties of wholefield imaging systems outside the paraxial region, where images are no longer perfect. There are two complementary ways of examining those properties. The first is based on the geometric properties of the wave fronts and rays, and the second takes the wave nature of light into account. The geometric way is only an approximation, useful when the wavelength can be considered negligibly small. It owes its utility to the fact that geometric calculations are considerably faster than wave (or diffraction)-based ones. Even with the speed of today’s computers, that is a considerable advantage. In principle, diffraction-based calculations are more accurate. However, as the image quality declines, diffractionbased calculations become extremely time-consuming and require very large arrays for an adequate representation of the aberration. Thus, geometric techniques are also useful at the limit of large aberrations. The linear paraxial ray-tracing equations (4), (5) lead to perfect image formation (all paraxial rays that originate from a single object point meet at the same image point). Those equations are linear in u, due to the paraxial approximation sin u = u. When the difference between the sine and the angle can no longer be neglected, the resulting nonlinear ray-tracing equations (involving sin u) lead to the result that rays from the same object point cross the image plane at different places. There are a few special-case exceptions to this (e.g., a point object at the center of curvature of the surface), but in general, monochromatic aberrations are the inevitable result of image formation outside the paraxial region. Imaging of an axial point is also subject to aberrations if the rays collected by the lens form a finite angle with the axis. Generally, the aim of optical design is to control the amount of aberration by using the smallest possible number of (preferably spherical) refracting or reflecting surfaces. A system in which the aberrations are negligible is called diffraction-limited, because the resolution of such a system will be determined only by the diffraction spread in the image of a single point, as per the previous section. A system may be diffraction-limited on-axis, but
1083
aberration-limited off-axis. It is also possible for a system to be diffraction-limited off-axis. The basic tool for deciding whether the system has the desired image quality is finite ray-tracing. A finite ray is any ray that does not stay infinitesimally close to the axis. The only two exceptions to this definition are the two Gaussian rays, the PMR and the PPR, which are traced according to the laws of paraxial optics, even though they form finite angles with the axis. However, the ray paths of the PMR and the PPR are fictitious; they are close to the paths of real rays only in the object and image spaces, assuming that the aberrations are not large and that the correct convention is chosen [Eqs. (24), (25)]. Also, there is no point in tracing more Gaussian rays. Gaussian optics assumes perfect image formation, so no additional information would be obtained. By contrast, because finite rays all cross the image plane at different locations, in general, many finite rays are needed to determine the ray intersection pattern at the image plane. Finite ray-tracing equations are considerably more involved than paraxial ones and are not given here. They may be found in several references, for example (2,9,14). They also form the basis of the many optical design programs available for personal computers. It is worth noting that those programs have evolved to the point that even a nonexpert can use them successfully to evaluate a given system, although design itself requires the appropriate expertise. Aberration may be characterized either through the rays or through the wave front. Ray aberration is, in general, the distance between a finite ray intersection with a desired image plane and the corresponding intersection of a Gaussian ray. Wave-front aberration is the optical path difference between the wave front and a chosen reference sphere, measured along the ray. Optical path difference is defined as the geometric distance times the refractive index of the medium. Qualitatively, the wavefront aberration measures the departure of the wave front from sphericity. Because rays are normal to wave fronts, ray and wavefront aberrations are related: The former is the derivative of the latter with respect to the exit pupil coordinates. Specifically, Fig. 17 shows a schematic of an optical system in which a ray proceeds from axial object point O to the image space, passing through the exit pupil located at E . An exaggeratedly nonspherical wave front is also shown. Let the exit pupil have a height he , and let the radius of the reference sphere be R , equal to the distance
B′o B′
Bo
h ′e
O E
E′ Ref. sphere
Wave front
R′
R Entrance pupil
Exit pupil
Figure 17. Wave front and ray aberrations.
O′ dh′ Q′
1084
OPTICAL IMAGE FORMATION
between exit pupil and Gaussian image (E O ). Let the refractive index of the image-space medium be n . A real ray is perpendicular to the wave front at B and cuts the reference sphere at B0 . The path (B0 O ) = R represents the path of a Gaussian ray that crosses the axis at the paraxial image point O . The wave-front aberration W in this figure is given by W = n (B0 B ).
(37)
Finally, let the vertical axis be y at the pupil and η at the image. The ray aberration is δη = (O Q ). All of these quantities are related through δη = −
R ∂W . n he ∂y
(38)
A similar equation can be written for the x direction. In this expression, R , he , W, and δη have units of length, and y is dimensionless, normalized with respect to he . Proof for this expression can be found in (2,9,15). By choosing to represent the wave-front aberration at the exit pupil, we can conveniently fold the effects of diffraction into the description because diffraction effects will be dominated by the limiting aperture of the system. In the image space, the limiting aperture is the exit pupil. The wave-front aberration is the preferred way of describing aberrations and provides a lucid and understandable description, although, traditionally, optical designers have favored the more esoteric ray aberration. It is worth reviewing the traditional aberration types here because they are often encountered in imaging systems even outside of optics, for example, in imaging by an electron beam. The wave theory of aberrations was put in a consistent theoretical framework by H. H. Hopkins, in his out-ofprint 1950 classic by the same title (15). Welford (9) gave an alternative but similar treatment, and Mouroulis and Macdonald (2) provided a treatment of the introductory aspects of the problem following Hopkins’s notation. The book by Mahajan (16) also contains derivations of the wave aberration of several common optical systems, as well as a complete theoretical treatment. For a single point object, the wave-front aberration can be described as the difference in shape between the actual (generally nonspherical) wave front in the image space and a corresponding reference sphere that represents an ideal wave front. The wave-front aberration can then be represented as a polynomial with respect to the pupil coordinates. However, every point in the object (e.g., an on-axis point and an off-axis point) will generate a different wave-front shape. In other words, the wave-front aberration is, in general, field-dependent. For this reason, we represent the wave-front aberration as a polynomial that contains both aperture and field terms. However, the inclusion of field terms in the wave-front aberration means that it is possible to have field-dependent aberrations where the corresponding wave-front shape is spherical but the location of the image departs from the paraxial prediction. This is explained further later.
Considerations of symmetry dictate that some combinations of terms can be excluded from the polynomial. The types of terms depend on the particular form of the symmetry. For example, a cylindrical lens has a set of aberrational terms different from those of a spherical lens. We restrict our attention to the common case of rotationally symmetrical systems. In that case, the wave-front aberration polynomial can be written as

W = W040 r⁴ + W131 τ r³ cos θ + W222 τ² r² cos² θ + W220 τ² r² + W311 τ³ r cos θ + (higher order terms)    (39)
where r is the pupil radial coordinate, τ the field radial coordinate, and θ the azimuth. The coefficients Wijk represent the various aberrational terms and have units of length, and the radial coordinates are normalized with respect to their maximum values. In this notation, the subscripts are chosen to recall the exponents of the variables. Equations (38) and (39) are related through y = r cos θ. The magnitude of W is typically less than one wavelength for systems of high image quality and not more than a few wavelengths for most well-corrected systems. There are, however, cases where high image quality is not a concern, for example, lenses used with array detectors that possess coarse pixel size, in which case the aberration can be many wavelengths.

The five terms in Eq. (39) are the so-called primary aberrations, also known as third-order or Seidel aberrations. Specifically, W040 r⁴ is called spherical aberration. This is a radially symmetrical term that has no field (τ) dependence. The departure of the corresponding wave front from the spherical shape is shown in Fig. 18. Such a departure from sphericity gives a variety of possible irradiance distributions that depend on the amount of aberration and the exact location of the focus. These images are best appreciated when compared with the image of a nonaberrated point. For this reason, we show them in the next section.

W131 τ r³ cos θ is called coma. The departure of the comatic wave front from sphericity is shown in Fig. 19. It can be seen that coma does not impart an overall curvature to the wave front and thus does not change the position of best focus along the axis. It does, however, produce an overall tilt to the wave front, which results in a shift of the best image point from the Gaussian prediction.
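To make the notation of Eq. (39) concrete, the short sketch below evaluates the primary-aberration polynomial on a normalized pupil grid for a chosen field position. The grid density and the coefficient values are arbitrary illustrative choices, not taken from any particular system.

```python
import numpy as np

def primary_wavefront(r, theta, tau, W040, W131, W222, W220, W311):
    """Primary (Seidel) wave-front aberration of Eq. (39), in the same
    length units as the coefficients (typically wavelengths).
    r: normalized pupil radius (0..1), theta: pupil azimuth,
    tau: normalized field radius (0..1)."""
    return (W040 * r**4
            + W131 * tau * r**3 * np.cos(theta)
            + W222 * tau**2 * r**2 * np.cos(theta)**2
            + W220 * tau**2 * r**2
            + W311 * tau**3 * r * np.cos(theta))

# Example: a grid over the unit pupil at 70% of the full field,
# with illustrative coefficients of a fraction of a wavelength each.
r, theta = np.meshgrid(np.linspace(0, 1, 101), np.linspace(0, 2*np.pi, 181))
W = primary_wavefront(r, theta, tau=0.7,
                      W040=0.25, W131=0.15, W222=0.10, W220=0.05, W311=0.20)
print("peak-to-valley aberration (waves):", round(W.max() - W.min(), 3))
```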
Figure 18. Wave-front shape for a system suffering from spherical aberration. The corresponding reference sphere is flat. The departure from the reference has a fourth power dependence on the radius. Notice that spherical aberration adds an overall curvature to the wave front.
Figure 19. Wave-front shape for a comatic wave front (flat reference).
Figure 22. Image of a square object (left) through a system suffering from distortion.
W222 τ² r² cos² θ is called astigmatism. This aberration imparts a cylindrical error to the wave front, as shown in Fig. 20. Again, the corresponding point images are shown in the next section. For the last two figures, we note that the field dependence is not particularly relevant; it merely determines the amount of aberration but does not change the basic wave-front shape. However, for the two remaining aberrations following, the wave-front shape is actually spherical, and only the field dependence is of importance.

W220 τ² r² is called field curvature. Although the wave front is spherical, the plane of best focus shifts as field angle increases. The result is that the best image is not obtained on a plane but on a curved surface. If it were possible to curve the receiving (detector) shape to conform to the image surface, a stigmatic (point-like) image would be obtained throughout. However, usually the detector surface is flat, and in that sense field curvature is as significant an aberration as the first three. Figure 21 shows the effect of field curvature.

W311 τ³ r cos θ is called distortion. Again, this represents a spherical wave-front shape. Distortion represents a change in magnification across the field. In negative or pincushion distortion, the magnification increases with field, and in positive (or barrel) distortion, the magnification decreases with field. The corresponding image of a square is shown in Fig. 22.

Image imperfections are not exhausted by these five types of aberrations. Higher order aberrations exist, as do
other types in systems of different symmetry. However, these primary aberrations are important because they describe the most fundamental and commonly encountered types of image degradation. They are also very easy to compute: for rotationally symmetrical systems, it can be shown (2,9) that the computation of these primary aberrations involves only paraxial ray tracing of the two Gaussian rays, the PMR and PPR previously discussed.

IMAGE FORMATION IN THE SPATIAL DOMAIN (CONVOLUTION): THE POINT-SPREAD FUNCTION

The Convolution Integral

Intuitively, we can consider the image of an extended object as the summation of the images of all points that comprise the object. Let G(u, v) be a function that represents the image of a point, called the point-spread function (PSF). An example of such a function is shown in Fig. 16 for a system that has no aberration. More examples will be shown later. Let B(u, v) be the irradiance distribution in the object. The summation of the point images [where peak irradiances are weighted by B(u, v)] is expressed mathematically as a convolution integral (see also Appendix):

B′(u′, v′) = ∫∫_{−∞}^{+∞} B(u, v)G(u′ − u, v′ − v) du dv.    (40)
Figure 20. Wave-front shape for an astigmatic wave front (flat reference).
Figure 21. Field curvature: The best (stigmatic) image is obtained on a curved surface.
The intuitive appeal of this way of describing image formation hides two critical assumptions in Eq. (40). The act of summing (or integrating) implies that the total irradiance is obtained by summing the irradiances of individual points. This is the linearity assumption, and it is satisfied if the system is illuminated incoherently. This corresponds to the everyday experience of adding the irradiances produced on a receiving screen by two different lamps. However, with coherent illumination, interference effects cause the total irradiance to differ from the sum of the individual irradiances. For our purposes here, we may define incoherent illumination as follows: The phase difference of the EM field emitted from two different source (object) points varies randomly and rapidly relative to the integration time of the photodetector. If this condition holds, interference effects from the two source points are averaged out over time and thus are not observed. The second assumption is that the function G(u′ − u, v′ − v) does not change form as it shifts across the image, in other words, that it is the same function for every u′, v′. This is the assumption of stationarity. To
connect this discussion with the previous section, notice that if the system suffers from any aberration other than spherical (in other words, any field-dependent aberration), it is strictly speaking nonstationary. However, we can apply Eq. (40) even in such a case, provided we restrict the region of integration over an area where the variation of the PSF is negligibly small and provided such area is large compared with the extent of the PSF. Thus, we can consider that an extended image comprises a set of patches across which the PSF is constant. For the vast majority of whole-field imaging optical systems, the assumption of incoherence is justified. However, there are cases where the system is coherently illuminated. Even in that case, image formation can still be described as a convolution, provided that we do not sum the irradiances but the complex amplitudes (amplitude and phase) of the field. In other words, for coherent illumination, the PSF is the complex amplitude (rather than irradiance) distribution in the image of a point. If F(u, v) is the coherent PSF and A(u, v) is the complex amplitude distribution in the object, then coherent image formation can be expressed as

A′(u′, v′) = ∫∫_{−∞}^{+∞} A(u, v)F(u′ − u, v′ − v) du dv,    (41)

where A′ describes the complex amplitude distribution in the image. In coherent illumination, the stationarity requirement is more involved, and we postpone a more complete discussion until later.

In summary, Eqs. (40) and (41) show that the problem of image formation is reduced to the problem of determining the image of a single point. The next section shows how the image of a point may be determined. Since the image of a point has a certain minimum spread determined by diffraction, even in the absence of aberrations, it follows that the image of an extended object is bound to be an inaccurate, or blurred, representation of the object. This means that some details present in the object will be lost. Thus convolution can be thought of as a blurring operation.

The Relationship between the PSF and the Pupil Function

If we know the complex amplitude distribution over the exit pupil, we can determine the form of the PSF. Referring back to Fig. 17, consider the exit pupil of an optical system at E′, and let a reference sphere be centered at point O′. This point may be the Gaussian image point or any nearby convenient point, for example, one that minimizes the difference between the reference sphere and the wave front. The wave front shown is generally nonspherical, although the difference is greatly exaggerated for clarity. The reference sphere represents a perfect wave front. Let f(x, y) be the complex amplitude distribution at the reference sphere and s(x, y) be the real amplitude distribution (representing the variation of transparency or transmittance). The wave-front aberration W is the difference in the optical path length between the reference sphere and the wave front, so the phase difference between the two is 2πW/λ ≡ kW, where k is the propagation constant. Therefore,

f(x, y) = s(x, y) exp[ikW(x, y)].    (42)

In practice, the procedure for determining the PSF is as follows. First, the wave-front aberration is determined by finite ray tracing up to the exit pupil. This establishes W(x, y) for a grid of sampling points. Here, a 2-D polynomial may be interpolated if desired. Information about the variation in system transmittance (technically called apodization) must also be supplied, although often it is assumed that the system has uniform transmittance. Thus f(x, y) is established. At this point, the effects of diffraction are folded in by considering the propagation between the exit pupil and the image plane as a diffraction process. This process is described in some detail in the books by Goodman (17) and Gaskill (18). Here we give the result, which is that apart from a constant factor, the complex amplitude distribution in the image of a point is the Fourier transform of the pupil function (see Appendix for an introduction to the Fourier transform):

F(u, v) = const ∫∫_A f(x, y) exp[i2π(ux + vy)] dx dy    (43)

The integration extends over the aperture or exit pupil area A because f(x, y) is zero outside the pupil area. In this expression, the pupil coordinates x and y are assumed to be normalized with respect to the radius of the exit pupil, and the normalized image coordinates u and v are related to the real coordinates ξ and η through

u = (n sin a)ξ/λ  and  v = (n sin a)η/λ,    (44)
where (n sin a) is the numerical aperture in the image space. The validity of Eq. (43) assumes that the conditions for scalar diffraction apply, namely, that the angles involved are not large. In practice, vector diffraction effects are small, even for a numerical aperture of 0.5 or so. It also assumes that the field angle is not too large, otherwise the coordinate normalization scheme must be redefined (19). The complex amplitude distribution F is the PSF for a coherently illuminated system. For an incoherent system, the PSF is

G(u, v) = |F(u, v)|².    (45)

Coherence has two aspects, spatial and temporal. Spatial coherence is defined as the degree of correlation between the EM fields emitted by two different source points. Temporal coherence describes the degree of monochromaticity of the light (or mathematically, the autocorrelation of the waveform emitted by the source). It is worth noting that even the incoherent PSF given before applies for a single wavelength, implying temporally coherent but spatially incoherent light.
Figure 23. Shape of the pupil function for a system that has no aberration and a circular pupil.
Quasi-monochromatic light (narrow wavelength band) provides a good approximation to the incoherent PSF.
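As a concrete illustration of Eqs. (42), (43), and (45), the following sketch computes the incoherent PSF of an unaberrated circular pupil numerically by embedding a sampled pupil in a zero-padded array and taking a fast Fourier transform. The grid size and padding factor are arbitrary choices made only for illustration.

```python
import numpy as np

N, pad = 128, 8                                  # pupil samples and zero-padding factor (illustrative)
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
pupil = ((X**2 + Y**2) <= 1.0).astype(float)     # s(x, y): unit transmittance inside, zero outside

W = np.zeros_like(pupil)                         # wave-front aberration in waves (none here)
f = pupil * np.exp(1j * 2*np.pi * W)             # pupil function of Eq. (42), with W expressed in waves

# Eq. (43): the complex amplitude in the image of a point is, up to a constant,
# the Fourier transform of the pupil function; padding refines the image sampling.
F = np.fft.fftshift(np.fft.fft2(f, s=(pad*N, pad*N)))
G = np.abs(F)**2                                 # Eq. (45): incoherent PSF (an Airy pattern here)
G /= G.sum()                                     # normalize to unit total energy

# Energy encircled within the first dark ring, for comparison with the
# figure quoted in the text below.
c = pad*N // 2
profile = G[c, c:]
d = np.diff(profile)
first_zero = np.where((d[:-1] < 0) & (d[1:] >= 0))[0][0] + 1   # first local minimum
r_pix = np.hypot(*np.meshgrid(np.arange(pad*N) - c, np.arange(pad*N) - c))
print("encircled energy up to the first zero: %.3f" % G[r_pix <= first_zero].sum())
```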
The Diffraction-Limited PSF

For a system that has a circular pupil and no aberrations, the pupil function is simply a 'top-hat' function, unity inside the pupil and zero outside, as shown in Fig. 23. The incoherent PSF resulting from that function has already been shown in Fig. 16. The pattern has regular zeros and rapidly diminishing maxima. The peak irradiance of the first ring is only 1.9% of the maximum, and 83.8% of the total energy is contained (encircled) in the area up to the first zero.

Incoherent Two-Point Resolution Limit. The radius of the first zero (Airy disk) is given by Eq. (35) for a focusing system. This can also be given in angular terms as 1.22λ/d, where d is the exit pupil diameter, which is valid for afocal systems as well. When two nearby points are separated by a distance equal to the radius of the Airy disk, they are said to be just resolved, and the radius of the Airy disk is considered the incoherent, two-point resolution limit or Rayleigh resolution limit (see also later for other kinds of resolution limits). The irradiance recorded by the photodetector is shown in Fig. 24 in cross section. Resolution of two points implies that the central dip is detectable. However, a low-noise photodetector can reliably detect a smaller dip than that shown in Fig. 24. For this reason, the resolution limit of Eq. (35) should be seen as a convention that does not reflect any deeper physics. For example, a more rational definition of the resolution limit could have been the distance that makes the central dip just vanish. This is referred to as the Sparrow resolution limit (20).

Figure 24. Irradiance distribution in the image of two incoherent points at the Rayleigh limit of resolution (diamonds). The individual irradiances of the two points are also shown in thinner lines.

The Coherent PSF

The coherent PSF that corresponds to the pupil function of Fig. 23 is shown in Fig. 25. The coherent PSF is normally a complex function and thus can be represented only in two separate graphs (amplitude and phase, or real and imaginary part). However, in this case, the phase of the function alternates between zero and π, so it can be represented as a real function. A photodetector, such as the human eye, will record only irradiance and thus will not perceive directly any difference between the functions of Figs. 16 and 25 (the former is simply the squared modulus of the latter). However, two coherent point images in close proximity give rise to a pattern much different from that of Fig. 24, due to the fact that the light from the two point images will interfere constructively or destructively, depending on the relative phase. The pattern will also depend on whether the two object points are cophasally illuminated. Figure 26 shows an example of the amplitude and irradiance distribution in the combined image of two coherent, nearby points, where the distance between the points is the same as that of Fig. 24. It can be seen that when the two object points are cophasally illuminated, they are not resolved, but when they are in antiphase, they are fully resolved. This example shows the complications of coherent imaging.

Figure 25. Coherent PSF of a system that has a circular pupil and no aberrations.
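The two-point behavior just described can be reproduced with a one-line model of the Airy amplitude pattern. The sketch below is a rough illustration rather than a rigorous simulation: the scaling of the coordinate v is an arbitrary convention chosen so that the first zero falls at 1.22π, and scipy is assumed to be available for the Bessel function.

```python
import numpy as np
from scipy.special import j1

def airy_amplitude(v):
    """Normalized amplitude in the image of a point for a circular pupil;
    the first zero falls at v = 1.22*pi in this scaling."""
    v = np.where(v == 0, 1e-12, v)
    return 2 * j1(v) / v

sep = 1.22 * np.pi                  # Rayleigh separation (one Airy radius)
v = np.linspace(-10, 10, 2001)
a1, a2 = airy_amplitude(v - sep/2), airy_amplitude(v + sep/2)

incoherent = a1**2 + a2**2          # add irradiances
coh_inphase = np.abs(a1 + a2)**2    # add complex amplitudes, zero phase difference
coh_antiphase = np.abs(a1 - a2)**2  # amplitudes pi out of phase

mid = len(v) // 2
print("central value / peak, incoherent       : %.2f" % (incoherent[mid] / incoherent.max()))
print("central value / peak, coherent in-phase: %.2f" % (coh_inphase[mid] / coh_inphase.max()))
print("central value, coherent antiphase      : %.2f" % coh_antiphase[mid])
```

The incoherent sum shows the familiar central dip, the in-phase coherent sum shows no dip at all (the points are not resolved), and the antiphase sum is exactly zero midway between the points, in line with Fig. 26.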
Figure 26. (a) Irradiance cross section plot in the image of two coherent and cophasal points at the Rayleigh distance. The two points are not resolved. (b) The same plot when the points are π out of phase. See color insert.

Figure 27. Effect of defocus on the incoherent PSF. The Airy disk is omitted for clarity, but it would have a peak of unity on the same scale. Squares: W020 = 0.25λ, triangles: W020 = 0.5λ, diamonds: W020 = 1λ. In the latter case, the central value of the pattern is zero. The total energy (integrated over x and y) is constant in all three patterns.
The Effect of Defocus. It is interesting to see how defocus modifies the pattern of Fig. 16. In this sense, defocus can be considered an aberration of the form W020 r², following the aberration notation in ''Limitations due to Nonparaxial Geometry.'' This has the effect of simply changing the radius of the reference sphere (in other words, the perturbed wave front differs from the ideal by a spherical term that can be eliminated through refocusing). Figure 27 shows the effect of defocus using W020 as a parameter.
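Continuing the numerical sketch given earlier for the unaberrated pupil, defocus can be added as a phase term W020 r² before the Fourier transform. The snippet below repeats the same illustrative grid choices so that it runs on its own and reports how the central irradiance drops as W020 grows, for the three coefficient values used in Fig. 27.

```python
import numpy as np

N, pad = 128, 8
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2
pupil = (r2 <= 1.0).astype(float)

def incoherent_psf(W_waves):
    """Incoherent PSF for a circular pupil with wave-front aberration W (in waves)."""
    f = pupil * np.exp(1j * 2*np.pi * W_waves)
    F = np.fft.fftshift(np.fft.fft2(f, s=(pad*N, pad*N)))
    return np.abs(F)**2

G0 = incoherent_psf(np.zeros_like(pupil))        # aberration-free reference (Airy pattern)
for W020 in (0.25, 0.5, 1.0):
    G = incoherent_psf(W020 * r2 * pupil)        # defocus term W020 * r^2 over the pupil
    print("W020 = %.2f waves: central irradiance / Airy peak = %.2f"
          % (W020, G[pad*N//2, pad*N//2] / G0.max()))
```

The central value drops to roughly 0.8 for a quarter wave, to about 0.4 for half a wave, and to essentially zero for one full wave of defocus, matching the behavior shown in Fig. 27.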
Apodization

Because there is no phase error (aberration), the diffraction-limited PSF is determined by the shape of the pupil and the real amplitude term of the pupil function. A circular pupil is commonly assumed, but it should be appreciated that even in the absence of aberrations, the pupil shape becomes elliptical or irregular at significant off-axis angles. The effect of pupil shape is evident at rather large field angles of some tens of degrees or when the system suffers from vignetting, where rays from the extreme parts of the aperture are (sometimes deliberately) cut off. An example of a nonuniform real amplitude term (apodization) is given by astronomical telescopes, in which the secondary mirror often obscures the central part of the aperture. Figure 28 gives the form of the pupil function for a central obscuration of 12% of area. The corresponding PSF is in Fig. 29. It can be seen that the effect of the obscuration is to increase the energy in the rings at the expense of the central maximum. Various cases of nonuniform pupil shapes have been described by Mahajan (21).

The PSF in the Presence of Aberrations

Here, a brief, visual representation of the PSF in the presence of aberrations is given. Each aberrational term is a phase term, and the corresponding PSF is the squared modulus of the Fourier transform of the corresponding pupil function, s(x, y) exp[ikW(x, y)], where s(x, y) is the same as in Fig. 23, and W(x, y) is the corresponding aberrational term, as given in ''Limitations due to Nonparaxial Geometry.'' The original work on this topic was done by Nijboer (22–24). Excellent pictorial representations of PSFs can be found in (25), and in the more easily accessible (10).

Spherical Aberration. The PSF in the presence of spherical aberration varies in a complicated way as
Figure 28. Form of the pupil function for a system that has no aberration and a centrally obscured pupil.
Figure 29. Comparison of the incoherent PSFs of an apodized (squares) and an unapodized system (diamonds). The latter PSF is the same as in Fig. 16. The two PSFs are normalized to the same amount of energy.
a function of defocus. Thus, in fact, the aberration function is taken as W040 r⁴ + W020 r², where the second term represents defocus (a change in the radius of the reference sphere). This is a rotationally symmetrical aberration, so the PSFs are also rotationally symmetrical. Figure 30 shows the PSF section for one wave of aberration (W040 = 1λ) and at three different foci as noted, and Fig. 31 shows the PSF section for two waves of aberration.

Coma. Coma is an asymmetrical aberration that does not interact with defocus in the sense that the best image in the presence of coma is at the paraxial focal plane. However, the location of the PSF centroid is shifted
Figure 31. Plot of the incoherent PSF for two waves of spherical aberration and various positions along the axis: Diamonds: best diffraction focus, squares: best geometrical spot size; triangles: paraxial focus. Notice the difference in scale between this and the previous figure. The Airy disk is unity at the peak in both cases.
laterally with respect to the paraxial prediction. Figure 32 shows the PSF for one and two waves of coma.

Astigmatism. Astigmatism is a cylindrical deformation of the wave front that makes the PSF look like a short line. As we move through focus, we encounter two such focal lines, the sagittal and the tangential, perpendicular to each other. Halfway between them, the circle of least confusion is formed, which (although it does not resemble a circle) represents the minimum spread of the PSF and thus is taken as the best focus. Figure 33 shows the astigmatic line and the circle of least confusion for one astigmatic wave.

Image Quality Criteria Based on the PSF

Traditionally, the notion of image quality applies to incoherently illuminated systems, which represent the
Figure 30. Plot of the incoherent PSF in the presence of one wave of spherical aberration and at various positions along the axis. Diamonds: best diffraction focus, triangles: minimum geometrical spot size; squares: paraxial focus. The pattern that corresponds to the minimum geometrical spot size is called traditionally the ‘‘circle of least confusion,’’ although the name is not justifiable from the diffraction viewpoint.
Figure 32. (a) Irradiance distribution in the PSF in the presence of one wave of coma. The peak value of the pattern is about 0.53 (relative to an Airy disk of unity peak). (b) Two waves of coma.
Table 2. Strehl Tolerances for Primary Aberrations

Aberration     Symbol   Value for I = 0.8 (in waves)   Comment
Defocus        W020     0.25
Spherical      W040     0.95                           At best focus (W020 = −W040)
Coma           W131     0.62
Astigmatism    W222     0.36                           At best focus (W020 = −W222/2)
Figure 33. (a) Irradiance distribution in the PSF of a system suffering from one wave of astigmatism, at one of the two astigmatic foci (sagittal or tangential), where the image resembles a line. (b) A symmetrical image is obtained between the sagittal and tangential foci.
vast majority of image forming systems. It would be possible to devise modifications for coherent systems, but no particular need for doing so has arisen.

The Strehl Ratio. The Strehl ratio I is the ratio of the peak value of the PSF to the peak value of the PSF of an unaberrated system that has the same aperture (26). This definition is valid when the PSF has a well-defined, dominant peak, a condition that does not necessarily hold in the presence of a substantial amount of aberration (e.g., Fig. 33). Thus the Strehl ratio is primarily useful for well-corrected systems. The Strehl ratio can be read from Figs. 27, 29, and 30 as the maximum PSF value. An alternative definition of the Strehl ratio is the PSF value at the centroid. This has the advantage of retaining validity for any aberrational condition, although the scale of a PSF plot is best appreciated from its peak value. It can be shown that the two definitions lead to the same result under the appropriate conditions, which are small aberrations balanced by the appropriate amount of longitudinal or transverse focal shift.

A system that has a Strehl ratio I ≥ 0.8 is considered practically diffraction-limited. If the system is free of aberration but is somewhat out of focus, the value of I = 0.8 corresponds to the Rayleigh quarter-wave criterion, according to which a wave-front error of less than a quarter of a wavelength is negligible. In other words, a wave-front focus error of W020 = λ/4 results in a PSF where I = 0.8 (see also Fig. 27). For other aberrations, however, the permissible amount differs from a quarter wave. These values are given in Table 2 for the three primary aberrations [justification may be found in (27)].
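A quick numerical check of these tolerances can reuse the FFT-based PSF sketch given earlier. The snippet below, with the same illustrative grid repeated so it stands alone, estimates the Strehl ratio for a quarter wave of defocus and for 0.95 waves of spherical aberration balanced by defocus, two of the cases listed in Table 2.

```python
import numpy as np

N, pad = 128, 8
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
r2 = X**2 + Y**2
pupil = (r2 <= 1.0).astype(float)

def strehl(W_waves):
    """Peak of the aberrated PSF divided by the peak of the unaberrated PSF."""
    def peak(W):
        F = np.fft.fft2(pupil * np.exp(1j * 2*np.pi * W), s=(pad*N, pad*N))
        return (np.abs(F)**2).max()
    return peak(W_waves) / peak(np.zeros_like(pupil))

print("defocus, W020 = 0.25 waves:            I = %.2f" % strehl(0.25 * r2 * pupil))
# Spherical W040 r^4 balanced by defocus W020 = -W040 (the 'best focus' of Table 2)
print("spherical 0.95 waves, balanced focus:  I = %.2f" %
      strehl((0.95 * r2**2 - 0.95 * r2) * pupil))
```

Both cases come out close to the quoted tolerance value of I = 0.8.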
Related to the Strehl ratio is another often used quantity, the wave-front variance, which is also (inappropriately but commonly) called rms (root-mean-square) aberration. For any one field position, the wave-front variance is given by

E = ⟨W²⟩ − ⟨W⟩²,    (46)

where the average refers to integration across the pupil area. Maréchal examined in detail the relationship
between the Strehl ratio and the wave-front variance (28). For values of I > ∼0.5, there is a one-to-one relationship between the wave-front variance and the Strehl ratio (29). However, for lower values, where the PSF may not have a well-defined single maximum, its peak value is no longer relevant. In such a case, the energy spread may be characterized by the encircled energy, as in the next section. The wave-front variance remains a useful way of specifying the departure of the wave front from sphericity, especially because it can be readily measured by interferometric techniques. However, it is not necessarily a good way of specifying image quality for all applications. For example, if the final detector is the human eye, it is known that the wave-front variance is poorly correlated to subjective image quality (30).

Encircled Energy. There is no single criterion akin to the Strehl ratio for encircled energy. Although such criteria could be devised (e.g., the radius encircling a fixed amount of energy or the amount of energy inside a fixed radius), no particular need for doing so has arisen. Encircled energy becomes important in the presence of a photodetector of limited area or a photodetector array such as a CCD. Then, it is important to know how much of the PSF energy is contained in the pixel. Because pixels are typically square, we may also speak of ''ensquared'' energy. Figure 34 compares the encircled energy between the PSFs of Fig. 29 (diffraction-limited system with and without 12% central obscuration). Various cases have been examined by Mahajan (31).

Line- and Edge-Spread Functions

Line-Spread Function. The line-spread function (LSF) is the irradiance (incoherent) or the complex amplitude (coherent) distribution in the image of an ideal line object. Image formation can be described in terms of the LSF for the orientation perpendicular to the line. In other words, we can write the irradiance distribution in any direction in the image as a convolution of the irradiance distribution in the object (in the same direction) and the LSF. The LSF has two advantages: it reduces all two-dimensional integrals to one-dimensional ones, and it is generally easier to measure the irradiance distribution in the image of a line than in the image of a point. The LSF is obtained from the PSF by simply integrating in the
Figure 34. Encircled energy comparison between an obscured (squares) and unobscured (triangles) system, corresponding to the PSFs of Fig. 29.
direction of the line:

LSF ≡ G(u) = ∫_{−∞}^{+∞} G(u, v) dv    (47)
In the incoherent case and for a system that has no aberrations and a uniformly illuminated circular aperture, the LSF is given by integrating Eq. (34), expressed in rectangular coordinates. The form of the function is given in Fig. 35. The behavior of the LSF in the presence of aberrations can also be readily predicted numerically. Aberrational tolerance theory based on the LSF has been developed (32).
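Equation (47) translates directly into code: sum the sampled PSF along the direction of the line. The fragment below reuses the FFT-generated diffraction-limited PSF from the earlier sketches; the grid parameters are again arbitrary, and the half-widths printed are only meant to show that the LSF is broader than the corresponding PSF cross section, as in Fig. 35.

```python
import numpy as np

N, pad = 128, 8
x = np.linspace(-1, 1, N)
X, Y = np.meshgrid(x, x)
pupil = ((X**2 + Y**2) <= 1.0).astype(float)

F = np.fft.fftshift(np.fft.fft2(pupil, s=(pad*N, pad*N)))
psf = np.abs(F)**2                      # incoherent, diffraction-limited PSF

lsf = psf.sum(axis=0)                   # Eq. (47): integrate along the line direction (v)
lsf /= lsf.max()
psf_cut = psf[pad*N//2, :] / psf.max()  # central cross section of the PSF for comparison

print("full width at half maximum (samples):")
print("  PSF cross section:", int(np.count_nonzero(psf_cut > 0.5)))
print("  LSF              :", int(np.count_nonzero(lsf > 0.5)))
```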
Figure 35. Incoherent LSF (diamonds) for a diffraction-limited system that has a circular pupil, compared with the corresponding PSF (squares).
If the line object is coherently and cophasally illuminated, the complex amplitude distribution in the image is given simply by

A(u) = (const) sin(2πu)/(πu)    (48)

and the corresponding irradiance distribution by the square of this function. The two functions are shown in Fig. 36. The irradiance distribution in the coherent LSF has zeros, unlike the incoherent case. The location of the first zero is given by u = 0.5, or, for nonnormalized coordinates [Eq. (44)],

ξ = 0.5λ/(n sin a)    (49)

Figure 36. Amplitude (diamonds) and irradiance (squares) distribution in the image of the coherent LSF of a system that has no aberrations and a circular pupil.

Edge-Spread Function. The edge-spread function (ESF) is the irradiance (incoherent) or complex amplitude (coherent) distribution in the image of an edge. In the incoherent case, there is no closed-form expression even for a diffraction-limited system. The ESF is obtained by convolving the LSF with a step function that represents the edge. Alternatively, we may express the LSF [G(u)] as the derivative of the ESF [E(u)]:

G(u) = dE(u)/du.    (50)

This formula is more useful in practice because the most common use of the ESF is to determine the LSF and, through it, the modulation transfer function (MTF) (see next section). The incoherent ESF is a monotonic function for any combination of aberrations. It is shown in Fig. 37. For the coherent case in the absence of aberrations, the complex amplitude distribution in the image of an edge is given by

A(u) = 0.5 + (1/π) Si(2πu)    (51)

(assuming cophasal illumination), where Si(z) = ∫_0^z (sin x/x) dx, the so-called sine integral. The
corresponding irradiance pattern, B(u) = |A(u)|², is shown in Fig. 38. It can be seen that the coherent image of an edge shows ringing, unlike the incoherent ESF. The maximum value of the ESF is 1.18 (if the geometric edge is unity). This maximum value occurs at u = 0.5, or ξ = 0.5λ/(n sin a).

Figure 37. Incoherent ESF (squares) for a system that has no aberration and a circular pupil, compared with the corresponding LSF (diamonds).

Figure 38. Amplitude (diamonds) and irradiance (squares) distribution in the coherent image of an edge for a system that has no aberrations and a circular pupil.

IMAGE FORMATION IN THE FREQUENCY DOMAIN (FILTERING). THE OPTICAL TRANSFER FUNCTION

The necessary mathematical background for this section is summarized in the Appendix. A reader unfamiliar with the basics of Fourier transforms will need to consult the Appendix before proceeding further.

The Image as a Fourier Transform

Any periodic function f(x) that satisfies certain simple conditions (see Appendix) can be written as a summation of sinusoidal functions whose frequencies are equal to integral multiples of the frequency of the original function. Thus an image (or irradiance distribution) can be analyzed in terms of its harmonic (Fourier) components, or the components can be added (synthesized) to reconstruct the image. Fourier synthesis is illustrated by an example in Fig. 39, where we show a periodic function (sawtooth wave) and the function that results from summing the first four harmonic components (whose periods are p, p/2, p/3, p/4). A remarkably good representation of the function is obtained, even with only four harmonic components. The (ir)radiance distribution that corresponds to the sawtooth function of Fig. 39 is shown in Fig. 40. A sinusoidal irradiance distribution, whose period is the same as the sawtooth, is shown in Fig. 41. It is the addition of sinusoidal patterns (or gratings) similar to this that can be used to synthesize the distribution of Fig. 40, but there is a subtlety that must be appreciated.

Figure 39. A sawtooth function whose slope is −1 and period 1, and its harmonic approximation, limited to the first four Fourier coefficients: f(x) ≅ 1/2 + (1/π) Σ_{n=1}^{4} (1/n) sin(2πnx).

Figure 40. A sawtooth irradiance distribution. This function may appear to differ visually from a sawtooth near the discontinuity due to the property of the human eye that exaggerates luminance discontinuities, as well as imperfections in the printing process.
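The harmonic synthesis shown in Fig. 39 is easy to reproduce numerically. The fragment below sums the first four Fourier components of the unit-period sawtooth, the same truncated series written out in the caption of Fig. 39, and reports how far the partial sum strays from the exact function away from the discontinuities; the sample spacing and the size of the excluded region around each jump are arbitrary choices.

```python
import numpy as np

p = 1.0                                   # period of the sawtooth
x = np.linspace(0.0, 3.0, 3001)           # three periods, as in Fig. 39

sawtooth = 1.0 - (x % p) / p              # slope -1, running from 1 down to 0 each period

def partial_sum(x, n_terms):
    """First n_terms of the Fourier series of Eq. (52)."""
    s = 0.5 * np.ones_like(x)
    for n in range(1, n_terms + 1):
        s += (1.0 / (np.pi * n)) * np.sin(2 * np.pi * n * x / p)
    return s

approx = partial_sum(x, 4)
interior = (x % p > 0.05) & (x % p < 0.95)        # stay away from the jumps
print("max error away from the discontinuities: %.3f"
      % np.max(np.abs(approx - sawtooth)[interior]))
```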
Figure 41. An approximately sinusoidal irradiance distribution.

A function that represents a physically plausible (ir)radiance distribution cannot contain negative values. The full Fourier series expansion of the sawtooth function of Fig. 39 is given by [see Eqs. (A1)–(A4)]

f(x) = 1/2 + (1/π) Σ_{n=1}^{∞} (1/n) sin[2π(n/p)x],    (52)

where the coefficients of the cosine terms are all zero. It will be seen that the sinusoidal components contain negative values, unlike the irradiance distribution of Fig. 41. Thus, the harmonic components of an image are not necessarily themselves true images (or physically possible irradiance distributions). In the example of Eq. (52), the addition of the constant term 1/2 makes f(x) overall positive, but this is not the same as adding a series of wholly positive sinusoids. Such a series would be represented as Σ_{n=1}^{∞} [1/n + (1/n) sin(2π(n/p)x)] because the minimum constant that can be added to a sinusoid to make it positive is its own amplitude. This last series would give the same result as Eq. (52) if the series Σ_{n=1}^{∞} (1/n) converged to π/2. However, this series does not converge; hence, the addition of fully positive sinusoids does not reproduce f(x). The fact that there is no physically negative (ir)radiance¹ ultimately does not affect the utility of Fourier analysis in describing an image, nor does it diminish the significance of sinusoidal (ir)radiance distributions as test objects that have special properties.

¹ There are certain special cases, for example, the phenomenon of lateral inhibition in vision or of adjacency in photography, where the response of the recording medium can be described through a point-spread function that has negative irradiance. The physical interpretation of such negative irradiance is that the excitation of a point reduces the response of neighboring points. This exception has no effect on this discussion, and in any case, it is associated with recording media (retina, film) rather than purely optical systems.

Now, we wish to consider the image in two dimensions and extend the description to nonperiodic functions. Before proceeding further, the variables x and y used in the Appendix will be replaced by u and v to keep a consistent notation with the section ''Image Formation in the Spatial Domain,'' where the image coordinates were (u, v) and the pupil coordinates were (x, y). If f(u, v) is an irradiance distribution, it may be written as

f(u, v) = ∫∫_{−∞}^{+∞} F(σ, τ) exp[i2π(σu + τv)] dσ dτ,    (53)

where

F(σ, τ) = ∫∫_{−∞}^{+∞} f(u, v) exp[−i2π(σu + τv)] du dv.    (54)
The last two equations show that we can consider an image as the summation (or integral) of an infinite number of sinusoidal components of various periods (frequencies) and orientations. Fourier analysis provides us with an alternative way of thinking about an image: instead of being concerned about its visible features, we are concerned about its spatial frequency content. This shift of viewpoint proves to be of great utility. For example, one may perform operations that directly affect the spatial frequency spectrum of an image; some examples are given in the following sections.

The Optical Transfer Function

Definitions

The fundamental significance of the sinusoidal distribution in image formation is that it remains sinusoidal when transmitted by an optical system, except for amplitude and phase. This is a cornerstone fact of image formation, and it is an immediate consequence of the linearity property of optical systems. Linearity means that no higher harmonics are generated by the system. When this fact is coupled with the Fourier method of describing an image, the general problem of image formation is reduced to the problem of imaging a sinusoid (or better, a set of sinusoids of various frequencies and orientations), just as in the section ''Image Formation in the Spatial Domain,'' the problem of image formation was reduced to determining the image of a point. And although substituting the image of a point by that of many sinusoids might appear a more laborious way of describing image formation, it ultimately leads to a simpler way of describing the action of an optical system, or, if not simpler, at least complementary and of equal significance.

The concept of the transfer function is perhaps more familiar in audio engineering. Audio amplifiers are designed to have a flat frequency response over the 20 Hz to 20 kHz band. This means that they must transmit all pure tones (sinusoids) across that band in undiminished (or uniformly amplified) amplitude, independent of frequency. Similarly, they are not supposed to introduce any phase shift, and harmonic distortion (extraneous frequencies not present in the original signal) must be kept very low. An incoherently illuminated optical
system can be thought of as an amplifier with a gain ≤ 1 that is inherently incapable of introducing harmonic distortion but may introduce a nonlinear, frequency-dependent phase shift. Let

B(u, v) = a + b cos[2π(σu + τv)]    (55)

be a cosinusoidal radiance distribution (grating) or object. We assume incoherent illumination for now. The period d and the orientation θ of this cosinusoidal grating are given by

d = (σ² + τ²)^(−1/2),  θ = arctan(τ/σ),    (56)

where the first of these two relationships arises simply from Pythagoras' theorem. We also define the contrast or modulation of the grating as the ratio b/a.² Now, apply the convolution Eq. (40) to determine the image of this grating. If G(u′, v′) is the system PSF, the irradiance distribution in the image will be (from Eq. 40)

B′(u′, v′) = ∫∫ [a + b cos(2π(σu + τv))] G(u′ − u, v′ − v) du dv,    (57)

where it is understood that the integration extends across the image area (or the area across which the PSF remains stationary). Performing the integration and after appropriate substitutions of variables that are not detailed here, we obtain the following result:

B′(u′, v′) = const.{a + bM(σ, τ) cos[2π(σu′ + τv′) + φ(σ, τ)]}    (58)

This equation shows that the irradiance distribution is also cosinusoidal and has the same frequency and orientation as the object, but has different contrast (bM/a) and a different phase (φ). Notice that to derive Eq. (58), we only made the assumptions of linearity and stationarity that are inherent in the convolution Eq. (57). No additional assumptions were needed; hence, the fact that a sinusoid is imaged as a sinusoid is inherent in those two assumptions. The reader may object at this point that the image of the sinusoid cannot be another sinusoid of the same period unless the system has unity magnification. This is true of course, but Eqs. (57) and (58) are strictly correct as written; the normalized coordinates we are using [defined through Eq. (44)] actually account for the magnification because they contain the numerical aperture [see also Eqs. (10) and (24)]. Thus, when we say that the image is a sinusoid of the same frequency, we mean the same frequency in the normalized coordinate space. In real coordinate space, the grating period scales according to the magnification. When numerical values for the frequency are quoted in a system that has nonunity magnification, they usually refer to the image space.

² If both a and b are positive and a ≥ b, then the radiance distribution B is physically possible, that is, it contains no negative values. However, the derivation that follows is independent of that assumption.

The functions M and φ in Eq. (58) are the modulus and phase of a single function:

D(σ, τ) = M(σ, τ) exp[iφ(σ, τ)],    (59)

where D(σ, τ) is given (from the derivation of Eq. 58) as

D(σ, τ) = [∫∫ G(u, v) exp[−i2π(σu + τv)] du dv] / [∫∫ G(u, v) du dv],    (60)

where the limits of integration are the same as in Eq. (57). This last equation may explain how G(u, v) seemed to have disappeared in going from Eq. (57) to (58). It was in fact absorbed in D(σ, τ). The function M(σ, τ) is the ratio of the image contrast or modulation (bM/a) to the object contrast (b/a). It is called the modulation transfer function (MTF). The function φ(σ, τ) is the phase shift between image and object. It is called the phase transfer function (PTF). The function D(σ, τ), which has the MTF as its modulus and the PTF as its phase, is called the optical transfer function (OTF). Under the assumptions of linearity and stationarity, the OTF fully characterizes image formation by an optical system. Eq. (60) shows that the OTF is given by the normalized (inverse) Fourier transform of the PSF G(u, v).

The two ways of describing image formation (from the PSF and from the OTF) are intimately related. This may first be appreciated from Eq. (60) that relates the PSF and OTF. The convolution theorem [Eqs. (A25), (A26) in the Appendix] also states that if a function can be written as the convolution of two others, its Fourier transform is given by the product of the Fourier transforms of the two functions. From the convolution Eq. (40) that relates the (ir)radiance distribution in the object and the image through the PSF, we obtain the following relationship:

b′(σ, τ) = b(σ, τ)g(σ, τ),    (61)

where g(σ, τ) is the nonnormalized version of the OTF. This equation states that the Fourier (spatial frequency) spectrum of the image [b′(σ, τ)] is given by the Fourier spectrum of the object, multiplied by the OTF. Thus the OTF acts as a spatial frequency filter on the frequency spectrum of the object.
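Equation (58) is easy to verify numerically: blur a cosinusoidal irradiance grating with any nonnegative, stationary PSF and the result is again a cosine of the same frequency, only with reduced modulation (and possibly a phase shift). The sketch below uses an arbitrary Gaussian blur purely as a convenient stand-in PSF, with made-up values for the grating frequency and contrast.

```python
import numpy as np

u = np.linspace(0, 1, 2048, endpoint=False)        # normalized image coordinate
sigma_f = 8.0                                      # grating frequency, cycles per unit length
a, b = 1.0, 0.8
grating = a + b * np.cos(2 * np.pi * sigma_f * u)  # object of Eq. (55), modulation b/a = 0.8

# Stand-in PSF (Gaussian, unit area); any stationary PSF would do.
x = u - 0.5
psf = np.exp(-0.5 * (x / 0.01)**2)
psf /= psf.sum()

# Convolve via FFT (circular convolution is harmless for a periodic grating).
image = np.real(np.fft.ifft(np.fft.fft(grating) * np.fft.fft(np.fft.ifftshift(psf))))

modulation_in = b / a
modulation_out = (image.max() - image.min()) / (image.max() + image.min())
print("object modulation:", modulation_in)
print("image modulation :", round(modulation_out, 3))
print("MTF at this frequency:", round(modulation_out / modulation_in, 3))
```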
Calculation of the OTF. The steps for calculating the OTF have already been outlined. First, the pupil function f(x, y) must be established, which involves computing (by ray tracing) the wave-front aberration W(x, y) as well as the pupil transmittance [Eq. (42)]. Next, the Fourier transform of the pupil function gives the complex amplitude distribution F(u, v) in the image of a point [Eq. (43)]. The squared modulus of F is the incoherent PSF, G(u, v) [Eq. (45)]. Finally, the
(normalized) inverse Fourier transform of the PSF gives the OTF [Eq. (60)]. This method of computing the OTF is occasionally employed because of the availability of fast Fourier transform (FFT) numerical routines that speed up the calculations considerably. However, there is another, more direct method of computation, which is preferable in many ways (33). This alternative is based on a mathematical theorem that allows us to relate the OTF directly to the pupil function. The theorem may be stated as follows: If a function A(x, y) is the Fourier transform of another function a(u, v), then the autocorrelation of a(u, v) is given by the inverse Fourier transform of the squared modulus of A(x, y) (see also Appendix). In this case, the OTF is the inverse Fourier transform of the squared modulus of the complex amplitude distribution in the image, which in turn is the Fourier transform of the pupil function. Hence, we may write the OTF directly as the autocorrelation of the pupil function:

D(σ, τ) = (1/A) ∫∫ f(x, y)f*(x − σ, y − τ) dx dy    (62)
Figure 42. Illustration of the autocorrelation of a square pupil in the x direction. The area of overlap S is shown shaded.
where A is the pupil area and * denotes complex conjugate. The limits of integration can be considered infinite, but in practice the integral is nonzero only over the area of overlap of f(x, y) and the shifted function f*(x − σ, y − τ). In writing this equation, it would appear that the physical units are wrong if we are subtracting σ (spatial frequency or inverse length) from x (length). It is important to understand that all variables are normalized to be unitless [see Eq. (44) and associated discussion]. Without the appropriate normalization, all of these equations expressing the PSF, the OTF and the Fourier transformations become a lot more cumbersome. Some simple examples of the OTF are given below, for zero aberrations and unity pupil transmittance, that is, f(x, y) = 1 inside the pupil and zero outside. In that case, the OTF depends solely on the shape of the pupil. Along the axis of a rotationally symmetrical optical system, the pupil shape is typically circular, but off-axis it may be elliptical or more complicated. Of course, it is also possible to have a noncircular aperture stop that would then define the pupil shape. Because there is no phase term, the OTF is the same as the MTF, and the integral, Eq. (62), simplifies to

D(σ, τ) = M(σ, τ) = (1/A) ∫∫_S dx dy = S(σ, τ)/A    (63)
where S is the overlap area between the pupil function and its shifted version. For an example that provides the simplest analytical expression, suppose that the system has a square pupil shape, as in Fig. 42. The pupil area A is 4, and the MTF (S/A) is given by 1 − σ/2. Thus, the MTF is linear with σ and reaches zero at σ = 2. Clearly, the autocorrelation can proceed in directions other than the x or y axes, for example, along the diagonal. In the latter case, the cutoff frequency would be 2√2. The MTF in the x direction is shown in Fig. 43.
Figure 43. MTF (or OTF) shape for a system that has a square pupil and no aberration, corresponding to the autocorrelation operation of Fig. 42.
In the more common circular pupil of unit radius, the overlap area between two circles has a somewhat more complicated analytical expression. The MTF in this case is given by

D(σ) = M(σ) = (4/π)[(1/2) arccos(σ/2) − (σ/4) sin(arccos(σ/2))],    (64)

and the MTF is the same in any direction. It is shown in Fig. 44. The cutoff frequency is σ = 2 because of the unit radius pupil normalization scheme. In other words, the pupil overlap goes to zero when the shifted function has moved by two radii. By inverting Eq. (44), we can obtain the denormalized spatial frequency N as

N = (n sin a)σ/λ,    (65)

where all quantities refer to the image space.
Figure 44. MTF (or OTF) for a system that has a circular pupil and no aberrations. The normalized spatial frequency σ relates to the nonnormalized frequency N through σ = λN/(n sin a), where n sin a is the numerical aperture in the image space.
The OTF in the Presence of Aberrations. In the presence of aberrations, the MTF and the OTF are no longer the same. The simplest phase term or aberration is defocus. The effect of defocus on the OTF was determined analytically by Hopkins (34), although nowadays similar computations are always performed numerically. Like the PSF, the OTF in the presence of defocus is a wholly real function, but may contain negative values. This is illustrated in Fig. 45. The negative values imply a phase shift of 180°, which produces a contrast reversal in practice; the bright grating bars become dark and vice versa. This is depicted in Fig. 46 using a target that has two different spatial frequencies. When the MTF goes to zero, the corresponding spatial frequency is not transmitted. This is a form of incoherent spatial filtering; certain frequencies are filtered out of the image, and others are modified in different ways. Such incoherent spatial filtering can also be performed by pupil apodization, that is, by modifying the pupil transmittance. For an extreme example, suppose that the pupil is obstructed by a mask that allows light to pass through only two squares as shown in Fig. 47. By performing the autocorrelation of the pupil function in the absence of aberrations [Eq. (63)] in the +x direction, we obtain the MTF shape in Fig. 48 that shows an entire band of missing frequencies. A less extreme example is given in Fig. 49, where the MTF of a centrally obscured system such as an astronomical telescope is compared with the MTF of an unobscured system of the same aperture. It can
Figure 46. (a) A square wave object that has two different frequencies, such that the low frequency falls in the positive part of the MTF of Fig. 45, and the high frequency falls in the negative part. (b) The image of the square wave object through a system that has an OTF, as shown in Fig. 45. Notice that the bright and dark bars are reversed for the high-frequency grating (contrast reversal).
be seen that the effect of the obscuration is to suppress low frequencies and to enhance higher frequencies.

In the presence of aberrations other than defocus, the MTF can take a variety of shapes that are constrained by the following conditions (35):

1. The MTF has a maximum of unity at zero spatial frequency³ (this says in effect that the best imaged object is no object at all; any contrast in the object will be attenuated).
2. At any given spatial frequency, the aberrated MTF is always lower than the unaberrated MTF of a system that has the same aperture.
3. The MTF slope at the origin is independent of aberration.
Figure 45. OTF for a system suffering from one wave of defocus, compared with the diffraction-limited case (top curve).
³ Some nonoptical systems like photographic film can have an MTF that peaks at a frequency >0. This is intimately related to the so-called adjacency or inhibition effects, mentioned previously.
Figure 47. An apodized pupil that has two square transmitting areas.
Figure 48. MTF (or OTF) in the x direction of a system that has a pupil shape such as shown in Fig. 47.
Figure 49. Comparison between obscured and unobscured diffraction-limited MTFs, corresponding to the PSFs of Fig. 29.
The MTF is a useful tool, but it is not always used as carefully as it requires. If the MTF is to be used successfully, one must keep in mind the following:
1. The MTF is a two-dimensional function. When shown as one-dimensional, one must always be aware that it represents only one object (grating) orientation unless the system is either diffraction-limited or suffers from spherical aberration but no other asymmetrical aberrations.
2. Several aberrations like spherical or astigmatism and their higher order counterparts interact with defocus in a complicated way. In the presence of such aberrations, the optimum focal plane is not universally defined. A different choice of focus will yield a different MTF. In general, the effect of aberrations on the MTF is to suppress certain frequency bands more than other bands, as a function of focus. Thus it is often impossible to define a ''best'' MTF for an optical system suffering from significant aberration, unless a specific criterion for choosing the best MTF is applied.
3. It is dangerous to use only the MTF and neglect the PTF because the latter may cause significant image distortion. If the PTF is zero or linear, then it has no effect on the image. In the linear case, all sinusoidal components are shifted by an amount proportional to their frequency, resulting in a bodily image shift but no distortion because all sinusoidal components still add in phase. The effect of the nonlinear part of the PTF on image quality has been examined by Hopkins (36). Asymmetrical aberrations like coma have the greatest effect on the nonlinear components of the PTF. Examples have been shown by Hopkins (36) and Mahajan (10). Finally, the PTF may also be neglected when the MTF is high, as, for example, at low frequencies. If an optical system is followed by a sampling detector that cannot reproduce the high frequencies transmitted by the optics, then only the low frequencies matter, and in such a case the PTF may be entirely negligible.

The OTF can be measured in a number of ways, and a considerable body of literature exists on this topic. In brief, we may state that although the OTF can be computed from a measurement of the PSF, such a measurement is not always feasible due to the small amount of light that passes through a pinhole (although the introduction of low-noise CCD detector arrays has facilitated such measurements). More often, the LSF is used, whose Fourier transform yields the OTF in the direction perpendicular to the line. The procedure may then be repeated for additional azimuths. When employing this process, a correction must generally be applied for the finite width of the object slit, and for the partial coherence in the slit illumination (37). An edge may also be used as an object, although the differentiation needed to derive the LSF (and hence the MTF) tends to increase the computational error. Interferometric techniques can provide a direct representation of the pupil function and hence, the OTF by autocorrelation. A detailed account of techniques and methods has been provided by Williams (38).

The frequency response techniques outlined here can be generalized to handle optical systems and also the detector of the optical radiation or even the transmitting medium, such as the atmosphere. So, for example, it is possible to model a complete Earth-orbiting remote sensing system by taking the product of the transfer functions of the atmosphere, the optics, and the detector (such transfer functions are discussed in other articles in this encyclopedia).
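The cascading rule can be illustrated with made-up component curves. The Gaussian, linear, and sinc-like shapes below are stand-ins chosen only to show the bookkeeping, not models of a real atmosphere, lens, or detector, and the frequency range and cutoffs are arbitrary.

```python
import numpy as np

N = np.linspace(0, 100, 501)                     # spatial frequency, cycles/mm (illustrative)

mtf_atmosphere = np.exp(-(N / 60.0)**2)          # smooth roll-off (stand-in)
mtf_optics = np.clip(1.0 - N / 80.0, 0.0, None)  # linear to a cutoff (stand-in)
mtf_detector = np.abs(np.sinc(N * 0.01))         # pixel-aperture-like term (stand-in)

# Product rule: valid only when the components are decoupled (see text).
mtf_system = mtf_atmosphere * mtf_optics * mtf_detector

for f in (10, 30, 50):
    i = np.argmin(np.abs(N - f))
    print("N = %2d cycles/mm: system MTF = %.2f" % (f, mtf_system[i]))
```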
The multiplication of the transfer functions is valid if the effects of the various system components are decoupled. This means, for example, that the atmosphere cannot correct the effects of the lens aberrations, or that the lens aberrations cannot correct the effect of the detector. This condition is normally satisfied for physically different systems such as a detector and a lens, but it is not satisfied for two cascaded lenses because the aberrations of the first may cancel the aberrations of the second. Two lenses in cascade may be decoupled if the image of the first lens is formed on a diffusing screen that is then reimaged by the second lens. In such a case, the transfer functions of the two lenses may be multiplied because the diffusing screen destroys the coupling between the lenses and does not permit the second lens to correct the aberrations of the first.

Coherent Image Formation

Fundamentals. Coherent image formation has several subtleties as well as conceptual differences from the incoherent case. A detailed treatment of coherent image formation is outside the scope of this article because the entire subject holds, for now, relatively little interest outside the field of optics. Thus we will only hint at the subtleties and broadly describe the conceptual differences and similarities. The first broad similarity between incoherent and coherent imaging is that the latter can also be described as a convolution with the coherent PSF, as shown by Eq. (41). The linearity condition is satisfied for a coherently illuminated system, and the stationarity condition merits some more discussion later. Equation (41) relates the image and object complex amplitude distributions through a convolution with the PSF. The convolution theorem of the Appendix then tells us that the Fourier transforms of those functions will be related through a mere multiplication. Specifically,

a′(x, y) = a(x, y)f(x, y),    (66)

where f(x, y), the Fourier transform of the PSF, is the pupil function, as we have established previously [Eq. (42)]. Therefore, Eq. (66) tells us that the Fourier (frequency) spectrum of the image is given by the Fourier spectrum of the object multiplied by the pupil function, which plays the role of the coherent transfer function (CTF). Thus, for example, in the absence of aberrations and apodization, the CTF is the simple top-hat function of Fig. 23, for a system that has a circular pupil shape.

In summary, then, coherent image formation can be described as follows: From the complex amplitude distribution in the object A(u, v), we pass to the amplitude distribution across the entrance pupil a(x, y) through a Fourier transformation. Multiplication of a(x, y) by the pupil function f(x, y) gives the amplitude distribution across the exit pupil (a′). Finally, an (inverse) Fourier transformation gives the complex amplitude distribution in the image (A′). Based on this description, we can predict the images of several typical objects. Consider first a low-frequency object, whose spatial frequency spectrum does not extend outside the pupil radius (taken as unity). This means
that a(x, y) = 0 for any x2 + y2 > 1. If the pupil function contains no apodizing term or aberrations, all spatial frequencies inside the pupil will be transmitted with undiminished amplitude, and the image will be a perfect reconstruction of the object. This contrasts sharply with the incoherent case, where the only perfectly imaged object was no object at all (zero spatial frequency). The images of objects that contain frequencies that fall outside the pupil will appear low-pass filtered. Take, for example, a sawtooth function similar to that of Fig. 39, where the abscissa is understood as amplitude rather than irradiance. If the extent of the pupil allows only the first four terms to pass, then the resulting image will have an amplitude distribution similar to that shown by synthesizing the first four harmonics. However, a photodetector such as the eye records only the irradiance and not the amplitude of the wave. Therefore, the final appearance of the image would correspond to the square of the function that arises from the superposition of the first four harmonics. Mills and Thompson (39) described in detail the effect of aberrations and pupil apodization in the images of coherently illuminated systems for standard test objects, including a point, slit, edge, and two adjacent points. It is common to compare coherent and incoherent transfer functions, as shown in Fig. 50. However, this figure, a version of which appears in several texts, is conceptually misleading because the spatial frequency axis refers to two different physical quantities, amplitude and irradiance. A sinusoidal amplitude distribution (grating), when squared, will yield an irradiance distribution of twice the frequency [through the trigonometric identity cos2 x = 1/2 + 1/2 cos(2x)]. Thus the sinusoidal grating that corresponds to the cutoff point of the CTF would appear visually as a grating of the same frequency as the cutoff point of the incoherent OTF. It is important to understand in more detail the nature of the ‘‘object’’ in coherent imaging. Up to this point, we have assumed that the object is a spatial
Figure 50. A common but somewhat misleading comparison between the diffraction-limited coherent and incoherent transfer functions for a system that has a circular pupil. In the coherent case, all frequencies up to unity (the pupil radius) pass through without attenuation. However, the spatial frequency axis refers to different types of objects in the two cases (amplitude or irradiance gratings).
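The caveat illustrated by Fig. 50, namely that the coherent cutoff refers to an amplitude grating while the incoherent cutoff refers to an irradiance grating, can be checked numerically. The following Python sketch is an illustration added here, not part of the original treatment; the array size and frequencies are arbitrary assumptions. It low-pass filters a sinusoidal amplitude grating with a top-hat CTF and then squares the result, showing that the recorded irradiance varies at twice the spatial frequency of the amplitude pattern.

```python
import numpy as np

# Illustrative check of the Fig. 50 caveat (all sizes and frequencies are assumed values).
N = 1024
x = np.arange(N)
f0 = 20.0 / N                                    # amplitude-grating frequency, cycles/sample

amplitude = np.cos(2 * np.pi * f0 * x)           # sinusoidal amplitude grating

# Coherent transfer function: top-hat pupil that passes |frequency| <= 1.5*f0.
freqs = np.fft.fftfreq(N)
ctf = (np.abs(freqs) <= 1.5 * f0).astype(float)

image_amplitude = np.fft.ifft(np.fft.fft(amplitude) * ctf)   # Eq. (66): spectrum times pupil
irradiance = np.abs(image_amplitude) ** 2                    # what a square-law detector records

spectrum = np.abs(np.fft.fft(irradiance - irradiance.mean()))
dominant = abs(freqs[np.argmax(spectrum[: N // 2])])
print(f"amplitude grating at {f0:.4f}, irradiance dominated by {dominant:.4f} (twice as high)")
```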
Figure 51. Schematic of an amplitude (a) and a phase object (b). The latter is like a perfectly transparent piece of glass that has a surface relief pattern as shown.
distribution of complex amplitude, without giving any further details. Such a complex amplitude distribution will have the general form f (x, y) = U(x, y) exp[iφ(x, y)]. In this expression, U is the real amplitude (transmittance or reflectivity) variation, and φ is the variation in optical thickness or phase (Fig. 51). Physically, the object may be taken as an illuminated transparency. The illuminating wave itself may have amplitude and phase variations and is thus represented by a function fi (x, y) = Ui (x, y) exp[iφi (x, y)]. Under the assumption that the spatial detail in all four of these functions is considerably coarser than the wavelength, the complex amplitude distribution emanating from the object plane is given by the product of the two distributions fo (x, y) ≡ fi (x, y)f (x, y) = Ui (x, y)U(x, y) exp{i[φi (x, y) + φ(x, y)]} ≡ Uo (x, y) exp[iφo (x, y)]
(67)
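Equation (67) states that, point by point, the field leaving the object plane is the product of the illuminating field and the object's complex transmittance, so amplitudes multiply and phases add. A minimal Python sketch of this bookkeeping follows; the sample arrays are invented purely for illustration.

```python
import numpy as np

# Hypothetical samples of illuminating wave and object transmittance (illustrative only).
x = np.linspace(-1.0, 1.0, 8)

U_i, phi_i = 1.0, 0.2 * x**2                      # illumination: amplitude and phase
U, phi = 0.8 + 0.2 * np.cos(np.pi * x), 0.5 * x   # object: transmittance and thickness phase

f_i = U_i * np.exp(1j * phi_i)    # illuminating field
f = U * np.exp(1j * phi)          # object complex transmittance

f_o = f_i * f                     # field leaving the object plane, Eq. (67)

assert np.allclose(np.abs(f_o), U_i * U)          # amplitudes multiply
assert np.allclose(np.angle(f_o), phi_i + phi)    # phases add (no wrapping occurs here)
print(f_o)
```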
Details in the object comparable to the wavelength are generally of interest for their diffractive, rather than imaging properties (in any case, high frequencies are likely to be cut off by the system aperture, as Fig. 50 indicates). Thus the term ‘‘object,’’ as used so far, implies the complex amplitude distribution fo (x, y) that contains the characteristics of the object and of the illuminating wave. Now, consider the question of stationarity of the amplitude PSF that was postponed previously. In addition to the requirement for the invariance of aberrations with field, as for incoherent imaging, we also need to ensure that there are no phase terms that vary with field and would not appear in the incoherent PSF. In the normal imaging situation for a planar object and image and finite pupil locations, there are two such terms. One is a linear obliquity term, and the other is a quadratic optical path variation term (40). Space prohibits further details, but it is noted that both of these terms can be eliminated if instead of a planar object and image surfaces, we consider spherical surfaces centered on the entrance and exit pupil, respectively. Then, the illuminating wave must also be a spherical wave focusing on the entrance pupil. Under these conditions, the illumination is cophasal, and the additional phase terms disappear, ensuring stationarity of the amplitude PSF and the validity of the convolution integral, Eq. (41). Of course, a planar object can also be handled if illuminated by a plane wave. This places the entrance pupil at infinity, and therefore the optical system must be appropriately designed. An alternative way of handling planar objects is by matched illumination. In this case, the planar object is illuminated by a spherical wave focusing at the entrance pupil. Then, there is an exact Fourier transform relationship between the object distribution and the
entrance pupil, provided the entrance pupil surface is taken as a sphere. (If the illuminating wave is not perfectly spherical, that is, it has aberrations, then the FT of the product of the object function and the aberration terms is obtained, as noted previously.) If, further, the exit pupil and the image surfaces are also taken as spherical, then the stationarity requirement is satisfied exactly (40). However, many practical situations can be handled without concern about the difference between spherical and plane surfaces because the irradiance distribution is very similar in the two cases, and several operations can be performed on the irradiance alone. Finally, we devote a little space to the intuitive understanding of Fourier analysis for a complex amplitude object. To recall the incoherent case, we saw that the fundamental building blocks of any incoherent image were sinusoidal irradiance distributions (albeit fictitious ones that contain negative values). What are the corresponding building blocks of a complex amplitude distribution? The answer is that any complex amplitude distribution can be analyzed as a set of interfering plane waves at various directions of propagation. This is the so-called angular spectrum. This may be seen as follows: The Fourier transform relationship

a(x, y) = ∫∫ A(u, v) exp[i2π(ux + vy)] du dv   (68)
holds between the complex amplitude distribution in the entrance pupil a(x, y) and the object A(u, v). The inverse relationship,

A(u, v) = ∫∫ a(x, y) exp[−i2π(ux + vy)] dx dy,   (69)
expresses the object distribution as a superposition of plane waves. The plane waves are represented by the exponential term and have a direction of propagation given by the direction cosines α = xλ and β = yλ [see also (17)]. This is illustrated in Fig. 52, where we consider a simple, one-dimensional (coarse) diffraction grating, which is simply a one-dimensional periodic distribution of amplitude and/or phase. The grating is illuminated by a plane wave, as shown. It is a well-known result of wave optics [e.g., (12)] that after the grating, there will be plane waves emanating at various angles. Each such wave is called a grating order and corresponds to a harmonic term in the Fourier series of the grating profile [e.g., Eq. (A21) or (52) for a specific example]. This is the physical implementation of Eq. (68), which, however, refers to a general nonperiodic object. Now, if the direction of propagation is reversed, then the amplitude distribution at the plane of the grating is obtained as the superposition of the plane waves. This is the physical meaning of Eq. (69). Coherent and incoherent illumination are two extreme cases. The general case is partially coherent illumination. Fortunately, most practical imaging problems and systems satisfy the coherence or incoherence conditions quite well.
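The plane-wave decomposition of Eqs. (68) and (69) has a direct discrete analogue: each bin of a fast Fourier transform of a sampled grating corresponds to one plane wave, and the standard grating relation sin θ = λ/d gives its direction. The Python sketch below is illustrative only; the wavelength, sample spacing, and grating period are assumed values.

```python
import numpy as np

# Assumed sampling and grating parameters (illustrative only).
wavelength = 0.5e-6        # 500 nm
dx = 1.0e-6                # 1 micron sample spacing
N = 4000
x = np.arange(N) * dx

period = 20e-6             # 20 micron amplitude grating
grating = 0.5 + 0.5 * np.cos(2 * np.pi * x / period)

spectrum = np.fft.fft(grating) / N
u = np.fft.fftfreq(N, d=dx)                 # spatial frequencies, cycles per meter

# The strong components are the grating orders; sin(theta) = wavelength * u for each one.
strong = np.abs(spectrum) > 1e-3
for ui, ci in zip(u[strong], spectrum[strong]):
    print(f"order at u = {ui: .3e} 1/m, sin(theta) = {wavelength * ui:+.4f}, |amplitude| = {abs(ci):.3f}")
```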
Figure 52. A plane wave incident on a periodic object (grating) and the resulting angular spectrum of diffracted plane waves, each corresponding to a grating order.
In general, partial coherence is significant in systems that incorporate illumination subassemblies, most commonly microscopes, or, more recently, photolithographic projection systems. Hopkins (40) showed that the problem of partially coherent imaging can be reduced to the superposition of the image intensities produced by waves emanating from each point of a properly defined ‘‘effective source’’ that simulates the partial coherence conditions across the illuminated object plane. Spatial Filtering: An Example and Application of Coherent Imaging. All of the previously discussed concepts of coherent imaging can be illustrated by the following example of a system, shown in Fig. 53. A laser beam is focused to a point and then expanded to illuminate the object. The system uses a plane wave to illuminate the planar object and therefore has the entrance pupil at infinity. If the image is to be obtained on a plane, then the exit pupil must also be at infinity. The plane noted as ‘‘filter plane’’ is also the image of the entrance and exit pupils (i.e., the aperture stop). The Fourier transform of the object amplitude distribution is formed at the entrance pupil by the angular spectrum of plane waves that emanate from
Figure 53. A system for illustrating coherent imaging concepts and for performing spatial filtering experiments. The two lenses have equal power and are separated by twice their focal length. Object and image are one focal length away from the first and second lens, respectively.
the object. The lens immediately after the object will image the entrance pupil at the aperture stop, where a scaled version of the Fourier transform is obtained. The following lens performs the inverse Fourier transformation, and the image appears at the image plane. If the lenses are well corrected for aberrations, then the Fourier transformations are exact and the stationarity conditions are satisfied across the entire image. This system becomes more interesting if one places various filters at the filter plane. Then, amplitude leaving the filter plane will be the product of the incident and the filter transmission, as in Eq. (67). This allows us to perform operations in the Fourier domain and thus alter the appearance of the image in specified ways. To illustrate the process using an experiment of some historic importance, consider a square grid as an object (Fig. 54). The scaled Fourier transform of the object transmittance will appear at the filter plane and will have a form similar to that shown in Fig. 55. In a landmark experiment (although performed in a somewhat different setup), Porter (41) took a transparent strip as a filter that allowed only the vertical or horizontal orders to pass. In each case, the image turned into a vertical or horizontal one-dimensional grating (Fig. 56). Although the outcome of this experiment might sound obvious after the introduction of the Fourier transform and the imaging equations of the previous sections, it was performed at a time when none of these concepts were in place and led to a revolutionary understanding of the process of image formation, as described briefly in the next section. Several other filtering operations can be performed by using the same setup. For example, a still picture of a TV frame displays the characteristic raster scan pattern in the form of fine horizontal lines. The spectrum of these lines is a set of dots in the perpendicular direction that are quite clearly distinguishable from the spectrum of the rest of the image. The dots may be described mathematically as a series of delta functions (if the grating is considered of infinite extent) that correspond to the Fourier transform of a periodic object [Eq. (A21)]. A well-constructed filter will eliminate the spectrum of the lines and will only minimally impact the rest of the image. Matched filters can also be constructed to aid in optical recognition of specific features within the image. All of these applications are described in several texts (11,13,17,20). THE ABBE THEORY OF IMAGE FORMATION AND RESOLUTION LIMITS Ernst Abbe (42) was the first to understand image formation as a diffraction and interference process. He was particularly interested in image formation by a
Figure 54. A square grid object.
Figure 55. Approximate appearance of the spectra corresponding to the square grid object of Fig. 54. The size of the circles attempts to indicate the relative strength of the diffracted orders, except for the central (undiffracted) one, which would be considerably brighter than indicated.
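Porter's experiment (Figs. 54 to 56) can be imitated numerically: form the two-dimensional Fourier spectrum of a crossed grid, pass only the row of orders along one axis with a slit-shaped filter, and transform back; the image then varies along only one direction. The sketch below uses an assumed 256 × 256 grid and an assumed period, so it illustrates the idea rather than reproducing the original experimental parameters.

```python
import numpy as np

N = 256
y, x = np.mgrid[0:N, 0:N]

# Crossed square-grid object with an assumed period of 16 samples.
period = 16
grid = ((x % period) < period // 2).astype(float) * ((y % period) < period // 2).astype(float)

spectrum = np.fft.fftshift(np.fft.fft2(grid))     # the filter plane of Fig. 53

# Slit filter: keep only the row of orders with fy = 0 (a strip through the center).
fy = np.fft.fftshift(np.fft.fftfreq(N)).reshape(-1, 1)
slit = np.abs(fy) < 0.5 / N

filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * slit))
image = np.abs(filtered) ** 2                     # recorded irradiance

# The filtered image is a one-dimensional grating: it varies along x but not along y.
print("variation along x:", float(image[0].std()))
print("variation along y:", float(image[:, 0].std()))
```

Rotating the slit by 90° passes the other row of orders and yields the orthogonal one-dimensional grating instead.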
microscope, so we use his theory as a way of summarizing the various types of resolution limits. Abbe’s view of image formation can be described as follows: Consider a periodic object, such as a grating, illuminated by a monochromatic plane wave. An angular spectrum of plane waves is diffracted from the grating. These waves are brought to a focus at the exit pupil plane, where the spectrum (or Fourier transform) of the grating is formed (this is shown in Fig. 53, where the Fourier transform appears at the filter plane). The spectra form a series of point sources that give rise to another set of waves in the image plane. The interference of those waves produces the image. To understand this last step aided by Fig. 53, notice that the image would be formed at infinity without the help of the second lens, or, if the object was not exactly one focal length away, the image would be formed at a finite distance. The second lens is not essential to this description and in this case serves merely to form the image at a finite distance. We may generalize the situation shown in Fig. 53 if we allow the refractive indexes of the object and image space to be different (n, n ) and the magnification to be other than unity (so that the second lens may have a focal length different from the first). We borrow this result from physical optics, that the first-order diffracted beam from the grating is at an angle θ given by d sin θ = λ/n where n is the refractive index of the object (grating) space. From this equation, the grating spacing d may be expressed as d=
λ/(n sin θ)   (70)

Figure 56. Spatial frequency filtering of the square grid object. A rectangular aperture (shown in dashed lines) allows only the vertical or horizontal orders, giving the image the appearance of a vertical or horizontal grating, respectively. Although not shown, other orientations of the aperture (e.g., 45°) would produce a similar effect.
The spectra formed at the plane of the aperture stop are points [or, more precisely, functions depending on the aperture shape, such as given by Eq. (34)] separated by a distance proportional to the focal length (F tan θ). Now, suppose that the radius of the aperture stop is larger than this distance, so that all orders pass through. Each of the orders generates a plane wave at the image plane. The interference of the three plane waves gives rise to a sinusoidal amplitude distribution, or interference fringes, just as in Young’s interference experiment (12,13). The period of this grating is given by d′ = λ/(n′ sin θ′). The ratio d′/d = (n sin θ)/(n′ sin θ′) is equal to the geometric magnification [see Eqs. (8), (24)], as expected. The limit of resolution of a given system is a function of the aperture size, as we have already seen. In this case, the resolution limit is obtained when the diffraction angle is such that the point spectra corresponding to the ±1 orders are formed exactly at the rim of the aperture. If the diffraction angle is larger and the first-order spectra are cut off by the aperture, only the zero-order spectrum remains and gives rise to a single plane wave that results in uniform illumination of the image plane. Therefore, the image detail has been lost. Thus, if a is the maximum angle in the object space for which the first-order spectra just pass, the resolution limit of a coherent system can be expressed as λ/(n sin a). It is possible to obtain higher resolution from a coherent system if oblique illumination is used. If, on the same system shown in Fig. 53, the illumination forms an angle
a with the object plane, then the undiffracted beam will be focused at the rim of the aperture. Only one more grating order is needed to pass through the aperture to create interference fringes (and thus visible detail) at the image plane. Thus, the angle between the zero and the first order can now be 2a, corresponding to a grating of twice the frequency. Using oblique illumination, then, the resolution limit can be halved to 0.5λ/(n sin a). It is worth appreciating the difference between an amplitude and a phase object relative to this discussion. The sinusoidal grating employed in the example of this section represents a sinusoidal variation in real amplitude (and hence transparency) and has a transmittance of the form a + b cos(2π x/x0 ). As the expression shows, this type of grating has a Fourier transform that contains only the first and zeroth orders. The appearance of the negative first order in the diffraction pattern justifies the preferential use of the complex exponential form [Eq. (A6)] in writing the Fourier series, because a sinusoidal grating in that form has both a positive and negative first order. However, the Fourier (or diffracted) spectrum of a sinusoidal phase grating contains higher harmonics, as can be seen by taking the Fourier transform of the transmittance of such a grating, which has the form exp[i cos(2π x/x0 )]. A phase object is invisible to a detector such as the human eye that responds to irradiance. Mathematically, the squared modulus of the phase object is a constant; thus, it contains no spatial variation that would render it visible to the eye. If the aperture of the imaging system is such that even the highest spatial frequency present in the object is transmitted, then the corresponding image irradiance distribution is uniform, and the image is invisible. Departures from that condition, whether in the form of truncated high spatial frequencies or defocus/aberrations, can render the image visible by altering the phase relationship of the Fourier components and hence, the order of their interference at the image plane. This is also the principle of phase contrast microscopy, which places a phase mask at the Fourier plane to retard the phase of a chosen frequency band.
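Three claims made above can be verified with a discrete Fourier transform: a sinusoidal amplitude grating a + b cos(2πx/x0) diffracts only into the zeroth and first orders, a sinusoidal phase grating exp[i cos(2πx/x0)] produces higher harmonics, and a pure phase object that is fully transmitted yields uniform irradiance. The Python sketch below uses assumed sampling and period values.

```python
import numpy as np

N = 1024
x = np.arange(N)
x0 = N / 16                                   # assumed grating period in samples

amp_grating = 0.6 + 0.3 * np.cos(2 * np.pi * x / x0)      # a + b cos(2*pi*x/x0)
phase_grating = np.exp(1j * np.cos(2 * np.pi * x / x0))   # unit-modulus phase grating

def order_magnitudes(field, n_orders=4):
    """Magnitude of the Fourier coefficient of diffraction orders 0..n_orders."""
    spec = np.fft.fft(field) / N
    fundamental = int(round(N / x0))
    return [abs(spec[m * fundamental]) for m in range(n_orders + 1)]

print("amplitude grating orders:", np.round(order_magnitudes(amp_grating), 4))
print("phase grating orders:    ", np.round(order_magnitudes(phase_grating), 4))

# A pure phase object is invisible to a square-law detector when fully transmitted:
print("irradiance variation of the phase object:", float((np.abs(phase_grating) ** 2).std()))
```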
Table 3. Various Forms of Resolution Limit

Formula                  Name        Condition      Resolved object
d = 0.61λ/(n′ sin a′)    Rayleigh    Image space    Two mutually incoherent points
d = 0.5λ/(n′ sin a′)     MTF limit   Image space    Incoherent sinusoidal grating
d = λ/(n sin a)          Abbe        Object space   Sinusoidal grating illuminated coherently by normally incident wave
d = 0.5λ/(n sin a)       —           Object space   Sinusoidal grating illuminated coherently and obliquely
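As a worked illustration of Table 3 (with assumed numbers, not values given in the text), the sketch below evaluates the four limits for a wavelength of 550 nm and an aperture of n sin a = n′ sin a′ = 0.95.

```python
# Illustrative evaluation of the Table 3 formulas (assumed values, for comparison only).
wavelength_nm = 550.0
n_sin_a = 0.95            # taken equal in object and image space for this example

limits = {
    "Rayleigh (two incoherent points)":  0.61 * wavelength_nm / n_sin_a,
    "MTF limit (incoherent grating)":    0.50 * wavelength_nm / n_sin_a,
    "Abbe (coherent, normal incidence)": 1.00 * wavelength_nm / n_sin_a,
    "Coherent, oblique illumination":    0.50 * wavelength_nm / n_sin_a,
}

for name, d in limits.items():
    print(f"{name:36s} d = {d:6.1f} nm")
```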
The various types of resolution limits shown in Table 3 summarize the entire wave theory of image formation for both incoherent and coherent imaging.

APPENDIX: SOME CONCEPTS OF FOURIER ANALYSIS

The Fourier Series for Periodic Functions
According to the Fourier theorem, any periodic function that satisfies certain simple conditions (to be stated later) can be written as a summation of sinusoidal functions whose frequencies are equal to integral multiples of the frequency of the original function. Let f(x) be a periodic function of period p, that is, f(x) = f(x + np), where n is an integer; then
f(x) = a0/2 + Σ_{n=1}^{∞} [an cos(2πnx/p) + bn sin(2πnx/p)],   (A1)

where

a0 = (2/p) ∫_{x1}^{x2} f(x) dx,   (A2)

an = (2/p) ∫_{x1}^{x2} f(x) cos(2πnx/p) dx,   (A3)

bn = (2/p) ∫_{x1}^{x2} f(x) sin(2πnx/p) dx,   (A4)
and x2 − x1 = p. A form of the Fourier theorem is employed when, for example, one breaks down a certain musical tone into pure tones (sinusoidal vibrations). In the example of sound, all pure tones have positive frequencies. However, in certain cases in optics, there is plausible physical meaning assigned to negative frequencies. For this reason, the preferred way of expressing Fourier’s theorem in optics (at least most of the time) is through the complex exponential notation, in which sines and cosines are substituted by complex exponentials through

cos x = (1/2)[exp(ix) + exp(−ix)],
sin x = (1/2i)[exp(ix) − exp(−ix)].   (A5)
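Equations (A1) to (A5) translate directly into a numerical recipe: sample one period, approximate the coefficient integrals by sums, and resynthesize with a finite number of harmonics. The Python sketch below applies this to a sawtooth, an assumed example in the spirit of Fig. 39; the period and sample count are arbitrary.

```python
import numpy as np

p = 2.0                                   # assumed period
M = 4000                                  # samples over one period
x = np.linspace(0.0, p, M, endpoint=False)
f = x / p                                 # sawtooth rising from 0 to 1

def a_coef(n):
    return (2.0 / p) * np.trapz(f * np.cos(2 * np.pi * n * x / p), x)   # Eq. (A3)

def b_coef(n):
    return (2.0 / p) * np.trapz(f * np.sin(2 * np.pi * n * x / p), x)   # Eq. (A4)

a0 = (2.0 / p) * np.trapz(f, x)                                         # Eq. (A2)

# Synthesis with the first four harmonics, per Eq. (A1).
approx = a0 / 2 + sum(
    a_coef(n) * np.cos(2 * np.pi * n * x / p) + b_coef(n) * np.sin(2 * np.pi * n * x / p)
    for n in range(1, 5)
)

print("rms error with 4 harmonics:", float(np.sqrt(np.mean((approx - f) ** 2))))
```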
Now, Fourier’s theorem for periodic functions may be restated as follows:
f(x) = Σ_{n=−∞}^{+∞} Fn exp(i2πnx/p),   (A6)

Fn = (1/p) ∫_{x1}^{x2} f(x) exp(−i2πnx/p) dx,   (A7)
and x2 − x1 = p. Sufficient conditions (called the Dirichlet conditions) for the validity of Eqs. (A1) and (A6) are 1. f (x) has only a finite number of discontinuities in any finite interval of x within the range x1 < x < x2 . 2. f (x) has only a finite number of extrema in any finite interval of x in the range x1 to x2 . 3. The integral of |f (x)| over p converges (is finite). The coefficients Fn are the spectrum of f (x), also called the Fourier spectrum or the Fourier coefficients, or harmonics. When we use Eq. (58) or (53)–(55) to determine the strength of the Fourier coefficients, we speak of Fourier analysis. When we add the sinusoidal (or complex exponential) terms to approximate f (x) (i.e., when using Eq. (52) or (57), we speak of Fourier synthesis. Fourier synthesis is illustrated by an example in Fig. 39. Nonperiodic Functions: Fourier Transforms The Fourier theorem can be extended to nonperiodic functions, thus allowing Fourier analysis to represent an arbitrary (ir)radiance distribution. In the general case, as p goes to infinity to represent a nonperiodic function, the variable n/p in Eq. (A6) becomes continuous, and the Fourier series becomes a Fourier integral, called the Fourier transform:
f(x) = ∫_{−∞}^{+∞} F(σ) exp(i2πσx) dσ,   (A8)

where

F(σ) = ∫_{−∞}^{+∞} f(x) exp(−i2πσx) dx.   (A9)
The reader should be aware that several different ways exist for expressing the Fourier transform: the negative sign in the exponent may appear in Eq. (A8) instead of Eq. (A9), a constant 1/√(2π) may appear in front of both integrals when 2π is removed from the exponents, or a constant 1/(2π) may appear in front of one integral and not the other. All of these ways of expressing the Fourier transform are equivalent and lead to the same physics. The equation with the minus sign is called the inverse Fourier transform, and the equation with the positive sign is called the Fourier transform. But this can be cause for confusion because it is acceptable to call either function f(x) or F(σ) the Fourier transform of the other. In fact, the terminology ‘‘inverse Fourier transform’’ need be employed only when Fourier transformations are cascaded, in which case it helps to ensure that the correct sign is used in successive operations. Otherwise, the sign of the Fourier transform has no physical relevance. The function F(σ) is the Fourier or frequency spectrum of f(x). Notice that if x has dimensions of length, then σ has dimensions of inverse length and is called spatial frequency, although in the theory of image formation, it pays to normalize these variables and make them
dimensionless, as we have already done in Eq. (44).⁴ The variable σ is analogous to the discrete variable n/p in Eqs. (A6), (A7). The two functions f(x) and F(σ) are called a Fourier transform pair to emphasize the symmetry between them. In other words, the two functions are mathematically similar objects. Notice that there is no such symmetry between a periodic function and its Fourier spectrum, as defined by Eqs. (A6) and (A7): the function and its spectrum are mathematically different objects because the spectrum is a set of integrals, rather than a single function. These two facts pose a problem. If the Fourier transform of a periodic function were to exist, then, according to Eqs. (A8) and (A9), it would have to be a mathematical object similar to the original function (it might not be a periodic function, but it should at least be a function). That would mean that it could not be written in the form of Eq. (A7). Therefore, either the Fourier transform of a periodic function could not exist, or it would have to represent something different from the spectrum given by Eq. (A7). The full resolution of this mathematical conundrum would fill several pages (43,44) and can be only hinted at here. It is noted that this dilemma is intimately connected with the conditions for the existence of the Fourier transform, because as p → ∞, the Dirichlet conditions would exclude many simple functions, starting with the function f(x) = const, whose integral from −∞ to +∞ does not converge. However, it is worth noting that if f(x) is to represent a physical quantity such as a function of distance (an image) or time (a signal), then it cannot extend from −∞ to +∞ but must necessarily be defined within finite limits. This would mean that the function is not truly periodic; hence, the Fourier transform would be applicable, but the Fourier series would not be. Thus, if we allow the mathematics to be tempered by physical reality, the conflict disappears. Nevertheless, when a quasi-periodic function (periodic within an interval) has very many periods, it approximates a fully periodic function so well that it is economical to treat it as such. This is another way of saying that the mathematical abstraction represented by a periodic function is useful because it simplifies many concepts and calculations. Thus, it is desirable to define the Fourier transform of periodic functions, and of course, it is desirable that it should have the same meaning as the discrete Fourier spectrum, Eq. (A7). To define the Fourier transform of a periodic function, we actually need to define a new class of functions, called generalized functions (43,44), which contain ordinary functions as a subset. Within this new set of functions, the Fourier transform and its inverse always exist. However, only one generalized function need concern us here, the so-called delta function; other generalized functions of interest can be considered derived from it. The delta function is defined through the following relationships:

δ(σ) = ∞ if σ = 0,   δ(σ) = 0 if σ ≠ 0,   (A10)
4 The variable σ can also represent the wave number (or inverse wavelength) when the Fourier transform is used in coherence theory. Fourier transforms are ubiquitous in optics!
and

∫_{−∞}^{+∞} δ(σ) dσ = 1.   (A11)
The delta function can be given physical meaning as, for example, the idealized concept of an electrical point charge. The charge is considered concentrated at a single point (taken here as zero), but the integral of the function that represents it must equal the charge value (taken here as unity). The simplest visualization of the delta function can be obtained as a rectangle of height b and width 1/b, centered at zero. The area (integral) of the rectangle remains unity for all values of b. As b → ∞, we obtain a delta function. Now, consider the function f(x) = 1, if −a/2 < x < a/2, and zero elsewhere. The Fourier transform of this function is

F(σ) = ∫_{−∞}^{+∞} f(x) exp(−i2πσx) dx = ∫_{−a/2}^{+a/2} exp(−i2πσx) dx,   (A12)

from which, after performing the integration, we obtain

F(σ) = sin(πaσ)/(πσ).   (A13)
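The transform pair of Eqs. (A12) and (A13), and the limiting behavior claimed just below in Eqs. (A14) to (A16), can be checked numerically: the peak of sin(πaσ)/(πσ) grows like a while its integral over σ stays near unity. The frequency range and widths in this sketch are arbitrary choices.

```python
import numpy as np

sigma = np.linspace(-40.0, 40.0, 80001)        # arbitrary frequency axis

def F(sigma, a):
    """sin(pi*a*sigma)/(pi*sigma) of Eq. (A13); numpy's sinc is sin(pi*t)/(pi*t)."""
    return a * np.sinc(a * sigma)

for a in (1.0, 4.0, 16.0):
    Fa = F(sigma, a)
    area = np.trapz(Fa, sigma)                 # stays near 1, cf. Eq. (A14)
    print(f"a = {a:5.1f}: peak = {Fa.max():5.1f}, integral over sigma = {area:.3f}")
```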
Now, as a → ∞, f(x) becomes the unit constant function, f(x) = 1. It may also be shown that

lim_{a→∞} ∫_{−∞}^{+∞} [sin(πaσ)/(πσ)] dσ = 1   (A14)

and

lim_{a→∞} sin(πaσ)/(πσ) = 0 for σ ≠ 0, = ∞ for σ = 0.   (A15)

Upon comparing Eqs. (A14) and (A15) with Eqs. (A10) and (A11), we see that

lim_{a→∞} sin(πaσ)/(πσ) = δ(σ)   (A16)
[where the operations of taking the limit and integrating are defined as meaningful only if carried out in the order indicated in Eq. (A14)]. Therefore, the functions f(x) = 1 and δ(σ) are a Fourier transform pair:

f(x) = 1 = ∫_{−∞}^{+∞} δ(σ) exp(i2πσx) dσ,   (A17)

and

δ(σ) = ∫_{−∞}^{+∞} exp(−i2πσx) dx.   (A18)

The function F in Eq. (A13) is encountered frequently enough to be given a name. It is called a sinc function [see also Eq. (48)].

The Fourier Transform of a Periodic Function
Consider first the simplest periodic function, a sinusoid or simple harmonic function, written in exponential form:

f(x) = exp(i2πσ0x).   (A19)

The Fourier transform of this function is [from Eq. (A9)]

F(σ) = ∫_{−∞}^{+∞} exp[i2π(σ0 − σ)x] dx,

and by virtue of Eq. (A18),

F(σ) = δ(σ − σ0).   (A20)

Thus the Fourier spectrum of a simple harmonic function of frequency σ0 is a delta function centered at σ = σ0. This is one of the reasons that the negative sign appears in the exponent of the Fourier spectrum [Eq. (A9)] instead of Eq. (A8). Had this alternative way been followed, the spectrum of the harmonic function of frequency σ0 would be centered at σ = −σ0, which is less satisfactory. There is, however, no actual physical difference between the two representations. Now, consider a general periodic function that can be expressed as a Fourier series, per Eq. (A6). Without the need for introducing new mathematics, it can be shown that the Fourier transform of that function is

F(σ) = Σ_{n=−∞}^{+∞} Fn δ(σ − n/p),   (A21)

by virtue of Eqs. (A9) and (A13). Thus, the Fourier transform of a periodic function is a series of equally spaced delta functions, whose strengths are equal to Fn, located at frequencies n/p. This is the way that the Fourier spectrum of a periodic function may be described in terms of the Fourier transform. The function described by Eq. (A21) is shown schematically in Fig. 57.

The Two-Dimensional Fourier Transform
The definition of the Fourier transform can be readily extended to two dimensions. The two-dimensional Fourier
Figure 57. Schematic of the Fourier transform of a periodic function. Each arrow represents a delta function, and the corresponding harmonic strength is indicated by the height of the arrow.
transform may be written as

f(x, y) = ∫∫_{−∞}^{+∞} F(σ, τ) exp[i2π(σx + τy)] dσ dτ,   (A22)

where

F(σ, τ) = ∫∫_{−∞}^{+∞} f(x, y) exp[−i2π(σx + τy)] dx dy.   (A23)
We give these theorems in one-dimensional form for simplicity of notation, but the extension to two dimensions is straightforward. Proofs can be found in (43–45). Parseval’s Theorem. If f (x) and F(σ ) are a Fourier transform pair, then, +∞ +∞ |F(σ )|2 dσ = |f (x)|2 dx.
(A24)
−∞
This theorem expresses, in general, the conservation of energy (or irradiance). Thus, it ensures, for example, that the integral of the PSF always converges for a system that has a finite pupil size.
Convolution. A function A (u ) is said to be the convolution of two functions A(u) and F(u) if A (u ) =
+∞ −∞
A(u)F(u − u) du.
(A25)
If a (x), a(x), and f (x) are the Fourier transforms of A , A, and F, respectively, then, a (x) = a(x)f (x).
(A26)
This theorem can be extended to an arbitrary number of functions (repeated convolution). Autocorrelation. The autocorrelation function of F(x) is defined as +∞ F ∗ (x)F(x + ξ ) dx, (A27) C(ξ ) = −∞
where ∗ denotes complex conjugate. The (inverse) Fourier transform of C(ξ ) is given by c(u) = |f (u)|2 , where f (u) is the inverse Fourier transform of F(x).
CTF EM ESF FT LSF MTF OTF PMR PPR PSF PTF
coherent transfer function electromagnetic edge spread function fourier transform line spread function modulation transfer function optical transfer function paraxial marginal ray paraxial pupil ray point spread function phase transfer function
BIBLIOGRAPHY 1. D. S. Goodman, in M. Bass, ed., Handbook of Optics, vol. I, McGraw-Hill, 1995.
Some Important Definitions and Theorems
−∞
1105
2. P. Mouroulis and J. Macdonald, Geometrical Optics and Optical Design, Oxford University Press, 1997. 3. W. J. Smith, Modern Optical Engineering, McGraw-Hill, 1990. 4. L. Levi, Applied Optics: A Guide to Optical System Design, Wiley, 1968. 5. R. Kingslake, Optical System Design, Academic Press, 1983. 6. D. C. O’Shea, Elements of Modern Optical Design, Wiley, 1985. 7. W. J. Smith, in M. Bass, ed., Handbook of Optics, vol. I, McGraw-Hill, 1995. 8. R. S. Longhurst, Geometrical and Physical Optics, Longman, UK, 1967. 9. W. T. Welford, Aberrations of Optical Systems, Adam Hilger, 1986. 10. V. N. Mahajan, Aberration Theory Made Simple, SPIE Press, Bellingham, WA, 1991. 11. R. Guenther, Modern Optics, Wiley, 1990. 12. M. Born and E Wolf, Principles of Optics, 7th ed., Cambridge University Press, 1999. 13. E. Hecht, Optics, Addison-Wesley, 1987. 14. G. H. Spencer and M. V. R. K. Murty, J. Opt. Soc. Am. 52, 672–678 (1962). 15. H. H. Hopkins, The Wave Theory of Aberrations, Oxford University Press, 1950. 16. V. N. Mahajan, Optical Imaging and Aberrations Part 1: Ray Geometrical Optics, SPIE Press, 1999. 17. J. W. Goodman, Introduction to Fourier Optics, 2nd ed., McGraw-Hill, 1996. 18. J. D. Gaskill, Linear Systems, Fourier Transforms, and Optics, Wiley, 1978. 19. H. H. Hopkins, in R. Shannon and J. Wyant, eds., Applied Optics and Optical Engineering I, vol. IX, Academic Press, 1983. 20. G. O. Reynolds, J. B. DeVelis, G. B. Parrent Jr., and B. J. Thompson, The New Physical Optics Notebook, SPIE Press, Bellingham, WA, 1989. 21. V. N. Mahajan, J. Opt. Soc. Am. 72, 1,258–1,266 (1982). V. N. Mahajan, J. Opt. Soc. Am. A 3, 470–485 (1986). 22. B. R. A. Nijboer, Physica 10, 679–692 (1943).
(A28)
23. B. R. A. Nijboer, Physica 13, 605–620 (1947). 24. K. Nienhuis and B. R. A. Nijboer, Physica 14, 590–603 (1949).
1106
OPTICAL MICROSCOPY
25. M. Cagnet, M. Francon, and J. C. Thrierr, Atlas of Optical Phenomena, Springer-Verlag, 1962. 26. K Strehl, Zeitschrift f. Instrumentenkunde 22, 213–217 (1902). 27. See sect. 8.3 in ref. 10, and also V. N. Mahajan, J. Opt. Soc. Am. 73, 860–861 (1983). P. Mouroulis, in P. Mouroulis, ed., Visual Instrumentation: Optical Design and Engineering Principles, McGraw-Hill, 1999. 28. A. Mar´echal, Revue d’Optique 26, 257–277 (1947). 29. W. B. King, J. Opt. Soc. Am. 58, 655–661 (1968). 30. P. Mouroulis and H. Zhang, J. Opt. Soc. Am. A 9, 34–42 (1992). 31. V. N. Mahajan, Appl. Opt. 17, 964–968 (1978). 32. H. H. Hopkins and B. Zalar, J. Mod. Opt. 34, 371–406 (1987). 33. J. Macdonald, Optica Acta 18, 269–290 (1971). 34. H. H. Hopkins, Proc. R. Soc. A 231, 91–103 (1955). 35. H. H. Hopkins, K. J. Habell, ed., Proceedings of the Conference on Optical Instruments and Techniques, Chapman and Hall, London, 1961, pp. 480–514. 36. H. H. Hopkins, Optica Acta 31, 345–368 (1984). 37. A. P. Shaw, Optica Acta 33, 1,389–1,396 (1986). 38. T. L. Williams, The Optical Transfer Function of Imaging Systems, Institute of Physics, 1999. 39. J. P. Mills and B. J. Thompson, J. Opt. Soc. Am. A 3, 694–716 (1986). 40. H. H. Hopkins, Photogr. Sci. Eng. 21, 114–123 (1977). 41. A. B. Porter, Philos. Mag. 11(6), 154 (1906). 42. E. Abbe, Archiv. Mikroskopische Anat. 9, 413–468 (1873). 43. M. J. Lighthill, Introduction to Fourier Analysis and Generalized Functions, Cambridge University Press, 1958. 44. P. Dennery and A. Krzywicki, Mathematics for Physicists, Harper and Row, 1967. 45. R. Bracewell, The Fourier Transform and Its Applications, 2nd ed., McGraw-Hill, 1965.
OPTICAL MICROSCOPY MICHAEL W. DAVIDSON The Florida State University Tallahassee, FL
MORTIMER ABRAMOWITZ Olympus America, Inc. Melville, NY
INTRODUCTION The past decade has witnessed enormous growth in the application of optical microscopy to micron and submicron level investigations in a wide variety of disciplines (1–5). Rapid development of new fluorescent labels has accelerated the expansion of fluorescence microscopy in laboratory applications and research (6–8). Advances in digital imaging and analysis have also enabled microscopists to acquire quantitative measurements quickly and efficiently on specimens that range from photosensitive caged compounds and synthetic ceramic superconductors to realtime fluorescence microscopy of living cells in their natural environment (2,9). Optical microscopy, helped by digital video, can also be used to image very thin optical sections,
and confocal optical systems are now operating at most major research institutions (10–12). Early microscopists were hampered by optical aberration, blurred images, and poor lens design, which restricted high-resolution observations until the nineteenth century. Aberrations were partially corrected by the mid-nineteenth century by the introduction of Lister and Amici achromatic objectives that reduced chromatic aberration and raised numerical apertures to around 0.65 for dry objectives and up to 1.25 for homogeneous immersion objectives (13). In 1886, Ernst Abbe’s work with Carl Zeiss led to the production of apochromatic objectives based for the first time on sound optical principles and lens design (14). These advanced objectives provided images that had reduced spherical aberration and were free of color distortions (chromatic aberration) at high numerical apertures. Several years later, in 1893, Professor August K¨ohler reported a method of illumination, which he developed to optimize photomicrography, that allowed microscopists to take full advantage of the resolving power of Abbe’s objectives. The last decade of the nineteenth century saw innovations in optical microscopy, including metallographic microscopes, anastigmatic photolenses, binocular microscopes that had image-erecting prisms, and the first stereomicroscope (14). Early in the twentieth century, microscope manufacturers began parfocalizing objectives, that allowed the image to remain in focus when the microscopist exchanged objectives on the rotating nosepiece. In 1924, Zeiss introduced a LeChatelier-style metallograph that had infinity-corrected optics, but this method of correction was not used widely for another 60 years. Shortly before World War II, Zeiss created several prototype phase contrast microscopes based on optical principles advanced by Frits Zernike. Several years later, the same microscopes were modified to produce the first time-lapse cinematography of cell division photographed with phase-contrast optics (14). This contrast-enhancing technique did not become universally recognized until the 1950s and is still a method of choice for many cell biologists today. Physicist Georges Nomarski introduced improvements in Wollaston prism design for another powerful contrastgenerating microscopy theory in 1955 (15). This technique is commonly referred to as Nomarski interference or differential interference contrast (DIC) microscopy and, along with phase contrast, has allowed scientists to explore many new arenas in biology using living cells or unstained tissues. Robert Hoffman (16) introduced another method of increasing contrast in living material by taking advantage of phase gradients near cell membranes. This technique, now termed Hoffman Modulation Contrast, is available as optional equipment on most modern microscopes. The majority of microscopes manufactured around the world had fixed mechanical tube lengths (ranging from 160–210 mm) until the late 1980s, when manufacturers largely changed over to infinity-corrected optics. Ray paths through both finite-tube-length and infinity-corrected microscopes are illustrated in Fig. 1. The upper portion
OPTICAL MICROSCOPY Finite tube−length microscope ray paths
(a)
Eyepiece L ey Object O
Objective L ob
Intermediate image O'
h z
Eye O''
h' f a
(b) Objective L ob
fb
z'
b Infinity−corrected microscope ray paths Tube lens L tb
Object O
Eyepiece L ey Intermediate image O'
Eye O''
h h' Parallel light beam ''infinity space''
Figure 1. Optical trains of finite-tube and infinity-corrected microscope systems. (a) Ray traces of the optical train representing a theoretical finite-tube-length microscope. The object O is a distance a from the objective Lob and projects an intermediate image O’ at the finite tube length b, which is further magnified by the eyepiece Ley and then projected onto the retina at O’’. (b) Ray traces of the optical train that represents a theoretical infinity-corrected microscope system.
of the figure contains the essential optical elements and ray traces that define the optical train of a conventional finite-tube-length microscope (17). An object O of height h is being imaged on the retina of the eye at O’’. The objective lens Lob projects a real and inverted image of O magnified to the size O’ into the intermediate image plane of the microscope. This occurs at the eyepiece diaphragm, at the fixed distance fb + z behind the objective. In this diagram, fb represents the back focal length of the objective and z is the optical tube length of the microscope. The aerial intermediate image at O’ is further magnified by the microscope eyepiece Ley ) and produces an erect image of the object at O’’ on the retina, which appears inverted to the microscopist. The magnification factor of the object is calculated by considering the distance a between the object O and the objective Lob , and the front focal length f of the objective lens. The object is placed a short distance z outside of the objective’s front focal length f , such that z + f = a. The intermediate image of the object O’ is located at distance b, which equals the back focal length of the objective fb plus z , the optical tube length of the microscope. Magnification of the object at the intermediate image plane equals h . The image height at this position is derived by multiplying the microscope tube length b by the object height h, and dividing this by the distance of the object from the objective: h = (h × b)/a. From this argument, we can conclude that the lateral
1107
or transverse magnification of the objective is equal to a factor of b/a (also equal to f /z and z /fb, the back focal length of the objective divided by the distance of the object from the objective. The image at the intermediate plane h is further magnified by a factor of 25 centimeters (called the near distance to the eye) divided by the focal length of the eyepiece. Thus, the total magnification of the microscope is equal to the magnification by the objective multiplied by that of the eyepiece. The visual image (virtual) appears to the observer as if it were 10 inches away from the eye. Most objectives are corrected to work within a narrow range of image distances, and many are designed to work only in specifically corrected optical systems that have matching eyepieces. The magnification inscribed on the objective barrel is defined for the tube length of the microscope for which the objective was designed. The lower portion of Fig. 1 illustrates the optical train that uses ray traces of an infinity-corrected microscope system. The components of this system are labeled similarly to the finite-tube-length system for easy comparison. Here, the magnification of the objective is the ratio h /h, which is determined by the tube lens Ltb . Note the infinity space that is defined by parallel light beams in every azimuth between the objective and the tube lens. This is the space used by microscope manufacturers to add accessories such as vertical illuminators, DIC prisms, polarizers, and retardation plates, whose designs are much simpler and have little distortion of the image (18). The magnification of the objective in the infinity-corrected system equals the focal length of the tube lens divided by the focal length of the objective. FUNDAMENTALS OF IMAGE FORMATION When light from the microscope lamp in the optical microscope, passes through the condenser and then through the specimen (assuming that the specimen is light absorbing), some of the light passes undisturbed in its path around and through the specimen. Such light is called direct light or undeviated light. The background light (often called the surround) that passes around the specimen is also undeviated light. Some of the light that passes through the specimen is deviated when it encounters parts of the specimen. Such deviated light (as you will subsequently learn, called diffracted light) is rendered one-half wavelength or 180° out of phase with the direct light that has passed through undeviated. The one-half wavelength out of phase, caused by the specimen itself, enables this light to cause destructive interference with the direct light when both arrive at the intermediate image plane at the fixed diaphragm of the eyepiece. The eye lens of the eyepiece further magnifies this image which finally is projected onto the retina, the film plane of a camera, or the surface of a light-sensitive computer chip. What has happened is that the direct or undeviated light is projected by the objective and spread evenly across the entire image plane at the diaphragm of the eyepiece. The light diffracted by the specimen is brought to focus at various localized places on the same image plane,
1108
OPTICAL MICROSCOPY
Figure 2. Diffraction spectra seen at the rear focal plane of the objective through a focusing telescope when imaging a closely spaced line grating. (a) Image of the condenser aperture diaphragm whose stage’s empty. (b) Two diffraction spectra from a 10X objective when a finely ruled line grating is placed on the microscope stage. (c) Diffraction spectra of the line grating from a 40X objective. (d) Diffraction spectra of the line grating from a 60X objective.
Line grating diffraction patterns (a)
where the diffracted light causes destructive interference, reduces intensity, and results in more or less dark areas. These patterns of light and dark are what we recognize as an image of the specimen. Because our eyes are sensitive to variations in brightness, the image becomes a more or less faithful reconstitution of the original specimen. To help understand the basic principles, it is suggested that readers try the following exercise and use an object of known structure, such as a stage micrometer or similar grating of closely spaced dark lines as a specimen. To proceed, place the finely ruled grating on the microscope stage and bring it into focus using first a 10X and then the 40X objective (18). Remove the eyepiece and, insert a phase telescope in its place, so that the rear focal plane of the objective can be observed. If the condenser aperture diaphragm is closed most of the way, a bright white central spot of light will appear at the back of the objective, which is the image of the aperture diaphragm. To the right and left of the central spot, will be a series of spectra (also images of the aperture diaphragm), each colored blue on the part closest to the central spot and colored red on the part of the spectrum farthest from the central bright spot (as illustrated in Fig. 2). The intensity of these colored spectra decreases according to the distance from the central spot to the spectrum (17,18). Those spectra nearer the periphery of the objective are dimmer than those closer to the central spot. The diffraction spectra illustrated in Fig. 2 use three different magnifications. In Fig. 2b, the diffraction pattern visible at the rear focal plane of the 10X objective contains two diffraction spectra. If the grating is removed from the stage, as illustrated in Fig. 2a, these spectra disappear, and only the central image of the aperture diaphragm remains. If the grating is reinserted, the spectra reappear once again. Note that the spaces between the colored spectra appear dark. Only a single pair of spectra can be observed if the grating is examined with the 10X objective. In this case, one diffraction spot appears to the left, and one appears to the right of the central aperture opening. If the line grating is examined with a 40X objective, as shown in Fig. 2c, several diffraction spectra appear to the left and right of the central aperture. When the magnification is increased to 60X (and assuming that it has a higher numerical aperture than the 40X objective), additional spectra (Fig. 2d) appear to the right and left than are visible by using the 40X objective. Because the colored spectra disappear when the grating is removed, it can be assumed that the specimen itself affects the light passing through, thus producing the
(b)
(c)
(d)
colored spectra. Further, if the aperture diaphragm is closed down, we will observe that objectives of higher numerical aperture grasp more of these colored spectra than objectives of lower numerical aperture. The crucial importance of these two statements for understanding image formation will become clear in the ensuing paragraphs. The central spot of light (image of the condenser aperture diaphragm) represents the direct or undeviated light that passes undisturbed through or around the specimen as illustrated in Fig. 3b. It is called the zeroth order. The fainter images of the aperture diaphragm on each side of the zeroth order are called the first, second, third, fourth, etc. orders, respectively, as represented by the simulated diffraction pattern in Fig. 3a that would be observed at the rear focal plane of a 40X objective. All of
(a)
(b)
Undeviated light
Diffracted light
Figure 3. Diffraction spectra generated at the rear focal plane of the objective by undeviated and diffracted light. (a) Spectra visible through a focusing telescope at the rear focal plane of a 40X objective. (b) Schematic diagram of light both diffracted and undeviated by a line grating on the microscope stage.
OPTICAL MICROSCOPY
1109
Slit and grid diffraction patterns (a)
(d)
(b)
(e)
Figure 4. Diffraction patterns generated by narrow and wide slits and by complex grids. (a) Orthoscopic image of the grid seen at the rear focal plane of the objective when focused on the wide slit pattern in (b). (b) Conoscopic image of the grid that has a greater slit width at the top and a lesser width at the bottom. (c) Orthoscopic image of the narrow width portion of the grid [lower portion of (b)]. (d) and (f) Conoscopic images of grid lines arranged in a square pattern (d) and a hexagonal pattern (f). (e) and (g) Orthoscopic images of patterns in (d) and (f), respectively.
(c)
(f)
the captured orders in this case, represent, the diffraction pattern of the line grating seen at the rear focal plane of the objective (18). The fainter diffracted images of the aperture diaphragm are caused by light deviated or diffracted, spread out in fan shape, at each of the openings of the line grating (Fig. 3b). The blue wavelengths are diffracted at a lesser angle than the green wavelengths, which are diffracted at a lesser angle than the red wavelengths. The blue wavelengths from each slit at the rear focal plane of the objective, interfere constructively to produce the blue area of the diffracted image of each spectrum or order; similarly for the red and green areas (Fig. 3a). Where the diffracted wavelengths are one-half wave out of phase for each of these colors, the waves destructively interfere. Hence, the dark areas between the spectra or orders. All wavelengths from each slit add constructively at the position of the zeroth order. This produces the bright white light you see as the zeroth order at the center of the rear focal plane of the objective (Figs. 2, 3, and 4). The closer the spacing of a line grating, the fewer the spectra that will be captured by a given objective, as illustrated in Fig. 4a–c. The diffraction pattern illustrated in Fig. 4a was captured by a 40X objective imaging the lower portion the line grating in Fig. 4b, where the slits are closer together (17,18). In Fig. 4c, the objective is focused on the upper portion of the line grating (Fig. 4b) where the slits are farther apart, and more spectra are captured by the objective. The direct light and the light from the diffracted orders continue on, focused by the objective, to the intermediate image plane at the fixed diaphragm of the eyepiece. Here, the direct and diffracted light rays interfere and are thus reconstituted into the real, inverted image that is seen by the eye lens of the eyepiece and further magnified. This is illustrated in Fig. 4d–g by two types of diffraction gratings. The square grid illustrated in Fig. 4d represents the orthoscopic image of the grid (i.e., the usual specimen image) as seen through the full aperture of the objective. The diffraction pattern derived from this grid is shown as a conoscopic image that would be seen at the rear focal plane of the objective (Fig. 4e). Likewise, the orthoscopic image of a hexagonally arranged grid (Fig. 4f) produces a corresponding hexagonally arranged conoscopic image of first-order diffraction patterns (Fig. 4g).
(g)
Microscope specimens can be considered complex gratings that have details and openings of various sizes. This concept of image formation was largely developed by Ernst Abbe, the famous German microscopist and optics theoretician of the nineteenth century. According to Abbe (his theories are widely accepted now), the details of a specimen will be resolved if the objective captures the zeroth order of the light and at least the first order (or any two adjacent orders, for that matter). The greater the number of diffracted orders that gain admittance to the objective, the more accurately the image represents the original object (2,14,17,18). Further, if a medium of refractive index higher than that of air (such as immersion oil) is used in the space between the front lens of the objective and the top of the coverslip, as shown in Fig. 5a, the angle of the diffracted orders is reduced, and the fans of diffracted light are compressed. As a result, an oil immersion objective can
(a)
(b)
Figure 5. Effect of refractive index of imaging medium on diffracted orders captured by the objective. (a) Orthoscopic image of objective back focal plane diffraction spectra when air is the medium between the coverslip and the objective front lens. (b) Diffraction spectra when immersion oil of refractive index similar to glass is used in the space between the coverslip and the objective front lens.
1110
OPTICAL MICROSCOPY
capture more diffracted orders and yield better resolution than a dry objective (Fig. 5b). Moreover, because blue light is diffracted at a lesser angle than green or red light, a lens of a given aperture may capture more orders of light when the wavelengths are in the blue region of the visible light spectrum. These two principles explain the classic Rayleigh equation often cited for resolution (2,18–20): d = 1.22(λ/2NA)
Airy disks and resolution (a)
(b)
(c)
(1)
where d is the space between two adjacent particles (that still allows perceiving the particles as separate), λ is the wavelength of illumination, and NA is the numerical aperture of the objective. The greater the number of higher diffracted orders admitted into the objective, the smaller the details of the specimen that can be clearly separated (resolved). Hence, the value of using high numerical aperture for such specimens. Likewise, the shorter the wavelength of visible light used, the better the resolution. These ideas explain why apochromatic lenses of high numerical aperture can separate extremely small details in blue light. Placing an opaque mask at the back of the objective blocks the outermost diffracted orders. This either reduces the resolution of the grating lines, or any other object details, or it destroys the resolution altogether, so that the specimen is not visible. Hence, the usual caution not to close down the condenser aperture diaphragm below the suggested two-thirds to nine-tenths of the objective’s aperture. Failure of the objective to grasp any of the diffracted orders results in an unresolved image. In a specimen that has very minute details, the diffraction fans are spread at a very large angle and require a high numerical aperture objective to capture them. Likewise, because the diffraction fans are compressed in immersion oil or in water, objectives designed for such use can resolve better than dry objectives. If alternate diffracted orders are blocked out (still assuming the grating as our specimen), the number of lines in the grating appear doubled (a spurious resolution). The important caveat is that actions introduced at the rear of the objective can have a significant effect on the eventual image produced (18). For small details in a specimen (rather than a grating), the objective projects the direct and diffracted light onto the image plane of the eyepiece diaphragm as small, circular diffraction disks known as Airy disks (illustrated in Fig. 6). High numerical aperture objectives capture more of the diffracted orders and produce smaller disks than low numerical aperture objectives. In Fig. 6, Airy disk size is shown steadily decreasing from Fig. 6a through Fig. 6c. The larger disk sizes in Figs. 6a, b are produced by objectives that have lower numerical apertures, whereas the very sharp Airy disk in Fig. 6c is produced by an objective of very high numerical aperture (2,18). The resulting image at the eyepiece diaphragm level is actually a mosaic of Airy disks that are perceived as light and dark regions of the specimen. When two disks are so close together that their central black spots overlap considerably, the two details represented by these
(d)
(e)
Figure 6. Airy disks and resolution. (a–c) Airy disk size as a function of objective numerical aperture, which decreases from (a) to (c) as numerical aperture increases. (d) Two Airy disks so close together that their ring structures overlap. (e) Airy disks at the limit of resolution.
overlapping disks are not resolved or separated and thus appear as one (illustrated in Fig. 6c). The Airy disks shown in Fig. 6d are just far enough apart to be resolved. The basic principle to remember is that the combination of direct and diffracted light (or the manipulation of direct or diffracted light) is critically important in image formation. The key places for such manipulation are the rear focal plane of the objective and the front focal plane of the substage condenser. This principle is fundamental to most of the contrast improvement methods in optical microscopy (18; and see the section on Contrast-enhancing techniques); it is of particular importance at high magnification of small details close in size to the wavelength of light. Abbe was a pioneer in developing these concepts to explain image formation of light-absorbing or amplitude specimens (2,18–20). ¨ Kohler Illumination Proper illumination of the specimen is crucial in achieving high-quality images in microscopy and critical photomicrography. An advanced procedure for microscope illumination was first introduced in 1893 by August
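The numerical behavior of Eq. (1) is easy to verify. The short Python sketch below is not part of the original references; the function name and the sample wavelength and aperture values are purely illustrative. It evaluates d = 1.22λ/(2NA) and reproduces the trends discussed above: shorter wavelengths and larger numerical apertures give smaller resolvable separations.

# Illustrative sketch of Eq. (1): d = 1.22 * wavelength / (2 * NA).
# Wavelengths are given in nanometers; results are printed in micrometers.

def rayleigh_resolution_um(wavelength_nm, numerical_aperture):
    """Return the Rayleigh-limited separation d, in micrometers."""
    d_nm = 1.22 * wavelength_nm / (2.0 * numerical_aperture)
    return d_nm / 1000.0

for wavelength_nm in (450.0, 550.0, 650.0):        # blue, green, red illumination
    for na in (0.25, 0.65, 0.95, 1.40):            # from a low-NA dry objective to oil immersion
        d = rayleigh_resolution_um(wavelength_nm, na)
        print("lambda = %3.0f nm, NA = %.2f: d = %.2f micrometers" % (wavelength_nm, na, d))

As expected, the smallest separation in this sweep (roughly 0.2 micrometer) comes from the combination of blue light and the 1.40 numerical aperture oil immersion objective.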
Köhler Illumination

Proper illumination of the specimen is crucial in achieving high-quality images in microscopy and critical photomicrography. An advanced procedure for microscope illumination was first introduced in 1893 by August Köhler of the Carl Zeiss corporation to provide optimum specimen illumination. All manufacturers of modern laboratory microscopes recommend this technique because it produces specimen illumination that is uniformly bright and free from glare and thus allows the user to realize the microscope's full potential. Most modern microscopes are designed so that the collector lens and other optical components built into the base project an enlarged and focused image of the lamp filament onto the plane of the aperture diaphragm of a properly positioned substage condenser. Closing or opening the condenser diaphragm controls the angle of the light rays that emerge from the condenser and reach the specimen from all azimuths. Because the light source is not focused at the level of the specimen, illumination at specimen level is essentially grainless and extended and does not suffer deterioration from dust and imperfections on the glass surfaces of the condenser. The opening size of the condenser aperture diaphragm, along with the aperture of the objective, determines the realized numerical aperture of the microscope system. As the condenser diaphragm is opened, the working numerical aperture of the microscope increases and results in greater light transmittance and resolving power. Parallel light rays that pass through and illuminate the specimen are brought to focus at the rear focal plane of the objective, where the image of the variable condenser aperture diaphragm and the light source are observed in focus simultaneously. Light pathways illustrated in Fig. 7 are schematically drawn to represent separate paths taken by the
specimen-illuminating light rays and the image-forming light rays (17). This is not a true representation of any real segregation of these pathways, but it is a diagrammatic representation presented for visualization and discussion. The left-hand diagram in Fig. 7 demonstrates that the ray paths of illuminating light produce a focused image of the lamp filament at the plane of the substage condenser aperture diaphragm, the rear focal plane of the objective, and the eye point (also called the Ramsden disk) above the eyepiece. These areas in common focus are often referred to as conjugate planes, a principle that is critical in understanding the concept of Köhler illumination (2,17–21). By definition, an object that is in focus at one plane is also in focus at other conjugate planes of that light path. Four separate planes in each light pathway (both image forming and illumination) together make up a conjugate plane set. Conjugate planes in the path of the illuminating light rays in Köhler illumination (left-hand diagram in Fig. 7) include the lamp filament, the condenser aperture diaphragm (at the front focal plane of the condenser), the rear focal plane of the objective, and the eye point of the eyepiece. The eye point is located approximately one-half inch (1 cm) above the top lens of the eyepiece, at the point where the observer places the front of the eye during observation. Likewise, the conjugate planes in the image-forming light path in Köhler illumination (right-hand diagram in Fig. 7) include the field diaphragm, the focused specimen, the intermediate image plane (i.e., the plane of the fixed diaphragm of the eyepiece), and the retina of the eye or the film plane of the camera. The presence of conjugate focal planes is often useful in troubleshooting a microscope
Figure 7. Light paths in Köhler illumination. The illuminating ray paths are illustrated on the left side and the image-forming ray paths on the right. Light emitted from the lamp passes through a collector lens and then through the field diaphragm. The aperture diaphragm in the condenser determines the size and shape of the illumination cone on the specimen plane. After passing through the specimen, light is focused on the back focal plane of the objective and then enters and is magnified by the ocular before passing into the eye.
for contaminating dust, fibers, and imperfections in the optical elements. When such artifacts are in sharp focus, it follows that they must reside on or near a surface that is part of the imaging-forming set of conjugate planes. Members of this set include the glass element at the microscope light port, the specimen, and the graticule (if any) in the eyepiece. Alternatively, if these contaminants are out of focus, then they occur near the illuminating set of elements that share conjugate planes. Suspects in this category are the condenser top lens (where dust and dirt often accumulate), the exposed eyepiece lens element (contaminants from eyelashes), and the objective front lens (usually fingerprint smudges). In K¨ohler illumination, light emitted from the tungstenhalide lamp filament passes first through a collector lens located close to the lamp housing and then through a field lens that is near the field diaphragm. A sintered or frosted glass filter is often placed between the lamp and the collector lens to diffuse the light and ensure even intensity of illumination. In this case, the image of the lamp filament is focused onto the front focal plane of the condenser, while the diffuser glass is temporarily removed from the light path. The focal length of the collector lens must be carefully matched to the lamp filament dimensions to ensure that a filament image of the appropriate size is projected into the condenser aperture. For proper K¨ohler illumination, the image of the filament should completely fill the condenser aperture. The field lens is responsible for bringing the image of the filament into focus at the plane of the substage condenser aperture diaphragm. A first surface mirror (positioned at a 45° angle to the light path) reflects focused light that leaves the field lens through the field diaphragm and into the substage condenser. The field diaphragm iris opening is a virtual light source for the microscope, and its image is focused by the condenser (raised or lowered) onto the specimen plane. Optical designs for the arrangement of these elements may vary by microscope manufacturer, but the field diaphragm should be positioned at a sufficient distance from the field lens to eliminate dust and lens imperfections from imaging in the plane of the specimen. The field diaphragm in the base of the microscope controls only the width of the bundle of light rays that reaches the condenser — it does not affect the optical resolution, numerical aperture, or the intensity of illumination. Proper adjustment of the field diaphragm (i.e., focused by adjusting the height of the condenser and centered in the optical path, then opened to lie just outside the field of view) is important for preventing glare, which can reduce contrast in the observed image. Elimination of unwanted light is particularly important when attempting to image specimens that have inherently low contrast. When the field diaphragm is opened too far, scattered light originating from the specimen and light reflected at oblique angles from optical surfaces can act to degrade image quality. The substage condenser is typically mounted directly beneath the microscope stage in a bracket that can be raised or lowered independently of the stage. The aperture diaphragm opening size is controlled by a swinging arm, a lever, or by rotating a collar on the
condenser housing. The most critical aspect of achieving proper K¨ohler illumination is correct adjustment of the substage condenser. Condenser misalignment and an improperly adjusted condenser aperture diaphragm are the main sources of image degradation and poor quality photomicrography (19). When properly adjusted, light from the condenser fills the rear focal plane of the objective and projects a cone of light into the field of view. The condenser aperture diaphragm is responsible for controlling the angle of the illuminating light cone and, consequently, the working numerical aperture of the condenser. With respect to the size and shape of condenser light cones, it is important to note, that reducing the size of the field diaphragm only slightly decreases the size of the lower portions of the light cone. The angle and numerical aperture of the light cone remain essentially unchanged as field diaphragm size is reduced (21). Illumination intensity should not be controlled by opening and closing the condenser aperture diaphragm or by shifting the condenser up and down or axially with respect to the optical center of the microscope. It should be controlled only by using neutral density filters placed into the light path or by reducing voltage to the lamp (although the latter is not usually recommended, especially for photomicrography). To ensure maximum performance of the tungsten-halide lamp, refer to the manufacturer’s instrument manual to determine the optimum lamp voltage (usually 5–10 volts) and use that setting. Then, adding or removing neutral density filters can easily control the brightness of the illumination without affecting color temperature. The size of the substage condenser aperture diaphragm opening should coincide with the desired numerical aperture, and the quality of the resulting image should also be considered. In general, the diaphragm should be set to a position that allows two-thirds to nine-tenths (60 to 90%) of the entire light disk size (visible at the rear focal plane of the objective after removing of the eyepiece or using a phase telescope or Bertrand lens). These values may vary due to extremes in specimen contrast. The condenser aperture diaphragm should be set to an opening size that provides a compromise of resolution and contrast that depends, to a large degree, on the absorption, diffraction, and refraction characteristics of the specimen. This adjustment must be accomplished without overwhelming the image with artifacts that obscure detail and present erroneous enhancement of contrast. The amount of image detail and contrast necessary to produce the best photomicrograph also depends on refractive index, optical characteristics, and other specimen-dependent parameters. When the aperture diaphragm is erroneously closed too far, resulting diffraction artifacts cause visible fringes, banding, and/or pattern formation in visual images and photomicrographs. Other problems, such as refraction phenomena, can also produce apparent structures in the image that are not real (21). Alternatively, opening the condenser aperture too wide causes unwanted glare and light scattering from the specimen and optical surfaces within the microscope and leads to a significant loss of contrast and washing out of image detail. The correct
setting will vary from specimen to specimen, and the experienced microscopist will soon learn to adjust the condenser aperture diaphragm (and numerical aperture of the system) accurately by observing the image without necessarily viewing the diaphragm in the rear focal plane of the objective. In fact, many microscopists (including the authors) believe that critical adjustment of the numerical aperture of the microscope system to optimize image quality is the single most important step in photomicrography. When the illumination system of the microscope is adjusted for proper Köhler illumination, it must satisfy several requirements. The illuminated area of the specimen plane must be no larger than the field of view for any given objective/eyepiece combination. The light must also have uniform intensity, and the numerical aperture may vary from a maximum (equal to that of the objective) to a lesser value that depends on the optical characteristics of the specimen. Table 1 lists objective magnifications versus the field of view diameter (for an eyepiece of field number 22 with no tube lens present — see discussion on field number) for each objective, from very low to very high magnifications. Many microscopes are equipped with specialized substage condensers that have a swing-out top lens, which can be removed from the optical path for use with lower power objectives (2–5X). This action changes the performance of the remaining components in the light path, and some adjustment is necessary to achieve the best illumination conditions. The field diaphragm can no longer be used for aligning and centering the substage condenser and is now ineffective in limiting the area of the specimen under illumination. Much of the unwanted glare once removed by the field diaphragm is also reduced because the top lens of the condenser produces a light cone that has a much lower numerical aperture and allows light rays to pass through the specimen at much lower angles. Most important, the optical conditions for Köhler illumination no longer apply. For low power objectives (2–5X), aligning the microscope optical components and establishing Köhler illumination conditions should always be undertaken
Table 1. View-field Diameters (FN 22) (SWF 10X Eyepiece)a

Objective Magnification    Diameter (mm)
1/2X                       44.0
1X                         22.0
2X                         11.0
4X                          5.5
10X                         2.2
20X                         1.1
40X                         0.55
50X                         0.44
60X                         0.37
100X                        0.22
150X                        0.15
250X                        0.088

a Source: Nikon.
at a higher (10X) magnification before removing the swing-out condenser lens for work at lower (5X and less) magnifications. Then, the height of the condenser should not be changed. Condenser performance is radically changed when the swing-out lens is removed (18,21). The image of the lamp filament is no longer formed in the aperture diaphragm, which ceases to control the numerical aperture of the condenser and the illumination system. In fact, the aperture diaphragm should be opened completely to avoid vignetting, a gradual fading of light at the edges of the view field. Contrast adjustment in low magnification microscopy is achieved by adjusting of the field diaphragm (18,19,21). When the field diaphragm is wide open (more than 80%), specimen details are washed out, and a significant amount of scattering and glare is present. Closing the field diaphragm to a position between 50 and 80% yields the best compromise in specimen contrast and depth of field. This adjustment is now visible at the rear focal plane of the objective when the eyepiece is removed or when a phase telescope or Bertrand lens is inserted into the eye tube. Objectives designed for low magnification have significantly simpler designs than their higher magnification counterparts. This is due to the smaller angles of illuminating light cones produced by low magnification condensers, which require objectives of lower numerical aperture. Measurement graticules, which must be sharply focused and simultaneously superimposed on the specimen image, can be inserted into any of several conjugate planes in the image-forming path. The most common eyepiece (ocular) measuring and photomicrography graticules are placed in the intermediate image plane, which is positioned at the fixed diaphragm within the eyepiece. It is also theoretically possible to place graticules in any image-forming conjugate plane or in the plane of the illuminated field diaphragm. Stage micrometers are specialized graticules placed on microslides, which are used to calibrate eyepiece graticules and to make specimen measurements. Color and neutral density filters are often placed in the optical pathway to reduce light intensity or alter the color characteristics of the illumination. There are several locations within the microscope stand where these filters are usually placed. Some modern laboratory microscopes have a filter holder sandwiched between the lamp housing and collector lens, which is an ideal location for these filters. Often, neutral density filters along with color correction filters and a frosted diffusion filter are placed together in this filter holder. Other microscope designs provide a set of filters built internally into the body, which can be toggled into the light path by levers. A third common location for filters is a holder mounted on the bottom of the substage condenser, below the aperture diaphragm, that will accept gelatin or glass filters. It is important not to place filters in or near any of the image-forming conjugate planes to avoid imaging dirt or surface imperfections on the filters, along with the specimen (22). Some microscopes have an attachment for placing filters near the light port at the base (near the field diaphragm). This placement is probably too close to the
field diaphragm, and surface contamination may be either in sharp focus or appear as blurred artifacts superimposed on the image. For the same reasons, it is also not wise to place filters directly on the microscope stage.

MICROSCOPE OBJECTIVES, EYEPIECES, CONDENSERS, AND OPTICAL ABERRATIONS

Finite microscope objectives are designed to project a diffraction-limited image at a fixed plane (the intermediate image plane) that is dictated by the microscope tube length and located at a prespecified distance from the rear focal plane of the objective. Specimens are imaged at a very short distance beyond the front focal plane of the objective through a medium of defined refractive index, usually air, water, glycerin, or specialized immersion oils. Microscope manufacturers offer a wide range of objective designs to meet the performance needs of specialized imaging methods (2,6,9,18–21; and see the section on Contrast-enhancing techniques), to compensate for cover glass thickness variations, and to increase the effective working distance of the objective. All of the major microscope manufacturers have now changed their design to infinity-corrected objectives. Such objectives project emerging rays in parallel bundles from every azimuth to infinity. They require a tube lens in the light path to bring the image into focus at the intermediate image plane.

The least expensive (and most common) objectives are achromatic objectives, which are corrected for axial chromatic aberration in two wavelengths (red and blue) that are brought into the same focus. Further, they are corrected for spherical aberration in the color green, as described in Table 2.

Table 2. Objective Lens Types and Correctionsa

                     Corrections for Aberrations
Type                 Spherical    Chromatic    Flatness Correction
Achromat             *b           2c           No
Plan achromat        *b           2c           Yes
Fluorite             3d           <3d          No
Plan fluorite        3d           <3d          Yes
Plan apochromat      4e           >4e          Yes

a Source: Nikon Instrument Group.
b Corrected for two wavelengths at two specific aperture angles.
c Corrected for blue and red — broad range of the visible spectrum.
d Corrected for blue, green, and red — full range of the visible spectrum.
e Corrected for dark blue, blue, green, and red.

The limited correction of achromatic objectives leads to problems in color microscopy and photomicrography. When focus is chosen in the red–blue region of the spectrum, images will have a green halo (often termed residual color). Achromatic objectives yield their best results when light is passed through a green filter (often an interference filter) and black-and-white film is used when these objectives are employed for photomicrography. The lack of correction for flatness of field (or field curvature) further hampers achromat objectives. In the past few years, most manufacturers have begun providing flat-field corrections for achromat
objectives (see Fig. 8a) and have given these corrected objectives the name plan achromats. The next higher level of correction and cost is in objectives called fluorites (named for the mineral used for one of the optical components) or semiapochromats illustrated by the center objective in Fig. 8b. This figure depicts three major classes of objectives: achromats that have the least amount of correction, as discussed above; fluorites (or semi-apochromats) that have additional spherical corrections; and apochromats that are the most highly corrected objectives available. Modern fluorite objectives are produced from advanced glass formulations that often contain low-dispersion materials such as fluorspar or newer synthetic substitutes (5). These new formulations allow for greatly improved correction of optical aberration. Similar to achromats, fluorite objectives are also corrected chromatically for red and blue light. In addition, fluorites are also corrected spherically for two or three colors. The superior correction of fluorite objectives compared to achromats enables making these objectives with higher numerical apertures, that result in brighter images. Fluorite objectives also have better resolving power than achromats and provide a higher degree of contrast; this makes them better suited than achromats for color photomicrography in white light. The highest level of correction (and expense) is found in apochromatic objectives (see Fig. 8c), which are corrected chromatically for at least three colors (red, green, and blue), almost eliminating chromatic aberration, and are corrected spherically for three or four colors. Apochromatic objectives are the best choice for color photomicrography in white light. Because of their high level of correction, apochromat objectives usually have, higher numerical apertures than achromats or fluorites for a given magnification. Many of the newer high-end fluorite and apochromat objectives are corrected for four colors chromatically and four colors spherically. All three types of objectives formerly suffered from pronounced field curvature and projected images that were curved rather than flat. To overcome this inherent condition, lens designers have produced flat-field corrected objectives that yield flat images. Such lenses are called plan achromats, plan fluorites, or plan apochromats, and although this degree of correction is expensive, these objectives are now routinely used due to their value in photomicrography. Uncorrected field curvature is the most severe aberration in higher power fluorite and apochromat objectives, and it was tolerated as an unavoidable artifact for many years. During routine use, the view field had to be continuously refocused between the center and the edges to capture all specimen details. The introduction of flatfield (plan) correction to objectives perfected their use for photomicrography and video microscopy, and today these corrections are standard in both general use and highperformance objectives. Correction for field curvature adds a considerable number of lens elements to the objective, in many cases as many as four additional lenses. This significant increase in the number of lens elements for plan correction also occurs in already overcrowded fluorite and
Figure 8. Levels of optical correction for aberration in commercial objectives. (a) Achromatic objectives, the lowest level of correction, contain two doublets and a single front lens. (b) Fluorites or semiapochromatic objectives, a medium level of correction, contain three doublets, a meniscus lens, and a single front lens. (c) Apochromatic objectives, the highest level of correction, contain a triplet, two doublets, a meniscus lens, and a single hemispherical front lens.
apochromat objectives and frequently results in a tight fit of lens elements within the objective barrel (4,5,18). Before the transition to infinity-corrected optics, most objectives were specifically designed to be used with a set of oculars termed compensating eyepieces. An example is the former use of compensating eyepieces with highly corrected high numerical aperture objectives to help eliminate lateral chromatic aberration. There is a wealth of information inscribed on the barrel of each objective, which can be broken down into several categories (illustrated in Fig. 9). These include the linear magnification, numerical aperture value, optical corrections, microscope body tube length, the type of medium for which the objective is designed, and other critical factors in deciding if the objective will perform as needed. Additional information is outlined here (17): Optical Corrections. These are usually abbreviated as Achro (achromat), Apo (apochromat), and Fl, Fluar, Fluor,
Figure 9. Specifications engraved on the barrel of a typical microscope objective. These include the manufacturer, correction levels, magnification, numerical aperture, immersion requirements, tube length, working distance, and specialized optical properties.
Neofluar, or Fluotar (fluorite) for better spherical and chromatic corrections, and as Plan, Pl, EF, Acroplan, Plan Apo or Plano for field curvature corrections. Other common abbreviations are ICS (infinity-corrected system) and UIS (universal infinity system), N and NPL (normal field of view plan), Ultrafluar (fluorite objective with glass that is transparent down to 250 nanometers), and CF and CFI (chrome-free; chrome-free infinity). Numerical Aperture. This is a critical value that indicates the light acceptance angle, which in turn determines the light gathering power, the resolving power, and depth of field of the objective. Some objectives specifically designed for transmitted light fluorescence and dark-field imaging are equipped with an internal iris diaphragm that allows adjusting of the effective numerical aperture. Designation abbreviations for these objectives include I, Iris, W/Iris. Mechanical Tube Length. This is the length of the microscope body tube between the nosepiece opening, where the objective is mounted, and the top edge of the observation tubes where the oculars (eyepieces) are inserted. Tube length is usually inscribed on the objective as the size in number of millimeters (160, 170, 210, etc.) for fixed lengths or as the infinity symbol (∞) for infinitycorrected tube lengths. Cover Glass Thickness. Most transmitted light objectives are designed to image specimens that are covered by a cover glass (or coverslip). The thickness of these small glass plates is now standardized at 0.17 mm for most applications, although there is some variation in thickness within a batch of coverslips. For this reason, some of the high numerical aperture dry objectives have a correction collar adjustment of the internal lens elements to compensate for this variation (Fig. 10). Abbreviations for the correction collar adjustment include Corr, w/Corr, and CR, although the presence of a rotatable, knurled collar and graduated scale is also an indicator of this feature.
Figure 10. Objective that has three lens groups and a correction collar for varying cover glass thicknesses. (a) Lens group 2 rotated to the forward position within the objective; this position is used for the thinnest coverslips. (b) Lens group 2 rotated to the rearward position within the objective; this position is used for the thickest coverslips.

Working Distance. This is the distance between the objective front lens and the top of the cover glass when the specimen is in focus. In most instances, the working distance of an objective decreases as magnification increases. Working distance values are not included on all objectives, and their presence varies depending on the manufacturer. Common abbreviations are L, LL, LD, and LWD (long working distance); ELWD (extralong working distance); SLWD (superlong working distance); and ULWD (ultralong working distance).

Objective Screw Threads. The mounting threads on almost all objectives are sized to standards of the Royal Microscopical Society (RMS) for universal compatibility. This standard specifies a mounting thread that is 20.32 mm in diameter with a pitch of 0.706 mm; it is currently used in the infinity-corrected objectives made by Olympus and Zeiss. Leica and Nikon broke from the standard when they introduced new infinity-corrected objectives that have a wider mounting thread size; thus Leica and Nikon objectives are usable only on their own microscopes. Abbreviations commonly used are RMS (Royal Microscopical Society objective thread), M25 (metric 25-mm objective thread), and M32 (metric 32-mm objective thread).

Immersion Medium. Most objectives are designed to image specimens where air is the medium between the objective and the cover glass. To attain higher working numerical apertures, many objectives are designed to image the specimen through another medium that reduces refractive index differences between glass and the imaging medium. High-resolution plan apochromat objectives can achieve numerical apertures up to 1.40 when the immersion medium is a special oil whose refractive index is 1.51. Other common immersion media are water and glycerin. Objectives designed for special immersion media usually have a color-coded ring inscribed around the circumference of the objective barrel, as listed in Table 3 and described next.

Table 3. Color-coded Rings on Microscope Objectives

Immersion Color Codea        Immersion Type
Black                        Oil immersion
Orange                       Glycerol immersion
White                        Water immersion
Red                          Special

Magnification Color Codeb    Magnification
Black                        1X, 1.25X
Brown                        2X, 2.5X
Red                          4X, 5X
Yellow                       10X
Green                        16X, 20X
Turquoise blue               25X, 32X
Light blue                   40X, 50X
Cobalt (dark) blue           60X, 63X
White (cream)                100X

a Narrow colored ring located near the specimen end of the objective.
b Narrow band located closer to the mounting thread than the immersion code.
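For readers who catalog objectives in software, the color codes of Table 3 reduce to a simple lookup. The following Python fragment is a minimal sketch based only on Table 3; the dictionary and function names are ours and do not correspond to any standard library.

# Lookup tables transcribed from Table 3 (color-coded rings on objective barrels).

MAGNIFICATION_RING = {          # narrow band located closer to the mounting thread
    "black": "1X, 1.25X",
    "brown": "2X, 2.5X",
    "red": "4X, 5X",
    "yellow": "10X",
    "green": "16X, 20X",
    "turquoise blue": "25X, 32X",
    "light blue": "40X, 50X",
    "cobalt (dark) blue": "60X, 63X",
    "white (cream)": "100X",
}

IMMERSION_RING = {              # narrow colored ring near the specimen end of the objective
    "black": "oil immersion",
    "orange": "glycerol immersion",
    "white": "water immersion",
    "red": "special",
}

def describe_rings(magnification_color, immersion_color=None):
    """Return a short description of an objective from its two color-coded rings."""
    magnification = MAGNIFICATION_RING.get(magnification_color, "unknown magnification")
    if immersion_color is None:
        return magnification + "; no immersion ring (dry objective)"
    return magnification + "; " + IMMERSION_RING.get(immersion_color, "unknown medium")

# A cobalt blue magnification ring combined with a black immersion ring:
print(describe_rings("cobalt (dark) blue", "black"))    # -> "60X, 63X; oil immersion"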
Color Codes. Many microscope manufacturers label their objectives with color codes, according to a DIN standard, to help in rapid identification of the magnification. For example, the dark blue color code on the objective illustrated in Fig. 9 indicates that linear magnification is 60X. This is very helpful when you have a nosepiece turret that contains five or six objectives and you must quickly select a specific magnification. Some specialized objectives have an additional color code that indicates the type of immersion medium necessary to achieve the optimum numerical aperture. Immersion lenses intended for use with oil have a black color ring, and those intended for use with glycerin have an orange ring. Objectives for imaging living organisms in aqueous media, designated water immersion objectives, have a white ring, and highly specialized objectives for unusual immersion media often have an engraved red ring. Table 3 lists current magnification and imaging media color codes used by most manufacturers. Specialized Optical Properties. Microscope objectives often have design parameters that optimize performance under certain conditions. For example, special objectives are designed for polarized illumination (signified by the abbreviations P, Po, Pol, or SF, and/or all barrel engravings are painted red), phase contrast (PH, and/or green barrel engravings), differential interference contrast (DIC), and many other abbreviations for additional applications. The apochromat objective illustrated in Fig. 9 is optimized for DIC photomicrography, and this is indicated on the barrel. The capital H beside the DIC marking indicates that the objective must be used with a specific DIC prism optimized for high-magnification applications. Some applications do not require objectives designed to be corrected for cover glass thickness. These include objectives used to observe uncovered specimens in reflected light, metallurgical specimens, integrated circuit inspection, micromachinery, biological smears, and
other applications that require observation of uncovered objects. Other common abbreviations found on microscope objective barrels, which are useful in identifying specific properties, are listed in Table 4 (17).

Table 4. Specialized Objective Designationsa

Abbreviation                            Type
Phase, PHACO, PC, Ph 1, 2, 3, etc.      Phase-contrast, using phase condenser annulus 1, 2, 3, etc.
DL, DM, PLL, PL, PM, PH, NL, NM, NH     Phase-contrast: dark low, dark medium, positive low, positive low low, positive medium, positive high contrast (regions that have higher refractive index appear darker); negative low, negative medium, negative high contrast (regions that have higher refractive index appear lighter)
P, Po, Pol, SF                          Strain-free, low birefringence, for polarized light
U, UV, Universal                        UV transmitting (down to approx. 340 nm), for UV-excited epifluorescence
M                                       Metallographic (no coverslip)
NC, NCG, OD                             No coverslip, no coverglass, ohne decke (German designation for no coverslip)
EPI                                     Surface illumination (specimen illuminated through objective lens), as contrasted to dia- or transillumination
TL                                      Transmitted light
BBD, HD, B\D                            For use in bright or dark field (hell, dunkel)
D                                       Dark field
H                                       Designed primarily for heating stage
U, UT                                   Designed to be used with universal stage (magnification/NA applied for use with glass hemisphere; divide both values by 1.51 when hemisphere is not used)
DI; MI; TI Michelson                    Interferometry; noncontact; multiple beam (Tolanski)

a Many of the designation codes are manufacturer-specific.
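When objective inventories are handled in software, the engraved specifications summarized in Fig. 9 and the abbreviations of Table 4 are sometimes parsed from a single inscription string. The Python sketch below is only a toy illustration for one hypothetical inscription layout (for example "Plan Apo 60X/1.40 Oil ∞/0.17"); real engravings vary by manufacturer, and the regular expression and field names here are our own assumptions, not a published convention.

import re

# Toy pattern for an inscription such as "Plan Apo 60X/1.40 Oil ∞/0.17":
# correction class, magnification/numerical aperture, optional immersion word,
# and tube length/coverslip thickness (the infinity sign marks an
# infinity-corrected tube length).
INSCRIPTION = re.compile(
    r"(?P<correction>[A-Za-z ]+?)\s+"
    r"(?P<magnification>\d+(?:\.\d+)?)X/(?P<na>\d+\.\d+)\s*"
    r"(?P<immersion>Oil|W|Gly)?\s*"
    r"(?P<tube_length>\d+|\u221e)/(?P<coverslip>\d+\.\d+|-)"
)

def parse_inscription(text):
    """Return the engraved fields as a dictionary, or an empty dict if nothing matches."""
    match = INSCRIPTION.search(text)
    return match.groupdict() if match else {}

print(parse_inscription("Plan Apo 60X/1.40 Oil \u221e/0.17"))
# {'correction': 'Plan Apo', 'magnification': '60', 'na': '1.40',
#  'immersion': 'Oil', 'tube_length': '∞', 'coverslip': '0.17'}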
Optical Aberrations

Lens errors or aberrations in optical microscopy are caused by artifacts that arise from the interaction of light with glass lenses (2–5,19–24). There are two primary causes of aberration: (1) geometrical or spherical aberrations that are related to the spherical nature of the lens and the approximations used to obtain the Gaussian lens equation; and (2) chromatic aberrations that arise from variations in the refractive index across the wide range of frequencies in visible light. In general, optical aberrations induce faults in the features of an image being observed through a microscope. These artifacts were first addressed in the eighteenth century when physicist John Dollond discovered that chromatic aberration was reduced or corrected by using a combination of two different types of glass (flint and crown) to fabricate lenses (13). Later, during the nineteenth century, achromatic objectives that had high numerical aperture were developed, although the lenses still had geometrical problems. Modern glass formulations coupled with advanced grinding and manufacturing techniques have all but eliminated most aberrations from today's microscope objectives, although careful attention must still be paid to these effects, especially when conducting quantitative high-magnification video microscopy and photomicrography (23).

Spherical Aberration

These artifacts occur when light waves that pass through the periphery of a lens are not brought into identical focus with those that pass closer to the center. Waves that pass near the center of the lens are refracted only slightly, whereas waves that pass near the periphery are refracted to a greater degree, producing different focal points along the optical axis. This is one of the most serious resolution artifacts because the image of the specimen is spread out along the optical axis rather than sharply focused. Spherical aberrations are very important in the resolution of the lens because they affect the coincident imaging of points along the optical axis and degrade the performance of the lens, which seriously affects specimen sharpness and clarity. These lens defects can be reduced by using diaphragms to limit exposure of the outer edges of the lens to light and by using aspherical lens surfaces within the system. The highest quality modern microscope objectives address spherical aberrations in a number of ways, including special lens-grinding techniques, additional lens elements of different curvatures, improved glass formulations, and better control of optical pathways.

Chromatic Aberration

This type of optical defect results from the fact that white light is composed of numerous wavelengths. When white light passes through a convex lens, the component wavelengths are refracted according to their frequency. Blue light is refracted to the greatest extent, followed by green and red light, a phenomenon commonly referred to as dispersion. The inability of the lens to bring all of the colors into common focus results in a slightly different image size and focal point for each predominant wavelength group. This leads to color fringes that surround the image. Lens corrections were first attempted in the latter part of the eighteenth century when Dollond, Lister, and others devised ways to reduce longitudinal chromatic aberration (13). By combining crown glass and flint glass (each type has a different dispersion of refractive index), they succeeded in bringing the blue rays and the red rays to a common focus, near but not identical to the green rays. This combination is termed a lens doublet; each lens has a different refractive index and dispersive properties. Lens doublets are also known as achromatic lenses or
achromats for short, derived from the Greek terms a meaning without and chroma meaning color. This simple form of correction allows the image points at 486 nm in the blue region and 656 nm in the red region to coincide. This is the most widely used objective lens and is commonly found on laboratory microscopes. Objectives that do not carry a special inscription stating otherwise are likely to be achromats. Achromats are satisfactory objectives for routine laboratory use, but because they are not corrected for all colors, a colorless specimen detail is likely to show a pale green color at best focus (the so-called secondary spectrum) in white light. A proper combination of lens thickness, curvature, refractive index, and dispersion allows the doublet to reduce chromatic aberration by bringing two of the wavelength groups into a common focal plane. If fluorspar is introduced into the glass formulation used to fabricate the lens, then the three colors red, green, and blue can be brought into a single focal point that results in a negligible amount of chromatic aberration (23). These lenses, known as apochromatic lenses, are used to build very highquality chromatic aberration-free microscope objectives. Modern microscopes use this concept, and today it is common to find optical lens triplets made of three lens elements cemented together, especially in higher quality objectives. For chromatic aberration correction, a typical 10X achromat microscope objective is built of two lens doublets (Fig. 8a). Apochromat objectives usually contain two lens doublets and a lens triplet (Fig. 8c) for advanced correction of both chromatic and spherical aberrations. In addition to longitudinal (or axial) chromatic aberration correction, apochromat objectives also exhibit another chromatic defect. Even when the three main colors are brought to identical focal planes axially, the point images of details near the periphery of the field of view are not the same size; for example, the blue image of a detail is slightly larger than the green image or the red image in white light; this causes color ringing of specimen details at the outer regions of the field of view (23), a defect known as lateral chromatic aberration or chromatic difference of magnification. The compensating eyepiece, whose chromatic difference of magnification is just the opposite of that of the objective, is used to correct for lateral chromatic aberration. Because this defect is also found in higher magnification achromats, compensating eyepieces are frequently used for such objectives, too. Indeed, many manufacturers design their achromats so that they have a standard lateral chromatic error and use compensating eyepieces for all their objectives. Such eyepieces often carry the inscription K, C, or Compens. As a result, compensating eyepieces have built-in lateral chromatic error and are not, in themselves, perfectly corrected. Coverslip Correction It is possible for the user to introduce spherical aberration into a well-corrected system inadvertently (2,23). For example, when using high magnification and high numerical aperture dry objectives (NA = 0.85–0.95), the correct thickness of the cover glass (suggested 0.17 mm) is critical; hence, a correction collar is included on such objectives to enable adjustment for incorrect cover
glass thickness. Similarly, inserting of accessories in the light path of finite-tube-length objectives may introduce aberrations that are apparent when the specimen is refocused, unless such accessories have been properly designed with additional optics. Fig. 10 illustrates how internal lenses operate in an objective designed for coverslip correction. Other Geometrical Aberrations These include a variety of effects including astigmatism, field curvature, and comatic aberrations that are corrected by proper lens fabrication. Curvature of field in the image is an aberration that is familiar to most experienced microscopists. This artifact is the natural result of using lenses that have curved surfaces. When visible light is focused through a curved lens, the image plane produced by the lens is curved. When the image is viewed in the eyepieces (oculars) of a microscope, it either appears sharp and crisp in the center or on the edges of the view field, but not both. Normally, this is not a serious problem when the microscopist is routinely scanning samples to observe their various features. It is a simple matter to use the fine focus knob to correct small deficiencies in specimen focus. However, in photomicrography, field curvature can be a serious problem, especially when a portion of the photomicrograph is out of focus. Modern microscopes correct field curvature by using specially designed flat-field objectives. These specially corrected objectives, named plan or plano, are the most common type of objective in use today. Plan objectives are also corrected for other optical artifacts such as spherical and chromatic aberrations. In a plan objective that also has been mostly corrected for chromatic aberration, the objective is referred to as a plan achromat. This is also the case for fluorite and apochromatic objectives, which have the modified names, plan fluorite and plan apochromat. Adding field curvature lens corrections to an objective that has already been corrected for optical aberrations can often add a significant number of lens elements to the objective. For example, the typical achromat objective has two lens doublets and a hemispherical lens, a total of five lens elements. In contrast, a comparable plan achromat objective has three doublets and three single lenses, a total of nine lens elements, which makes it considerably more difficult to fabricate. As we have seen, the number of lens elements increases as lenses are corrected for spherical errors and for chromatic and field curvature aberrations. Unfortunately, as the number of lens elements increases, so does the cost of the objective. Sophisticated plan apochromatic objectives that are corrected for spherical, chromatic, and field curvature aberrations can contain as many as eighteen to twenty separate lens elements; this makes these objectives the most expensive and difficult to manufacture. Plan apochromatic objectives can cost upward of $3,000 to $5,000 each for high-magnification units that also have high numerical apertures. For most photomicrographic applications, however, it is not absolutely necessary to have the best correction, although this is heavily dependent upon the purpose, the specimen, and the
desired magnification range. When cost is important (when isn’t it?), it is often wise to select more modestly priced plan fluorite objectives that have a high degree of correction, especially the more modern versions. These objectives provide crisp and sharp images that have minimal field curvature and are sufficient for most photomicrograpic applications. Field curvature is very seldom totally eliminated, but it is often difficult to detect edge curvature when using the most plan-corrected objectives, and it does not show up in photomicrographs (19,23). This artifact is more severe at low magnifications and can be a problem when using stereomicroscopes. Manufacturers have struggled for years to eliminate field curvature in the large objectives found in stereomicroscopes. In the past ten years, companies such as Leica, Nikon, Olympus, and Zeiss have made great strides in the quality of optics used to build stereomicroscopes and, although the artifacts and aberrations have not been totally eliminated, high-end models can now produce superb photomicrographs. Comatic aberrations are similar to spherical aberrations, but they are encountered mainly in off-axis objects and are most severe when the microscope is out of alignment (23). In this instance, the image of a point is asymmetrical and results in a comet-like (hence, the term coma) shape. The comet shape may have its tail pointing toward the center of the field of view or away, depending on whether the comatic aberration has a positive or negative value. A coma may occur near the axial area of the light path and/or the more peripheral area. These aberrations are usually corrected along with spherical aberrations by designing lens elements of various shapes to eliminate this error. Objectives that are designed to yield excellent images for wide field of view eyepieces have to be corrected for coma and astigmatism using a specially designed multielement optic in the tube lens to avoid these artifacts at the periphery of the field of view. Astigmatic aberrations are similar to comatic aberrations; however, these artifacts depend more strongly on the obliquity of the light beam (23). This defect is found at the outer portions of the field of view of uncorrected lenses. The off-axis image of a specimen point appears as a line instead of a point. What is more, depending on the angle of the off-axis rays that enter the lens, the line image may be oriented in either of two different directions, tangentially or radially. Astigmatic errors are usually corrected by designing the objectives to provide precise spacing of individual lens elements as well as appropriate lens shapes and index of refraction. Astigmatism is often corrected in conjunction with the correction of field curvature aberrations. Eyepieces (Oculars) Eyepieces work in combination with microscope objectives to magnify the intermediate image further, so that specimen details can be observed. Ocular, an alternative name for eyepiece, has been widely used in the literature, but to maintain consistency during this discussion, we will refer to all oculars as eyepieces. Best results in microscopy require using objectives in combination with eyepieces that are appropriate to the correction and type of
Figure 11. Cutaway diagram of a typical periplan eyepiece. The fixed-aperture diaphragm is positioned between lens group 1 and lens group 2, where the intermediate image is formed. The eyepiece has a protective eyecup that makes viewing the specimen more comfortable for the microscopist.
objective. Inscriptions on the side of the eyepiece describe its particular characteristics and function. The eyepiece illustrated in Fig. 11 is inscribed with UW (not illustrated), which is an abbreviation for ultrawide view field. Often, eyepieces will also have an H designation, depending on the manufacturer, to indicate a high eye point focal point that allows microscopists to wear glasses while viewing samples. Other common inscriptions often found on eyepieces include WF for wide field, UWF for ultrawide field, SW and SWF for super-wide field, HE for high eye point, and CF for eyepieces intended for use with CF corrected objectives (2,3,5,19,23,24). As discussed before, compensating eyepieces are often inscribed with K, C, Comp, or Compens, as well as the magnification. Eyepieces used with flat-field objectives are sometimes labeled Plan-Comp. Magnification factors of common eyepieces range from 5–25X, and usually contain an inscription, such as A/24, which indicates that the field number is 24, referring to the diameter (in millimeters) of the fixed diaphragm in the eyepiece. Many eyepieces also have a focus adjustment and a thumbscrew that allows fixing their positions. Now, manufactures often produce eyepieces that have rubber eyecups that position the eyes at the proper distance from the front lens and block room light from reflecting from the lens surface and interfering with the view. There are two major types of eyepieces that are grouped according to lens and diaphragm arrangement: the negative eyepiece that has an internal diaphragm between the lenses and the positive eyepiece that has a diaphragm below the lenses of the eyepiece. Negative eyepieces have two lenses or lens groups: the upper lens, which is closest to the observer’s eye, is called the eye lens and the lower lens (beneath the diaphragm) is often termed the field lens. In their simplest form, both lenses are plano-convex, they have convex sides facing the
specimen. Approximately midway between these lenses is a fixed circular opening or internal diaphragm which, by its size, defines the circular field of view that is observed by looking into the microscope. The simplest kind of negative eyepiece, or Huygenian eyepiece, is used on most routine microscopes fitted with achromatic objectives. Although the Huygenian eye and field lenses are not well corrected, their aberrations tend to cancel each other out. More highly corrected negative eyepieces have two or three lens elements cemented and combined together to make the eye lens. If an unknown eyepiece carries only the magnification inscribed on the housing, it is most likely to be a Huygenian eyepiece, best suited for use with achromatic objectives of 5–40X magnification. The other main type of eyepiece is the positive eyepiece, commonly known as the Ramsden eyepiece, that has a diaphragm below its lenses. This eyepiece has an eye lens and field lens that are also plano-convex, but the field lens is mounted so that the curved surface faces toward the eye lens. The front focal plane of this eyepiece lies just below the field lens at the level of the eyepiece fixed diaphragm, and makes this eyepiece readily adaptable for mounting graticules. To provide better correction, the two lenses of the Ramsden eyepiece may be cemented together. Simple eyepieces such as the Huygenian and Ramsden and their achromatized counterparts will not correct for a residual chromatic difference of magnification in the intermediate image, especially when used in combination with high magnification achromatic objectives and fluorite or apochromatic objectives. To remedy this in finite microscopy systems, manufacturers produce compensating eyepieces that introduce an equal, but opposite, chromatic error in the lens elements. Compensating eyepieces may be either of the positive or negative type and must be used at all magnifications with fluorite, apochromatic, and all variations of plan objectives (they can also be used to advantage with achromatic objectives of 40X and higher). In recent years, modern microscope objectives have their correction for chromatic difference of magnification either built into the objectives themselves (Olympus and Nikon) or corrected in the tube lens (Leica and Zeiss), thus eliminating the need for compensation correction of the eyepieces. Compensating eyepieces play a crucial role in helping to eliminate residual lateral chromatic aberrations inherent in the design of highly corrected objectives. Hence, it is preferable for the microscopist to use the compensating eyepieces designed by a particular manufacturer to accompany that manufacturer’s higher corrected objectives. Using an incorrect eyepiece whose apochromatic objective is designed for a finite (160 or 170 mm) tube length microscope results in dramatically increased contrast, red fringes on the outer diameters and blue fringes on the inner diameters of specimen detail. Additional problems arise from a limited flatness of the view field in simple eyepieces, even those corrected with eye lens doublets. More advanced eyepiece designs resulted in the periplan eyepiece (Fig. 11), which contains seven lens elements that are cemented into a doublet, a triplet, and
two individual lenses. Design improvements in periplan eyepieces lead to better correction for residual lateral chromatic aberration, increased flatness of field, and general overall better performance when used with higher power objectives. Modern microscopes feature vastly improved plancorrected objectives in which the primary image has much less curvature of field than older objectives. In addition, most microscopes now feature much wider body tubes that have greatly increased the size of intermediate images. To address these new features, manufacturers now produce wide eye-field eyepieces that increase the viewable area of the specimen by as much as 40%. Because the strategies of eyepiece-objective correction techniques vary from manufacturer to manufacturer, it is very important (as stated before) to use only eyepieces recommended by a specific manufacturer for use with their objectives. Our recommendation is to choose the objective carefully first, then purchase an eyepiece that is designed to work in conjunction with the objective. When choosing eyepieces, it is relatively easy to differentiate between simple and more highly compensating eyepieces. Simple eyepieces such as the Ramsden and Huygenian (and their more highly corrected counterparts) will appear to have a blue ring around the edge of the eyepiece diaphragm when viewed through the microscope or held up to a light source. In contrast, more highly corrected compensating eyepieces have a yellow–red–orange ring around the diaphragm under the same circumstances. Modern noncompensating eyepieces are fully corrected and show no color. Most of the modern microscopes have all corrections in the objectives themselves or have a final correction in the tube lens. Such microscopes do not need compensating eyepieces. The properties of several common commercially available eyepieces are listed according to type in Table 5 (19,23). The three major types of eyepieces listed in Table 5 are finder, wide field, and superwide field. The terminology used by various manufacturers can be very confusing, and careful attention should be paid to their sales brochures and microscope manuals to ensure that the correct eyepieces are being used with a specific objective. In Table 5, the abbreviations that designate wide-field and superwide-field eyepieces are coupled to their design for high eye point, and are WH and SWH, respectively. The magnifications are either 10X or 15X and the field numbers range from 14 to 26.5, depending upon the application. The diopter adjustment, used to adjust for differences between the eyes, is approximately the same for all eyepieces, and many also contain either a photomask or micrometer graticule. Light rays emanating from the eyepiece intersect at the exit pupil or eye point (Ramsden disk), where the front of the microscopist’s eye should be placed to see the entire field of view (usually 8–10 mm above the eye lens). By increasing the magnification of the eyepiece, the eye point is drawn closer to the upper surface of the eye lens, making it much more difficult for microscopists to use, especially if they wear eyeglasses. Specially designed high-eye-point eyepieces have been manufactured that feature eye-point viewing distances that approach 20–25 mm above the
Table 5. Properties of Commercial Eyepieces

Eyepiece type (abbreviation)   Field number   Diopter adjustment   Remarks                         Diameter of micrometer graticule
Finder eyepieces
  PSWH 10X                     26.5           −8 ∼ +2              3¼ × 4¼ photomask               —
  PWH 10X                      22             −8 ∼ +2              3¼ × 4¼ photomask               —
  35 SWH 10X                   26.5           −8 ∼ +2              35-mm photomask                 —
Superwide-field eyepieces
  SWH 10X H                    26.5           −8 ∼ +2              Diopter correction              —
Wide-field eyepieces
  CROSSWH 10X H                22             −8 ∼ +2              Diopter correction, crossline   —
  WH 15X                       14             −8 ∼ +2              —                               24
  WH 10X H                     22             −8 ∼ +2              Diopter correction              24
surface of the eye lens. These improved eyepieces have larger diameter eye lenses that contain more optical elements and usually feature improved flatness of field. Such eyepieces are often designated by the inscription H somewhere on the eyepiece housing, either alone or in combination with other abbreviations, as discussed earlier. We should mention that high-eye-point eyepieces are especially useful for microscopists who wear eyeglasses to correct for near- or farsightedness, but they do not correct for several other visual defects, such as astigmatism. Today, high-eye-point eyepieces are very popular, even among people who do not wear eyeglasses, because the large eye clearance reduces fatigue and makes viewing images through the microscope much more comfortable. At one time, eyepieces were available in a wide range of magnifications extending from 6.3–30X and sometimes even higher for special applications. These eyepieces are very useful for observation and photomicrography using low-power objectives. Unfortunately, for higher power objectives, the problem of empty magnification (magnification without increased clarity) becomes important when using very high magnification eyepieces and these should be avoided. Today, most manufacturers restrict their eyepiece offerings to those in the 10X to 20X ranges. The diameter of the view field in an eyepiece is expressed as a field of view number or field number (FN), as discussed before. Information about the field number of an eyepiece can yield the real diameter of the object view field by using this formula (23): View-field diameter = FN/(MO × MT )
(2)
where FN is the field number in millimeters, MO is the magnification of the objective, and MT is the tube lens magnification factor (if any). Applying this formula to the superwide-field eyepiece listed in Table 5, we arrive at the following for a 40X objective whose tube lens magnification is 1.25: View-field diameter = 26.5/(40 × 1.25) = 0.53 mm.
(3)
Table 1 lists the view-field diameters across the common range of objectives that would occur using this eyepiece.
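Equation (2) is straightforward to automate when planning observations or photomicrography. The short Python sketch below is only an illustration of the formula quoted above; the function name and example values are hypothetical rather than drawn from any manufacturer's documentation.

    def view_field_diameter(field_number_mm, objective_mag, tube_lens_mag=1.0):
        """Real diameter (mm) of the specimen area seen through the eyepiece, per Eq. (2)."""
        return field_number_mm / (objective_mag * tube_lens_mag)

    # Superwide-field eyepiece (FN = 26.5) with a 40X objective and a 1.25X tube lens:
    print(round(view_field_diameter(26.5, 40, 1.25), 2))   # 0.53 mm, as in Eq. (3)

    # The same eyepiece across a range of objectives (compare Table 1):
    for mag in (4, 10, 20, 40, 60, 100):
        print(f"{mag}X objective: {view_field_diameter(26.5, mag, 1.25):.2f} mm")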
Care should be taken in choosing eyepiece/objective combinations to ensure the optimal magnification of specimen detail without adding unnecessary artifacts. For instance, to achieve a magnification of 250X, the microscopist could choose a 25X eyepiece coupled to a 10X objective. An alternative choice for the same magnification would be a 10X eyepiece coupled to a 25X objective. Because the 25X objective has a higher numerical aperture (approximately 0.65) than the 10X objective (approximately 0.25) and considering that numerical aperture values define an objective’s resolving power, it is clear that the latter would be the better choice. If photomicrographs of the same view field were made with each objective/eyepiece combination described, it would be obvious that the 10X eyepiece/25X objective duo would produce photomicrographs that excelled in specimen detail and clarity compared to the alternative combination. The numerical aperture of the objective/condenser system defines the range of useful magnification for an objective/eyepiece combination (19,22–24). There is a minimum magnification necessary for the detail in an image to be resolved, and this value is usually rather arbitrarily set at 500 times the numerical aperture (500 × NA). At the other end of the spectrum, the maximum useful magnification of an image is usually set at 1000 times the numerical aperture (1000 × NA). Magnifications higher than this value will yield no further useful information or finer resolution of image detail and will usually lead to image degradation. Exceeding the limit of useful magnification causes the image to suffer from the phenomenon of empty magnification (19), where increasing magnification through the eyepiece or intermediate tube lens only causes the image to become more magnified without a corresponding increase in detail resolution. Table 6 lists the common objective/eyepiece combinations that fall into the range of useful magnification. Eyepieces can be adapted for measuring by adding a small circular disk-shaped glass graticule at the plane of the fixed-aperture diaphragm of the eyepiece. Graticules usually have markings, such as a measuring rule or grid, etched onto the surface. Because the graticule lies in the same conjugate plane as the fixed-eyepiece diaphragm, it appears in sharp focus superimposed on the image of the specimen. Eyepieces, that use graticules usually contain
Table 6. Range of Useful Magnification (500–1000 × NA of Objective)

Objective (NA)   Eyepiece: 10X   12.5X   15X   20X   25X
2.5X (0.08)      —               —       —     x     x
4X (0.12)        —               —       x     x     x
10X (0.35)       —               x       x     x     x
25X (0.55)       x               x       x     x     —
40X (0.70)       x               x       x     —     —
60X (0.95)       x               x       x     —     —
100X (1.42)      x               x       —     —     —

—: Not recommended. x: Useful range.
a focusing mechanism (helical screw or slider) that allows bringing the image of the graticule into focus. A stage micrometer is needed to calibrate the eyepiece scale for each objective.
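The 500–1000 × NA guideline discussed above and tabulated in Table 6 can be expressed as a simple check. The sketch below is a minimal illustration only, assuming that total magnification is the product of the objective and eyepiece magnifications (any intermediate tube lens factor is ignored); the function name is hypothetical.

    def classify_magnification(objective_mag, objective_na, eyepiece_mag):
        """Classify an objective/eyepiece pairing against the 500-1000 x NA rule."""
        total = objective_mag * eyepiece_mag
        low, high = 500 * objective_na, 1000 * objective_na
        if total < low:
            return "below the useful range"
        if total > high:
            return "empty magnification"
        return "within the useful range"

    # A 40X/0.70 objective with 10X, 15X, and 25X eyepieces (compare the 40X row of Table 6):
    for eyepiece in (10, 15, 25):
        print(f"{eyepiece}X eyepiece: {classify_magnification(40, 0.70, eyepiece)}")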
Condenser Systems The substage condenser gathers light from the microscope light source and concentrates it into a cone of light that illuminates the specimen with parallel beams of uniform intensity from all azimuths across the entire view field. It is critical to adjust the condenser light cone properly to optimize the intensity and angle of light that enters the objective front lens. Each time an objective is changed, the substage condenser aperture iris diaphragm must be adjusted correspondingly to provide the proper light cone for the numerical aperture of the new objective. A simple two-lens Abbe condenser is illustrated in Fig. 12. In this figure, light from the microscope illumination source passes through the aperture or condenser diaphragm, located at the base of the condenser, and is concentrated by internal lens elements, which then project light through the specimen in parallel bundles from every azimuth. The size and numerical aperture of the light cone are determined by adjusting of the aperture diaphragm. After passing through the specimen (on the microscope slide), the light diverges to an inverted cone of the proper angle (2θ in Fig. 12) to fill the front lens of the objective (2,18–24). Aperture adjustment and proper focusing of the condenser are critically important in realizing the full potential of the objective. Specifically, appropriate use of the adjustable aperture iris diaphragm (incorporated into the condenser or just below it) is most important in securing correct illumination, contrast, and depth of field. Opening and closing this iris diaphragm controls the angle of illuminating rays (and thus the aperture) which pass through the condenser, through the specimen, and then into the objective. Condenser height is controlled by a rack and pinion gear system that allows adjusting
Figure 12. Condenser/objective configuration for optical microscopy. The Abbe two-lens condenser illustrated shows ray traces through the optical train of the microscope. The aperture diaphragm restricts light that enters the condenser before it is refracted by the condenser lens system onto the specimen. Immersion oil is used in the contact between the underside of the slide and the condenser top lens and also between the objective and coverslip. The objective aperture angle (2θ) controls the amount of light that enters the objective.
the condenser focus to illuminate the specimen properly. Correctly positioning the condenser relative to the cone of illumination and focusing on the specimen are critical in quantitative microscopy and optimum photomicrography. Care must be taken to ensure that the condenser aperture is opened to the correct position with respect to the objective’s numerical aperture. When the aperture is opened too much, stray light generated by refraction of oblique light rays from the specimen can cause glare and lower the overall contrast. On the other hand, when the aperture is closed too far, the illumination cone is insufficient to provide adequate resolution, and the image is distorted due to refraction and diffraction from the specimen. The simplest and least corrected (also the least expensive) condenser is the Abbe condenser that, in its simplest form, has two optical lens elements which produce an image of the illuminated field diaphragm that is not sharp and is surrounded by blue and red color at the edges (Table 7). As a result of little optical correction, the Abbe condenser is suited mainly for routine observation using objectives of modest numerical aperture
Table 7. Condenser Aberration Corrections

Condenser type         Spherical aberration corrected   Chromatic aberration corrected
Abbe                   No                               No
Aplanatic              Yes                              No
Achromatic             No                               Yes
Aplanatic–achromatic   Yes                              Yes
and magnification. The primary advantages of the Abbe condenser are the wide cone of illumination that the condenser can produce as well as its ability to work with long working distance objectives. Manufacturers supply most microscopes with an Abbe condenser as the default, and these condensers are real workhorses for routine laboratory use. The next highest level of condenser correction is split between the aplanatic and achromatic condensers that are corrected exclusively for either spherical (aplanatic) or chromatic (achromatic) optical aberrations. Achromatic condensers typically contain four lens elements and have a numerical aperture of 0.95, the highest attainable without requiring immersion oil (5; see Table 7). This condenser is useful for both routine and critical laboratory analysis using dry objectives and also for black-and-white or color photomicrography. The numerical aperture performance, which will be necessary to provide an illumination cone adequate for the objectives, is critical factor in choosing substage condensers (see Table 8). The capability of the condenser numerical aperture should be equal to or slightly less than that of the highest objective numerical aperture. Therefore, if the largest magnification objective is an oil-immersion objective whose numerical aperture is
1.40, then the substage condenser should also have an equivalent numerical aperture to maintain the highest system resolution. In this case, immersion oil would have to be applied between the condenser top lens in contact with the underside of the microscope slide to achieve the intended numerical aperture (1.40) and resolution. Failure to use oil will restrict the highest numerical aperture of the system to 1.0, the highest obtainable using air as the imaging medium (2,5,17–24). Aplanatic condensers are well corrected for spherical aberration (green wavelengths) but not for chromatic aberration (Table 7). These condensers feature five lens elements and can focus light in a single plane. Aplanatic condensers produce excellent black-and-white photomicrographs by green light generated by a laser source or by using an interference filter that has tungstenhalide illumination. The highest level of correction for optical aberration is incorporated in the aplanatic–achromatic condenser (2,5). This condenser is well corrected for both chromatic and spherical aberrations and is the condenser of choice for use in critical color photomicrography by white light. A typical aplanatic–achromatic condenser features eight internal lens elements cemented into two doublets and four single lenses. Engravings on the condenser housing include its type (achromatic, aplanatic, etc.), the numerical aperture, and a graduated scale that indicates the approximate adjustment (size) of the aperture diaphragm. As mentioned before, condensers whose numerical apertures are higher than 0.95 perform best when a drop of oil is applied to the upper lens in contact with the undersurface of the specimen slide. This ensures that oblique light rays from the condenser are not reflected from underneath the slide, but are directed into the specimen without deviation. In practice, this can become tedious and is not commonly done in routine microscopy, but it is essential when working at high resolutions and for accurate photomicrography using high-power (and numerical aperture) objectives.
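The interplay between objective and condenser numerical aperture described above lends itself to a quick estimate. The sketch below is an illustration only, assuming the commonly quoted Rayleigh-type relation d = 1.22λ/(NA_objective + NA_condenser) and that a condenser used dry (without oil contact to the slide) cannot contribute more than NA 1.0; neither the formula placement nor the function name is taken from this article.

    def lateral_resolution_um(wavelength_nm, objective_na, condenser_na, oiled_condenser=True):
        """Approximate lateral resolution, d = 1.22 * wavelength / (NA_obj + NA_cond)."""
        effective_condenser_na = condenser_na if oiled_condenser else min(condenser_na, 1.0)
        d_nm = 1.22 * wavelength_nm / (objective_na + effective_condenser_na)
        return d_nm / 1000.0

    # 100X/1.40 oil objective with an NA 1.40 condenser in green light (550 nm):
    print(round(lateral_resolution_um(550, 1.40, 1.40, oiled_condenser=True), 3))   # about 0.24 um
    # The same system with the condenser left dry, limited to NA 1.0:
    print(round(lateral_resolution_um(550, 1.40, 1.40, oiled_condenser=False), 3))  # about 0.28 um

The difference is modest in this example, but it illustrates why the text recommends oiling the condenser top lens for critical, high-aperture work.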
Table 8. Substage Condenser Applications Condenser Type
Bright-field
Achromat/ aplanat NA 1.3 Achromat swing-out NA 0.90 Low-power NA 0.20 Phasecontrast Abbe NA 1.25 Phasecontrast achromat NA 0.85 DIC universal achromat/aplanat Dark-field, dry NA 0.80 ∼ 0.95 Dark-field, oil NA 1.20 ∼ 1.43 Strain-free achromat swing-out NA 0.90
• • [10–100X] • [4–100X] • [1–10X] • • •
•
Dark-field
Phase-contrast
DIC
• [up to NA 0.65] • [up to NA 0.70] • [up to NA 0.70] • [4–40X] • [4–100X]
• [10–100X] • [4–100X] • [10–100X]
• [10X, 20X, 100X]
Polarizing
• [4–100X]
Another important consideration is the thickness of the microscope slide, which is as crucial to the condenser as coverslip thickness is to the objective. Most commercial producers offer slides that range in thickness between 0.95 and 1.20 mm: the most common is very close to 1.0 mm. A microscope slide 1.20 mm thick is too thick for most high numerical aperture condensers that tend to have a very short working distance. This does not greatly matter for routine specimen observation, but the results can be devastating in precision photomicrography. We recommend choosing microscope slides that are 1.0 ± 0.05 mm thick, and thoroughly cleaning them before use. When the objective is changed, for example, from 10X to 20X, the aperture diaphragm of the condenser must also be adjusted to provide a light cone that matches the numerical aperture of the new objective. There is a small painted arrow or index mark located on the knurled knob or lever that indicates the relative size of the aperture compared to the linear gradation on this condenser housing. Many manufacturers synchronize this gradation to correspond to the approximate numerical aperture of the condenser. For example, if the microscopist has selected a 10X objective of numerical aperture 0.25, then the arrow would be placed next to the value 0.18–0.20 (about 80% of the objective numerical aperture) on the scale inscribed on the condenser housing. Often, it is not practical to use a single condenser for an entire range of objectives (2–100X) due to the broad range of light cones that must be produced to match objective numerical apertures. For low-power objectives in the range 2–5X, the illumination cone will have a diameter from 6–10 mm, and the high-power objectives (60–100X) need a highly focused light cone only about 0.2–0.4 mm in diameter. For a fixed focal length, it is difficult to achieve this wide range of illumination cones by using a single condenser (18). In practice, this problem can be solved in several ways. For low-power objectives (less than 10X), it may be necessary to unscrew the top lens of the condenser to fill the field of view with light. Other condensers produced have a flip-top upper lens to accomplish this more readily. Some manufacturers now produce a condenser that flips over completely when used with low-power objectives. Other companies incorporate auxiliary correction lenses in the light path to secure proper illumination for objectives less than 10X, or produce special low-power and lownumerical aperture condensers. When the condenser is used without its top lens, the aperture iris diaphragm is opened wide and the field diaphragm, now visible at the back of the objective, serves as if it were the aperture diaphragm. Flip-top condensers are manufactured in a variety of configurations, and with numerical apertures range from 0.65–1.40. Those condensers that have a numerical aperture value of 0.95 or less are intended for use with dry objectives. Flip-top condensers that have a numerical aperture greater than 0.95 are intended for use with oil-immersion objectives and they must have a drop of oil placed between the underside of the microscope slide and the condenser top lens when examining critical samples.
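The practice of setting the condenser aperture diaphragm to roughly 80% of the objective's numerical aperture, mentioned above, can be captured in a small helper. This is only a sketch of that rule of thumb; the 0.8 factor is the guideline quoted in the text, the function name is hypothetical, and the graduated scale on a real condenser should always be checked against the manufacturer's manual.

    def condenser_aperture_setting(objective_na, fraction=0.8):
        """Suggested condenser aperture diaphragm setting, expressed in NA units."""
        return round(fraction * objective_na, 2)

    # For a 10X/0.25 objective the condenser scale should read near 0.20:
    print(condenser_aperture_setting(0.25))   # 0.2
    # For a 40X/0.65 objective:
    print(condenser_aperture_setting(0.65))   # 0.52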
In addition to the common bright-field condensers discussed before, there are a wide variety of specialized models suited to many different applications. Substage condensers have a great deal of interchangeability among different applications (Table 8). For instance, the DIC universal achromat/aplanat condenser is useful for bright field, dark field, and phase-contrast, in addition to the primary DIC application. Other condensers have similar interchangeability. Depth of Field and Depth of Focus When considering resolution in optical microscopy, the major emphasis is placed on point-to-point resolution in the plane perpendicular to the optical axis. Another important aspect of resolution is the axial resolving power of an objective, which is measured parallel to the optical axis and is most often referred to as depth of field (2,4,5,22). Axial resolution, like horizontal resolution, is determined only by the numerical aperture of the objective; the eyepiece merely magnifies the details resolved and projected in the intermediate image plane. Just as in classical photography, depth of field is determined by the distance from the nearest object plane in focus to that of the farthest plane also simultaneously in focus. Depth of field in microscopy is very short and is usually measured in microns. The term depth of focus, which refers to image space, is often used interchangeably with depth of field, which refers to object space. This interchange of terms can lead to confusion, especially when the terms are both used specifically for depth of field. The geometric image plane might be expected to represent an infinitely thin section of the specimen, but even in the absence of aberrations, each image point is spread into a diffraction figure that extends above and below this plane (2,4,17). The Airy disk, discussed in the section on Image Formation, represents a section through the center of the image plane. This increases the effective in-focus depth of the Z-axis Airy disk intensity profile that passes through slightly different specimen planes. Depth of focus varies with numerical aperture and magnification of the objective, and under some conditions, high numerical aperture systems (usually of higher magnification power) have deeper focus depths than those systems of low numerical aperture, even though the depth of field is less (4). This is particularly important in photomicrography because the film emulsion or digital camera sensor must be exposed at a plane that is in focus. Small errors made by focusing at high magnification are not as critical as those made by using very low magnification objectives. Table 9 presents calculated variations in the depth of field and image depth in the intermediate image plane for a series of objectives of increasing numerical aperture and magnification. REFLECTED LIGHT MICROSCOPY Reflected light microscopy, often referred to as incident light, epi-illumination, or metallurgical microscopy, is the method of choice for fluorescence and for imaging specimens that remain opaque even when ground to a thickness
Table 9. Depth of Field and Image Depth

Magnification   Numerical aperture   Depth of field (µm)   Image depth (mm)
4X              0.10                 15.5                  0.13
10X             0.25                 8.5                   0.80
20X             0.40                 5.8                   3.8
40X             0.65                 1.0                   12.8
60X             0.85                 0.40                  29.8
100X            0.95                 0.19                  80.0

Source: Nikon.
of 30 microns (25). The range of specimens that fall into this category is enormous and includes most metals, ores, ceramics, many polymers, semiconductors (unprocessed silicon, wafers, and integrated circuits), slag, coal, plastics, paint, paper, wood, leather, glass inclusions, and a wide variety of specialized materials (25–28). Because light cannot pass through these specimens, it must be directed onto the surface and eventually returned to the microscope objective by either specular or diffused reflection. As mentioned before, such illumination is most often referred to as episcopic illumination, epi-illumination, or vertical illumination (essentially originating from above), in contrast to diascopic (transmitted) illumination that passes through a specimen. Today, many microscope manufacturers offer models that permit the user to alternate or simultaneously conduct investigations using vertical and transmitted illumination. Reflected light microscopy is frequently the domain of industrial microscopy, especially in the rapidly growing semiconductor arena, and thus represents a most important segment of microscopic studies. A typical upright compound reflected-light microscope (illustrated in Fig. 13) that is also equipped for transmitted light has two eyepiece viewing tubes and often a trinocular tube head for mounting a conventional or digital/video camera system. Standard equipment eyepieces are usually of 10X magnification, and most microscopes are equipped with a nosepiece that can hold four to six objectives. The stage is mechanically controlled, with a specimen holder that can be translated in the x and y directions, and the entire stage unit can be moved precisely up and down by a coarse and fine focusing mechanism. Built-in light sources range from 20- and 100-watt tungsten-halide bulbs to higher energy mercury vapor or xenon lamps that are used in fluorescence microscopy. Light passes from the lamp house through a vertical illuminator interposed above the nosepiece but below the underside of the viewing tube head. The specimen’s top surface is upright (usually without a coverslip) on the stage, facing the objective, which has been rotated into the microscope’s optical axis. The vertical illuminator is horizontally oriented at a 90° angle to the optical axis of the microscope and parallel to the table top; the lamp housing is attached to the back of the illuminator. The coarse and fine adjustment knobs raise or lower the stage in large or small increments to bring the specimen into sharp focus. Another variation of the upright reflected-light microscope is the inverted microscope, of the Le Chatelier
design (25). On the inverted stand, the specimen is placed on the stage, and its surface of interest faces downward. The primary advantage of this design is that samples can be easily examined when they are far too large to fit into the confines of an upright microscope. Also, only the side that faces the objectives need be perfectly flat. The objectives are mounted on a nosepiece under the stage, and their front lenses face upward toward the specimen. Focusing is accomplished either by moving the nosepiece or the entire stage up and down. Inverted microscope stands incorporate the vertical illuminator into the body of the microscope. Many types of objectives can be used with inverted reflected-light microscopes, and all modes of reflected light illumination may be possible: brightfield, dark-field, polarized light, differential interference contrast, and fluorescence. Many inverted microscopes have built-in 35-mm and/or large format cameras or are modular to allow attaching such accessories. Because most families of objectives progress in magnification by a factor of 2, some of the instruments include a magnification changer to provide intermediate magnifications. Contrast filters and a variety of reticles are also often built into reflected-light inverted microscopes. Because an inverted microscope is a favorite instrument for metallographers, it is often referred to as a metallograph (25,27). Manufacturers are largely changing to infinity-corrected optics in reflected light microscopes, but there are still thousands of fixed-tube-length microscopes in use whose objectives are corrected for a tube length between 160 and 210 mm. In the vertical illuminator, light travels from the light source, usually a 12-volt, 50- or 100-watt tungsten halogen lamp, passes through collector lenses, through the variable aperture iris diaphragm opening, and through the opening of a variable and centerable prefocused field iris diaphragm. The light then strikes a partially silvered plane glass reflector (Fig. 13) or strikes a fully silvered periphery of a mirror that has an elliptical opening for dark-field illumination. The plane glass reflector is partially silvered on the glass side that faces the light source and is antireflection coated on the glass side that faces the observation tube in bright-field reflected illumination. Thus, light is deflected downward into the objective. The mirrors are tilted at an angle of 45 ° degrees to the path of the light that travels along the vertical illuminator. The light reaches the specimen, which may absorb some of the light and reflect some of the light, either in a specular or diffuse manner. Light that is returned upward can be captured by the objective in accordance with the objective’s numerical aperture and then passes through the partially silvered mirror. In infinity-corrected objectives, the light emerges from the objective in parallel (from every azimuth) rays and projects an image of the specimen to infinity (25). The parallel rays enter the body tube lens, which forms the specimen image at the plane of the fixed-diaphragm opening in the eyepiece (intermediate image plane). It is important to note, that in these reflected-light systems, the objective serves a dual function: on the way down, as a matching wellcorrected condenser properly aligned; on the way up, as
Figure 13. Components of a modern microscope configured for both transmitted and reflected light. This cutaway diagram reveals the ray traces and lens components of the microscope’s optical trains. Also illustrated are the basic microscope components including two lamp houses, the microscope built-in vertical and base illuminators, condenser, objectives, eyepieces, filters, sliders, collector lenses, field, and aperture diaphragms.
an image-forming objective in the customary role of an objective that projects the image-carrying rays toward the eyepiece. Optimal performance is achieved in reflected-light illumination when the instrument is adjusted to produce K¨ohler illumination. A function of K¨ohler illumination (aside from providing uniform intensity across the field of view) is to ensure that the objective can deliver excellent resolution and good contrast, even if the source of light is a coil filament lamp. Some modern reflected-light illuminators are described as universal illuminators because, by using several additional accessories and little or no dismantling, the microscope can easily be switched from one mode of reflected-light microscopy to another. Often, reflectors can be removed from the light path altogether to perform transmitted-light observation. Such universal illuminators may include a partially reflecting plane glass surface (the half-mirror) for bright-field, and a fully silvered reflecting surface that has an elliptical, centrally located clear opening for dark-field observation. The best designed vertical illuminators include condensing lenses to gather and control the light, an aperture iris diaphragm,
and a prefocused, centerable iris diaphragm to permit desirable K¨ohler illumination (2,25). The vertical illuminator should also provide for inserting of filters for contrast and photomicrography, polarizers, analyzers, and compensator plates for polarized light and differential interference contrast illumination. In vertical illuminators designed for use with infinitycorrected objectives, the illuminator may also include a body tube lens. Affixed to the back end of the vertical illuminator is a lamp house, which usually contains a tungsten-halide lamp. For fluorescence work, the lamp house can be replaced by one that contains a mercury arc lamp. The lamp may be powered by the electronics built into the microscope stand, or in fluorescence, by an external transformer or power supply. In reflected-light microscopy, absorption and diffraction of the incident light rays by the specimen often lead to readily discernible variations in the image, from black through various shades of gray, or color if the specimen is colored. Such specimens, known as amplitude specimens, may not require special contrast methods or treatment to make their details visible. Other specimens show so little difference in intensity and/or color that their feature
details are extremely difficult to discern and distinguish in bright-field reflected-light microscopy. Such specimens behave much like the phase specimens so familiar in transmitted light work, and they require special treatment or contrast methods that are described in the next section. Contrast-Enhancing Techniques Some specimens are considered amplitude objects because they absorb light partially or completely and can thus be readily observed by using conventional bright-field microscopy. Others that are naturally colored or artificially stained by chemical color dyes can also be seen. These stains or natural colors absorb some part of the white light that passes through and transmit or reflect other colors. It is common practice to stain specimens that do not readily absorb light, thus rendering such objects visible to the eye. Often, stains are combined to yield contrasting colors, for example, blue haemotoxylin stain for cell nuclei combined with pink eosin for cytoplasm. Contrast produced by the absorption of light, brightness, or color has been the classical means of imaging specimens in bright-field microscopy. The ability of a detail to stand out against the background or other adjacent details is a measure of specimen contrast. In a simple formula, contrast can be described as (18) % contrast = [(BI − SI ) × 100]/BI ,
(4)
where BI is the intensity of the background and SI is the specimen intensity. From this equation, it is evident that specimen contrast refers to the relationship between the highest and lowest intensity in the image. For many specimens in microscopy, especially unstained or living material, contrast is so poor that the object remains essentially invisible regardless of the objective’s ability to resolve or clearly separate details. Often, for just such specimens, it is important not to alter them by killing or treatment with chemical dyes or fixatives. This necessity has led microscopists to experiment with contrast-enhancing techniques for more than one hundred years to improve specimen visibility and to bring more detail to the image. It is a common, but unwise, practice to reduce the condenser aperture diaphragm below the recommended size or to lower the substage condenser to increase specimen contrast. These maneuvers indeed increase contrast, but unfortunately, they also seriously reduce resolution and sharpness. An early and currently used method of increasing contrast of stained specimens uses color contrast filters, gelatin squares (from Kodak), or interference filters in the light path (18,19). For example, if a specimen is stained red, a green filter will darken the red areas and thus increase contrast. On the other hand, a green filter will lighten any green stained area. Color filters are very valuable aids to specimen contrast, especially when black-and-white photomicrography is the goal. Green filters are particularly valuable for use with achromats, which are spherically corrected for green light, and with phase-contrast objectives, whose design for manipulating wavelength assumes the use of green light, because phase specimens are usually transparent and lack inherent
color. Another simple technique for contrast improvement involves selecting a mounting medium whose refractive index is substantially different from that of the specimen. For example, diatoms can be mounted in a variety of contrast-enhancing media such as air whose refractive index is low, or the high refractive index medium Styrax. The difference in refractive indexes improves the contrast of these colorless objects and renders their outlines and markings more visible. The following sections describe many of the more complex techniques used by present-day microscopists to improve specimen contrast. DARK-FIELD MICROSCOPY Dark-field illumination requires blocking out the central light rays that ordinarily pass through or around the specimen and allowing only oblique rays to illuminate the specimen. This is a simple and popular method for imaging unstained specimens, which appear as brightly illuminated objects on a dark background. Oblique light rays emanating from a dark-field condenser strike the specimen from every azimuth and are diffracted, reflected, and refracted into the microscope objective (5,18,19,28). This technique is illustrated in Fig. 14. If no specimen is present and the numerical aperture of the condenser is greater than that of the
Figure 14. Schematic configuration for dark-field microscopy. The central opaque light stop is positioned beneath the condenser to eliminate zeroth-order illumination of the specimen. The condenser produces a hollow cone of illumination that strikes the specimen at oblique angles. Some of the reflected, refracted, and diffracted light from the specimen enters the objective front lens.
objective, the oblique rays cross and all such rays miss entering the objective because of the obliquity. The field of view will appear dark. In Fourier optics, dark-field illumination removes the zeroth order (unscattered light) from the diffraction pattern formed at the rear focal plane of the objective (5). Oblique rays, now diffracted by the specimen and yielding first, second, and higher diffracted orders at the rear focal plane of the objective, proceed onto the image plane where they interfere with one another to produce an image of the specimen. This results in an image formed exclusively from higher order diffraction intensities scattered by the specimen. Specimens that have smooth reflective surfaces produce images due, in part, to the reflection of light into the objective (18,19). When the refractive index is different from that of the surrounding medium or when refractive index gradients occur (as in the edge of a membrane), light is refracted by the specimen. Both instances of reflection and refraction produce relatively small angular changes in the direction of light and allow some to enter the objective. In contrast, some light that strikes the specimen is also diffracted and produces a 180 ° arc of light that passes through the entire numerical aperture range of the objective. The resolving power of the objective is the same in dark-field illumination as that in bright-field conditions, but the optical character of the image is not as faithfully reproduced. Several pieces of equipment are used to produce darkfield illumination. The simplest is a spider stop (Fig. 14) placed just under the bottom lens (in the front focal plane) of the substage condenser (5,18,19,28). Both the aperture and field diaphragms are opened wide to pass oblique rays. The central opaque stop (you can make one by mounting a coin on a clear glass disk) blocks out the central rays. This device works fairly well, even with the Abbe condenser, when using the 10X objective up to 40X or higher objectives that have a numerical aperture no higher than 0.65. The diameter of the opaque stop should be approximately 16–18 mm for a 10X objective of numerical aperture 0.25 to approximately 20–24 mm for 20X and 40X objectives of numerical apertures that approach 0.65. The stop should be placed as close as possible to the aperture diaphragm. For more precise work and blacker backgrounds, use a condenser designed especially for dark-field, that is, to transmit only oblique rays (28,29). There are several varieties, including dry dark-field condensers that have air between the top of the condenser and the underside of the slide. Immersion dark-field condensers are also available. These require using a drop of immersion oil (some are designed to use water instead) that establishes contact between the top of the condenser and the underside of the specimen slide. The immersion dark-field condenser has internal mirrored surfaces and passes rays of great obliquity free of chromatic aberration that produce the best results and blackest background. Dark-field objects are quite spectacular to see, and objects of very low contrast in bright-field shine brilliantly in dark-field. Such illumination is best for revealing outlines, edges, and boundaries. A high-magnification dark-field image of a diatom is illustrated in Fig. 15.
Figure 15. Dark-field photomicrograph of the diatom Arachnoidiscus ehrenbergi taken at high magnification using oil immersion optics and a 100X objective.
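Two practical numbers govern the simple dark-field setups described above: the condenser numerical aperture must exceed that of the objective, and the opaque stop must be sized for the objective in use. The sketch below is only an illustration of the stop-diameter guideline quoted in the text (approximately 16–18 mm for a 10X/0.25 objective, and 20–24 mm for 20X and 40X objectives up to about NA 0.65); the function names are hypothetical, and real stop diameters depend on the particular condenser.

    def darkfield_possible(condenser_na, objective_na):
        """Oblique-only illumination requires the condenser NA to exceed the objective NA."""
        return condenser_na > objective_na

    def suggested_stop_diameter_mm(objective_na):
        """Opaque-stop diameter ranges quoted in the text for a spider stop under an Abbe condenser."""
        if objective_na <= 0.25:
            return (16, 18)
        if objective_na <= 0.65:
            return (20, 24)
        raise ValueError("above NA 0.65, use a dedicated dark-field condenser")

    print(darkfield_possible(condenser_na=0.90, objective_na=0.65))   # True
    print(suggested_stop_diameter_mm(0.25))                           # (16, 18)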
Rheinberg Illumination Rheinberg illumination, a form of optical staining, was initially demonstrated by the British microscopist Julius Rheinberg to the Royal Microscopical Society and the Quekett Microscopical Club (England) more than one hundred years ago (18,28). This technique is a striking variation of low to medium power dark-field illumination using colored gelatin or glass filters to provide rich color to both the specimen and background. The central opaque dark-field stop is replaced by a transparent, colored, circular stop inserted into a transparent ring of a contrasting color (illustrated in Fig. 16). These stops are placed under the bottom lens of the condenser. The result is a specimen rendered in the color of the ring, against a background that has the color of the central spot. A photomicrograph of the spiracle and trachea of a silkworm larva taken by Rheinberg illumination is shown in Fig. 17. PHASE-CONTRAST MICROSCOPY Research by Frits Zernike during the early 1930s uncovered phase and amplitude differences between zeroth-order and deviated light that can be altered to produce favorable conditions for interference and contrast
Figure 16. Schematic microscope configuration for Rheinberg illumination. Light passes through the central/annular filter pack before entering the condenser. Zeroth-order light from the central filter pervades the specimen and illuminates it with higher order light from the annular filter. The filter colors in this diagram are a green central and red annular.
Figure 17. Spiracle and trachea from silkworm larva photographed at low magnification under Rheinberg illumination using a blue central and yellow annulus filter combination and a 2X objective.
enhancement (30,31). Unstained specimens that do not absorb light are called phase objects because they slightly alter the phase of the light diffracted by the specimen, usually by retarding such light approximately one-quarter
wavelength compared to the undeviated direct light that passes unaffected through or around the specimen. Unfortunately, our eyes as well as camera film cannot detect these phase differences. To reiterate, the human eye is sensitive only to the colors of the visible spectrum or to differing levels of light intensity (related to wave amplitude). The direct zeroth-order light in phase specimens passes undeviated through or around the specimen. However, the light diffracted by the specimen is not reduced in amplitude as it is in a light-absorbing object, but is slowed by the specimen because of the specimen’s refractive index or thickness (or both). This diffracted light that lags behind by approximately one-quarter wavelength arrives at the image plane out of phase with the undeviated light but, in interference, essentially undiminished in intensity. The result is that the image at the eyepiece level is so lacking in contrast as to make the details almost invisible. Zernike devised a method, now known as phase contrast microscopy, for making unstained, phase objects yield contrast images as if they were amplitude objects. Amplitude objects show excellent contrast when the diffracted and direct light are out of phase by one-half wavelength (18). Zernike’s method was to speed up the direct light by one-quarter wavelength so that the phase difference between the direct and deviated light for a phase specimen would now be one-half wavelength. As a result, the direct and diffracted light arriving at the image level of the eyepiece could produce destructive interference (see the section on Image formation for absorbing objects previously described). In such a procedure, the details of the image appear darker against a lighter background. This is called dark or positive phase contrast. A schematic illustration of the basic phase contrast microscope configuration is shown in Fig. 18. Another possible course, much less often used, is to arrange to slow down the direct light by one-quarter wavelength, so that the diffracted light and the direct light arrive at the eyepiece in phase and can interfere constructively (2,5,18). This arrangement, called negative or bright contrast, results in a bright image of the details of the specimen on a darker background. Phase contrast involves separating the direct zeroth-order light from the diffracted light at the rear focal plane of the objective. To do this, a ring annulus is positioned directly under the lower lens of the condenser at the front focal plane of the condenser, conjugate to the objective rear focal plane. As the hollow cone of light from the annulus passes undeviated through the specimen, it arrives at the rear focal plane of the objective in the shape of a ring of light. The fainter light diffracted by the specimen is spread over the rear focal plane of the objective. If this combination were allowed, as is, to proceed to the image plane of the eyepiece, the diffracted light would be approximately one-quarter wavelength behind the direct light. At the image plane, the phase of the diffracted light would be out of phase with the direct light, but the amplitude of their interference would be almost the same as that of the direct light (5,18). This would result in very little specimen contrast.
To speed up the direct undeviated zeroth-order light, a phase plate is installed that has a ring-shaped phase shifter attached to it at the rear focal plane of the objective. The narrow area of the phase ring is optically thinner than the rest of the plate. As a result, undeviated light that passes through the phase ring travels a shorter distance in traversing the glass of the objective than the diffracted light. Now, when the direct undeviated light and the diffracted light proceed to the image plane, they are one-half wavelength out of phase with each other. The diffracted and direct light can now interfere destructively so that the details of the specimen appear dark against a lighter background (just as they do for an absorbing or amplitude specimen). This is a description of what takes place in positive or dark phase contrast. If the ring phase-shifter area of the phase plate is made optically thicker than the rest of the plate, direct light is slowed by one-quarter wavelength. In this case, the zeroth-order light arrives at the image plane in phase with the diffracted light, and constructive interference takes place. The image appears bright on a darker background. This type of phase contrast is described as negative or bright contrast (2,5,18,19). Because undeviated zeroth-order light is much brighter than faint diffracted light, a thin absorptive transparent metallic layer is deposited on the phase ring to bring the direct and diffracted light into better intensity balance to increase contrast. Because speeding up or slowing down the direct light is calculated on one-quarter wavelength of green light, the phase image will also appear best when a green filter is placed in the light path (a green interference filter is preferable). Such a green filter also helps achromatic objectives produce their best images because achromats are spherically corrected for green light. The accessories needed for phase-contrast work are a substage phase-contrast condenser equipped with annuli and a set of phase-contrast objectives, each of which has a phase plate installed. The condenser usually has a bright-field position that has an aperture diaphragm and a rotating turret of annuli (each phase objective of different magnification requires an annulus of increasing diameter as the magnification of the objective increases). Each phase objective has a darkened ring on its back lens. Such objectives can also be used for ordinary bright-field transmitted light work with only a slight reduction in image quality. A small phase telescope or a Bertrand lens is used to facilitate the centering of these rings in the optical pathway. A photomicrograph of hair cross sections from a fetal mouse taken by phase-contrast illumination is shown in Fig. 19.
Figure 18. Schematic configuration of phase-contrast microscopy. Light that passes through the phase ring is first concentrated onto the specimen by the condenser. Undeviated and out-of-phase light enters the objective and is retarded by the phase plate before interference at the rear focal plane of the objective.
Figure 19. Photomicrograph of a hair thin section from a fetal mouse taken by using phase-contrast optics and a 20X objective.
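The quarter-wave argument above can be made concrete with a short numerical sketch. The amplitudes below are assumed purely for illustration (unit-amplitude direct light and much weaker diffracted light); the code adds the two waves as complex phasors and compares the image-plane intensity when the diffracted light lags by a quarter wavelength (ordinary bright-field, little contrast) and by a half wavelength (positive phase contrast, dark detail).

    import cmath

    def image_intensity(direct_amp, diffracted_amp, phase_lag_waves):
        """Intensity of the summed direct and diffracted waves for a given lag (in wavelengths)."""
        direct = complex(direct_amp, 0.0)
        diffracted = diffracted_amp * cmath.exp(-2j * cmath.pi * phase_lag_waves)
        return abs(direct + diffracted) ** 2

    background = image_intensity(1.0, 0.0, 0.0)   # no diffracted light at all
    quarter = image_intensity(1.0, 0.2, 0.25)     # bright-field: nearly the background intensity
    half = image_intensity(1.0, 0.2, 0.50)        # total half-wave difference, as after the phase plate
    print(round(background, 3), round(quarter, 3), round(half, 3))   # 1.0, 1.04, 0.64

A quarter-wave lag leaves the detail almost indistinguishable from the background, while the additional quarter wave introduced by the phase plate produces a clearly darker detail, which is the essence of positive phase contrast.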
POLARIZED LIGHT Many transparent solids are optically isotropic, meaning that the index of refraction is equal in all directions throughout the crystalline lattice. Examples of isotropic solids are glass, table salt (sodium chloride), many polymers, and a wide variety of organic and inorganic compounds (32,33). Crystals are classified as either isotropic or anisotropic, depending upon their optical behavior and whether their crystallographic axes are equivalent. All isotropic crystals have equivalent axes that interact similarly with light, regardless of the crystal orientation with respect to incident light waves. Light that enters an isotropic crystal is refracted at a constant angle and passes through the crystal at a single velocity without being polarized by interaction with the electronic components of the crystalline lattice. Anisotropic crystals, on the other hand, have crystallographically distinct axes and interact with light in a manner that depends on the orientation of the crystalline lattice with respect to the incident light. When light enters an anisotropic crystal along its optical axis, it behaves as it does in an isotropic crystal and passes through at a single velocity. However, when light enters along a nonequivalent axis, it is refracted into two rays; each is polarized so that its vibrational direction is oriented at right angles to that of the other, and the two rays travel at different velocities. This phenomenon is termed double refraction or birefringence and is seen to a greater or lesser degree in all anisotropic crystals (32–34). When anisotropic crystals refract light, the resulting rays are polarized and travel at different velocities. One of the rays, termed the ordinary ray, travels with the same velocity in every direction through the crystal. The other ray travels with a velocity that depends on the propagative direction within the crystal. This light ray is termed the extraordinary ray. The retardation between the ordinary and extraordinary ray increases as crystal thickness increases. The two independent refractive indexes of anisotropic crystals are quantified in terms of their birefringence, a measure of the difference in refractive index. Thus, the birefringence B of a crystal is defined as B = |nhigh − nlow|, (5) where nhigh is the largest refractive index and nlow is the smallest. This expression holds true for any part or fragment of an anisotropic crystal except for light waves propagated along the optical axis of the crystal. As mentioned before, light that is doubly refracted through anisotropic crystals is polarized, so that the vibrational directions of the polarized ordinary and extraordinary light waves are oriented perpendicular to each other. Now, we can examine how anisotropic crystals behave under polarized illumination in a polarizing microscope. A polarizer placed beneath the substage condenser is oriented so that polarized light that exits the polarizer is plane polarized in a vibrational direction that is east–west with respect to the optic axis of the microscope stand. Polarized light enters the anisotropic crystal
Figure 20. Schematic microscope configuration for observing birefringent specimens under crossed polarized illumination. Light that passes through the polarizer is plane polarized and concentrated onto the birefringent specimen by the condenser. Light rays that emerge from the specimen interfere when they are recombined in the analyzer and produce a myriad of tones and colors.
where it is refracted and divided into two separate components that vibrate parallel to the crystallographic axes and perpendicular to each other. Then, the polarized light waves pass through the specimen and objective before reaching a second polarizer (usually termed the analyzer) that is oriented to pass a polarized vibration perpendicular to that of the substage polarizer. Therefore, the analyzer passes only those components of the light waves that are parallel to the polarization direction of the analyzer. The retardation of one ray with respect to another is caused by the difference in speed between the ordinary and extraordinary rays refracted by the anisotropic crystal (33,34). A schematic illustration of a microscope configuration for crossed polarized illumination is presented in Fig. 20. Now we will consider the phase relationship and velocity differences between the ordinary and extraordinary rays after they pass through a birefringent crystal. These
rays are oriented so that they vibrate at right angles to each other. Each ray encounters a slightly different electrical environment (refractive index) as it enters the crystal, and this affects the velocity at which the ray passes through the crystal (32,33). Because of the difference in refractive indexes, one ray will pass through the crystal at a rate slower than the other ray. In other words, the velocity of the slower ray will be retarded with respect to the faster ray. This retardation can be quantified by the following equation: Retardation (Γ) = thickness (t) × birefringence (B), (6) or Γ = t × |nhigh − nlow|,
(7)
where Γ is the quantitative retardation of the material, t is the thickness of the birefringent crystal (or material), and B is birefringence as defined before (33). Factors that contribute to the retardation value are the magnitude of the difference in refractive indexes for the environments seen by the ordinary and extraordinary rays and also the sample thickness. Obviously, the greater the difference in either refractive indexes or thickness, the greater the degree of retardation. Early observations made on the mineral calcite indicate that thicker calcite crystals cause greater differences in splitting of the images seen through the crystals. This agrees with Eq. (7), which states that retardation increases with crystal (or sample) thickness. When the ordinary and extraordinary rays emerge from the birefringent crystal, they are still vibrating at
Figure 21. Photomicrograph of high-density columnar-hexatic liquid crystalline calf thymus DNA at a concentration of approximately 450 mg/mL. This concentration of DNA approaches that found in sperm heads, virus capsids, and dinoflagellate chromosomes. The image was recorded using a polarized light microscope and the 10X objective.
right angles with respect to one another. However, the component vectors of these waves that pass through the analyzer vibrate in the same plane. Because one wave is retarded with respect to the other, interference (either constructive or destructive) occurs between the waves as they pass through the analyzer. The net result is that some birefringent samples (in white light) acquire a spectrum of color when observed through crossed polarizers. Polarized light microscopy requires strain-free objectives and condensers to avoid depolarizing the transmitted light (32–34). Most polarized microscopes are equipped with a centerable stage that has free 360 ° rotation about the optical axis of the microscope. A Bertrand lens is commonly inserted into the light path so that conoscopic images of birefringent patterns can be observed at the back of the objective. Manufacturers also offer a wide range of compensators and light retarders (full-wave, quarter-wave, and continuous wedge plates) for quantitative measurements of birefringence and for adding color to polarized light photomicrographs. Skilled use of these compensators allows precise chemical identification of minute crystals. Birefringent DNA liquid crystals photographed by crossed polarized illumination are illustrated in Fig. 21. HOFFMAN MODULATION CONTRAST The Hoffman modulation-contrast system is designed to increase visibility and contrast in unstained and living material by detecting optical gradients (or slopes) and converting them into variations of light intensity. This ingenious technique, invented by Dr. Robert Hoffman in 1975, employs several accessories that have been adapted to major commercial microscopes (16,18). A schematic illustration of microscope configuration for Hoffman modulation contrast is presented in Fig. 22. An optical amplitude spatial filter, termed a modulator by Hoffman, is inserted at the rear focal plane of an achromat or planachromat objective (although higher correction can also be used). Light intensity that passes through this system varies above and below an average value, which by definition, is then said to be modulated. Objectives useful for modulation contrast can cover the entire magnification range of 10–100X. The modulator has three zones: a small, dark zone near the periphery of the rear focal plane that transmits only 1% of light; a narrow gray zone that transmits 15%; and the remaining clear or transparent zone that covers most of the area at the back of the objective and transmits 100% of the light (5,16,18). Unlike the phase plate in phase-contrast microscopy, the Hoffman modulator does not alter the phase of the light that passes through any of the zones. When viewed by modulation-contrast optics, transparent objects that are essentially invisible in ordinary brightfield microscopy take on an apparent three-dimensional appearance dictated by phase gradients. The modulator does not introduce changes in the phase relationship of light that passes through the system, but it influences the principal zeroth-order maxima. Higher order diffraction maxima are unaffected. Measurements by a Michelson interferometer confirm that the spatial coherency of light
Figure 22. Schematic illustration of microscope configuration for Hoffman modulation contrast. Light that passes through the polarizer and slit is focused onto the specimen by the condenser. After passing through the specimen, light enters the objective and is filtered by the modulator plate in the rear focal plane of the objective.
passed through a Hoffman-style modulator varies, if at all, by less than λ/20 (5). Below the stage, a condenser equipped with a rotating turret is used to hold the remaining components of the Hoffman modulation-contrast system. The turret condenser has a bright-field opening that has an aperture iris diaphragm for regular bright-field microscopy and for aligning and establishing proper conditions of Köhler illumination for the microscope. At each of the other turret openings, there is an off-center slit that is partially covered by a small rectangular polarizer. The size of the slit/polarizer combination differs for each objective of different magnification; hence the need for a turret arrangement. When light passes through the off-axis slit, it is imaged at the rear focal plane of the objective (also termed the Fourier plane), where the modulator has been installed. Like the central annulus and phase ring in phase-contrast microscopy, the front focal plane of the condenser that contains the off-axis slit plate is optically conjugate to the modulator in the objective rear focal plane. Image intensity is proportional to the first derivative of the optical density in the specimen and is controlled by the zeroth order of the phase gradient diffraction pattern. Below the condenser, a circular polarizer is placed on the light exit port of the microscope (note that both polarizers are below the specimen). The rotation of this
The rotation of this polarizer can control the effective width of the slit opening. For example, crossing both polarizers at 90° to each other results in narrowing the slit, so that its image falls within the gray area of the modulator (16,18). The part of the slit controlled by the polarizer registers on the bright area of the modulator. As the polarizer is rotated, contrast can be varied for best effect. A very narrow slit produces images that have very high contrast and a high degree of coherence. Optical section imaging is also optimized when the slit is adjusted to its narrowest position. When the circular polarizer is oriented so that its vibrational direction is parallel to that of the polarizer in the slit, the effective slit width is at a maximum. This reduces overall image contrast and coherence but yields much better images of thicker objects where large differences in refractive index exist. In modern advanced modulation-contrast systems, both the modulator and the slit are offset from the optical axis of the microscope (18). This arrangement permits fuller use of the numerical aperture of the objective and results in good resolution of specimen detail. Shapes and details are rendered in shadowed, pseudo-three-dimensional appearance. These appear brighter on one side, gray in the central portion, and darker on the other side, against a gray background. The modulator converts optical phase gradients in details (steepness, slope, rate of change in refractive index, or thickness) into changes in the intensity of various areas of the image at the plane of the eyepiece diaphragm. Resulting images have an apparent three-dimensional appearance and directional sensitivity to optical gradients. The contrast (related to variations in intensity) of the dark and bright areas against the gray gives a shadowed pseudorelief effect. This is typical of modulation-contrast imaging. Rotation of the polarizer alters the contrast achieved, and the orientation of the specimen on the stage (with respect to the polarizer and offset slit) may dramatically improve or degrade contrast. There are numerous advantages as well as limitations to modulation contrast. Some of the advantages include fuller use of the numerical aperture of the objective that yields excellent resolution of details along with good specimen contrast and visibility. Although many standard modulation-contrast objectives are achromats or planachromats, it is also possible to use objectives that have a higher degree of correction for optical aberration. Many major microscope manufacturers now offer modulation-contrast objectives in fluorite-correction grades, and apochromats can be obtained by special order. Older objectives can often be retrofitted with a modulator made by Modulation Optics, Inc., the company founded by Dr. Robert Hoffman specifically to build aftermarket and custom systems. In addition to the advantage of the higher numerical apertures usable with modulation contrast, it is also possible to do optical sectioning by this technique (16,18). Sectioning allows the microscopist to focus on a single thin plane of the specimen without interference from confusing images that arise in areas above or below the plane that is focused on. The depth of a specimen is measured in a direction parallel to the optical axis of the
microscope. Focusing the image establishes the correct specimen-to-image distance and allows interference of the diffracted waves at a predetermined plane (the image plane) positioned at a fixed distance from the eyepiece. This enables diffracting objects that occur at different depth levels in the specimen to be viewed separately, provided there is sufficient contrast. The entire depth of a specimen can be optically sectioned by sequentially focusing on each succeeding plane. In this system, depth of field is defined as the distance from one level to the next where imaging of distinct detail occurs, and is controlled by the numerical aperture of the objective (5,16,18). Higher numerical aperture objectives exhibit very shallow depths of field, and the opposite holds for objectives of lower numerical aperture. The overall capability of an objective to isolate and focus on a specific optical section diminishes as the optical homogeneity of the specimen decreases. Birefringent objects (rock thin sections, crystals, bone, etc.) that can confuse images in DIC (see next section) can be examined, because the specimen is not situated between the two polarizers (18). Further, specimens can be contained in plastic or glass vessels without deterioration of the image due to polarization effects, because such vessels are also above both polarizers, not between them. This makes the Hoffman system far more useful than DIC in examining and photomicrographing cell, tissue, and organ culture in plastic containers. Hoffman modulation contrast can be simultaneously combined with polarized light microscopy to achieve spectacular effects in photomicrography. Figure 23 illustrates a thin section of a dinosaur bone photographed by using a combination of Hoffman modulation-contrast and plane-polarized illumination.
Figure 23. Thin section of a duck-billed dinosaur bone photographed by using a combination of Hoffman modulation-contrast and polarized-light illumination using a 20X objective. Note the pseudo-three-dimensional relief throughout the photomicrograph.
Note the pseudo-three-dimensional relief apparent throughout the micrograph and the beautiful coloration provided by polarized light.

DIFFERENTIAL INTERFERENCE CONTRAST

In the mid-1950s, the French optics theoretician Georges Nomarski improved the method for detecting optical gradients in specimens and converting them into intensity differences (15). Today, there are several implementations of this design, which are collectively called differential interference contrast (DIC). Living or stained specimens, which often yield poor images when viewed in bright-field illumination, are made clearly visible by optical rather than chemical means (2,4,5,15,18–22). In transmitted-light DIC, light from the lamp is passed through a polarizer located beneath the substage condenser, similarly to polarized-light microscopy. Next in the light path (but still beneath the condenser) is a modified Wollaston prism that splits the entering beam of polarized light into two beams that travel in slightly different directions (illustrated in Fig. 24). The prism is composed of two halves cemented together (36). Emerging light rays vibrate at 90° relative to each other and have a slight path difference. A different prism is needed for each objective of different magnification. A revolving turret on the condenser allows the microscopist to rotate the appropriate prism into the light path when changing magnifications. The plane-polarized light that vibrates only in one direction perpendicular to the propagation direction of the light beam enters the beam-splitting modified Wollaston prism and is split into two rays that vibrate perpendicular to each other (Fig. 24). These two rays travel close together, but in slightly different directions. The rays intersect at the front focal plane of the condenser, beyond which they travel parallel and extremely close together with a slight path difference, but because they vibrate perpendicular to each other they cannot interfere. The distance between the rays, called the shear, is so minute that it is less than the resolving ability of the objective (5,18,36). The split beams enter and pass through the specimen where their wave paths are altered in accordance with the specimen's varying thickness, slopes, and refractive indexes. These variations alter the wave path of both beams that pass through areas of the specimen that lie close together. When the parallel beams enter the objective, they are focused above the rear focal plane where they enter a second modified Wollaston prism that combines the two beams. This removes the shear and the original path difference between the beam pairs. As a result of traversing the specimen, the paths of the parallel beams do not have the same length (optical path difference) for different areas of the specimen. For the beams to interfere, the vibrations of the beams of different path length must be brought into the same plane and axis by placing a second polarizer (analyzer) above the upper Wollaston beam-combining prism. Then, the light proceeds toward the eyepiece where it can be observed as differences in intensity and color.
Figure 24. Schematic illustration of microscope configuration for differential interference contrast. Light is polarized in a single vibrational plane by the polarizer before it enters the lower Wollaston prism that acts as a beam splitter. Next, the light passes through the condenser and sample before the image is reconstructed by the objective. The second Wollaston prism at the rear focal plane of the objective acts as a beam combiner and passes the light to the analyzer, where it interferes both constructively and destructively.
The design makes one side of a detail appear bright (or possibly in color) whereas the other side appears darker (or another color). This shadow effect gives the specimen a pseudo-three-dimensional appearance (18). In some microscopes, the upper modified Wollaston prism is combined in a single fitting, and the analyzer is incorporated above it. The upper prism may also be arranged so that it can be moved horizontally. Moving the prism varies the optical path difference and provides a mechanism for altering the brightness and color of the background and specimen. Because of the prism design and placements, the background will be homogeneous for whichever color has been selected. The color and/or light intensity effects shown in the image are related especially to the rate of change in refractive index, specimen thickness, or both.
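The shadow-cast character of DIC can likewise be illustrated numerically. The toy simulation below is not taken from the article or from any instrument's software; it treats the recorded intensity as idealized two-beam interference between a specimen optical-path map and a copy of itself sheared by one pixel, plus a bias introduced by translating the upper prism. The specimen, shear, bias, and wavelength values are invented for illustration.

```python
import numpy as np

# Toy DIC simulation: intensity depends on the *difference* in optical path
# between closely spaced (sheared) points plus a prism-set bias.  All values
# below are arbitrary illustrative assumptions.

wavelength = 550e-9                     # green light, metres
bias = wavelength / 8.0                 # path-difference offset from the prism
shear_pixels = 1                        # shear kept below the resolution limit

y, x = np.mgrid[0:256, 0:256]
opd = 500e-9 * np.exp(-(((x - 128) ** 2 + (y - 128) ** 2) / (2 * 12.0 ** 2)))

# Path difference between the two sheared beams (differential along x) + bias.
delta = np.roll(opd, -shear_pixels, axis=1) - opd + bias

# Idealized two-beam interference between crossed polarizers: intensity varies
# as sin^2 of half the phase difference.
intensity = np.sin(np.pi * delta / wavelength) ** 2

# One flank of the "cell" renders brighter and the other darker than the
# uniform background set by the bias alone, giving the pseudo-relief effect.
background = np.sin(np.pi * bias / wavelength) ** 2
print(float(background), float(intensity.min()), float(intensity.max()))
```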
Orientation of the specimen can have a pronounced effect on the relief-like appearance; often, rotating the specimen by 180° changes a hill into a valley or vice versa. The three-dimensional appearance does not represent the true geometry of the specimen, but it is an exaggeration based on optical thickness. It is not suitable for accurate measurement of actual heights and depths. There are numerous advantages of DIC microscopy compared in particular to phase-contrast and Hoffman modulation-contrast microscopy. In DIC, it is also possible to use the numerical aperture of the system more fully and to provide optical staining (color). DIC also allows the microscope to achieve excellent resolution. The use of the full objective aperture enables the microscopist to focus on a thin plane section of a thick specimen without confusing images from above or below the plane. Annoying halos, often encountered in phase-contrast, are absent in DIC images, and common achromat and fluorite objectives can be used for this work. A downside is that plastic tissue culture vessels and other birefringent specimens yield confusing images in DIC. High-quality apochromatic objectives are now designed so that they are suitable for DIC. Figure 25 presents transmitted and reflected light DIC photomicrographs: Figure 25a shows mouthparts of a blowfly (transmitted DIC), and Figure 25b shows surface defects in a ferrosilicate alloy (reflected DIC). Both photomicrographs were made by using a retardation plate and a 10X objective.

FLUORESCENCE MICROSCOPY

Fluorescence microscopy is an excellent tool for studying material which can be made to fluoresce, either in its natural form (primary or autofluorescence) or when treated with chemicals that fluoresce (secondary fluorescence). This form of optical microscopy is rapidly reaching maturity and is now one of the fastest growing areas of microscopic investigation (6–8). The basic task of the fluorescence microscope is to permit exciting light to irradiate the specimen and then to separate the much weaker reradiated fluorescent light from the brighter exciting light. Thus, only the emitted light reaches the eye or other detector. The resulting fluorescing areas shine against a dark background and have sufficient contrast to permit detection. The darker the background of the nonfluorescing material, the more efficient the instrument. For example, ultraviolet (UV) light of a specific wavelength or set of wavelengths is produced by passing light from a UV-emitting source through the exciter filter. The filtered UV light illuminates the specimen, which emits fluorescent light of longer wavelengths while illuminated with ultraviolet light. Visible light emitted from the specimen is then filtered through a barrier filter that does not allow reflected UV light to pass. It should be noted that this is the only mode of microscopy in which the specimen gives off its own light subsequent to excitation (6). The emitted light reradiates spherically in all directions, regardless of the direction of the exciting light. A schematic diagram of the configuration of fluorescence microscopy is presented in Fig. 26.
Figure 25. (a) Nomarski transmitted light differential interference contrast (DIC) photomicrograph of mouthparts from a blowfly. The image was recorded using a retardation plate and the 10x objective. (b) Reflected light differential interference contrast photomicrograph of defects on the surface of a ferro silicon alloy. The image was captured by using a 10X objective and a first-order retardation plate.
Figure 26. Schematic diagram of the configuration of reflected-light fluorescence microscopy. Light emitted from a mercury arc lamp is concentrated by the collector lens before it passes through the aperture and field diaphragms. The exciter filter passes only the desired wavelengths, which are reflected down through the objective to illuminate the specimen. Fluorescence emitted by the specimen passes back through the objective and dichroic mirror before it is filtered by the barrier emission filter.
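The logic of the filter combination shown in Fig. 26 can be summarized in a short consistency check. The sketch below is illustrative only and is not part of the original article: the band limits and the fluorescein-like excitation and emission peaks are hypothetical values chosen for the example, not specifications of any commercial filter cube.

```python
# Sketch of the epi-fluorescence filter-cube logic of Fig. 26: the exciter
# passband must overlap the fluorophore's excitation band, the dichroic edge
# must sit between excitation and emission, and the barrier (emission) filter
# must reject the reflected excitation light.  All numbers are hypothetical
# nanometre values for a fluorescein-like dye.

def filter_cube_ok(exciter, dichroic_edge_nm, barrier, ex_peak_nm, em_peak_nm):
    """Return True if a (lo, hi) exciter band, a dichroic edge, and a (lo, hi)
    barrier band are mutually consistent for the given fluorophore peaks."""
    ex_lo, ex_hi = exciter
    bar_lo, bar_hi = barrier
    passes_excitation = ex_lo <= ex_peak_nm <= ex_hi
    dichroic_separates = ex_hi < dichroic_edge_nm < bar_lo
    passes_emission = bar_lo <= em_peak_nm <= bar_hi
    blocks_excitation = bar_lo > ex_hi   # reflected excitation cannot reach the detector
    return passes_excitation and dichroic_separates and passes_emission and blocks_excitation

# Hypothetical "blue excitation / green emission" cube:
print(filter_cube_ok(exciter=(450, 490), dichroic_edge_nm=495,
                     barrier=(500, 550), ex_peak_nm=488, em_peak_nm=520))  # True
```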
Fluorescence microscopy has advantages based upon attributes not as readily available in other optical microscopic techniques. The use of fluorochromes has made it possible to identify, with a high degree of specificity, cells and submicroscopic cellular components and entities amidst nonfluorescing material. What is more, the fluorescence microscope can reveal the presence of fluorescing material with exquisite sensitivity. An extremely small number of fluorescing molecules (as few as 50 molecules per cubic micron) can be detected (6–8). By multiple staining of a given sample, different probes will reveal the presence of different target molecules. Although the fluorescence microscope cannot provide spatial resolution below the diffraction limit of the respective objectives, the presence of fluorescing molecules below such limits is made visible. Techniques of fluorescence microscopy can be applied to organic material, formerly living material, or to living material (by using in vitro or in vivo fluorochromes).
These techniques can also be applied to inorganic material (especially when investigating contaminants on semiconductor wafers). There are also a burgeoning number of studies using fluorescent probes to monitor rapidly changing physiological ion concentrations (calcium, magnesium, etc.) and pH values in living cells (1,7,8). Some specimens fluoresce without the addition of dyes when irradiated with shorter wavelength light (primary or autofluorescence). Autofluorescence has been found useful in plant studies, coal petrography, sedimentary rock petrology, and in the semiconductor industry. In the study of animal tissues or pathogens, autofluorescence is often either extremely faint or has such nonspecificity as to make it of minimal use. Of far greater value for such specimens are the fluorochromes (also called fluorophores) which are excited by irradiating light and whose eventual
yield of emitted light is of greater intensity. This is called secondary fluorescence. Fluorochromes are stains, somewhat similar to the better known tissue stains, which attach themselves to visible or subvisible organic matter. These fluorochromes that can absorb and then reradiate light are often highly specific in their attachment targeting and have significant yields in absorption–emission ratios. This makes them extremely valuable in biological applications. The growth in the use of fluorescence microscopes is closely linked to the development of hundreds of fluorochromes that have known intensity curves of excitation and emission and well-understood biological structure targets (6). When deciding which label to use for fluorescence microscopy, bear in mind that the fluorochrome chosen should have a high likelihood of absorbing the exciting light and should remain attached to the target molecules. The fluorochrome should also provide a satisfactory yield of emitted fluorescent light. Figure 27 shows a photomicrograph of cat brain cells infected with Cryptococcus. The image was made by using a combination of fluorescence and DIC microscopy, taking advantage of the features of both contrast-enhancing techniques. Infected neurons are heavily stained with the DNA-specific fluorescent dye acridine orange, and the entire image is rendered in a pseudo-three-dimensional effect by DIC.

Figure 27. Fluorescence/DIC combination photomicrograph of cat brain tissue infected with Cryptococcus. Infected neurons are heavily stained with the DNA-specific fluorescent dye acridine orange. Note the pseudo-three-dimensional effect that occurs when these two techniques are combined. The micrograph was recorded by using a 40X objective.

PHOTOMICROGRAPHY AND DIGITAL IMAGING

The use of photography to capture images in a microscope dates back to the invention of the photographic process (37). Early photomicrographs had remarkable quality, but the techniques were laborious and burdened with long exposures and a difficult process for developing emulsion plates. The primary medium for photomicrography was film until the past decade, when improvements in electronic camera and computer technology made digital imaging cheaper and easier to use than conventional photography. This section will address photomicrography on film and by electronic analog and digital imaging systems. The quality of a photomicrograph, either digital or recorded on film, depends on the quality of the microscopy. Film is a stern judge of how good the microscopy has been before it captures the image. It is essential to configure the microscope using Köhler illumination, to adjust the field and condenser diaphragms correctly, and to optimize the condenser height. When properly adjusted, the microscope yields images that have even illumination across the entire field of view and displays the best compromise of contrast and resolution.

Photographic Films
Films for photography are coated with a light-sensitive emulsion of silver salts and/or dyes. When light is allowed to expose the emulsion, active centers combine to form a latent image that must be developed by photographic chemicals (19,37–39). This requires exposing the film in a darkened container to a series of solutions where temperature, development time, and appropriate agitation or mixing of the solutions must be controlled. The developing process must then be halted by a stop solution. Next, the unexposed emulsion, which consists of silver salts and dyes, is cleared, and the film is fixed, then washed, and dried for use. The development, stop, fixing, and clearing must be done in a darkroom or in lighttight developing tanks, and film must be handled in complete darkness. The temperature, duration, and agitation usually depend on the film used. For example, the Kodachrome K14 process is far more demanding than the E6 process used for Ektachrome and Fujichrome (19,37). The emulsion speed determines how much light must be used to expose the film in a given time period. Films are rated according to their ASA or ISO number, which gives an indication of the relative film speed. Larger ISO numbers indicate faster films; a film of ISO 25 is one of the slowest available, and a film of ISO 1600 is one of the fastest. Because the microscope is a relatively stable platform that has good illumination, films in the 50–200 ISO range are commonly used for photomicrography. Film is divided into a number of categories, depending on whether it is intended for black-and-white or color photography. Color films are subdivided into two types: color films that yield positive transparencies (colors like those of the image being observed) and color negatives (colors complementary to those in the microscope image, e.g., green for magenta). Color films can be further subdivided into two groups: films designed to receive daylight quality light and those designed to receive indoor or tungsten light. As a rule, but with some exceptions, transparency or positive films (slides) have the suffix chrome, such as Kodachrome, Ektachrome, Fujichrome, Agfachrome, etc. Color negative films usually end with the suffix color, such as Kodacolor, Ektacolor, Fujicolor, Agfacolor, etc. (37).
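Because the ISO scale is (approximately) linear in sensitivity, the trade-off between film speed and exposure can be illustrated with a few lines of arithmetic. The sketch below is not from the original text; the one-second reference exposure is an arbitrary assumption used only to show the relative scaling.

```python
import math

# Illustrative only: each doubling of ISO corresponds to one stop of speed,
# so the exposure needed for the same image density scales inversely with ISO.

def relative_exposure(iso, reference_iso=100, reference_seconds=1.0):
    """Exposure time needed at `iso` if film of `reference_iso` needs `reference_seconds`."""
    return reference_seconds * reference_iso / iso

for iso in (25, 50, 100, 200, 1600):
    stops = math.log2(iso / 25)   # stops of speed relative to a very slow ISO 25 film
    print(f"ISO {iso:4d}: {relative_exposure(iso):6.3f} s  ({stops:.0f} stop(s) faster than ISO 25)")
```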
Each of these film types is offered in a variety of film speeds or ISO ratings. Film packages usually display the ISO of the film and indicate whether the film is intended for daylight or indoor/tungsten balanced illumination. Modern film magazines have a code (termed DX) that allows camera backs to recognize the film speed automatically and set the control device to the proper ISO. The color temperature of tungsten-halide lamps in modern microscopes varies between 2900 and 3200 K, depending on the voltage applied to the lamp filament. Film manufacturers offer film balanced for this illumination and usually indicate on the magazine that the film is intended for indoor or scientific use. Fuji offers a very nice transparency film, Fujichrome 64T, that has a rather slow emulsion speed intended for tungsten illumination, but is designed to perform well with push film processing. To push a film, it is first underexposed by one or several f stops (factors of 2), then the development time in the first developer is increased to compensate for this decrease in film density. This technique often will increase the color saturation of the image (19,37). Daylight-balanced film, by far the most common film available at retailers at a wide variety of ISOs, can also be used with the microscope, provided that an appropriate filter is added to the light path. Manufacturers usually add a daylight conversion light filter to their microscope packages as standard equipment, and high-end research microscopes usually have this filter (called a daylight color temperature conversion filter) housed in the base of the microscope. Almost any color print or transparency film can be used for microscopy, provided that the daylight-balanced filter is in place for those films designed for a 5500-K color temperature or removed if the film is balanced for tungsten illumination (3200 K). For many applications, securing the photomicrograph almost immediately is a necessity or a great advantage. Here, Polaroid films are unrivaled. These films are available for photomicrography in three sizes: 35-mm Polachrome (color transparency) and Polapan (black-and-white), 3¼ × 4¼ film packs, and 4 × 5 individual film packets (37,39). Larger formats are available in color (ID numbers 668/669 or 58/59) or black-and-white (ID numbers 667, 52, 665, 55, etc.). The Polaroid large format films produce a paper negative and a paper positive print. After the positive print has been made, the paper negative is peeled away and discarded. This can be accomplished within a few minutes. The color films, except for the film pack 64T and the large format 64T, are balanced for daylight (5500 K) color temperature. All microscope manufacturers supply adapters for their photomicrographic cameras that accept large format Polaroid film sizes. Polaroid black-and-white films 665 P/N and 55 P/N require special mention. These films produce a paper positive print and a transparent, polymer-based negative. If the negative is bathed in an 18% sodium sulfite solution for a few minutes, washed, and hung up to dry, it produces a permanent negative of high quality and high resolution suitable for use in a photographic enlarger (37,39). Polaroid 35-mm films deserve more popular use. They are loaded into the usual 35-mm camera back and exposed
in the typical manner according to their rated ISO and color temperature requirements. When the film is purchased, the container includes the film cassette and a small box of developer. The film is processed in a tabletop processor which measures approximately 4 × 5 × 9. All processing is carried out in daylight after the processor has been loaded and closed. The finished strip of positive color or positive black-and-white transparencies is removed and is ready to be examined and to be cut apart for mounting in frame holders for projection. The actual processing takes only five minutes and produces micrographs of quite accurate color and good resolution. Recently, Polaroid has marketed a relatively inexpensive photomicrographic camera called the MicroCam. This camera is lightweight and can be inserted into one of the eyetubes or phototube of the microscope. It contains an exposure meter, an eyepiece, and an automatically controlled shutter. The camera accepts 339 color film packs or 331 black-and-white film packs. These films, approximately 3 × 3 in size, are self-developing and do not require peeling apart. Although the films have only about half the resolution of the more common large format Polaroid films, the resulting print is easily obtained and may be quite adequate for many applications.

DIGITAL PHOTOMICROGRAPHY

During the past decade, the quality of digital cameras has greatly improved and today, the market has exploded; at least 40 manufacturers offer a wide variety of models to suit every application. The cameras operate by capturing the image and projecting it directly onto a computer chip without using film. In recent years, the number of pixels of information that can be captured and stored by the best digital cameras approaches, but is still short of, the resolution available from traditional film (37). Digital images offer many opportunities for computer-controlled image manipulation and enhancement as well as the advantage of permanent digital storage. The highest quality digital cameras cost many thousands of dollars. Selection of an electronic camera must be preceded by careful consideration of its proposed use. This includes whether fixed or live specimens are to be examined, the need for gray-scale or true color images, sensitivity to low light levels as in fluorescence, resolution, speed of image acquisition, use in qualitative or quantitative investigations, and the video feed rate into a computer or VCR (1,2,9,37). The two general types of electronic cameras available for microscopy are tube cameras (Vidicon family) and charge-coupled device (CCD) cameras. Either type can be intensified for increased sensitivity to low light. The SIT (silicon intensified target) or ISIT is useful for further intensification of tube cameras, and the ICCD is an intensified CCD camera. CCD cameras can also be cooled to increase sensitivity by giving a better signal-to-noise ratio. These kinds of cameras can be designed to respond to light levels undetectable by the human eye (2,37). The criteria for selecting an electronic camera for microscopy include sensitivity of the camera, quantum efficiency, signal-to-noise ratio, spectral response, dynamic
range capability, speed of image acquisition and readout, linearity of response, speed of response in relation to changes in light intensity, geometric accuracy, and ready adaptability to the microscope. A very important criterion for the newer digital CCD cameras is resolution. Current chips range from as few as 64 × 64 pixels up to 2048 × 2048 and higher for very specialized applications, but larger arrays are continuously being introduced; at some not too distant time, digital image resolution will rival that of film. The purchaser must also decide whether the camera will operate at video rate (therefore making it easily compatible with such accessories as video recorders or video printers). Tube cameras and CCD cameras are available for video rate operation. CCD cameras can also deliver slow acquisition or high-speed acquisition of images. Scientific, rather than commercial grade, CCD cameras are the variety most suitable for research. CCD cameras are usually small. They have no geometrical distortion, no lag, and good linearity of response. These cameras are also more rugged and less susceptible to mishandling than tube cameras. Each pixel of the CCD camera is a well for charge storage of incoming photons for subsequent readout. The depth of these wells relates to the quality of the response. Binning of adjacent pixels by joining them together into super pixels can be employed to speed readout in a slow-scan CCD camera (2,37). An emerging technology that shows promise as the possible future of digital imaging is the active pixel sensor (APS) complementary metal oxide semiconductor (CMOS) camera on a chip. Mass production of CMOS devices is very economical and many facilities that are currently engaged in fabrication of microprocessors, memory, and support chips can be easily converted to produce CMOS optical sensors. Although CCD chips were responsible for the rapid development of video camcorders, the technology has remained a specialized process that requires custom tooling outside the mainstream of integrated circuit fabrication. CCD devices also require a substantial amount of support circuitry; it is not unusual to find five to six circuit boards in a typical video camcorder. In the center of the APS CMOS integrated circuit is a large array of optical sensors that are individual photodiode elements covered with dyed filters and arranged in a periodic matrix. A photomicrograph of the entire die of a CMOS chip is shown in Fig. 28. High magnification views of a single ‘‘pixel’’ element (Fig. 28) reveal a group of four photodiodes that contain filters based on the primary colors red, green, and blue. Each photodiode is masked by a red, green, or blue filter in low-end chips, but higher resolution APS CMOS devices often use a teal (blue-green) filter in place of one of the green filters. Together, the four elements illustrated in Fig. 28 comprise the light-sensitive portion of the pixel. Two green filter masks are used because visible light has an average wavelength of 550 nm, which lies in the green color region. Each pixel element is controlled by a set of three transistors and an amplifier that work together to collect and organize the distribution of optical information. The array is interconnected much like memory addresses
Figure 28. CMOS active pixel sensor used in the new rapidly emerging camera on a chip technology. The photomicrograph captures the entire integrated circuit surface and shows the photodiode array and support circuitry. The insets illustrate progressively higher magnification of a set of pixel elements and a single pixel element composed of four dyed photodiodes.
and data busses on a dynamic random access memory (DRAM) chip, so that the charge generated by photons that strike each individual pixel can be accessed randomly to provide selective sampling of the CMOS sensor. The individual amplifiers for each pixel help reduce noise and distortion levels, but they also induce an artifact known as fixed-pattern noise that arises from small differences in the behavior of individual pixel amplifiers. This is manifested by reproducible patterns of speckle behavior in the image generated by CMOS active pixel sensor devices. A great deal of research effort has been invested in solving this problem, and the residual level of noise has been dramatically reduced in CMOS sensors. Another feature of CMOS active pixel sensors is the lack of column streaking due to pixel bloom when shift registers overflow. This problem is serious for the CCDs in most video camcorders. Another phenomenon, known as smear, which is caused by charge transfer in CCD chips under illumination, is also absent in CMOS active pixel sensor devices. Assisting the CMOS device is a coprocessor that is matched to the APS CMOS to optimize the handling of image data. Incorporated into the coprocessor is a proprietary digital video processor engine that can perform automatic exposure and gain control, white balance, color matrixing, gamma (contrast) correction, and aperture control. The CMOS sensor and coprocessor perform the key functions of image capture, digital video image processing, video compression, and interfacing to the main computer microprocessor. The primary concerns about CMOS technology are the rather low-quality, noisy images that are obtained compared with those from similar CCD devices. These are due primarily to process variations that produce a slightly different response in each pixel, which appears in the
image as snow. Another problem is that the total chip area that is sensitive to light is smaller in CMOS devices, which makes them less sensitive to light. These problems will be overcome as process technology advances, and it is very possible that CMOS devices will eclipse the CCD as the technology of choice in the very near future. If you have prized transparencies and negatives collected over a long period of time, many of the better camera stores can take these and put them onto a Kodak Photo CD, which is the same size as an audio compact disc (CD). The Photo CD can hold up to 100 images that are stored at several levels of resolution. Digital images recorded on the Photo CD can be displayed on a good monitor by a Photo CD player. If you have a computer and a program such as Adobe Photoshop, Corel Photo Paint, or Picture Publisher and a CD drive, you can open the images on your computer screen, manipulate and/or enhance the images, and then print the images using a digital printer — all without a darkroom! Kodak, Fuji, Olympus, Tektronix, and Sony market dye-sublimation printers that can produce prints virtually indistinguishable from those printed by the usual color enlarger in a darkroom. Nikon, Olympus, Polaroid, Kodak, Agfa, Microtek, and Hewlett-Packard produce 35-mm negative and positive transparency scanners and flat-bed scanners that directly scan transparencies or negatives or prints into your computer for storage or manipulation. Images can be stored on the computer hard drive or stored on floppy disks in JPEG or TIFF files. Because the storage of floppy disks is limited to 1.44 megabytes, many micrographers now store images on Zip disks or magneto-optical drive disks; these can hold many images of 10 megabytes or more. Another storage medium that is quickly gaining widespread popularity is the recordable CD-ROM. Magneto-optical disks or CD-ROMs can be given to commercial printers and then printed with stunning color accuracy and resolution. A photomicrograph is also a photograph. As such, it should reveal information about the specimen, and it should also do so with attention to the aesthetics of the overall image. Always try to compose photomicrographs that have a sense of the balance of the color elements across the image frame. Use diagonals for greater visual impact, and scan the frame for unwanted debris or other artifacts. Select a magnification that readily reveals the details sought. Remember to keep detailed records to avoid repeating mistakes and to help when reviewing images that are several years old. Excellent photomicrographs are within the capability of most microscopists, so pay attention to the details, and the overall picture will assemble itself.

Acknowledgments

The authors thank the Molecular Expressions (http://microscopy.fsu.edu) web design and graphics team for the illustrations in this article. We also thank George Steares of Olympus America, Inc. for providing advice and microscopy equipment used to illustrate this manuscript. Assistance from Chris Brandmaier of Nikon USA, Inc. in providing equipment for reflected-light differential interference contrast microscopy is also appreciated. Funding for this research was provided, in part, by the National High Magnetic Field Laboratory (NHMFL), the State of Florida, and
the National Science Foundation through cooperative grant No. DMR9016241. We also thank James Ferner, Samuel Spiegel, and Jack Crow for the support of the Optical Microscopy program at the NHMFL.
BIBLIOGRAPHY

1. B. Herman and J. J. Lemasters, eds., Optical Microscopy: Emerging Methods and Applications, Academic Press, NY, 1993.
2. S. Inoué and K. R. Spring, Video Microscopy: The Fundamentals, 2nd ed., Plenum Press, NY, 1997.
3. S. Bradbury and B. Bracegirdle, Introduction to Light Microscopy, BIOS Scientific, Oxford, UK, 1998.
4. E. M. Slayter and H. S. Slayter, Light and Electron Microscopy, Cambridge University Press, Cambridge, UK, 1992.
5. M. Pluta, Advanced Light Microscopy (3 vols.), Elsevier, NY, 1989.
6. M. Abramowitz, Fluorescence Microscopy: The Essentials, Olympus America, Melville, NY, 1993.
7. B. Herman, Fluorescence Microscopy, 2nd ed., BIOS Scientific, Oxford, UK, 1998.
8. F. W. D. Rost, Fluorescence Microscopy (2 vols.), Cambridge University Press, NY, 1992.
9. G. Sluder and D. E. Wolf, eds., Methods in Cell Biology, vol. 56: Video Microscopy, Academic Press, NY, 1998.
10. C. J. R. Sheppard and D. M. Shotton, Confocal Laser Scanning Microscopy, BIOS Scientific, Oxford, UK, 1997.
11. S. W. Paddock, ed., Methods in Molecular Biology, vol. 122: Confocal Microscopy, Methods and Protocols, Humana Press, Totowa, NJ, 1999.
12. J. B. Pawley, ed., Handbook of Biological Confocal Microscopy, 2nd ed., Plenum Press, NY, 1995.
13. S. Bradbury, The Evolution of the Microscope, Pergamon Press, NY, 1967.
14. Anticipating the Future, Zeiss Group Microscopes Business Unit, Jena, Germany, 1996.
15. G. Nomarski, J. Phys. Radium 16, 9S–11S (1955).
16. R. Hoffman, J. Microscopy 110, 205–222 (1977).
17. S. Inoué and R. Oldenbourg, in M. Bass, ed., Handbook of Optics, vol. II, 2nd ed., McGraw-Hill, NY, 1995, pp. 17.1–17.52.
18. M. Abramowitz, Contrast Methods in Microscopy: Transmitted Light, Olympus America, Melville, NY, 1987.
19. J. G. Delly, Photography Through The Microscope, 9th ed., Eastman Kodak Co., Rochester, NY, 1988.
20. H. G. Kapitza, Microscopy From The Very Beginning, Carl Zeiss, Oberkochen, Germany, 1994.
21. H. W. Zieler, The Optical Performance of the Light Microscope (2 vols.), Microscope Publications, Chicago, 1972.
22. J. James and H. J. Tanke, Biomedical Light Microscopy, Kluwer Academic, Boston, 1991.
23. M. Abramowitz, Optics: A Primer, Olympus America, Melville, NY, 1984.
24. M. Abramowitz, Microscope: Basics and Beyond, Olympus America, Melville, NY, 1987.
25. M. Abramowitz, Reflected Light Microscopy: An Overview, Olympus America, Melville, NY, 1990.
26. R. McLaughlin, Special Methods in Light Microscopy, Microscope Publications, Chicago, 1977.
27. A. Tomer, Structure of Metals Through Optical Microscopy, ASM International, Boulder, CO, 1990.
28. G. H. Needham, The Practical Use of the Microscope, Charles C. Thomas, Springfield, IL, 1958.
29. C. P. Shillaber, Photomicrography in Theory and Practice, Wiley, NY, 1944.
30. F. Zernike, Physica 9, 686–693 (1942).
31. F. Zernike, Physica 9, 974–986 (1942).
32. E. A. Wood, Crystals and Light: An Introduction to Optical Crystallography, 2nd ed., Dover, NY, 1977.
33. R. E. Stoiber and S. A. Morse, Crystal Identification with the Polarizing Microscope, Chapman & Hall, NY, 1994.
34. P. C. Robinson and S. Bradbury, Qualitative Polarized-Light Microscopy, Oxford Science, NY, 1992.
35. A. F. Hallimond, The Polarizing Microscope, 3rd ed., Vickers Instruments, York, UK, 1970.
36. S. Bradbury and P. J. Evennett, Contrast Techniques in Light Microscopy, BIOS Scientific, Oxford, UK, 1996.
37. M. Abramowitz, Photomicrography: A Practical Guide, Olympus America, Melville, NY, 1998.
38. R. P. Loveland, Photomicrography: A Comprehensive Treatise (2 vols.), Wiley, NY, 1970.
39. Photomicrography: Instant Photography Through the Microscope, Polaroid, Cambridge, MA, 1995.
OVER THE HORIZON (OTH) RADAR S. J. ANDERSON R. I. BARNES S. B. COLEGROVE G. F. EARL Surveillance Systems Division Defence Science and Technology Organization Australia
D. H. SINNOTT Cooperative Research Center for Sensor Signal and Information Processing Mawson Lakes South Australia, Australia
FUNCTION, PERFORMANCE AND IMPLEMENTATIONS OF OVER-THE-HORIZON RADAR In this introductory section, a description is given of over-the-horizon radar in general terms, and current implementations are indicated. Subsequent sections expand in more detail on propagation and scattering issues, operation of OTH radars and system and subsystem design.
Fundamentals of Over-the-Horizon Radar

Over-the-horizon radar (OTHR) is an application of radar technology that exploits the phenomenon of electromagnetic wave reflection by the earth's ionosphere to allow radar operation beyond the local horizon (1). Excluded by this definition are other radar applications that may give consistent or ephemeral over-the-horizon performance. Two particular examples excluded are surface wave radar, which exploits the fact that a surface wave in the high-frequency band can be propagated over salt water to ranges beyond the horizon, and microwave radar ducting, in which specific atmospheric conditions can trap an electromagnetic wave within a narrow layer and allow it to propagate to anomalous ranges. The ionosphere is a solar-induced shell of ionization in the upper atmosphere with which electromagnetic waves interact so that, within a range of frequencies, the waves are reflected to earth. Figure 1 illustrates by a ray model how such an electromagnetic wave may be reflected by the ionosphere and thereby propagate over the horizon. An area or object illuminated by this electromagnetic wave returns part of its scattered energy by an (almost) reciprocal ionospherically reflected path, and this signal, when received and processed, allows radar operation. The actual operating range of an OTHR system depends on the frequency used within the band for which ionospheric reflection is significant — low frequencies are reflected more strongly, so they illuminate the surface comparatively close to the radar, and conversely for higher frequencies. The limit on maximum range for this single-hop mode of OTHR operation, determined by geometry and practical considerations, is about 3000 km, but multihop operation, in which the electromagnetic wave is sequentially reflected between the earth and the ionosphere, can multiply this range. The practicality of multihop radar operation is quite limited, principally by the signal distortion imposed during multiple reflections, geometric spreading, and absorption losses. For any given frequency of operation, there is a minimum range for which ionospheric reflection to earth is effective. Figure 1 illustrates how rays propagate further into the ionosphere as the elevation angle increases, until they penetrate the ionosphere and do not return to earth. This establishes the skip zone shown. For a number of practical reasons, a minimum skip distance of about 1000 km is typical of implemented systems, and thus the range of OTHR systems is conventionally stated as 1000–3000 km.

Figure 1. Typical ray paths for HF radio waves transmitted from the earth's surface.

The generic techniques of radar and its signal processing, as described elsewhere in this encyclopedia, are, in principle, applicable to OTHR. In practice, severe constraints are imposed by the ionosphere as the propagation medium on the applicability of many techniques. However, the basic approach is in common with other forms of radar: a suitably designed waveform
is modulated onto a transmitted carrier wave and processing of the returned echoes allows defining both range (by determining elapsed time) and direction (by using directional antennas) of the region under radar surveillance. As pointed out later, a third dimension, that of Doppler, is available to the OTHR operator and is a crucial parameter in deriving information from the radar returns. In particular, Doppler information allows discrimination of moving targets from stationary (or slowly moving) clutter. Clutter is the term used in radar parlance to describe returns from extended areas and is normally seen as contamination to be minimized by processing in favor of the discrete target return. Of course, in some surface-mapping OTHR applications, this clutter is the object of the surveillance operation. The ionosphere provides both the support for OTHR operation and the principal limitation on performance. The typical plasma density of the ionosphere means that the operating carrier frequency of an OTHR is constrained to lie approximately within the electromagnetic wave band designated as high frequency (HF), 3–30 MHz, although practical considerations set the lower limit for most OTHR realizations at around 6 MHz. The frequency chosen for a particular application depends on many factors, but in broad terms it is determined by the range at which operations are desired, as explained in more detail in what follows. Operation in the HF band implies wavelengths from 10–100 m. It follows that transmitting and receiving antenna systems need to be large to realize adequate directive properties, which derive from multiwavelength antenna apertures. The need for angular resolution has driven conventional radar to employ frequencies well above the HF band, where antenna apertures of tens or hundreds of wavelengths are within the dimensions feasible for mechanically steerable antennas. In the HF band, acceptable angular resolution demands antenna systems hundreds or thousands of meters long. So mechanical steering is out of the question, and electronic beam steering, used only in the most demanding (usually military) conventional radar applications, is mandatory for OTHR. An antenna aperture in two dimensions is desirable to realize a pencil beam and hence two-dimensional angular localization, but to date most OTHR applications have used linear arrays (usually of modestly directional elements) for cost reasons and accepted the consequences of fan rather than pencil beams. Thus, the typical OTHR has very large, ground-based array antenna systems with electronic steering that make it appear physically quite unlike conventional radars. Another point of departure from conventional radars is that most realizations use separate transmit and receive sites. Whereas most conventional radars use a common antenna in which operation is monostatic (the single antenna switches between transmit and receive functions), many OTHR systems use continuous transmission of a repetitive waveform and hence need spatially separated transmit and receive antenna sites. The arrays need to be separated by sufficient distance that the direct leakage to the receiving system of the transmitted signal, by line-of-sight or ground-wave propagation, does not interfere
with reception: a separation of 60–100 km is typical. A cost-attractive architecture has a relatively compact transmitting linear antenna array (100–200 m) and a receiving array about an order of magnitude larger. Then, the transmitter illuminates a region within which a number of much narrower simultaneous receive beams are focused. Besides being constrained to the broad frequency band necessary for ionospheric propagation, an OTHR must also allow for the fact that the ionosphere is a highly variable and unpredictable medium. For acceptable OTHR performance, continuous, real-time, ionospheric assessment (sounding) is necessary, as is the ability of the radar system to adapt to the changes. The day–night variability of the ionosphere requires, typically, a 5:1 change in frequency for comparable operational range, but both longer term sunspot-related effects and short-term disturbances are imposed on this diurnal ionospheric cycle. The large propagative distance and hence very weak returned signals, high external noise in the HF band, and the domination of surface-reflected clutter in the returned radar signal mean that OTH radars rely for their operation on high spectral resolution (Doppler) signal processing. Detection of aircraft with a significant radial velocity component toward or away from the radar is the most common application of OTHR, for which coherent illumination of several seconds suffices to satisfy spectral resolution requirements. If the propagative environment is sufficiently stable to allow coherent illumination over tens of seconds, slower large scatterers, such as ships, can be detected and the fine-grain Doppler characteristics of the sea surface can be mapped. It is artificial to propose hard definitional boundaries for coherent illumination times, but it is common to designate operations as air mode (seconds) and surface mode (tens of seconds).
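The distinction between air-mode and surface-mode dwell times follows from simple Doppler arithmetic, as the sketch below illustrates. It is not part of the original article: the 15-MHz carrier and the target speeds are invented round numbers, and the resolution-versus-dwell relation is the usual reciprocal approximation rather than a description of any fielded processor.

```python
# Rough sketch: the two-way Doppler shift scales with radial speed and carrier
# frequency, and the achievable Doppler resolution is roughly the reciprocal
# of the coherent integration time.  All numbers are illustrative assumptions.

C = 3.0e8  # free-space speed of light, m/s

def doppler_shift_hz(radial_speed_ms, carrier_hz):
    """Two-way (monostatic-geometry) Doppler shift."""
    return 2.0 * radial_speed_ms * carrier_hz / C

def integration_time_s(doppler_resolution_hz):
    """Coherent dwell needed to resolve the given Doppler spacing (~1/T)."""
    return 1.0 / doppler_resolution_hz

fc = 15e6                                   # mid-band HF carrier (assumed)
aircraft = doppler_shift_hz(250.0, fc)      # ~25 Hz for a 250 m/s aircraft
ship = doppler_shift_hz(5.0, fc)            # ~0.5 Hz for a 5 m/s ship

print(f"aircraft Doppler ~{aircraft:.1f} Hz -> dwell ~{integration_time_s(1.0):.0f} s at 1 Hz resolution")
print(f"ship Doppler     ~{ship:.2f} Hz -> dwell ~{integration_time_s(0.05):.0f} s at 0.05 Hz resolution")
```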
Table 1 summarizes the numerical values of major radar design parameters for a typical microwave ground-based military air-defense radar compared with those of a typical OTHR. The table illustrates the distinctiveness of OTHR design.

Table 1. OTHR and Microwave Radar Parameters

Parameter | Microwave Radar | OTHR
Operating frequency | 1200–1400 MHz | 6–30 MHz
Average transmitted power | 4 kW | 200 kW
Duty cycle | <16% | 100%
Instantaneous bandwidth | 1.5 MHz | 10 kHz
Waveform repetition frequency | 300–1000 Hz | 10–50 Hz
Antenna dimensions | Single 8 × 8 m low sidelobe planar array | Separate linear phased arrays: transmitter 100–200 m, receiver 1000–3000 m
Azimuthal coverage | 360° | 90°
Angular resolution | 1.5° (pencil beam) | 0.3–1.5° in azimuth
Cross-range resolution | 5 km at 200-km range | 10–50 km at 2000-km range
Instrumented range | 10–400 km | 1000–3000 km
Range resolution | 100 m | 15 km
Doppler resolution | 25 Hz | 1.0–0.01 Hz
Dynamic range (clutter peak to noise floor) | 60–70 dB | 80 dB
Steering — elevation | Electronic | None (fan beam)
Steering — azimuth | Rotation at 6 rpm | Electronic

OTHR Implementations

The principal difficulties in realizing OTHR have been the generation of suitably stable signals and computing power sufficient to attack what is a highly intensive signal processing task. Both of these difficulties paced the evolution of OTHR, although the technology of ionospherically supported propagation beyond the visible horizon has been accessible since the earliest days of radio. It is still heavily used in shortwave broadcast services and for point-to-point narrowband communications. Some limited OTHR systems were operational as early as the 1950s, but it was not until the 1960s that advances in signal generation and computing allowed the fielding of OTHR systems of substantial capability (2). In the years following, both Western and Soviet systems were developed for long-range defense surveillance, and developments were ongoing. OTHR applications to sea-state sensing have followed, building on enhanced
knowledge of the ionosphere and access to facilities built with defense funds (3,4). OTHR PROPAGATION AND THE IONOSPHERE The Ionosphere and Basic Propagative Principles The earth’s ionosphere, a partial magnetoplasma, is produced by photons and particles that are emitted from the sun and ionize a portion of the neutral upper atmosphere. The lowest altitude at which ionization is encountered is about 60 km, and it extends from there to many earth radii. The region from 90–400 km is the most important for OTHR, because at these altitudes, electromagnetic wave reflection is significant. In this region, the bulk of the ionization is caused by the action of extreme UV and X rays on atoms and molecules of oxygen and nitrogen. The varying frequency bands of the sun’s emission and the physical chemistry of the atmosphere are jointly responsible for relatively distinct ionospheric regions or layers. Those most significant for OTHR operation are designated the D, E, and F regions. The D region does not provide useful reflection of HF waves but is associated with absorption. The E region occurs from 90–150 km, has an ionization peak around 105 km, and provides useful OTHR propagation during the day to about 1800 km. Above this lies the F region, within which there are frequently separate layers. The most important layer for OTHR, because it is the most intense layer in the ionosphere, is designated the F2 layer which peaks at around 300 km. An important feature of the F2 layer is that it is sustained throughout the night by ion transport and the low ionization recombination rate. Free electrons, rather than the far more massive positively charged ions, are the ionospheric constituent of dominant influence on HF propagation. Electron motion induced by the electric component of an incident HF electromagnetic wave leads to a secondary field that
may be described as the polarization field of a dielectric of refractive index n less than one, according to the approximation

n² ≈ 1 − 80.5 Ne / fc²,   (1)

where Ne is the electron density (electrons per m³) and fc is the carrier frequency (Hz) of the incident wave. If the electron density gradient is positive with height, the refractive index decreases with height, and an electromagnetic wave from below is deflected increasingly from the normal. If the layer is sufficiently thick that the wave is propagating horizontally before the peak of Ne is reached, then the wave returns to earth. If Ne peaks before the wave is horizontal, then it passes through the layer and is deflected toward the normal in the decreasing electron density environment above the Ne peak. The penetrating rays of Fig. 1 illustrate this. Equation (1) also indicates that waves of lower fc, for which n departs more from one, are reflected more.
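Equation (1) can be exercised directly to show which carrier frequencies a layer of given peak electron density will turn back at vertical incidence. The sketch below is illustrative and not part of the original treatment; the electron densities are rough order-of-magnitude values, not measured profiles.

```python
import math

# Minimal sketch of Eq. (1): the refractive index falls as the electron density
# rises, and a vertically incident wave is reflected wherever n would reach
# zero, i.e. for carrier frequencies below sqrt(80.5 * Ne).

def refractive_index_sq(ne_per_m3, carrier_hz):
    """n^2 from Eq. (1); negative values mean the wave cannot propagate there."""
    return 1.0 - 80.5 * ne_per_m3 / carrier_hz ** 2

def critical_frequency_hz(peak_ne_per_m3):
    """Highest frequency reflected at vertical incidence by a layer of this peak density."""
    return math.sqrt(80.5 * peak_ne_per_m3)

for ne in (1e11, 1e12):                  # rough E-region and F2-region peak densities (assumed)
    print(f"Ne = {ne:.0e} m^-3 -> critical frequency ~{critical_frequency_hz(ne) / 1e6:.1f} MHz")

print(refractive_index_sq(1e12, 20e6))   # a 20-MHz wave penetrates a 1e12 m^-3 layer (n^2 > 0)
```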
This simplified wave description neglects the influence of the geomagnetic field, which causes the ionospheric refractive index to have tensor properties and exhibit birefringence. The two modes produced through this mechanism, designated O and X, have locally orthogonal polarizations. Further, collision of electrons with other particles contributes a loss mechanism that makes the refractive index complex. Losses are more significant lower in the ionosphere, where the increased neutral density causes higher collision frequencies, and at certain wave frequencies where the ionosphere appears highly dispersive. For most situations, OTHR propagation in the ionosphere can be approximated by Hamiltonian ray theory (5) in that the electromagnetic radiation can be thought of as pencils that are bent by ionospheric gradients. Even with this simplification, OTHR propagation may be complicated: for a single target, the separate layers of the ionosphere can present multiple paths, and each of these paths can contribute O and X modes. Outward and inward propagative paths can be a combination of any of the one-way paths, and so at times, a single target can appear as multiple detections at different ranges because of the plethora of paths of different propagative delay. At longer ranges, target paths can be multihop. It is the job of the radar's coordinate registration system to model these effects and establish the true position of targets (6). Often the O and X modes are not resolved in range by an OTHR. Because the relative phase paths of the two modes can change in time, the combined elliptical polarization appears to rotate in space. This effect, known as Faraday rotation, is pervasive and causes the signal to fade when detected on linearly polarized antennas.

Propagative Morphology

Backscatter ionograms best describe the coverage that is provided by the ionosphere for OTHR. The backscatter ionogram is a representation of the clutter signal that is backscattered from the environment as a function of group range (half the transmit-to-receive delay multiplied by the speed of light in a vacuum) and carrier frequency. It is produced by a backscatter sounder, which is a swept frequency device that can measure delay (7). Figure 2 shows a typical daytime response. The clutter is calibrated at the receiver input and is displayed in absolute units. The main lobe extends to 3000 km and is due to single-hop F2 propagation. Because the behavior of the ionospheric layers and the electromagnetic wave propagation that they support are strongly controlled by the sun, the propagation has diurnal, seasonal, and sunspot cycle variations (8).
[Figure 2 display: backscatter-sounder clutter power (dBW) versus group range (km, 0–12,000) and frequency (MHz, 5–45); JFAS facility, NW segment, beam 6, raw data recorded 1998 day 315, 08:23:10 UT (17:53 LT, 11 Nov).]
Figure 2. A backscatter ionogram recorded during day time, looking north across the equator. Both single-hop and chordal propagative modes are evident.
knowledge of the current state of ionospheric propagation is vital for successful operation of an OTHR. It has been shown that existing ionospheric models are inadequate for describing the real variability (9). Variation in the orientation of the magnetic field naturally separates the ionosphere into three phenomenologically different regions: midlatitude, equatorial (low-latitude), and polar (high-latitude). The equatorial region is characterized by an upwelling of the ionosphere, the equatorial anomaly. The shape of the anomaly is such that north–south propagation across it can be chordal (ground to ionosphere to ionosphere to ground). The polar ionosphere is the most disturbed region. The magnetic field at the poles is vertically oriented and projects far from the surface, where it interacts strongly with the solar wind. Propagation across the poles is fraught with difficulty and can result in deviations of the gross angle of arrival due to walls of ionization associated with polar electrodynamics. The midlatitude ionosphere is relatively stable and exhibits the largest spatial correlation distance for F2 layer electron density and peak height measurements. However, it still exhibits many of the disturbances that are detailed below. Effects on OTHR of Ionospheric Dynamics The dimensions in OTHR measurement space are group range, angle(s) of arrival, and Doppler shift. Ionospheric effects that perturb these domains introduce errors into the estimate of the target state vector. A number of these effects are catalogued here; some of them are infrequent but have a major impact on OTHR performance, whereas others, more commonly encountered, have more subtle impacts. The ionosphere is constantly being disturbed by the propagation of atmospheric gravity waves (10). These waves appear as shifting neutral winds in the ionospheric medium. The subsequent motion of electrons is determined by the geomagnetic field and by the drag induced by the wind. Other forces are also involved, but the net effect is that the electron density is alternately enhanced and rarefied in a wavelike fashion that leads to calling the modulation a traveling ionospheric disturbance (TID) (10). TIDs cause apparent motion of an OTHR-displayed target because they distort the angle of arrival and the Doppler shift of the incoming wave. The distortion of the angle of arrival can be as large as several degrees. Sporadic E (Es) is a thin, erratic layer that forms in the E region of the ionosphere and is most prevalent between 90 and 120 km. Its behavior is strongly controlled by the orientation of the magnetic field, and its production mechanism is quite different at mid, low, and high latitudes. At midlatitudes, Es is chiefly a summer daytime phenomenon. Es, which can be the most intense layer in the ionosphere, has several important effects on OTHR. First, it can provide very good propagation for surface-mode detection because it imparts very little Doppler distortion. Second, it can obscure propagation to the F2 layer, which has the negative impact of limiting the OTHR maximum range to about 2000 km. Third, because of its erratic temporal and spatial nature, Es is very hard to
model and consequently can play havoc with the coordinate registration accuracy of an OTHR. An image of Es patches produced by the Jindalee OTHR in Australia is shown in Fig. 3. The terminator is the boundary between day and night in the ionosphere. Large systematic distortions of the Doppler and the angle of arrival occur with the passage of the sunset and sunrise terminators because of the abrupt change in ionization production, as illustrated in Fig. 4 (11). The terminator is predictable, and its effects can be modeled and subsequently subtracted from the measurements. This is part of the coordinate registration process, but even this level of propagative modeling is difficult. A more pernicious form of distortion is produced by small-scale ionospheric irregularities. The clutter signal that is received by the radar after propagating through or backscattered by these irregularities may be spread
in the Doppler domain. One particular form of spread clutter is actually caused by reflection from meteor trails rather than from the ionosphere. Meteor trail lifetimes are such that they produce transitory reflections at HF, which spread their signal energy in Doppler space. Strategies for handling spread clutter exist and include radar waveform parameter adjustment, adaptive signal processing, and antenna pattern control. A very severe, albeit rare, ionospheric perturbation to OTHR operation is short-wave fadeout. This occurs when a disturbance on the sun causes an unusually intense emission of UV and X radiation. The effect in the ionosphere is to increase the D-region electron density dramatically, which in turn increases absorption. This effect can be so serious that the radar is disabled for as long as an hour through a lack of sensitivity. A related but less dramatic effect is a magnetic storm, caused by the interplay of the earth’s magnetic field with the solar wind. This manifests itself at midlatitudes by depressing the intensity of the F2 layer. Magnetic storm effects can last for days and can be so severe that propagation is limited to around 2000 km via the E and F1 layers. Some unusual propagative paths can affect OTHR operation through range folding — the aliasing of longer range returns, whereby they are mapped into shorter ranges through the ambiguity inherent in repetitive waveforms. Ionospheric irregularities on the other side of the earth or from the polar regions produce spread clutter this way. Radiation that has traveled around the world and appears on the OTHR display aliased onto short ranges is a particularly quirky example.
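To make the range-folding arithmetic concrete, the following minimal sketch (in Python) computes the unambiguous group range of a repetitive waveform and the apparent range at which a longer range echo is displayed. The 50 Hz repetition frequency and the 7,500 km echo are assumed example values, not parameters of any particular radar.

```python
# Illustrative sketch of range folding for a repetitive waveform.
# Assumed values: a 50 Hz waveform repetition frequency (WRF) and an
# echo from 7,500 km group range; neither is taken from a real system.

C = 3.0e8             # propagation speed (m/s)
wrf = 50.0            # waveform repetition frequency (Hz)
true_range_km = 7_500.0

# Unambiguous group range for a repetitive waveform: c / (2 * WRF)
unambiguous_km = C / (2.0 * wrf) / 1_000.0        # -> 3000 km

# A return from beyond the unambiguous range is aliased (folded) back
# into the first range interval.
apparent_km = true_range_km % unambiguous_km

print(f"Unambiguous range: {unambiguous_km:.0f} km")
print(f"Echo from {true_range_km:.0f} km appears at {apparent_km:.0f} km")
```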
[Figure 3 display: relative reflection coefficient (dB); Jindalee (JFAS) data recorded 1994 day 10, radar frequency 19.90 MHz, Darwin.]
Figure 3. Sporadic E-layer patches mapped by the Jindalee OTHR.
Figure 4. A map of ionospheric Doppler shift recorded at 0647 LT as the dawn terminator crosses the radar footprint.
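A similarly minimal numerical sketch of the refractive-index approximation in Eq. (1) follows: it estimates the highest frequency returned from a layer at vertical incidence (the frequency at which n reaches zero at the layer peak) and the roughly sec(θ)-times-higher frequency usable at oblique incidence under a flat-layer assumption. The peak electron density and zenith angle are assumed values chosen only for illustration.

```python
import math

# From Eq. (1), n**2 = 1 - 80.5 * Ne / fc**2, so n reaches zero when
# fc = sqrt(80.5 * Ne); higher frequencies penetrate the layer at
# vertical incidence.
def critical_frequency_hz(ne_per_m3: float) -> float:
    return math.sqrt(80.5 * ne_per_m3)

# For an oblique ray the reflection condition is relaxed by roughly the
# secant of the zenith angle (flat-layer approximation).
def oblique_limit_hz(ne_per_m3: float, zenith_deg: float) -> float:
    return critical_frequency_hz(ne_per_m3) / math.cos(math.radians(zenith_deg))

ne_peak = 1.0e12   # assumed F2 peak electron density (electrons per m^3)
print(f"Vertical-incidence limit: {critical_frequency_hz(ne_peak) / 1e6:.1f} MHz")
print(f"Oblique limit at 75 deg zenith angle: {oblique_limit_hz(ne_peak, 75.0) / 1e6:.1f} MHz")
```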
RADAR SCATTERING PHENOMENOLOGY IN THE HF BAND As for any radar, the ability of the OTHR sensor to derive information about entities in its field of view depends on the interaction of the probe signals — in this case HF electromagnetic waves — with those entities. In most HF radar applications, the interaction may be viewed as a localized electromagnetic scattering process, where the incident electromagnetic field is taken as a plane wave. In common with the terminology used in other branches of radar, the directly backscattered power is represented by describing an object in terms of its aspectdependent radar cross section (RCS): the area for which the power from the incident plane wave would be sufficient to produce, by omnidirectional radiation, the backscattered power density observed. A scattering matrix description generalizes RCS for differing incidence and reflection angles and polarizations. OTHR wavelengths are sufficiently large that most discrete objects of interest are comparable with or smaller than a wavelength. Scattering by conductors comparable in size to a wavelength is dominated by resonances in which the radar cross section can be a sensitive function of frequency. This means that the strength of the radar return is a poor guide to the size of an object — a wire resonant at the OTHR’s carrier frequency can appear to the radar as an echo of strength comparable with that
of a large aircraft. For conductors much smaller than a wavelength, the radar cross section, in the Rayleigh scattering regime varies inversely as the fourth power of the illuminating wavelength (12). Calculation of HF radar cross sections of metallic targets, such as aircraft and ships, can reasonably assume that the bodies are perfectly conducting. Then, there are many established analytical and numerical methods that can be used to compute the scattered field. The most widely used technique is the method of moments (13,14) for which many computer code implementations are in general use. An example of such calculations is presented in Fig. 5 which shows the squared magnitudes of the elements of the scattering matrix of a Boeing 727 aircraft in free space at a radar frequency of 15 MHz. As noted previously, a typical OTHR design spatially separates the transmitter site from the receiver site so that they subtend a small but nonzero angle at the target (quasi-monostatic). For almost all practical situations, monostatic RCS calculations are applicable, despite this departure from exactly monostatic conditions. Figure 5 illustrates this point by including full bistatic coverage in the azimuth, though a single common elevation angle has been adopted. Monostatic solutions correspond to trailing diagonals. Free-space radar cross sections are normally used to model OTHR scattering behavior of targets above the earth’s surface that treat the various multipath scattering processes separately. This approximation fails when the target altitude becomes less than a few wavelengths and a more complicated formulation of the scattering problem is required. Discrete targets are a major application of OTHR, but there is also a keen interest in echoes from extended targets, such as the earth’s surface and the ionospheric
[Figure 5 display: bistatic radar scattering matrix for target #7 at 15.00 MHz, look-down angle 10.00°; four panels (HH, VH, HV, VV) of RCS in dBm² versus bistatic azimuth angles (0–360°), color scale −9.0 to 21.0 dBm².]
Figure 5. The copolar and cross-polar RCS of a Boeing 727 aircraft in free space. The figures show the full bistatic RCS in the azimuth for a look-down angle of 10° from horizontal.
[Figure 6 display: (a) Jindalee radar directional wave spectrum; (b) Jinscat spectrum, power density (dB) versus Doppler (Hz), frequency 17.54 MHz, wind speed 10.0 knots, wind direction 118.8°.]
Figure 6. (a) The directional wave spectrum corresponding to a Pierson–Moskowitz wave-number spectrum with a simple spreading pattern. (b) The predicted radar Doppler spectrum for the wave spectrum shown in (a), superimposed on a measured spectrum obtained under similar sea conditions.
plasma itself. The ocean surface is of particular interest because of the characteristic Doppler modulation imposed on the radar signals by ocean wave dynamics. The direct problem of scattering from a given ocean surface, specified by its directional wave spectrum, was originally solved for grazing incidence backscatter at vertical polarization (15); subsequently, the theory was extended to handle arbitrary geometries and polarizations, as illustrated in Fig. 6. This example shows (Fig. 6a) a polar representation of a model sea directional wave spectrum, where the wave number is plotted as the radial coordinate and the (sea) wave direction as the angular coordinate and (Fig. 6b) a measured Doppler spectrum, superimposed on a theoretical Doppler spectrum (the best fit solution to the measured Doppler spectrum from a parametric model). The directions in Fig. 6a are chosen so that ‘‘12 o’clock’’ corresponds to the bearing toward the radar. The wave spectral density is color coded, red indicates the strongest components. The two strong peaks in the Doppler spectra are the Bragg lines that arise from single resonant (Bragg) scattering from advancing and receding waves whose wavelength satisfies the condition for constructive interference. Various techniques have been developed to solve the inverse problem and hence obtain an estimate of the directional wave spectrum, each variant relies on different assumptions about the form of the solution (16–18). OTHR IN OPERATION Discrete Target Surveillance and Monitoring Tasking In target surveillance and monitoring, an OTHR typically sequentially spotlights regions of coverage that are small compared to the potential surveillance area. The radar dwells on one region for a period of time to integrate coherently (usually seconds) and then moves on to the
next region. For normal air surveillance, an OTHR usually needs to revisit a region every 30 seconds or so to maintain tracks. Several effects conspire to constrain the area that an OTHR can survey within the revisit time. The limited angular extent of the transmit and receive patterns, the limited range depth provided by the ionosphere on a constant frequency, and signal processing constraints are the main issues. Therefore, it is very important to have a prioritized set of tasks for an OTHR so that the needs of the tasking agency are attended to as well as possible. Radar Management The management of an OTHR to achieve the required tasking most effectively depends on the broader surveillance architecture. For instance, if the radar is part of a network of similar or different sensors, then tasks must be assigned and scheduled across candidate sensors. Some OTHRs have inherent limitations that reduce the range of options and hence make tasking a much simpler exercise. Setting up and scheduling tasks for a single OTHR requires an operator interface with the radar system. After defining the tasking area, the operator must balance the priorities, and decide on the radar parameters to be used. This usually involves a trade-off between radar sensitivity and shorter revisit times. Depending on the sophistication of the radar, the operator may have to decide on the following parameters: Aperture Size. Generally, a larger aperture will provide greater sensitivity and resolution. However, if the radar can split apertures and perform multiple tasks at once, using the whole aperture may unnecessarily impact the revisit time. Waveform Type. Waveform type might be varied for many reasons, for example, some waveforms offer protection against forms of spread Doppler clutter. Waveform Bandwidth. The waveform bandwidth controls the range resolution of the radar. In general, a higher bandwidth is preferable because this provides a greater chance of ionospheric path resolution and can provide extra sensitivity in a clutter background. However, large bandwidth requires greater processing capability and increases the likelihood of interference from and to other users of the HF spectrum. Ionospheric dispersion provides a limit to the usefulness of increasing bandwidth by setting a physical limit on available bandwidth. Waveform Repetition Frequency (WRF). This determines the ambiguity depths in range and Doppler. A higher WRF decreases the unambiguous range depth but increases unambiguous Doppler space. The WRF is often forced lower to avoid range-folded clutter, but this opens the possibility of Doppler-aliased clutter that produces regions of OTHR blindnesss at typical target speeds. Dwell Coherent Integration Time. Increasing the coherent integration time increases the target signal-to-noise ratio (SNR) but impacts the revisit time or the radar coverage potential.
Signal Processing Steps. Basic signal processing produces peaks in azimuth–Doppler–range space corresponding to potential targets. Additional processing can be used to enhance the OTHR target detection capability. For instance, processing to mitigate the effects of transient returns from meteors and lightning may be incorporated, if necessary. Propagative Paths. It is necessary to know the propagative characteristics as the task is set up. An OTHR is likely to be tasked in ground coordinates, but the radar hardware must be set up in radar coordinates, that is, in steer direction and transmitto-receive delay. Therefore, coordinate conversions from ground coordinates to radar coordinates are required to ensure that the radar surveys the intended region. Task Priority. Assigning the various tasks different priorities is a way of altering the task revisit times in a priority-driven system. Revisit Requirement. Although there may be no means for directly changing this variable, the operator will usually monitor the revisit time for all tasks, especially as a new task is installed, to ensure that the radar is not being overused. Receiver Attenuation. At night, absorption of electromagnetic waves decreases and incoming signals can be very large, requiring attenuation to ensure that the receiving system does not distort or overload. Of all the decisions that the operator makes, the choice of operating frequency is probably the most important. Most OTHR systems provide sensor information about the environment to assist operators in their choice. This is the role of the propagative management system (7). The information required to make the best choice of frequency includes the following: Propagative Sensitivity. The prime concern in frequency selection is to ensure adequate sensitivity. The backscatter sounder in conjunction with a background noise measurement can provide an estimate of the expected propagative strength to an area over the HF band. Propagative Spectral Purity. In surface mode, and to some extent air mode, the spread in Doppler space impacts detection capability. An adjunct probing Doppler backscatter radar is useful in these circumstances (19). Available Channels. All HF radars should operate on the principle that they do not interfere with other users of the HF spectrum, and to this end, elaborate spectrum surveillance systems are employed. Even then, there is always some residual background noise which can compromise sensitivity, so radar management should choose a channel that is unoccupied by other HF users and has minimum external noise. Coordinate Registration Issues. The choice of frequency can impact the accuracy of an OTHR. Use
of some frequencies may place the radar in a situation where targets are detected through multiple paths. This may prove difficult to model analytically and, subsequently, to determine the actual target locations. In some systems, part or all of the radar management process is automated, and the operator function is one of monitoring to ensure that sensible decisions are made. A description of a representative adaptive radar management scheme that addresses issues such as resource allocation, signal diagnosis, signal correction, task assignment, algorithm selection, and performance evaluation can be found in (20). Detection and Tracking Once the radar is surveying a region, the output is processed through the various stages, until it is presented to tracking operators for further vetting. This is usually done at some stage after the data has been transformed into scaled signal power as a function of angle of arrival, range, and Doppler. The graphical presentation format of the signal level data is crucial. Figure 7 shows the azimuth–range–doppler (ARD) interface used in the Australian Jindalee OTHR system. Usually, some attempt is made to extract peaks automatically from a whitened background (one in which the background noise has been constrained to be of uniform energy density in range and Doppler). Some of the functions that the tracking operators can perform at this stage are as follows: Assistance to the Tracker. Because of the statistical nature of OTHR trackers and the fact that they are
Figure 7. One way of displaying three-dimensional radar data — nesting range within the azimuth along the vertical axis and using the horizontal axis for Doppler. This is the azimuth–range–doppler (ARD) format.
based on incomplete models of the targets, they are less reliable than those for conventional radars. For this reason, most systems allow tracking operators to assist the computer-based tracker. Functions that an operator may perform include track initiation, track deletion, and joining tracks. Assistance to the Coordinate Registration System. Coordinate registration systems are designed to remove system and environmental effects on target detection in an attempt to locate targets accurately on the earth. They vary in sophistication from the manual setting of ionospheric mirror heights to three-dimensional numerical ray tracing schemes that have system aspects added and use data-driven, real-time ionospheric models. Coordinate registration is fraught with difficulty because of the uncertainty in many of the component processes. Therefore, it is important that the system handle uncertainty and provide for operator assistance. Some of the activities that a tracking operator might perform are assistance in processing sensor data for the ionospheric model, overriding the predicted effective layer heights, and assistance in associating propagative paths with tracks. Provision of Feedback to Radar Management on the Performance of the Task. Tracking operators view the signal level data and the track output. They can provide feedback on the quality of the data, including any evidence of interference, to the radar operator. Tracking operators are also in a good position to determine when tracks of interest are nearing task boundaries and require that the radar manager move tasking regions.
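As an indication of what the simplest level of coordinate registration involves, the sketch below converts a one-hop group range to a ground range using a flat-earth "equivalent mirror" at a fixed effective height. Operational systems use curved-earth geometry and ray tracing; the 300 km mirror height and 1,800 km group range here are assumed illustrative values.

```python
import math

def ground_range_km(group_range_km: float, mirror_height_km: float) -> float:
    """Flat-earth, single-hop mirror model: the one-way path is two equal
    slant legs of length group_range/2, reflecting at the mirror height."""
    half_path = group_range_km / 2.0
    if half_path <= mirror_height_km:
        raise ValueError("group range too short for this mirror height")
    # Each slant leg spans half the ground range horizontally.
    return 2.0 * math.sqrt(half_path**2 - mirror_height_km**2)

# Assumed example: 1,800 km one-hop group range, 300 km effective mirror height.
print(f"{ground_range_km(1800.0, 300.0):.0f} km ground range")
```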
Geophysical Sensing Applications As noted, OTHRs, designed primarily for defense missions such as detecting ships and aircraft, are nonetheless useful as tools for probing the geophysical environment. The typical spatial resolution of about 10 km means that radar observables are averages on this scale and complement point measurements such as those made by a single wave buoy, for example. In contrast, the achievable Doppler resolution with OTHR is extremely fine, so that very subtle dynamic features of the sea surface or of the midlatitude ionosphere, say, may be revealed. Three examples illustrate the use of OTHR for geophysical observations. Ionospheric Dynamics By astute choice of waveform and sampling strategy, it is possible to monitor a region of the ionosphere that spans several million square kilometers at a sampling rate sufficient to observe the evolution of disturbances from atmospheric gravity waves, field-line resonances, day–night terminator passage, equatorial plasma bubbles, and various other phenomena. Figure 8 shows a map of the equivalent vertical velocity of the ionospheric control (reflective) point for a one-hop reflective path; this velocity is inferred from the Doppler shift of ground backscatter by using the effective reflective height model of Breit and Tuve (8). From an analysis of the time evolution of this picture, we can associate the ridge in the lower part of the picture with a propagating solitary wave that endures for more than an hour. Ocean Wave Height Of key interest to mariners, offshore oil facilities, oceanographers, and others is the sea roughness, expressed in
[Figure 9 color scale: rms wave height (m), 0.0–2.0 m; radar frequency 13.48 MHz.]
Figure 9. The root-mean-square (rms) wave height measured over a patch of sea off northwestern Australia.
terms of root-mean-square (rms) wave height or alternative measures of surface variance. HF radar observations of the Doppler spectrum of sea clutter are related directly to the directional wave spectrum of sea waves in the gravity wave band, particularly waves of period between about 2 and 20 s. Estimating the directional spectrum yields wave height via simple integration, and this is the preferred method, but other approaches, some based on pattern recognition, perform almost as well at far less computational cost. Figure 9 shows the rms wave height for a patch of sea off northern Australia derived by using one such method. The enhanced wave height in the center of the radar footprint coincides with the position of a tropical cyclone. Surface Wind Fields Because surface gravity waves on the sea are generated by wind, it is possible to infer the wind field from measurements of the wave distribution. Fairly reliable maps of wind direction (not speed) can be generated from just two point values of the sea clutter Doppler spectrum (21). These are the amplitudes of the two Bragg peaks mentioned earlier that arise from scattering from water waves whose frequencies are typically in the range 0.3–0.5 Hz. Such waves are fairly tightly coupled to the local wind under most conditions, and hence the ratio of wave amplitudes in two opposing directions indicates the generating wind direction, subject to a left–right ambiguity that can be resolved almost everywhere from collateral data or other means. Although far more sophisticated techniques are available, for inferring wind direction, based on the full directional wave spectrum, the present method has the enormous advantage that the spectrum features on which it is based are extremely robust in the presence of signal distortion and contamination. Figure 10 is a map of wind direction covering a region over the northeastern Indian Ocean. It shows strong curvature, which may be a precursor to cyclone generation. SYSTEM IMPLEMENTATION Principal System Architectural Issues
Figure 8. The equivalent vertical velocity of the ionospheric control point for a one-hop reflective path.
Any sensing system’s effectiveness depends critically on the received SNR, and it is a primary OTHR system
Figure 10. A map of wind direction covering a region over the northeastern Indian Ocean.
design driver. The received SNR of an OTHR depends on a multiplicity of factors. The fact that ionospheric propagation is essential to obtain any signal at all means that the designer has a major parameter, frequency, determined to a large extent by external and timedependent factors — it is not an independent variable as it is in less medium-dependent realizations of radar. Further, the fact that the operating frequency range is typically 5 : 1 means that the impact of this range on all factors that determine SNR must be recognized. Another factor peculiar to OTHR, as opposed to conventional radar, is that the external noise environment is usually very high (22). This allows some relaxation of antenna design to effect economies, as long as external noise dominates. Table 2 shows the dependence of major radar parameters on physical parameters and illustrates the difficult optimization problem posed by OTHR system design. The table assumes a periodic waveform repetition frequency (WRF) of fr Hz that modulates a carrier of frequency fc Hz where c m/s is the velocity of propagation. From this list, it follows that the most difficult detection conditions are at low fc . By making the ratio fr /fc constant and the number of samples per dwell fixed, the velocity ambiguity is constant, and the revisit time is approximately constant. The highest fr that gives the shortest range ambiguity occurs at the maximum frequency where propagative limitations on simultaneous
range depth are greatest. Hence, range-ambiguous returns are unlikely. At low frequencies, returns from long ranges are likely, and the WRF is correspondingly low. The aim is to follow this simple relationship of WRF to frequency, but it cannot always be maintained because of the need to avoid extended clutter. Subdividing of the transmitting and receiving antenna apertures provides added flexibility and robustness. At high frequencies, where full apertures may be unnecessary for sensitivity or resolution, two half apertures operating on different carrier frequencies can enhance the depth of coverage. At low frequencies, approximately 9 dB of extra sensitivity is achieved by using the full aperture. An approach to antenna geometries can be determined from basic considerations summarized earlier. For a bistatic OTHR, separation of the sites allows basing the design on cost minimization procedures where costs are assigned for transmitting power, transmitting aperture, receiving aperture, and receiver signal processing. The aim in this design approach is to select the worst case conditions and then design to meet a probability of track initiation in a specified time. This allows a trade-off in scanning rate and probability of detection per radar dwell. Sub System Hardware Design For an OTHR to attain its fundamental detection capability, careful attention to individual items of equipment is required, taking note of the particular and demanding environment in which it is to operate. Failure to comprehend and manage this environment will result in various forms of signal corruption that limit system performance. Whatever the precise source of the corruption, there are two principal classes of consequences: the radar system noise floor may be artificially raised, or spurious signals may be generated within the radar bandwidth. In either case, the probability of target detection is compromised compared with the system’s ultimate potential that is set by external noise levels. Waveform Generation The principal design objective for the waveform generation equipment is to maintain spectral purity, so that spurious responses and noise are less than equivalent external noise levels. A database of HF propagative data can be used to establish design objectives in terms of specified radar sensitivity requirements (23). Some quite subtle considerations come into play — an example is the
Table 2. Dependence of OTHR Performance Parameters on Physical Factors
Antenna Gain: Proportional to fc
Antenna beamwidth: Inversely proportional to fc
Waveform range ambiguity: c/(2fr)
Waveform velocity ambiguity: fr c/(2fc)
External noise spectral density: Falls by 30 dB from 6 to 30 MHz
Target radar cross section (small targets): Proportional to fc⁴
Depth of coverage on single fc: Limited by absorption during daytime
recognition that noise is backscattered into the receiving system from the entire radar footprint, not just from the individual range cell under consideration at a given time. To counter this effect requires suppressing the waveform generator noise floor by an additional 20 dB beyond what would be the level if the effect had been overlooked in the design specification. Power Amplifiers Logically, a power amplifier can be considered an extension of the waveform generator feeding it, and consequently, the waveform generator specification remains relevant, with one significant exception. Some OTHR architectures call for each of a number, NT , of power amplifiers to be appropriately phased and fed in parallel, connecting each to its own antenna in an array. Whereas noise originating in the waveform generator will be beam-formed along with the intended signal, noise generated within the power amplifiers is incoherent across the transmitting aperture. Consequently, in terms of the radar system design, the power amplifier noise specification can be relaxed by 10 log(NT ) dB with respect to the waveform generator noise specification. Antennas Transmitting antenna technology applied to OTHR systems is borrowed from well-established shortwave broadcast application, but has stringent requirements for wide bandwidth and careful control of array mutual coupling effects that can cause operationally unacceptable load interdependencies between power amplifiers. Spatial ambiguities (grating lobes) are controlled either by element spacing that does not significantly exceed half a wavelength at the highest frequency of operation or by pseudorandom spacing. The former approach creates excessive mutual coupling at lower frequencies and drives designers to break the band into two or more subbands with separate arrays for each subband. The latter approach offers economies but delivers generally higher and less controllable array side lobes. Receiving arrays are generally large, so using simple low-cost elements is critical to overall system cost. Control of grating lobes at the highest frequencies determines the maximum interelement spacing and hence the total number of elements required for the desired array aperture. (Alternatively, pseudorandom spacing can be used to break the spatial periodicity which causes grating lobes, but at a cost, as for the transmitting case, of overall side-lobe control.) Although each decibel of mismatch or efficiency loss in the transmitting antenna equates to a decibel loss of radar sensitivity, the receiving antenna operates in such a high external noise field that antenna mismatch and efficiency loss are generally not of prime importance, particularly at the low end of the HF band. This greatly eases the costs of the receiving elements. The criterion is simply that the external noise power available from the antenna should exceed the internal receiver noise by some acceptable quality factor. In fact, excessive coupling of the external environment can compromise
system performance through the nonlinear phenomena discussed later. Wind-driven vibration of the antenna, Aeolian noise, is, in principle, a source of contaminating phase modulation of the radar signal, at least one implemented system uses mechanical dampers on its tubular receive array elements to counter any such effects. Measurements have generally shown that wire antennas that have realistic tension do not contribute measurable Aeolian noise. Receiving System Design of an OTHR receiving system is a particularly challenging task. Critical parameters and major considerations are summarized in Table 3. Signal Processing Modern OTH radars are characterized by a voracious appetite for signal processing capacity: the number of receivers fed from a single receiving antenna array has increased by a factor of about 5 per decade since 1960, and receiver output bandwidth by a similar factor. Accordingly, even the basic processing operations involved in converting from the signal domain (antenna location, time sample) to physical coordinates (range, bearing, velocity) are now demanding tens of gigaflops of processing power. In addition, deeper understanding of the physics involved in propagation and scattering and the development of adaptive filtering algorithms have led to a dramatic increase in the sophistication of signal decomposition and interpretation techniques (26). Notwithstanding the evolution of signal processing objectives and algorithms, the basic architecture of almost all current processing systems remains the classical pipeline structure. In general, it is computationally more effective to perform waveform compression (range processing) first, before transforming the signal from receiver space to beam space. Then, this projection of the incoming signal onto a spatial grid of range–azimuth cells can be repeated for consecutive occurrences of the elemental waveform until a time series of samples is accumulated for each grid cell. For almost all practical applications, it is not the timedomain history but the corresponding frequency spectrum associated with the Doppler signature of the scatterers that is required, so that the third signal transformation is the spectrum analysis of the accumulated time series. The suite of algorithms employed for these signal transformations has included, almost without exception, the fast Fourier transform (FFT), especially for Doppler, often for conventional beamforming, and sometimes for range processing, depending on the choice of waveform. In recent times, greater processing capacity has enabled implementing adaptive beam formers in real time, as well as the limited use of superresolution methods in all three domains (27). Increasingly, the refinement of signal processing techniques is based on more elaborate models of the propagative environment, of target scattering signatures, and of the characteristics of the external noise field. Adoption of these model-based or parametric signal
Table 3. Critical Receiving System Parameters
Noise figure: Must be maintained sufficiently low so that the system sensitivity and hence the radar detection capability are determined by external noise sources of atmospheric and galactic origin.
Out-of-band intermodulation distortion (IMD): Spurious signals can appear at the output of the receiving system as a consequence of the nonlinear interactions of large signals external to the channel under consideration. This phenomenon must be recognized in a balanced approach to receiving system design.
In-band IMD: Fundamentally the same mechanism applies as for out-of-band IMD, but in this case the signal under consideration is itself driving receiver nonlinearity.
Cross-modulation: This too is fundamentally a nonlinear mechanism, but instead of the nonlinearity creating spurious signals close to the frequency of the wanted radar signal, modulation of external unwanted signals is imposed on the wanted radar signal. This corrupts the frequency domain or Doppler space in which targets may have been detected (24).
Reciprocal mixing: In a superheterodyne receiver, the radio frequency is converted to an intermediate frequency in various mixers which, ideally, function purely as frequency translators. Any phase noise or spurious signals on the local oscillator signal can mix with large unwanted signals that appear at the radio frequency input along with the radar signal and appear at the same intermediate frequency as the wanted radar signal from which it cannot be separated (25).
Selectivity: Within the congested HF bands, it is imperative that the filtering employed within the receiver isolate the wanted radar signal from contamination by much larger unwanted signals.
Local sources of interference: The receiver site should be free of sources of interference that could raise the level of noise or interference in which the radar is required to operate. The need to collocate computing and communications equipment with the receiving system poses significant electromagnetic compatibility engineering challenges.
processing methods must be accompanied by careful validation of the model structure and the use of statistical tests to ensure that the model chosen is appropriate to the prevailing situation. For example, the application of some spatial and temporal desmearing techniques, designed to compensate for ionospheric fluctuations, would actually degrade the data if the observed spectral broadening were caused by a different mechanism such as multimode propagation. Automatic Detection and Tracking The automatic tracking process in an OTHR uses data supplied from a peak detection process. Peak detection consists of peak selection followed by interpolation (28). In peak selection, the local peaks in SNR are found in amplitude versus range and Doppler. A signal can be declared a local peak only if its SNR is greater than all the surrounding ARD values. By fitting a parabola to each selected peak in ARD space, the location of the center of the signal that produced the peak is interpolated between the cell positions. To limit the number of interpolated peaks sent to the tracking process, a threshold in SNR is applied which is a compromise between enhanced detection of small targets and spurious detections from noise, clutter, and interference. The threshold value is determined by the automatic tracking system’s ability to deal with false measurements and prevent the initiation of false tracks. The peak detections supplied to the automatic tracking process contain ambiguous target identification information, the SNR gives some indication of which peaks are likely to be from targets. However, clutter interference from meteors and the residues from interference suppression can give target-like returns. Further, ionospheric multimode propagation gives multiple peak detections from a single target. Normally, they are
resolved in range by correct selection of the radar bandwidth. The automatic tracking system must cope with these uncertainties in the sources of peak detections and allow for closely spaced target peak detections created by multimode propagation. In addition, the tracking system must resolve any velocity ambiguity from the measured Doppler. The most common approach is to track the separate multimode returns in slant range coordinates, group the multimode tracks to form a fused track, and then convert the fused track to ground coordinates. An alternative approach is to have a model of the multimode propagation in the tracking filter, which then tracks in ground coordinates (29). This approach has yet to be demonstrated operationally under conditions that include errors in the ionospheric model, multiple targets, and velocity ambiguity. A demonstrably useful approach is based on the probabilistic data association filter (PDAF) (30). A fixed number of nearest measurements is selected by each filter, and for each measurement, the probability is found that it is either from the target or from clutter and noise. These probabilities control the correction applied to the peaks that update the filter. Possible PDAF extensions include automatic track initiation with multiple velocity ambiguities and tracking in a nonuniform cluttered environment. LIMITATIONS AND AVENUES FOR FUTURE DEVELOPMENT OF OTHR TECHNOLOGY It is evident that the extensive coverage and high coverage rate of OTHR are achieved at the cost of employing a propagative channel that may distort and contaminate the signals severely. This has motivated numerous
Table 4. Projected Future Enhancements for OTHR Systems
Improved detection range and greater ability to detect small, fast (airborne) targets. Strategy: noise and elevated clutter rejection. Indicated developments: efficient antenna elements (D), polarization filtering (D), spatiotemporal filtering (D), two-dimensional arrays (D), optimum siting (U), lower frequency (U; needs very tall antennas).
Greater ability to detect small, slow (ship) targets. Strategy: minimization of sea clutter. Indicated developments: special wave forms (A), multistatic configurations (A), process model optimization (U), optimum siting (U).
Improved spatial precision and resolution. Strategy: overcome array diffraction limit. Indicated developments: model-based spectrum analysis (D), extended aperture (U), interferometry (D), stereoscopic observations (A), multistatic configurations to increase time-delay resolution (U), wide bandwidth waveforms (D), superresolution algorithms (D).
Reduced siting costs and expanded options for deployment. Strategy: single-site operation. Indicated developments: T/R switched pulse waveform (A), highly linear receiver (A), other special waveforms (U), passive radar (A).
Remote operation to enable tasking authority to control radar in real time and view product. Strategy: high bandwidth communications links and appropriate protocols. Indicated developments: commercial communications technology application (D).
Characterization of scatterer. Strategy: exploit static and dynamic signatures. Indicated developments: carrier frequency domain (D), kinematic domain (U), Doppler domain (U), phase signature domain (D).
investigations into the factors that now limit performance and into strategies for overcoming them. Table 4 lists desired OTHR features or enhancements that are judged of greatest practical importance and associates these with prospective technical solutions. The feasibility of implementation is indicated by D (demonstrated), A (deemed achievable but not demonstrated) or U (deemed understood but subject to confirmation or has major implementation difficulties). BIBLIOGRAPHY 1. M. I. Skolnik, Radar Handbook, 2nd ed., McGraw-Hill, NY, 1989. 2. J. M. Headrick and M. I. Skolnik, Proc. IEEE 62, 664–673 (1974). 3. T. M. Georges, IEEE Trans. Antennas Propagation Ap-28, 751–761 (1980). 4. Special Issue on High-Frequency Radar for Ocean and Ice Mapping and Ship Location, IEEE J. Oceanic Eng. OE-11(2) (1986). 5. K. G. Budden, The Propagation of Radio Waves: The Theory of Radio Waves of Low Power in the Ionosphere and Magnetosphere, Cambridge University Press, 1985. 6. L. F. McNamara, The Ionosphere: Communications, Surveillance and Direction Finding, Orbit, Krieger, Melbourne, Fl, 1991. 7. G. F. Earl and B. D. Ward, Radio Sci. 22(2), 275–291 (1987). 8. K. Davies, Ionospheric Radio, Peter Perigrinus, London, 1990. 9. R. I. Barnes et al., Radio Sci. 33(4), 1,215–1,226 (1998). 10. K. Hocke and K. Schlegel, A Review of Atmospheric Gravity Waves and Travelling Ionospheric Disturbances: 1982–1995, 1996. 11. S. J. Anderson and M. L. Lees, Radio Sci. 23(3), 265–272 (1988). 12. R. F. Harrington, Time Harmonic Electromagnetic Fields, McGraw-Hill, NY, 1961, pp. 292–296.
13. R. F. Harrington, Field Computation by Moment Methods, Macmillan, NY, 1968. 14. E. K. Miller, L. Medgyesi-Mitschang, and E. H. Newman, eds., Computational Electromagnetics; Frequency-Domain Method of Moments, IEEE Press, NY, 1992. 15. D. E. Barrick, in V. E. Derr, ed., Remote Sensing of the Troposphere, U.S. Government Printing Office, Washington, DC, 1972, Chapter 12. 16. B. J. Lipa, Radio Sci. 12, 425–434 (1977). 17. L. R. Wyatt, Int. J. Remote Sensing 11, 1,481–1,494 (1990). 18. S. J. Anderson, Proce. Radarcon-90, Adelaide, Australia, April 1990, pp. 315–323. 19. R. I. Barnes, IEE Proc. Radar Sonar Navigation 143(1), 53–63 (1996). 20. S. J. Anderson, Proc. IEE Part F, 139(2), 182–192 (1992). 21. A. E. Long and D. B. Trizna, IEEE Trans. Antenna Propagation AP-21, 680–685 (1973). 22. Characteristics and Applications of Atmospheric Radio Noise Data, CCIR Report 322-3, International Telecommunications Union, Geneva, Switzerland, 1988. 23. G. F. Earl, Radio Sci. 33(4), 1,069–1,076 (1998). 24. G. F. Earl, IEEE Trans. Instrum. Meas. IM-36(3) (1987). 25. G. F. Earl, Consideration of Reciprocal Mixing in HF OTH Radar Design, HF Radio Systems and Techniques, IEE Conference Publication, 1997, 411–. 26. S. J. Anderson and Y. I. Abramovich, Radio Sci. 33(4), 1,055–1,067 (1998). 27. S. J. Anderson, Proc. Pacific Ocean Remote Sensing Conf., PORSEC’94, Melbourne, Australia, March 1994. 28. S. J. Davey, S. B. Colegrove, and D. Mudge, Advanced Jindalee Tracker: Enhanced Peak Detector, Defence Science and Technology Organization Technical Report No. DSTO-TR0765, 1999. 29. G. W. Pulford and R. J. Evans, Automatica 32(9), 1,311– 1,316 (1996). 30. Y. Bar-Shalom and E. Tse, Automatica 11, 451–460 (1975).
P PARTICLE DETECTOR TECHNOLOGY FOR IMAGING
GEOFFREY N. TAYLOR The University of Melbourne Parkville, Victoria Australia
INTRODUCTION: PARTICLE IMAGING In this article, ‘‘particles’’ refer to charged particles that have enough energy to pass some significant distance through matter. The interaction of particles with matter forms the basis on which particle detectors are designed and operated. The principal process for detection is ionization of matter by the electromagnetic interaction of the moving particle. Ionization produces free charge that can be collected and detected. Electromagnetic radiation from the charged particle moving through matter may also be used to create a detectable signal. Neutral particles can be observed following their interaction with matter, producing one or several charged particles that are subsequently detected. The development of particle detectors has occurred within the fields of nuclear and high-energy particle physics. In this regime, the interest lies in reconstructing particles emanating from a particle interaction or from the decay of unstable particles or nuclei. The trajectories of the detected particles are recorded to reconstruct the kinematics (momentum and energy of the particles) of the reactions being observed. The particle trajectories are sampled as the particles pass through the detector elements. The sampled track positions are used to reconstruct the full trajectory. In addition to the kinematics of such reactions, spatial information is also important, especially when determining the distance that an unstable particle travels before decaying. The decay time distributions of unstable particles follow the exponential statistical time distributions of radioactive decays whose half-lives or lifetimes are characteristic of the particle or the nuclear state being studied. For sufficiently long lifetimes, the combination of track length before decay and the reconstructed kinematics of the decay products are used to determine the decay time of individual decays. The ensemble of data is then used to extract the characteristic lifetime of the particle from the distribution of the individual decays. The list of particles that can be detected is generally restricted to those whose stability is sufficient for them to travel some minimum distance in matter. Current limitations of the spatial resolution of electronic detectors limit observable track lengths to a few hundred microns or more. (Stacks of particle-sensitive emulsion, similar to photographic film, have successfully measured decay lengths to a few microns. The rate limitation and lack of timing information, however, limit the applicability of this technique to imaging.) The combination of lifetime (τ), extended when viewed from the laboratory by the relativistic factor, γ = E/mc² (where E is the particle energy and m its mass) and normalized particle velocity, β = v/c (where v is the particle velocity and c is the speed of light), yields an average distance traveled before decay of z = γβcτ. The most common particles that can be tracked and imaged include electrons, protons, some nuclei, muons, pions, kaons, and neutrinos. High-energy photons (X rays and gamma rays) also have particlelike properties and provide great utility in imaging applications. The source of the particles can be radioactive tracers distributed within the object being imaged or an external source such as a radioactive source, a reactor, an accelerator, or cosmic rays. For further reading in this area, see (1–7).
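As a worked example of the decay-length relation z = γβcτ quoted above, the short calculation below evaluates the mean distance traveled before decay for a charged pion (lifetime about 2.6 × 10⁻⁸ s, rest energy about 139.6 MeV); the 10 GeV energy is an arbitrary example value.

```python
import math

C = 2.998e8  # speed of light (m/s)

def mean_decay_length_m(energy_gev: float, mass_gev: float, tau_s: float) -> float:
    """z = gamma * beta * c * tau for a particle of total energy E and mass m."""
    gamma = energy_gev / mass_gev
    beta = math.sqrt(1.0 - 1.0 / gamma**2)
    return gamma * beta * C * tau_s

# Charged pion: m*c^2 = 0.1396 GeV, lifetime 2.6e-8 s, at 10 GeV (example energy).
print(f"{mean_decay_length_m(10.0, 0.1396, 2.6e-8):.0f} m")
```

At 10 GeV the pion is highly relativistic, so the laboratory decay length (several hundred meters) greatly exceeds cτ of about 8 m in the pion rest frame.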
INTERACTIONS OF PARTICLES WITH MATTER Ionization The passage of moving charged particles through matter results in a coulombic interaction with the atomic electrons in the matter. Energy is transferred from the moving particle to the material. Sufficient energy transfer to the atomic electrons causes ionization of the atom, resulting in an ejected electron and an ionized atom (ion). The electrically charged ionization electrons and ions form the basis of the electrical signal that signifies the passage of the charged particle. In crystalline solids, ionization results in a free electron and ‘‘hole,’’ both of which can contribute to electrical signals. The average rate of energy loss of a high-energy particle of charge z and velocity β, whose mass is much greater than that of the electron, is given by the Bethe–Bloch equation:

−dE/dx = K z² (Z/A) (1/β²) [ (1/2) ln(2 me c² β² γ² Tmax / I²) − β² − δ/2 ],   (1)

where Z and A are the atomic number and atomic mass, respectively, and I is the average ionization energy of the material through which the particle is traversing. K = 4π NA re² me c², NA is Avogadro's number, re = e²/(4π ε0 me c²) ≈ 2.82 fm is the classical electron radius, and me is the electron mass, me c² = 0.511 MeV. Tmax is the maximum kinetic energy that the particle can transfer to a stationary electron:
Tmax = 2 me c² β² γ² / [1 + 2γ (me/M) + (me/M)²],   (2)

where γ = E/m0c² and E, p, v, and M are the total energy, momentum, velocity, and mass of the moving particle.
[Figure 1 display: −dE/dx (MeV g⁻¹ cm²) versus βγ = p/Mc, with corresponding pion, muon, and proton momentum scales (GeV/c), for H2 liquid, He gas, C, Al, Fe, Sn, and Pb.]
Figure 1. Mean energy loss rate in various materials (7).
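A minimal numerical sketch of Eq. (1) follows, evaluating the mean energy loss rate in silicon for a singly charged particle with the density correction δ neglected (a reasonable simplification near minimum ionization, but not at high γ). The silicon constants are standard values; the muon mass and the chosen βγ are example inputs only.

```python
import math

ME = 0.511        # electron mass energy, MeV
K  = 0.307075     # 4*pi*N_A*r_e^2*m_e*c^2, MeV mol^-1 cm^2

def bethe_bloch(beta_gamma, M, z=1, Z=14, A=28.09, I_ev=173.0):
    """Mean -dE/dx in MeV g^-1 cm^2 from Eq. (1), with the density
    correction delta set to zero (defaults are for silicon)."""
    gamma = math.sqrt(1.0 + beta_gamma**2)
    beta2 = beta_gamma**2 / gamma**2
    I = I_ev * 1e-6                      # mean ionization energy, MeV
    # Maximum energy transfer to an atomic electron, Eq. (2)
    tmax = 2 * ME * beta_gamma**2 / (1 + 2 * gamma * ME / M + (ME / M)**2)
    log_term = 0.5 * math.log(2 * ME * beta_gamma**2 * tmax / I**2)
    return K * z**2 * (Z / A) / beta2 * (log_term - beta2)

# Example: a muon (M = 105.7 MeV) with beta*gamma ~ 3.5, near minimum ionization;
# the result is close to the familiar ~1.7 MeV g^-1 cm^2 for silicon.
print(f"{bethe_bloch(3.5, 105.7):.2f} MeV g^-1 cm^2")
```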
The distance dx is given in units of areal density, mass per unit area, g cm⁻². (Dividing by the material density gives the distance in centimeters.)
Figure 1 shows the mean energy loss rate in various materials. Figure 2 shows the measured ionization energy loss for electrons, pions, kaons, protons, deuterons, and tritons, in the OPAL jet chamber (8), displaying the expected velocity dependence. The different particle masses at given momenta provide different velocities. At low velocities, the 1/β 2 dependence dominates. Charged particles lose energy at higher rates per distance traveled, as they slow to a stop. This effect causes a (heavy) charged particle to come to a stop rapidly near the end of its trajectory. The energy loss rate reaches a minimum at intermediate values of γβ (minimum ionizing particles or MIPs) before the logarithmic increase becomes important at higher values. This ‘‘relativistic rise’’ due to length contraction in the direction of the particle and elongation of the field transverse to the particle direction is generally small. However, in low density gases, the relativistic rise can be significant before reaching a plateau. The plateau or ‘‘density effect’’ is the result of shielding by polarization of atomic electrons, and hence sets in earlier in higher density material (9). For a detailed description of the determination of ionization energy loss in various materials, see (10). The energy loss of a charged particle passing through some small amount of material is stochastic. The ionization energy deposited in a thin layer of material follows the ‘‘Landau distribution’’ (11), as shown in Fig. 3. The average value and the shape of the distribution describe the energy deposited in charged particle detectors. In most detectors, the detected signal will follow a similar distribution,
[Figure 2 display: dE/dx (keV/cm) versus p/|Q| (GeV/c); bands labeled e, π, K, p, d, and t.]
Figure 2. The measured ionization energy loss for electrons, pions, kaons, protons, deuterons, and tritons in the OPAL jet chamber (8), displaying the expected velocity dependence. The different particle masses at given momenta provide different velocities.
The Landau distribution is then described by 1000
Qmp = 25.1 ± 0.1
ω(λ) D
Qmean = 31.3
# Events per bin
800
400
200
0
20
40
60
80
100
120
140
Cluster charge (ADC counts)
Figure 3. The Landau distribution shows the energy loss distribution measured in a 300-µm thick silicon detector (12).
modified by any nonlinearity, noise, or dynamic range limitations. It is usual to define

f(x, Δ) = (1/ξ) ω(λ)   (3)

as the probability that the particle will lose energy Δ after traversing material of thickness x. λ is the difference between Δ and the most probable energy loss Δmp,

λ = (Δ − Δmp)/ξ.   (4)

The Bethe–Bloch equation gives the normalization

ξ = (0.154/β²) (Z/A) x × 10³ (keV),   (5)

where x is the areal density measured in g cm⁻². The Landau distribution ω(λ) is given by

ω(λ) = (1/π) ∫₀∞ exp(−u ln u − λu) sin(πu) du.   (6)
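The sketch below evaluates Eqs. (5) and (6) numerically for a 300 µm silicon layer of the kind used for Fig. 3. The finite integration limit and the assumption β ≈ 1 are simplifications made for illustration; the silicon constants are standard values.

```python
import math
from scipy.integrate import quad

def xi_kev(Z, A, x_g_cm2, beta=1.0):
    """Normalization of Eq. (5) in keV, with x in g cm^-2."""
    return 0.154 / beta**2 * (Z / A) * x_g_cm2 * 1e3

def landau_omega(lam):
    """Universal Landau density of Eq. (6); the integrand is negligible
    beyond u ~ 30, so a finite upper limit is used."""
    f = lambda u: math.exp(-u * math.log(u) - lam * u) * math.sin(math.pi * u)
    value, _ = quad(f, 1e-12, 30.0, limit=200)
    return value / math.pi

# 300 um of silicon: Z = 14, A = 28.09, density 2.33 g/cm^3.
x = 0.030 * 2.33
print(f"xi = {xi_kev(14, 28.09, x):.2f} keV")
print(f"omega near the most probable value (lambda ~ -0.22): {landau_omega(-0.22):.3f}")
print(f"omega far in the tail (lambda = 10): {landau_omega(10.0):.5f}")
```

The slow fall of ω(λ) at large λ is the long high-energy tail visible in Fig. 3.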
Figure 3 shows the energy loss distribution measured in a 300-µm thick silicon detector (12). The Bethe–Bloch equation is not appropriate for electrons. In that case, the equal mass of a moving particle and atomic electrons significantly changes the kinematics. Radiation dominates over ionization for electrons (and positrons) at energies as small as some tens of MeV. The contributions to energy loss for electrons and positrons are shown in Fig. 4. At energies above several hundred GeV, radiation also becomes the dominant energy loss mechanism for muons. The high-energy tail in the Landau distribution is due to ‘‘knock-on’’ electrons or ‘‘delta rays’’ from rare, high-energy transfers to individual atomic electrons, resulting in their rapid ejection. If they have enough energy, these particles also contribute to ionization and thus to additional energy deposition in the material. If the detecting material is thin, however, such as in silicon detectors, the high-energy delta rays escape and do not contribute to the energy detected. In this case, the signal energy detected is less than the energy loss. Interactions of photons with matter are discussed elsewhere. However, at high energies, electron and photon interactions are both characterized by material thickness in units of radiation length X0. For high-energy particles, this is the average distance over which the electron energy has been reduced to 1/e of its initial value. Tsai (13) calculated the radiation length for a range of materials. A useful fit to these calculations is reported in PDG from O. I. Dahl as
X0 = 716.4 A / [Z(Z + 1) ln(287/√Z)] g cm⁻².   (7)
The passage of charged particles through matter will also result in coulombic interactions with atomic nuclei. The
[Figure 4 display: fractional energy loss per radiation length, (1/E) dE/dx (X0⁻¹), and the corresponding cross section per unit mass (cm² g⁻¹), versus E (MeV) for electrons and positrons in lead (Z = 82); curves for ionization, Moller (e−), Bhabha (e+), positron annihilation, and bremsstrahlung.]
Figure 4. The contributions to energy loss for electrons and positrons (7).
PARTICLE DETECTOR TECHNOLOGY FOR IMAGING
The passage of charged particles through matter also results in coulombic interactions with atomic nuclei. The volume occupied by the nuclei is sufficiently restricted that hard scattering off nuclei occurs much less frequently than the interactions between a moving particle and atomic electrons. Furthermore, as distinct from ionization, where a heavy moving charged particle is considerably more massive than the electrons, the opposite is the case in nuclear coulombic scattering. The large number of interactions in ionization results in significant energy loss but in little change in direction of the heavier moving charged particle: energy loss without significant scattering. In nuclear coulombic scattering, the massive nuclei behave as scattering centers, but the scattering is typically elastic: scattering without significant energy loss. The coulombic force between a moving particle of charge z and momentum p (and velocity β = v/c, where c is the velocity of light) and a matter nucleus of charge Z results in Rutherford scattering that depends on the impact parameter, the perpendicular distance between the particle trajectory and the matter nucleus. Taking account of the statistical distribution of the possible impact parameters, the "differential cross section" for scattering by an angle θ from the original trajectory is given by the Rutherford scattering formula:

dσ/dΩ = 4zZre² (me c / βp)² [1 / sin⁴(θ/2)],          (8)
where re and me are the classical electron radius and electron mass. dσ/dΩ is the differential cross section for scattering into a solid-angle element dΩ of the unit sphere, centered on the scattering nucleus. The Rutherford formula assumes cylindrical symmetry around the initial particle trajectory. Thus, the scattering depends only on the angle θ from the initial direction, not on the orientation of the scattered trajectory around the initial direction. The coulombic scattering cross section is strongly peaked at small scattering angles. Thus, single scatters generally contribute little to a change in the direction of a moving particle. However, in passing through a macroscopic amount of matter, a significant directional change can result from the statistical addition of a large number of small scatters. Multiple coulombic scattering (MCS) must be considered when evaluating the trajectories through matter that lead to a loss of information about the original particle trajectory. Assuming that the scattering angle results from many small angle scatters, the overall scattering angle is, to a good approximation, Gaussian distributed, and its width (for scattering in a plane) is given by

θ0 ≈ (13.6 MeV / βcp) z √(x/X0).          (9)
The characteristic MCS angle is proportional to the square root of the thickness of the material traversed, measured in radiation lengths. It should be pointed out that this Gaussian approximation underestimates the probability of large single scatters; the tail of the scattering distribution is thus somewhat underestimated.
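A minimal sketch of Eq. (9): the characteristic multiple-scattering angle for a singly charged particle crossing a thin silicon layer. The particle momentum, β, layer thickness, and the silicon radiation length (taken as 9.36 cm) are assumptions for illustration.

```python
import math

def theta0_rad(p_GeV, beta, z, x_over_X0):
    """Characteristic multiple coulombic scattering angle of Eq. (9), in radians."""
    return (0.0136 / (beta * p_GeV)) * abs(z) * math.sqrt(x_over_X0)

# Assumed example: a 1 GeV/c pion (beta ~ 0.99) crossing 300 um of silicon (X0 ~ 9.36 cm).
x_over_X0 = 0.0300 / 9.36
theta0 = theta0_rad(p_GeV=1.0, beta=0.99, z=1, x_over_X0=x_over_X0)
print(f"x/X0 = {x_over_X0:.4f}, theta0 = {theta0 * 1e3:.2f} mrad")
```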
The interactions of photons with matter, discussed elsewhere, are related to the processes that affect electrons as they pass through matter.

Magnetic Fields and Particle Trajectories

The classical Lorentz electromagnetic force, which is valid relativistically, results in helical charged particle trajectories in a (uniform) magnetic field. The momentum component perpendicular to the magnetic field follows a circular orbit; combining this with the longitudinal momentum component results in a helical orbit. The orbit radius is related to the particle momentum and magnetic field by

p cos λ = 0.3 B R z,          (10)
where p is the momentum in GeV/c, λ is the pitch of the helical trajectory, B is the magnetic field in tesla, R is the radius of curvature in meters, and z is the particle charge. From knowledge of the magnetic field, measurement of the perpendicular curvature yields the particle momentum. Energy loss as the particle passes through matter results in a reducing radius. Scattering produces deviations from a helical path and thus contributes to a loss in momentum resolution. These effects must be considered in determining particle momenta. The positions of particle tracks as they pass through discrete detector elements are sampled to give information from which the full particle trajectory can be reconstructed. The signals must be corrected for any detector biases or misalignments before the data are processed to reconstruct the tracks. The fitting procedure depends on the geometry of the tracking detectors and the presence or otherwise of a magnetic field. Fitting the track curvature through a magnetic field yields a measure of the momentum. Errors in the measured track positions result in errors in the momentum determined. Assuming a uniform medium and a large number (≥10) N of measured points, equally spaced along a track length L, each of equal measurement error ε, Karimäki (14) determined that the error in the measured track curvature, k = 1/R, from detector resolution is

δk_res ≈ (ε/L²) √[720/(N + 4)].          (11)

The complete covariance matrix is presented in (14) for nonuniform measurement spacing. An additional error is introduced by multiple coulombic scattering, given by the Particle Data Group as (7)

δk_mcs ≈ [0.016 (GeV/c) z / (L p β cos²λ)] √(L/X0),          (12)

where X0 is the radiation length of the material traversed. Assuming that the curvature measurement error and the error due to multiple scattering can be added in quadrature, the total curvature error becomes

δk² = δk_res² + δk_mcs².          (13)
To optimize momentum reconstruction, both contributions must be controlled with equal care.
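The sketch below combines Eqs. (10)–(13) to estimate the fractional momentum resolution of a simple tracker; since k = 0.3B/(p cos λ) at fixed field, δp/p equals δk/k. The lever arm, number and precision of the measurements, material budget, and field are assumed, illustrative values.

```python
import math

def dk_resolution(eps_m, L_m, N):
    """Curvature error from finite detector resolution, Eq. (11) (units 1/m)."""
    return (eps_m / L_m**2) * math.sqrt(720.0 / (N + 4))

def dk_mcs(p_GeV, beta, z, L_m, L_over_X0, cos_lambda=1.0):
    """Curvature error from multiple coulombic scattering, Eq. (12) (units 1/m)."""
    return 0.016 * abs(z) * math.sqrt(L_over_X0) / (L_m * p_GeV * beta * cos_lambda**2)

# Assumed tracker: L = 1 m, N = 20 points of 150 um resolution, 1% X0 of material, B = 1 T.
p_GeV, beta, B_T = 10.0, 1.0, 1.0
k = 0.3 * B_T / p_GeV                                    # curvature 1/R from Eq. (10), cos(lambda) = 1
dk = math.hypot(dk_resolution(150e-6, 1.0, 20),          # quadratic sum, Eq. (13)
                dk_mcs(p_GeV, beta, 1, 1.0, 0.01))
print(f"dk/k = dp/p = {dk / k:.2%}")
```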
A program widely used for estimating expected momentum errors for cylindrically symmetrical geometries is TRACKERR (15). Fitting tracks to a point of origin (vertex fitting) can provide an additional constraint on the curvature and hence improve the determination of the momentum (a "vertex constrained fit"). The fitted tracks can be used to reconstruct the interaction that produced them; to search for separated vertices, indicative of decays or secondary interactions; and to determine the spatial origin of the tracks. For more details, see (16,17).

In high-energy particle physics, measurements of particle energies can also be carried out by completely absorbing the particles in calorimeters. Electromagnetic and nuclear ("hadronic") cascades result in multiplication of particle tracks. The number of tracks produced is proportional to the energy of the initiating particle. To estimate the number of tracks, active elements (such as scintillators or wire chambers) at discrete points within the cascade region measure the total ionization and its spatial distribution. Typically, a sandwich of many layers of radiator material followed by active volumes is used to contain the complete shower in a sampling calorimeter. A material of high atomic number such as lead is chosen for electromagnetic calorimeters, and iron or a similar material for hadronic calorimeters (18,19). Alternatively, complete absorption calorimeters can be used. Electromagnetic cascades can be absorbed completely in scintillating crystal calorimeters, whose scintillation signal is proportional to the energy deposited, or in liquid argon or krypton calorimeters that measure the total ionization energy.

Particle identification is often important in high-energy physics and in nuclear physics applications. Separation of the various commonly produced particles is possible via their mass differences. For fixed momentum, different masses result in different velocities, so velocity-dependent measurements can be applied to distinguish the particle masses. Ionization, particle flight times over fixed distances, and Cerenkov radiation are velocity-dependent effects that can be used for this task.

Scintillation

Ionization energy loss in scintillating materials results in the emission of visible and UV photons in the deexcitation of the excited molecules. Collection of this light constitutes particle detection. An appropriate array of scintillators can make particle imaging practical. In particular, the use of scintillating fiber arrays has become a successful high precision tracking technique. Liquid and solid organic scintillators that have carefully chosen dopants provide fast, reliable scintillation detectors. Coupling light guides to plastic scintillators to guide the scintillation light to the photocathode of a photomultiplier is a standard technique for reading out particle-induced signals. Organic scintillators can have very fast signal rise times that result in precise time resolution (typically 150 ps) and short recovery times (10 ns) that are excellent for high rate applications. The positional resolution is determined by the geometry of the scintillator array.
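To make the trade-offs concrete, here is a rough photoelectron budget for a minimum ionizing particle crossing a plastic scintillator read out by a photomultiplier. Every number (energy deposit rate, light yield, collection efficiency, quantum efficiency) is an assumed, order-of-magnitude value, not taken from the text.

```python
# Order-of-magnitude photoelectron budget for a MIP in a plastic scintillator.
dE_dx_MeV_per_cm = 2.0       # assumed MIP energy deposit rate in plastic
thickness_cm = 1.0           # assumed scintillator thickness
photons_per_MeV = 10_000     # assumed scintillation light yield
collection_eff = 0.05        # assumed fraction of photons reaching the photocathode
quantum_eff = 0.25           # assumed photocathode quantum efficiency

deposited_MeV = dE_dx_MeV_per_cm * thickness_cm
n_pe = deposited_MeV * photons_per_MeV * collection_eff * quantum_eff
print(f"~{deposited_MeV:.1f} MeV deposited -> ~{n_pe:.0f} detected photoelectrons")
```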
Organic plastic scintillators consist of a base material, usually polystyrene, polyvinyltoluene, or acrylic, and one or more dopants consisting of organic fluors. The base molecules are excited by absorbing approximately 5 eV of energy from ionizing particles. The dopants are chosen to absorb at a wavelength very close to the fundamental excitation of the base molecules (within approximately 1 nm). Energy is transferred quickly from the base molecules to the dopants at a high quantum efficiency approaching 100% (19–21). Commonly used combinations result in blue–green or UV–blue emission. A typical 1% doping level gives adequate light emission but also results in significant self-absorption of light that travels through more than about 1 m of the material. Self-absorption can be reduced by using low concentrations (100 ppm) of additional wavelength-shifting dopants.

Cerenkov Radiation

Cerenkov radiation is emitted by charged particles passing through a material at a velocity greater than the (phase) velocity of light in the material. Three properties of Cerenkov radiation are important in particle detectors (23): the angle at which the Cerenkov radiation is emitted, the velocity threshold that must be exceeded for radiation to occur, and the number of photons emitted. The first of these properties is the most important for particle tracking. Cerenkov radiation is emitted in a cone of half-angle θC, given by the expression

θC = cos⁻¹(1/nβ),          (14)

where n is the index of refraction of the material and β = v/c is the particle velocity. The cone half-angle, the velocity threshold for radiation, and the velocity dependence of the number of photons radiated are all important for velocity determination. Measurements of these quantities can provide particle identification via the sensitivity of the velocity to the particle mass for a given momentum (7,23).

Transition radiation, a related process, occurs when a high-velocity particle crosses a boundary between materials where an abrupt difference exists in the electrical permittivity (24). The number of photons emitted is proportional to (ln γ)², where γ = E/mc². The photon yield is enhanced by creating multiple transitions by using thin foils or expanded polystyrene. The radiated energy is also strongly γ-dependent. Typically, the signal is generated in the soft X-ray region. Transition radiation is particularly interesting for distinguishing electrons from heavier charged particles. For the same momentum, the γ for electrons is significantly higher than for the more massive particles, which enhances the transition radiation produced.
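The following sketch applies Eq. (14) and the threshold condition nβ > 1 to particles of the same momentum, showing how the Cerenkov angle (or its absence) separates masses. The radiator index of refraction and the 10 GeV/c momentum are assumptions; the particle masses are standard values.

```python
import math

def beta_from_p(p_GeV, m_GeV):
    """Velocity beta = p/E for a particle of momentum p and mass m."""
    return p_GeV / math.hypot(p_GeV, m_GeV)

def cherenkov_angle_deg(beta, n):
    """Cerenkov cone half-angle from Eq. (14); returns None below threshold (n*beta <= 1)."""
    if n * beta <= 1.0:
        return None
    return math.degrees(math.acos(1.0 / (n * beta)))

n_radiator, p = 1.0014, 10.0   # assumed: a light gas radiator, 10 GeV/c particles
for name, mass in [("pion", 0.1396), ("kaon", 0.4937), ("proton", 0.9383)]:
    theta = cherenkov_angle_deg(beta_from_p(p, mass), n_radiator)
    print(f"{name:>6}: " + (f"{theta:.2f} deg" if theta is not None else "below threshold"))
```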
PARTICLE DETECTION TECHNIQUES

Scintillation Tracking Devices

Light plastic and liquid scintillators are appropriate for tracking applications because they minimize scattering. In X-ray and gamma-ray imaging, heavy inorganic scintillators are used to absorb photons highly efficiently. Imaging of high-energy photons using a segmented scintillator is discussed elsewhere in this volume. The positional resolution of scintillator tracking detectors is determined by the geometry of the scintillator. These detectors can be used as the principal tracking device, where fine scintillating fibers have been successfully implemented. The fast response of scintillators makes them attractive as the "trigger" device, where coarser segmentation is sufficient. A signal in trigger detectors can be used to initiate a readout of other, higher precision tracking devices. The timing information provided by the scintillator can provide further selection criteria and can be used to initiate a measurement in devices such as drift chambers (see later) that use timing information to determine track positions. Techniques for converting the scintillation light signal into an electrical signal must take account of the small light production and collection efficiency and of the quantum efficiency (the efficiency in detecting single photons) of the light detection itself (20–70%). For tracking, the light detection device must retain the positional information, that is, the identity of the fiber from which the light was obtained. Position-sensitive photomultipliers are available commercially for this application (25). Hybrid photomultipliers have also been developed for imaging applications. Alternatively, an image intensifier can be used to maintain positional information while increasing the signal amplitude and maintaining signal timing. The amplified light output of the image intensifier can be imaged by a CCD device or an array of such devices. The image intensifier can be triggered or gated by an external signal to record only data coincident with a given event. Bundles of scintillating fibers, plastic and glass, and liquid-filled capillaries, coupled to light detection arrays via inactive optical fibers, have been successfully used in
particle tracking [see (26) and references therein, (27,28)]. Figure 5 shows a schematic of such a combination.

Wire Chambers. The signal in a gas particle detector commences with primary ionization of the gas molecules by the track being measured. Primary ionization is characteristic of the ionizing particle track and of the gas. Some of the primary ionization electrons have enough energy to produce further, secondary ionization. The primary charge drifts under the influence of an applied electric field and can be collected for measurement. Typical gases used in such detectors are the inert gases, carbon dioxide, methane, and ethane. The energy W required to produce an electron–ion pair in such gases is typically 20–40 eV. For minimum ionizing particles, the ionization energy loss per track length varies with the density and atomic number of the gas molecules. The resulting ionization varies from approximately 10 electron–ion pairs per centimeter for the lowest atomic weight gases to several hundred pairs for the highest. The relatively low ionization rates lead to a statistical distribution of the primary ionization that is described by Poisson statistics; the ionization per track length follows a Landau distribution. The statistical nature of the ionization results in varying detection efficiency and needs to be considered in thin gas chambers. The ionization cloud is drifted through the gas under the influence of an applied electric field E to a detecting anode. The speed at which the charge cloud drifts under the influence of the field is the drift velocity. The ratio of drift velocity to electric field is the mobility µ = vdrift/E. For ions, the drift velocity is proportional to the field, and the mobility is approximately constant.
Figure 5. Schematic of a scintillating fiber detector (25). The labeled components include a scintillating glass target, reducing taper, three-stage image intensifier, MCP, CCD/SIT camera, VDAS ADC/compactor with 16-megabyte FIFO, VME bus, image processor and event display, and tape drives.

Figure 6. Electron drift velocity (cm/µs) as a function of electric field (kV/cm) in various gases (CH4, C2H4, CO2, N2, Ar) at STP (37).
For electrons, the strong energy and gas dependence of the electron–molecule collisional cross section results in a complicated electric field dependence of the drift velocity [see, for example, Biagi (29) and Fig. 6]. Ion mobilities are in the range of 1–10 cm² V⁻¹ s⁻¹ for typical chamber gases at STP. An ionization charge will diffuse away from the production region, where its density is largest. When the charge is drifted to the collection anodes, diffusion continues to expand the region occupied by the charge, affecting the precision with which the cloud can be localized. Drifting the ionization charge under the influence of an applied electric field increases the energy of the charge. For electrons in particular, the strong energy dependence of the electron–molecule collisional cross section can result in field-dependent diffusion; an enhanced collisional cross section results in broadening of the electron cloud (see Fig. 7; 31). The ability of the detector to determine the position of the original particle track by localizing the primary ionization is limited by the diffusion of the charge cloud before signal collection. A sufficiently large applied electric field can increase the electron energy to such an extent that collisions with molecules result in further ionization. A critical field in the range 5–20 kV cm⁻¹ is required for this ionization multiplication. For a given gas in an electric field above the critical value, there is a characteristic length that electrons travel, on average, to create an additional electron–ion pair.
Figure 7. Electron diffusion per centimeter of drift, σx (cm^1/2), as a function of electric field (V/cm at 1 atm) for various gases at STP (28).
Figure 8. MWPC electric field lines (7). Ionization charge drifts along the field lines to the anode wires. Close to the wires, the field strength increases, accelerating the charge and resulting in charge amplification. The vertical dashed lines indicate virtual boundaries provided by the uniform field.
The inverse of this length is the Townsend coefficient α. Fast electrons create ionization after a characteristic distance given by 1/α; the ionized electrons are accelerated in the electric field and create more ionization over the same distance scale. For a uniform electric field, this results in an avalanche of ionization that grows exponentially by exp(αz), the effective charge gain of the chamber after a distance z.
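A minimal numerical sketch of the avalanche gain exp(αz). In a real wire chamber the field, and hence α, varies strongly with distance from the wire, so the gain is exp(∫α dz); the uniform-field form below, and the values of α, avalanche distance, and primary ionization, are illustrative assumptions.

```python
import math

def gas_gain(alpha_per_cm, z_cm):
    """Avalanche charge gain exp(alpha * z) for a uniform field over a distance z."""
    return math.exp(alpha_per_cm * z_cm)

# Assumed: an effective Townsend coefficient of 2000 /cm over the last 50 um before the anode.
gain = gas_gain(alpha_per_cm=2000.0, z_cm=50e-4)
primary_pairs = 100                      # assumed primary electron-ion pairs in the gas
print(f"gain ~ {gain:.2e}, collected electrons ~ {primary_pairs * gain:.2e}")
```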
Wire chambers use the high field region around thin anode wires to create an avalanche that produces a larger, more easily detected signal. In the proportional mode, multiplication occurs in a limited region near the anode. The initial ionization drifts without multiplication across most of the chamber volume; only once the charge cloud reaches the anode does the avalanche occur. Thus, the collected signal charge is proportional to the original ionization. The avalanche signal collected on the anode consists of two components: a very fast pulse due to the arrival of the electrons and a much slower signal, induced by the motion of ions away from the anode wire, that typically lasts some microseconds. The latter provides most of the detectable signal. Electronic filtering of the signal using differentiating electronics restores the timing information and keeps the fastest component of the signal.

The multiwire proportional chamber (MWPC) (32) achieves a uniform drift field across most of the sensitive volume by sandwiching a series of equally spaced, thin anode wires between the cathode planes (see Fig. 8). The drift chamber takes this concept further and drifts charge across larger distances. The initial ionization charge is localized by measuring the time taken for the charge to drift from its source to the signal electrode; the calibrated drift velocity is applied to extract the distance traveled. For precision particle imaging, the ionization charge must be localized to maintain positional accuracy. Detailed knowledge of the electric drift field is required, as well as an optimized gas mixture, pressure, and field strength to minimize diffusion of the electron cloud. Figure 9 shows a schematic of the drift chamber concept (33). A uniformly changing electric potential distribution is set up on the cathode wires away from the anode wires. Close to the anode, where the avalanche field is set up, field uniformity is lost. Geometries vary from planar to cylindrical (see Fig. 10). The inherent ambiguity in selecting between symmetrically placed tracks on either side of the anode wires can be resolved by staggering the anode wires. The stagger provides a time offset of opposite sign for signals coming from each side of the wires; a consistent fit to the signals can thus resolve the ambiguity.

The time projection chamber (TPC) takes the drift concept to the limit (34). Using a large active volume that has parallel electric and magnetic fields, the dispersion of the charge cloud is minimized. TPCs that have drift distances of 1–2 m have been successfully operated (35).
Figure 9. Principle of operation of the drift chamber (30). Particles passing perpendicularly through the drift chamber produce ionization electrons that drift toward the anode wire. The cathode wires provide a uniform field through most of this drift region. Measuring the drift time, together with knowledge of the drift velocity obtained in calibration measurements, localizes the position of the track.
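The drift-time localization described in the caption amounts to a simple linear calibration; the sketch below shows the conversion, with an assumed drift velocity (of the order of the few cm/µs seen in Fig. 6) and an assumed time offset.

```python
def drift_position_mm(drift_time_ns, t0_ns, v_drift_cm_per_us):
    """Track-to-wire distance from a measured drift time, using a calibrated drift velocity."""
    return (drift_time_ns - t0_ns) * 1e-3 * v_drift_cm_per_us * 10.0   # ns -> us, cm -> mm

# Assumed calibration: v_drift = 5 cm/us, time offset t0 = 20 ns.
for t_ns in (120, 420, 820):
    print(f"t = {t_ns:4d} ns -> x = {drift_position_mm(t_ns, 20.0, 5.0):5.1f} mm")
```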
Figure 10. An example of a cylindrical drift chamber from the BELLE experiment (31).
A proportional mode anode wire or anode pad plane that provides high-speed time digitization of the signals results in a three-dimensional reproduction of the original ionization. Two of these coordinates result from the end plane anode geometry, the third from timing. The end-cap detecting element of the TPC uses the multistep avalanche chamber to reduce the distorting effects of the anode wires on the shape of the drift field. The large volume drift space that has a uniform electric field is isolated from the anode field by one or more planes of mesh or wires. Gas amplification can proceed by using the parallel plate amplification region of high field or via the MWPC anode wire avalanche. The additional grids can be used to gate the drift electrons from the amplification region, allowing the possibility of accepting signals only at selected times and reducing the ion buildup in the avalanche region (see Fig. 11).

Drift Tubes. In high rate environments, multiple, closely spaced particle tracks degrade the performance of an open structure drift chamber. A drift tube that consists of an anode wire in the center of a conducting cathode tube significantly reduces the volume open to multiple track interference. The positional resolution within the tube is determined by the drift time from the track ionization to the central anode. An example of the performance of a large drift tube array can be found in (36).

Figure 12. Electric field lines in an MSGC. The ionization charge produced in the upper drift region is drifted toward the anode. The increased field strength in the anode region results in charge amplification for signal detection.
Figure 11. Example of the use of a gating grid to isolate the signal anode until a trigger is received (7). (a) Gating grid "closed": field lines end on the grid wires; ionization charge from the upper drift volume drifts along the field lines and does not reach the anode wires. (b) Gating grid "open": field lines guide the drifting ionization electrons to the anode wires for detection.
Microstrip Gas Chambers. The microstrip gas chamber (MSGC) is a development of the multiwire proportional chamber. It uses an insulating substrate that has thin conducting strips on the surface, with alternating positive and negative potentials on the strips (see Fig. 12). Two important improvements are achieved by using MSGCs [see, for example, (37,38)]. First, submillimeter pitch can be achieved; in MWPCs, such fine wire spacing is impractical. Second, by using a close spacing (100 µm) of the anode and cathode conductors, positive ions are cleared from the anode region much more quickly than with the millimeter or greater spacing in MWPCs. At high particle rates, the slow moving positive ions build up in the anode region, resulting in a space charge that reduces the electric field strength and hence the avalanche gain and detection efficiency. Significant loss in particle detection efficiency occurs at particle rates of 10⁴–10⁵ counts mm⁻² s⁻¹. To take full advantage of this improvement, the substrate must be slightly conducting in the region between the cathode and anode strips. A semiconductor substrate or the use of surface treatments achieves the necessary surface resistance. Count rates of millions of counts mm⁻² s⁻¹ have been observed without significant reduction in the gas gain.

Micropattern Gas Detectors. Variants of the MSGC that have micropatterned structures to provide the necessary electric field shaping have been developed. They offer more robust operating conditions than MSGCs, which are sensitive to breakdown in the presence of heavily ionizing particles such as alphas or heavy ions.
Figure 13. (a) Schematic of a single-stage GEM detector that has a two-dimensional readout (40). Increased field strength in the amplification region of the detector separates the drift and transfer regions. (b) Signals induced in the amplification plate are proportional to the total ionization charge passing from the drift region. Crossed x, y anode planes provide positional information.
Gas electron multiplier (GEM) devices achieve the necessary anode field shapes by using a perforated metal (cathode) sheet overlying an anode layer (see Fig. 13).
The multistage GEM (43) is an extension of the multistep wire chamber (41) to this technology. The device consists of a double-sided metal layer on an insulating substrate, perforated by an array of 100-µm-diameter holes on a 200-µm pitch (see Fig. 14). External electric potentials are arranged to drift electrons created by ionization toward the GEM. High fields in the hole regions generate avalanche conditions. The avalanche-produced electrons pass through the holes into the lower region, where further external fields drift the charge cloud away from the avalanche region to a detection electrode. The separate electrodes for avalanche multiplication and charge detection provide the possibility of protecting the readout electronics from the effects of any discharge. Multistage GEM detectors allow the avalanche gain on each stage to be reduced, resulting in safer operating conditions.

Ring-Imaging and Water Cerenkov Counters. Ring-imaging Cerenkov counters (RICH counters) image the cone of Cerenkov light emitted when a particle passes through a "radiator." RICH devices often use a dual radiator, one a liquid or a heavy, high-pressure gas, the second a low-pressure gas or lightweight aerogel. Mirrors are used to focus the light onto a photon-sensitive detector plane. The photons are typically UV, so UV-absorbing impurities in the radiators must be tightly controlled. The large-area photon detector is usually a TPC or drift chamber using a dopant gas that generates photoelectrons from UV absorption. The dopant gas TMAE (tetrakisdimethylaminoethylene) has been successfully used in this application. The avalanche process itself generates UV photons along with ionization; designs to prevent feedback from these photons include shields and electrical gating grids. An example of a double-radiator RICH counter is shown in Fig. 15 (from 44). RICH counters are used mostly to identify particles of different mass; however, the Cerenkov rings produced can also be used as a tracking signal. Large water Cerenkov detectors, whose capacity is thousands of tons of water, equipped with arrays of photomultiplier tubes, have been used to measure extraterrestrial neutrinos [see Fig. 16 from (45)].
Figure 14. Schematic of a double GEM detector (40). Using two amplification stages, the amplification gain required for detection is the product of the two lower gain processes, resulting in more stable operation.
Figure 15. Example of a double-radiator RICH counter (41), combining aerogel tiles and a C4F10 gas radiator with a mirror array and a PMT matrix.

Figure 16. Kamiokande water Cerenkov detector (42). This picture is a depiction of signals from a high-energy interaction in a large water vessel, instrumented with photomultiplier tubes on the inside faces of the vessel. Cerenkov light is emitted in a cone about the axis of high-energy particles. Clearly seen is the ring of signals emanating from a high-energy particle in the water.
Semiconductor (silicon) Detectors. In forming crystalline solids, atoms are bound into regular arrays. The atomic electron (line) energies are broadened into energy bands through neighbor interactions. In ("intrinsic") semiconductor crystals, the atomic valence electrons bind neighboring atoms. At zero absolute temperature, all valence electrons are bound in this way, and the valence energy bands are full; the electrical conductivity is zero. The first excited atomic states form the conduction band in semiconductors, and the gap between the valence and conduction bands is approximately 1 eV (Table 1). Above zero temperature, thermal fluctuations promote some valence electrons into
the conduction band, leaving a vacancy or "hole" in the valence band. The conduction band electrons and the holes in the valence band are thus thermally generated current carriers and contribute to limited electrical conductivity. At room temperature, the density of conduction band electrons is approximately 10¹¹ cm⁻³ in a crystalline intrinsic semiconductor. The current carrier density follows the Boltzmann temperature dependence and is proportional to exp(−Eg/kT), where Eg is the energy band gap, k is Boltzmann's constant, and T is the absolute temperature. Doped or "extrinsic" semiconductors provide control over the electrical properties of semiconductor devices. A (typically small) concentration of impurities is added to the pure, crystalline semiconductor, replacing some of the intrinsic atoms in the lattice. The valence of "p-type" impurities is 3, and "n-type" impurities have valence 5. There are only three valence electrons in p-type impurities to bind with neighbors, resulting in an easily filled bond in the neighboring region.
Table 1. Properties of Semiconductor Detector Materials

Property                                 C (diamond)   Si       Ge (at T = 77 K)   GaAs     CdTe     HgI2
Atomic number, Z                         6             14       32                 31–33    48–52    53–80
Density (g cm⁻³)                         3.36          2.33     5.32               5.35     6.06     6.30
Radiation length (g cm⁻²)                40.3          21.82    12.25              12.19    8.91     7.44
Band gap energy (eV)                     6.0           1.1      0.74               1.45     1.47     2.1
Energy per e–h pair (eV)                 13.6          3.61     2.98               4.2      4.43     4.22
Electron mobility, µe (cm² V⁻¹ s⁻¹)      1,800         1,350    38,000             8,500    1,000    100
Hole mobility, µh (cm² V⁻¹ s⁻¹)          1,200         480      45,000             400      65       4
Dielectric constant, ε                   5.7           11.9     16.2               13.1     10       7.4
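As a quick use of the "energy per e–h pair" row of Table 1, the sketch below converts an assumed deposited energy into the number of electron–hole pairs created in several of the listed materials; the 60 keV deposit is an arbitrary illustrative choice.

```python
# Electron-hole pairs created for an assumed 60 keV energy deposit,
# using the energy-per-pair values (eV) from Table 1.
ENERGY_PER_PAIR_EV = {"diamond": 13.6, "Si": 3.61, "Ge": 2.98,
                      "GaAs": 4.2, "CdTe": 4.43, "HgI2": 4.22}
deposited_eV = 60e3    # assumed deposited energy (60 keV)

for material, w_eV in ENERGY_PER_PAIR_EV.items():
    print(f"{material:>7}: ~{deposited_eV / w_eV:8,.0f} e-h pairs")
```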
Other electrons in the valence band can fill this bond, leaving a hole in the valence band. Boron, gallium, and indium are p-type impurities. n-type impurities such as antimony, phosphorous, and arsenic provide additional electrons in the conduction band. Intrinsic semiconductors have equal densities of "free" holes and electrons. At room temperature, doped semiconductors typically have larger current carrier densities than their intrinsic base materials. However, the product of the concentrations of conduction band electrons n and of valence band holes p is constant:

np = nc nv exp(−Eg/kT),          (15)
where nc and nv are the densities of states in the conduction and valence bands, respectively. By fabricating a junction between n-type and p-type materials, the higher concentrations of mobile electrons and holes cause electrons from the n-type conduction band to diffuse across the junction into the p-type region and holes from the p-type valence band to diffuse into the n-type region. These diffusing carriers "recombine" in the junction region. The initially charge-neutral regions then have a fixed lattice charge, positive on the n-type side and negative on the p-type side. The charged region, called the depletion region, produces an electric field across the junction, which opposes the diffusion process and results in an equilibrium depletion width that has a "built-in potential" Vbi across the junction. An externally applied "reverse" bias of appropriately chosen polarity across the junction will move majority carriers away from the junction, increasing the depletion width and increasing the electric potential difference across the junction. The remaining current is due to minority carriers, created primarily by thermal generation in or near the depletion region, that drift across the junction under the influence of the electric field. The width of the depletion region depends on the doping density and the applied voltage VB as

xd = √[2 ε µe ρ (Vbi + VB)],          (16)
where Vbi is the built-in junction potential, ε is the permittivity, µe is the electron mobility, and ρ is the material resistivity. Using a relatively high resistivity material (1–20 kΩ cm), an applied external bias (typically 50–500 V) results in an appreciable depletion width (typically 20 µm for silicon CCDs to 300 µm for silicon strip detectors). Devices are typically fabricated using an ion-implanted, heavily doped (low resistivity) region close to the surface of a high resistivity bulk semiconductor. Thus, the depletion region grows asymmetrically with applied bias and reaches into the high resistivity bulk region. Detectors become "fully depleted" when the bias is sufficient to deplete the entire thickness of the bulk of charge carriers. A high-quality, depleted, reverse-biased semiconductor diode has a low "dark current" and a large depletion thickness. Ionizing particles passing through the detector create electron–hole pairs along their trajectory in the semiconductor. In the depletion region, this ionization injects a significant charge into this carrier-free region. The electric field across the junction separates the electrons and holes and induces a current pulse through the device. The measurement of the current pulse above the quiescent dark current is the task of low noise electronics using appropriate filtering. Table 1 shows relevant properties of various semiconductor materials, including the energy required to create electron–hole pairs. Approximately 80 electron–hole pairs are liberated, on average, by a MIP per micrometer of path length in silicon. Thus, for fully depleted silicon detectors 300 µm thick, a MIP passing perpendicularly through the detector generates a signal pulse of approximately 24,000 electron–hole pairs, on average. The ionization process is stochastic, resulting in significant fluctuations in the signal, which must be considered when evaluating the signal-to-noise ratio and the margins required for clean signal detection (46).

Strip Detectors. The application of microelectronics techniques to the development of silicon strip detectors has been extremely fruitful for high precision particle tracking (47–52). A large-area p-n diode is segmented into narrow strips along the length of the diode. Each strip is coupled to a separate readout electronic channel (see Fig. 17). The position of a particle incident on the strip detector is localized by determining the strip (or strips) upon which the signal is induced. Measurement of the total charge signal also provides a measure of the total energy deposited in the silicon. Detectors that have a strip pitch of 25 µm are produced routinely. Diffusion of the signal charge can result in spreading the signal across neighboring channels. Analog readout of the signals and interpolation between the strips according to signal size can result in even better resolution. Using a uniform, approximately planar electric field in the bulk of the detector, the charge is shared between neighboring strips approximately linearly with the relative position of the particle trajectory between the strips.
Figure 17. Schematic of a cross section through a p-n silicon detector. The applied bias depletes the silicon of charge carriers in the depletion region. The ionization resulting from a high-energy particle passing through the depletion region drifts to the top and bottom faces of the detector. Electronics detects this charge pulse. Sufficient bias fully depletes the silicon, resulting in maximum efficiency for charged particle detection.
The quantity

η = Sig2 / (Sig1 + Sig2)          (17)
gives a good approximation of the relative distance between strips. Sig1 and Sig2 are the signal charges from neighboring strips. Corrections for nonlinearity close to the strips can be determined via calibration using tracks of well-defined positions. Capacitive coupling between the strips can be exploited to reduce the number of independent electronic channels needed to read out the signals. This coupling results in the distribution of the charge deposited amongst several strips. By reading out every second channel (or even fewer) using analog electronics, the center of the charge signal can be determined by interpolation between neighboring readout channels. Signal-to-noise ratios in excess of 10 are required to maintain the positional resolution of the configuration when all strips are read out. The intermediate strips need to be held at the same potential as the readout strips to avoid distorting the planar electric field. Distribution of the bias potential to the strips must be via high-resistance connections. The number of intermediate strips must be optimized. The advantage of fewer electronics channels must be weighed against the increased capacitive load on the readout electronics and hence, on the larger amplifier noise. Furthermore, some signal charge is lost via capacitive coupling to the back-plane, an effect that increases as the number of intermediate strips increases. The bias applied to deplete the detector will generally need to be decoupled from the amplifier inputs. This can be achieved by using external decoupling networks;
however, a more convenient approach is to provide the decoupling capacitors directly on the strips. This is achieved by applying an insulating SiO2 or Si3N4 layer over the implanted diode strips, with the metallization on top for connection to the readout (53) [see Fig. 18 (54)]. Capacitances in excess of 100 pF can be generated in this way. Such a decoupling capacitance, much larger than the silicon sensor strip capacitance, ensures practically complete charge collection by the amplifier circuits. Biasing is provided by polysilicon bias resistors fabricated on the detector. Alternatively, punch-through (or reach-through or foxfet) biasing can be applied (55). The simple p-n diode detectors described so far collect holes on the readout strips; the electrons drift to the back-plane. A second coordinate can be obtained by segmenting the back-plane into strips orthogonal to the p-side strips. However, the strip-isolating oxide contains a positive charge that results in electron accumulation at the oxide–silicon interface, eventually shorting the segmented strips. Various techniques have successfully alleviated this problem. Implantation of p-doped "blocking strips" in the center of the isolating oxide strips breaks the otherwise continuous accumulation layer. Alternatively, a biased metal gate over the isolation oxide can control the accumulation, or a compensating p-doped region can be implanted in the back side before the oxide layer is grown (56). Double metal processing of the silicon, where a second set of strips isolated from the first metal layer by an oxide is added, offers considerable practical convenience. Using an appropriately designed layout, the crossed strips from the top and bottom sides can both be read out from the same end of the silicon detector. This is particularly useful when cascading several detectors to a single set of readout electronics [see Fig. 19 (57)].
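A small sketch of the charge-division interpolation of Eq. (17), using the signal size quoted in the text (about 80 electron–hole pairs per micrometer, roughly 24,000 for a 300-µm detector). The 50-µm readout pitch and the 60/40 charge sharing are assumptions for illustration.

```python
def eta(sig1, sig2):
    """Charge-division estimator of Eq. (17): 0 at strip 1, 1 at strip 2."""
    return sig2 / (sig1 + sig2)

def interpolated_position_um(strip1_center_um, pitch_um, sig1, sig2):
    """Linear interpolation of the hit position between two neighboring readout strips."""
    return strip1_center_um + eta(sig1, sig2) * pitch_um

total_pairs = 80 * 300                               # ~24,000 e-h pairs for a MIP in 300 um of silicon
sig1, sig2 = 0.6 * total_pairs, 0.4 * total_pairs    # assumed sharing between two strips
print(f"eta = {eta(sig1, sig2):.2f}, "
      f"position ~ {interpolated_position_um(0.0, 50.0, sig1, sig2):.1f} um from strip 1")
```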
Figure 18. Schematic view of a corner of the n⁺ side of a capacitively coupled silicon microstrip detector (51). Shown are the polysilicon bias resistors connecting the ion-implanted n⁺ strips to the bias ring and the aluminum readout strips over the implanted regions. These are isolated from the silicon by an insulating layer, usually of silicon dioxide, silicon nitride, or polyimide, that creates a decoupling capacitor between the biased silicon strips and the readout electronics.
Figure 19. Schematic of a double-sided silicon strip detector, using ‘‘double metal technology’’ on the top layer. The additional aluminum layer allows connecting both coordinate directions at the end of the detector and offers considerable simplification in building a system of such detectors.
The silicon strip detector has matured principally through its development and use for measuring tracks emanating from short-lived particles containing heavy quarks in high-energy physics experiments. Track resolutions of less than 10 µm are routinely achieved, resulting in the determination of the positions of particle decay vertices to a precision of 10–100 µm [see Fig. 20 (57)].

Figure 20. (a) Schematic cross section of the BELLE silicon vertex detector (55). (b) Photograph of the BELLE silicon vertex detector. (c) Tracks originating from a particle interaction, reconstructed using the BELLE vertex detector, together with the interaction reconstructed using the entire BELLE detector.

Charge-Coupled Devices. The highly successful optical CCD technology has been applied to particle detection and imaging with some modifications to these structures. The CCD architecture routes the signals of all elements of the device through a single output channel, making the readout time long. Furthermore, the device is sensitive to signals from later arriving particles during the readout phase. Therefore, the CCD is an appropriate solution for low rate environments (58).

Silicon Drift Detectors. Silicon drift detectors (59) are fabricated with p-n junctions on both sensor surfaces. The junctions are in the form of strips, and the devices are depleted from both surfaces. An electric field gradient parallel to the device surface is provided by graded potentials on the strips. Electrons generated by ionizing particles are collected in the middle of the silicon, where the potential is minimum, and are drifted out of the sensor under the influence of the parallel field component to readout electrodes formed by a segmented ion-implanted n⁺ region [see Fig. 21 (60)]. The combination of readout electrode segmentation and drift time provides a two-dimensional localization of the original ionization charge. Optimum fabrication and operation of these devices demands low-defect, high-uniformity silicon wafers to provide good charge collection, precision on-detector voltage dividers to provide good drift field uniformity, and precise temperature control to maintain a stable drift velocity. Spatial resolutions of 50 µm from timing and 30 µm from anode segmentation have been demonstrated (61).

Pixel Detectors. True two-dimensional, segmented semiconductor detectors that have active, individual element readout electronics go beyond CCDs in speed, but also in complexity.
Figure 21. Schematic of the silicon drift detector. Ionization from the charged track is drifted under the influence of the electric field, provided by the p⁺ strips to which a graded electric potential is applied, to the segmented n⁺ strips attached to the electronics readout.
The advent of very high rate, high particle multiplicity interactions in recent years has driven the development of these new active pixel devices (62). Both monolithic and hybrid devices are being developed. The latter are considerably more advanced; separating the sensor from the electronic chips allows parallel development of the two components. It also facilitates the use of different sensor materials. Silicon sensors are the most mature technology and offer good performance for charged particle detection. Compound semiconductors of higher atomic number such as GaAs are being pursued for X-ray and gamma-ray applications. Diamond sensors offer very high radiation hardness. Pixels of areas as small as 55 × 55 µm have been produced (63). The small element size has the further advantage of reduced input capacitance to the readout amplifier, resulting in considerably better noise performance. Both binary and analog electronics are being developed, resulting in tracking precision of 10–20 µm in the former case and approaching 1 µm in the latter. The hybrid approach uses metal bump bonding to attach the electronic chip to the segmented sensor. The metal typically used is indium or an alloy that has similar characteristics.

Electronic Signal Detection. The signal processing and electronic readout needed to measure and record the detected signals is an essential part of the particle detection process. The reader is directed to the literature for details of this subject (64–66).

Radiation Damage

Wire Chamber Aging. Prolonged operation of wire chambers exposed to large particle fluxes leads to deposits on the anode wires. These aging effects are strongly affected by the gas mixture used and by the operating conditions, especially the gas gain. High gain operation results in a larger charge exposure of the anode wires, which results in the formation of polymerized layers and "whiskers." The buildup on the anode wires results in reduced field strength in the multiplication region and hence reduced signal gain. Whiskers can lead to electrical breakdown. Operation in a lower gain mode can extend the lifetime of wire chambers (67,68).

Radiation Damage in Semiconductor Detectors. Radiation damage in silicon detectors falls into two basic categories: bulk damage caused by displacement of atoms from the lattice and surface damage due to charge buildup at the interface between silicon and oxide layers. Bulk damage depends upon the nonionizing energy deposited in the lattice; the damage rate is thus radiation-type and energy dependent. The displacement damage results in increased dark current, signal charge trapping, and changes in the effective charge carrier density. The increase in dark current density is proportional to the particle fluence Φ:

ΔIdark = αΦ.          (18)
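Reading Eq. (18) as a leakage-current increase per unit of depleted volume (the usual convention for the damage constant α), the sketch below estimates the current increase for an assumed sensor size and hadron fluence, using the annealed value of α quoted in the following paragraph.

```python
# Leakage-current increase from bulk damage, Eq. (18): dI = alpha * fluence * volume.
alpha_A_per_cm = 3e-17          # damage constant after long-term annealing (from the text)
fluence_per_cm2 = 1e14          # assumed charged-hadron fluence
volume_cm3 = 1.0 * 0.0300       # assumed sensor: 1 cm^2 area, 300 um thick, fully depleted

delta_I_A = alpha_A_per_cm * fluence_per_cm2 * volume_cm3
print(f"dark-current increase ~ {delta_I_A * 1e6:.0f} uA")
```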
Annealing of the damage that produces the current increase reduces the effective damage constant α. After long-term annealing (approximately 1 month at room temperature), it reaches a value of α ≈ 3 × 10⁻¹⁷ A cm⁻¹ for high-energy protons and pions and α ≈ 2 × 10⁻¹⁷ A cm⁻¹ for 1 MeV neutrons. The effective carrier concentration in initially n-type silicon changes approximately as

N = N0 exp(−δΦ) − βΦ,          (19)
where N0 is the initial carrier concentration of the n-type silicon, δ ≈ 6 × 10⁻¹⁴ cm² is the donor removal constant, and β ≈ 0.03 cm⁻¹ is the acceptor creation constant (69). Acceptor creation dominates at high fluences. Finally, the effective carrier concentration changes sign, typically at fluences above 3 × 10¹³ cm⁻² (70). The increasing effective carrier concentration that develops at higher fluences requires a larger bias to deplete the detector fully. Surface damage depends predominantly on the ionization created in the device that is trapped at interfaces between insulating and semiconducting layers. Such a charge buildup can result in increased surface currents, can cause FET structures to fail, and can cause strip isolation structures to become conducting. These effects depend strongly on the processes used in fabrication and on the device geometry and features. Other semiconductors have been developed in view of their potentially improved radiation insensitivity. GaAs and diamond detectors have been constructed. GaAs has been found relatively resistant to neutron radiation but relatively more sensitive than silicon to high-energy pion and proton irradiation. Diamond detectors are being developed as an extremely radiation-tolerant alternative to silicon. Crystal growth optimization for improved charge signal collection has produced encouraging results for this material, which is tolerant of radiation fluences well above those at which significant degradation sets in for silicon (71).

BIBLIOGRAPHY

1. B. Rossi, High Energy Particles, Prentice-Hall, 1952.
2. K. Kleinknecht, Detectors for Particle Radiation, Cambridge University Press, 1986.
3. W. R. Leo, Techniques for Nuclear and Particle Physics, Springer, Berlin, 1987.
4. T. Ferbel, ed., Experimental Techniques in High Energy Physics, Addison-Wesley, Menlo Park, 1987.
5. G. Charpak, ed., Research on Particle Imaging Detectors, World Scientific, Singapore, 1995.
6. C. Grupen, Particle Detectors, Cambridge University Press, 1996.
7. Particle Data Group, D. E. Groom et al., Eur. Phys. J. C15, 1 (2000); also available at http://pdg.lbl.gov/.
8. R. Akers et al., Z. Phys. C67, 203 (1995).
9. R. M. Sternheimer, M. J. Berger, and S. M. Seltzer, Atom. Data Nucl. Data Tables 30, 261–271 (1984).
10. S. M. Seltzer and M. J. Berger, Int. J. Appl. Radiat. Isotopes 35(7), 665–676 (1984).
11. L. D. Landau, J. Exp. Phys. (USSR) 8, 201 (1944).
12. M. Roco, Nucl. Instrum. Methods A418, 91–96 (1998).
13. Y. S. Tsai, Rev. Mod. Phys. 46, 815 (1974).
14. V. Karimäki, Nucl. Instrum. Methods A410, 284–292 (1998).
15. W. Innes, Computer Program TRACKERR, Version 2.04, 1998. Available from ftp://ftp.SLAC.Stanford.edu/software/trackerr.
16. R. Frühwirth, in CERN School of Computing, Bad Herrenalb, Germany, 20 Aug.–2 Sep. 1989, C. Verkerk, ed., CERN 90-06, CERN, Geneva, 1990.
17. R. Bock, H. Grote, and D. Notz, in W. Regler and R. Frühwirth, eds., Data-Analysis Techniques for High-Energy Physics, 2nd ed., Cambridge Monogr. Part. Phys. Nucl. Phys. Cosmol. 11, Cambridge University Press, 2000.
18. R. Wigmans, Calorimetry: Energy Measurement in Particle Physics, Oxford Clarendon Press, 2000.
19. R. Wigmans, Annu. Rev. Nucl. Part. Sci. 41, 133–185 (1991).
20. J. B. Birks, The Theory and Practice of Scintillation Counting, Pergamon, London, 1964.
21. T. Foerster, Ann. Phys. 2, 55 (1948).
22. I. Berlmann, Handbook of Fluorescence Spectra of Aromatic Molecules, Academic Press, NY, 1971.
23. W. W. M. Allison and P. Wright, in T. Ferbel, ed., Experimental Techniques in High Energy Physics, Addison-Wesley, 1987.
24. B. Dolgoshein, Nucl. Instrum. Methods A326, 434 (1993).
25. Hamamatsu Photonics, Japan (http://www.hpk.co.jp).
26. R. C. Ruchti, Annu. Rev. Nucl. Sci. 46, 281–319 (1996).
27. P. Annis et al., Nucl. Instrum. Methods A409, 629 (1998).
28. B. B. Beck et al., Nucl. Instrum. Methods A412, 191 (1998).
29. S. Biagi et al., Nucl. Instrum. Methods A283, 716 (1989).
30. V. Palladino et al., Nucl. Instrum. Methods 128, 323 (1975).
31. G. Charpak et al., Nucl. Instrum. Methods 62, 262 (1968).
32. G. Charpak, F. Sauli, and W. Duinker, Nucl. Instrum. Methods 108, 413 (1973).
33. G. Alimonti et al., Nucl. Instrum. Methods A453, 71–77 (2000).
34. D. R. Nygren and J. N. Marx, Phys. Today 31, 46 (1978).
35. W. Blum and L. Rolandi, Particle Detection with Drift Chambers, Springer-Verlag, 1994.
36. W. Riegler et al., Nucl. Instrum. Methods A443, 156–163 (2000).
37. A. Oed, Nucl. Instrum. Methods A263, 351 (1988).
38. R. Bellazzini and A. M. Spezziga, Rivista del Nuovo Cimento 17, 1 (1994).
39. F. Sauli and A. Sharma, Annu. Rev. Nucl. Part. Sci. 49, 341–388 (1999).
40. F. Sauli, New Developments in Gaseous Detectors, CERN-EP/2000-108, 2000.
41. G. Charpak et al., Phys. Lett. 78B, 523 (1978).
42. F. Sauli, Nucl. Instrum. Methods A386, 531 (1997).
43. S. Bachmann et al., CERN-EP/2000-116, 31 August 2000.
44. N. Akopov et al., preprint DESY-2000-190 (physics/0104033).
45. Y. Fukuda et al., Phys. Rev. Lett. 81, 1562–1567 (1998).
46. H. Bichsel, Rev. Mod. Phys. 60, 663–699 (1988).
47. G. Lutz, Semiconductor Radiation Detectors: Device Physics, Springer-Verlag, NY, 1998.
48. J. Kemmer, Nucl. Instrum. Methods 169, 499 (1980).
49. E. H. M. Heijne et al., Nucl. Instrum. Methods 178, 331 (1980).
50. J. B. A. England et al., Nucl. Instrum. Methods 185, 43 (1981).
51. J. T. Walker et al., Nucl. Instrum. Methods 226, 200 (1984).
52. H. W. Kraner et al., IEEE Trans. Nucl. Sci. 30, 405–414 (1983).
53. M. Caccia et al., Nucl. Instrum. Methods A260, 124 (1987).
54. P. Allport, private communication.
55. J. Kemmer and G. Lutz, Nucl. Instrum. Methods A273, 588 (1988).
56. G. Lutz and A. S. Schwarz, Annu. Rev. Nucl. Part. Sci. 45, 295–335 (1995).
57. Belle Collaboration, internal note.
58. G. Alimonti et al., Nucl. Instrum. Methods A453, 71–77 (2000).
59. C. Damerell et al., Nucl. Instrum. Methods A288, 236 (1990).
60. P. Rehak et al., Nucl. Instrum. Methods A235, 224 (1985).
61. P. Weilhammer, Nucl. Instrum. Methods A453, 60–70 (2000).
62. E. H. M. Heijne, CERN EP/2000-137, 2000.
63. B. Mikulec, M. Campbell, G. Dipasquale, C. Schwarz, and J. Watt (Medipix Collaboration), Nucl. Instrum. Methods A458, 352–359 (2001).
64. V. Radeka, IEEE Trans. Nucl. Sci. NS-21, 51 (1974).
65. F. S. Goulding and D. A. Landing, IEEE Trans. Nucl. Sci. NS-29, 1125 (1982).
66. H. Spieler, IEEE Trans. Nucl. Sci. NS-29, 1142 (1982).
67. J. A. Kadyk, Nucl. Instrum. Methods A300, 436–479 (1991).
68. J. Vavra, Nucl. Instrum. Methods A252, 547–563 (1986).
69. G. Lindstroem et al., Nucl. Instrum. Methods A426, 1 (1999).
70. R. Wunstorf, PhD Thesis, University of Hamburg, 1992.
71. RD48 Collaboration, S. Schnetzer et al., IEEE Trans. Nucl. Sci. 46, 193–200 (1999).

PHOTOCONDUCTOR DETECTOR TECHNOLOGY

ANDREW MELNYK
Rochester, NY
INTRODUCTION

Photoelectric detectors are semiconducting devices that convert optical signals into electrical signals. They rely on the quantum nature of matter, whereby incident photons, ranging from high energy gamma rays through visible light to infrared rays, excite electrons in matter to produce an electrical charge. The operation of a photoelectric detector involves two steps: (1) generation of conducting charge by an incident light photon and (2) charge transport and/or multiplication by whatever current gain mechanism may be present. Photoelectric detectors are classified as photoemission and photoconductive detectors. Photoemission
occurs when the energy of the photon hν is sufficient to eject an electron from the surface of the solid into vacuum; this phenomenon is the basis for photomultiplier tubes. Photoconductivity, sometimes referred to as internal photoemission, is observed in a solid when the absorbed photon raises an electron from a nonconducting energy state to a conducting energy state where it contributes to electrical conductivity. The origin of the field of photoconductivity is attributed to Willoughby Smith (1), who noted changes in the resistivity of selenium resistors when they were exposed to light. Systematic treatment of the photoconductivity process was initiated in the 1920s by the investigations of Gudden and Pohl and the Göttingen school (2), principally using zinc sulfide, diamond, and alkali halides. The investigation of photoconductivity in the first half of the twentieth century benefited greatly from the rapid growth of experimental and theoretical interest in condensed matter physics and chemistry. The period from the 1920s to the 1950s saw a rapid expansion in the understanding of photoconductivity. The quantum theory of matter, on the one hand, and industrial interest in devices, on the other, were the prime motivators. The experimental effort, especially in the 1940s and 1950s, was strongly influenced by the television industry and focused primarily on covalently bonded semiconductors, although some notable exploratory studies of the phenomenon in molecular solids were carried out. The detailed quantitative studies, concepts, and measurements of photoconductivity developed by the middle of the twentieth century are discussed in the now classic books by Bube (3), Rose (4), and Ryvkin (5). In the second half of the twentieth century, the explosive growth of the semiconductor industry led to the development of a wide variety of specialized photodetectors based on crystalline semiconductors of covalently bonded silicon and germanium, as well as compounds of elements from the III–V and II–VI columns of the periodic table of elements. During the same period, the photocopying and later the laser printing industries led to the development of very large area photodetectors. These initially used amorphous chalcogenide glasses but now almost exclusively use organic electro-optical materials. For a comprehensive list and detailed discussion of these organic photoconductors, the reader is referred to the book by Borsenberger and Weiss (6). This section on photoconducting detectors explains the fundamental concepts of photoconductivity and describes the elementary photoconducting devices, their design, materials, and how they work. With two exceptions, photodetectors incorporated in specific imaging devices and explanations of their operation are not considered here because they are discussed in other sections on specific applications such as UV or IR detectors, CCD imagers, etc. The exceptions are electrophotographic and vidicon-type television camera photoconductors, which can be understood only as imaging devices. In contrast to other imaging photodetectors, electrophotographic photoconductors are not covered in any other section. For this reason, and because they differ significantly in their design, materials, and operation
from the other photodetectors, most of this section is devoted to a detailed discussion of electrophotographic photoconductors.
PHOTOCONDUCTOR CONCEPTS

Band Theory of Matter

Before discussing photoconductors, it is useful to review the modern band theory of solids, which provides the basis for understanding the electrical and photoelectrical properties of solids. The discrete, outer electron energy levels of an atom within a solid are broadened into bands of allowed energies several electron volts (eV) wide. Each band actually contains N discrete levels, but because N, the number of atoms per cubic centimeter, is about 10²², these levels are separated by only about 10⁻²² eV and hence are essentially continuous. Between these allowed energy bands are ranges of forbidden energies called band gaps. The widths of the bands and band gaps and their occupancy by electrons depend solely on the chemical elements of the solid and their atomic arrangement. The electronic nature of the solid is determined by the electron occupancy of the last or highest filled band. If the band is partially filled by electrons, it is called a conduction band, and the solid is a metal. Such a solid is a very good conductor because even a very small electrical field applied to the solid will excite electrons to adjacent unoccupied higher energy levels and hence accelerate them and cause an electrical current to flow. The energy level to which the band is filled is called the Fermi energy level. The other possibility is that the highest occupied band is completely filled. This band is called the valence band, and the empty one just above it is the conduction band. The electrons are not free to move because there are no empty energy levels in the valence band for them to be excited to. The electrons must be excited to the empty conduction band to provide a conducting charge, which requires energies equal to the band gap or larger. Thermal energy kT is one source of excitation and at room temperature provides 0.025 eV of energy. Because the probability of excitation is proportional to the negative exponential of the ratio of the required energy to kT, the degree of conductivity depends on the size of the band gap. If the band gap is many eV, the material is an insulator, but if the band gap is 1 eV or smaller, the material is a semiconductor. When an electron is excited from a filled valence band to an empty conduction band, it creates a vacancy that behaves like a positive conducting charge, called a hole. The term ‘‘charge carrier’’ is used to designate either an electron or a hole that can contribute to conductivity. Semiconductors whose charge-carrier densities (number per volume) are thermally excited across the band gap are called intrinsic semiconductors. The Fermi level in an intrinsic semiconductor is in the middle of the band gap. Crystalline silicon (Si), whose band gap is about 1 eV, is the prime example of a semiconductor and is the workhorse of the electronic industry.
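To make the dependence on band gap concrete, the short sketch below compares the Boltzmann factor exp(−E/kT) for a 1-eV and a 5-eV gap at room temperature. It is an illustrative aside, not part of the original article, and the two gap values are simply representative choices for a semiconductor and an insulator.

```python
import math

k_T = 0.025  # thermal energy at room temperature, eV (value quoted in the text)

def excitation_factor(gap_ev):
    """Boltzmann factor exp(-E/kT) for exciting an electron across a gap of E eV."""
    return math.exp(-gap_ev / k_T)

for gap in (1.0, 5.0):  # representative semiconductor and insulator band gaps
    print(f"band gap {gap:.0f} eV -> exp(-E/kT) = {excitation_factor(gap):.1e}")
# ~4e-18 for 1 eV versus ~1e-87 for 5 eV: the wide-gap solid has essentially no
# thermally excited carriers, which is why it behaves as an insulator.
```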
Doping of selected atom impurities is used to enhance and control the conductivity of semiconductors. We will use Si as a model to explain doping, although the concept is quite general. The Si atom has four valence electrons, which create covalent bonds to four adjacent Si atoms in a crystal. If a phosphorus (P) atom that has five valence electrons replaces a Si atom, the extra unbonded electron will be very weakly attached to the phosphorus atom. Thus, it can easily be thermally excited into the conduction band, leaving a positively charged phosphorus ion, which is just a deeply trapped hole. This makes phosphorus an electron donor; a very low concentration of phosphorus atoms, of the order of parts per million (ppm), in Si will make it an n-doped semiconductor (n for negative charge), where electrons are the majority charge carriers. The few holes that will still be produced by thermal excitation from the valence band are called the minority charge carriers. Due to the extra electrons provided by phosphorus atoms, the Fermi level is moved up from the middle of the band gap closer to the conduction band; the amount of shift depends on the concentration of phosphorus doping. In contrast, boron (B), which has three valence electrons, doped into Si is an electron acceptor and makes it a p-doped semiconductor, where holes in the valence band are the majority charge carriers. If n-doped and p-doped semiconductors are brought into contact, the Fermi levels are mismatched. This mismatch causes charge to flow: holes from the p side and electrons from the n side diffuse across the junction, leaving regions of space charge (the p region depleted of holes and the n region of electrons) that produce a voltage drop equal to the difference in the Fermi levels, about 1 volt for Si. The matching of the Fermi levels results in a bending of the bands over a width equal to that of the depletion regions.

Conductivity and Photoconductivity

At the most elementary level, a photoconductor consists of a slab of material, assumed for simplicity to be a semiconductor, between two ohmic contacts (Fig. 1). The electron current density J (current per unit area) flowing in the slab is given by the density of electrons n and their net average velocity v,

J = env = enµE,
(1)
where e is the electronic charge and, in the ohmic regime where the current is proportional to the applied voltage, v is equal to the product of electron mobility µ and electrical field E. The product enµ is the conductivity of the slab. For a semiconductor that has electron and hole charge carriers of charge densities n and p and corresponding mobilities, the current density becomes

J = e(nµₙ + pµₚ)E.
(2)
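A short numerical sketch of Eq. (2) follows. It is not part of the original article; the carrier densities, mobilities, and field are assumed, order-of-magnitude values for an n-doped silicon slab, chosen only to show how the dark current density is evaluated.

```python
e = 1.602e-19                 # electronic charge, C

# Assumed, illustrative values for an n-doped Si slab (not from the article):
n, p = 1.0e16, 1.0e4          # electron and hole densities, cm^-3
mu_n, mu_p = 1350.0, 480.0    # mobilities, cm^2 V^-1 s^-1
E_field = 10.0                # applied electrical field, V/cm

sigma = e * (n * mu_n + p * mu_p)   # conductivity e(n*mu_n + p*mu_p), (ohm cm)^-1
J = sigma * E_field                 # current density of Eq. (2), A/cm^2
print(f"conductivity = {sigma:.2e} (ohm cm)^-1, J = {J:.2e} A/cm^2")
```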
Equation (2) is a general expression for the conductivity, and if n and p are the charge-carrier densities in the absence of illumination, Eq. (2) gives the dark conductivity. Photoconductivity is the increase in electrical conductivity due to the additional charge carriers produced by light. In practical
Figure 1. Schematic diagram of a photoconductor that consists of a slab of semiconducting material between two ohmic contacts.
photoconductors, dark conductivity is negligible compared to photoconductivity, but in general if we designate n and p as the additional light-produced charge carriers, the photoconductivity of the slab in Fig. 1 is given by e(nµₙ + pµₚ). Because photogeneration drives the carrier concentration out of equilibrium, the charge carrier densities are governed by the generation rate per unit volume g of the electrons and holes and their loss due to recombination or trapping. The generation rate is equal to the product of the photon absorption per unit volume and the photogeneration efficiency η, which is defined as the number of electron–hole pairs produced per absorbed photon and is often referred to as the quantum efficiency. Photogenerated charge carriers decay with a time constant τ, defined as the lifetime of the charge carrier before it becomes trapped or recombines. At steady state, or equilibrium photoconductivity, the generation rate is equal to the recombination rate; hence the density of photogenerated charge carriers is the product of g and τ. So the photocurrent density expression becomes

J = eg(τₙµₙ + τₚµₚ)E.
(3)
The transit time for the charge carrier to travel the length of the slab L is

t_T = L/v = L/(µE) = L²/(µV).    (4)
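Equation (4), together with the gain G = τ/t_T that emerges from Eqs. (5) and (6) just below, can be put into numbers with the following sketch. It is an added illustration, not taken from the article, and every parameter value is an assumption chosen to show how a long carrier lifetime combined with a short transit time yields a large photoconductive gain.

```python
# Transit time and photoconductive gain, Eq. (4) and G = tau/t_T.
# All values are illustrative assumptions.
L   = 10e-4     # electrode spacing, cm (10 micrometers)
mu  = 1.0e3     # carrier mobility, cm^2 V^-1 s^-1
V   = 10.0      # applied voltage, V
tau = 1.0e-4    # carrier lifetime before trapping or recombination, s

t_T  = L**2 / (mu * V)   # transit time, s
gain = tau / t_T         # number of transits each photogenerated carrier makes
print(f"transit time = {t_T:.1e} s, gain = {gain:.1e}")
# Gives t_T = 1e-10 s and a gain of 1e6, the order of magnitude the text cites
# for devices with long lifetimes and short electrode spacings; the response
# time of the detector is set by t_T.
```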
Provided that the incident light and hence the photon absorption are uniform over the length of the slab, Eq. (3) can be rewritten as

J = egL(τₙ/t_Tn + τₚ/t_Tp) = egLG,    (5)

where egL is the total charge generation rate per cross-sectional area and is usually called the primary current density, and

G = τₙ/t_Tn + τₚ/t_Tp    (6)

is called the gain of the photoconductor. The gain or current multiplication can be understood as follows. Ohmic
contacts or electrodes require that when a charge carrier leaves the solid at one contact, it is replenished at the opposite contact and maintains the neutrality of the slab charge. Thus each photogenerated charge carrier makes τ/t_T transits before it is lost, resulting in what is referred to as secondary current density. For devices that have very long lifetimes and very short electrode spacings, gains up to 10⁶ are possible. The response or signal rise time of a photoconductor is determined by the transit time t_T. To achieve short transit times, small electrode spacings and high electrical fields are used. In summary, the operation of a photoconductor requires charge generation by light and charge transport; hence the two key material properties of a photoconductor are the photogeneration efficiency η and the charge carrier mobility µ. Most single crystal semiconductors free of impurities and defects have intrinsic generation efficiencies close to unity and very high mobilities, for example, about 10³ cm² V⁻¹ sec⁻¹ for Si and 10⁴ cm² V⁻¹ sec⁻¹ for GaAs. The wavelength response spectrum of a photoconductor is determined by the absorption spectrum, which in turn is determined by the energy difference between occupied and empty energy levels. For an intrinsic photoconductor, this is the band structure of the material; the band gap defines the long wavelength cutoff, and the extremes of the valence and conduction bands determine the shortest wavelength. Extrinsic photoconductors are doped with impurities to produce energy states in the gap and thus enable photoconductivity at longer wavelengths. Because only one electron–hole pair can be produced per photon, the upper limit for η is unity and is usually less if some fraction of the charge pairs recombines or is trapped before the pairs are separated. However, for very energetic photons, for example, X-rays, multiple charge carriers can be produced as the hot excited charge carriers lose their energy by exciting additional charge.

PHOTODIODES AND PHOTOTRANSISTORS

The simple discrete, resistor-like photoconductor structure in Fig. 1 is used as a sensor, for example, in light meters or night detectors to switch on lighting. Typically these devices consist of a strip of cadmium sulfide (CdS) or selenide (CdSe) deposited on an insulating substrate that has electrical contacts. But the most widely used photoconducting detectors are photodiodes or phototransistors. They are almost exclusively planar sandwich structures that have transparent top electrodes to transmit the light. The advantages of planar structures are: first, that they can be produced in large quantities at low cost using photolithographic techniques and second, that they are amenable to large scale integration (LSI) techniques used by the semiconductor industry to produce imaging arrays that have a high density of discrete photodetectors for each pixel.

Materials

The semiconductor electronics industry uses single crystals of covalently bonded semiconductors of Si or germanium (Ge), III–V compounds such as gallium
arsenide (GaAs) and indium antimonide (InSb), or II–VI compounds such as CdS and mercury cadmium telluride (HgₓCd₁₋ₓTe) for photodetectors. The choice of material is governed primarily by the wavelength requirements, which are determined by the band structure. Because Si LSI technology is highly advanced, imaging photodetectors that have the highest pixel density are based on Si. Because the band gap determines the long wavelength cutoff for intrinsic semiconductors, infrared photodetectors require low band-gap semiconductors such as InAs, PbSe, InSb, HgₓCd₁₋ₓTe, or PbₓSn₁₋ₓTe. (Varying the relative concentrations of Hg to Cd enables fine-tuning the cutoff wavelength from 4 to 30 µm, and similarly for PbₓSn₁₋ₓTe.) These materials may be grown epitaxially on appropriate single crystal substrates. Large band-gap semiconductors such as Si can be made to photogenerate at longer wavelengths by doping with appropriate impurities to introduce shallow energy states in the gap. For example, Si, which has a 1.2-µm wavelength cutoff, can be made sensitive to infrared light wavelengths between 2.5 and 7.5 µm by doping with In. By doping with different metals, the spectral response of Ge can cover the infrared range out to 100 µm. These long-wavelength intrinsic or extrinsic photodetectors must be operated at low temperatures because the dark conductivity would swamp any photocurrent. For details, see the sections on infrared detectors.

Photodiodes

Photodiodes rely on the electrical field produced by a p–n junction or Schottky barrier to separate photogenerated electron–hole pairs. The p–n photodiode consists of a p–n junction operated under reverse bias (Fig. 2). At the junction between the layers, the energy bands are bent to match the Fermi levels, resulting in a charge-depleted region and an electrical field across that region. Electron–hole pairs produced by photons absorbed in the depletion region are separated by the electrical field before they recombine and cause a current to flow in the circuit. For high-frequency operation, the thickness of the depletion layer must be kept small to reduce transit time. On the other hand, a thicker depletion layer is needed to allow a greater fraction of the light to be absorbed to increase the photogeneration yield. Thus there is a
Figure 2. A typical reverse-biased p–n photodiode. The p–n junction forms a charge depletion region in the n layer. Electron–hole pairs photogenerated in the depletion region are separated by the internal electrical field.
trade-off between speed and photogeneration efficiency. The heavily doped p region provides the front surface electrode, and the heavily doped n region at the bottom provides the other electrode. The p–i–n photodiode (Fig. 3) is one of the most common photodetectors because the depletion region thickness, the intrinsic region, can be tailored to optimize the photogeneration efficiency and the frequency response. Usually, there is an antireflective overcoat on the surface such as SiO2 to increase the photosensitivity. Otherwise, its operation is the same as the p–n diode. Metal–semiconductor diodes (Fig. 4) rely on the Schottky effect to produce a region of charge depletion and electrical field in the semiconductor. The metal layer has to be thin, typically 10 nm, to be transparent, and an antireflective overcoat is required to minimize reflection losses. Because the depletion region is at the surface of the semiconductor, the metal–semiconductor is used particularly for ultraviolet and visible wavelengths, which are highly absorbed and do not penetrate far into the semiconductor. In addition to operating in a photoconducting mode where the resistance is varied by the light as described before, photodiodes can also be operated in the photovoltaic mode, where the photodiode acts as a voltage source supplying an external current proportional to the photon flux. Avalanche Photodiodes, Phototransistors, and CCDs An avalanche photodiode is a special photodiode that has a sufficiently high reverse bias to cause chargecarrier multiplication by impact ionization. In addition
to being very sensitive, these devices can operate at very high frequencies, up to microwave, because the transit times are very short. Another type of specialized photodiode is a heterojunction device formed by depositing a large band-gap semiconductor on a smaller band-gap semiconductor. The advantage of heterojunction devices lies in the fact that the large band-gap material is transparent to the light of interest, so there is little loss of light reaching the photogeneration region at the junction. Bipolar-junction and field-effect transistors may also be used as light detectors. In a bipolar transistor, the supply voltage is applied between the collector and the emitter, and light creates charge carriers in the base region whose voltage is permitted to float to suit the current flow. Thus, the photon-produced current of microamperes controls the much larger collector-emitter current of milliamperes; the amplification is determined by the injection gain of the transistor. Figure 5 illustrates a typical n–p–n silicon phototransistor structure. Similarly, if an appropriately designed field-effect transistor has light focused on its gate-channel junction, the light will regulate the drain current, resulting in a photo-FET device. Currently the most widely used photodetector imaging arrays are charge-coupled devices (CCD) (7). Although CCDs are discussed in detail in their own section, they deserve a comment here because they are distinct from the previous photodetectors and may be thought of as photocapacitors. These are MOS devices, metal–insulator–semiconductor, which function by using charge storage locations on the p-Si chip. By applying appropriate voltages to an array of electrodes separated by an insulating oxide layer from the p-Si, inversion regions or potential wells are created under the biased electrodes. In the simplest structure, every third electrodes forms a well that corresponds to a pixel. Photogenerated charges diffuse to the nearest potential well which collects and stores them on a timescale, the image frame time, that is long compared to the readout. The readout of the charge pattern is accomplished by changing the depth of the wells so that the charge spills to the adjacent (empty) well in a linear, bucket-brigade fashion that sends each pixel to the readout amplifier.
Figure 3. A p–i–n photodiode whose antireflective overcoat increases efficiency. The depletion region is the intrinsic region.
Figure 4. Structure of a metal–semiconductor photodiode.
Figure 5. Schematic of an n–p–n phototransistor.
VIDEO CAMERA PHOTOCONDUCTORS Although modern cameras use array imagers, primarily CCDs, since its beginning in the 1940s, the television industry relied on vidicon-type cameras throughout most of its history. Vidicon camera tubes (8) use electron beams to address and scan the pixels on a single photoconductor (Fig. 6). Typical vidicon tubes consist of a vacuum tube that has a transparent glass faceplate coated on the inside by a transparent conducting electrode of SnO2 (NESA) and a photoconductor layer a few micrometers thick. A fixed positive voltage of 10 to 50 volts relative to the cathode of the electron beam gun is applied to the transparent electrode. The electron beam momentarily establishes a closed electrical circuit to successive pixels as the beam raster scans the photoconductor surface, driving the free surface voltage to zero by depositing electrons and charging the photoconductor surface. In the dark, the photoconducting layer is very insulating, so it acts as a capacitor to prevent any significant current flow on successive scans, once the surface voltage is driven to zero. Pixels exposed to light during each image frame, 1/30 second for television, discharge partially in proportion to the integrated light intensity over the frame time and neutralize some of the beam-deposited surface charge. The subsequent electron beam scan causes a current to flow in proportion to the charge required to recharge the surface back to zero volts. This small current signal is amplified to produce the video signal. Because the photodischarge occurs during a frame, chargecarrier transit time need not be very short. The original vidicon tube used vacuum-deposited amorphous selenium (a-Se) as the photoconductor (9). Because a-Se readily crystallizes and has sensitivity only to blue light, current vidicon tubes use antimony trisulfide (Sb2 S3 ), which has a good spectral match to visible light. However, Sb2 S3 has three drawbacks: high dark conductivity, image blooming, and image lag or ghosting. The high dark conductivity
prevents its use in low light conditions because the signal-to-noise ratio becomes too small. Overexposure of pixels causes the excess charge to discharge neighboring pixels, resulting in rapidly growing white spots, called image blooming, that erase the surrounding image. Afterimage effects or ghosts are due to photoconductive lag. This ‘‘lag’’ arises because electrons are practically immobile in this p-type material and produce a space charge of trapped electrons. The increased electrical field resulting from the trapped charge enhances the photogeneration efficiency and dark current in subsequent frames. The Plumbicon tube improved on these problems by using PbO as the semiconductor with blocking electrodes (10). Basically, the photoconductor layer is a p–i–n structure where the n layer is next to the positive electrode and the p layer is at the free surface. Another improved design is the Saticon tube (11), which uses an arsenic (As)- and tellurium (Te)-doped a-Se photoconductor layer. a-Se is preferred because both electrons and holes are mobile, which prevents the electron trapping that leads to afterimage effects. Arsenic is added to prevent crystallization. Higher As concentration improves stability at the expense of electron lifetime, so a compromise value of 10% As is used. To extend the spectral response to red light, a thin layer adjacent to the faceplate is doped with Te. Finally, to trap the electrons from the e-beam at the surface, a thin layer of Sb₂S₃ is deposited on the free surface. Hydrogenated amorphous silicon a-Si(H) is another amorphous photoconductor that can be used in vidicon tubes. It matches the visible light spectrum very well and can be deposited as the large area films needed in vidicon applications.

ELECTROPHOTOGRAPHIC PHOTORECEPTORS

The electrophotographic imaging industry, which produces xerographic copiers and printers, has grown immensely since its invention in 1938 by Chester Carlson (12). In 2001, electrophotographic photoconductors will generate
Figure 6. Vidicon tube camera structure. [After Weimer, Forgue, and Goodrich, RCA Review 12, 309 (1951).]
revenue of about $9.1B. We are surrounded by documents that are either electrophotographic copies or prints produced by electrophotographic (laser) printers. Much as television (also invented in 1938) has become a ubiquitous medium of communication for the home, so electrophotographic documents have become an essential tool for informational communication in the office. In its early history, the photoconductors used in electrophotographic imaging paralleled the materials and technical development of vidicon photoconductors. Like vidicon tubes they initially used films of amorphous chalcogenide glasses for photoconductors; one surface was connected to a conducting electrode, and the other surface was free and charged by a deposition of electrons or ions. Chalcogenide glasses are alloys or metal compounds of the chalcogen elements from column VI of the periodic table, that is, O, S, Se, Te, etc. Similarly, As was added for stability and Te for red light sensitivity, and layer structures that had thin Te-rich on Se layers were employed. However, the electrophotographic photoconductor, usually called the electrophotographic photoreceptor or simply photoreceptor, differs significantly in one key respect from all of the photoconductors discussed earlier. Whereas all the previously discussed photoconductor devices produce an electrical current, which results in a time-varying signal that is then electronically processed, the photoreceptor output is an electrostatic image that is developed. The photoreceptor is as much a display as a detector device, and in this respect it is similar to photographic film. The photoreceptor converts an optical image or pattern into an electrostatic image, a full scale spatial pattern of charge or voltage corresponding to the desired final print. The electrostatic image is developed by using a fine powder of microscopic black or colored charged particles, called the toner. The resulting visible image on the photoreceptor is then transferred to paper or another substrate. Figure 7 is a schematic diagram of an electrophotographic copier or printer. The process consists of (1) charging with ions, (2a) or (2b) image light exposure, (3) development of the charge image by pigmented plastic powder, (4) electrostatic transfer to paper, (5) fusing of the powder image to paper, (6) cleaning, and (7) erasure of the photoreceptor. Light-lens copiers use reflected document
light directly imaged on the photoreceptor, as shown in Fig. 7, step 2a. Modern printers and digital copiers employ digitally controlled light such as a scanning laser spot, 2b in Fig. 7, or an array of light-emitting diodes (LED). In this section, we are concerned only with the charging and light exposure processes that are essential to the operation of the photoreceptor. For details of the other processes, see the section on copiers and laser printers. The interested reader may also want to consult the very readable book, The Anatomy of Xerography, Its Invention and Evolution, by J. Mort (13). In principle, very low dark conductivity, single-crystal semiconductors could be used as photoreceptors. But because the imaging is at a 1 : 1 scale and hence requires large surface areas (up to ten square feet) and continuous cylindrical or flexible belt geometries, the photoconducting materials used for photoreceptors have been amorphous or polycrystalline composites. The early commercialization of electrophotography was based entirely on inorganic photoconductors, primarily a-Se (14), but alloys of a-Se containing Te and/or As and arsenic triselenide (As2 Se3 ) were also used. Other inorganic materials used were polymer dispersions of CdS and ZnO pigments (14), and beginning about 1980, amorphous (hydrogenated) silicon (a-Si •• H) (15). The first commercial organic photoreceptors (16) did not appear until 1970, but by the mid-1980s, they surpassed inorganic photoreceptors in production volume. In the year 2000, more than 100 million photoreceptors were manufactured worldwide, of which more than 98% were of organic materials. Therefore, electrophotography is the largest commercial user of organic electro-optical devices. Before discussing organic materials, it is useful to review briefly photoreceptor structures and the physics underlying their operating functions. Photoreceptor Structures The electrophotographic photoreceptor consists of one or two layers of photoconductive material on a conductive substrate (Fig. 8). The single layer structures primarily used for inorganic photoconductors are charged positively, and the incident light is absorbed close to the surface.
Figure 7. Schematic diagram of an electrophotographic copier or laser printer indicating the seven imaging steps. The copier uses a light lens imaging (2a), and the printer uses a laser-polygon mirror imaging system (2b). The diagram on the right schematically shows the photoreceptor during (1) charging, (2) photodischarge, (3) toner development, and (4) toner transfer to paper.
Figure 8. Photoreceptor structures of (a) single layer and (b) dual layer consisting of a transparent charge-transport layer (CTL) and a charge-generation layer (CGL). The plus and minus symbols indicate the polarity of the charge applied.
Organic photoreceptors are almost universally duallayer structures consisting of a thin, 0.1–4 µm light absorbing layer, called the charge-generation layer (CGL), overcoated by a thicker, 10–40 µm transparent layer that transports the photogenerated charge carriers, called the charge-transport layer (CTL). These organic dual-layer structures are charged negatively. The polarity of charging for both single- and dual-layer structures is determined by the fact that they rely on the transport of holes, which originate at either the top or bottom surface. Additional layers or treatments may be used to provide adhesion to the substrate, to block charge injection from the metal substrate, to prevent interference fringes caused by coherent light reflection from the substrate, and to protect the free surface from wear. The substrates used are primarily rigid aluminum cylinders or flexible belts of a metal or metalized polymer. Aluminum cylinders have diameters that range from 16 mm to 280 mm and lengths of 240 mm to a meter, and belt diameters range from 60 mm to more than 3 meters. Photoreceptor Properties The operation of the photoreceptor involves two basic steps, charging and photodischarging. In turn, photodischarge consists of the two processes central to photoconductivity, photogeneration and charge transport. In the charging step, voltage is applied to the photoreceptor by externally depositing an ionic charge on the free surface. This is done by ionizing the air and depositing ions on the free surface by one of several methods. The simplest charging device, called a corotron, is a thin, 50–100 µm wire, surrounded on three sides by a conducting shield. A voltage of 4 to 8 kV applied to the wire ionizes the air, and a fraction of the ions flows to the photoreceptor surface, which is moving at a constant speed. The wire may be replaced by pins or a sawtooth air ionizer. In a device called a scorotron, a conducting screen, biased to the desired voltage level, is placed between the photoreceptor and the source of ions enabling more uniform voltage charging. Scorotrons are used in high-speed machines, but slower machines typically use a resistive charge roller that has applied dc and ac voltages. The air near the contact nip is ionized as the voltage difference between the roller and photoreceptor surface reaches air breakdown. The ac voltage provides polarity charging to ensure uniformity, and the dc level determines the final voltage on the photoreceptor (17). Conductive brushes are also used to deposit a
charge. A surface charge σ on a photoreceptor of thickness L and dielectric permittivity ε results in a voltage

V = σL/ε.    (7)
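The numerical example discussed in the next paragraph follows directly from Eq. (7); the sketch below simply reproduces that arithmetic (it is an added illustration, not part of the article) for a 25-µm layer with a dielectric constant of 3 charged to 85 nC cm⁻².

```python
eps0  = 8.854e-12        # permittivity of free space, F/m
L     = 25e-6            # photoreceptor thickness, m
eps   = 3.0 * eps0       # dielectric permittivity of the layer, F/m
sigma = 85e-9 / 1e-4     # surface charge of 85 nC/cm^2 expressed in C/m^2

V          = sigma * L / eps        # Eq. (7): surface voltage, V
E_internal = V / L                  # internal electrical field, V/m
C_area     = (eps / L) * 1e-4       # capacitance per unit area, F/cm^2

print(f"V = {V:.0f} V, field = {E_internal/1e6:.0f} V/um, "
      f"capacitance = {C_area*1e12:.0f} pF/cm^2")
# Prints roughly 800 V, 32 V/um, and 106 pF/cm^2, matching the values in the text.
```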
For example, an organic dielectric layer that has a thickness L of 25 micrometers and a dielectric constant ε/ε₀ of 3 corresponds to a charging slope L/ε of 9.4 V per nC cm⁻² or a capacitance ε/L of 107 picofarads cm⁻². A surface charge of 85 nC cm⁻², or about 0.53 × 10¹² electrons/cm², produces a voltage of 800 volts and an internal electrical field of 32 V/µm on such a photoreceptor. Typical photoreceptor charge levels are 300 to 900 V and are determined by the requirements of the toner development process. To maintain this charge in the dark requires that no charge be injected from the metal electrode or the surface and that the photoconducting layer(s) contains a negligible density of conducting charge carriers. The former requires that the free and metal surfaces be charge blocking. Additionally, to maintain a charge image, there can be no lateral conductivity on the surface, requiring the surface to be charge trapping. In general, the photoreceptors can be charged to about 100 V/µm before breakdown occurs. The requirement of no bulk charge carriers implies that no significant charge can be thermally generated on timescales of seconds to minutes. A density N of thermal emission sites, located at an energy E below the conducting states, generates a charge
n(t) = N(1 − e^(−Rt)),    (8)

where the thermal generation rate factor is

R = ν e^(−E/kT),    (9)
and ν is the attempt-to-escape frequency, typically 10¹²–10¹³ sec⁻¹. Thermal emission states shallower than 0.5 eV empty in milliseconds, Rt ≫ 1, and states deeper than 0.6 eV emit very slowly, Rt ≪ 1. Thus if the photoconductor has N_S shallow states and N_D deep states, the thermally generated charge is

n(t) = N_S + N_D Rt
(10)
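The emptying times implied by Eq. (9) are easy to evaluate; the sketch below does so for two illustrative trap depths on either side of the 0.5–0.6 eV boundary discussed above. The attempt-to-escape frequency is taken from the range quoted in the text, and the specific depths are assumptions for illustration only.

```python
import math

nu  = 1e12     # attempt-to-escape frequency, s^-1 (low end of the quoted range)
k_T = 0.025    # thermal energy at room temperature, eV

for depth in (0.5, 0.75):                     # illustrative trap depths, eV
    R = nu * math.exp(-depth / k_T)           # Eq. (9): emission rate, s^-1
    print(f"E = {depth:.2f} eV: R = {R:.2e} s^-1, emptying time ~ {1.0/R:.2e} s")
# The 0.5-eV state empties in a fraction of a millisecond, while the 0.75-eV
# state takes on the order of ten seconds, so only shallow states contribute
# to the dark decay during an imaging cycle while deep states hold their charge.
```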
Because only holes are mobile in these materials, thermal generation results in a space charge of trapped electrons. For uniformly distributed thermal generation sites, the dark discharge is (18,19)

V = enL²/2ε.    (11)
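Equation (11) can be inverted to estimate how clean the photoconductor must be. The sketch below does this for the 25-µm, dielectric-constant-3 layer used as the running example, assuming a 100-V dark-loss budget; it is an added illustration whose result is consistent with the order-of-magnitude limit quoted in the following sentence.

```python
e    = 1.602e-19       # electronic charge, C
eps0 = 8.854e-12       # F/m
L    = 25e-6           # layer thickness, m
eps  = 3.0 * eps0      # dielectric permittivity, F/m
dV   = 100.0           # allowed dark voltage loss, V (assumed budget)

n_max = 2 * eps * dV / (e * L**2)      # invert Eq. (11), sites per m^3
print(f"maximum site density ~ {n_max * 1e-6:.1e} cm^-3")
# About 5e13 cm^-3, i.e., of the order of 10^14 cm^-3; in a solid with ~1e22
# atoms/cm^3 this is an impurity level of only a few parts per billion.
```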
Thus the density of thermal emission sites is an important parameter; for example, a dark voltage loss of less than 100 volts requires that there be fewer than 10¹⁴ shallow charge-generation sites per cubic centimeter, which corresponds to an atomic concentration of about 10 parts per billion. Dark discharge can be enhanced by light overexposure or exposure to light when the photoreceptor is not charged,
producing extra charge that is not conducted to the surfaces. Because there is no electrical field to remove the generated charge, some fraction can become trapped, preventing recombination. When the photoreceptor is charged in the next image cycle, thermal emission of any remaining trapped charge carriers contributes to the dark discharge, a phenomenon called light fatigue or light-enhanced dark discharge. If this trapping occurs during the image exposure step, a ghost image is produced by the enhanced dark discharge pattern. The discharge of the photoreceptor is accomplished by light exposure in a pattern of either a reflected image or a digitally generated pattern from a scanning laser spot or LED array. The exposure times range from a few milliseconds in copiers to nanoseconds in laser printers and therefore are very short compared to the time between light exposure and toner development. Typical hole mobilities for organic CTLs are 10⁻⁶–10⁻⁵ cm² V⁻¹ sec⁻¹, so to discharge a 25-µm photoreceptor to 50 V requires a transit time, Eq. (4), of 12 to 100 ms. Thus a half second after exposure to light, the discharge is over, and the voltage for a photogeneration efficiency η is
V(X) = V₀ − SX,    (12)

where V₀ is the dark voltage and X is the light exposure in energy units of erg cm⁻² or mJ m⁻².

Figure 9. Photodischarge curves for an idealized constant-generation-efficiency η (dashed curve) and a typical field-dependent η discharge (solid curve). V_R is a residual voltage due to trapped charge, and S is the sensitivity.

The photosensitivity is

S = eηA(L/ε)(λ/hc),    (13)
usually expressed in units of V erg⁻¹ cm² or V mJ⁻¹ m². In the expression for photosensitivity, Eq. (13), e is the charge of an electron, A is the absorption efficiency at wavelength λ, and hc/λ is the energy per photon. The dielectric thickness L/ε is the capacitive amplification factor relating charge to voltage. Thus a 25-µm organic photoreceptor that has a photogeneration efficiency per incident photon ηA of 0.8 at a wavelength of 780 nm will have a sensitivity S of 470 V mJ⁻¹ m² and require an exposure of 1.7 mJ m⁻² to discharge 800 volts. In practice, a linear photodischarge, as given by Eq. (12), dashed curve in Fig. 9, is not observed. The actual discharge tails off, Fig. 9 solid curve, because the photogeneration efficiency is field-dependent and decreases with field. Additionally, because some of the charge carriers do not transit the layer and are trapped, the discharge has a lower limit. Because nearly all of these photoreceptors transport only holes, the light absorption and photogeneration take place near one or the other surface of the layer, and the holes must transit the layer L to discharge the photoreceptor. But because the electrical field moving the charge is due to the surface charge, the process of discharging collapses the field, slowing down the charge carriers. Although the leading edge of the moving charge carriers sees the full electrical field, each successive carrier moves more slowly as the field is reduced. Thus no matter how high the mobility or the initial field, the last charge will take an infinite time to transit (20). Eventually, the charge becomes trapped, and for a carrier lifetime
τ, a residual voltage V_R results:

V_R = L²/2µτ.    (14)
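The worked numbers in the preceding paragraphs follow from Eqs. (13) and (14); the sketch below reproduces them. The exposure parameters (ηA = 0.8, 780 nm, 25 µm, ε = 3ε₀) are the ones quoted in the text, while the charge-carrier range µτ used for the residual voltage is an assumed, illustrative value.

```python
e, h, c = 1.602e-19, 6.626e-34, 2.998e8
eps0    = 8.854e-12

L    = 25e-6           # CTL thickness, m
eps  = 3.0 * eps0      # dielectric permittivity, F/m
etaA = 0.8             # photogeneration efficiency per incident photon (text value)
lam  = 780e-9          # laser-diode exposure wavelength, m

S = e * etaA * (L / eps) * (lam / (h * c))   # Eq. (13), V per (J/m^2)
S_per_mJ = S / 1e3                           # V per (mJ/m^2)
exposure_800V = 800.0 / S_per_mJ             # exposure needed to discharge 800 V
print(f"S = {S_per_mJ:.0f} V mJ^-1 m^2, exposure for 800 V = {exposure_800V:.1f} mJ/m^2")

mu_tau = 1e-7 * 1e-4        # assumed carrier range of 1e-7 cm^2/V, in m^2/V
V_R = L**2 / (2 * mu_tau)   # Eq. (14): residual voltage, V
print(f"residual voltage V_R = {V_R:.0f} V")
# Prints S ~ 470 V mJ^-1 m^2 and ~1.7 mJ/m^2, matching the text, and a residual
# of ~31 V for the assumed carrier range.
```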
The term µτ is called the charge-carrier range. Eq. (14) for the residual voltage is applicable only to trapping that occurs uniformly in the CTL. In multilayer structures, trapping may occur at the interfaces and differently in each layer. Additionally, if photogeneration occurs at any distance from the positive surface or electrode, the motion of electrons becomes important, and the residual voltage may be due to their trapping. But in any case, the residual voltage due to trapped charge sets a lower limit to the discharge (Fig. 9). The factor that causes a nonlinear photodischarge is the field dependence of photogeneration. In any semiconductor, but especially in a molecular crystal or amorphous semiconductor where the mobility of the created charge is low, the photogenerated charge pair becomes self-trapped in its mutual coulombic attraction, producing a bound charge pair called an exciton. The probability that these geminate charge pairs either recombine or separate depends on the ratio of their binding energy to the thermal energy. The external electrical field enhances separation by supplying additional energy. Poole–Frenkel (21) and Onsager (22) models of this field-dependent photogeneration efficiency have been proposed (23,24), but the processes in typical layered photoreceptors are more complex. So empirical models are used for quantitative agreement, for example,

η = η₀/[1 + (E_C/E)^P].
(15)
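The "soft" discharge of Fig. 9 can be generated numerically by stepping the exposure and letting the sensitivity follow Eq. (15) through the collapsing field E = V/L. The sketch below is an added illustration; S₀, V₀, E_C, and P are assumed values, not parameters reported in the article, and trapping (the residual voltage) is ignored.

```python
L, V0 = 25.0, 800.0     # CTL thickness (um) and initial voltage (V)
S0    = 470.0           # saturated sensitivity, V per (mJ/m^2)
E_C, P = 5.0, 1.5       # assumed field-dependence parameters of Eq. (15)
dX    = 0.01            # exposure step, mJ/m^2

V, X, curve = V0, 0.0, []
while X < 10.0 and V > 1.0:
    E = V / L                                   # internal field, V/um
    eta_ratio = 1.0 / (1.0 + (E_C / E) ** P)    # eta/eta0 from Eq. (15)
    V -= S0 * eta_ratio * dX                    # voltage lost in this exposure step
    X += dX
    curve.append((X, V))

for x_mark in (1.0, 2.0, 4.0, 8.0):
    v = min(curve, key=lambda point: abs(point[0] - x_mark))[1]
    print(f"X = {x_mark:.0f} mJ/m^2 -> V ~ {v:.0f} V")
# The discharge is nearly linear while E >> E_C and then tails off as the field
# collapses, reproducing the shape of the solid curve in Fig. 9.
```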
For most photogenerating materials, the power exponent P ranges between 0.5 and 2.5, and EC is a constant indicating the electrical field where the generation efficiency is half its maximum value. At high electrical fields, E > EC , the
generation efficiency saturates at η0 , that is, the discharge is linear at high voltage, and dV/dX is equal to S0 , Eq. (13) where η = η0 . At low field, E < EC , the generation efficiency falls off as (E/EC )P , and dV/dX is proportional to S0 (V/VC )P (the electrical field is equal to the voltage divided by the thickness L if there is no trapped space charge). Thus the tail of the discharge (Fig. 9) is due to field dependence, and EC , along with P, is a measure of the ‘‘softness’’ of the photodischarge curve . Photoreceptor Materials The first commercial photoreceptors used a-Se as the photoconductor. The a-Se was vacuum-deposited to a thickness of 50 to 60 µm. a-Se has high sensitivity for blue light (Fig. 10) but is not a good match to the visible spectrum. Its key advantages are high hole and electron mobilities, 10−1 and 10−3 cm2 V−1 sec−1 , respectively, long lifetimes, and low thermal generation. Because a-Se crystallizes readily, about 0.5% of As is added to prevent crystallization. Te is added to enhance the red light response, but because a-Te crystallizes more readily than a-Se, the concentration of Te is kept to a few percent. To obtain more red light sensitivity, a higher concentration is deposited as a thin, typically a fraction of a micrometer, photogeneration layer, much like in the Saticon photoconductor (10). The other chalcogenide used extensively as a photoreceptor is a-As2 Se3 , which has the advantages of being very sensitive, a good match to visible light (Fig. 10), and is very durable. The main disadvantages of a-As2 Se3 photoreceptors are the lack of electron transport, low hole mobility, 10−5 cm2 V−1 sec−1 , and significant thermal generation. The discovery of hydrogenated amorphous silicon a-Si(H), formed by the chemical vapor deposition of decomposed silane (SiH4 ) (25), led to its use as a photoreceptor in the 1980s. Normal a-Si has too many broken or dangling bonds that give rise to a high
density of traps which result in poor charge transport and high thermal generation. By incorporating hydrogen to eliminate these dangling bonds, an electronic grade material is obtained. By doping with phosphorus or boron atoms, much like crystalline Si, p- or n-type material can be made that has hole mobilities of 10−1 cm2 V−1 sec−1 or electron mobilities of about 10−3 cm2 V−1 sec−1 . In other respects, the a-Si(H) photoconductive properties are similar to those of a-As2 Se3 (Fig. 10). The main advantage of a-Si(H) is that it is very hard and durable, enabling print lives of several million prints. Its main disadvantage is the slow vacuum-deposition process that makes the photoreceptor expensive. Due to the slow deposition rate, the typical layer thickness is kept down to about 20 µm. Other inorganic photoconducting materials that have been used are solvent-coated polymer dispersions of ZnO and CdS. The main difficulty with these material dispersions is that they have trap-dominated charge transport that makes them cyclically unstable due to buildup of residual voltage and enhanced dark discharge. Because solvent coating of polymer films can be significantly cheaper than any vacuum-deposition process, organic photoreceptors that could be solvent coated were of considerable interest in the 1960s and 1970s. The earliest practical organic photoreceptor material to be used was poly-N-vinylcarbazole (PVK) (26) (Fig. 11), which exhibited hole transport. Although PVK has limited photosensitivity because it absorbs light only in the bluewavelength spectrum, studies of charge transport in PVK and other organics identified the basic mechanism of hole transport as electron exchange between adjacent electron donor molecules (Fig. 12a), which are aryl amine units. CH3 H C N
Figure 11. Monomeric units of polyvinylcarbazole (PVK) and two polycarbonate polymers (bisphenol-A polycarbonate and bisphenol-Z polycarbonate) used in organic photoreceptors.
Figure 10. Spectral sensitivities of a-Se, a-As₂Se₃, and a-Si(H) photoreceptors. The insert indicates the corresponding hole and electron mobilities and dielectric constants (a-Se: µh = 10⁻¹ cm²/Vs, µe = 10⁻³ cm²/Vs, ε = 6.3ε₀; a-As₂Se₃: µh = 10⁻⁵ cm²/Vs, ε = 9.1ε₀; a-Si(H): µh = 0.5 cm²/Vs, µe = 10⁻³ cm²/Vs, ε = 12ε₀).
Figure 12. Schematic of (a) hole and (b) electron transport in an organic charge-transport layer. The charge moves by electron exchange between adjacent electron-donor (D) or electron-acceptor (A) molecules.
Figure 13. Energy diagram of electrical field-assisted hole transport and diagram of the electron transfer process. Electron-donor molecules are aromatic amines that can reversibly exchange an electron between the neutral molecule and its oxidized cation.
The process involves a reversible one-electron reductionoxidation phenomenon, as depicted in Fig. 13. The hole charge carrier is an oxidized aryl amine (cationic radical), and the hole moves when an adjacent neutral aryl amine unit transfers an electron. The electrical field imposes a preferred direction by lowering the energy and increasing the exchange rate, as indicated schematically in Fig. 13. Thus, hole transport is a hopping process that involves electron exchange between electron-donor molecules. Organic charge-transport layers consist of solid-state solutions of 30–40% aryl amines in a polymer, called a doped polymer. Typical polymers are polycarbonates whose molecular weights range from 30 to 150 thousand units (Fig. 11). Charge transport in molecular doped polymers has been studied extensively, and the underlying physics is well understood and documented in an extensive literature. For details and references, the interested reader is referred to Organic Photoreceptors in Xerography by Borsenberger and Weiss (6). Although an endless variety of aryl amines is possible (6), the two most widely used classes are hydrazones and triphenylamine derivatives (Fig. 14). Hydrazones are molecules characterized by the CDNN structure. They are transparent to light wavelengths down to about 450 nm where they absorb strongly, giving them a yellow color. Because they absorb blue–UV light, they photodegrade when exposed to blue light. Triphenylamines, that have three aromatic groups on the nitrogen have the highest hole mobilities as a class and are transparent to wavelengths less than 400 nm, depending on the side groups. Because the latter are more stable and have higher mobility, hydrazones are being phased out in favor of triphenylamines. Analogous to hole transport by electron-donor molecules, electron-acceptor molecules are required for electron transport (Fig. 12b). Although a number of acceptor molecules that enable electron transport have been reported (6), to date none has met the mobility and range requirement of commercial CTLs. Molecular polymer solutions make practical chargetransport layers, but photogeneration requires crystalline photoconductors. Solvent coating requires that these photoconductors be ground into fine particles, called pigment particles, and dispersed in a polymer binder. The process of grinding can introduce defects and impurities that act as traps or thermal generation centers, hence
the amount of pigment must be minimized. Without electron transport, a dilute single-layer dispersion that absorbs the light and photogenerates uniformly in the bulk is impractical. Instead, a separate CGL is used, thick enough to absorb the light but thin enough to permit electron removal. Because the pigment extinction coefficient is about 10⁵ cm⁻¹, CGLs need be only a few tenths of a micrometer thick and have pigment concentrations of 40 to 80% in the polymer binder. The pigment particles must be less than 0.1 µm, typically of the order of 10 nm. The pigments themselves are usually electron transporters, so the high concentration permits particle contact, enabling electron transport. Some stable pigments such as perylenes or phthalocyanines can also be vacuum-deposited to form very uniform thin films without a binder. Because the thin CGLs are fragile, it is preferable to coat them on the metal surface and overcoat them with the thicker CTL (Fig. 8). In this case, the photoreceptor must be charged negatively. If, instead, the CGL is coated on top of the CTL, the photoreceptor is charged positively, and a protective overcoat may be required. Considerable progress has been made in understanding charge transport in molecularly doped polymers, but fundamental understanding of photogeneration has been limited by the complexity of the practical CGLs. Because the generation layers are dispersions of very small polycrystalline pigments in a polymer, the generation efficiency depends on the initial generation of charge in the crystal and also on the transfer of the charge from the crystal to the charge-transporting medium, that is, the CTL for holes and through the CGL for electrons. On the one hand, the charge-transfer process that takes place on the surface of actual pigment particles is difficult to characterize. On the other hand, model single crystals of the pigments employed are too difficult to obtain. As a result, the design of CGLs is highly empirical, and the technology is treated as proprietary. So the literature on CGL design is limited. A very large variety of organic pigments is used in photoreceptors, including azo pigments (27), dyes such as thiopyrilium (28) and squarilium (29), polyaromatic hydrocarbons and amines, and phthalocyanines (30). Figure 15 is an example of the spectral sensitivities of some typical organic photoreceptors. In general, the pigments are classified as either visible or ‘‘laser’’ pigments. The former have photogeneration efficiencies tailored to the visible spectrum, 400 to 700 nm. The latter are tailored for the monochromatic LED (670, 680, 720 nm) or laser (630, 670, 780 nm) wavelengths used in digital machines. The wavelength for most laser printers is 780 nm due to the wide use of diode lasers developed for the compact disk (CD) industry. The two most widely used visible pigments are azos (27), Fig. 16, and perylenes (31,32), Fig. 17. Azo pigments, identified by the double nitrogen bond N=N, are widely used as color pigments and dyes for fabrics and inks and have a well-established chemical industry to supply them. Bisazos consist of a C–N=N–A–N=N–C structure, which offers unlimited possibilities through variations of possible central bridging units A and coupling
Figure 14. Examples of (a) hydrazone and (b) triarylamine electron-donor molecules employed in charge-transport layers.
units C (Fig. 16). The very fine particles readily obtained directly from synthesis make excellent dispersions. Bisazos are primarily visible absorbers, but near-IR pigments are also in use. Of the polyaromatic or polycyclic pigments, perylene imides are the best known because of their use in automotive paints and colorants for plastics. Their use in automobile paint attests to their stability to exposure to sunlight and the elements. The perylene pigments of interest as photogenerators fall into two classes, perylene diimides and perylene arylimidazoles (Fig. 17). Because of the stability of perylene pigments, thin CGLs can be prepared by direct vacuum deposition as well as by solvent coating of dispersions (31,32). Phthalocyanines are the pigments most often used for laser and LED light sources. Although phthalocyanine photoconducting properties have been studied for some time and their use as an organic photoreceptor goes back
to the 1960s (30), only after the appearance of diode lasers in the 1980s, were they introduced as pigments for laser printers. Phthalocyanines (Fig. 18) are metal complexes of porphyrin molecules that are related to natural photoactive pigments, such as chlorophyll. The molecules are flat, planar structures that are divalent metals, metal halides, or metal oxides at the center (33). They absorb light strongly in the visible red and near-IR, are extremely stable to decomposition, and are insoluble in common solvents. As for perylenes, their stability is exploited by the automotive paint industry for blue paints that will not degrade under full sun exposure. Because of their flat pancake shape, phthalocyanine pigments exhibit many different crystal polymorphs, distinguishable by X-ray diffraction spectra, which influence their photogeneration efficiency (6,33). Figure 15 shows the sensitivity of two photoreceptors based on titanyl phthalocyanine (TiOPc) of different
Figure 15. Spectral sensitivity of two visible (azo pigments) and three laser diode (phthalocyanine, Pc pigments) sensitive organic photoreceptors. The two TiOPc sensitivities are from a high- and medium-sensitivity polymorph.
crystal forms. The more stable polymorphs usually have lower photogeneration than the more sensitive polymorphs, which are obtained by techniques of milling, heating, solvent treating, or controlled substrate temperature during vacuum deposition. Similarly to perylenes, phthalocyanine CGLs are made either by solvent coating from a binder dispersion or by vacuum deposition.
Figure 16. Examples of bisazo pigment molecular structures.
Figure 17. Perylene bisimide and bisarylimidazole pigment molecules.
Figure 18. Phthalocyanine pigment molecules. Divalent M: H₂, Mg, Zn, Cu, Pb, Cd, Sn; trivalent M: AlX, InX, GaX; tetravalent M: SiX₂, GeX₂, VO, TiO; X = F, Cl, Br, OH.

Resolution and Durability

Two critical parameters of an imaging device are its resolution and how many times it can be cycled before wearing out. The resolution of a photoreceptor is governed first by the surface resistivity and second by coulombic blooming caused by the collapsing electrical field as the photogenerated charge moves through the CTL. Normal resistivity is greater than 10¹⁶ ohm/cm², which is sufficient for resolution down to a few micrometers. Because the surface is exposed to ions and corona effluents such as ozone and NOₓ during charging and to residues from toner development and air contaminants, film deposits of these contaminants on the surface can decrease the resistivity. For example, any salts that ionize in the presence of moisture will increase the surface conductivity. Maintaining high resolution requires aggressive cleaning that removes any film and the monolayer of the chemically oxidized surface. Because charge blooming varies with the thickness of the CTL, the limiting resolution is the thickness of the CTL for typical 70–90% discharge operating conditions. Chemically, the best organic materials are very stable, and lifetimes of millions of cycles are possible. The occurrence of chemical degradation or instabilities is due to impurities introduced in the manufacturing process, so the remedy lies with improved processing. The primary photoreceptor failure modes are extrinsic damage, such as scratching, and ultimately, wear. Catastrophic damage failures are a function of machine reliability and design, but once those are addressed, wear is the limiting factor. The photoreceptor is subject to physical contact during
toner development, paper transfer, and cleaning (also charging if charge rollers are used), all of which contribute to abrasive wear. Wear rates of organic CTLs range from 10 to 60 nm per kilocycle depending on machine design, and the typical value is 20 nm per kilocycle, which results in a life of 300 to 600 kcycles. For comparison, the much harder a-Si(H) wears at 0.1–1 nm per kcycle and lasts about 3 to 10 million cycles. a-Si(H) is actually more sensitive to surface scratches and is overcoated by an oxygen- or carbon-doped thin layer which wears through. To increase photoreceptor wear life, CTL thickness has been increased from less than 20 µm to about 30 µm. Overcoated and reinforced photoreceptor designs are being introduced along with improved machine designs, and in 2000, organic photoreceptors that have a print life of more than one million cycles were introduced.

SUMMARY
Photoconductors used in imaging fall into two classes, those based on discrete units for each image pixel and large surface area imagers. The former use crystalline IV, III–V and II–VI semiconductors and integrated chip technologies, especially the well-established silicon VLSI technology, and are used in a wide spectrum of imaging applications, as well as in the communication and electronic industries. The large surface area devices use amorphous or molecular photoconductors that can be fabricated in large continuous areas, often measured in square feet. The primary application of large area continuous photoconductors continues to be the electrophotographic industry, which, almost exclusively, employs organic materials. But recent advances in organic thin film transistors and organic light-emitting diodes (OLEDs), essentially photodiodes operated in reverse, promise to expand the use of organic electro-optic materials considerably. Photoconductivity involves the excitation of electrons from occupied energy states to empty states producing a mobile charge that is separated by an externally applied or internal electrical field. The key parameters are photogeneration and transport of charge carriers, measured by the photogeneration efficiency η and mobility µ, which determine the sensitivity and speed of the photodetector. The best materials are single crystals that have near unity photogeneration efficiencies and high mobilities. Choosing materials with different band gaps enables solid-state photodetectors that respond from 100 µm in the far infrared to X rays. Single-crystal photodetectors are fabricated, with few exceptions, in planar geometries in four basic structures: photoresistors, photodiodes, phototransistors, and CCDs. The basic photodiode and phototransistor structures come in many different varieties optimized for special applications, such as high sensitivity and low noise or high speed, up to microwave frequencies for optical communication.
Figure 19. Sensitivity improvements from 1970 to 2000 of visible and laser-sensitive (primarily 780-nm) organic photoreceptors.
2000
Single crystals are impractical for large-area photoconductors such as photoreceptors, so these devices must rely on amorphous or disordered solids that have significantly lower mobilities. In modern organic photoreceptors, separate electronic material layers accomplish the two functions, that is, photogeneration by organic crystalline particles and hole transport by aromatic-amine electron-donor molecules. A large variety of electron-donor molecules are in current use; the two main classes are hydrazone-based arylamines and triphenyl-based polyamines. At concentrations of 40%, typical mobilities of 10^-5 cm^2 V^-1 s^-1 are achieved, which is sufficient for most electrophotographic applications. Considerable progress has been made in the last 30 years in the photosensitivity of organic photoreceptors (Fig. 19), and sensitivities approach the theoretical limit of unity photogeneration. The best materials are chemically very stable, and intrinsic lifetimes of millions of cycles are possible, but organic photoreceptors are soft, and wear limits the life to about half a million cycles, compared to a-Si(H), which has a longer wear life by a factor of 10. Protective overcoats and reinforced polymers are being pursued to extend the life beyond a million cycles and approach that of a-Si. Typical two-layer designs in current use are limited in resolution by the thickness of the CTL to about 1,200 dpi (dots per inch). High-quality color will require higher resolution and photoreceptors that have top-surface photogeneration.

ABBREVIATIONS AND ACRONYMS

MOS   metal-oxide-semiconductor
CCD   charge-coupled device
CGL   charge generation layer
CTL   charge transport layer
LED   light-emitting diode
CD    compact disk
BIBLIOGRAPHY
1. W. Smith, Nature (London) 7, 303 (1873).
2. F. C. Nix, Rev. Mod. Phys. 4, 723 (1932); A. L. Hughes, Rev. Mod. Phys. 8, 294 (1936).
3. R. H. Bube, Photoconductivity in Solids, John Wiley & Sons, Inc., New York, NY, 1960.
4. A. Rose, Photoconductivity and Related Processes, Interscience, NY, 1963.
5. S. M. Ryvkin, Photoelectric Effects in Semiconductors, Consultants Bureau, NY, 1976.
6. P. M. Borsenberger and D. S. Weiss, Organic Photoreceptors for Xerography, Marcel Dekker, NY, 1998.
7. M. J. Howes and D. V. Morgan, eds., Charge-Coupled Devices and Systems, John Wiley & Sons, Inc., New York, NY, 1979.
8. P. K. Weimer, S. V. Forgue, and R. R. Goodrich, RCA Rev. 12, 309 (1951).
9. P. K. Weimer and A. D. Cope, RCA Rev. 12, 314 (1951).
10. E. F. deHaan, A. van der Drift, and P. R. M. Schampers, Philips Tech. Rev. 25, 133 (1963/4).
11. N. Goto, Y. Isozaki, K. Shidara, E. Maruyama, T. Hirai, and T. Fujita, IEEE Trans. Electron Devices ED-21, 662–666 (1974).
12. Electrophotography, U.S. Pat. 2,297,691, 1942, C. F. Carlson.
13. J. Mort, The Anatomy of Xerography, Its Invention and Evolution, McFarland, Jefferson, NC, 1989.
14. J. H. Dessauer and H. E. Clark, eds., Electrophotography and Related Processes, Focal Press, London and New York, 1965.
15. I. Shimizu, T. Komatsu, T. Saito, and E. Inoue, J. Non-Cryst. Solids 35, 773 (1980).
16. W. D. Gill, J. Appl. Phys. 43, 5,033 (1973).
17. H. Kawamoto and H. Saito, J. Imaging Sci. Technol. 38, 383 (1994).
18. A. R. Melnyk, J. Non-Cryst. Solids 35, 837 (1980).
19. M. Abkowitz, F. Jansen, and A. R. Melnyk, Philos. Mag. B 51, 405 (1985).
20. I. P. Batra, K. K. Kanazawa, and H. Seki, J. Appl. Phys. 41, 3,416 (1970).
21. J. Frenkel, Phys. Rev. 54, 647 (1938).
22. L. Onsager, J. Chem. Phys. 2, 599 (1934); L. Onsager, Phys. Rev. 54, 554 (1938).
23. R. H. Batt, C. L. Braun, and J. F. Horning, J. Chem. Phys. 49, 1,967 (1968).
24. D. M. Pai and S. W. Ing, Phys. Rev. 173, 729 (1968).
25. W. E. Spear and P. G. LeComber, Philos. Mag. B 33, 935 (1976).
26. H. Hoegl, J. Phys. Chem. Solids 25, 1,221 (1964).
27. K. C. Nguyen and D. S. Weiss, Electrophotography 27, 2 (1988).
28. W. D. Dulmage et al., J. Appl. Phys. 49, 5,543 (1978); P. M. Borsenberger, A. Chowdry, D. C. Hosterey, and W. Mey, J. Appl. Phys. 49, 5,555 (1978).
29. R. E. Wingard, IEEE Ind. Appl. 1,251 (1982).
30. J. Weigl et al., in W. Berg and K. Hauffe, eds., Current Problems in Electrophotography, Walter de Gruyter, Berlin, 1972, p. 287.
31. E. G. Schlosser, J. Appl. Photogr. Eng. 21, 118 (1978).
32. J. Duff, A.-M. Hor, A. R. Melnyk, and D. Teney, SPIE Proc. 1,253, 183 (1989).
33. R. O. Loutfy, A.-M. Hor, C. K. Hsiao, and A. R. Melnyk, Proc. Fourth Int. Congress on Non-Impact Printing Technol., SPSE, New Orleans, 1988.

PHOTODETECTORS

MARK WADSWORTH
Jet Propulsion Laboratory, Pasadena, CA
Photons that impinge upon matter interact in a manner determined by the nature of the chemical bonds in the material and the energy of the incident photons. Interacting photons may be reflected, refracted, diffracted, transmitted, or absorbed. Each of these phenomena can be used to measure some parameter of interest in chemical analysis. Photoassisted chemical analytical techniques require an accurate, sensitive method of photon detection and quantification. Photographic film proved to be the first useful method for such evaluations. Later, photomultiplier tubes provided the means for improving many photoanalytical techniques. As a result of the advent of semiconductor technology, numerous single-element, broad-spectrum
photodetecting devices were developed and applied to the task. Most recently, photodetector technology has entered the age of high-density integration. Large-area photon detectors are becoming commonplace in most analytical equipment. The advances afforded by these devices have allowed significant improvements in the performance of systems designed for photoanalytical chemical analysis in general, and for spectroscopy in particular. Therefore, a working knowledge of the operation and limitations of photodetectors is necessary for the modern chemist.

Photodetector devices convert electromagnetic radiation, or photons, to electrical signals that can be processed to obtain the spectral, spatial, and temporal information inherent in the radiation. Photodetectors, as indicated in Table 1, may be operated in many modes; the more popular are photoconductors, photodiodes, charge-transfer devices, and pyroelectrics. Detectors may be used as single elements, as in street light controls, film camera exposure control, or motion detectors for security. Photodetectors also find application as linear arrays in analytical spectrometers, night-vision equipment, and small, low-cost spectrometers for building ventilation control and environmental pollution monitoring, or configured as large matrix arrays in video cameras.

PRINCIPLES
Photon Absorption and Thermal Emission

The basic detection process (Fig. 1) is the generation of free electrons, holes, or both by the absorption of photon energy (1–4). The absorption may be intrinsic, that is, creating a free electron–hole pair, or extrinsic, creating a free hole or electron. The process can also be indirect, whereby the absorbed photon energy raises the lattice temperature (thermal detection) and a phonon generates the free charged particle. The change in free charge is sensed in an amplifier circuit as a signal voltage or current. Random generation of free charge is the source of detector noise. The detector geometry, spectral response, electrical bias, and temperature are adjusted to optimize the signal-to-noise ratio. The absorption efficiency (photon-to-electron conversion) is a critical issue. The detector surface is typically coated to minimize reflections at a particular wavelength, and the internal efficiency is given by 1 - exp(-αx). Because the absorption coefficient α is typically 3,000 cm^-1 for intrinsic detection and 3 cm^-1 for extrinsic detection, the detector thickness x has a wide range of values.

Important figures of merit (5) describe the performance of a photodetector: responsivity, noise, noise equivalent power, detectivity, and response time (2,6). However, there are several related parameters of measurement: temperature of operation, bias power, spectral response, background photon flux, noise spectra, impedance, and linearity. Operational concerns include detector element size, uniformity of response, array density, reliability, cooling time, radiation tolerance, vibration and shock resistance, shelf life, availability of arrays, and cost.

Photodetectors exhibit well-defined cutoff-wavelength thresholds, whose positions are determined by the magnitudes of the band gap activation energy Eg or the impurity-activation energy Ei. The cutoff wavelength λc corresponds to a photochemical activation energy E, where

λc = hc/E.  (1)

The product of Planck's constant and the velocity of light is hc = 1.24 when E is expressed in electron volts and λc in µm. Because the absorption coefficient is typically a weak function of wavelength above the threshold energy, the spectral response approximates that of the ideal quantum counter. The interest in semiconductor photodetectors with cutoff wavelengths longer than 3 µm results from their extreme sensitivity to variations in the photon flux emitted from gray-body objects that are characterized by time and spatial variations in temperature and emissivity. Although some applications require detecting narrowband radiation, for example, atomic emission spectra and radiation from hot gases, most uses of infrared detectors involve detecting variations in photon flux from objects near ambient temperature. Typically, a lens system images the scene of interest on a single detector element, a linear array of detector elements, or an area array (7,8). The infrared detector is exposed to the signal photon flux through the optics. The ambient background photon flux φB is often much greater than the signal flux and determines the ultimate system performance. The background flux φB, in photons/(cm^2 s), may be calculated from Planck's radiation law.
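As a quick numerical illustration of Eq. (1), the sketch below converts an activation energy in electron volts into a cutoff wavelength in micrometers; the band gap values used are rough, commonly quoted figures chosen only as examples, not data from this article.

```python
# Illustrative sketch of Eq. (1): cutoff wavelength (um) from activation energy (eV).
# The example band gaps are approximate, commonly quoted values used only for illustration.

HC_EV_UM = 1.24  # Planck's constant times the speed of light, in eV.um

def cutoff_wavelength_um(activation_energy_ev: float) -> float:
    """Return the cutoff wavelength in micrometers for a given activation energy in eV."""
    return HC_EV_UM / activation_energy_ev

if __name__ == "__main__":
    examples = {"Si (~1.12 eV)": 1.12, "InSb (~0.23 eV)": 0.23, "HgCdTe (~0.10 eV)": 0.10}
    for name, eg in examples.items():
        print(f"{name}: cutoff ~ {cutoff_wavelength_um(eg):.1f} um")
```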
Blackbody Emittance

Representative blackbody emittance (9,10), calculated as a power spectral density, is shown in Fig. 2. The wavelength of peak power density for a blackbody at temperature T is given by Wien's displacement law,

λm T = 2,898 µm K,  (2)

where λm is the wavelength of maximum power density and T is the temperature in K. The peak spectral power density, or emittance, in W/(cm^2 µm), is given by

M(T)m = 1.288 × 10^-15 T^5 watts/(cm^2 µm)  (3)

for each square centimeter of emitting surface and micrometer of wavelength at the peak value. The total power density W integrated over all wavelengths, in W/cm^2, is given by the Stefan–Boltzmann law:

W = 5.67 × 10^-12 T^4 watts/cm^2,  (4)

which expresses the total rate of emission in a 2π steradian solid angle from an element of area on a blackbody surface. The very strong dependence of W on temperature and the sharp decline of emittance for wavelengths somewhat less than the peak wavelength result in very low ambient background flux density for ambient temperatures less than 320 K and detectors whose cutoff wavelengths are <3 µm. Short-wavelength detectors such as silicon charge-coupled devices (CCDs) do not work in the infrared for this reason but are excellent when hot sources such as light bulbs (1,500 K) and the sun (6,000 K) are available.
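The blackbody relations in Eqs. (2)–(4) are easy to evaluate numerically; the sketch below does so for a few temperatures, using the constants quoted in the text.

```python
# Illustrative evaluation of Eqs. (2)-(4) for a blackbody at temperature T (in kelvin).

def wien_peak_um(T: float) -> float:
    """Wien's displacement law, Eq. (2): peak wavelength in micrometers."""
    return 2898.0 / T

def peak_emittance_w_cm2_um(T: float) -> float:
    """Peak spectral emittance, Eq. (3), in W/(cm^2 um)."""
    return 1.288e-15 * T**5

def total_emittance_w_cm2(T: float) -> float:
    """Stefan-Boltzmann law, Eq. (4): total emittance in W/cm^2."""
    return 5.67e-12 * T**4

if __name__ == "__main__":
    for T in (300.0, 500.0, 1000.0, 2000.0):
        print(f"T = {T:6.0f} K: peak at {wien_peak_um(T):5.2f} um, "
              f"M_peak = {peak_emittance_w_cm2_um(T):.2e} W/(cm^2 um), "
              f"W = {total_emittance_w_cm2(T):.2e} W/cm^2")
```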
Table 1. Semiconductor Photodetectors in General Use

The table lists, for each detector, the spectral bandwidth (µm), mode of operation, operating temperature (K), material, pixel size (µm), principal use, focal plane size, cutoff wavelength (µm), response time (µs), responsivity (V/W, or A/W where noted), and D* (cm Hz^{1/2}/W). The detectors covered are, for the ultraviolet and visible, Si CCDs, CdS, Si, and GaAsP; for the short-wavelength infrared, PbS, InGaAs, and HgCdTe; for the mid-wavelength infrared, PbSe, Ge:Au, InSb, HgCdTe, and Si:Pt (Schottky barrier); and for the long-wavelength infrared, HgCdTe, GaAs/AlGaAs QWIPs, Ge:Hg, Ge:Cd, Ge:Zn, Si:Ga, Si:As, and α-silicon and VO bolometers. Principal uses range from light meters, optical switches, video cameras, astronomy, spectroscopy, imaging, and mapping to night vision, thermal imaging, and threat detection.

Notes:
a. Bol = bolometer; CCD = charge-coupled device; PC = photoconductor; PV = photovoltaic; QWIP = quantum well infrared photodetector; SB = Schottky barrier.
b. Detectors are all intrinsic unless otherwise noted. See Fig. 1.
c. Visible bandwidth; near infrared = 0.700–1.00 µm.
d. Units are A/W.
e. Values are for a temperature of 190 K.
f. Detector is extrinsic.
g. Detector is an internal photoemission (IPE) device.
h. Detector is a superlattice (SL) device.
i. Values are for a temperature of 77 K.
j. Values are for a temperature of 15 K.
k. Values are for a temperature of 6 K.
l. Detector is a thermal device.
It is apparent from Fig. 2 that detection of ambient temperature objects (night vision) requires a detector whose cutoff wavelength is >4 µm. Detectors whose response extends beyond 8 µm are generally better for thermal imaging applications. However, system sensitivity also depends on several other factors: detector quantum efficiency, detector noise, electronics noise, number of detectors on the focal plane (i.e., noise bandwidth), sampling rate, width of the spectral band, optical aperture, and detector element size.

Figure 1. Photoexcitation modes in a semiconductor that has band gap energy Eg and impurity states Ei. The photon energy hν must be sufficient to release an electron into the conduction band (CB) or a hole into the valence band (VB): (a) an intrinsic detector; (b) and (c) extrinsic donor and acceptor devices, respectively.

Figure 2. Blackbody radiated photon flux distribution from ambient temperature up to 2,000 K (emittance in photons/(cm^2 s µm) versus wavelength). Ambient objects do not produce detectable flux for detectors whose cutoff wavelengths are less than about 3 µm.

FIGURES OF MERIT

Responsivity

A detector responds to variations in the photon-flux density φs by producing a signal voltage Vs or current. The background photon flux φB may result in an offset voltage, which can be nulled by a differentiating circuit because φB is constant for a given measurement. For most detectors, the output voltage depends on the additive effect of φs over the detector area A. Each photon has energy hν, and the integrated power density over a spectral interval is Js. The ratio of the signal (voltage, current, or electrons) to the incident signal power is the responsivity RV in units of V/W:

RV = Vs/(A Js)   volts/watt.  (5)

Vs usually is linear with Js at low intensity but saturates at higher signal-flux levels, where the photon flux disrupts the thermal equilibrium of the semiconductor electronic states. When Js refers only to those photons capable of generating free carriers, RV is the spectral responsivity Rλ; when the flux includes all wavelengths, RV is called the blackbody responsivity Rbb. The responsivity of photodiodes is often expressed in amperes per watt (A/W) because the diode is used as a current generator when connected to an integrating amplifier. The ideal responsivity of a photodiode is ηq/Eg, where η is the quantum efficiency, q is the charge of an electron, and Eg is the band gap energy. Most commercial detectors achieve more than 50% of ideal responsivity, except for silicide detectors, which typically show less than 2% of ideal responsivity.
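The sketch below is a minimal illustration of Eq. (5) and of the ideal photodiode responsivity ηq/Eg; the pixel size, signal power density, and output voltage are invented for the example and are not values from Table 1.

```python
# Illustrative use of Eq. (5) and of the ideal photodiode responsivity eta*q/Eg.
# All input numbers below are invented for the example.

def responsivity_v_per_w(signal_voltage_v: float, area_cm2: float, power_density_w_cm2: float) -> float:
    """Eq. (5): R_V = V_s / (A * J_s), in V/W."""
    return signal_voltage_v / (area_cm2 * power_density_w_cm2)

def ideal_photodiode_responsivity_a_per_w(quantum_efficiency: float, band_gap_ev: float) -> float:
    """Ideal responsivity eta*q/Eg; with Eg in eV the electron charge cancels, giving eta/Eg in A/W."""
    return quantum_efficiency / band_gap_ev

if __name__ == "__main__":
    # Hypothetical detector: 50 um x 50 um pixel (2.5e-5 cm^2), 1 mW/cm^2 signal, 25 uV output.
    print("R_V =", responsivity_v_per_w(25e-6, 2.5e-5, 1.0e-3), "V/W")
    # Ideal responsivity of a silicon photodiode (Eg ~ 1.12 eV) at unity quantum efficiency.
    print("Ideal Si responsivity ~", ideal_photodiode_responsivity_a_per_w(1.0, 1.12), "A/W")
```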
Noise

The performance of a system, whether it is an analytical spectrometer or a thermal imager, improves as the detector noise is lowered, until amplifier noise becomes the dominant noise source, provided that the responsivity is not reduced faster than the detector noise. A well-engineered approach to improving system sensitivity consists of (1) lowering amplifier noise consistent with system bandwidth, power, weight, size, and cost budgets; (2) perfecting the detector material and fabrication processes for the fewest defects to achieve the lowest noise; and (3) designing the detector mode of operation, geometry, response time, and bias power for the highest responsivity. The operating parameters of the detector are often adjusted to increase responsivity and noise to the point where the detector noise just exceeds the amplifier noise, if that is possible. Although photodetector noise (11) originates as random fluctuations of photons or electrical carriers, it is measured, for purposes of analysis, as a function of frequency using the Fourier transform. The noise spectrum is characterized by frequency-dependent noise at low frequency (called 1/f noise), generation–recombination
(g–r) noise in a midrange, and system noise at high frequency. At low frequency, noise typically varies inversely as the square root of frequency and results from surface traps and/or fluctuating defects (12,13). This noise is often reduced by surface passivation to reduce the trap density or to create a surface potential that keeps the charge carriers away from the surface. Crystal growth resulting in fewer electronic defect states (typically <1 × 10^15 cm^-3) and elimination of detector fabrication damage also reduce 1/f noise. A well-designed detector is dominated by g–r noise associated with random carrier generation and recombination in the semiconductor. The minority carrier lifetime τ determines the extent of the g–r noise plateau. Trapping mechanisms, whether bulk or surface, exhibit behavior similar to the g–r process when the trapping time determines the detector response time. Careful analysis is required to distinguish between recombination and trapping. This identification helps in determining the nature of the defect, such as the identity of the impurities. For example, copper atoms in HgCdTe reduce the lifetime and detector responsivity. Characterized by τ, the g–r or trapping noise declines as the inverse of frequency, and the system or Johnson noise (JN) becomes dominant at the higher frequencies. In general, noise components Vi are not correlated and add in quadrature to give the total noise as follows:

VN^2 = Vf^2 + Vg–r^2 + VJN^2 + VAMP^2.  (6)

The noise is expressed as a noise density in units of V/Hz^{1/2} or is integrated over a frequency range and given as volts rms. Typically, photoconductors are characterized by a g–r noise plateau from 10^3 to 10^5 Hz. Photovoltaic detectors exhibit similar behavior, but the 1/f knee may be less than 100 Hz, and the high-frequency noise roll-off is determined by the p–n junction impedance–capacitance product or by the amplifier (AMP) circuit when operated in a transimpedance mode. Bolometers exhibit an additional noise, VTC, associated with thermal conductance.

Detectivity

Detector sensitivity (1,2) is expressed in terms of the minimum detectable signal power or noise equivalent power (NEP) given in units of watts or W/Hz^{1/2}. The reciprocal function, when normalized for detector area A and noise bandwidth Δf, is defined as the detectivity D* in units of cm Hz^{1/2}/W. Thus,

NEP = VN/RV   (watts or watts/Hz^{1/2}),  (7)

D* = RV (A Δf)^{1/2}/VN   (cm Hz^{1/2}/watt).  (8)

The rms noise is measured in a noise bandwidth Δf. D* is called D-star lambda (D*λ) when the spectral band is limited to a given interval, and D* blackbody when the total blackbody incident power density is used in the calculation. The blackbody method of calibrating photodetectors is excellent for cutoff wavelengths larger than 2 µm, but for short-wavelength detectors such as visible-sensing charge-coupled devices, the unit of lux is the scattered illumination power density of 100 µW/cm^2 in the visible spectrum. Most CCD cameras have a sensitivity of 3–10 lx.
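A minimal numerical sketch of Eqs. (6)–(8) follows; the individual noise voltages, responsivity, area, and bandwidth are assumed values used only to show how the quantities combine.

```python
# Illustrative combination of uncorrelated noise terms, Eq. (6), followed by NEP and D*,
# Eqs. (7) and (8). All numbers are invented for the example.
from math import sqrt

def total_noise_v(*noise_components_v: float) -> float:
    """Eq. (6): uncorrelated noise voltages add in quadrature."""
    return sqrt(sum(v * v for v in noise_components_v))

def nep_w(total_noise_v_rms: float, responsivity_v_per_w: float) -> float:
    """Eq. (7): noise equivalent power NEP = V_N / R_V."""
    return total_noise_v_rms / responsivity_v_per_w

def d_star(responsivity_v_per_w: float, area_cm2: float, bandwidth_hz: float, total_noise_v_rms: float) -> float:
    """Eq. (8): D* = R_V (A * delta_f)^(1/2) / V_N, in cm Hz^(1/2)/W."""
    return responsivity_v_per_w * sqrt(area_cm2 * bandwidth_hz) / total_noise_v_rms

if __name__ == "__main__":
    v_n = total_noise_v(2e-8, 5e-8, 1e-8)   # assumed 1/f, g-r, and Johnson terms, V rms
    r_v = 1.0e4                              # assumed responsivity, V/W
    print("V_N =", v_n, "V rms")
    print("NEP =", nep_w(v_n, r_v), "W")
    print("D*  =", d_star(r_v, 2.5e-5, 100.0, v_n), "cm Hz^1/2 / W")
```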
Spatial resolution and image display are often a problem with arrays of discrete detectors because small objects are undersampled. The results are moiré patterns and loss of detail. Scanning arrays use overlapping detector geometries to improve the spatial quality of the image. Infrared detector element size is limited to larger than 10 µm by electrical contact methods or by the photolithographic minimum feature size (1 µm) for the detectors or readout electronics.

Ideal Performance and Cooling Requirements

Free carriers can be excited by thermal motion of the crystal lattice (phonons) as well as by photon absorption. These thermally excited carriers determine the magnitude of the dark current Jd and constitute a source of noise that defines the limit of the minimum radiation flux that can be detected. The dark carrier concentration is temperature-dependent and decreases exponentially with reciprocal temperature at a rate that is determined by the magnitude of Eg or Ei for intrinsic or extrinsic material, respectively. Therefore, it is usually necessary to operate infrared photon detectors at reduced temperatures to achieve high sensitivity. The smaller the value of Eg or Ei, the lower the temperature must be. The operating temperature depends on the required sensitivity and the temperature dependence of the various thermal-noise components. In all cases, a practical limitation in sensitivity is imposed by an amplifier or recording device. Theoretical performance can be calculated by assuming that the amplifier noise is negligible compared with detector noise. Optimum or ideal detector performance is based on the assumption that the detector noise is determined only by the random fluctuations of the photon-absorption process. This ultimate limit on performance can be calculated easily and is used as a final measure of performance for infrared detectors (14). For example, consider a photodetector as a photon counter, such as a photodiode connected to an integrating amplifier. After a time interval Δt, the integrated charge density Q may be expressed by the following:

Q = (Jφ + Jd) Δt.  (9)

The number of integrated carriers N is QA/q, where q is the electron charge. Because the dark current Jd is a combination of thermal excitation processes, neglecting avalanche and tunneling, ideal performance occurs when the photon-induced current density Jφ is greater than Jd. Fluctuations of N are the dominant cause of noise. The cooling requirement may be calculated for each type of photon detector by analyzing the dark current noise mechanism in terms of thermal equilibrium processes and setting Jφ ≥ Jd, where Jφ = ηφq (in A/cm^2) and φ is the sum of the signal and background photon flux. Because the charge density of a photon-counting detector is sampled at time intervals Δt, the number of charges sampled is N = ηφA Δt.
The generation of photons obeys Poisson statistics, for which the variance is N and the deviation, or noise, is N^{1/2}. The noise spectral density NS is obtained by a Fourier transform of the deviation, yielding the following at sampling frequency f:

NS = (2N Δf/f)^{1/2} = (2ηφA Δt Δf/f)^{1/2}.  (10)

The signal S, in electrons, is that portion of N which is generated by the signal flux φs in the same time interval Δt. Therefore,

S = ηφs A Δt.  (11)

The ideal detectivity may be obtained by combining Eqs. (5), (8), (10), and (11). When Δt = f^{-1}, the ideal spectral detectivity (2) in cm Hz^{1/2}/W for a photon-limited process becomes

D*λ = (λ/hc)(η/2φ)^{1/2}   cm Hz^{1/2}/watt.  (12)

When Eq. (12) is valid, the detector is said to be a background-limited infrared photodetector (BLIP). When this is the case, attempts often are made to improve D* by cold shielding, which reduces φ. The ideal D*λ is shown in Fig. 3 as a function of wavelength, where the background photon flux is a parameter. The line of termination in the lower left corner represents D*λ values for a 180° (2π) detector field of view, 300 K ambient background temperature, and a background emissivity of 1.0. As long as the photon-generated current exceeds the dark currents, D*λ may be increased by reducing the background flux. For very low values of photon flux, the ideal detection condition of Eq. (12) can be restored by cooling the detector to decrease the dark currents.
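The background-limited detectivity of Eq. (12) can be evaluated directly once a wavelength, quantum efficiency, and background flux are assumed, as in the sketch below; the example values are illustrative only.

```python
# Illustrative evaluation of the background-limited (BLIP) detectivity, Eq. (12):
#   D*_lambda = (lambda / (h c)) * (eta / (2 phi_B))**0.5
# The wavelength, quantum efficiency, and background flux are assumed example values.
from math import sqrt

H_PLANCK = 6.626e-34   # J s
C_LIGHT = 3.0e10       # cm/s, so that D* comes out in cm Hz^1/2 / W

def blip_d_star(wavelength_cm: float, quantum_efficiency: float, background_flux_cm2_s: float) -> float:
    """Eq. (12): ideal photon-limited spectral detectivity in cm Hz^(1/2)/W."""
    return (wavelength_cm / (H_PLANCK * C_LIGHT)) * sqrt(quantum_efficiency / (2.0 * background_flux_cm2_s))

if __name__ == "__main__":
    # Example: 10 um wavelength, 70% quantum efficiency, 1e17 background photons/(cm^2 s).
    print(blip_d_star(10e-4, 0.7, 1e17), "cm Hz^1/2 / W")
```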
Figure 3. Ideal photon detector sensitivity (peak D*λ, in cm Hz^{1/2}/W) as a function of cutoff wavelength, with the effective background photon flux, in (cm^2 s)^-1, as a parameter. Lower background flux generates less photon-induced noise, giving higher sensitivity. The sensitivity limit for the condition of a 300 K background temperature and a hemispherical (2π) field of view is shown.

Figure 4. Sensitivity (peak D*λ) as a function of detector temperature, showing experimental results for HgCdTe; Hg1-xCdxTe with x = 0.185 and x = 0.280; InSb; InAs; and PbS. Also shown are theoretical values based on thermal equilibrium statistics: A and B, Hg1-xCdxTe where x = 0.195 and x = 0.29, respectively; and C, InAs.

The cooling needed to achieve a given detectivity for intrinsic-type detectors may be readily calculated on the basis of the temperature dependence of the minority carrier lifetime and mobility. The calculated
values for indium arsenide (InAs) and two compositions of HgCdTe are given in Fig. 4. Experimental data for lead sulfide (PbS) and three HgCdTe detectors of different compositions are also shown. The importance of cooling is obvious. Detectors are often cooled by providing good thermal conductivity to a suitable cryogen (2). The most readily available coolants are solid carbon dioxide (CO2) at 195 K, liquid nitrogen (N2) at 77 K, and liquid helium (He) at 4.2 K. If the desired temperature is not below about 200 K, a thermoelectric cooler may be used. Joule–Thompson and expansion engine coolers are available for operation to 4.2 K and below. A cooled detector must be contained in a vacuum vessel that thermally isolates the detector element from everything but the coolant by low-thermal-conduction materials and vacuum. If low-absorbing extrinsic detectors are used, it is customary to enclose the detector element in a cooled chamber that has reflecting walls, that is, in an integrating chamber, to obtain multiple passes of the radiation. The latter can also be accomplished by shaping the element so as to ensure multiple total internal reflections. Thin detectors can be sized to produce optical resonance in the detector element, making the detector itself an optical resonant cavity. A detector for a specific application should be chosen to minimize the cooling requirements and the magnitude of the background radiation noise; therefore, in detector selection, the cutoff wavelength should be only slightly greater than that required by the application. If the signal radiation has a substantial path length through the atmosphere, then atmospheric
transmission characteristics must be considered. Because of the presence of CO2 and H2O vapor, many absorption bands appear in the atmospheric transmission spectrum. Windows of high transmittance occur between the bands. The principal windows are at 1.0–2.5, 2.9–4.2, 4.4–5.3, 7.5–14.0, and 16–23 µm (2).

PHOTODETECTOR MODES OF OPERATION

The need for high-performance and low-cost detectors has resulted in fewer than two dozen types of detectors that are commercially available. These are made from only 10 basic semiconductor elements or compounds. Popular detectors, their modes of operation, cutoff wavelengths, temperatures of operation, response times, responsivities, and detectivities are listed in Table 1 (15). The values chosen are for near-optimum performance at a given temperature. Silicon-based photon detectors such as photovoltaic and charge-coupled detectors operate near room temperature. Cadmium sulfide and germanium detectors also require no cooling. Wide acceptance of the ternary compound semiconductor mercury–cadmium–telluride (MCT), HgCdTe, as a photoconductor began in 1972. Requirements for large focal planes have since led to significant developments of the MCT photodiode focal plane (16). The composition-controlled band gap of MCT in each case results in a cutoff wavelength that is tailored for a specific application (Fig. 4). Indium gallium arsenide, lead sulfide, indium arsenide, and MCT (40 at. wt % CdTe) detectors can be operated at 300 K but perform much better at lower temperatures (2,14).

Four principal modes of semiconductor-based photodetectors are the photoconductor, photodiode, charge-mode device, and bolometer. The Schottky barrier mode using internal photoemission has been well developed using platinum silicide, but low quantum efficiency restricts its range of uses. The quantum well infrared photodetector (QWIP) is included here because artificially structured materials such as this superlattice detector represent a new metallurgy. The semiconductor materials for most applications are Si, Ge, GaAsP, InGaAs, InSb, PbS, CdS, and HgCdTe. These materials, grown as single crystals and as thin polycrystalline or amorphous films, are doped with various elements, such as boron, phosphorus, indium, and gold, to control the polarity of the conductivity and to provide selective optical absorption by introducing impurity states in the forbidden energy gap. Photodetection using these materials extends from the ultraviolet (0.3 µm) to the long-wavelength infrared (200 µm). Higher sensitivity, especially in the infrared, can be achieved by cooling the detector and its immediate surroundings to low temperature, typically 77 K. Interfacing with signal processing electronics can be difficult when the photon signal is weak, and imaging systems require sophisticated signal processing to handle the large number of pixels found on a typical focal plane.

Photoconductors

A photoconductor (17) is a semiconductor resistor whose resistance lowers when photons generate excess carriers.
The two general classes of photoconductor are extrinsic, characterized by impurity states that are emptied by photons, and intrinsic, where the photons excite electrons directly from the valence band to the conduction band (Fig. 1). The photoconductor is connected in series to a constant load resistor and current source. The ambient light or background radiation sets up a steady-state resistance in the detector. Changes in illumination produce changes in resistance and thereby changes in the voltage across the load resistor. When the load resistance is much larger than the detector resistance, the signal voltage is given by Eq. (13):

Vs = η φs τ V/(n t),  (13)

where the signal flux φs is caused by the changing background. At low bias, the signal is linear with the bias voltage V, but at high bias, the minority carrier sweep time becomes less than the lifetime τ, and the signal saturates. The majority carrier density n is typically 1 × 10^15 to 1 × 10^16 cm^-3. The thickness t is typically one to three absorption lengths (8–20 µm).

Photodiodes

Photovoltaic detectors (17) generate a voltage (open circuit) or a current (closed or integrating circuit) when they receive radiation and therefore do not require an external bias for operation. The charge integrated on a capacitor of the integrating circuit is proportional to the intensity of radiation. The current Is generated by the photovoltaic cell is given by the following:

Is = η φs q A.  (14)

The quantum efficiency can be determined by using the measured current and a calibrated photon source, for example, a blackbody. The detector area A should be measured by a spot-scanning apparatus because the minority carrier diffusion length effectively increases the diode's photon-active area. A photovoltage can arise from the presence of internal fields, which are formed at the interface between a semiconductor and a metal (Schottky) barrier, at the interface between different dopings in a semiconductor (homojunction), and at the interface between two different semiconductors (heterojunction). Light releases mobile electric carriers within these barrier layers. These carriers are separated by the field and produce a photovoltage between them, as indicated in Fig. 5. Photovoltages can be produced in the bulk of a semiconductor because of the difference in the diffusion of electrons and holes or if there is a gradient in the impurity concentration within the material, but these mechanisms are not widely used. The impedance–area product ZA of a diode may be obtained by differentiating the diode current–voltage relationship and may be expressed as follows, at zero applied bias:

ZA = kT/(B q Jd).  (15)
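As a numerical illustration of Eqs. (14) and (15), the sketch below estimates the photodiode signal current and the zero-bias impedance–area (R0A) product; the quantum efficiency, signal flux, diode area, and dark current density are assumed example values.

```python
# Illustrative evaluation of Eqs. (14) and (15) for a photodiode. The quantum efficiency,
# signal flux, diode area, and dark current density below are assumed example values.

Q = 1.602e-19        # electron charge, C
K_BOLTZ = 1.381e-23  # Boltzmann constant, J/K

def photodiode_current_a(eta: float, signal_flux_cm2_s: float, area_cm2: float) -> float:
    """Eq. (14): I_s = eta * phi_s * q * A."""
    return eta * signal_flux_cm2_s * Q * area_cm2

def impedance_area_ohm_cm2(temperature_k: float, dark_current_a_cm2: float, b_factor: float = 1.0) -> float:
    """Eq. (15): ZA = kT / (B q J_d), the zero-bias impedance-area (R0A) product."""
    return K_BOLTZ * temperature_k / (b_factor * Q * dark_current_a_cm2)

if __name__ == "__main__":
    # Assumed: eta = 0.7, 1e14 photons/(cm^2 s), 50 um x 50 um diode, J_d = 1e-7 A/cm^2 at 77 K.
    print("I_s ~", photodiode_current_a(0.7, 1e14, 2.5e-5), "A")
    print("R0A ~", impedance_area_ohm_cm2(77.0, 1e-7), "ohm cm^2")
```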
Figure 5. The photodiode detector: (a) band model, where the photon generates electron–hole pairs that are separated by the built-in potential, setting up a photocurrent; (b) physical model for a planar diode. The passivation is typically SiO2 for Si diodes, an In oxide for InSb diodes, and CdTe for HgCdTe diodes.

Under little or no illumination, Jd must be minimized for optimum performance. The factor B is 1.0 for pure diffusion current and approaches 2.0 as depletion and surface-mode currents become important. Generally, high crystal quality for long minority carrier lifetime and low surface-state density reduce the dark current density, which is the sum of the diffusion, depletion, tunneling, and surface currents. The ZA product is typically measured at zero bias and is expressed as R0A. The ideal photodiode noise current can be expressed as

iN^2 = 2qA(2Jd + Jφ) Δf,  (16)

where Jφ is the photon current density. Equations (15) and (16) can be used to determine the cooling requirements for photodiode detector performance because diffusion current depends exponentially on temperature (18). Because the diffusion current represents a valence band-to-conduction band excitation, the thermal limit, when Jd > Jφ, is essentially the photoconductive limit (Fig. 4). Combining Eqs. (5), (8), (14), (15), and (16) gives the following:

D* = (qη/hν)(4kT/ZA + 2qJφ)^{-1/2}.  (17)

Thus, ZA must be increased to achieve BLIP D* such that ZA > 2kT/qJφ. When Jφ > Jd, Eq. (17) reduces to the ideal photon limit of Eq. (12). The response time is equal to the ZC product, where C is the diode capacitance, or, when using an integrating amplifier, the response time is determined by the closed-loop gain.

Charge Mode Detector

Figure 6. Device model for an MIS photodiode, the basic building block of the CCD photodetector. The depletion region is generated by the battery potential. See text.

Figure 7. Band model for the charge mode detector biased to deep depletion. The charge Qp integrates in the potential well defined by the insulator and valence band. See text.

Charge mode devices (19,20) are a group of detectors that use the metal–insulator–semiconductor (MIS) capacitor as their basic constituent. An MIS capacitor is a device that consists of a nondegenerately doped semiconductor substrate, a thin dielectric layer, and a metal gate, as shown in Fig. 6. The application of a potential difference across the metal gate and the semiconductor produces a surface potential barrier and an associated depletion region in the semiconductor. Periodic pulsing of the gate-to-substrate bias creates a nonequilibrium condition suitable for storing optically generated free carriers. The model for an MIS capacitor, shown in Fig. 7, includes a metal gate-to-semiconductor bias of Vg volts that produces an insulator bias VI and a surface potential ψs. The gate charge Qs (negative for n-type material) produces an ionic space charge Qd and an inversion layer charge Qp. From the basic relationships (21) that follow,

Vg = VI + ψs,  (18)

Qs = Qd + Qp,  (19)
where Qs = CI VI and Qd = Cd ψs, then

ψs = Vg/(1 + Cd/CI),  (20)

where Cd is the capacitance associated with the space-charge layer geometry of charge n at distance x. From semiconductor theory,

ψs = n q x^2/(2ε),  (21)

where ε is the dielectric permittivity of the semiconductor and, from geometry,

Cd = εA/x.  (22)

The basic MIS condition is obtained by combining Eqs. (18)–(22):

ψs + (εqn/2)^{1/2} A ψs^{1/2}/CI = V - Qp/CI.  (23)

At initial turn-on, the inversion charge Qp = 0, and when n is small and CI is large, ψs is approximately V, that is, a potential well of nearly V is formed. Thus, the charge-handling capacity QFW is

QFW ≈ CI V,  (24)

and it can be seen that it depends on both the physical dimensions of the capacitor and the applied voltage. Typical full well capacities range from 10,000 carriers to more than 5,000,000 carriers, depending on the specific charge mode device. As time progresses, charge Qp is generated by photon currents, semiconductor dark currents, diffusion, depletion, surface, avalanche, and tunneling. The time required to fill the well by dark currents is the storage time τst, given by the following:

τst = CI V/Jdark.  (25)

Figure 8. Conceptual cross-sectional views of (a) a CID imager showing charge integration (t1), charge measurement (t2), and charge removal (t3); and (b) a CCD shift register showing charge transfer from a phase one well to a phase two well.

The storage time is the maximum time the potential well can be used for collecting a photon-generated charge. Because Jd depends strongly on band gap energy and temperature of operation, τst, in practice, extends from hours for silicon at 300 K to a few microseconds for HgCdTe, where the cutoff wavelength λ = 12 µm at 77 K. Indium antimonide operates in a few tenths of a second at 77 K (22).

As previously indicated, an MIS capacitor can be utilized as a photon detector. A negative pulse on an n-type semiconductor generates a deep depletion region at the surface which, in turn, acts as a storage site for photogenerated holes. This type of detection is true charge integration and is the simplest charge mode detection concept. Other MIS devices exist that contain arrays of overlapping metal gates in a given detector element. These MIS-based detectors are called charge-transfer devices (CTDs). The two most common forms of CTDs are charge-injection devices (CIDs) and charge-coupled devices (CCDs).

The CID is a dual gate version of a CTD. In the CID, two gate electrodes form two distinct but physically adjacent depletion capacitances, as shown in Fig. 8a. Under normal operating conditions, each of the two gate electrodes is biased to form a depletion region in the underlying semiconductor. However, the surface potential minimum under one of the electrodes is much greater than that of the other electrode. As a result, the photogenerated charge produced in the detector collects in the deeper potential minimum. To measure the number of stored carriers, the potential difference between the collection electrode and the semiconductor substrate is reduced, thereby collapsing the storage well and forcing the collected charge to transfer to the region controlled by the second gate electrode. The collected charge can, in principle, be quantified by monitoring the change in surface potential of this second electrode during the transfer operation. In practice, this is accomplished by monitoring the instantaneous current through the electrode at the instant of charge transfer. The detector is reset to the empty state by simultaneously collapsing the depletion regions of both capacitors. This action forces the free carriers into the semiconductor substrate where they are subject to recombination mechanisms.
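The full-well and storage-time relations, Eqs. (24) and (25), are illustrated below for an assumed SiO2 gate dielectric, gate bias, pixel area, and dark current density; none of these values comes from the article.

```python
# Illustrative use of Eqs. (24) and (25): full-well capacity and storage time of an MIS
# detector element. Oxide thickness, gate bias, pixel area, and dark current density are
# assumed example values.

EPS0_F_PER_CM = 8.854e-14  # permittivity of free space, F/cm
Q_ELECTRON = 1.602e-19     # electron charge, C
K_SIO2 = 3.9               # relative permittivity of SiO2 (assumed gate dielectric)

def full_well_carriers(gate_bias_v: float, oxide_thickness_cm: float, pixel_area_cm2: float) -> float:
    """Eq. (24): Q_FW ~ C_I * V, converted to a number of electrons for one pixel."""
    c_per_area = K_SIO2 * EPS0_F_PER_CM / oxide_thickness_cm  # F/cm^2
    return c_per_area * gate_bias_v * pixel_area_cm2 / Q_ELECTRON

def storage_time_s(gate_bias_v: float, oxide_thickness_cm: float, dark_current_a_cm2: float) -> float:
    """Eq. (25): tau_st = C_I * V / J_dark, with C_I and J_dark taken per unit area."""
    c_per_area = K_SIO2 * EPS0_F_PER_CM / oxide_thickness_cm
    return c_per_area * gate_bias_v / dark_current_a_cm2

if __name__ == "__main__":
    # Assumed: 100 nm oxide, 10 V gate bias, 10 um x 10 um pixel, 1 nA/cm^2 dark current.
    print("full well ~", full_well_carriers(10.0, 100e-7, 1e-6), "electrons")
    print("storage time ~", storage_time_s(10.0, 100e-7, 1e-9), "s")
```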
Thus, a key feature of a CID is that the photogenerated charge is both collected and sensed inside a single detector element. The signal charge is transferred from that element simply to eliminate it from the detector before the next detection cycle.

The charge-coupled device (CCD) is a multigate CTD. As for the CID, the CCD gates are physically adjacent and are periodically pulsed, inducing the formation of deep depletion regions in the underlying semiconductor. However, unlike the CID, the CCD physically transfers the collected photogenerated charge from a given detector site to a separate charge measurement structure. Figure 8b shows a conceptual cross-sectional view of a CCD shift register. As the photons of an image focused on the device generate charge, the MIS gates of the shift register are biased in sequence to produce a continually shifting potential well pattern. Thus, at time t1, phase one is on, and charge is integrating in the potential well beneath the MIS phase one gates. At time t2, phases one and two are turned on, and the charge distributes beneath the phase one and phase two gates. At time t3, phase one is turned off, and the charge is only beneath phase two. Pulsing phases three and four in the proper sequence with phases one and two moves the charge packet through the shift register. For an area array CCD, the imaging portion of the array transfers charge, once per line interval, into a readout shift register in a manner equivalent to that described for the imaging shift register. Each charge packet in the readout register is then shifted discretely to the charge-detection structure that resides at the end of the register. This sense electrode or diode is in turn typically connected to a low-noise, charge-sensitive amplifier that converts the signal charge to a voltage. After all charge packets in the shift register are read out, the next line of information from the imaging portion of the array is transferred, and the process is repeated.

Bolometers

The bolometer (23) has made a comeback as a popular detector due to recent advances in micromachining technology. When applied to silicon, these advances make it feasible to fabricate large arrays of bolometer detecting elements that have very small thermal mass. The imaged infrared radiant power slightly heats the bolometer film by a few millidegrees K, reducing the electrical resistance. The resulting change of the bias current is the signal. The noise components are 1/f, Johnson, thermal conductance, and photon. The photon noise is actually the radiative part of the thermal conductance noise. From bolometer theory (1), the change in film temperature is proportional to the absorbed power and the thermal resistance (the inverse of the thermal conductance) and is given by the following:

ΔTb = J A εa Rth [1 - exp(-τs/(Rth Ch))]/[1 + exp(-τs/(Rth Ch))]
    = J A εa Rth tanh[τs/(2 Rth Ch)],  (26)

where J is the differential power density at the detector caused by a change in target temperature, in W/cm^2;
A is the active pixel area, 1.2 × 10^-5 cm^2; εa is the IR absorption efficiency, 90%; Rth is the thermal resistance, approximately 1.5 × 10^7 K/W; τs is the half period of the chopped radiation (1/(2fc)), typically 16.7 ms (30-Hz modulation); and Ch is the heat capacity of the pixel element, approximately 0.7 × 10^-9 J/K. The power density Js may be calculated by using Planck's radiation law. For a 300 K scene temperature and the spectral region from 8 to 12 µm,

Js = 2.0 × 10^-4 ε0 ΔTs Ω/π,  (27)

where ε0, the optical efficiency, is typically 80%; ΔTs is the differential target temperature; and Ω is the collection solid angle of the optical system [Ω = π sin^2(tan^-1(1/(2fn)))]. For a bolometer detector that has the characteristics above and operates with f/1 optics (fn = 1), the peak-to-peak temperature change of the α-Si film is approximately 3 mK per degree K change in scene temperature.

Signal. The peak-to-peak signal voltage Vs is proportional to the bias power and can be expressed as follows:

Vs = α (P Rb)^{1/2} ΔTb,  (28)

where α is the temperature coefficient of resistance, about 2.8% per K for α-Si; P is the bias power for the bolometer element (typically 0.3 µW); and Rb is the bolometer resistance (typically 3 × 10^7 Ω for α-Si and 1 × 10^4 Ω for VO). The voltage responsivity is simply given by Eq. (29):

RV = Vs/(J A) = α (P Rb)^{1/2} εa Rth tanh[τs/(2 Rth Ch)].  (29)

Responsivity values are typically in the 5 × 10^5 V/W range for nominal bias.

Noise. The noise components may be expressed as voltages. Excess (1/f) noise:

VEX = [∫ (γ^2/f) df]^{1/2} = KV (P Rb)^{1/2},  (30)

where KV is the 1/f noise coefficient, typically in the range of 6 to 12 µV per volt of bias when integrated from 1 to 100 Hz. Johnson noise VJN is

VJN = (4 k T Rb Δf)^{1/2},  (31)

where k is the Boltzmann constant, T is the detector temperature (300 K), and Δf is the noise bandwidth (100 Hz). Thermal conductance noise VTC is

VTC = α (P Rb)^{1/2} (k T^2/Ch)^{1/2},  (32)

when the thermal response time Rth Ch is less than the frame time.
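As a cross-check on the typical values quoted above, the sketch below evaluates the bolometer responsivity and noise terms of Eqs. (26)–(32) and combines them as in Eq. (33); the 1/f coefficient KV is taken as an assumed mid-range value.

```python
# Numerical sketch of the bolometer relations, Eqs. (26), (29), and (31)-(32), using the
# typical parameter values quoted in the text. The 1/f term is represented by the quoted
# K_V coefficient rather than by evaluating the integral in Eq. (30).
from math import sqrt, tanh

K_BOLTZ = 1.381e-23  # J/K

# Typical values from the text
A_PIXEL = 1.2e-5     # cm^2
EPS_ABS = 0.90       # IR absorption efficiency
R_TH = 1.5e7         # K/W
C_H = 0.7e-9         # J/K
TAU_S = 16.7e-3      # s, half period at 30-Hz modulation
ALPHA = 0.028        # 1/K, temperature coefficient of resistance (alpha-Si)
P_BIAS = 0.3e-6      # W
R_BOLO = 3.0e7       # ohms (alpha-Si)
T_DET = 300.0        # K
BW = 100.0           # Hz noise bandwidth
K_V = 10e-6          # V per volt of bias, assumed mid-range 1/f coefficient

thermal_factor = EPS_ABS * R_TH * tanh(TAU_S / (2.0 * R_TH * C_H))
v_bias = sqrt(P_BIAS * R_BOLO)

responsivity = ALPHA * v_bias * thermal_factor                 # Eq. (29), V/W
v_johnson = sqrt(4.0 * K_BOLTZ * T_DET * R_BOLO * BW)          # Eq. (31)
v_thermal = ALPHA * v_bias * sqrt(K_BOLTZ * T_DET**2 / C_H)    # Eq. (32)
v_excess = K_V * v_bias                                        # Eq. (30), via the quoted coefficient
v_total = sqrt(v_johnson**2 + v_thermal**2 + v_excess**2)

print(f"responsivity ~ {responsivity:.2e} V/W")
print(f"Johnson, thermal, 1/f noise ~ {v_johnson:.2e}, {v_thermal:.2e}, {v_excess:.2e} V")
print(f"NEP ~ {v_total / responsivity:.2e} W   (combined as in Eq. (33))")
```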
NEP. The noise components may be combined with the responsivity to give the noise equivalent power (NEP):

NEP = (VJN^2 + VEX^2 + VTC^2)^{1/2}/RV.  (33)

Measured values are typically less than 100 pW for a 30-Hz scene modulation.

Detector Fabrication and Performance

Detector performance and focal plane array development have improved significantly since the early 1980s (24,25). The spectral sensitivities of several photodetectors given in Fig. 9 cover the spectrum from ultraviolet to far infrared. The typical operating temperature is given, and the theoretical limit is that calculated for a detector exposed to hemispherical radiation from a 300 K background.

Figure 9. Spectral sensitivity of detectors, where the detector temperatures in K are in parentheses and the dashed line represents the theoretical limit at 300 K for a 180° field of view: (a) detectors from the near UV to the short-wavelength infrared; (b) the lead salt family of detectors and platinum silicide; (c) detectors used for detection in the mid- and long-wavelength infrared (the Hg1-xCdxTe, InSb, and PbSnTe operate intrinsically, the doped silicon is photoconductive, and the GaAs/AlGaAs is a structured superlattice); and (d) extrinsic germanium detectors showing the six most popular dopants (Au, Hg, Cu, Cd, Zn, and B).

Charge-Coupled Devices and Imaging Arrays

The single most popular type of visible photon detector is the charge-coupled device (CCD). Developed in 1970 (26,27) as a dynamic memory storage device, the CCD was quickly adapted for use as an imaging device. The CCD has been a resounding commercial success and has become the detector of choice in the video camera and electronic still photography markets. Furthermore, since its inception, the CCD has also been an invaluable tool in a wide range of scientific applications involving the detection of near-infrared, visible, ultraviolet, and X-ray photons, as well as in detecting ionizing radiation and charged particles.

CCDs are sensitive photon detectors. The physical mechanism governing photon detection in silicon depends on the energy of the incident photons. For the near-infrared to visible spectral band, silicon performs as an indirect band gap semiconductor. Incident photons are converted to electron–hole pairs through interactions with the silicon lattice, producing one electron–hole pair for each absorbed photon. For higher energy photons, such as those of the extreme UV and X-ray spectral bands, direct band gap detection dominates. In this case, one electron–hole pair is generated for every 3.65 eV of energy in the absorbed photon. If the scene of interest produces only a small number of high-energy photons that impinge on the detector array during each frame interval, the number of signal electrons in each pixel is a measure of the energy of the incident photon. In such applications, CCDs can be used to obtain spectral as well as spatial information (28).

Utilization. CCDs are often grouped into two broad categories: commercial grade and scientific grade. The format and required performance of the CCD for both categories are driven by the application for which the array is designed. In general, commercial-grade CCDs are designed to optimize image resolution and color fidelity. On the other hand, scientific-grade CCDs are designed to maximize photon quantum efficiency and minimize device noise. The vast majority of available CCDs have been designed for commercial applications. Because of the relatively small number of devices supplied yearly for scientific applications and the more stringent performance requirements, scientific-grade CCDs are typically more expensive
than their commercial counterparts. Commercial-grade CCDs can be adapted for use in scientific applications where a moderate photon level is present, such as some forms of visible spectroscopy.

Most CCDs are specifically designed for video camera applications, which detect photons in the visible portion of the electromagnetic spectrum. Video camera CCDs have specific camera formats compatible with standard video display systems. In the United States, video displays utilize the RS-170 standard, commonly known as the National Television System Committee (NTSC) 525-line television standard. The RS-170 standard places many constraints on imaging devices. These devices must have a frame time of 1/30 second, and two interlaced image fields of 1/60 second each are required during each frame. The standard luminance electrical bandwidth of 4.2 MHz coupled with the 4:3 picture aspect ratio requires the pixel count and the pixel size of the commercial CCD array
to fit within narrowly defined regimes. Specifically, to meet the electrical bandwidth requirements of the RS-170 standard, monochrome imagers must have a minimum of 240 pixels in the vertical dimension and a minimum of 450 pixels in the horizontal dimension. The distance along the diagonal of the active imaging portion of the CCD must be one of several specific values in the 5.46–25 mm range. The specific value depends on the particular camera format, for example, 8 mm or VHS, for which it is designed. These format and size requirements dictate a rectangular rather than square CCD pixel design. Data rates from the imaging devices must be chosen so that the image scene readout time matches the RS-170 interval. The minimum data rate for monochrome imagers is 6 MHz. The actual rate spans from 6 to 10 MHz, depending on the specific device.

Color applications require more complex image detection schemes due to the need for spectral differentiation. Signal samples from at least three distinct spectral bands are required to reproduce a color image accurately. Typically, a monolithic color filter array that consists of alternating windows of appropriately colored filter media is attached to the CCD for color differentiation. A common approach to obtaining the scene information required for faithful color image reproduction is to increase the horizontal pixel count of the CCD by a factor of 3 (29). In this method, a filter that has alternating red, blue, and green stripes is then attached so that each pixel column aligns with a filter stripe. This color CCD uses three adjacent pixels, one red, one green, and one blue, to represent a single spatial sampling of the scene. In effect, the device can be thought of as three integrated monochrome CCDs. The color CCD has three on-chip output channels, one for each spectral component. As each pixel is read, the simultaneously generated signal information from each channel is remixed so that it correctly simulates a color display. More ambitious methods exist for achieving color image reproduction (30,31). These methods use the horizontal pixel count found in monochrome CCDs but require intricate clock control methods to intermix the spectral components appropriately for display during the actual integration interval. Intermixing is produced by using a repetitive mosaic pattern of colored filters, typically cyan, yellow, and green. The imaging results from these devices rival those obtained from the three-channel method.

Video camera applications obligate the CCD imager to have an internal electronic image-shuttering capability to differentiate the data in one image frame clearly from that of the next. The two most common forms of internal shuttering are frame transfer and interline transfer (32). In the frame transfer technique, the image gathered from one data frame is rapidly transferred from the contiguous image collection region to a secondary, optically shielded storage site that resides between the image collection and shift register regions. The image held in the shielded region is then read out during the time interval of the next image formation in the optically active area. In the interline transfer technique, each pixel in the imaging portion of the array is segmented into an optically active portion and a shift register portion. The image gathered from one data frame is quickly transferred
from the optically active portion to the shift register portion of the pixel and is read out during the next image collection interval. Unlike the frame transfer method, no additional image storage section is required for the interline transfer method. However, the frame transfer device is more efficient at photon detection because there are no optically shielded features in the image collection region.

Most CCDs for consumer applications are illuminated from the front surface of the device. Photons that impinge on the front surface of the detector must pass unimpeded through the gate electrode layers to be collected in the photon-sensitive silicon beneath. Typically, some loss does occur for photons of wavelengths shorter than about 550 nm due to absorption in the gate electrodes. For standard polysilicon gate CCDs, photons of wavelengths less than 400 nm are absorbed by the gate material. This absorption process affects photons that span the UV to the low-energy X-ray spectrum. Photons of wavelengths greater than 650 nm pass easily through the gate electrodes, but due to the lower value of the absorption cross section of silicon, a path length of several micrometers is required to provide a high probability of absorption. A deeply generated signal charge can produce image smearing due to charge diffusion before collection in the potential well of the CCD pixel. Both the photon collection efficiency and the image quality are strong functions of the particular CCD design. Typical consumer-quality CCDs provide sensitivities that span the 50–500 mA/W range and have minimum detectable signal levels of 40–400 photons at the standard 16-ms image integration interval.

CCDs are widely used in electronic still photography. The late 1990s saw the development of a myriad of commercial digital electronic camera systems. The quality of these systems spans the range from very low-cost, low-resolution cameras for Internet imagery to very expensive, 35-mm film quality imagery for professional photography. Unlike film-based cameras, images are stored on electronic media such as floppy or optical disks. The image can be read into a computer-based system, edited or enhanced as desired, and stored again on the digital medium. Hard copies are obtained by devices ranging from low-cost photographic-quality color printers for personal computers to separate image transfer systems specifically designed for photographic paper. Still photography places more stringent specifications on a CCD than those required of RS-170 video cameras in both performance and resolution. Matching the image quality of 35-mm film requires a CCD whose format is larger than 4,000 × 4,000 pixels and that can faithfully reproduce an image at low light levels. The development of digital photography has raised the performance level of some commercial CCDs to that of custom scientific-quality devices.

Whereas some consumer CCDs can occasionally be adapted for scientific endeavors, in general these devices have format and operability constraints that negatively impact their performance in many applications. Scientific CCDs are somewhat more difficult to categorize because of the wide range of capabilities available in such devices. The performance features of a specific scientific CCD are determined by the device design. A few of the more significant performance features that may be found in
a given scientific CCD include the following: (1) spectral response that extends over a wide range of photon and charged particle energies; (2) noise levels as low as one electron per pixel, which result in very high signal-to-noise ratios; (3) very high charge-transfer efficiencies to achieve no measurable degradation in spatial or spectral information upon image readout; (4) dark current generation rates that permit several minute integration times at room temperature and multiple hour integration times at cryogenic temperatures; (5) high photon detection efficiency; and (6) high frame rate operability. In addition, scientific CCDs are typically operated in the full-frame readout mode (32), in which an external mechanical shutter shields the imager during scene readout. Scientific CCDs have been used in a wide range of applications. In the visible and UV portions of the spectrum, scientific CCDs have proven to be effective tools for low photon flux applications. Specifically, CCDs have been used as image detectors in ground-based telescopes (33,34), as well as in space-borne systems such as the wide-field planetary camera of the Earth-orbiting space telescope (35,36), the Giotto mission to Halley's comet (37), the Galileo mission to Jupiter (38,39), and the Cassini mission to Saturn (40). Figure 10 is a photomicrograph of a 6-in. diameter silicon wafer that contains more than a dozen different varieties of scientific CCDs. The wafer contains a 4,096 × 4,096 pixel device for astronomy, several 1,024 × 2,048 pixel imagers for slow scan frame store operation, a 1,024 × 1,024 element CCD for use in very high speed image acquisition, and numerous other imager types.
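The trade-off between array format and frame rate can be illustrated with a simple calculation. The short sketch below (Python) estimates the full-frame readout time of a 4,096 × 4,096 pixel device through a single output amplifier; the pixel rates and the single-output assumption are illustrative guesses, not values taken from this article.

# Illustrative only: estimate full-frame readout time for a large CCD.
# The 4,096 x 4,096 format is taken from the text; the pixel rates and
# the single-output assumption are assumptions made for this sketch.
pixels = 4096 * 4096

for rate_hz, label in [(50e3, "slow-scan scientific readout (~50 kpixel/s, assumed)"),
                       (10e6, "video-class readout (~10 Mpixel/s, assumed)")]:
    t = pixels / rate_hz          # seconds for one output amplifier
    print(f"{label}: {t:.1f} s per frame")

# A single slow-scan output needs minutes per frame, which is one reason
# large scientific devices often provide several outputs read in parallel.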
Figure 10. Photomicrograph of a 6-in. diameter silicon wafer containing over a dozen different formats of scientific CCDs. Included on the wafer are a 4,096 × 4,096 pixel device for astronomy, several 1,024 × 2,048 pixel imagers for slow scan frame store operation, and a 1,024 × 1,024 element CCD for use in very high speed image acquisition.
The photomicrograph was taken from a complete Si wafer after device processing but before device packaging. The same features that make scientific CCDs excellent devices for astronomy, that is, high photon collection efficiency and low readout noise, also make CCDs excellent tools for chemical analysis (41). CCDs can be used in many forms of spectroscopy (42), including absorption (43,44), fluorescence (45–47), luminescence (48,49), emission (50–52), and Raman (53–55), across a spectral range of 0.1 to 1,100 nm. The high dynamic range (typically >100 dB) of the CCD is an invaluable property for identifying weak spectral lines. Furthermore, the inherent integration of many highly sensitive photodetector elements in a small area has, in some instances, allowed revolutionary experimental designs (56). Several visible light spectrometer systems that use CCDs as the imaging device are commercially available. In the X-ray portion of the spectrum, scientific CCDs (57,58) have been used as imaging spectrometers for astronomical mapping of the sun (59), the galactic diffuse X-ray background (60), and other X-ray sources. Scientific CCDs designed for X-ray detection are also used in X-ray diffraction, materials analysis, medicine, and dentistry. CCD focal planes designed for infrared photon detection have also been demonstrated in InSb (61) and HgCdTe (62) but are not available commercially.
Fabrication. Although CCDs have been fabricated in many semiconducting materials such as Ge (63), InP (64), and HgCdTe (65), by far the most readily available devices are those that use Si as the semiconductor. There are several common types of silicon CCDs. All share certain processing steps, and most utilize p-type silicon as the semiconducting material. Single silicon crystals are grown (66) in conventional Czochralski vertical pullers using a single-crystal silicon seed dipped and rotated in a silicon melt. The large boules of silicon are sawed into wafers.
Figure 11. Cutaway view of a CCD shift register where the φi represent gate electrodes. Voltage pulses applied to the phase gates move a photogenerated charge in the charge-transfer direction. The channel stops confine the charge during integration and transfer. See text.
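The charge-transfer action described in the caption of Figure 11 can be mimicked with a toy simulation. The sketch below (Python) is not a device model; it simply steps an assumed four-phase clock pattern and shows how charge packets stored under one gate are handed to the neighboring gate, which is the essence of CCD readout. All numbers are invented for illustration.

# Toy illustration of four-phase CCD clocking, not a physical device model.
# Each pixel has four gates (phi1..phi4); a charge packet sits under the
# gate that is currently held at the "storage" potential.
packets = {0: 1200, 3: 800}   # packet position (gate index) -> electrons (assumed values)
n_gates = 16                  # a short register of 4 pixels x 4 phases

def shift_one_phase(packets):
    # Advancing the clock pattern by one phase moves every packet one gate
    # to the right (toward the output amplifier at the highest index).
    return {pos + 1: q for pos, q in packets.items() if pos + 1 < n_gates}

for step in range(4):         # four phase steps move each packet by one full pixel
    packets = shift_one_phase(packets)

print(packets)                # {4: 1200, 7: 800}: both packets advanced one pixel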
The following fabrication discussion describes a process for creating a generic four-phase CCD from p-type silicon (67). A cross-sectional view of a four-phase Si-based CCD is shown in Fig. 11. The starting wafer is a thin p-type silicon layer grown epitaxially on a degenerately doped p+ silicon substrate roughly 500 µm thick. The epitaxial layer is typically 8–15 µm thick and is doped with boron to a nominal resistivity of 10 Ω·cm. The initial step of the fabrication process consists of a cleaning procedure designed to ensure a nearly defect-free surface. Next, a thin layer of protective SiO2 is grown on the silicon surface by using elevated temperatures in a steam and oxygen ambient. Channel stops, thin stripes that laterally contain the stored charge, are created by high energy implantation of p-type ions into appropriate regions of the silicon. Most CCDs are designed with a thin layer of n-type dopant at the silicon surface that holds the stored charge physically away from the silicon surface. This dopant layer, called the buried channel, is produced by implanting n-type dopant ions such as P or As into the silicon. After the implantation of the channel stops and buried channel, the original SiO2 layer is removed, and a fresh, undamaged layer of SiO2 is regrown. Next a layer of heavily impurity-doped polysilicon is deposited and patterned to form the φ2 and φ4 gate electrodes. An additional layer of SiO2 is then grown to provide an effective insulating layer atop the first polysilicon layer. The φ1 and φ3 gate electrodes are formed from a second layer of heavily impurity-doped polysilicon. A low resistance material such as aluminum is then used to form electrical connections between appropriate parts of the CCD. Typically this same metal layer is used to form bond pads that connect the CCD to external control signals from off-chip electronics. However, some process sequences require including a second metal layer for this purpose. Finally, the CCD is covered with an overcoat of protective material such as SiO2 or borophosphosilicate glass. This 500 to 1,000-Å layer is a barrier between the environment and the contamination-sensitive CCD.
Device Type Versus Application. The application for which the CCD is designed dictates the variants in the process that are used to provide the desired performance enhancements. Consider as an example the effect of front-side illumination of the CCD on photon collection efficiency. Applications that require very high photon collection efficiency in the visible blue, extreme UV, or X-ray spectral bands are not well served by front-side illumination of the CCD. A useful CCD variant for such applications is the back-side illuminated CCD. Back-side illuminated CCDs undergo additional processing to remove the underlying p+ substrate from the p-type epitaxial layer. As the name implies, photons impinge upon the device from the back side, thereby avoiding the absorbing layers of gate electrodes present in front-side illuminated devices. In this manner, the photon collection efficiency of the device can be improved in the blue, UV, and low-energy X-ray photon regimes. The substrate is removed by using an etchant whose etch rate is highly dependent on the concentration of boron dopant in the silicon. The process of substrate removal is well understood and highly repeatable. However, thinned (to ca. 25 µm)
back-side illuminated CCDs are typically more costly than thick front-side illuminated devices. The additional cost is associated with the reduced mechanical rigidity produced by the substrate removal. Other CCDs have special processing steps that lower the rate at which surface dark current is generated during the interval in which signal charge is collected. One such device is known as the virtual phase CCD (VPCCD) (68). The VPCCD uses a series of implanted dopant layers near the surface of the silicon to eliminate one of the polysilicon deposition steps required for electrode formation. The implanted layers serve as virtual electrodes which force the surface potential of the silicon in these regions to a constant bias during operation. These virtual electrodes replace two of the four gate phases described in the generic CCD process. The two remaining gate control phases are incorporated into a single physical gate electrode, again by ion-implanted layers in the silicon. Thus, the top side of the VPCCD pixel is only partially covered by the polysilicon gate electrode. The ion-implanted layers provide storage capability and unidirectional charge transfer as the single polysilicon electrode phase is clocked. The virtual electrodes also greatly reduce the rate of surface-related dark current generated in the CCD. At 25 °C, the nominal dark current value of a VPCCD is 0.1 nA/cm², which is nearly five times lower than the dark current in a typical buried channel CCD. A second device that has additional specialized ion-implanted layers is the multipinned phase (MPP) CCD (69). The MPP device merges the best performance features of multiphase and virtual phase CCD technologies and is becoming the CCD of choice for scientific applications. As in the generic multiphase CCD, the entire charge storage region is covered by polysilicon gates. However, before the electrode deposition, additional dopant layers are implanted into the silicon. These implanted dopant layers enable the MPP CCD to integrate the charge with the applied gate bias set so as to attract opposite polarity charges to the Si–SiO2 interface. This method of operation produces the lowest possible dark currents, rivaling or exceeding the performance of the virtual phase CCD. Operation of the CCD in the MPP mode does, however, reduce the total charge storage capacity to typically 20% of that of an equivalent, non-MPP device. Although front-side illuminated devices are available, the MPP CCD is typically back-side illuminated to achieve state-of-the-art photon collection performance. Even with this stipulation, the MPP CCD is still a popular detector due to its availability in various array sizes and formats specifically designed for scientific applications.
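To put a dark current density such as 0.1 nA/cm² into imaging terms, it can be converted to electrons per pixel per second. The sketch below (Python) does this for an assumed 15 µm × 15 µm pixel; the pixel size is an illustrative assumption, not a value taken from this article.

# Convert a dark current density to electrons per pixel per second.
# The 0.1 nA/cm^2 figure is quoted in the text; the pixel size is assumed.
q = 1.602e-19                  # electron charge, C
j_dark = 0.1e-9                # dark current density, A/cm^2
pixel_edge_cm = 15e-4          # assumed 15 um pixel edge, in cm
pixel_area = pixel_edge_cm**2  # cm^2

electrons_per_s = j_dark * pixel_area / q
print(f"{electrons_per_s:.0f} electrons per pixel per second")  # ~1,400 e-/s

# Even this "low" room-temperature rate adds several hundred thousand
# electrons over a several-minute integration, which is why long scientific
# exposures are made at reduced temperature.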
Silicon Photodiodes
The popularity of silicon photodiodes is directly related to their ability to detect photons over a spectral range that spans the near-infrared to low-energy X-ray regimes. Typical spectral response characteristics are given in Fig. 9a. The fast response time of less than 1 µs is attractive compared to the response times of photoconductive or bolometer-based devices. The silicon photodiode has a responsivity of about 0.4 A/W at the peak response wavelength. Some manufacturers list NEP values of less than 2 × 10⁻¹⁵ W/√Hz, D* above 1 × 10¹⁴ cm·√Hz/W, and optical areas of 1–10 mm². The photodiode has proven to be a useful tool for photon-counting and imaging applications across the entire range of spectral sensitivity and has been used for power generation in the visible portion of the spectrum (70,71). Silicon photodiodes are available in both discrete and array formats. The simplicity of the discrete photodiode makes this device one of the least expensive photon detectors available. Discrete photodiodes are merely p–n homojunctions and are available from numerous manufacturers. Typical commercially available devices have one to four discrete diode detectors per package, and frequently, device manufacturers include operational amplifier-based readout electronics inside the packaged device for ease of use. Discrete photodiodes can be used in a myriad of applications, including high-speed optical switching, intensity determination for automatic exposure control circuitry in film cameras, and photon counting for spectroscopic analyses (72). Photodiode arrays are more complex than their discrete counterparts due to the difficulty of directing the signal information from each diode to off-chip electronics. The more common linear arrays contain internal multiplexing circuitry located on the periphery of the imaging area. This circuitry amplifies and buffers the signal from each diode and presents the information from each pixel through a single-output amplifier in a controlled time sequence. The performance characteristics of linear photodiode arrays typically rival those obtained from discrete diodes. Arrays ranging in size from 1 × 64 pixels up to 1 × 4,096 pixels and larger are available from commercial sources. Linear photodiode arrays are commonly found in high-resolution image scanning applications such as photocopiers, facsimile (FAX) machines, and handheld scanners (73). As in linear photodiode arrays, two-dimensional photodiode arrays require internally integrated circuitry to mediate the signal information from each pixel. However, except for the outermost rows and columns, the pixels in two-dimensional arrays are surrounded on all sides by other pixels. Thus the required circuitry cannot reside solely on the periphery of the array but must be integrated into the actual pixel site. In a two-dimensional array, each pixel consists of both a photodiode and electronic circuitry designed to attach or detach that diode from the readout electronics located at the chip periphery. The voltage pulses that control the time-sequenced readout are generated from two multiplexing circuits, one each for the x and y chip dimensions, which are also located at the chip periphery. In its simplest form, the two-dimensional photodiode array uses a large capacitor that is common to all pixels in a given row to convert the signal charge sequentially from each pixel to a voltage. The resulting signal voltage is very small and results in a signal-to-noise ratio roughly one-half that of an equivalent format CCD array. Generally device performance varies inversely with the number of pixels in the array. Prior to about 1984, two-dimensional photodiode arrays were heavily used in commercial imaging applications such as video cameras. Recently,
CCDs and metal oxide semiconductor (MOS) arrays, two-dimensional imaging arrays that use p–n diodes as photosites, and complementary metal oxide semiconductor (CMOS)-based components for readout circuitry have emerged as strong competitors in this arena. Nevertheless, some two-dimensional standard video format photodiode arrays are still manufactured. These devices are most useful in situations unsuited for CCD and CMOS-based imagers, such as ionizing radiation environments. In addition to use as a popular image detector, the silicon homojunction photodiode and avalanche photodiode (74) can be used for power generation by operating the device in the photovoltaic mode. In this mode, incident photons produce a voltage drop across the device that is proportional to the number of absorbed photons. When a finite load resistance is placed across the diode leads, a current is produced. Thus, silicon homojunction diodes can be used to convert optical energy into electrical energy. The energy conversion efficiency for common silicon photocells is in the 3–15% range, depending on the specifics of the device design. High levels of illumination are required to produce useful output power. For typical photosensitive cells, an illumination of 10 lux can produce an open-circuit output voltage of about 0.5 volts. However, any desired voltage or current can be generated by appropriate series or parallel interconnection of multiple individual elements. Although silicon photodiodes are not as efficient in generating power as more exotic Group 2–16 (II–VI) and Group 13–15 (III–V) materials, these devices are commonly used as power sources for both terrestrial and space-borne applications.
Fabrication. Photodiodes can be made by manufacturing processes resembling those for constructing CMOS as well as bipolar integrated circuits. For a typical p-on-n diode formation process, the starting wafer is an n-type silicon substrate roughly 500 µm thick. The silicon has been doped with either P, As, or Sb during the wafer formation process to a nominal resistivity of 10 Ω·cm. The initial step of the fabrication process consists of growing a thick layer of SiO2 on the top surface of the silicon wafer by using elevated temperatures in a steam and oxygen ambient. Next, circular windows are etched in the SiO2 layer. The wafer is then placed in a high-temperature diffusion furnace that introduces boron dopant into the silicon through the open circular windows. Ion implantation is a common alternative to high temperature diffusion for boron doping. The result of the doping procedure is the formation of a p–n junction, where one diode is formed for every window in the SiO2. Next, a low resistance material such as aluminum is used to form ohmic contacts to the p-type silicon regions. The aluminum layer is also used to form distinct electrodes, one per diode, to which external connections can be made. Finally, a single common ohmic contact is formed on the back surface of the n-type silicon wafer by an additional metal layer. The method of diode formation as well as the density and profile of the impurity ions determines the specific optical and electrical performance parameters of the photodiode. When photon absorption occurs in
the depletion region of the diode, the resulting carriers are quickly swept from the diode and measured by the readout circuitry. The same behavior occurs for an optically induced charge formed within a diffusion length of the depletion region. A charge generated at a distance greater than one diffusion length recombines in the undepleted silicon and consequently cannot be detected as a signal charge. A similar fate befalls photogenerated carriers produced in heavily doped or heavily lattice-damaged regions. Heavy surface doping and lattice damage are common by-products of the homojunction formation process. Therefore, the diode fabrication process balances the reduction in the rate of surface dark current generation against the charge collection loss produced by heavily doping the silicon surface. Similarly, the improvement in photon collection efficiency obtained by increasing the thickness of the depletion region must be weighed against the associated increase in bulk-generated dark current.
CMOS Image Sensors
A relatively new entry in the field of visible photon detection is the complementary metal oxide semiconductor (CMOS) image sensor (75). Sometimes referred to as active pixel sensors (APS), these devices are variants of readout electronics commonly used with infrared photodiode arrays (76,77) that have been modified to allow detecting visible photons. Unlike CCD or photodiode imaging arrays, CMOS image sensors have active transistor elements that reside in each pixel. These elements are configured into amplification circuitry that can convert an integrated photogenerated charge to a signal voltage inside each pixel. Before conversion to a voltage, a signal charge is collected and stored in either a diode or MIS capacitor region. Amplifier and multiplexer circuitry residing at the periphery of the imaging area polls each pixel in an image line in parallel and transmits the signal voltage measurements to an on-chip analog-to-digital converter (ADC). This readout process continues until every line in the image has been polled and output.
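The line-at-a-time readout just described can be summarized in pseudocode form. The Python sketch below is a schematic of the sequence only (select a row, sample every column in parallel, then digitize); the names, array sizes, and the 10-bit converter are invented for illustration and do not correspond to any particular sensor's interface.

# Schematic of row-parallel active-pixel readout; names and sizes are invented.
import numpy as np

rows, cols = 480, 640
pixel_voltages = np.random.rand(rows, cols)   # stand-in for in-pixel amplifier outputs

def read_frame(sensor):
    frame = np.empty_like(sensor, dtype=np.uint16)
    for r in range(rows):                     # one image line at a time
        line = sensor[r, :]                   # all columns of the row sampled in parallel
        frame[r, :] = np.round(line * 1023)   # on-chip ADC step (10-bit here, assumed)
    return frame

digital_image = read_frame(pixel_voltages)
print(digital_image.shape, digital_image.dtype)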
In theory, the use of CMOS fabrication technology affords a number of advantages over CCD technology. CMOS provides a very high degree of circuit integrability, thereby allowing complete camera systems to be constructed on a single silicon chip (78). The combination of signal charge detection in the pixel of generation, without the need for transfer through the array, and the use of complementary n- and p-channel field effect transistors (FET) provide very low power operation. Finally, because CMOS fabrication is currently the most popular and most readily available semiconductor process for silicon, imagers can be manufactured at very low cost. At this time (2000), CMOS image sensors are still in their infancy. Compared with CCDs, CMOS imagers suffer from several performance problems, including poor quantum efficiency, fixed pattern noise, and image cross talk (79). Quantum efficiency is hampered in two ways. First, including active circuitry decreases the photon-sensitive area of the pixel and thereby diminishes the total collection area for photons. Second, compared to a CCD, standard CMOS processes reduce the depth in silicon from which the signal charge is collected, thereby reducing the collection efficiency for photons of wavelengths of about 600 nm and longer. Fixed pattern noise is introduced into the image through variations in the threshold voltages of interpixel amplifiers across the imaging array, as well as through variations in dark current generation on a pixel-by-pixel basis. Image cross talk occurs through the diffusion of photon charge generated beneath the depletion region of the pixel array. Cross talk also results from changes in pixel operating voltages across the array, which are produced by current flow through the resistance of critical bias lines that feed the pixel circuitry. Much improvement in performance is anticipated during the next few years and will be required to strengthen the prospects for CMOS imagers in scientific usage.
Cadmium Sulfide Photoconductor
CdS photoconductive films are prepared both by evaporating bulk CdS and by settling fine CdS powder from aqueous or organic suspension followed by sintering (80,81). Evaporated CdS is deposited to a thickness from 100 to 600 nm on ceramic substrates. The evaporated films are polycrystalline and are heated to 250 °C in oxygen at low pressure to increase photosensitivity. Copper or silver may be diffused into the films to lower the resistivity and reduce contact rectification and noise. The copper acceptor energy level is within 0.1 eV of the valence band edge. Sulfide vacancies produce donor levels, and cadmium vacancies produce deep acceptor levels. Settling can be accomplished from an ink which contains 1-µm crystallites of CdS and selected concentrations of CdCl2. The coating is fired in a restricted volume of air at 500–700 °C. During sintering, the CdCl2 acts as a flux and forms a solution with the surface of the grains of CdS. A few ppm of chloride and copper (from impurities) enter the crystal lattice and act as activator centers for trapping. Excess chloride evaporates and leaves a continuous polycrystalline photoconductive film. The layers are from 5–30 µm thick and have a linear I–V relationship. The films have a sheet resistance of 100,000–300,000 Ω/square. Most applications, such as switching on outdoor lights at twilight, require a detector resistance near 1,000 ohms to operate the switching circuit without
the need for impedance matching electronics. This is accomplished by depositing the contacts in an interdigitated geometry as shown in Fig. 12. A protective film is deposited over the detector and contacts to provide long-term stability, or the detector structure is mounted in a hermetic package as shown. The spectral sensitivity of a thin-film CdS detector is shown in Fig. 9a. The response shape is similar to that of the human eye (see Table 1).
Figure 12. CdS film detector in a package showing interdigitation to reduce resistance. See text.
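The effect of the interdigitated contact geometry on detector resistance can be estimated from the sheet resistance quoted above. In the sketch below (Python), the finger count, finger length, and gap are assumed values chosen only to illustrate how the geometry multiplies the effective width-to-length ratio; they are not taken from this article.

# Estimate CdS cell resistance from sheet resistance and an assumed
# interdigitated geometry: R = R_sheet * (gap / total_width).
r_sheet = 200e3          # ohms per square, mid-range of the 100-300 kohm/sq figure
gap = 0.02               # cm, assumed electrode-to-electrode spacing
finger_length = 0.5      # cm, assumed length of each finger
n_gaps = 8               # assumed number of parallel gaps formed by the fingers

total_width = n_gaps * finger_length         # cm of conducting "width" in parallel
r_cell = r_sheet * gap / total_width
print(f"approximate cell resistance: {r_cell:.0f} ohms")   # ~1,000 ohms

# Without interdigitation (a single gap of the same length), the same film
# would present roughly r_sheet * gap / finger_length, about 8 kilohms here.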
GaAs, GaAsP, and InGaAs Photodiodes
Gallium–arsenide and gallium–arsenide–phosphide diodes are fabricated as photodiodes as well as light emitters. Fabrication is typically by mesa etch technology of the films of GaAsP or InGaAs grown (82) by the vapor-phase epitaxial process using metal organic chemical vapor deposition (MOCVD). This growth technique results in impurity densities less than 1 × 10¹⁴ atoms/cm³. The spectral cutoff range extends from 500–900 nm, depending on the phosphorus content. These detectors can be used for color discrimination and do not require expensive interference filters. Emitter–diode pairs are used for very high impedance signal coupling in high-speed integrated circuits. The spectral sensitivity is shown in Fig. 9a for typical GaAs and GaAsP photodiodes. Surface leakage, caused by the lack of a native surface-passivation technology such as SiO2 for Si detectors, is the most significant performance limitation for diodes made from this material system. The indium–gallium–arsenide (InGaAs) photodiode is becoming a very popular detector for very near-infrared and short-wavelength infrared radiation. InGaAs devices are made by improved epitaxial growth techniques, and their performance rivals or surpasses that of HgCdTe. The cutoff wavelength of commercially available discrete InGaAs photodiodes extends to 2.5 µm. Linear InGaAs arrays that have cutoff wavelengths as high as 2.3 µm (see Table 1) are available, enabling research efforts in room temperature atmospheric spectroscopy (83) and environmental monitoring (84). Uncooled InAsSbP/InGaAs photodiodes that have good performance characteristics to a 3.4 µm cutoff wavelength have also been reported (85).
PbS and PbSe Photoconductors
Lead chalcogenides, PbS, PbSe, and PbTe, were among the first infrared-detector materials investigated. Although photovoltaic effects are observed from p–n junctions in single-crystal material, the response is quite poor and not reproducible. However, very sensitive photoconductors are prepared as polycrystalline thin films about 1 µm thick, which are deposited on glass or quartz substrates between gold or graphite electrodes. Detector elements are prepared either by sublimation in the presence of a small partial pressure of O2 or by chemical deposition from an alkaline solution that contains a lead salt and thiourea or selenourea (86). Lead sulfide and lead selenide deposit from solutions as mirrorlike coatings made up of cubic crystallites 0.2–1 µm on a side. The reaction may be represented nominally by the following:

Pb²⁺ + SC(NH₂)₂ + 2OH⁻ → PbS + C(=NH)₂ + 2H₂O
The actual reaction probably is more complex. The photoconductive behavior depends on the pH of the solution from which deposition occurs. It is likely that oxygen-containing compounds are present in the deposited films. For either method of preparation, the effect of oxygen, which is introduced during preparation or by subsequent heat treatment in air or oxygen, is critical for the development of optimum sensitivity. Maximum sensitivity is obtained near the point at which the film conductivity type changes from n to p. The long response times are suggestive of trapping. It is likely that deep trapping states are located at the oxidized surface of the micrograins (87,88). The evaporation technique produces the best results, especially for PbSe and the now largely obsolete PbTe. The spectral sensitivity is shown in Fig. 9b for different operating temperatures (see Table 1).
Platinum Silicide Schottky Barrier Arrays
The Pt:Si detector (89–91) is essentially a metal semiconductor barrier in which the platinum silicide is a quasi-metal that generates a small energy barrier to electrons. The effective photons are absorbed in a very thin region of the silicide next to the barrier and generate free electrons that flow over the barrier and tunnel through it into the n-type silicon. The efficiency of this process is only a few percent, even at high energies (short wavelengths), because of the low electron diffusion coefficient in the silicide. At wavelengths beyond 1 µm, the efficiency drops off dramatically because of decreasing tunneling probability for the lower energy electrons. The effective quantum efficiency is about 0.1% at a wavelength of 4.8 µm. Techniques of platinum deposition vary, but sputtering and annealing of a very thin layer of platinum produces a uniform platinum silicide film that has less than 0.3% variation in responsivity in an area of 2 × 2 cm. Details of deposition and annealing processes are considered trade secrets by the manufacturers. Deposition is typically onto silicon integrated circuit (IC) chips that have a readout structure. Small regions of the IC chip at each pixel are dedicated for the silicide detector element. The readout is typically an in-line charge-coupled device. The spectral sensitivity is shown in Fig. 9b, and array information is given in Table 1. Although the photon efficiency is quite low, focal plane performance for infrared imaging of ambient temperature scenes is acceptable because of the long television display frame time, reasonably low detector noise, and the excellent uniformity of responsivity that allows for high on-chip input gain without offset correction. The responsivity is about 10 millivolts per degree of scene temperature change using f/2 optics, and scene sensitivity with large arrays is about 0.1 °C. The Pt:Si detector finds typical use in security and defense-related applications but is also useful in spectroscopy (92) and other applications where device cooling is not a significant systems issue.
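The spectral cutoff of a Schottky-barrier detector is set by the barrier height rather than by a semiconductor band gap, so a rough cutoff wavelength can be estimated from λc = hc/φB ≈ 1.24 µm·eV/φB. The sketch below (Python) uses a barrier height of about 0.22 eV, a representative published value for PtSi on silicon that is assumed here rather than taken from this article.

# Rough Schottky-barrier cutoff estimate; the 0.22 eV barrier height is an
# assumed representative value for PtSi, not a figure from this article.
hc_ev_um = 1.24          # h*c expressed in eV*um
phi_b = 0.22             # assumed barrier height, eV

lambda_cutoff = hc_ev_um / phi_b
print(f"estimated cutoff wavelength: {lambda_cutoff:.1f} um")   # ~5.6 um

# This is consistent with the text's statement that useful response extends
# to roughly the 5 um region (0.1% quantum efficiency quoted at 4.8 um).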
InSb Photodiode Detectors and Arrays
Sensitive photodiodes (93,94) have been fabricated from single-crystal InSb using cadmium or zinc to form a p-type region in bulk n-type material. High-quality InSb crystals can be grown by the infinite-melt process (95,96), where an InSb film is grown epitaxially (from the liquid phase) on a slice of InSb that was prepared in a conventional Czochralski vertical puller. The diode formation process typically is a closed-tube diffusion. Cleaned and etched samples of InSb are placed in a quartz ampule with a limited amount of zinc or cadmium. After evacuation and sealing, the ampule is heated to about 50 °C below the crystal melting point. The metal vaporizes partially or completely, depending on the amount, the volume of the ampule, and the temperature, and diffuses into the crystal after a few hours. The impurity diffusion profile approximates the error-function law, and the p–n junction is 1–5 µm below the surface. Diode arrays (16) are formed by etching mesas about 50 × 50 µm. Typical performance of InSb detectors is given in Table 1, and the spectral sensitivity is shown in Fig. 9c. Up to 480 × 640 matrix arrays of InSb photodiodes in a mesa configuration have been demonstrated. Commercial 256 × 256 units are available. The mesa detector array is mated to a silicon chip that has an array of amplifiers and multiplex circuitry. Each diode is connected to an amplifier input. The hybridization process consists of forming indium bumps on each diode mesa and on each amplifier input, using a photolithographic process and In evaporation, and pressing the detector array chip to the silicon integrated circuit chip. The infrared radiation must pass through the InSb chip to reach the photodiode junction. To improve quantum efficiency, the InSb is grown as a thin layer on GaAs or GaAsSb substrates. Quantum efficiency is greater than 50% for wavelengths greater than 2 µm and less than the cutoff of InSb, 5.3 µm. A protective coating (passivation) that allows stable operation of the InSb photodiode for several hours without frequent signal normalization has not been found. However, infrared imaging cameras using hybrid InSb focal planes are a commercial reality. In a real-time imaging configuration, the scene sensitivity is about 0.04 °C using an InSb infrared camera. As a result of their commercial availability, InSb arrays are finding increased use in infrared astronomy (97) and other scientific applications.
Mercury Cadmium Telluride
HgCdTe has proven to be an excellent infrared detector material (2); the CdTe content can be readily adjusted to obtain cutoff wavelengths from 2–20 µm. The benefits are high spectral sensitivity of the photon detector, low defect density, and high cooling efficiency. The dependence of energy gap on mole fraction is linear. For Hg₁₋ₓCdₓTe, the x values of most interest lie between 0.17 and 0.50. The need for large focal planes up to 2 × 2 cm has dramatically changed the direction of single-crystal HgCdTe growth technology.
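Taking the stated linear dependence of energy gap on mole fraction at face value, a cutoff wavelength can be sketched for any composition from λc ≈ 1.24 µm·eV/Eg. The Python sketch below interpolates Eg linearly between endpoint values of roughly −0.3 eV (HgTe) and 1.6 eV (CdTe); these endpoints and the purely linear form are assumptions for illustration, and published empirical fits are temperature dependent and slightly nonlinear.

# Toy estimate of HgCdTe cutoff wavelength versus CdTe mole fraction x.
# Linear Eg(x) with assumed endpoint gaps; real fits depend on temperature.
eg_hgte, eg_cdte = -0.3, 1.6      # eV, assumed endpoint band gaps
hc_ev_um = 1.24                   # h*c in eV*um

for x in (0.20, 0.22, 0.30, 0.50):
    eg = eg_hgte + (eg_cdte - eg_hgte) * x
    lam = hc_ev_um / eg
    print(f"x = {x:.2f}: Eg ~ {eg:.2f} eV, cutoff ~ {lam:.1f} um")

# x around 0.22 falls in the 8-12 um band, x near 0.3 in the 3-5 um band,
# and x near 0.5 in the short-wave infrared, consistent with the 2-20 um
# range quoted in the text.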
Figure 13. Furnace reactor for growing HgCdTe films on CdZnTe substrates using the liquid-phase epitaxial process. The melt is tellurium in a quartz crucible. A mercury reservoir in a cooler zone maintains the required Hg overpressure.
Crystal Growth. The method of solid-state recrystallization (quench and anneal) was marginally adequate during the 1970s and 1980s for linear photoconductor arrays used in military night vision systems, but quenching kinetics restricted sample size to 6 mm × 2 cm. The crystals had many low-angle grain boundaries, defect densities were high, and there were large nonuniformities in composition and carrier concentration. Therefore, epitaxial growth technologies were developed. Crystal growers were hampered, however, by the lack of suitable substrate material until methods were developed to grow the HgCdTe films onto large-area silicon and GaAs single crystals. The problem with misfit dislocations has not yet been satisfactorily solved except for detector cutoff wavelengths of less than 5 µm. Thus, CdTe single crystals are used to obtain large, low defect density substrates for growth from the liquid phase. Lattice matching is achieved by adding 3 at. % Zn. Large-area CdZnTe substrates form the basis for liquid-phase epitaxy (LPE) growth of mid- and long wavelength IR HgCdTe detector material. Substrates are obtained from 3.5-kg CdZnTe ingots grown in graphite boats in sealed quartz ampules (98). The horizontal Bridgman growth process yields 41 × 6.4 × 5.0 cm semicylindrical ingots that have large single-crystal portions which are then sectioned, sawed into slabs, and diced into required dimensions. The substrate surfaces are diamond turned and etched to assure flat, damage-free surfaces. Standard substrate sizes are 3.6 × 2.0 cm and 3.6 × 1.5 cm, but the ability to obtain single-crystal ⟨111⟩-oriented CdTe substrates as large as 5.1 × 7.6 cm has already been demonstrated. The material typically exhibits dislocation densities of 1 × 10⁵/cm² and high purity as judged by Hall measurements made on evaluation samples of n-type LPE films grown on these substrates. Liquid-phase epitaxial films (98,99) are grown in production prototype dipping reactors, as shown in Fig. 13, using CdZnTe substrates. Film growth is from both tellurium and mercury solutions. Phase diagrams for the Te and Hg corners are shown in Fig. 14. Growth is from lightly (ca. 5 × 10¹⁴/cm³) indium-doped 4,000-g tellurium solutions. The mercury vapor pressure is maintained by a mercury reservoir positioned in an independently
controlled furnace zone. Up to 54 cm² of material can be grown in a single run from the largest reactors. Multiple furnace zones in the vicinity of the melt crucible permit adjusting the melt temperature profile. The substrates are positioned horizontally during growth and rotated to promote uniformity of composition. The holder design allows reorienting substrates vertically for withdrawal to facilitate melt drainage. LPE films are also grown in mercury solutions of several kilograms in which small amounts of tellurium and cadmium have been added. In both cases, the cutoff wavelength varies less than 0.1 µm across the entire film. The films grown in tellurium are annealed in Hg vapor. Mercury atoms diffuse into the films to reduce the density of Hg vacancies that act as acceptor sites. The carrier concentration is shown in Fig. 15 as a function of the annealing temperature. For photoconductive detectors, the films are annealed so that the indium donor density is dominant; this results in n-type material that has a 1 × 10¹⁵/cm³ excess electron density. Minority carrier lifetime is typically 1–5 µs at 77 K. For photodiodes, the films are annealed to an acceptor (via Hg vacancies) density of 3 × 10¹⁶/cm³. Carrier lifetimes are less than 50 ns at 77 K. Extrinsic acceptor doping is being developed to replace vacancy doping to achieve longer minority carrier lifetime and lower dark current density. Film thickness is 40–80 µm. Films grown in Hg (99,100) are usually structured to make heterojunction photodiode arrays. The first or base layer is narrower band gap HgCdTe, grown on CdZnTe substrates, doped with indium for excess electrons (n-type) in the 3 × 10¹⁶/cm³ range and is 10 µm thick. The second or cap layer is wider band gap HgCdTe, doped with arsenic for excess holes (p-type) in the 5 × 10¹⁵/cm³ range and is 4 µm thick. The composite HgCdTe film is photolithographically etched part way into the base layer to form an array of mesas; each one is a photodiode detector element. The p–n junction is close to or coincident with the metallurgical heterojunction. For infrared detection in the 8–12 µm atmospheric spectral window, the base layer CdTe content is about 20%, and the cap layer CdTe content is about 30%.
Figure 14. Phase diagrams of HgCdTe used to define the liquid-phase epitaxial growth process, where composition is in mole fraction X and the numbers represent temperatures in °C: (a) Te-rich corner, where the dotted lines A–F correspond to values of X_Te of 0.1, 0.2, 0.3, 0.5, 0.8, and 0.9, respectively; and (b) Hg-rich corner, where A–F correspond to values of X_Hg of 0.9, 0.8, 0.6, 0.4, 0.2, and 0.1, respectively.
Figure 15. Excess carrier concentration in HgCdTe in a saturated Hg vapor as a function of annealing temperature, where the dashed line represents Hg vacancies. The extrinsic impurity concentration can be adjusted in the growth process from the low 10¹⁴ up to the mid-10¹⁷ range. Low temperature annealing reduces Hg vacancy concentration and acceptor density.

Photoconductive Detector Arrays. Since 1972, more than 50,000 HgCdTe linear arrays (101,102) have been produced in the United States for Department of Defense infrared systems ranging from night vision for M1 tanks to targeting sights for smart weapons. These common module detector arrays consist of 180 elements photoetched on a 50-µm pitch. Virtually all of the material used for these arrays was prepared by solid-state recrystallization. Small slabs of material were epoxied to sapphire or ceramic and were thinned to 8 µm. The newer epitaxial HgCdTe is also epoxied to ceramic where the film side is down. The material is thinned by diamond turning, and the CdZnTe substrate is removed. In each case, an 8-µm thick layer of HgCdTe remains. The last few micrometers of HgCdTe removed must be carefully etched to avoid generating defects. The arrays are defined by photoetching and passivated using a ZnS film; indium electrical contacts are applied. The contacts define the optically active area of about 2 × 10⁻⁵ cm². The arrays are mounted in a vacuum dewar for cryogenic cooling. Each element is connected to a series resistance bias circuit and ac coupled to an amplifier. Detector resistance is nominally 100 ohms, and bias current is about 2 mA. The signal voltage is given by Eq. (13) and the responsivity by Eq. (34),

R_V = ητV/(hνnv).    (34)

The responsivity becomes independent of the bias voltage V when the electric field-induced sweep time of the holes equals the hole lifetime. g–r noise is dominant in a well-designed, well-made HgCdTe photoconductor detector (103,104) and may be expressed in terms of a minority carrier density p and majority carrier density n. Semiconductor noise analysis for the HgCdTe photoconductor yields

V²_g–r = 4pτV²Δf / [n(n + p)v].    (35)

The responsivity and g–r noise may be analyzed to obtain the background photon flux and temperature dependence of responsivity, noise, and detectivity. Typically, n > p, and both are determined by shallow impurity levels. The minority carrier density is the sum of thermal and optical contributions,

p = n_i²/n + ηφ_B τ/t,    (36)

where, typically, the background flux φ_B is much greater than the signal flux φ_s. The lifetime τ is about 1 µs, but for narrow band gap, defect-free detectors, it becomes the Auger lifetime and can be calculated readily from basic semiconductor properties (101). The cooling requirement is determined according to Eq. (36) because n_i is an exponential function of band gap energy and temperature. Combining Eqs. (35) and (36), detectivity can be expressed as follows:

D*_λ = (η/2hν)(τ/pt)^1/2.    (37)

At low background flux, this gives the temperature dependence of the D*_λ shown in Fig. 4. At high flux, the D*_λ equation (Eq. 37) reduces to Eq. (12) except for a factor of 2^1/2, which results from the random recombination process not present in diodes. The scene sensitivity of a scanning photoconductor array infrared camera is about 0.15 °C. The long minority carrier lifetime in n-type HgCdTe has been exploited to perform signal processing in the element (SPRITE) (105). The bias field is adjusted to sweep the photogenerated holes along the HgCdTe detector element synchronously with the scanned image, creating a time delay and integration (TDI) enhancement of the signal-to-noise ratio.
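As a numerical illustration of Eq. (37), the sketch below (Python) evaluates the g–r-limited detectivity for one set of assumed operating values (quantum efficiency, lifetime, minority carrier density, and detector thickness); these inputs are representative guesses chosen only to show the order of magnitude, not measured values from this article.

# Evaluate D*_lambda = (eta / 2 h nu) * sqrt(tau / (p t)) for assumed inputs.
import math

h = 6.626e-34            # J s
c = 3.0e8                # m/s
wavelength = 10e-6       # m, long-wave infrared
eta = 0.7                # assumed quantum efficiency
tau = 1e-6               # s, lifetime quoted as "about 1 us" in the text
p = 1e13                 # cm^-3, assumed minority hole density
t = 8e-4                 # cm, the 8-um element thickness cited in the text

h_nu = h * c / wavelength                                 # photon energy, J
d_star = (eta / (2 * h_nu)) * math.sqrt(tau / (p * t))    # cm sqrt(Hz)/W
print(f"D* ~ {d_star:.1e} cm Hz^0.5/W")                   # ~2e11 for these inputs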
Photovoltaic Detectors, Arrays, and Focal Planes. The two popular types of photodiode arrays in HgCdTe are based on homojunction (106) and heterojunction (107,108) technologies. Homojunction diode arrays are fabricated from p-type epitaxial HgCdTe. The surface of the grown HgCdTe film is flattened by diamond milling and chemical lapping using a weak solution of bromine in methanol. The epitaxial film is then epoxied to a silicon chip that has an array of amplifiers (typically on 50-µm centers) and readout multiplexer (mux) circuitry. The CdZnTe substrate is milled away, and the HgCdTe is thinned to about 10 µm. An array of small (10–20 µm) holes is etched in the HgCdTe film so that each hole is located over an amplifier input pad of the silicon IC chip. The diodes are made by implanting boron (150 kV, 1 × 10¹⁴/cm²) through a 500-nm ZnS layer. Planar diodes are generated by using a patterned photoresist to define the implanted regions located between the holes. The implant process disrupts the lattice, creating Hg interstitials which cause the formation of very shallow donor states. The damage layer extends about 150 nm from the surface, and the n–p junction varies from 2–8 µm deep, depending on the implant and annealing conditions. The fabrication is completed by applying a CdTe passivation layer (about 1 µm thick) directly to the HgCdTe surface, forming a ZnS layer for antireflection and electrical isolation, and forming a metal film lead to connect each implanted n region to each amplifier input. Infrared illumination is directed onto the HgCdTe film, and quantum efficiency typically exceeds 75%. Because infrared focal planes are typically cooled to less than 100 K, the differential coefficient of thermal expansion causes the shrinking silicon to put tensile stress on the HgCdTe film. However, the thinness of the film and the presence of the holes allow the HgCdTe to strain and retain the integrity of the epoxy film and the electrical contacts. Array dimensions up to 2.5 cm have proven reliable. Heterojunction diode arrays utilize the grown p–n junction and mesa etch technology. The mesa arrays are passivated by a thin layer (about 500 nm) of CdTe using an evaporation or molecular beam epitaxial process. Annealing at 250 °C in a Hg atmosphere creates a short (about 100 nm) graded layer from the CdTe to the HgCdTe. The benefits of CdTe passivation are efficient isolation of the p–n junction from the surface, low dark current noise, and immunity to ionizing radiation. As in InSb hybrid arrays, indium bumps are formed by evaporation at the center of each mesa and on each amplifier contact pad of the silicon integrated circuit chip. The two chips are pressed together in a bump bonding process called hybridization (109). Sometimes the space between the chips is filled with epoxy. Infrared illumination is through the CdZnTe substrate, which has been coated with ZnS for antireflection. Quantum efficiency typically exceeds 75%. HgCdTe photodiode performance depends, for the most part, on high quantum efficiency and low dark current density (110,111), as expressed by Eqs. (23) and (25). Typical values of R₀A at 77 K are shown as a function of cutoff wavelength in Fig. 16 (95). HgCdTe diodes sensitive out to a wavelength of 10.5 µm have shown ideal diffusion current limitation down to 50 K. Values of R₀A have exceeded 1 × 10⁶ Ω·cm². Spectral sensitivities for three compositions of HgCdTe detectors are shown in Fig. 9a,c. More information is listed in Table 1. The scene sensitivity of a HgCdTe diode area array cooled to 80 K is about 0.02 °C for ambient temperature scenes.

Figure 16. Resistance–area (R₀A) product for HgCdTe photodiodes cooled to 77 K. The solid line represents the theoretical limit; the dashed lines (– –) and (– · –) represent high and low performance, respectively. Dark current caused by defects lowers R₀A and detector sensitivity. In the high-performance range, dark current is an exponential function of cutoff wavelength and temperature.
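A common way to relate the R₀A product to sensitivity is through the Johnson-noise-limited detectivity of a photodiode, D*_λ ≈ (qλη/hc)·(R₀A/4kT)^1/2. This relation is standard detector-physics background rather than an equation given in this article, and the sketch below (Python) evaluates it for one assumed combination of wavelength, quantum efficiency, and R₀A.

# Johnson-noise-limited D* estimate from the R0A product (assumed inputs).
import math

q = 1.602e-19          # C
h = 6.626e-34          # J s
c = 3.0e8              # m/s
k = 1.381e-23          # J/K

wavelength = 10e-6     # m, assumed operating wavelength near cutoff
eta = 0.75             # quantum efficiency, per the >75% figure in the text
r0a = 100.0            # ohm cm^2, an assumed mid-range value from Fig. 16
temperature = 77.0     # K

responsivity = q * wavelength * eta / (h * c)                 # A/W
# With R0A in ohm cm^2 and kT in joules, the result comes out in cm sqrt(Hz)/W.
d_star = responsivity * math.sqrt(r0a / (4 * k * temperature))
print(f"Johnson-limited D* ~ {d_star:.1e} cm Hz^0.5/W")       # ~1e12 for these inputs

# In practice, background photon noise usually limits D* to a lower value,
# so this number is an upper bound set by the diode resistance alone.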
Doped Germanium and Silicon Photoconductors
Extrinsic photoconductors are typically single-crystal germanium doped with zinc, cadmium, mercury, boron, and gold (112,113) and silicon doped with indium, gallium, or arsenic (114). The doping density ranges from 1 × 10¹⁵ to 1 × 10¹⁷ impurity atoms/cm³, leading, respectively, from a low (1 cm⁻¹) to a higher (50 cm⁻¹) absorption coefficient. Information on ionization energies, solubilities, diffusion coefficients, and solid–liquid distribution coefficients is available for many impurities from nearly all columns of the periodic table (113). Extrinsic Ge and Si have been used almost exclusively for infrared detector applications. Of the impurities, Cu, Au, Zn, Cd, Hg, and some of the elements of Groups 13 (III) and 15 (V) have been used in detectors. Germanium and silicon, which are used in the preparation of detectors, must be of high purity before they are doped with the desired activator impurity to avoid unwanted compensation by impurities whose ionization energies are smaller than that of the activator. The required purity can be achieved by zone refining, in which a short molten zone is repeatedly passed from end to end of an ingot of impure Ge or Si. Impurities that have distribution coefficients larger than unity collect near the seed. The concentration of electrically active residual impurities in the center portion of the ingot can be reduced to 10¹²–10¹³/cm³. Single crystals can be grown by using the Czochralski method, in which an oriented seed crystal is brought into contact with the melt and then is withdrawn slowly while being rotated, or by using a horizontal zone melting method, in which a seed crystal is melted onto one end of a polycrystalline ingot. A molten zone is produced at the junction of the ingot and seed and is moved slowly along
the ingot, leaving behind a single crystal. All of these operations must be carried out in an inert or reducing atmosphere to prevent oxidation of the germanium or silicon. In most cases, the activator impurity must be incorporated during crystal growth. An appropriate amount of impurity element is dissolved in the molten Ge and, as crystal growth proceeds, enters the crystal at a concentration that depends on the magnitude of the distribution coefficient. For volatile impurities, for example, Zn, Cd, and Hg, special precautions must be taken to maintain a constant impurity concentration in the melt. Growth occurs in a sealed tube to prevent escape of the impurity vapor or in a flow system in which loss caused by vaporization from the melt is replenished from an upstream reservoir. Some impurities, for example, Cu, Ag, Ni, Co, and Fe in Ge, and In, Ga, and As in Si, have diffusion coefficients that are large enough to permit doping by solid-state diffusion well below the melting point of the host crystal. A thin layer of the diffusant, which is deposited on the surface of the crystal by electroplating, vacuum evaporation, or electrochemical replacement, serves as the source for diffusion. After homogeneity is achieved, the sample is quenched. The alloyed surface layer must be removed by lapping and etching before the electrical contacts are applied. The impurity concentration should be as large as possible, within limits, to maximize the absorption coefficient. In some cases, the concentration is limited by the impurity solubility, which may be small depending on the element. For impurities of high solubility, the upper limit is set by the onset of impurity banding. When the average separation of impurity atoms in the lattice is small enough, conduction by direct transfer of carriers from atom to atom can occur. Impurity banding limits the extent to which cooling can reduce the dark current, and therefore the noise, in Ge and Si. This is significant above impurity concentrations of about 10¹⁶/cm³. Impurities other than those from Groups 13 (III) and 15 (V) generally exhibit two or more impurity levels in Ge. If an activator is used that is not of the lowest lying level, the lower lying levels must be compensated for by adding an impurity of the opposite conductivity type. For example, if the second Zn acceptor level at 0.095 eV is to be used, the lower lying level at 0.035 eV must be compensated for by adding a donor impurity, for example, As, in a concentration slightly greater than that of Zn. Electrons from the As donors fall into the low-lying Zn level and render it inactive. A special case is the lowest lying Au level; Au acts as a donor at 0.045 eV above the valence band. If a shallow acceptor, for example, Ga, is added in a concentration that is slightly less than that of Au, electrons from the donor level fall into the Ga level. At low temperatures, holes that are bound to the compensated Au centers can be photoexcited to yield photoconductivity that has a long wavelength threshold at about 25 µm. Single-detector elements and arrays are formed by dicing and etching and attaching electrical contacts. Linear arrays of several hundred elements have been made. The spectral sensitivities of doped Si and Ge detectors are shown in Fig. 9c,d. See Table 1 for other information. Monolithic and hybrid extrinsic focal planes have also been
demonstrated (115,116). Such arrays play an important role in areas that require detecting very long wavelength infrared radiation and can observe at the sensitivity limits imposed by natural sky background radiation (117).
GaAs–AlGaAs Quantum Well Arrays
The quantum well infrared detector (118–120) is a technology based on the artificial structure called a superlattice. Infrared detection out to a 15-µm wavelength and beyond has been achieved using engineered material that has a controlled energy gap. The technique utilizes multiple stacked layers of semiconducting material; each has a tailored band gap energy, such that a series of potential wells exists in the direction normal to the layers. Typically, very thin layers of GaAs and AlGaAs are grown by molecular beam epitaxy (MBE). The MBE process is conducted in stainless steel tubing and consists of simply controlled evaporation of the elements from side chambers into a main chamber that contains a wafer of the substrate material, typically GaAs. The electron wave function is confined in the potential wells when the photon-generated electron wavelength is nominally the layer thickness (about 10 nm). The electrons are freed from the wells by photons, and a signal is detected by applying an external bias voltage. AlGaAs quantum well infrared photodetector (QWIP) focal planes have achieved sufficient sensitivity out to a 10-µm wavelength to result in scene temperature sensitivity of about 0.04 °C when the focal plane is cooled to 70 K (121). Spectral sensitivity is shown in Fig. 9c, and array information is given in Table 1. To date, QWIP devices have been used in a number of scientific applications, including geology (122), medicine (120), and astronomy (123), as well as in public service applications such as fire fighting.
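The connection between the roughly 10-nm layer thickness and infrared response can be made plausible with a particle-in-a-box estimate. The sketch below (Python) uses an infinite-well approximation and an assumed GaAs electron effective mass; both the model and the numbers are illustrative simplifications, since real QWIP designs rely on finite wells and carefully engineered subband or bound-to-continuum transitions.

# Infinite-square-well estimate of subband energies in a 10-nm GaAs well.
# The effective mass and the infinite-well model are assumptions for
# illustration only; they are not design values from this article.
import math

hbar = 1.055e-34          # J s
m_e = 9.109e-31           # kg
m_eff = 0.067 * m_e       # assumed GaAs conduction-band effective mass
width = 10e-9             # m, layer thickness quoted in the text

def level_ev(n):
    e = (n * math.pi * hbar / width) ** 2 / (2 * m_eff)
    return e / 1.602e-19  # J -> eV

e1, e2 = level_ev(1), level_ev(2)
print(f"E1 ~ {e1*1000:.0f} meV, E2 - E1 ~ {(e2 - e1)*1000:.0f} meV")
print(f"equivalent wavelength ~ {1.24 / (e2 - e1):.1f} um")

# An intersubband spacing of 0.1-0.2 eV corresponds to photons in roughly
# the 6-12 um range, the regime the text quotes for QWIP response.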
Semiconductor Bolometer Arrays
The use of bolometers to sense infrared radiation is not new. What is different is that, rather than using a single large cell, a matrix of miniature cells or microbolometers is created. Rather than a single amplifier connected externally, a custom integrated circuit is built under each cell to form a totally integrated focal plane array, as indicated in Fig. 17. Key elements of the structure (124,125) are the thin (100-nm) amorphous silicon (α:Si) or vanadium oxide (VO) thermally sensitive membrane, the thermally insulating support arms, and the integrated circuit underlying this structure. Infrared energy focused on the individual pixels heats the membrane, as given by Eq. (24). The support arms act as electrical contacts but are sufficiently thin and narrow to prevent significant conduction of thermal energy from the membrane to the surroundings. The temperature-dependent electrical resistance of the semiconductor film is monitored by the circuitry and is converted into an electrical signal proportional to the incident radiance. This signal is displayed on a standard television monitor, thus forming the image of the scene. Two technological advances have occurred that make such a structure feasible to build. The first is the development of microetching techniques that can be used to form microscopic structures in silicon and its coatings. This advance makes forming the membrane and support arms possible, and thickness control is in the 10-nm range. Cell dimensions of less than 50 µm have been demonstrated. Scanning electron microscope photos of an array and a test pixel are shown in Fig. 18. The second critical factor is the increased circuit density in silicon integrated circuits that makes it possible to build a small circuit for each cell of the array. By doing this, the noise bandwidth of each cell can be minimized, thereby maximizing performance. This device is a thermal detector in the infrared and therefore is independent of wavelength. Measured responsivity is about 5 × 10⁵ V/W, and the NEP is about 50 pW. The focal plane operates at ambient temperature. Imaging arrays that have scene sensitivity better than 0.1 °C have been demonstrated (see Table 1).

Figure 17. Cross-sectional schematic of a microbolometer photodetector. Micromachining is used to construct small, very low mass detectors. The dimension of a detector–amplifier cell is 50 µm. Mux = multiplexer, IC = integrated circuit.

Figure 18. Microbolometer: (a) array portion showing pixels on a 50-µm pitch. Each pixel is connected to a readout amplifier in the supporting silicon IC chip. (b) The detector has a 35 × 40 µm active area. The serpentine arms give excellent thermal isolation, and the low mass results in a 10-ms response time, ideal for thermal imaging.
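The responsivity and NEP figures quoted above can be combined into a rough sensitivity estimate. The Python sketch below does that arithmetic; the pixel area, f-number, and scene-radiance derivative folded into the last step are standard radiometric approximations with assumed values, not quantities given in this article.

# Back-of-envelope check of microbolometer scene sensitivity.
# Responsivity and NEP come from the text; everything else is assumed.
responsivity = 5e5        # V/W
nep = 50e-12              # W (noise-equivalent power)

noise_voltage = responsivity * nep
print(f"rms noise voltage ~ {noise_voltage*1e6:.0f} uV")        # ~25 uV

# Assumed radiometry for a 50 um pixel behind f/1 optics viewing a 300 K
# scene: the 8-14 um exitance changes by roughly 2.6 W/m^2 per kelvin.
pixel_area = (50e-6) ** 2                    # m^2
dM_dT = 2.6                                  # W m^-2 K^-1, assumed band value
f_number = 1.0
dP_dT = pixel_area * dM_dT / (4 * f_number**2 + 1)   # W per kelvin of scene change

netd = nep / dP_dT
print(f"estimated temperature resolution ~ {netd*1000:.0f} mK")
# ~40 mK with these assumptions, the same order as the better-than-0.1 C
# scene sensitivity quoted in the text.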
HEALTH AND SAFETY FACTORS
The completed photodetector is usually packaged hermetically in inert glass or plastic or is enclosed in an evacuated metal or glass container. Although most detector materials are toxic, the means taken to passivate and isolate these materials are often adequate to protect the user. However there are exceptions. The preparation of detector materials and detector fabrication can present considerable hazards. Some crystal preparation techniques require using a toxic substance, for example, Hg at vapor pressures above 1 MPa (10 atm). Ampule explosions do occur. The electrical circuitry required to operate photodetectors almost always couples to detector devices at low voltages, for example, <10 V; therefore, electrical hazards are minimal. Detectors that are operated at low temperatures are mounted in evacuated containers or dewars that are partially or nearly completely made of glass. These containers are subject to implosion by improper handling. BIBLIOGRAPHY 1. R. A. Smith, F. E. Jones, and R. P. Chasmar, The Detection and Measurement of Infrared Radiation, Oxford University Press, London, 1968. 2. W. L. Wolfe and G. J. Zissis, eds., The Infrared Handbook, rev., ed., Environmental Research Institute of Michigan, Ann Arbor, 1985. 3. P. W. Kruse, L. D. McGlauchlin, and R. B. McQuistan, Elements of Infrared Technology, John Wiley & Sons, Inc., New York, NY, 1962.
PHOTODETECTORS 4. S. M. Ryvkin, Photoelectric Effects in Semiconductors, Consultants Bureau, NY, 1964. 5. P. W. Kruse, in R. J. Keyes, ed., Topics in Applied Physics, Optical and Infrared Detectors, Springer-Verlag, NY, 1980. 6. S. M. Sze, Physics of Semiconductors, John Wiley & Sons, Inc., New York, NY, 1969. 7. R. D. Hudson Jr., Infrared System Engineering, John Wiley & Sons, Inc., New York, NY, 1969. 8. J. A. Jamieson et al., Infrared Physics and Engineering, McGraw-Hill, NY, 1963. 9. H. L. Hackforth, Infrared Radiation, McGraw-Hill, NY, 1960. 10. C. S. Williams and O. A. Becklund, Optics, A Short Course for Engineers and Scientists, John Wiley & Sons, Inc., New York, NY, 1972. 11. M. J. Buckingham, Noise in Electronic Devices and Systems, Halsted Press, NY, 1983. 12. S. Borrello and Z. Celik-Butler, Solid State Electron. 36, 407 (1993). 13. R. S. List, J. Electron. 22(8), 1,017 (1992). 14. S. R. Borrello, C. G. Roberts, B. H. Breazeale, and G. R. Pruett, Infrared Phys. 11, 225 (1971). 15. C. Hilsum, in T. S. Moss, ed., Handbook on Semiconductors, vol. 4, North-Holland, NY, 1981.
16. J. H. Longo et al., IEEE ED 25(2), 213–232 (1978).
17. D. Long, in Ref. 5.
18. M. A. Kinch and S. R. Borrello, Infrared Phys. 15, 111 (1975).
19. L. J. M. Lesser and F. L. J. Sangster, in Ref. 15.
20. F. Milton, in Ref. 5.
21. E. H. Nicollian and J. R. Brews, MOS Physics and Technology, John Wiley & Sons, Inc., New York, NY, 1982, Chap. 2.
22. C. Y. Wei et al., IEEE Trans. Electron Dev. ED-27(1), 170 (1980).
23. E. H. Putley, in Ref. 5.
24. A. Rogalski, in J. Thompson, ed., SPIE Milestone Series, SPIE Optical Engineering Press, Bellingham, WA, 1992.
25. W. D. Rogatto, in J. S. Accetta and D. L. Shumaker, eds., The Infrared & Electro-Optical Systems Handbook, vol. 3, SPIE Optical Engineering Press, Bellingham, WA, 1993.
26. W. S. Boyle and G. E. Smith, BSTJ Briefs 49(4), 587 (1970).
27. G. F. Amelio, M. F. Tompsett, and G. E. Smith, BSTJ Briefs 49(4), 593 (1970).
28. J. R. Janesick et al., Proc. SPIE 501, 2 (1984).
29. J. Hynecek, IEEE Trans. Electron. Devices ED-32(8), 1,421 (1985).
30. K. A. Parulski, IEEE Trans. Electron. Devices ED-32(8), 1,381 (1985).
31. Y. Takemura et al., IEEE Trans. Electron. Devices ED-32(8), 1,402 (1985).
32. D. F. Barbe, Proc. IEEE 63, 38 (1975).
33. D. E. Groom et al., Nucl. Instrum. Methods Phys. Res. A (Accel.) 442, 216 (2000).
34. D. Morrison, Electron. Design 48, 30 (2000).
35. M. M. Blouke, J. R. Janesick, and J. E. Hall, Opt. Eng. 22(5), 607 (1983).
36. M. D. Gregg and D. Minniti, Publ. Astron. Soc. Pacific 109, 1,062 (1997).
37. H. U. Keller et al., J. Phys. E 20(6), 807 (1987).
38. K. P. Klaasen, M. C. Clary, and J. R. Janesick, SPIE J. 331, 376 (1982).
39. K. P. Klassen et al., Opt. Eng. 38, 1,178 (1999).
40. C. D. Murray, J. Brit. Interplanetary Soc. 45(9), 359 (1992).
41. J. V. Sweedler, K. L. Ratzlaff, and M. B. Denton, eds., Charge-Transfer Devices in Spectroscopy, VCH, NY, 1994.
42. C. Lengacher et al., Amer. J. Phys. 66, 1,025 (1998).
43. T. W. Barnard et al., Anal. Chem. 65, 1,231 (1993).
44. D. M. Rowley, M. H. Harwood, R. A. Freshwater, and R. L. Jones, J. Phys. Chem. 100, 3,020 (1996).
45. J. C. Sutherland et al., J. Anal. Biochem. 163, 446 (1987).
46. E. Borroto et al., Gastroenterol. Endoscopy 50, 684 (1999).
47. Z. Szigeti, P. Richter, and H. K. Lichtenhaler, J. Plant Phys. 148, 574 (1996).
48. P. M. Epperson, R. D. Jalkain, and M. B. Denton, Anal. Chem. 61, 282 (1989).
49. G. A. T. Duller, L. Botter Jensen, and B. G. Markey, Radiat. Meas. 27, 91 (1997).
50. T. W. Barnard et al., 1992 Pittsburgh Conference, New Orleans, paper no. 990.
51. T. W. Barnard et al., 1992 Pittsburgh Conference, New Orleans, paper no. 991.
52. I. B. Brenner, S. Vats, and A. T. Zander, J. Anal. Atom. Spectrosc. 14, 1,231 (1999).
53. R. L. McCreery, in Ref. 37, pp. 227–279.
54. A. Wang, L. A. Haskin, and E. Cortez, Appl. Spectrosc. 52(4), 477 (1998).
55. A. Wang, B. L. Jollif, and L. A. Haskin, J. Geophys. Res. 104(E11), 27,067 (1999).
56. J. Hynecek, IEEE Trans. Electron. Dev. ED-38(5), 1,011 (1991).
57. J. A. Gregory, B. E. Burke, B. B. Kosicki, and R. K. Reich, Nucl. Instrum. Methods A 436, 1 (1999).
58. P. E. Pool, R. Holtom, D. J. Burt, and A. D. Holland, Nucl. Instrum. Methods A 436, 9 (1999).
59. S. Tsuneta et al., Solar Phys. 136(1), 37 (1991).
60. D. N. Burrows et al., in O. H. W. Siegmund, ed., EUV, X-Ray, and Gamma-Ray Instrumentation for Astronomy III, Proc. SPIE 1,743, 72 (1992).
61. R. D. Thom, T. L. Koch, J. D. Langan, and W. J. Parrish, IEEE Trans. Electron. Devices ED-27(1), 160 (1980).
62. M. A. Kinch et al., Infrared Phys. 20, 1 (1980).
63. D. K. Schroder, Appl. Phys. Lett. 25(12), 747 (1974).
64. K. Y. Han et al., Electron. Lett. 28(19), 1,795 (1992).
65. M. Wadsworth et al., IEEE Trans. Electron. Dev. ED-42(2), 244 (1995).
66. W. Runyon, in M. Grayson, ed., Encyclopedia of Semiconductor Technology, John Wiley & Sons, Inc., New York, NY, 1984, p. 809.
67. D. J. Burt, in M. J. Howes and D. V. Morgan, eds., Charge Coupled Devices and Systems, John Wiley & Sons, Inc., New York, NY, 1979.
68. J. Hynecek, IEEE Trans. Electron. Dev. ED-28(5), 483 (1981).
69. J. R. Janesick et al., in M. M. Blouke and D. Pophal, eds., Optical Sensors and Electronic Photography, Proc. SPIE 1,071, 153 (1989).
70. R. H. Bube, in Ref. 15.
71. R. H. Bube, Photoconductivity of Solids, John Wiley & Sons, Inc., New York, NY, 1960.
72. A. Campion, J. Brown, and W. H. Grizzle, Surf. Sci. 115, L153–(1982). 73. I. Fujieda et al., Opt. Eng. 38(12), 2,093 (1999). 74. S. M. Sze, Physics of Semiconductor Devices, 2nd ed., John Wiley & Sons, Inc., New York, NY, 1981. 75. M. Schanz et al., IEEE. Trans. Electron. Dev. ED-44(10), 1,699 (1997). 76. N. Bluzer and A. S. Jensen, Opt. Eng. 26(3), 241 (1987). 77. G. C. Bailey, Proc. SPIE 311, 32 (1981). 78. B. H. Olson et al., Proc. IEEE CCD Workshop (1997). 79. A. J. Blanksby and M. J. Loinaz, IEEE Trans. Electron. Dev. ED47(1), 55 (2000). 80. A. B. Dreeben and R. H. Bube, J. Electrochem. Soc. 110, 456 (1963). 81. J. A. Beun, R. Nitsche, and H. U. Bolsterli, Physica 28, 194 (1962). 82. G. B. Stringfellow, J. Crystal Growth 137, 212 (1994). 83. V. V. Doushkina et al., Appl. Opt. 35, 6,115 (1996). 84. B. Matveev et al., Sensor Actuat. B-Chem. 51, 233 (1998). 85. A.Krier and Y. Mao, Infrared Phys. Technol. 38(7), 397 (1997). 86. E. H. Putley, in C. A. Hogarth, ed., Materials Used in Semiconductor Devices, John Wiley & Sons, Inc., New York, NY, 1965. 87. A. Carbone and P. Mazzetti, Phys. Rev. B-Cond. Matt. 57(4), 2,454 (1998). 88. G. Giroux, Can. J. Phys. 41, 1,840 (1963). 89. W. F. Kosonocky, Proc. SPIE 1,308, 2 (1990). 90. B. Capone et al., Opt. Eng. 18(5), 535–541 (1979). 91. M. Kimata et al., IEEE Solid State Circuits SC22(6), 1,124–1,129 (1987). 92. C. A. Gresham, D. A. Gilmore, and M. B. Denton, Appl. Spectrosc. 53(10), 1,177 (1999). 93. C. Hilsum and A. C. Rose-Innes, Semiconducting III-V Compounds, Pergamon, NY, 1961. 94. J. T. Wimmers, R. M. Davis, C. A. Niblack, and D. S. Smith, Proc. SPIE 930, 125–138 (1988). 95. S. R. Jost, V. F. Meikleham, and T. H. Myers, in Symposia Proceedings of the Material Research Society, R. F. C. Farrow, J. F. Schetzina, and J. T. Cheung, eds., Pittsburgh, PA, 1987, p. 429. 96. P. Mohan et al., J.Crystal Growth 200(1–2), 96 (1999).
97. A. Phillips et al., Astrophys. J. 527(2), 1,009 (1999).
98. C. A. Castro and J. H. Tregilgas, J. Crystal Growth 86, 138–145 (1988).
99. T. Tang, M. H. Kalisher, A. P. Stevens, and P. E. Herning, in Ref. 70, p. 321.
100. G. N. Pultz et al., J. Vacuum Sci. Technol. B 9(3), 1,724–1,730 (1991).
101. M. A. Kinch, S. R. Borrello, and A. Simmons, Infrared Phys. 17, 127 (1977).
102. M. Itoh, H. Takigawa, and R. Ueda, IEEE ED 27(1), 150–154 (1980).
103. C. T. Elliott, Infrared Detectors, in Ref. 15.
104. A. Józwikowska, K. Józwikowski, and A. Rogalski, Infrared Phys. 31(6), 543–554 (1991).
105. C. T. Elliott, D. Day, and D. J. Wilson, Infrared Phys. 22, 31–42 (1982).
106. L. O. Bubulac, J. Crystal Growth 86, 723–734 (1988).
107. J. P. Rosebeck, R. E. Starr, S. L. Price, and K. J. Riley, J. Appl. Phys. 53(9), 6,430–6,440 (1982).
108. R. B. Bailey et al., IEEE ED 38(5), 1,104–1,109 (1991).
109. J. P. Rode, Infrared Phys. 24(5), 443–453 (1984).
110. R. E. DeWames et al., J. Crystal Growth 86, 849–858 (1988).
111. Y. Nemirovsky, D. Rosenfeld, R. Adar, and A. Kornfeld, J. Vac. Sci. Technol. 7(2), 528 (1989).
112. P. R. Norton, Opt. Eng. 30(11), 1,649–1,663 (1991).
113. P. R. Bratt, in R. K. Willardson and A. C. Beer, eds., Semiconductors and Semimetals, vol. 12, Academic Press, NY, 1977.
114. R. D. Nelson, Opt. Eng. 16(3), 275–283 (1977).
115. D. H. Pommerrenig, Proc. SPIE 443, 144–150 (1984).
116. S. B. Stetson, D. B. Reynolds, M. G. Staplebroek, and R. L. Stermer, Proc. SPIE 686, 48–65 (1986).
117. J. Wolf, Opt. Eng. 33(5), 1,492 (1994).
118. S. D. Gunapala et al., IEEE Trans. Electron. Dev. ED-44(1), 45 (1997).
119. S. D. Gunapala, B. F. Levine, L. Pfeiffer, and K. West, J. Appl. Phys. 69, 6,517 (1991).
120. S. D. Gunapala et al., IEEE Trans. Electron. Dev. ED-45(1), 1,890 (1998).
121. S. D. Gunapala and S. V. Bandara, in H. C. Liu and F. Capasso, eds., Semiconductors and Semimetals, vol. 62, Academic Press, NY, 1999.
122. V. J. Realmuto, A. J. Sutton, and T. Elias, J. Geophys. Res. 102, 15,057 (1997).
123. S. D. Gunapala et al., Proc. SPIE 3,379, 382 (1998).
124. Method for forming uncooled detector, U.S. Pat. 5,288,649, Feb. 22, 1994, W. F. Keenan (to Texas Instruments Inc.).
125. R. A. Wood, in R. K. Willardson and E. R. Weber, eds., Semiconductors and Semimetals, vol. 47, Academic Press, NY, 1997.
PHOTOGRAPHIC COLOR DISPLAY TECHNOLOGY
BRUCE E. KAHN
Imaging and Photographic Technology, Rochester Institute of Technology, Rochester, NY
SCOPE

This article reviews color photographic output technology based on silver halide materials. The largest product market in this category is reflection print material, such as that used in color photographic paper. Color paper is used primarily for making prints from photographic negatives and, more recently, as a high-quality output medium for digital images. A number of other types of products employ very similar emulsion technology. In addition to reflection display materials, several types of transmissive color photographic display materials use this technology. The film that is used for projecting motion pictures (motion picture "print" film) is based on this technology. Large color photographic display materials, used primarily for
advertising, are available on either clear or translucent plastic supports. These materials also use nearly identical emulsion technology to that used in photographic paper and motion picture print film. Because photographic emulsion technology is the basis for, and common among all of these different types of color photographic display products, photographic emulsion technology for display materials is the focus of this review. There is relatively little information published in journal articles or product information on the technologies described in this report. For this reason, this article relies heavily on information published in the patent literature. Although the patent literature gives a good indication of the type of technologies available and areas of research and intellectual property interest, it does not specifically show which technologies are actually used, or in which products. As such, this article focuses on the technologies available in color photographic display materials and should not be taken to indicate or imply which technologies, compounds, or materials are used in any particular product.
INTRODUCTION

Silver halide materials have been the standard for color photographic output since they were first produced in the late 1960s. These materials provide exceptional image quality and durability at a very reasonable cost. A number of developments have been incorporated during the last 40 years to ensure that these materials remain the standard for high quality photographic output for some time to come. Silver halide color output materials continue to be used in a variety of new ways, including serving as output media for digital printing.

STRUCTURE

There are quite a few different types of silver halide color photographic output materials. Most of these materials employ a reflective support. Motion picture print film uses very similar technology, except that the layers are coated on a transparent support for projection. There are also some special purpose display materials, which use clear or translucent supports.

Color photographic output materials are composed of a number of layers coated on a suitable support. A variety of sequences of emulsion layers can be used (1,2), but the following is the most typical:

Overcoat: 1–5 µm
Red sensitive (cyan dye forming) layer: 2 µm
Interlayer: 1 µm
Green sensitive (magenta dye forming) layer: 2 µm
Interlayer: 1 µm
Blue sensitive (yellow dye forming) layer: 2 µm
Polyethylene: 30 µm
Paper support: 160 µm
Polyethylene: 30 µm
Antistatic layer: 0–1 µm
The interlayers contain scavengers for oxidized developing agents. These scavengers prevent the formation of dye in one layer caused by development in an adjacent layer and thus prevent unwanted color contamination from the carryover of oxidized developing agents between layers. These scavengers are composed of ballasted hydroquinones or aminophenols. Compounds that absorb ultraviolet light are also commonly coated over the emulsion layer units to prevent unwanted dye formation and dye fade.

PAPER BASE

The most obvious difference between photographic paper and film is the support on which the silver halide emulsions are coated. There is a great deal of technology involved in making paper supports for photographic output. Although often neglected, many of the improvements in overall photographic performance have come from new support technologies.

History
Paper has been used as a photographic support since around 1800, when Thomas Wedgwood (3) experimented with paper impregnated with silver nitrate. Around 1840, William Henry Fox Talbot used paper as a support in a negative/positive system (4). The first reflective silver halide material for viewing color images was "Type I Kodacolor Paper," produced in 1942 by Eastman Kodak (5). In spite of the historical uses of paper, this initial support was actually a pigmented acetyl cellulose film (6), rather than a more conventional paper base. Since those early days, many of the requirements for an optimum reflective photographic support have been realized. Some of the most important technical requirements of reflective photographic supports are chemical inertness (to avoid adversely affecting the photographic process), low absorption of processing chemistry, opacity, reflectivity, roughness, wet strength, emulsion adhesion, and uniformity. Physical properties, including flexibility, wet strength, dimensional stability, and freedom from curl, are also important.

Baryta

One of the earliest technologies used for color photographic supports was baryta paper. Baryta paper is paper that is coated with barium sulfate (baryta). Barium sulfate is a commonly used white pigment. The baryta coating provided a high surface reflectance by coating a reflective pigment on the surface of the paper. Not only is barium sulfate highly reflective, it also maintains this high reflectivity fairly uniformly across the visible spectrum. This gives baryta coatings a white appearance without otherwise affecting the color or tone of the image. Barium sulfate could also be made in the correct particle size and shape to give smooth coatings and could be produced sufficiently pure so that it did not affect the photographic process (7). Baryta coated paper had a number of drawbacks, however (6). Because the paper was highly absorbent,
it absorbed a considerable amount of photographic processing chemicals, all of which eventually needed to be washed out of the paper. This necessitated relatively long processing and drying times. To produce a glossy finish, the rough paper surface had to be pressed onto a heated polished drying drum (known as ferrotyping). Physical limitations of the wet baryta paper, such as wet strength, dimensional stability, and curl, also limited its usefulness.

Resin Coating (RC)

The demand for color paper increased in the 1960s to such an extent that automatic printing and processing became a necessity (7). This need for automation and increased throughput spurred the development of an improved support for color paper. The most important requirement of a new support was that it would be waterproof. A number of methods were tried, including coating or impregnating paper with latex or silicone, chemical modification of the paper fibers such as partial acetylation, using synthetic fibers, and coating the paper support with various polymers (8). These efforts to develop an improved support led to the creation of paper supports that were extrusion coated with polyethylene. Polyethylene was chosen for a number of reasons, including price and availability, chemical inertness, water impermeability, flexibility, thermal stability, and low viscosity change with temperature. Additionally, polyethylene was chosen because of the ease of extrusion coating.

One significant problem that had to be overcome before polyethylene coated papers could be used for photography was the adhesion of emulsion layers to the polyethylene. Aqueous gelatin emulsion solutions do not adhere well to the very hydrophobic polyethylene surface. Fortunately, this problem was fairly easily solved by technology that was developed for the printing and packaging industries (6). The printing industry solved the problem of applying hydrophilic dyes to a hydrophobic (plastic) surface by irradiating the plastic surface with a high-voltage electrical discharge. This corona discharge treatment (CDT) works very well to promote the adhesion of gelatin emulsion layers to polyethylene surfaces. CDT oxidizes the surface of the polyethylene, affording better adhesion of the hydrophilic solution.

These extrusion-coated polyethylene paper supports are called resin coated (RC) papers. The first RC paper was introduced by Eastman Kodak in 1967–1968, and RC papers continue to be used as the primary support for silver halide based photographic color display materials. RC materials offer the advantages of reduced absorption of processing solutions, faster drying times, less usage of water and processing chemicals, energy savings, reduced curl, and the availability of a variety of surface textures. Although the polyethylene coating prevents the absorption of solutions into the paper from the front and back sides, solutions can still absorb into the paper from the edges. Additives can be added to the paper pulp to minimize or prevent this. The transmission of water through the paper can be reduced by adding a number of materials, such
as polyethylene terephthalate, polybutyl terephthalate, acetates, polyethylene vinyl acetate, ethylene vinyl acetate, methacrylate, polyethylene methylacrylate, acrylates, acrylonitrile, polyester ketone, polyethylene acrylic acid, polychlorotrifluoroethylene, polytetrafluoroethylene, amorphous nylon, polyhydroxamide ether, and metal salts of ethylene methacrylic acid copolymers. In addition to the use of various polymers to prevent water absorption, other polymeric materials can be used to prevent oxygen from diffusing through the layers. These materials are known as oxygen barrier layers. Polyvinyl alcohol, polyvinylidene chloride, and aliphatic polyketone polymers can be used for this purpose. Also used are polymers of acrylonitrile, alkyl acrylates or methacrylates, alkyl vinyl esters or ethers, cellulose acetates, polyesters, polyamide, and polycarbonate.

Front (Emulsion) Side. Modern RC paper supports contain a variety of other additives such as pigments, colorants, dyes, stabilizers, and optical brighteners. The print (emulsion) side usually contains a pigment to provide image sharpness, reflectivity, and opaqueness. The white pigment most commonly used is anatase or rutile TiO2. One of the problems in using TiO2 is that it can accelerate the photooxidative degradation of polyolefins (8). This effect can be reduced by incorporating UV stabilizers in the TiO2-containing layers. Hindered amine light stabilizers, such as those based upon 2,2,6,6-tetramethylpiperidine, can be used for this purpose. These compounds form stable nitroxyl radicals that interfere with the photooxidation of polyolefins in the presence of oxygen. Antioxidants such as hindered phenols and organic alkyl and aryl phosphites can also be used.

Blue colorants are generally added to compensate for the slight yellow color of gelatin. Because they are generally used in the extruded polymer layer, thermally stable materials such as pigments are used. Some of these are phthalocyanines, Chromophthal, Irgazin, and Irgalite blue pigments. Another technique for providing a blue color is the use of optical brighteners. Optical brighteners are colorless, fluorescent, organic compounds that absorb UV light and emit it as visible blue light. Examples of optical brighteners are derivatives of 4,4′-diaminostilbene-2,2′-disulfonic acid, coumarin derivatives such as 4-methyl-7-diethylaminocoumarin, 1,4-bis(o-cyanostyryl)benzol, and 2-amino-4-methylphenol.
The back side of the support also contains electrically conductive compounds (antistats) that reduce the chance of exposing the material by static charges. Examples of antistatic coatings include conductive salts, colloidal silica, fatty amine or ammonium compounds, and phosphate esters.

DuraLife. Recently, researchers at Eastman Kodak developed a multilayer laminated paper support that has greatly improved photographic properties. They claim that this new support is "the most significant advancement in 50 years of photographic paper" (9). This new paper improves optical properties (surface smoothness, sharpness, brightness, and opacity), physical properties (tear resistance, stiffness, curl, gloss, and acceptance of writing), and process efficiency (edge penetration reduction). This support uses a composite material that consists of multilayered coextruded, biaxially oriented polyolefin sheets laminated to the top and bottom sides of a cellulose paper core. The top sheet contains a white pigment, an optical brightener, a UV stabilizer, and a unique microvoided polyolefin layer. The following is the overall structure of this paper in a color photographic format:

Overcoat
Light-sensitive AgX imaging layers
Five-layer biaxially oriented, microvoided polyolefin sheet + TiO2, blue tint, and optical brightener
Ethylene plastomer + TiO2
Cellulose paper base + TiO2 + blue dye
Ethylene plastomer
Biaxially oriented polyolefin sheet
Antistatic coating

This is the substructure of the top polyolefin composite sheet upon which the emulsion layers are coated:
Layer   Contents                Thickness (µm)
L1      LDPE + blue tint        1.0
L2      Polypropylene + TiO2    8.0
L3      Voided polypropylene    20.0
L4      Polypropylene + TiO2    8.0
L5      Polypropylene           1.0
One unique aspect of this support is the use of microvoided layers to provide whiteness and opacity. Layer L3 in the table is composed of a substructure that combines layers of alternating high and low refractive index. The high refractive index (n = 1.49) is due to the matrix polypropylene layers; each polypropylene layer is 1.8–2.5 µm thick. Voids between the layers give rise to the low refractive index (n = 1.0); the voids are 0.8–1.5 µm thick. Approximately 15 such sublayers make up this layer. This alternating structure is more opaque than the commonly used polyethylene containing TiO2.
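A rough way to see why the alternating polypropylene/void structure is so effective is to add up the Fresnel reflections at the polymer–air interfaces incoherently, which is a reasonable simplification because the sublayers are thicker than the coherence length of broadband room light. The Python sketch below uses the refractive indices quoted above; the interface count and the textbook incoherent "pile-of-plates" formula are assumptions made for illustration only.

    # Incoherent estimate of reflectance from the microvoided layer.
    n_pp, n_void = 1.49, 1.0                    # refractive indices from the text

    r_single = ((n_pp - n_void) / (n_pp + n_void)) ** 2   # one polymer/air interface
    n_interfaces = 2 * 15                                  # ~15 sublayers -> ~30 interfaces

    # Textbook incoherent-stack result for identical, non-absorbing interfaces.
    r_stack = n_interfaces * r_single / (1 + (n_interfaces - 1) * r_single)

    print(f"single interface: {r_single:.3f}")   # ~0.04
    print(f"full stack:       {r_stack:.2f}")    # ~0.5, a strongly diffusing layer

Even though a single polymer/air interface reflects only about 4% of the light, roughly thirty such interfaces reflect on the order of half of it, which is why the thin microvoided sheet can replace a much heavier pigment loading.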
This invention (10) contains an integral emulsion bonding layer that consists of a low-density polyethylene (LDPE) skin (layer L1) on the biaxially oriented sheet. Because emulsions adhere well to LDPE, this layer eliminates the need for CDT. This layer also contains blue tints that correct for the native yellowness of the gelatin. Concentrating these tints in a thin layer reduces the amount of the tint materials needed.

EMULSION CHARACTERISTICS

Requirements

The exact nature of the silver halide emulsions used for photographic color display materials is dictated by some of the stringent product specifications. The need for rapid processing requires emulsions that develop quickly and exert minimal restraint on development. This necessitates emulsions of low bromide and iodide content, because bromide and iodide restrain development. For environmental reasons, the quantity of effluent should be minimized, and the effluent should be environmentally benign. Photographic color display materials require very low minimum densities (Dmin), typically less than 0.1. This requirement is the most critical for reflection print materials, where higher Dmin is more noticeable and objectionable. High sensitivity (speed) is always desirable, particularly for the blue record of photographic color display materials. Another important requirement is that the sensitivity (photographic speed) remain constant across a wide range of exposure times. This is known as flat reciprocity. It is also very important that these photographic materials maintain their characteristics in spite of the way they are stored, even at high temperatures or humidities. Two of the most important "keeping" properties are the variations of photographic speed between manufacture and exposure of the material, known as "raw stock keeping," and between exposure and processing of the material, known as "latent image keeping." Stable dyes that are not prone to fading are also highly desirable.

One of the most stringent requirements for photographic print elements is that they possess very low minimum density. The Dmin requirements for color paper are typically less than 0.1. Minimum density levels that are acceptable for film products would be extremely objectionable for a print material when viewed against a white reflective support. These minimum density requirements are achieved by judiciously selecting high chloride emulsions and employing antifoggants. A variety of compounds can be added during the precipitation process to reduce fog. Some compounds that are used for this purpose include hydrogen peroxide, peroxy acid salts, disulfides (11), and mercury compounds (12).

Another important requirement of color display materials is that they maintain their sensitometric characteristics as a function of time. This ability is called "keeping." There are two major types of keeping: latent image keeping and raw-stock keeping. Latent image keeping (LIK) performance is generally measured in terms of observed variations of photographic speed as a function of the time delay between imagewise exposure and processing. Raw-stock keeping is important to stabilize silver halide
emulsions against changes such as those due to storage at high temperature and humidity. Some techniques for improving the raw-stock keeping have recently been described (13).

A number of compounds have been described which can be used to reduce the amount of dye fade. A compound that has both a hindered amine and a hindered phenol in the same molecule prevented the deterioration of yellow dye images by heat, humidity, and light (14). Light fade of magenta dyes was reduced using spiroindanes (15) or chromans that had a hydroquinone diether or monoether as a substituent (16). The light stability of cyan dyes can be improved by using benzotriazole ultraviolet absorbers.

Theoretically, the exposure E of a photographic element is given by the product of the exposure intensity I and the exposure duration t (E = I × t). This is known as the "reciprocity law." Actual photographic materials usually deviate somewhat from this ideal behavior at very long or very short exposure times. This deviation from the ideal reciprocity law is known as "reciprocity law failure." Typically, for the kinds of emulsions used in color paper, the speed and contrast decrease as the exposure intensity increases (or exposure time decreases). This is known as high-intensity reciprocity failure (HIRF). Higher exposures must be employed to compensate for reduced exposure times. One of the most important developments in color paper technology has been the discovery of iridium doping for controlling reciprocity behavior. This technology is discussed further in the section on dopants.

One of the most significant challenges in silver halide output technology is achieving sufficient sensitivity in the blue sensitive layer ("blue speed"). Blue speed is a critical factor for several reasons. Incandescent light sources are normally used to expose these types of photographic materials. The energy output of incandescent light sources decreases as wavelength decreases. In other words, there is less blue light to begin with than green or red light. The color negative film through which the light must pass before exposing the photographic printing paper usually has masking dyes that give it an orange tint. These masking dyes absorb much of the blue light. So, the small amount of blue light emitted by the incandescent light source is reduced even further by the masking dyes in the color film. As described before, at least three light sensitive emulsion layers are used to capture a photographic image. In the structure used most typically, the blue sensitive emulsion is placed below the red and green sensitive emulsion layers. Although it offers many other advantages, this layer order severely reduces the amount of light available to the blue layer because of the light scattering and absorption that occurs in the layers above. So, the amount of blue light available to expose the blue sensitive layer is limited by the incandescent light source and the masking dyes, as well as the presence of the red and green sensitive emulsion layers.
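The shortfall in blue light from the source itself can be illustrated with Planck's law in a short Python sketch. The 3200 K color temperature below is an assumed value typical of tungsten-halogen printing lamps and is used only to show the trend; the chosen wavelengths are representative of the blue, green, and red records.

    import math

    # Relative spectral radiance of a blackbody, illustrating how little blue
    # light an incandescent printing lamp emits compared with green and red.
    H, C, K = 6.626e-34, 2.998e8, 1.381e-23   # Planck, speed of light, Boltzmann

    def planck(wavelength_m, temp_k):
        """Spectral radiance (arbitrary absolute scale) at one wavelength."""
        x = H * C / (wavelength_m * K * temp_k)
        return wavelength_m ** -5 / (math.exp(x) - 1.0)

    T = 3200.0                                 # assumed lamp color temperature, K
    blue, green, red = 450e-9, 550e-9, 650e-9
    b, g, r = (planck(w, T) for w in (blue, green, red))

    print(f"blue/red   = {b / r:.2f}")   # roughly 0.3
    print(f"blue/green = {b / g:.2f}")   # roughly 0.45

Under this assumption the lamp emits only about a third as much power at 450 nm as at 650 nm, before the orange masking dyes and the overlying emulsion layers reduce the blue exposure still further.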
Silver Halide Precipitation, Structure, and Technology

For many reasons, the silver halide emulsions that are used for color paper are composed primarily of AgCl. Silver chloride is more soluble than silver bromide or silver iodide. Chloride ion in the development solution has less restraining action on development than bromide or iodide. For these reasons, chloride emulsions can be processed quickly, which allows photofinishers to increase their throughput. Furthermore, chloride emulsions use smaller quantities of processing solutions, which gives rise to better ecological compatibility and reduced processing effluent.

The silver halides used for color display materials are generally precipitated in the presence of ripening agents. Ripening agents (ripeners) are compounds that enhance the solubility of silver halide grains and are used primarily to increase the grain size and to make geometrically uniform grains. However, the presence of ripeners during precipitation can also affect the morphology of the silver halide grains. In general, the silver halide grains have rounded corners and edges when precipitated in the presence of ripeners. Additionally, residual ripener may be adsorbed on the surface of the emulsion grain and may not be removed during washing and filtration. This residual ripener may be carried over to subsequent emulsion preparation steps. Such remaining ripener may cause undesirable sensitization effects and create fog during storage.

Halide Conversion. The chloride emulsions used in color photographic display materials typically contain a small amount of bromide. This bromide is usually introduced following precipitation of the AgCl emulsion. This process is known as "halide conversion." Photographic emulsion performance can be a very sensitive function of the exact nature of the bromide distribution on the AgCl grains. Silver chloride emulsions can be converted to silver chlorobromide emulsions in a variety of ways. Almost any source of bromide can be used. The three major types of bromide sources are soluble bromide (KBr), small grain (Lippmann) silver bromide (17), and organic bromides. The bromide distribution on the grains can be modified, for example, by adsorbing an organic compound on the surface of the AgCl grain before introduction of the bromide source (18).

Recently, this bromide conversion process was studied dynamically using a reflection spectrophotometric technique (19). This technique has a number of advantages, such as the ability to study the conversion process in situ, the sensitivity to small amounts (<5% of the amount of chloride) of bromide, and the opportunity to study reaction products and reactants simultaneously. This study showed that the halide conversion rate depends on many factors, such as the source of bromide and the presence and concentration of ripeners and restrainers. The rate of halide conversion is much faster using soluble bromide sources (KBr) than insoluble ones (Lippmann AgBr). Ripeners increase the halide conversion rate, whereas restrainers suppress it.

Iodide. In the last several years, a number of patents have been issued to Eastman Kodak on incorporating iodide into chloride emulsions. Although it cannot be known for sure which, if any, of these technologies are incorporated into Kodak products, this technology seems to be important and heavily researched.
Recently, it has been shown that incorporating small amounts (0.1–0.6 mole % iodide, based on total silver within the host grains) of locally high iodide concentrations (sometimes referred to as "dump iodide") below the surface of an AgCl emulsion can greatly increase the sensitivity (speed) of the emulsion (20). This speed enhancement is primarily attributable to the intentional inclusion and specific placement of iodide within the host grains. The disruptive effect of iodide ions on the crystal lattice is suspected of playing an important role in this increased photographic sensitivity. Because iodide ions are much larger than chloride ions, the crystal cell dimensions of silver iodide are much larger than those of silver chloride. For example, the crystal lattice constant of silver iodide is 5.0 Å compared to 3.6 Å for silver chloride. These local high iodide concentrations cause local increases in the crystal lattice size and apparently create traps that increase photographic efficiency.

Although chloroiodide emulsions are effective in improving speed, a disadvantage is that they can increase the minimum density. The production of density in unexposed areas is known as fog. It is believed that this fog is caused by iodide ion released during development that migrates to unexposed grains and disrupts the crystal lattice structure sufficiently to render these additional grains developable. This results in increased minimum density. Chloroiodide-induced fog can be reduced by blending silver iodochloride grains with small amounts of a second grain population immediately before coating (21). The second grain population is composed of very small grain (Lippmann), unsensitized silver halide that is more soluble than silver iodide. This second grain population, it is thought, acts as a "sink" for iodide liberated during development and prevents it from creating fog by reacting with the primary grain population.

As described for bromide conversion, a number of different sources of iodide can be used to introduce iodide into silver halide grains during precipitation. Iodide ion can be added as a soluble salt, such as an alkali or alkaline earth iodide salt (20–22), an iodate (23), or a triiodide salt; as sparingly soluble silver iodide, by ripening the fine silver iodide grains of a Lippmann emulsion; by cleaving iodide ions from an organic molecule (24); or by using an amine complex of a silver iodide salt.
Iodide Source

Soluble Iodide Salts. One of the easiest ways of introducing iodide is as a soluble salt, such as an ammonium or alkali metal iodide salt. Silver and/or chloride ion may be added at the same time or following iodide addition. Soluble iodide salts facilitate adding iodide rapidly, which is required to produce the necessary local high iodide concentrations. There is a disadvantage in using soluble iodide sources, however. Soluble iodides can cause mixing sensitivity problems in large-scale precipitation of iodochloride emulsions. The rate of reaction between iodide ion and silver ion can be much faster than the rate of dispersion of the soluble iodide reactant. The dispersion rate depends on the amount of iodide added, various mixing-related parameters, and the kettle volume. In these precipitations, grains that happen to impinge on the point of iodide ion introduction encounter higher iodide ion concentrations than the remainder of the grains. This problem can result in uneven distribution of the iodide ion from grain to grain and from batch to batch. The resulting silver iodochloride emulsions can vary significantly in photographic performance and can be very difficult to manufacture consistently.

Iodate (IO₃⁻). One way that reportedly improves the control over iodide ion uniformity is to use iodate (IO₃⁻) as the source of iodide (23). Iodide can be released from iodate by reacting it with a mild reducing agent, such as sulfite. The reaction of iodate with sulfite is as follows:

IO₃⁻ + 3 SO₃²⁻ → I⁻ + 3 SO₄²⁻

Unfortunately, sulfite is photographically active and is known as a grain ripening agent; that is, it tends to speed the ripening out of smaller grains onto larger grains and the preferential solubilization of grain edge and corner structures. This can have the undesirable effect of changing the shape of the grains. Furthermore, the use of the iodate (IO₃⁻) ion to release the iodide (I⁻) anion is relatively inefficient and requires three sulfite anions to release a single iodide (I⁻) anion. So, a relatively large amount of sulfite may be necessary, which can cause other deleterious photographic effects.
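The inefficiency argument is simple stoichiometry, as the short Python sketch below shows; the batch size and iodide level are assumed, illustrative values (the iodide fraction is taken from within the 0.1–0.6 mole % range quoted earlier).

    # Sulfite demand of the iodate route: three sulfite ions are consumed per
    # iodide ion released (IO3- + 3 SO3(2-) -> I- + 3 SO4(2-)).
    silver_moles = 10.0        # assumed batch size, moles of Ag
    iodide_fraction = 0.003    # assumed 0.3 mole % iodide

    iodide_moles = silver_moles * iodide_fraction
    sulfite_moles = 3 * iodide_moles    # stoichiometric minimum

    print(f"iodide needed : {iodide_moles:.3f} mol")
    print(f"sulfite needed: {sulfite_moles:.3f} mol (before any excess)")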
Organic Iodides (R–I). Another way to improve the control over iodide ion uniformity is to use an organic compound that releases iodide when it reacts with a nucleophile (25). The nucleophiles, however, can produce other harmful photographic effects. Some of the nucleophiles that have been used are hydroxide, sulfite, or ammonia. Hydroxide and ammonia are alkaline species that increase the emulsion pH and can produce fog. This occurs because the alkaline conditions favor reduction of Ag⁺ to Ag⁰. If enough Ag⁰ atoms are produced in close proximity to each other, the grain can spontaneously develop, independent of its exposure. This is sometimes referred to as reduction fog or R-typing. Sulfite is a known silver halide grain ripening agent that can cause changes in grain morphology (see previous discussion).
Triiodide (I₃⁻). Another source of iodide that appears to offer some advantages is triiodide (26). Triiodide is water soluble and provides an efficient, environmentally friendly, and convenient source of iodide from readily available materials. Triiodide does not require using additional photographically active materials for iodide release, using organic solvents, or generating organic by-products. Furthermore, triiodide generates iodide ion under mild conditions that avoid the risk of reduction fog and grain ripening. When added to photographic emulsions, triiodide slowly releases iodine and iodide as shown in the following equation:

I₃⁻ → I⁻ + I₂
The reaction reportedly goes to completion, driven by the constant removal of the iodide ion by incorporation in the silver halide grains. The delay in iodide ion release allows the triiodide to be well dispersed before incorporation of iodide into the silver halide grains. This reduces the variation in iodide content between grains and results in a more robust emulsion that has improved photographic performance.
Silver Halide Amine Complexes ([Me2NH2]n[AgICln]). Similar to the use of Lippmann AgI as an iodide source, silver halide amine complexes of the type [Me2NH2]n[AgICln] have been used as single source materials for incorporating iodide into silver chloride (27). Iodide is incorporated by introducing the precursor material into an aqueous medium of silver chloride crystals, where iodide is released from the complex as silver iodide.

Doping. One of the most actively patented silver halide technologies is the use of dopants in silver halides. The dopant-related patent activity has been intense and has been conducted by almost all photographic manufacturers. Dopants can modify almost any photographic property and can be used in extremely low amounts (as low as a few atoms per silver halide grain!). Display materials, in particular, have benefited from dopant technology. Once again, it is impossible to know exactly which of these dopants is incorporated in any particular product. There seems to be little doubt, however, that dopant chemistry is one of the most important areas of research in silver halide technology, particularly in display materials.

A dopant has been defined as an ion or complex that is incorporated in the silver halide lattice in a dispersed manner (28). This definition excludes compounds that exist in separate phases or are adsorbed onto the surface of silver halides. Initially, it was thought that dopants operate simply by replacing a single silver ion by an ion of another metal. This viewpoint evolved to one where a hexacoordinated anionic dopant complex [MX₆₋ₙLₙ]ᵐ⁻ replaced the silver halide lattice unit (AgX₆)⁵⁻. Recently, it has been shown that very large, complicated organic ligands can also be incorporated intact in the silver halide lattice as part of a transition-metal dopant complex. To accommodate such large ligands, it is necessary to consider replacing even larger portions of the silver halide lattice (AgₙXₘ)ᵖ⁻ by the organometallic dopant complex. This technology has given rise to a great deal of control over the electronic properties of the dopant complex.

Dopants can be incorporated in a silver halide lattice in a number of ways. Most commonly, an aqueous solution of the dopant is added during the precipitation process. Another convenient method of incorporating dopants into an AgCl emulsion is to dope a small grain (Lippmann) AgBr emulsion and use this to effect both the halide conversion and doping of the AgCl substrate. In this manner, both the dopant and the silver bromide can be added to the host AgCl grain in a single step.

Iridium (Ir). Group 8–10 hexahalo coordination complexes create deep electron traps in silver halides (29–31).
Deep electron traps trap electrons permanently or for relatively long periods of time. Deep electron traps are useful for controlling reciprocity failure. Most notable among these metals is iridium, which has been used extensively as a dopant to control the reciprocity behavior of photographic emulsions (32,33). Iridium hexachloride complexes are commonly used as dopants to reduce reciprocity law failure in silver halide emulsions. Many iridium dopants are effective for this purpose (34). Although iridium is incorporated in the silver halide grain as Ir3+, Ir4+ complexes are preferred because they are more stable (35).

Organic Ligands. In addition to simple metal ions, organometallic complexes can be incorporated in silver halide grains (36) to produce photographic performance modifications not possible by incorporating the transition-metal ion alone. It is also possible to incorporate organometallic complexes that have large organic moieties (37) and obtain photographic performance that can be attributed specifically to the presence of the organic ligand or ligands. Surprisingly, the selection of organic ligands in these complexes is not limited by steric considerations, long considered important for dopant incorporation in the silver halide lattice. For practical reasons, the organometallic complex should be thermally and structurally stable and exhibit at least very slight water solubility. This technology provides an excellent opportunity to tailor photographic performance to meet specific requirements. The organic ligands used in these complexes can be selected from a wide range of organic families, including substituted and unsubstituted aliphatic and aromatic hydrocarbons, secondary and tertiary amines (including diamines and hydrazines), phosphines, amides (including hydrazides), imides, nitriles, aldehydes, ketones, organic acids (including free acids, salts, and esters), sulfoxides, and aliphatic and aromatic heterocycles containing chalcogen and pnictide (particularly nitrogen) atoms in the heterocyclic ring. These organometallic dopant complexes can be effective in extremely small concentrations. Depending on the specific dopant and intended use, concentrations of 10⁻⁹–10⁻⁴ moles of dopant per mole of silver can be used.
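The remark that dopants can act at the level of a few atoms per grain can be checked with a quick Python estimate. The cubic grain edge length below is an assumed, typical value for a color-paper AgCl emulsion; the density and molar mass of AgCl are standard handbook values.

    # How many dopant ions land in each grain at a given molar doping level.
    AVOGADRO = 6.022e23
    DENSITY_AGCL = 5.56       # g/cm^3
    MOLAR_MASS_AGCL = 143.3   # g/mol

    edge_um = 0.5             # assumed cubic grain edge, um (typical for color paper)
    dopant_per_ag = 1e-9      # doping level, moles of dopant per mole of silver

    grain_volume_cm3 = (edge_um * 1e-4) ** 3
    ag_per_grain = grain_volume_cm3 * DENSITY_AGCL / MOLAR_MASS_AGCL * AVOGADRO
    dopant_per_grain = ag_per_grain * dopant_per_ag

    print(f"Ag ions per grain: {ag_per_grain:.1e}")   # a few times 10^9
    print(f"dopant per grain : {dopant_per_grain:.1f}")  # about 3 at the 1e-9 level

At the low end of the quoted range (10⁻⁹ mole per mole of silver), a half-micrometer grain indeed contains only about three dopant ions.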
Applications

Reduced Dye Desensitization. It has been shown that dye desensitization can be reduced when silver halide grains are doped with group 8 organometallic complexes that contain strongly electron-withdrawing ligands, such as pseudohalide (CN, SCN, SeCN) ligands (37). These complexes are effective in concentrations of 10⁻⁸–10⁻⁵ moles of dopant per mole of silver.

Speed Reduction. An interesting application of organometallic dopants reduces photographic speed with minimal or no alteration in photographic contrast. Organometallic cobalt complexes that have electron-withdrawing ligands are useful for this purpose (37). These complexes are effective in concentrations of 10⁻⁹–10⁻⁶ moles of dopant per mole of silver.
Reciprocity, Heat Sensitivity, and Latent Image Keeping. Aromatic heterocyclic organometallic complexes of the form shown following can be used to reduce high-intensity reciprocity failure (HIRF), reduce low-intensity reciprocity failure (LIRF), reduce thermal sensitivity, and improve latent image keeping (LIK) simultaneously. The preferred heterocycles for this application are azoles and azines, specifically thiazoles, thiazolines, and pyrazines. These complexes are effective in concentrations of 10⁻⁹–10⁻³ moles of dopant per mole of silver:

[MXx(H2O)yLz]

M is a group 8 or 9 metal (preferably iron, ruthenium, or iridium)
X is a halide or pseudohalide (preferably Cl, Br, or CN)
x = 3–5, y = 0 or 1
L is an organic ligand
z = 1 or 2

Bimetallic complexes that have bridging organic ligands can also be used:
[MX5LM′X5]

M and M′ are group 8 or 9 metals (preferably iron, ruthenium, or iridium, or any combination)
X is a halide or pseudohalide (preferably Cl, Br, or CN)
L is a bridging organic ligand, particularly those containing nitrogens, such as pyrazine, bipyridine, and phenanthroline
Reciprocity. The organometallic thiazole complexes following are particularly useful for controlling reciprocity at very short exposure times, such as those used in digital printing (38):

[IrX6−mLm](3−m)−

m = 0–1
X is a halide or pseudohalide (preferably Cl, Br, or CN)
L = thiazole
Contrast-Increasing Dopants (NZ). Rhodium hexahalides are well-known and widely employed dopants for increasing photographic contrast. These dopants are generally used in concentration ranges of 10⁻⁶–10⁻⁴ moles of rhodium per mole of silver. Substituting an organic ligand for one or two of the halide ligands in rhodium hexahalide can result in a more stable complex (37). Group 8 nitrosyl or thionitrosyl complexes are also useful for increasing contrast (40,41). They are optimally incorporated below the surface of the grain to avoid reducing the grain sensitivity.

Shallow Electron Trapping (SET) Dopants. Some dopants can be used to increase the sensitivity of photographic emulsions. This was first reported by Keevert et al. using an organometallic complex containing cyano ligands as a dopant in high chloride grains (39). A general class
of dopants that can be used for this purpose are those that provide shallow electron trapping sites. A shallow electron trap is one that holds photoelectrons only temporarily, as opposed to "deep" electron traps, which trap electrons permanently. Shallow electron trapping dopants require a net valence that is more positive than +1 (the valence of the silver ion they replace), a filled highest occupied molecular orbital (HOMO), and a lowest unoccupied molecular orbital (LUMO) higher in energy than the lowest-lying AgX conduction band (40,41). Many such complexes have been described (41) and are effective in concentrations ranging from 10⁻⁷ to 10⁻⁴ moles of dopant per mole of silver. Shallow electron traps improve photographic sensitivity by reducing some of the inefficiencies inherent in silver halides. It is thought that the dopant lowers the energy of the conduction band at the site of the dopant and temporarily attracts photoelectrons at the dopant site. This localization of electrons at the trap sites reduces the chances of various electron loss and recombination processes, thus improving photographic efficiency.

Sensitization
Chemical Sensitization. To increase their photographic efficiency, silver chloride emulsions used for photographic color display materials can be sensitized by three different methods, either individually or in combination with each other (42). Sulfur sensitization employs sulfur-containing compounds such as thiosulfates, thioureas, or mercapto compounds that can react with silver. Generally, sulfur sensitization is conducted in the presence of a goldcontaining compound. Sulfur and gold may also be contained in the same compound. An example of this used by Eastman Kodak is colloidal aurous sulfide (43). Reduction sensitization uses reducing agents, such as stannous salts, amines, or borane derivatives (44). Chemical sensitization may also be conducted in the presence of other compounds such as thiocyanates (45), thioethers (46), azaindenes, azapyridazines, and azapyrimidines (47). Chemical sensitization can take place in the presence of spectral sensitizing dyes (48) or can be directed to specific sites or crystallographic faces on the silver halide grain (49). Spectral Sensitization. Intrinsically, silver chloride emulsions absorb very little visible light. Spectral sensitizing dyes need to be added to silver halides to impart visible light sensitivity. This process is known as spectral sensitization. A large number of dyes have been developed for this purpose (50), and generally vary widely from manufacturer to manufacturer and product to product. Dyes that have sensitizing maxima at wavelengths throughout the visible and infrared spectrum and have a great variety of spectral sensitivity curve shapes are known. The choice and relative proportions of dyes depends on the region of the spectrum to which sensitivity is desired and on the shape of the spectral sensitivity curve desired. Most typically, polymethine dyes are used, particularly cyanines and merocyanines. Normally, several sensitizing dyes are used, at least one per color record. It is also possible to use combinations
of dyes in a single color record, where each has a different absorption maximum, to achieve spectral sensitivity whose maximum is between the sensitizing maxima of the individual dyes. Combinations of spectral sensitizing dyes that result in a phenomenon known as supersensitization can also be used (51). Supersensitization occurs when the spectral sensitization of a combination of dyes is greater in some spectral region than that from any concentration of one of the dyes alone or what would be predicted from the additive effect of the dyes. Supersensitization can be achieved by using selected combinations of spectral sensitizing dyes and other addenda such as stabilizers and antifoggants, development accelerators or inhibitors, coating aids, brighteners, and antistatic agents. Spectral sensitizing dyes can be added at any stage during the preparation of the emulsion. They may be added at the beginning of or during precipitation (52), before or during chemical sensitization (53), before or during emulsion washing (54), or can be mixed in directly before coating (55). Small amounts of iodide can be adsorbed on emulsion grains to promote aggregation and adsorption of spectral sensitizing dyes (56). Depending on their solubility, spectral sensitizing dyes can be added to an emulsion as a solution in water or in such solvents as methanol, ethanol, acetone, or pyridine; dissolved in surfactant solution (57); or as a dispersion (58).
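The idea that two overlapping sensitizing dyes can produce a combined sensitivity maximum between their individual maxima, mentioned at the start of this passage, can be illustrated with a toy Python calculation. The Gaussian band shapes, peak positions, and bandwidth below are illustrative assumptions, not data for any real dye.

    import math

    # Toy illustration: with sufficient overlap, two sensitizing dyes give a
    # summed sensitivity that peaks between the two individual maxima.
    def band(wavelength_nm, peak_nm, width_nm=35.0):
        return math.exp(-((wavelength_nm - peak_nm) / width_nm) ** 2)

    wavelengths = range(480, 601)
    combined = {w: band(w, 520) + band(w, 560) for w in wavelengths}
    peak = max(combined, key=combined.get)
    print(f"combined sensitivity peaks near {peak} nm")   # ~540 nm, between 520 and 560

If the bands overlapped less, the sum would instead show two separate peaks, which is why the choice and proportions of dyes must be matched to the desired sensitivity curve shape.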
Couplers. Compounds added to an emulsion layer form the image dye when they react during processing with oxidized developer solution. These image dye forming compounds are known as couplers. Couplers are defined as 4-equivalent or 2-equivalent depending on the number of atoms of Ag+ required to form one molecule of dye. Two-equivalent couplers are preferred because they reduce the amount of silver required. Couplers have different chemical structures, depending on the color image dye [cyan (59), magenta (60), or yellow (61)] they are to produce. As for sensitizing dyes, the couplers used vary considerably from manufacturer to manufacturer and product to product.

When a coupler reacts with oxidized developer, a portion of the coupler molecule is released. This fragment is known as a coupling-off group. Because this group is released only at the locations where there is an image, it can provide a number of useful functions (62). After being released from the coupler, coupling-off groups can accelerate or inhibit development or bleaching, influence dye hue or formation, accomplish color correction, or facilitate electron transfer. Coupling-off groups can modify the reactivity of the coupler and can affect either the layer containing the coupler or other layers in the photographic material.
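As a quick numerical illustration of the equivalency point above, the following Python fragment compares the developed silver needed for the same dye laydown; the dye amount is an arbitrary assumed value.

    # Silver demand of dye formation: a 4-equivalent coupler needs four developed
    # Ag+ per dye molecule, a 2-equivalent coupler only two (values from the text).
    dye_moles = 0.01                      # assumed dye laydown, mol per unit area
    for equivalency in (4, 2):
        print(f"{equivalency}-equivalent coupler: {equivalency * dye_moles:.2f} mol Ag")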
Antifoggants. As mentioned before, one of the most important requirements for photographic color display materials is that they provide very low Dmin and avoid fog. Many compounds can be added to stabilize the photographic characteristics and to help prevent the
generation of fog during manufacture, storage, or processing (63). These compounds are known as antifoggants or stabilizers. Examples of these types of compounds are azolium salts (64), heterocyclic mercapto compounds, mercaptotetrazoles (65) (particularly 1-phenyl-5-mercaptotetrazoles and substituted derivatives thereof), azaindenes such as tetraazaindenes (66) [especially 4-hydroxy-substituted (1,3,3a,7)tetraazaindenes], dichalcogenides (67), benzenethiosulfonic acids, and benzenesulfinic acids. Antifoggants can be added to an emulsion at many stages of the manufacturing process. They are usually added just before coating or during chemical or spectral sensitization. Similarly, compounds can be added during the precipitation process to prevent fog formation. These agents include hydrogen peroxide, peroxy acid salts, disulfides (68), mercury compounds (69), and p-quinone (70).

Other Addenda. In addition to the compounds discussed previously, many other addenda can be added to emulsions designed for photographic color display materials. These addenda can be used for a variety of purposes, including whitening, development acceleration, antihalation, and gelatin hardening. Whitening agents such as stilbenes, triazines, oxazoles, or coumarins can be added to photographic emulsions. These whitening agents may be contained in any layer. If the whitening agent is not water-soluble, a dispersion of it may be used. Compounds that increase the rate of development can be added to a photographic material. Benzyl alcohol was commonly used as a development accelerator, but its use has been decreasing for environmental reasons. Other compounds that can be used are cationic compounds and neutral nitrate salts (71), polyethylene glycols, and thioethers (72). Compounds can be added to increase the hardness of gelatin to improve the physical properties of a photographic color material. Bisvinylsulfonyl compounds are particularly useful as hardeners. Other compounds, known as "fast acting" hardeners (73), can also be used for this purpose.

DIGITAL

One of the fastest growing markets for silver halide photographic output materials is that for producing hard copy images from digital images. This includes photofinishing systems that scan (and manipulate) processed film, then write the images digitally onto silver halide materials. Many of the basic principles of electronic printing have been discussed by Hunt (74). Some of the advantages of color silver halide materials for digital photographic output are the image quality, the look and feel of the material, the permanence and durability of the images, the large installed base of processing equipment, high productivity, and low media cost. Silver halide materials can record continuous-tone, very high resolution images at very low cost. Digital output onto silver halide materials provides all of the advantages inherent in silver halide materials
and the additional capabilities of digital image processing enhancements such as improvements in image structure and defect removal, in addition to the more conventional balance and tone scale modifications. In digital photofinishing, each pixel can be manipulated and optimized throughout the process. This allows using the full latitude for all media, and can even correct some normal failures encountered in photography. Based on the growth in digital imaging and the advantages provided by silver halide based output materials, the future for these materials looks very bright.

Silver halide materials have been used as output materials for a number of years. Initially, the graphic arts market exposed silver halide materials by using lasers for proofing or making digital separations. Film recorders, which used digital exposing onto either positive or negative photographic materials, have also been available for a number of years. Most of the early technologies for exposing silver halide materials used CRTs to generate the exposures. Other digital exposing technologies are based on LEDs, LCDs, or lasers (gas or solid state). In 1987, Kodak introduced a monochrome laser printer for the health imaging industry (75). Kodak also developed a device in the late 1980s that used a trio of red, green, and blue lasers to expose silver halide materials. Although there have been examples of digital exposures onto silver halide materials, it is only recently that color silver halide materials have been specifically designed for digital exposure.

Many of the properties of digital exposing devices necessitate developing an output material that is specifically designed for that purpose, rather than using materials intended for optical printing. Some of these critical digital exposing device-dependent properties are spectral energy output, flare, fringing, exposure time, exposure overlap, the number of pixels or colors exposed at a time, and multiple exposures. Other non-device-dependent factors driving the development of color silver halide materials for digital output are the maximum image size available, color gamut, resolution (spatial and color), printing speed, media cost, and optimization of the output materials for both text and image display.

To generate optimal print quality from digital exposure devices, output media must satisfy a number of requirements. Both the exposure (power) required to achieve the maximum density and the spectral sensitivity of the media must be designed to match the exposing device. The exposure times used by digital exposure devices can be extremely short (as short as nanoseconds), necessitating carefully optimized reciprocity control at these times. The media must be able to generate sufficient contrast for all three color records at the exposure times used. Minimum density and color gamut are also important media parameters.

The contrast of the media is a very important parameter. The contrast needs to be large enough that the maximum density can be achieved at approximately 1.2 log E more than the energy required to get just above Dmin. Otherwise, so much energy is needed to achieve the high densities that scattering of the light within the media causes flare or fringing. These problems can be seen
visually in images as blurring or fuzzy text, particularly in dark text on a light background or vice versa. However, if the contrast is too high, amplification of digital artifacts, such as lines or bands in the writing direction caused by nonuniformities of the exposing device, can be seen. High contrast can also give rise to density jumps or image contouring, caused by insufficient digital resolution to generate smooth density changes. It is also important that the contrast of the three color records be appropriately matched; otherwise, color flare or fringing can result, which is much more objectionable than neutral flare or fringing. The lowest contrast color record will be the first to display flare or fringing.

The spectral sensitivity of the media and the spectral distribution of the exposing device also require careful matching. If the spectral overlap is poor, color contamination and color gamut reduction will occur. Productivity loss will also occur, requiring higher energies, which may cause flare or fringing.

Photographic papers for digital exposures also require that there be no chemical cross talk between layers. This means that the oxidized developer from one layer cannot be allowed to migrate to adjacent layers, thus forming image dye where it is not desired. This problem is greatest at maximum densities, where the highest concentration of oxidized developer is generated.

Another approach is to design materials that are compatible with both optical and digital exposing devices. The ability of a material to be used at exposure times from nanoseconds to hundreds of seconds is truly impressive. Konica has been able to make emulsions that have these reciprocity characteristics by using transition-metal doping, optimization of chemical sensitization, and new emulsion stabilizer technology (76). Fluorescent brighteners are used to achieve low Dmin. Fast-developing emulsions and improved couplers are used to achieve high Dmax. Konica has been supplying such materials since 1993 when they released Konica Color QA paper. Their newer materials employ high contrast in the middle- to high-density regions and low contrast in the low-density region.

DYE BLEACH (ILFOCHROME)

Dye bleach technology is another method of forming color photographic display images, particularly from positive originals. This technology is used, for example, in Ilfochrome materials. In the dye bleach process, preformed dyes (as opposed to those formed by coupling in the development process) are destroyed in an imagewise process (77).

BIBLIOGRAPHY

1. Research Disclosure, Item 36544, Section XI. Layers and layer arrangements.
2. Digital imaging with high chloride emulsions, U.S. Pat. 5783373, 1998, J. Z. Mydlarz, J. A. Budz, and E. L. Bell (to Eastman Kodak Company).
3. H. Gernsheim, The History of Photography, Oxford University Press, London, New York, 1955, p. 32.
4. H. Gernsheim, Endeavour 1, 19 (1977).
10. Superior photographic elements including biaxially oriented polyolefin sheets, U.S. Pat. 6030742, 2000, R. P. Bourdelais et al., (to Eastman Kodak Company).
H. Kawamoto, and M. Kikuchi (to Fuji Photo Film Co., Ltd.); Silver halide photographic emulsion and a photographic light-sensitive material, U.S. Pat. 5418124, 1995, Y. Suga, M. Kikuchi, M. Yagihara, H. Okamura, and H. Kawamoto (to Fuji Photo Film Co. Ltd.); Silver halide photographic emulsion and light-sensitive material using the same, U.S. Pat. 5525460, 1996, Y. Maruyama, M. Yagihara, H. Okamura, H. Kawamoto, and M. Kikuchi (to Fuji Photo Film Co., Ltd.); Method of preparing silver halide photographic emulsion, emulsion, and light-sensitive material, U.S. Pat. 5527664, 1996, M. Kikuchi, M. Yagihara, H. Okamura, and H. Kawamoto (to Fuji Photo Film Co., Ltd.).
11. Photographic emulsion stabilized with bis(p-acylamidophenyl)disulfides, U.S. Pat. 3397986, 1968, A. G. Millikan and A. H. Herz (to Eastman Kodak Company).
26. Preparation of silver chloride emulsions having iodide containing grains, U.S. Pat. 6033842, 2000, R. Lok, B. T. Chen, and W. W. White (to Eastman Kodak Company).
12. Photographic emulsions containing mercury salts, U.S. Pat. 2728664, 1955, B. H. Carroll and T. F. Murray (to Eastman Kodak Company).
27. Preparation and use of a dimethylamine silver chloroiodide complex as a single source precursor for iodide incorporation of silver chloride crystals, U.S. Pat. 5866314, 1999, T. L. Royster, Jr., S. Jagannathan, J. A. Budz, and J. Z. Mydlarz (to Eastman Kodak Company).
5. R. L. Heidke, L. H. Feldman, and C. C. Bard, J. Imaging Sci. Technol., 11, 3 (1985). 6. G. Kolf, Br. J. Photogr. 127(13) 296–299 (1980). 7. A. I. Woodward, J. Appl. Photogr. Eng. 7(4), 117–120 (1981). 8. T. F. Parsons, G. G. Gray, and I. H. Crawford, J. Appl. Photogr. Eng. 5(2), 110–117 (1979). 9. R. Bourdelais, T. Giarrusso, IS&T 11th Int. Symp. Photofinishing Technol., 2000, pp. 49–56.
13. Light-sensitive silver halide photographic material, U.S. Pat. 5116723, 1992, M. Kajiwara and H. Ono (to Konica Corporation); Silver halide emulsions with improved heat stability, U.S. Pat. 5670307, 1997, R. Lok (to Eastman Kodak Company); Silver halide photographic elements containing dioxide compounds a stabilizers, U.S. Pat. 5693460, 1997, R. Lok (to Eastman Kodak Company); Combination of dithiolone dioxides with gold sensitizers in AgCl photographic elements, U.S. Pat. 5756278, 1998, R. Lok (to Eastman Kodak Company). 14. Recording material for color photography, U.S. Pat. 4268593, 1981, D. G. Leppard and M. Rasberger (to Ciba-Geigy AG). 15. Japanese Pat. Appl. (OPI) 159,644/81. 16. Japanese Pat. Appl. (OPI) 89,835/80. 17. Method of manufacturing silver halide emulsion, U.S. Pat. 5254456, 1993, S. Yamashita, S. Takada, and S. Shibayama (to Fuji Photo Film Co., Ltd.). 18. Photographic light-sensitive material and method of developing the same, U.S. Pat. 4865962, 1989, K. Hasebe et al., (to Fuji Photo Film Co., Ltd.). 19. B. E. Kahn, et al., IS&T 2000 International Symposium on Silver Halide Technology, 20; B. E. Kahn, et al., J. Imaging Sci. Technol., in press. 20. Photographic print elements containing cubical grain silver iodochloride emulsions, U.S. Pat. 5726005, 1998, B. T. K. Chen, J. L. Edwards, R. Lok, and S. H. Ehrlich (to Eastman Kodak Company). 21. Photographic print elements containing cubical grain silver iodochloride emulsions, U.S. Pat. 5728516, 1998, J. L. Edwards, B. T. Chen, and S. H. Ehrlich (to Eastman Kodak Company). 22. Cubical grain silver iodochloride emulsions and processes for their preparation, U.S. Pat. 5736310, 1998, B. T. K. Chen, J. L. Edwards, R. Lok, and S. H. Ehrlich (to Eastman Kodak Company); Composite silver halide grains and processes for their preparation, U.S. Pat. 5792601, 1998, J. L. Edwards, B. T. Chen, E. L. Bell, and R. Lok (to Eastman Kodak Company).
28. M. T. Olm, R. S. Eachus, W. G. McDugle, and R. C. Baetzold, Int. Symp. Silver Halide Technol., 2000, pp. 66–70. 29. R. S. Eachus, R. E. Graves, and M. T. Olm, J. Chem. Phys. 69, 4580–4587 (1978). 30. R. S. Eachus, R. E. Graves and M. T. Olm, Physica Status Solidi A 57, 429–38 (1980). 31. R. S. Eachus and M. T. Olm, Annu. Rep. Prog. Chem. Sect. C., Phys. Chem. 83(3), 3–48 (1986). 32. B. H. Carroll, Photogr. Sci. Engi. 24(6), 265–267 (1980). 33. Asami; Silver halide photographic emulsion containing a gold salt and a polyalkylene oxide, U.S. Pat. 3901711, 1975, K. E. Iwaosa and I. N. Seigo (to Mitsubishi Paper Mills, Ltd., JA); High contrast scanner photographic elements employing ruthenium and iridium dopants, U.S. Pat. 4828962, 1989, N. E. Grzeskowiak and K. A. Penfound (to Minnesota Mining and Manufacturing Company); Silver halide emulsions having improved low intensity reciprocity characteristics and processes of preparing them, U.S. Pat. 4997751, 1991, S. H. Kim (to Eastman Kodak Company).; Silver halide photographic emulsion prepared with silver halide grains formed in the presence of a water soluble iridium compound and a nitrogen-containing heterocyclic compound, U.S. Pat. 5134060, 1992, H. Maekawa, M. Kajiwara, M. Miyoshi, and M. Okumura (to Konica Corporation).; Selenium and iridium doped emulsions with improved properties, U.S. Pat. 5164292, 1992, B. R. Johnson and P. J. Wightman (to Eastman Kodak Company).; Silver halide emulsion and photographic material using same, U.S. Pat. 5166044, 1992, M. Asami (to Fuji Photo Film Co., Ltd.).; Silver halide photographic material which contains an iron dopant and substantially no silver iodide, U.S. Pat. 5204234, 1993, M. Asami (to Fuji Photo Film Co., Ltd.).
24. Silver halide photographic light-sensitive material, U.S. Pat. 5389508, 1995, S. Takada, M. Yagihara, H. Okamura, H. Kawamoto, and M. Kikuchi (to Fuji Photo Film Co., Ltd.).
34. Photographic emulsion containing transition metal complexes, U.S. Pat. 5474888, 1995, E. L. Bell (to Eastman Kodak Company).; Photographic emulsion containing transition metal complexes, U.S. Pat. 5480771, 1996, E. L. Bell (to Eastman Kodak Company).; Photographic emulsion containing transition metal complexes, U.S. Pat. 5500335, 1996, E. L. Bell (to Eastman Kodak Company).; Photographic silver halide emulsion containing contrast improving dopants, U.S. Pat. 5597686, 1996, G. L. Maclntyre and E. L. Bell (to Eastman Kodak Company).
25. Silver halide photographic light-sensitive material, U.S. Pat. 5389508, 1995, S. Takada, M. Yagihara, H. Okamura,
35. Preparation of silver halide emulsions containing iridium, U.S. Pat. 4902611, 1990, I. H. Leubner and W. W. White.
23. Process for the preparation of silver halide emulsions having iodide containing grains, U.S. Pat. 5736312, 1998, S. Jagannathan et al., (to Eastman Kodak Company).
PHOTOGRAPHIC COLOR DISPLAY TECHNOLOGY 36. Internally doped silver halide emulsions, U.S. Pat. 4835093, 1989, G. A. Janusonis, et al., (to Eastman Kodak Company); Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 4933272, 1990, W. G. McDugle, et al., (to Eastman Kodak Company).; Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 4937180, 1990, A. P. Marchetti, W. G. McDugle, and R. S. Eachus (to Eastman Kodak Company).; Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 4945035, 1990, J. E. Keevert, Jr., W. G. McDugle, and R. S. Eachus (to Eastman Kodak Company).; Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 4981781, 1991, W. G. McDugle, J. E. Keevert, and A. P. Marchetti (to Eastman Kodak Company).; Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 5037732, 1991, W. G. McDugle, A. P. Marchetti, J. E. Keevert, M. S. Henry, and M. T. Olm (to Eastman Kodak Company). 37. Silver halide emulsions with doped epitaxy, U.S. Pat. 5462849, 1995, T. Y. Kuromoto, et al., (to Eastman Kodak Company).
43. 44.
38. High chloride emulsions doped with combination of metal complexes, U.S. Pat. 6107018, 2000, J. Z. Mydlarz, E. L. Bell, and M. S. Graham (to Eastman Kodak Company). 39. Keevert et al.; Photographic emulsions containing internally modified silver halide grains, U.S. Pat. 4945035, 1990, J. E. Keevert, Jr., W. G. McDugle, and R. S. Eachus (to Eastman Kodak Company). 40. Epitaxially sensitized ultrathin tabular grain emulsions, U.S. Pat. 5494789, 1996, R. L. Daubendiek, et al., (to Eastman Kodak Company).; Ultrathin tabular grain emulsions with novel dopant management, U.S. Pat. 5503970, 1996, M. T. Olm, et al., (to Eastman Kodak Company).; Ultrathin tabular grain emulsions containing speed-granularity enhancements, U.S. Pat. 5503971, 1996, R. L. Daubendiek, et al., (to Eastman Kodak Company). 41. Research Disclosure, vol. 367, November 1994, Item 36736. 42. Research Disclosure, vol. 120, April, 1974, Item 12008; Research Disclosure, vol. 134, June, 1975, Item 13452; Photographic emulsion and material and process used in the preparation thereof, U.S. Pat. 1623499, 1927, S. E. Sheppard and R. F. Punnett (to Eastman Kodak Company, Rochester, NY); Process of manufacturing photographic silver halide emulsions and products obtained thereby, U.S. Pat. 1673522, 1928, O. Matties, P. Wulff, W. Dieterle, and B. Wendt (to I. G. Farbenindustrie Aktiengesellchaft).; Photographic materials, U.S. Pat. 2399083, 1946, C. Waller, R. B. Collins, and E. C. Dodd (to Ilford, Ltd.).; Photographic emulsions sensitized with salts of metals of group VIII of the periodic arrangement of the elements, U.S. Pat. 2448060, 1948, W. F. Smith and A. P. H. Trivelli (to Eastman Kodak Company).; Photographic silver halide emulsions sensitized with water-insoluble gold compounds, U.S. Pat. 2642361, 1953, R. E. Damschroder and H. C. Yutzy (to Eastman Kodak Company).; Synergistic sensitization of photographic systems with labile selenium and a noble metal, U.S. Pat. 3297446, 1967, J. S. Dunn (to Eastman Kodak Company, Rochester, NY); Stabilization of synergistically sensitized photographic systems, U.S. Pat. 3297447, 1967, P. A. McVeigh (to Eastman Kodak Company, Rochester, NY); Photographic emulsions containing certain silver-gelatin concentrations during chemical ripening, U.S. Pat. 3565633, 1971, G. H. Klinger and M. V. Cwikla (to GAF Corporation, New York, NY); Photographic element containing monodispersed, unfogged, dye sensitized silver halide grains metal ions sensitized internally and the use thereof in reversal process, U.S. Pat. 3761267, 1973, P. W. Gilman, Jr., R. G. Raleign, and
45.
46.
J. A. Merrigan (to Eastman Kodak Company, Rochester, NY); Silver halide grains and photographic emulsions, U.S. Pat. 3772031, 1973, C. R. Berry, S. J. Marino (to Eastman Kodak Company, Rochester, NY); Silver halide photographic emulsion sensitized with a heterocyclic compound containing 4-sulfur atoms, U.S. Pat. 3857711, 1974, R. Ohi, M. Sugiyama, Y. Nakajima, M. Horie, and S. Kobayashi (to Fuji Photo Film Co., Ltd., Ashigara-Kamigun, Kanagawa, JA); Silver halide emulsions and elements including sensitizers of adamantane structure, U.S. Pat. 3901714, 1975, E. N. Oftedahl (to Eastman Kodak Company, Rochester, NY); Phosphine sensitized photographic silver halide emulsions and elements, U.S. Pat. 3904415, 1975, E. N. Oftedahl (to Eastman Kodak Company, Rochester, NY); Preparation of photographic silver halide materials, UK Pat. 1315755, 1973, C. E. McBride.; Sensitive silver halide photographic materials, UK Pat. 1396696, 1975, M. J. Simons. Research Disclosure 37154. Chemical sensitization of photographic emulsions, U.S. Pat. 2518698, 1950, W. G. Lowe and J. E. Jones (to Eastman Kodak Company).; Chemical sensitization of photographic emulsions, U.S. Pat. 2739060, 1956, W. G. Lowe, J. E. Jones, and H. E. Roberts (to Eastman Kodak Company).; Chemical sensitization of photographic emulsions, U.S. Pat. 2743182, 1956, W. G. Lowe, J. E. Jones, and H. E. Roberts (to Eastman Kodak Company).; Chemical sensitization of photographic emulsions, U.S. Pat. 2743183, 1956, W. G. Lowe, J. E. Jones, and H. E. Roberts (to Eastman Kodak Company).; Chemical sensitization of photographic emulsions, U.S. Pat. 2983609, 1961, C. F. H. Allen and W. G. Lowe (to Eastman Kodak Company).; Silver halide photographic emulsions containing linear polyamine sensitizing agents, U.S. Pat. 3026203, 1962, V. C. Chambers, Jr. and A. E. Oberth (to E. I. Du Pont de Nemours and Co.).; Amine borane as fogging agent in direct positive, U.S. Pat. 3361564, 1968, J. H. Bigelow and C. R. Burt (to E. I. Du Pont de Nemours and Co.); Polyhedral borane fogged direct-positive silver halide emulsion containing and organic sulfoxide, U.S. Pat. 3782959, 1974, J. H. Bigelow (to E. I. Du Pont de Nemours and Co.); Method of manufacturing silver halide emulsion, U.S. Pat. 5254456, 1993, S. Yamashita, S. Takada, and S. Shibayama (to Fuji Photo Film Co., Ltd.).; Photographic element exhibiting improved speed and stability, U.S. Pat. 5399479, 1995, R. Lok (to Eastman Kodak Company).; Photographic sensitivity increasing alkynylamine compounds and photographic elements, U.S. Pat. 5413905, 1995, R. Lok, C. R. Preddy, and J. W. Harder (to Eastman Kodak Company).; Class of compounds which increases and stabilizes photographic speed, U.S. Pat. 5500333, 1996, J. N. Eikenberry, R. Lok, and C. Chen (to Eastman Kodak Company).; Silver halide photographic material and process for its preparation, Eur. Pat. 0 407 576, 1990, U. Shigeharu.; Silver halide photographic light-sensitive material, Eur. Pat. 0 552 650, 1993, O. Hirofumi; Oftedahl et al., Research Disclosure, vol. 136, August 1975, Item 13654. Photographic silver halide emulsions sensitized with water-insoluble gold compounds, U.S. Pat. 2642361, 1953, R. E. Damschroder and H. C. Yutzy (to Eastman Kodak Company). Chemical sensitization of photographic emulsions, U.S. Pat. 2521926, 1950, W. G. Lowe and J. E. Jones (to Eastman Kodak Company).; Polythiaalklyenediols as sensitizers for photographic silver halide emulsions, U.S. Pat. 3021215, 1962, J. L. R. Williams and B. C. 
Cossar (to Eastman Kodak Company).; Silver halide emulsions containing hexathiocane thiones as sensitizers, U.S. Pat. 4054457, 1977, J. H. Bigelow (to E. I .Du Pont de Nemours and Co.).
47. Method for stabilizing X-ray emulsions to red safelights, U.S. Pat. 3411914, 1968, C. Dostes (to Eastman Kodak Company, Rochester, NY); Stabilized photographic silver halide composition, U.S. Pat. 3554757, 1971, Y. Kuwabara and S. Sugita (to Konishiroku Photo Industry Co., Ltd.); 6-Amidinothio -5,7-Dihydroxy-s-triazolo[2,3-a]pyrimidino compounds as stabilizers for silver halide emulsion, U.S. Pat. 3565631, 1971, M. Oguchi, Y. Kuwabara, and K. Mogaki (to Konishiroku Photo Industry Co., Ltd.); Silver halide emulsions and elements including sensitizers of adamantane structure, U.S. Pat. 3901714, 1975, E. N. Oftedahl (to Eastman Kodak Company, Rochester, NY); Silver halide photographic lightsensitive material, U.S. Pat. 4897342, 1990, M. Kajiwara, M. Miyoshi, and K. Onodera (to Konishiroku Photo Industry Co., Ltd.).; Silver halide photographic emulsions, U.S. Pat. 4968595, 1990, S. Yamada, et al., (to Fuji Photo Film Co., Ltd.).; Process for preparing silver halide emulsion and silver halide X-ray photographic material containing said emulsion, U.S. Pat. 5114838, 1992, S. Yamada (to Fuji Photo Film Co., Ltd.).; Silver halide photographic emulsion, U.S. Pat. 5118600, 1992, S. Yamada and M. Kuwabara (to Fuji Photo Film Co., Ltd.).; Process of preparing for photographic use high chloride tabular grain emulsion, U.S. Pat. 5176991, 1993, C. G. Jones and T. L. Osborne-Perry (to Eastman Kodak Company).; Silver halide photographic material and method for processing the same, U.S. Pat. 5190855, 1993, I. Toya, R. Inoue, and T. Ito (to Fuji Photo Film Co., Ltd.).; Silver halide photographic material, Eur. Pat. 0 554 856, 1993, T. Hioki and R. Nishimura. 48. Light sensitive halide material with variable contrast, U.S. Pat. 3628960, 1971, H. A. Philippaerts, H. J. Corluy, H. Gernet, and G. E. W. Schulz (to Gevaert-AGFA N.V., Mortsel, Belgium); Sensitized high aspect ratio silver halide emulsions and photographic elements, U.S. Pat. 4439520, 1984, J. T. Kofron, et al., (to Eastman Kodak Company); Photographic element exhibiting reduced sensitizing dye stain, U.S. Pat. 4520098, 1985, R. E. Dickerson (to Eastman Kodak Company); Radiographic elements exhibiting reduced crossover, U.S. Pat. 4639411, 1987, R. L. Daubendiek, R. E. Dickerson, and J. E. Kelly (to Eastman Kodak Company); Method for manufacturing a silver halide emulsion, U.S. Pat. 4693965, 1987, M. Ihama and T. Tani (to Fuji Photo Film Co., Ltd.); Silver halide photographic material, U.S. Pat. 4791053, 1988, T. Ogawa (to Fuji Photo Film Co., Ltd.); High sensitivity light-sensitive silver halide photographic material with little stain, U.S. Pat. 4925783, 1990, I. Metoki, C. Honda, and Y. Wada (to Konica Corporation).; Color photographic recording material and a process for the preparation of a photographic silver halide emulsion, U.S. Pat. 5077183, 1991, H. Reuss, B. Mucke, and H. Kampfer (to Agfa Gevaert Aktiengesellschaft).; Process for producing silver halide photographic material, U.S. Pat. 5130212, 1992, I. Morimoto, K. Hattori, K. Ueda, and H. Ushiroyama (to Konica Corporation).; Method for preparing photographic emulsion, U.S. Pat. 5141846, 1992, K. E. Fickie and M. R. Lee (to Polaroid Corporation).; Silver halide light-sensitive photographic material, U.S. Pat. 5192652, 1993, M. Kajiwara and M. Miyoshi (to Konica Corporation).; Method of manufacturing silver halide emulsion and a color photographic material having the emulsion manufactured by the method, U.S. Pat. 5230995, 1993, M. 
Asami (to Fuji Photo Film Co., Ltd.).; Silver halide photographic light-sensitive material, U.S. Pat. 5238806, 1993, Y. Hashi (to Fuji Photo Film Co., Ltd.).; High sensitive silver halide photographic light-sensitive material, Eur. Pat. 0 354 798, 1990, K. Goan and T. Marui.; Silver halide color photographic material., Eur. Pat. 0 509 519, 1992, Y. Kashi.; Method for manufacturing a silver halide emulsion, Eur.
Pat. 0 533 033, 1993, M. Kubotera and M. Kajiwara.; Silver halide photographic light-sensitive material., Eur. Pat. 0 556 413, 1993, T. Marui.; A silver halide photographic emulsion and a photographic light-sensitive material, Eur. Pat. 0 562 476, 1993, Y. Suga, M. Kikuchi, H. Okamura, M. Yagihara, and H. Kawamoto. 49. Sensitized high aspect ratio silver halide emulsions and photographic elements, U.S. Pat. 4439520, 1984, J. T. Kofron, R. E. Booms, C. G. Jones, J. A. Haefner, H. S. Wilgus, and F. J. Evans (to Eastman Kodak Company); Novel silver halide crystals with two surface types, UK Pat. 2038792, 1980, E. F. Haugh. E. L. Kitts, and D. J. Mickewich (to Du Pont); Process for producing a silver halide photographic material, Eur. Pat. 0 302 528, 1989, H. Mifune and S. Takada. 50. Research Disclosure, item 36544, Section V. Spectral sensitization and desensitization. 51. P. B. Gilman, Jr., Photographic Science and Engineering, 18(4), 418–430 (1974). 52. E. J. Wall, Photographic Emulsions; their preparation and coating on glass, celluloid and paper, experimentally and on the large scale, American Photographic Publishing Co., Boston, 1929, p. 65; Prevention of dye wandering in photographic emulsions, U.S. Pat. 2735766, 1956, G. D. Hill (to Eastman Kodak Company).; Light sensitive halide material with variable contrast, U.S. Pat. 3628960, 1971, H. A. Philippaerts, H. J. Corluy, H. Gernet, and G. E. W. Schulz (to Gevaert-AGFA N.V.); Preprecipitation spectral sensitizing dye addition process, U.S. Pat. 4183756, 1980, D. J. Locker (to Eastman Kodak Company); Silver halide precipitation and methine dye spectral sensitization process and products thereof, U.S. Pat. 4225666, 1980, D. J. Locker and R. L. Daubendiek (to Eastman Kodak Company); Method for producing a silver halide photographic emulsion., Eur. Pat. App. 301508, 1989, T. Tani and T. Ikeda; Research Disclosure, vol. 181, May, 1979, Item 18155. 53. Light sensitive halide material with variable contrast, U.S. Pat. 3628960, 1971, H. A. Philippaerts, H. J. Corluy, H. Gernet, and G. E. W. Schulz (to Gevaert-AGFA N.V., Mortsel, Belgium); Controlled site epitaxial sensitization, U.S. Pat. 4435501, 1984, J. E. Maskasky (to Eastman Kodak Company); Sensitized high aspect ratio silver halide emulsions and photographic elements, U.S. Pat. 4439520, 1984, J. T. Kofron, et al., (to Eastman Kodak Company); Photographic element exhibiting reduced sensitizing dye stain, U.S. Pat. 4520098, 1985, R. E. Dickerson (to Eastman Kodak Company). 54. Silver halide photographic material, Eur. Pat. App. 287100, 1988, M. Asami, N. Ohshima and S. Ohtani (to Fuji Photo Film Co., Ltd.). 55. Production of photographic material, U.S. Pat. 2912343, 1959, R. B. Collins and S. F. W. Welford (to Ilford, Ltd.). 56. Photographic element exhibiting reduced sensitizing dye stain, U.S. Pat. 4520098, 1985, R. E. Dickerson (to Eastman Kodak Company). 57. Process for producing photographic emulsions, U.S. Pat. 3822135, 1974, T. Sakai, A. Sato, N. Yamamota, and S. Kondo (to Fuji Photo Film Co., Ltd., Ashigara-Kamigun, Kanagawa, JA). 58. Japanese published Patent Application (Kokai) 24185/71; Method of spectrally sensitizing photographic silver halide emulsions, U.S. Pat. 3469987, 1969, J. M. Owens, J. L. Graham, and E. A. Pilsworth (to Eastman Kodak Company, Rochester, NY). 59. ‘‘Farbkuppler–Eine Literature Ubersicht,’’ published in Agfa Mitteilungen, Band III, pp. 156–175 (1961).; Acylaminophenol photographic couplers, U.S. Pat. 2367531,
PHOTOGRAPHIC COLOR DISPLAY TECHNOLOGY 1945, I. F. Salminen, P. W. Vittum, and A. Weissberger (to Eastman Kodak Company).; Acylamino phenols, U.S. Pat. 2423730, 1947, I. F. Salminen and A. Weissberger (to Eastman Kodak Company).; 1-Naphthol-2-carboxylic acid amide couplers for color photography, U.S. Pat. 2474293, 1949, A. Weissberger, I. F. Salminen, and P. W. Vittum (to Eastman Kodak Company).; Diacyclaminophenol couplers, U.S. Pat. 2772162, 1956, I. F. Salminen and C. R. Barr (to Eastman Kodak Company).; Photographic color couplers containing fluoroalkylcarbonamido groups, U.S. Pat. 2895826, 1959, I. F. Salminen, C. R. Barr, and A. Loria (to Eastman Kodak Company).; Cyan color former for color photography, U.S. Pat. 3002836, 1961, P. W. Vittum and A. Weissberger (to Eastman Kodak Company).; Magenta-colored cyanforming couplers, U.S. Pat. 3034892, 1962, R. J. Gledhill and J. Williams (to Eastman Kodak Company).; Silver halide photographic material, U.S. Pat. 4883746, 1989, Y. Shimada, H. Fukuzawa, and S. Ichijima (to Fuji Photo Film Co., Ltd.). 60. Farbkuppler–Eine Literature Ubersicht, published in Agfa Mitteilungen, Band III, pp. 126–156 (1961).; Pyrazolone coupler for color photography, U.S. Pat. 2311082, 1943, H. D. Porter and A. Weissberger (to Eastman Kodak Company).; Pyrazolone coupler for color photography, U.S. Pat. 2343703, 1944, H. D. Porter and A. Weissberger (to Eastman Kodak Company).; Acylated amino pyrazolone couplers, U.S. Pat. 2369489, 1945, H. D. Porter and A. Weissberger (to Eastman Kodak Company).; Halogen-substituted pyrazolone couplers for color photography, U.S. Pat. 2600788, 1952, A. Loria, A. Weissberger, and P. W. Vittum (to Eastman Kodak Company, Rochester, NY); Photographic color couplers containing mono n- alkyl groups, U.S. Pat. 2908573, 1959, W. M. Bush, I. F. Salminen, and J. R. Thirtle (to Eastman Kodak Company).; Photographic emulsion containing pyrazolone magenta-forming couplers, U.S. Pat. 3062653, 1962, A. Weissberger, A. Loria, and I. F. Salminen (to Eastman Kodak Company).; Magenta-forming couplers, U.S. Pat. 3152896, 1964, R. J. Tuite (to Eastman Kodak Company, Rochester, NY); Silver halide emulsions containing a stabilizer pyrazolone coupler, U.S. Pat. 3519429, 1970, G. J. Lestina (to Eastman Kodak Company, Rochester, NY). 61. Farbkuppler–Eine Literature Ubersicht, published in Agfa Mitteilungen, Band III, pp. 112–126 (1961); Nondiffusing sulfonamide coupler for color photography, U.S. Pat. 2298443, 1942, A. Weissberger (to Eastman Kodak Company).; Colored couplers, U.S. Pat. 2407210, 1946, A. Weissberger, C. J. Kibler, and P. W. Vittum (to Eastman Kodak Company).; Benzoylacet-o-alkoxyanilide couplers for color photography, U.S. Pat. 2875057, 1959, F. C. McCrossen, P. W. Vittum, and A. Weissberger (to Eastman Kodak Company).; Yellow forming couplers, U.S. Pat. 3265506, 1966, A. Weissberger and C. J. Kibler (to Eastman Kodak Company, Rochester, NY); Silver halide emulsion containing two-equivalent yellow dyeforming coupler, U.S. Pat. 3447928, 1969, A. Loria (to Eastman Kodak Company, Rochester, NY) 62. Colored couplers, U.S. Pat. 2455169, 1948, D. B. Glass, P. W. Vittum, and A. Weissberger (to Eastman Kodak Company, Rochester, NY); Photographic color reproduction process and element, U.S. Pat. 3227551, 1966, C. R. Barr, J. Williams, and K. Whitmore (to Eastman Kodak Company).; 4-Acyloxy-5-pyrazolones, U.S. Pat. 3432521, 1969, A. 
Loria (to Eastman Kodak Company, Rochester, NY); Photographic silver halide elements containing two equivalent cyan couplers, U.S. Pat. 3476563, 1969, A. Loria (to Eastman Kodak Company, Rochester, NY); Twoequivalent couplers for photography, U.S. Pat. 3617291, 1971, G. W. Sawdey (to Eastman Kodak Company, Rochester,
NY); Silver halide emulsion containing acylamidophenol photographic couplers, U.S. Pat. 3880661, 1975, P. T. Lau, I. F. Salminen, and L. E. Beavers (to Eastman Kodak Company, Rochester, NY); Photographic silver halide emulsion containing 2-equivalent cyan coupler, U.S. Pat. 4052212, 1977, H. Deguchi, et al., (to Konishiroku Photo Industry Co., Ltd.); Dye image forming process, U.S. Pat. 4134766, 1979, S. Kikuchi, et al., (to Konishiroku Photo Industry Co., Ltd.); Two equivalent type colour couplers containing carbamoyloxy groups at their active positions and their use in photography, UK Pat. 1466728, 1977.; Photographic cyan colour couplers and silver halide materials, UK Pat. 1531927, 1978.; Photographic silver halide materials and developers containing couplers, UK Pat. 1533039, 1978.; Cyan couplers containing sulfonamide groups for use in photography, UK Pat. 2006755A, 1978, M. Yagihara and Y. Yokada.; Ketomethylene yellow dye-forming couplers, UK Pat. 2017704A, 1979, P. T. S. Lau. 63. Research Disclosure, Item 36544, Section VII. Antifoggants and stabilizers. 64. Photographic emulsion containing alkyl quaternary salts of thiazoles and the like as antifoggants, U.S. Pat. 2131038, 1938, L. G. S. Brooker and C. J. Staud (to Eastman Kodak Company).; Polymethylene-bis-benzothiazolium salts, U.S. Pat. 2694716, 1954, C. F. H. Allen and C. V. Wilson (to Eastman Kodak Company, Rochester, NY); Benzothiazolium compounds for controlling overdevelopment, U.S. Pat. 3342596, 1967, J. L. Graham (to Eastman Kodak Company, Rochester, NY); Silver halide emulsion containing an alkenyl benzothiazolium salt as stabilizer, U.S. Pat. 3954478, 1976, N. Arai and R. Ohi (to Fuji Photo Film Co., Ltd.); Quaternized tellurium salt fog inhibiting agents for silver halide photography, U.S. Pat. 4661438, 1987, R. Przyklek-Elling, W. H. H. Gunther, and R. Lok (to Eastman Kodak Company). 65. Photographic emulsion containing fog inhibitors, U.S. Pat. 2319090, 1943, S. E. Sheppard and W. Vanselow (to Eastman Kodak Company).; Improvers for photographic emulsions, U.S. Pat. 2403927, 1946, J. D. Kendall, D. J. Fry, and J. D. Brooks (to Ilford, Ltd.).; Antifoggant agents for photography, U.S. Pat. 3266897, 1966, K. C. Kennard and D. F. Reard on (to Eastman Kodak Company, Rochester, NY); Photographic emulsions containing mercapto development anti-foggants, U.S. Pat. 3397987, 1968, G. W. Luckey and B. D. Illingsworth (to Eastman Kodak Company, Rochester, NY); Photographic elements and processes lithographic silver halide element containing a 1. (amidophenyl)-5mercaptotetrazole sensitizing agent and development process of using same, U.S. Pat. 3708303, 1973, E. D. Salesin (to Eastman Kodak Company, Rochester, NY). 66. Manufacture of photographic gelatin emulsions, U.S. Pat. 2152460, 1939, E. J. Birr, H. Fricke, and G. Wilmanns (to Agfa Ansco Corp.); Stabilizers for photographic emulsions, U.S. Pat. 2444605, 1948, N. Heimbach and W. Kelly, Jr. (to General Aniline & Film Corporation); Tetraazaindene compounds and photographic application, U.S. Pat. 2933388, 1960, E. B. Knott (to Eastman Kodak Company); Photographic silver halide emulsions stabilized with tetraazaindene compounds, U.S. Pat. 3202512, 1965, L. A. Williams (to Eastman Kodak Company); Process for the production of photographic emulsions, U.K. Pat. 1338567, 1973, M. S. W. Nepker and K. D. Strube (to Wolfen Fotochemische Werke); Nouvelles e´ mulsions n´egatives aux halog´enures d’argent prot´eg´ees contre la d´esensibilisation par pression, Fr. Pat. 2296204, 1976, C. G. 
Dostes, D.C. Dubromel, and J.-F. Le Guen (to Kofak Pathe); Research Disclosure, vol. 134, June, 1975, Item 13452; Research Disclosure vol. 148, August 1976, Item 14851.
67. Stabilized photographic silver halide emulsions containing 5-thioctic acid, U.S. Pat. 2948614, 1960, C. F. H. Allen, B. D. Illingsworth, and J. J. Sagura (to Eastman Kodak Company).; Photographic emulsion stabilized with bis(pacylamidophenyl)disulfides, U.S. Pat. 3397986, 1968, A. G. Millikan and A. H. Herz (to Eastman Kodak Company).; Amido substituted divalent chalcogenide fog inhibiting agents for silver halide photography, U.S. Pat. 4607000, 1986, W. H. H. Gunther and R. Lok (to Eastman Kodak Company); Divalent chalcogenide fog inhibiting agents for silver halide photography, U.S. Pat. 4607001, 1986, R. Lok and W. H. H. Gunther (to Eastman Kodak Company); Cyclic dichalcogenide fog inhibiting agents for silver halide photography, U.S. Pat. 4861703, 1989, R. Lok, W. H. H. Gunther, and J. P. Freeman (to Eastman Kodak Company).; Silver halide emulsion containing an organic selenium compound antifogging agent, UK Pat. 1282303, 1973, R. Pollet and J. Willems.; Photographic emulsions, UK Pat. 1336570, 1973, J. C. Brown, D. J. Fry, and P. J. Keogh. 68. Photographic emulsion stabilized with bis(p-acylamidophenyl) disulfides, U.S. Pat. 3397986, 1968, A. G. Millikan and A. H. Herz (to Eastman Kodak Company). 69. Photographic emulsions containing mercury salts, U.S. Pat. 2728664, 1955, B. H. Carroll and T. F. Murray (to Eastman Kodak Company). 70. Method of preparing photographic silver halide emulsions, U.S. Pat. 3957490, 1976, M. J. Libeer and F. H. Claes (to Agfa-Gevaert N.V.). 71. Japanese Patent Publication No. 9503/69; Photographic developer containing a pyridinium salt and process of development, U.S. Pat. 2648604, 1953, L. G. Welliver and C. E. Johnson (to General Aniline & Film Corporation). 72. Japanese Patent Publication No. 9504/69; Silver halide developers containing polyethylene glycols, U.S. Pat. 2531832, 1950, W. A. Stanton (to E. I. Du Pont de Nemours and Co.).; Silver halide developer compositions containing polyoxyalkylene ethers of hexitol ring dehydration products, U.S.
Pat. 2533990, 1950, R. K. Blake (to E. I. Du Pont de Nemours and Co.).; Photographic element with colloid layer containing color former and nonionic wetting agent, U.S. Pat. 2577127, 1951, A. B. Jennings and O. W. Murray (to E. I. Du Pont de Nemours and Co.); Color developers containing polyethylene glycols, U.S. Pat. 2950970, 1960, J. A. Schwan and C. M. Spath (to Eastman Kodak Company).; Accelerators for reversal color development, U.S. Pat. 3201242, 1965, J. A. Schwan and J. R. Dann (to Eastman Kodak Company). 73. Light-sensitive photographic silver halide recording material, U.S. Pat. 4418142, 1983, H. Langen, E. Wolff, and E. Ranz (to Agfa Gevaert Aktiengesellschaft); Silver halide color photographic material, U.S. Pat. 4618573, 1986, H. Okamura, H. Kobayashi, T. Nishikawa, and K. Tamoto (to Fuji Photo Film Co., Ltd.); Silver halide color photographic material, U.S. Pat. 4863841, 1989, H. Okamura, H. Kobayashi, T. Nishikawa, and K. Tamoto (to Fuji Photo Film Co., Ltd.).; Method and composition for hardening gelatin, U.S. Pat. 4877724, 1989, C. Y. Chen, E. E. Riecke, K. G. Harbison, and D. D. Chapman (to Eastman Kodak Company).; Photographic recording material, U.S. Pat. 5009990, 1991, R. Scheerer and F. Moll (to Agfa-Gevaert Aktiengesellschaft).; Method and composition for hardening gelatin, U.S. Pat. 5236822, 1993, E. E. Riecke, D. D. Chapman, C. Y. Chen, and K. G. Harbison (to Eastman Kodak Company). 74. R. W. G. Hunt, The Reproduction of Colour, 4e, 1987, pp. 306–307. 75. H. C. Bozenhard, B. Narayan, IS&Ts 11th Int. Symp. Photofinishing Technol., 2000, pp. 86–89. 76. M. Miyoshi and H. Nakatsugawa, IS&Ts 11th Int. Symp. Photofinishing Technol., 2000, pp. 59–60. 77. Research Disclosure, vol. 308, December 1989, Item 308119, Section VII. Color materials, paragraph H, and Research Disclosure, Item 36544, Section X. Dye image formers and modifiers, Subsection A. Silver dye bleach.
RF MAGNETIC FIELD MAPPING

JOSEPH P. HORNAK
Rochester Institute of Technology
Rochester, NY

INTRODUCTION

Radio-frequency (RF) magnetic field mapping is the imaging of oscillating magnetic fields in space. The title of this article implies RF mapping; however, the technique is applicable both to higher frequency microwaves and to lower audio frequency electromagnetic waves. At a fixed point in space, the magnetic field, B(t), associated with these waves varies sinusoidally with respect to time t at frequency ν according to the general wave equation

B(t) = B1 sin(2πνt),     (1)

where B1 is the amplitude factor. It is this amplitude factor, B1, that is mapped or imaged in one, two, or three spatial dimensions.

Five mapping techniques are presented in this article. They were developed for mapping the oscillating magnetic fields (standing waves) associated with inductive–capacitive–resistive (LCR) circuits, such as cavity resonators and resonant coils used in magnetic resonance spectroscopy and imaging. LCR circuits store energy alternately in the capacitive element as electrical fields and in the inductive element as magnetic fields. The oscillating magnetic fields form a standing wave within the volume of the inductor. The electrical fields are either zero or separated in space from the magnetic fields, and hence generally are not detected in these circuits. This point is worth noting because some of the techniques presented in this article cannot distinguish oscillating magnetic and electrical fields. Figure 1 is a schematic representation of a multiturn solenoid LCR circuit and the direction of B1 at the center of the solenoid.

Because of the magnetic resonance application of these techniques, much of the magnetic resonance symbolism has been adopted, and some basic magnetic resonance theory will be presented. The reader is directed to the references for more detailed information on nuclear magnetic resonance (NMR) (1,2) and electron paramagnetic resonance (EPR) (3), and to the articles in this encyclopedia on TOMOGRAPHIC IMAGE FORMATION TECHNIQUES and ELECTRON PARAMAGNETIC RESONANCE (EPR) IMAGING.

The mapping techniques are divided into two general categories: probe and imaging techniques. Probe techniques produce field values at discrete points in space that can be used to generate a B1 map. Imaging techniques produce an image directly. The method depends on the proportionality of the intensity of a pixel in this image to the magnetic field at the corresponding point in space. Although B1 in tesla can be calculated from any of the techniques presented in this section, it is often reported in relative units.

PROBE TECHNIQUES

Probe techniques are based on a measurement that is made from a small probe placed in the oscillating magnetic field. The probe is moved to specific points in space, typically on a predetermined grid, and measurements are made. The measurements represent the field at specific points in space. These measurements are combined to produce a contour map or image of the B1 field. The spatial resolution of probe techniques is limited by the size of the probe, the spatial density of measurements, and the accuracy of the positioning device.
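To make the probe procedure concrete, the following short Python sketch (an illustration added here, not part of the original article; the field values and grid size are invented) shows one simple way to combine discrete probe readings taken on a predetermined grid into a two-dimensional B1 map that can then be displayed as a contour plot.

import numpy as np

def assemble_b1_map(readings, nx, ny):
    """Place probe readings (ix, iy, b1) taken on an nx-by-ny grid into a 2-D array.

    ix, iy are the grid indices reported by the positioning device and b1 is the
    field value measured at that point (in tesla or in relative units).
    Grid points that were never visited are returned as NaN.
    """
    b1_map = np.full((ny, nx), np.nan)
    for ix, iy, b1 in readings:
        b1_map[iy, ix] = b1
    return b1_map

# Hypothetical 5 x 5 scan of a resonator midplane with a field peaked at the center
readings = [(ix, iy, np.exp(-((ix - 2) ** 2 + (iy - 2) ** 2) / 4.0))
            for ix in range(5) for iy in range(5)]
print(np.round(assemble_b1_map(readings, nx=5, ny=5), 2))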
Method of Perturbing Spheres

The method of perturbing spheres (4,5) is useful for measuring the absolute B1 inside a resonant circuit such as a microwave cavity perfectly matched to a frequency source. This technique requires a mechanism for precisely measuring the resonant frequency of the cavity, as well as the full frequency width (Δν) at half-power of the absorption signal from the cavity. A gold or silver conductive sphere of radius r is supported by a thin, low-dielectric, nonconducting rod (typically glass or Teflon). The sphere can be made of any conductive material; however, gold or silver is typically used to minimize resistive losses due to the sphere and to improve sensitivity. The sphere is placed inside the resonator, and the B1 field in tesla at the location of the sphere is given by Eq. (2), where W is the power lost in the cavity and µ is the permeability constant of the material in the cavity:
B1 = (1/2) [ ((ν² − νo²)/νo²) (µW/(π² Δν r³)) ]^(1/2).     (2)
The sphere perturbs the resonant frequency νo of the cavity to frequency ν. Therefore, mapping ν allows calculating B1 . The B1 sensitivity of this technique is limited by the frequency resolution of the device used to monitor
Figure 1. A schematic depiction of an LCR circuit and the direction of the B1 field within the inductor.
the resonant frequency of the cavity. The method of perturbing spheres has been applied to map the B1 fields of X-band microwave cavity resonators (4) and loop-gap resonators (6) that have fields of view less than 2.5 cm. The technique is applicable to larger size, lower frequency resonators as well. The method of perturbing spheres has recently been applied to map the RF fields from large-volume, low-frequency ESR and Overhauser MRI resonators (7,8). Note that the method of perturbing spheres is sensitive to both electrical and magnetic field components. The technique gives an accurate mapping of the magnetic fields only if the electrical and magnetic field components are well separated in space. Otherwise, the use of small conductors that have variable shapes is required to distinguish the contributions of the two components.

Pickup Coil Technique

Pickup coil techniques (9–16) measure a B1 field by detecting the current induced in a coil of wire placed in an oscillating magnetic field (Fig. 2). A small N-turn coil of wire of radius r is placed in an oscillating magnetic field of frequency ν. The average B1 within the pickup coil and perpendicular to the plane of the coil is related to the maximum voltage Vmax produced in the coil by Eq. (3):

B1 = Vmax/(2π²r²νN).     (3)
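The two probe relations above are simple enough to evaluate directly. The Python sketch below is only an illustration (the numerical inputs are invented, and Eq. (2) is used in the form reconstructed above); it returns B1 in tesla for a perturbing-sphere measurement and for a pickup-coil measurement.

import numpy as np

MU_0 = 4 * np.pi * 1e-7  # permeability of free space, T·m/A

def b1_perturbing_sphere(nu, nu_o, delta_nu, w, r, mu=MU_0):
    """Eq. (2): B1 at the sphere from the shifted resonance nu, the unperturbed
    resonance nu_o, the half-power width delta_nu, the power W lost in the
    cavity, and the sphere radius r (SI units throughout)."""
    return 0.5 * np.sqrt(((nu**2 - nu_o**2) / nu_o**2)
                         * mu * w / (np.pi**2 * delta_nu * r**3))

def b1_pickup_coil(v_max, r, nu, n_turns):
    """Eq. (3): average B1 perpendicular to an N-turn coil of radius r that
    develops a maximum voltage Vmax at frequency nu."""
    return v_max / (2 * np.pi**2 * r**2 * nu * n_turns)

# Invented example values, shown only to illustrate the order of magnitude
print(b1_perturbing_sphere(nu=9.2003e9, nu_o=9.2000e9, delta_nu=1.0e6, w=0.1, r=1.0e-3))
print(b1_pickup_coil(v_max=5.0e-3, r=2.5e-3, nu=64.0e6, n_turns=5))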
The orientation of the loop may be varied, and therefore, the orthogonal components of B1 may be measured. The B1 sensitivity is limited by the signal-to-noise ratio (SNR) in the voltage monitoring system. The smaller the coil, the better the spatial resolution, but the poorer the SNR and hence the B1 sensitivity. This technique has been applied to 2 × 10³ cm³ imaging coils used in magnetic resonance imaging (11). The technique may also be used to map the fields in nonresonant devices and in free space, such as those from radar antennae.

Figure 2. A schematic representation of a pickup coil oriented for maximum signal pickup in the B1 field from a resonator.

NMR Technique

The simplest magnetic resonance experiment, or pulse sequence as it is more commonly called, consists of a one-pulse sequence (17). In this pulse sequence, the net magnetization from the sample is rotated by an angle θ away from its equilibrium position along the +Z axis. The rotational angle is given by

θ = 2πγB1τ,     (4)
where γ is the gyromagnetic ratio for the signal-bearing nucleus in (sT)⁻¹ and τ is the width in seconds of the RF pulse of amplitude B1 tesla. After rotation, the XY component of the magnetization precesses about the direction of the applied static magnetic field, which coincides with the +Z axis. The signal induced in the resonator from the XY magnetization as it rotates about +Z is called free induction decay (FID). The Fourier transform of the FID is a frequency domain spectrum of the sample. Recording a series of frequency domain spectra as a function of τ yields maxima in the signal when

θ = n · 90°,     (5)

where n is an odd integer. Knowing the τ value when θ = 90° allows calculating B1. This is a rather tedious but effective process if an exact B1 value is needed. The B1 sensitivity is limited by the ability to determine the maximum in the signal accurately as a function of τ. A technique similar to this one, but using a spin-echo pulse sequence, has been used to map the B1 fields in NMR coils (18). Another variation on this technique is called the rotating-frame technique (19). A series of NMR spectra are recorded at different τ values. From Eq. (4), the difference in τ values, Δτ, between successive maxima is (γB1)⁻¹. In all NMR techniques, the timing of the pulses in the sequence must be chosen to minimize effects from spin relaxation.

IMAGING TECHNIQUES

Imaging techniques require using a magnetic resonance imager and are limited to imaging the B1 field from an imaging coil or resonator connected to the imager. (The reader should refer to the article in this encyclopedia on MAGNETIC RESONANCE IMAGING (MRI) for more detailed information on the magnetic resonance imaging process and signal.) The space in which the B1 field is being measured must be filled by a nuclear magnetic resonance signal-bearing substance. This procedure implies that the fields that are being mapped are those in the signal-bearing substance. Some of the techniques presented require a homogeneous substance. Aqueous paramagnetic solutions; pure gels of gelatin, agar, polyvinyl alcohol, silicone, polyacrylamide, or agarose; organic doped gels; paramagnetically doped gels; and reverse micelle solutions have been proposed as hydrogen MRI phantoms (20). Each of these materials has its own application-specific advantages and disadvantages.

When the material is water, the B1 images can show a standing wave artifact. This artifact manifests itself as a stronger field at the locations of the standing waves. The artifact is most pronounced when the size of the object approaches
dimensions greater than half the wavelength of the radiation in water. The index of refraction of water is approximately 9 at 63 MHz; therefore, the wavelength of this frequency in water is 53 cm. Materials that have high conductivity produce skin-effect artifacts. The B1 field at the surface of a sphere of radius r filled by a conducting solution of resistivity ρ and permeability µ is B1. The field at any location x, measured from the surface along a radius of the sphere, is given by

B1(x) = B1 e^(−x/δ),     (6)

where δ is the skin depth of the photons of frequency ν in the solution:

δ = [ρ/(πνµ)]^(1/2).     (7)

At 63 MHz, the skin depth in normal saline is ∼6 cm. Solutions that minimize the conductivity and index of refraction, such as reverse micelle solutions of a water (8 mM NiCl2)–AOT–decane solution, where AOT is the surfactant bis(2-ethylhexyl)sulfosuccinate sodium salt, have been proposed to minimize these artifacts (20).

The imager produces an image or series of images, where the intensity of any pixel is an assumed function of B1. Both the rotational and direct mapping techniques described below are affected by inhomogeneities in the static magnetic field Bo and nonlinearities in the gradient fields from the imager. The spatial resolution of imaging techniques is limited by the voxel size in the image. This voxel size is determined by the selected tomographic slice thickness (Thk), the number of points in the image matrix (NPTS1 × NPTS2), and the field of view (FOV). The dimensions of a voxel are Thk × FOV1/NPTS1 × FOV2/NPTS2. In practice, the SNR limits the voxel size.
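These artifact criteria are easy to check numerically. The Python fragment below is an added illustration (the saline resistivity is an assumed value, not given in the article); it evaluates the wavelength in water from the quoted index of refraction and the skin depth of Eq. (7), and applies Eq. (6) to estimate the fall-off of B1 below the surface of a conducting phantom.

import numpy as np

C = 2.998e8              # speed of light in vacuum, m/s
MU_0 = 4 * np.pi * 1e-7  # permeability, T·m/A (water and saline are essentially nonmagnetic)

def wavelength(freq, n_index):
    """Wavelength of the radiation inside a medium of refractive index n_index."""
    return C / (n_index * freq)

def skin_depth(rho, freq, mu=MU_0):
    """Eq. (7): skin depth for resistivity rho (ohm·m) at frequency freq (Hz)."""
    return np.sqrt(rho / (np.pi * freq * mu))

def b1_at_depth(b1_surface, x, delta):
    """Eq. (6): B1 a distance x below the surface of a conducting sphere."""
    return b1_surface * np.exp(-x / delta)

print(wavelength(63e6, 9))            # ~0.53 m, the 53 cm quoted above
delta = skin_depth(0.9, 63e6)         # rho ~ 0.9 ohm·m assumed for normal saline
print(delta)                          # ~0.06 m, consistent with the ~6 cm quoted above
print(b1_at_depth(1.0, 0.03, delta))  # relative B1 three centimeters below the surface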
Rotational Mapping

This technique is an imaging version of the probe NMR technique mentioned earlier (21). Instead of a point source sample, a volume sample is used. Images are recorded as a function of a series of rotational angles. The period of the oscillation in the signal as a function of the rotational angle is used to calculate B1. In this technique, B1 sensitivity is limited by the ability to measure the period of the oscillations in the signal intensity accurately.

A variation on this technique applies only a high-power pulse that rotates some of the magnetization by thousands of degrees. A single image of a spatially uniform signal-bearing sample will display a series of dark and bright stripes. Dark stripes correspond to θ values where n of Eq. (5) is even, whereas bright stripes correspond to odd values of n. If the region of minimum B1 is known, B1 can be calculated from Eqs. (4) and (5). This single-image technique is not foolproof, because there is no a priori way to distinguish between a decreasing B1 trend and an increasing trend.

An example of this technique is depicted in Fig. 3. An 18-cm diameter sphere full of 14 mM NiCl2 was imaged in an inhomogeneous B1 field from a 63-MHz magnetic resonance imaging coil. The original hydrogen magnetic resonance image shows bright bands of ∼90° and ∼270° rotational angles, and dark bands of ∼0° and ∼180° rotation. As the maximum rotational angle in the image increases, more bands are observed. Therefore, the accuracy of the technique increases when higher power RF or longer pulses are applied to the coil.

Figure 3. A magnetic resonance image (a) of an 18-cm diameter sphere full of 14 mM NiCl2 in an inhomogeneous B1 field, and (b) a rotational angle map (with permission from Ref. 22).

Direct Mapping

The amount of transverse magnetization and hence the signal S produced by a one-pulse imaging sequence is proportional to the sine of the rotational angle (13). Substituting Eq. (4) for the rotational angle θ,

S ∝ sin(2πγτB1).     (8)

Therefore, measuring S as a function of B1 for a few values on the sine curve will allow calculating B1. This general procedure is more commonly implemented by using a spin-echo imaging sequence (13,23–26). A spin-echo sequence consists of two RF pulses and a signal called the echo. The echo time, TE, is the time between the first RF pulse and the maximum in the echo. The two RF pulses rotate the magnetization by θ and then 2θ a time TE/2 later. The second pulse samples the magnetization present TE/2 after the first. The optimum signal is at θ = 90°. The signal for this
sequence is proportional to sin³θ, or, substituting Eq. (4) for θ,

S ∝ sin³(2πγτB1).     (9)

Images are taken at several B1 values near θ = 90°. Corresponding pixels in the images are fit with Eq. (9), and B1 is determined. This process is repeated for every pixel in the image. Figure 4 displays the B1 fields from a quadrature style bird cage coil (27) on a 1.5 tesla magnetic resonance imager. The fields were measured inside 27-cm diameter spheres full of 14 mM aqueous NiCl2; a reverse micelle solution of water, AOT (a surfactant), and decane; and 154 mM aqueous NaCl containing 6 mM NiCl2. These three solutions exemplify a problem associated with the signal-bearing substance in a phantom. The inverted parabolic shape of the field in the phantom full of 14 mM aqueous NiCl2 is due to the standing wave artifact. The parabolic shape of the field in the phantom full of 154 mM aqueous NaCl and 6 mM NiCl2 is primarily due to the conductivity
or skin-effect artifact associated with solutions of high electrical conductivity. The deviation from a perfect parabola is also due to the presence of a small standing wave artifact. The shape of the field in the phantom full of a reverse micelle solution more accurately represents the B1 field in the imaging coil. The accuracy is improved because the standing wave and conductivity artifacts are minimized by the dielectric constant and conductivity of the reverse micelle solution (20). Similar variations in B1 have been reported for elliptical phantoms (28).

The direct mapping technique does not require filling the resonator or coil with a homogeneous solution. In fact, B1 fields have been measured inside the human head using this technique (29). Other imaging techniques have been proposed. The reader is directed to the scientific literature for information on them.
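As a sketch of how the per-pixel fit of Eq. (9) can be carried out, the following Python fragment (an added illustration; the simulated signal, noise level, and field value are invented) fits the spin-echo signal of a single pixel recorded at several pulse widths τ and recovers B1 for that pixel. A full map repeats the same fit for every pixel in the image.

import numpy as np
from scipy.optimize import curve_fit

GAMMA_H = 42.58e6   # hydrogen gyromagnetic ratio in (s·T)^-1, the convention of Eq. (4)

def spin_echo_signal(tau, b1, scale):
    """Eq. (9): spin-echo signal versus pulse width tau for field amplitude b1."""
    return scale * np.sin(2 * np.pi * GAMMA_H * tau * b1) ** 3

# Simulate one pixel: an assumed true B1 of 12 microtesla, sampled around the 90-degree pulse
true_b1 = 12e-6                          # tesla
taus = np.linspace(0.3e-3, 0.7e-3, 9)    # pulse widths in seconds
rng = np.random.default_rng(1)
signal = spin_echo_signal(taus, true_b1, 1.0) + 0.02 * rng.standard_normal(taus.size)

# Initial guess from Eq. (4): the middle tau is assumed to be near the 90-degree condition
tau_mid = taus[taus.size // 2]
p0 = (1.0 / (4 * GAMMA_H * tau_mid), 1.0)
(b1_fit, scale_fit), _ = curve_fit(spin_echo_signal, taus, signal, p0=p0)
print(b1_fit)   # recovered B1 in tesla, close to the simulated 12e-6 for this pixel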
ABBREVIATIONS AND ACRONYMS

AOT    the surfactant sodium bis(2-ethylhexyl)sulfosuccinate
B      magnetic field
B1     amplitude of the RF magnetic field
δ      skin depth
Δν     difference in frequency
EPR    electron paramagnetic resonance
FID    free induction decay
FOV    field of view
γ      gyromagnetic ratio
LCR    inductive-capacitive-resistive
MRI    magnetic resonance imaging
MHz    megahertz
mM     millimolar concentration
n      an integer
N      number of turns
ν      frequency
NMR    nuclear magnetic resonance
NPTS   number of points
NiCl2  nickel chloride
r      radius
RF     radio frequency
SNR    signal-to-noise ratio
τ      pulse width
θ      rotation angle
TE     echo time
Thk    slice thickness
V      voltage
W      power
Figure 4. Images of the B1 fields in a quadrature style bird cage coil on a 1.5 tesla magnetic resonance imager. The fields were measured inside 27-cm diameter spheres full of (a) 14 mM aqueous NiCl2; (b) a reverse micelle solution of water, AOT (a surfactant), and decane; and (c) 154 mM aqueous NaCl and 6 mM NiCl2. Red lines are plots of the B1 field along a line drawn through the center of the sphere (15) (with permission from Ref. 22). See color insert.

BIBLIOGRAPHY
1. C. P. Slichter, Principles of Magnetic Resonance, Springer-Verlag, NY, 1980.
2. E. Fukushima and S. B. W. Roeder, Experimental Pulse NMR, A Nuts and Bolts Approach, Addison-Wesley, Reading, MA, 1981.
3. C. P. Poole Jr., Electron Spin Resonance, Wiley, NY, 1983.
4. J. H. Freed, D. S. Leniart, and J. S. Hyde, J. Chem. Phys. 47, 2762–2773 (1967).
5. E. L. Ginzton, Microwave Measurements, McGraw-Hill, NY, 1957, p. 448.
6. J. P. Hornak and J. H. Freed, J. Magn. Resonance 67, 501–518 (1986).
7. M. Alecci, M. Ferrari, and C. L. Ursini, Biophys. J. 67, 1274 (1994).
8. M. Alecci et al., Phys. Med. Biol. 43, 1899–1906 (1998).
9. D. I. Hoult and P. C. Lauterbur, J. Magn. Resonance 34, 425–433 (1979).
10. P. Mansfield and P. G. Morris, in J. Waugh, ed., Advances in Magnetic Resonance, Suppl. 2, Academic Press, NY, 1982.
11. C. Chen and D. J. Hoult, Biomedical Magnetic Resonance Technology, IOP, Bristol, 1989.
12. L. Fakri-Bouchet, C. Lapray, and A. Briguet, Meas. Sci. Technol. 9, 1641–1646 (1998).
13. J. P. Hornak, J. Szumowski, and R. G. Bryant, Magn. Resonance Med. 6, 158–163 (1988).
14. J.-C. Ginefri, E. Durand, and L. Darrasse, Rev. Sci. Instrum. 70, 4730–4731 (2001).
15. L. Darrasse and G. Kassab, Rev. Sci. Instrum. 64, 1841–1844 (1993).
16. A. Leroy-Willig, L. Darrasse, J. Taquin, and M. Sauzade, Magn. Resonance Med. 2, 20–28 (1985).
17. J. P. Hornak, The Basics of NMR, 2000, http://www.cis.rit.edu/htbooks/nmr/.
18. D. I. Hoult, C.-N. Chen, and V. J. Sank, Magn. Resonance Med. 1, 339–353 (1984).
19. J. Murphy-Boesch, G. J. So, and T. L. James, J. Magn. Resonance 73, 293–303 (1987).
20. J. E. Roe, W. E. Prentice, and J. P. Hornak, Magn. Resonance Med. 35, 136–141 (1996).
21. J. Murphy-Boesch, G. J. So, and T. L. James, J. Magn. Resonance 73, 293–303 (1987).
22. J. P. Hornak, The Basics of MRI, 2001, http://www.cis.rit.edu/htbooks/mri/.
23. Method of mapping the RF transmit and receive field in an NMR system, U.S. Pat. 5,001,428, 1991, J. K. Maier and G. H. Glover.
24. E. K. Insko and L. Bolinger, J. Magn. Resonance Series A 103, 82–85 (1993).
25. G. H. Glover et al., J. Magn. Resonance 64, 255–270 (1985).
26. G. J. Barker, S. Simmons, S. R. Arridge, and P. S. Tofts, Br. J. Radiol. 71, 59–67 (1998).
27. C. E. Hayes et al., J. Magn. Resonance 63, 622–628 (1985).
28. J. S. Sled and G. Bruce, IEEE Trans. Med. Imaging 17, 653–662 (1998).
29. X. Li, Tissue parameter determination with MRI in the presence of imperfect radiofrequency pulses, M.S. Thesis, Rochester Institute of Technology, 1994.
SCANNING ACOUSTIC MICROSCOPY
BERNHARD R. TITTMANN
CHIAKI MIYASAKA
The Pennsylvania State University
University Park, PA

INTRODUCTION

This article is an overview of scanning acoustic microscopy, which is one of the more advanced ultrasonic imaging technologies. The scanning acoustic microscope (hereinafter called simply 'SAM') uses ultrasound to produce enlarged images of the microscopic structures of materials. In other words, the SAM is an instrument that subjects an object to ultrasound and detects the variations of the elastic properties of the object. The elastic properties are determined by the molecular arrangement of the material, molecule size, intermolecular force, or the like, and each material has its unique characteristics. Accordingly, each object reacts in its own peculiar way when subjected to ultrasonic waves. The waves received from the object can be converted into an image. Because ultrasonic waves can penetrate into an object's interior, the surface and also the interior can be visualized. The contents of this article are as follows: First, the need in industry for this process is described. Second, the history of the development of the SAM is illustrated in brief. Third, the basic theory of ultrasonic imaging used in the SAM is depicted, as well as the basic theory of the implements used in the SAM. Furthermore, the V(z) curve method for material characterization is described. Selected images are given for a better understanding of the sections.

WHY SAM?

Recently, corresponding to growth in the development of material science and technology, the requirements for quality assurance of materials have become more stringent, especially in aerospace components and microelectronics. It is sometimes difficult for conventional materials to meet these new requirements completely. In this case, the use of new materials (composite materials, coated materials, thin films, metal alloys) is often discussed. However, the discussion comes to a close if no data for securing the reliability and safety of the new materials exist. Generally speaking, when evaluating a material, test pieces for life estimation, quality, and safety analyses are prepared. Data showing material characteristics are typically collected by optical microscopes and scanning electron microscopes through destructive tests such as pull, compression, bending, torsion, creep, fatigue, impact, and corrosion. However, critical defects such as voids, delaminations, and debonding sites of the new materials often exist in the interior. Therefore, conventional microscopes cannot observe them directly. Hence, defects must be exposed by cutting or grinding the specimens in layers, which causes the following problems.

1. The shape of the defects might be changed when cutting or polishing the specimens.
2. There is no guarantee that all defects are exposed.
3. Certain specimens are difficult to cut or grind.
4. When the sizes of the defects are less than 10 µm, it is difficult to see their 3-D distributions by grinding in layers.

The SAM is also very useful in the biomedical field where many new treatment methods and new medications have been developed. Obviously, their effects and side effects need to be studied qualitatively and quantitatively. Optical microscopes are often used for observing such data from tissues or cells taken from patients as specimens. However, the tissues or the cells need to be chemically stained for optical microscopic observation. These stains usually kill the cells. Therefore, when the stains are applied, it is difficult to understand the real effects of the treatment or the medicine because the transformation in the tissue may only be visible in living cells. Furthermore, when measuring the elastic properties of tissues by conventional techniques, large volumes of the tissues are required. It is not realistic to take such amounts from patients suffering from diseases. It has been very popular to visualize internal structures of opaque objects nondestructively and to obtain the acoustic characteristics of specimens by instruments using the basic features of ultrasound (reflection, transmission, refraction, and diffraction). However, the frequencies of the ultrasound used for these instruments range typically from 20 kHz to 10 MHz, limiting the resolution of the process. Furthermore, most of their transducers do not have features to focus ultrasound so that the diameters of the ultrasonic beams would be small enough to form an image that has the resolution and/or the contrast of optical microscopes. In addition, traditional ultrasonic imaging cannot provide data on the elasticity of the small area of a specimen. Therefore, scanning acoustic microscopy is a viable nondestructive method that has high-resolution defect visualization to determine elastic parameters in both the medical and the industrial fields. Because the SAM uses acoustic waves, it can penetrate nondestructively into optically opaque materials. In addition, the contrast mechanism of the SAM depends only on the reflection from objects that have different acoustic properties and, thus, chemical agents for staining the tissues or the cells are not necessary, so that living tissues or cells
may be clearly observed. Because the beam is focused (up to submicrometer size) and has a high frequency (10 MHz to 3 GHz), the resolution of the system is of the order of optical microscopes. In addition to image formation, the SAM can detect the amplitude and the phase of a reflected wave. By analyzing them, the elastic properties of a specimen can be quantitatively determined. BRIEF HISTORY OF SAM In 1949, Sokolov suggested the possibilities of SAM and indicated the advantages such a technology would have compared to conventional optical light microscopes. He made a remarkable statement in his article that a microstructure of a material could be microscopically visualized when using ultrasounds that have a frequency close to 3 GHz or more (1). However, the realization of a practical SAM was delayed until the development of electronic imaging, high-frequency processing, and an acoustic lens. In 1973, Lemons and Quate developed a mechanically scanning acoustic transmission microscope, including two acoustic lenses (one for emitting focused ultrasound into a specimen and another for receiving the ultrasounds transmitted from the specimen) and a RF generator that sends a high frequency of 1 GHz or more (2). For this type, a specimen is located between the lenses, so that the thickness of the specimen is limited. To overcome the disadvantage, the reflection mode of a mechanically scanning acoustic microscope that had one acoustic lens was developed (3). This type of SAM has been widely accepted as standard hardware in industries because of its simple system configuration and capability of forming a highly resolved image. In 1979, Weglein found that a change in the voltage of an acoustic lens, which is the so-called V(z) curve, when defocusing the acoustic lens toward a specimen, is unique to the elastic properties of the specimen (4). This phenomenon was modeled with ray tracing by Parmon and Bertoni (5), and with Fourier optics by Atalar (6). They also, found that the velocity of surface acoustic waves traveling within the spot of an ultrasonic beam could be obtained by measuring the distance between the periods of the V(z) curves. By the establishment of these basic theories, the SAM became the accepted apparatus for imaging and also for quantitative data acquisition. After its development, the SAM was modified or newly designed to improve the resolution, contrast, penetration depth, and capability of quantitative data acquisition, as well as to design the SAM for a specific application. Resolution The resolution of an image formed by the SAM is determined by the frequency of the wave, the velocity of the coupling medium, and the lens geometry. Therefore, three approaches were pursued to increase the resolution. The first approach was to raise the acoustic frequency. By this approach, Hadimiooglu and Quate achieved a resolution of 0.2 µ by operating at about 4.4 GHz, and the
specimen was in boiling water to minimize the attenuation of the coupling medium (7). The second approach was to use low-velocity coupling media. Using the second approach, referred to as cryogenic microscopy, Foster and Rugar were able to obtain a resolution of 20 nm using superfluid liquid helium as a coupling medium at a temperature of 0.2 ° K (8). These approaches are excellent for increasing resolution on the surface of a specimen, but not necessarily in its interior. In the first approach, the ultrasonic wave is attenuated in proportion to the square of its frequency; therefore, when the high-frequency ultrasonic wave is used as a probe, the wave will not penetrate inside the specimen, and the internal information may not be obtained. In the second approach, the acoustic impedances of the coupling medium and the specimen are very different, so that most of the ultrasonic waves are reflected at the interface between the coupler and the specimen, and it is unlikely that an image may be obtained from any significant depth (9). Hence, acoustic microscopy’s most important and unique feature — namely, obtaining subsurface information — may not be realized by these two approaches. The third approach, which was to design a new acoustic lens, has been considered the best approach for maintaining the most important feature of the SAM. By the third approach, the following new acoustic lenses were proposed. Chubachi et al. developed the concave transducer to increase resolution by removing spherical aberration (10). Davids et al., Atalar et al., Yakub et al., and Miyasaka et al. proposed new acoustic lenses that have restricted apertures using Rayleigh waves (11), a nonspherical aperture using Lamb waves (12), a pinhole aperture for near-field imaging (13), and a centersealed aperture substantially using shear waves (14), respectively. Contrast Apart from resolution increase, contrast enhancement is another way to increase imaging quality. Daft et al., Chou et al. and Atalar et al. proposed an acoustic lens having a large transducer for phase-contrast imaging (15), a shearpolarized transducer for increasing the contrast of acoustic images (16), and a slit aperture for pseudoline focusing to image the anisotropy of materials (17), respectively. Penetration Depth In the field of plastic matrix composites, especially fiberreinforced plastic (FRP) laminates, it is necessary to detect delaminations located at an interface between laminates. When inspecting microelectronic integrated circuits (IC), it is necessary to form a cross-sectional image to display the details of defects (e.g., voids, delaminations, microcracks) located within an IC package. Materials commonly used for these have highly attenuating properties, so that a frequency of 100 MHz or higher may not penetrate into a portion that must be observed. Therefore, to meet the requirements, a new SAM, which is an improved version of a C-scan imaging apparatus, was designed to use the timeof-flight method (pulse-wave mode) in the lower frequency
range of 10–100 MHz (18–21). This SAM can perform an A scan and a B scan as well as a C scan, so that conventional ultrasonic techniques can be used directly. Therefore, this type of SAM is the most widely used in various industries (22–26). Quantitative Data Acquisition Quantitative data (i.e., velocities of surface acoustic waves) obtained by a spherical lens, which is commonly used for acoustic imaging, may not provide all the information on the elastic anisotropy of materials. Kushibiki et al. developed a cylindrical lens (called line-focus lens) and a measuring method to overcome this issue (27,28). Liang et al. developed a SAM that can obtain complex V(z) curves, based on a nonparaxial formulation of the V(z) integral, and a reflectance function of a liquid–solid interface by inverting the V(z) curves formed at frequencies of 10 MHz or less (29). Endo et al. improved this SAM in terms of mechanical movement and increased frequency range (up to 3.00 GHz) to perform more precise measurement (30). Kulik et al. developed a SAM for continuous-wave operation that obtained V(z) curves having better signalto-noise ratios (31). Not adopting the V(z) curve techniques, Tsukahara et al. developed a SAM, a so-called ultrasonic microspectrometer, for accurately measuring the acoustic reflection coefficient of isotropic and/or anisotropic materials that have frequencies of 100 MHz or more (32). Modifications for Applications Phase-Contrast Imaging. An acoustic image can be formed by using either the amplitude or phase of the ultrasonic signals reflected or transmitted from a specimen. The amplitudes were often used to form an image because of the simplicity of designing electric circuits. However, phase contrast can often provide significant information such as residual stress distribution. Nikoonahad developed the tilted transducer to obtain differential phase contrast (33). Meeks et al. developed a phase-sensitive SAM, which operates across a large bandwidth into the GHz regime (34). Use of Various Coupling Media. Water is commonly used as a coupling medium for the SAM. However, some specimens allow only dry conditions. Attal proposed liquid metals (e.g., mercury or gallium) as coupling media (35). Wickramasinghe and Petts developed a SAM using gases (e.g., argon and xenon) as coupling media under elevated pressure (36). To increase resolution, Karaki and Sakai developed an acoustic microscope using liquid nitrogen instead of superfluid liquid helium and obtained images at 1.0 GHz (37). Yamanaka et al. developed a lowtemperature microscope to study a 0.5-mm thick epoxy bond between two 1-mm PMMA sheets at 40 MHz at −50 ° C using methyl alcohol as a coupling medium (38). These types of SAMs were designed for use strictly in a laboratory because they required environmental chambers for the coupling media. For industrial use, the air-coupled SAM is desired, although its resolution is limited. The challenge is the
transfer of acoustic energy across the solid/air interface. The calculated transmission coefficient between a steel transducer face plate and air is equal to only 1.797 × 10−5 . This means that 99.996% of the power is reflected back from the interface. Therefore, an appropriate transducer needed to be developed. Fox et al. manufactured transducers using silicone rubber as the front acoustic impedance-matching layer and operating frequencies at 1 and 2 MHz (39). By using such a transducer operating frequency at 1 MHz, the distance in air could be measured from 20 mm to 40 mm with an accuracy of 0.5 mm. Further improvements in transmission efficiency were shown by planting an acoustic impedance-matching layer that is composed of tiny glass spheres in a matrix of silicone rubber on piezoelectric elements (40,41). Reilly and Hayward reported air-coupling transducers using piezoelectric composites and operating with tone-burst waves whose frequency ranged from 250 kHz to 1.5 MHz. They were successful in producing transmitted signals (milivolt level) through a composite laminated honeycomb structure at 500 kHz (42). Shindel et al. developed a micromachined capacitance air transducer characterized by high bandwidths for evaluating composite materials (43). The micromachined capacitance air transducer could be operated at a frequency of 11.4 MHz (44). Bhardwadj et al. equipped a transducer for noncontact SAM (reflection type) (45). Elevated Temperature. Observation of changes of interior portions of materials due to a temperature increase had been desired when using the SAM. Miyasaka and Tittman developed a SAM using operating temperatures up to 500 ° C to observe a specimen located in motor oil (46). Ihara et al. developed a SAM using operating temperatures up to 920 ° C for liquid metals (47). However, such issues as frequency increase still remain, to be addressed. Nonlinear Imaging. The sharply convergent acoustic beam used in the SAM can provide sufficient intensity to produce strong nonlinear effects at microwave frequencies. The first nonlinear imaging was performed by using a mechanically scanned transmission acoustic microscope. Second-harmonic acoustic radiation generated in the vicinity of the beam focus is readily detected and used to form an image of an object placed in the focal plane (48). To obtain nonlinear data, the receiving transducer is tuned to either the second or third harmonic. When harmonics are produced, the resultant images have a complex appearance. However, a significant portion of the detected signal is due to nonlinearity not associated with specimen, but, for example, due to the water that serves as a coupling medium. To minimize these phenomena, Yeack et al. used a three-lens scanning acoustic microscope (transmission type) to achieve nonspecular oblique incidence. This enables mixing two independent inputs to occur only in the specimen, thus eliminating the contribution from, for example, the water coupling medium (49). Germain and Cheek used high-order harmonic acoustic radiation to form nonlinear images by using a mechanically scanned reflection acoustic microscope (50) and evaluated biological media (51). Tan et al. constructed a micron wavelength
acoustic microscope to observe third-order nonlinear products (52). More recently, nonlinear effects have been studied in pressurized superfluid helium in which resolution was improved beyond the diffraction limit (53–56). Most recently, several scientists studied nonlinear propagation applied to acoustic tomography (57–59). PRINCIPLE OF THE SCANNING ACOUSTIC MICROSCOPE Imaging Mechanism In this section, the imaging mechanism of the mechanically scanned reflection microscope using water as a coupling medium is described. The SAM has two modes. One is the so-called ‘‘toneburst-wave mode’’ that uses tone-burst waves (see Fig. 1b), and operates substantially in the frequency range between 100 MHz and 3 GHz. In this frequency range, the wavelengths of the waves in water range from 15.0 to 0.5 µm (see Table 1). Therefore, an image formed in the tone-burst-wave mode is highly resolved. However, the
Figure 1. Waveform: (a) Tone-burst wave. (b) Pulse wave.
Table 1. Wavelength in Water at Various Frequencies
Frequency    Wavelength
100 MHz      15.0 µm
200 MHz      7.5 µm
400 MHz      3.7 µm
600 MHz      2.5 µm
800 MHz      1.8 µm
1.0 GHz      1.5 µm
penetration depth of the waves is limited by attenuation. The tone-burst-wave mode is for penetrations up to 300 µm. Hence, the tone-burst-wave mode is applicable for observing the surface and the subsurface of specimens. The other mode is the so-called pulse-wave mode that uses a spike pulse wave or only a few cycles (see Fig. 1a) and operates substantially in the frequency range between 10 and 100 MHz. Therefore, resolution of an image formed in this mode is limited. However, the pulse-wave mode is for penetrations up to 1 cm. Hence, the pulse-wave mode is usually applicable for observing the deep interior of specimens. Tone-Burst-Wave Mode. Figure 2 is a schematic diagram of the SAM (tone-burst-wave mode). The imaging mechanism of the tone-burst-wave mode in Fig. 2 is described here. An electrical signal is generated by a RF tone-burst source. In the beginning, the Colpitts oscillator, or the like, was used as the RF tone-burst source. Nowadays, burst waves are gated out from continuous waves by a single pole double-throw (SPDT) switch for frequency stability. The output (i.e., voltage) of the source is approximately 10 V. The electrical signal is transmitted to a piezoelectric transducer located on the top of a buffer rod through a circulator (or the SPDT switch). The electrical signal is converted to an acoustic signal (i.e., ultrasonic plane wave) by the transducer. The ultrasonic plane wave travels through the buffer rod to a spherical recess (hereinafter called simply ‘‘lens’’) located at the bottom of the buffer rod, wherein the lens is coated by the acoustic impedance-matching layer which is the so-called acoustic antireflective coating (hereinafter called simply ‘‘AARC’’). The lens converts the ultrasonic plane wave to an ultrasonic spherical wave (i.e., ultrasonic beam). The ultrasonic beam is focused within the specimen and reflected from the specimen. The reflected ultrasonic beam, which carries acoustic information from the specimen, is again converted to an ultrasonic plane wave by the lens. The ultrasonic plane wave returns to the transducer through the buffer rod. The ultrasonic plane wave is again converted to an electric signal at the transducer. The voltage of the electric signal ranges from 300 mV to 1 V. When the operating frequencies range from 100 MHz to 1 GHz, the corresponding insertion losses are approximately 30 and 80 dB. Therefore, the electric signal must be amplified by 30 to 80 dB at the receiver. Furthermore, the electric signal is comprised of transmission leaks, internal reflections from the interface between the lens and the AARC, and reflections from the specimen. Therefore, the reflections must be selected by a rectangular wave from a double balanced mixer (DMB), called the first gate. Then, the peak of the amplitude of the electric signal is detected by a circuit, which includes a diode and a capacitor (i.e., the peak detection technique). The gate noise is removed by using the second gate existing within the first gate (the blanking technique). The peak-detected signal is stored in a memory through an analog-to-digital signal (hereinafter called simply ‘‘A/D’’) converter. The stored signal is again converted into an
Figure 2. Schematic diagram of a scanning acoustic microscope (burst-wave mode).
analog signal by a digital-to-analog signal (hereinafter called simply ‘‘D/A’’) converter. This flow of processes allows the information that is collected at a single spot on a specimen to be displayed as intensity at a certain point on the TV monitor. To form a two-dimensional acoustic image, an acoustic lens and/or an x — y stage is mechanically scanned across a certain area of the specimen. The acoustic lens can translate axially in the z direction to vary the distance between the specimen and the lens for subsurface visualization, that is, when visualizing the surface of the specimen, the acoustic lens is focused on the specimen (we denote z = 0 µm), and when visualizing a subsurface of the specimen, the acoustic lens is mechanically defocused toward the specimen (we denote z = −x µm where x is the defocused distance). Pulse-Wave Mode. The imaging mechanism for the pulse-wave mode is mostly the same as that for the tone-burst-wave mode. The significant differences are described here. Suppose that a pulse wave emitted from an acoustic lens through a coupling medium (e.g., water) is focused onto the back surface of the specimen. Suppose that the pulse wave is strong enough to travel through the specimen and reflect back to the acoustic lens. If the specimen is a layered material, the timing of each reflection is different because the travel distance is different. As shown in Fig. 3, an oscilloscope may monitor both the amplitude and the delay of the pulse wave reflected from each plane of the specimen. Note that the output of an electrical
Figure 3. Timing of pulse waves reflected from a specimen ([1] front surface, [2] first interface, [3] second interface, [4] rear surface), and a gate that selects the first interfacial reflection.
pulse signal generated from a RF source is approximately 50–300 V. The pulse wave can be focused through the specimen of interest. For example, when the first interface needs to be clearly visualized to detect a defect such as a delamination, the pulse wave is focused onto the first interface to maximize the amplitude of the pulse wave reflected from the first interface. Then, as shown in Fig. 3, the reflected pulse wave is electrically gated out to visualize the first interface as a horizontal cross-sectional image by horizontally scanning the acoustic lens or the x — y stage. Acoustic Lens Design Acoustic Lens for Tone-Burst-Wave Mode. The acoustic lens comprises a piezoelectric transducer and a buffer rod. The transducer is deposited on the top of the buffer rod, and a spherical recess covered by an acoustic antireflective coating is located at the bottom of the buffer rod.
Piezoelectric Transducer. The role of the piezoelectric transducer is to convert electric signals to acoustic signals and back again. Zinc oxide is typically used for the transducer at frequencies of 100 MHz or more. The transducer is sputtered onto the electrode (lower portion) that is deposited on the polished surface of the buffer rod. The electrode material is either chromium–gold or titanium–gold. The electrode (upper portion) is deposited on the zinc oxide using a mask plate to match the centers of the electrode (upper portion) and that of the lens. When an electric signal is input onto the transducer, the transducer is excited at its resonance frequency fr. The condition of the resonance is expressed as

2d/c = (2n − 1)/fr,  (1)

where d is the thickness of the transducer, n is a natural number, and c is the velocity of the longitudinal wave of the transducer. Therefore, by modifying Eq. (1), the thickness d of the transducer is determined from

d = (c/2)[(2n − 1)/fr] = [(2n − 1)/2]λ,  (2)

where λ is the wavelength of the acoustic wave. However, the relationship between the acoustic impedance of the buffer rod and that of the transducer is also considered to determine the thickness. Let us denote the acoustic impedances of the buffer rod and the transducer as Z1 and Z2, respectively. Then, the relationship of these impedances can be either Z1 > Z2 or Z1 < Z2. The thickness d is determined as λ/4 when the relationship is Z1 > Z2, and λ/2 when Z1 < Z2.

Buffer Rod. The diameter of the buffer rod is determined to be larger than the diameter of the transducer. The length of the buffer rod is generally determined to be longer than the near-field region (Fresnel zone) calculated by

N = D²/(4λ),  (3)

where N is the near-field region, D is the diameter of the transducer, and λ is the wavelength in the material used for the buffer rod. Considering acoustic energy loss, a material that has high velocity compared to that of a coupling medium and low attenuation (e.g., fused quartz, sapphire, or the like) is generally selected. The selection of the material directly relates to the design of the lens in terms of the spherical aberration that needs to be reduced.

Lens. The paraxial focal distance of the acoustic lens (denoted as 'F0') is approximately expressed as

F0 = R/(1 − C′),  (4)

C′ = C2/C1,  (5)

where R is the radius of curvature of the surface of the lens, C1 is the longitudinal wave velocity of the buffer rod, and C2 is the longitudinal wave velocity of the coupling medium. The radius of curvature of the surface of the lens is determined by considering the operating frequency and the attenuation coefficient. Table 2 shows the range of radii corresponding to various frequencies. The aperture angle of the lens (denoted as 'θα') is determined by the focal distance. When the focal distance and the aperture angle are known, the radius of the aperture (denoted as 'r') is determined by

r = F0 sin(θα/2).  (6)

Table 2. Range of Radius due to Frequency
Frequency        Radius
100 MHz          1–2 mm
200 MHz          500 µm–1 mm
400 MHz          500 µm
800–1,000 MHz    125 µm
1.5–3 GHz        50 µm

The spherical aberration of an acoustic lens is illustrated in Fig. 4. When the acoustic wave from the lens is emitted into the specimen, the following equation is obtained by Snell's law:

C1 sin θ′ = C2 sin θ,  (7)

where θ is the incident angle of the acoustic wave and θ′ is the refracted angle of the acoustic wave. Then, the zonal focal distance (denoted as 'F') is expressed as

F = R[(1 − cos θ) + sin θ/tan(θ − θ′)].  (8)

Therefore, the spherical aberration, denoted as 'A(θ)', is expressed as

A(θ) = F0 − F.  (9)

Figure 4. Calculation of the spherical aberration of the acoustic lens. A(θ) is the spherical aberration, F0 is the paraxial focal distance of the acoustic lens, F is the zonal focal distance, R is the radius of curvature of the surface of the lens, θ is the incident angle of the acoustic wave, and θ′ is the refracted angle of the acoustic wave.
For example, when sapphire and water are used for a buffer rod and a coupling medium, respectively, A(θ ) is calculated as 0.003 R, where R is the radius of curvature of the surface of the lens. For an acoustic lens operating at a frequency of 1.0 GHz, R is approximately 100 µm. Therefore, A(θ ) is calculated as 0.3 µm. The value of wavelength of an ultrasonic wave that has a frequency of 1.0 GHz in water (1.5 µm) is small enough for a single spherical lens to form a focused image. Therefore, sapphire (z-cut) is considered the best material although it is difficult to make a spherical recess by mechanical polishing. For reference, a comparison of the ratio of the spherical aberration and the radius (denoted as ‘‘A(θ )/R’’) between an acoustic lens and an optical lens is shown in Fig. 5. The spherical aberration of an acoustic lens is much smaller than that of an optical lens.
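The numbers quoted above can be reproduced with a few lines of Python. The sketch below evaluates Eqs. (4)–(9) for an assumed sapphire buffer rod (C1 ≈ 11,100 m/s) and water coupling (C2 ≈ 1,500 m/s); the material velocities and the 30° example ray angle are assumptions chosen only to illustrate the order of magnitude of A(θ), not instrument specifications.

import math

C1 = 11_100.0   # longitudinal velocity in the sapphire buffer rod, m/s (assumed)
C2 = 1_500.0    # longitudinal velocity in water, m/s (assumed)
R = 100e-6      # radius of curvature of a 1-GHz lens, m

c_ratio = C2 / C1                      # Eq. (5)
F0 = R / (1.0 - c_ratio)               # Eq. (4), paraxial focal distance

def zonal_focus(theta_rad):
    """Eqs. (7) and (8): zonal focal distance for a ray at incident angle theta."""
    theta_ref = math.asin(c_ratio * math.sin(theta_rad))   # refracted angle, Eq. (7)
    return R * ((1.0 - math.cos(theta_rad))
                + math.sin(theta_rad) / math.tan(theta_rad - theta_ref))

theta = math.radians(30.0)             # example ray angle (assumed)
A = F0 - zonal_focus(theta)            # Eq. (9), spherical aberration
print(f"F0 = {F0*1e6:.1f} um, A(30 deg) = {A*1e6:.2f} um (~{A/R:.4f} R)")

With these assumed values the script gives F0 ≈ 115.6 µm and A(30°) ≈ 0.3 µm, i.e., about 0.003R, consistent with the estimate in the text.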
Acoustic Anti Reflective Coating. When acoustic waves are emitted directly from sapphire into water that is used as coupling medium, 88% of the acoustic waves are reflected back from the interface between the lens and water because of their impedance mismatch. Therefore, it is necessary to coat the lens with an acoustic anti reflective coating (AARC). The thickness of the AARC should be λ/4, where λ is the wavelength of the acoustic wave, to have an optimized result, and the acoustic impedance Z is

Z = √(Z1 Z2),  (10)

where Z is the impedance of the AARC, Z1 is the impedance of the lens rod, and Z2 is the impedance of the coupling medium.
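A quick check of Eq. (10) with assumed handbook impedances (sapphire ≈ 44 × 10⁶ kg m⁻² s⁻¹, water ≈ 1.5 × 10⁶ kg m⁻² s⁻¹) gives the target impedance of the quarter-wave matching layer; the sketch below is only illustrative, and the material values are assumptions.

import math

Z_SAPPHIRE = 44.3e6   # acoustic impedance of the sapphire lens rod, kg/(m^2 s) (assumed)
Z_WATER = 1.5e6       # acoustic impedance of water, kg/(m^2 s) (assumed)

z_match = math.sqrt(Z_SAPPHIRE * Z_WATER)     # Eq. (10), ideal AARC impedance
print(f"ideal matching-layer impedance ~ {z_match/1e6:.1f} MRayl")
# Evaporated SiO2 (roughly 12-13 MRayl by handbook values) sits above this
# ideal, which is consistent with the remark in the text that SiO2 does not
# completely satisfy Eq. (10).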
Figure 5. Spherical aberration of optical and acoustic lenses. A(θ) is the spherical aberration, R is the radius of curvature of the surface of the lens, C1 is the longitudinal wave velocity of the buffer rod, and C2 is the longitudinal wave velocity of the coupling medium.
Evaporated silicon oxide (SiO2 ) is typically used as a material for the AARC although it does not completely satisfy Eq. (10). Acoustic Lens for Pulse-Wave Mode. Most of the configurations of acoustic lenses for the pulse-wave mode are the same as those for the burst-wave mode. The significant differences are described here.
Piezoelectric Transducer. Lead zirconate titanate ceramic (PZT), lithium niobate (LiNbO3), and polyvinylidene difluoride (PVDF) are typically used as transducer materials for operation at frequencies of 100 MHz or less. Buffer Rod and Lens. This mode is used mainly for observing interior portions of a specimen. Therefore, it is important that the working distance (denoted as 'W.D.') is designed to be long and the aperture angle of the lens is designed to be small. The working distance is defined as the distance from the position of the lens touching the surface of the specimen to the position of the lens when focused on the surface of the specimen. The maximum depth at which the material can be imaged is determined by the working distance. Table 3 shows the range of working distances corresponding to the range of frequencies. In the pulse-wave mode, the focal distance is designed to be long, so that the buffer rod is long. Considering its high cost, sapphire is usually not used for the buffer rod. Fused quartz, polymers, and ceramics are typically used.
Table 3. Range of Working Distances due to Frequency
Frequency            Working Distance
5–30 MHz             15–50 mm
30–50 MHz            10–20 mm
50–100 MHz           5–15 mm
More than 100 MHz    Less than 10 mm
Contrast and Application

Acoustic properties (i.e., reflection coefficient, attenuation, and velocity of acoustic wave) and the surface condition (i.e., surface roughness and discontinuities) of the specimen are factors in forming acoustic images.

Reflection Coefficient. The reflection coefficient at the surface of a specimen differs, depending on the materials. The difference in the reflection coefficient is the basic factor behind the formation of an acoustic image. The reflection coefficient is obtained from the reflectance function defined by the acoustic impedance and is determined by the type of material. The acoustic impedance Z can be determined from

Z = ρc,  (11)

where ρ is the density and c is the longitudinal wave velocity. When acoustic waves are incident from liquid to solid at angle θ, some reflect from the solid, and some transmit into the solid (see Fig. 6). Let Z1, Z2l, and Z2s be the acoustic impedances of the liquid, the solid due to a longitudinal wave, and the solid due to a shear wave, respectively. Then, Z1, Z2l, and Z2s are expressed by

Z1 = ρ1 c1/cos θ,  (12)

Z2l = ρ2 cl/cos θl,  (13)

Z2s = ρ2 cs/cos θs,  (14)

where ρ1 is the density of the liquid, ρ2 is the density of the solid, c1 is the velocity of the longitudinal wave in the liquid, cl is the velocity of the longitudinal wave in the solid, cs is the velocity of the shear wave in the solid, θ is the reflection angle of the longitudinal wave in the liquid, θl is the refraction angle of the longitudinal wave in the solid, and θs is the refraction angle of the shear wave in the solid.

Figure 6. Reflection and transmission of a wave incident from a liquid to a solid. K, KR, KL, and KS are wave vectors that represent the incident longitudinal wave, the reflected wave, the refracted longitudinal wave, and the mode-converted shear wave, respectively. θ, θR, θL, and θS are angles corresponding to the wave vectors.

The following equations are obtained by Snell's law:

sin θl = (cl/c1) sin θ,  (15)

sin θs = (cs/c1) sin θ.  (16)

The reflectance function R(θ) is expressed as

R(θ) = (Z2l cos² 2θs + Z2s sin² 2θs − Z1)/(Z2l cos² 2θs + Z2s sin² 2θs + Z1).  (17)

When an acoustic wave is vertically incident from a liquid to a solid, the reflection coefficient R is determined as

R = (Z2l − Z1)/(Z2l + Z1).  (18)
From this equation, it can be seen that the larger the difference in acoustic impedance of two contacting materials, the greater the reflection coefficient at the interface. Normally, water is used as coupling medium between the lens and a specimen. The acoustic impedance of water is lower than that of a solid; materials that have greater acoustic impedance show a higher reflection coefficient at the surface of a specimen, thus, forming image contrast. For example, suppose that an acoustic wave is perpendicularly incident onto fused quartz and sapphire immersed side by side in water. They have different elastic properties (their reflection coefficients are 0.632 and 0.873, respectively), but it is difficult to distinguish optically between them. However, the acoustic images show a significant difference in accordance with their reflection coefficients. In addition, the acoustic impedance of a gas, such as air, is significantly lower. Thus, if a material has a void, a debonding, or a delamination, the acoustic wave will reflect strongly from the solid–gas interface. Many applications have been reported for visualizing voids, debondings, delaminations, inclusions, adhesion strength, powder distribution, deformation, and residual stress distribution, using this contrast mechanism. This nondestructive evaluation (NDE) technique is applicable to thin and thick layered films (60–62), electrical/electronic materials including integrated circuits (63–67), polymers (68–73), composite materials (74–76), ceramics (77–79), and other materials (80,81). For example, Figs. 7 and 8 show debonding of reinforcing fibers from matrices. Figure 9 shows delaminations at the interfaces of carbon-fiber-reinforced plastics (C-FRP). Figure 10 shows delaminations and adhesion differences in a layered thin film. Figure 11 shows gaps around inclusions located at the interface of the substrate and the urethane coating. Figures 12 and 13 show deformations of composites. Figure 14 shows the powder distribution in composites.
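The contrast ranking described above can be checked numerically from Eq. (18). The short sketch below uses assumed handbook densities and longitudinal velocities for water, fused quartz, and sapphire; note that the squared (power) values it prints land close to the 0.632 and 0.873 quoted in the text, which suggests those figures are intensity reflection coefficients.

# Normal-incidence reflection at a water/solid interface, Eq. (18).
# Densities (kg/m^3) and longitudinal velocities (m/s) are assumed handbook values.
MATERIALS = {
    "fused quartz": (2200.0, 5970.0),
    "sapphire":     (3980.0, 11100.0),
}
RHO_W, C_W = 1000.0, 1480.0             # water (assumed)
Z1 = RHO_W * C_W                        # Eq. (11) for the liquid

for name, (rho, c_l) in MATERIALS.items():
    z2l = rho * c_l                     # Eq. (11) for the solid, longitudinal
    r_amp = (z2l - Z1) / (z2l + Z1)     # Eq. (18), amplitude reflection coefficient
    print(f"{name:>12s}: amplitude R = {r_amp:.3f}, power R^2 = {r_amp**2:.3f}")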
Figure 7. Debonding shown in metal matrix composites (MMC; fiber: carbon; matrix: aluminum alloy). The debondings are observed as bright regions at interfaces. Frequency: 200 MHz; defocusing distance: z = −200 µm.
Attenuation. When an acoustic wave travels within a specimen, its amplitude diminishes with travel distance and is finally not measurable. This phenomenon, called attenuation, is unique to the specimen and, mathematically, is described by

Pt = P0 exp(−αz),  (19)

where Pt is the amplitude of a transmitted wave, P0 is the amplitude of an incident acoustic wave, α is the attenuation coefficient, and z is the travel distance. Figures 15 and 16 are acoustic images of living mouse cells (3T3) and a thinly sliced biological tissue (heart muscle). The acoustic impedance of these is close to that of water (or culture liquid). Therefore, virtually no contrast caused by the difference in reflection coefficient is displayed in these images. The contrast in the acoustic images of living cells or biological tissue is generated primarily from the difference in attenuation. Therefore, it is necessary to use a background composed of highly reflective material to make maximum use of the difference in attenuation. For example, when two types of substrates (one made of glass that has a longitudinal velocity of 5.6 km/s, a density of 2.49 × 10³ kg/m³, and a reflection coefficient of 0.581, and another of polystyrene that has corresponding values of 2.4 km/s, 1.05 × 10³ kg/m³, and 0.232, respectively) are used to image a biological specimen, the glass substrate theoretically provides better results. A variety of biological applications have been reported that use this contrast mechanism (82–93).
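Equation (19) is easy to exercise numerically. The sketch below shows how the transmitted amplitude falls with travel distance for an assumed attenuation coefficient; the value of α is an illustration only, not a measured tissue property.

import math

alpha = 500.0          # attenuation coefficient, Np/m (assumed, for illustration)
p0 = 1.0               # incident amplitude, arbitrary units

for z_um in (50, 100, 200, 300):
    z = z_um * 1e-6
    pt = p0 * math.exp(-alpha * z)            # Eq. (19)
    loss_db = 20.0 * math.log10(p0 / pt)      # the same loss expressed in dB
    print(f"z = {z_um:3d} um: Pt/P0 = {pt:.3f} ({loss_db:.2f} dB)")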
Figure 8. Debonding shown in plastic matrix composites (PMC; fiber: carbon; matrix: glass ceramics). The debondings are observed as bright regions at interfaces.
Figure 9. Delaminations at the interfaces of carbon-fiber-reinforced plastics (C-FRP). (a) Schematic diagram showing locations of delaminations. (b) Superimposed pulse-wave-mode images showing delaminations at interfaces. Delaminations were induced by the impact of energy of 1.71 J. Directions of fibers are shown in [0°n /90°n ]SYM .
Velocity of Surface Acoustic Waves (SAW). This parameter is very important for images obtained in the tone-burst-wave mode. Figure 17a is the acoustic image of polycrystalline manganese zinc ferrite that has an operating frequency of 400 MHz and has the acoustic lens focused on the surface (z = 0 µm). Also, Figs. 17b,c are the acoustic images of the same specimen where the acoustic lens is focused into the interior (z = −25 µm), and the same areas were scanned. Acoustic
lenses that have aperture angles of 120° and 60°, respectively, were used. Grains are not clearly seen in Figs. 17a,c, but are observed with significant contrast in Fig. 17b. Furthermore, the contrast in the images formed by the acoustic lens that has the aperture angle of 120° changes in accordance with the position of the focal point controlled by the movement of the acoustic lens along the Z axis. This contrast mechanism is explained by the change of the receiving voltage in the transducer associated with the excitation of the SAW. When the aperture angle of the acoustic lens is large, surface acoustic waves are generated because the incident angle goes beyond the Rayleigh critical angle. Therefore, the contrast in Fig. 17b is caused by the generation of the SAW. The aperture angle of the acoustic lens used for forming the image shown in Fig. 17c is not large enough to generate surface acoustic waves. Details will be given here.

Figure 10. Delaminations and adhesion differences in a thin layered film. Frequency: 800 MHz. See color insert.

Figure 11. Inclusion at the interface of the urethane coating and the substrate. Frequency: 200 MHz.

Figure 12. Deformation shown in plastic matrix composites (PMC; fiber: glass; matrix: nylon 6). Frequency: 200 MHz.

Figure 13. Deformation shown in metal matrix composites (MMC; whisker: SiC; matrix: aluminum alloy). Frequency: 600 MHz; defocusing distance: Z = −10 µm.

Figure 14. Powder distribution in metal matrix composites (MMC; powder: SiC; matrix: aluminum alloy). Frequency: 600 MHz; defocusing distance: Z = −4 µm.

Figure 15. Living mouse cells (3T3). Frequency: 600 MHz; defocusing distance: Z = −5 µm; culturing medium: Dulbecco's modification of Eagle's medium: 1× (Mod); temperature of the culturing medium: 37.5 °C. The cells that are grown on the glass substrate have enough acoustical reflectivity from the substrate.

Figure 16. Human heart muscle. Frequency: 400 MHz. To prepare thin specimens, the tissue was embedded in paraffin and sectioned at 5 µm by the microtome. For scanning acoustic microscopy, the thinly sectioned specimen was affixed on the glass substrate, wherein the specimen was deparaffinized in xylene but not stained.

Figure 17. Manganese zinc ferrite. Frequency: 400 MHz; (a) z = 0 µm (aperture angle is 120°); (b) z = −25 µm (aperture angle is 120°); (c) z = −25 µm (aperture angle is 60°).

Figure 18. Schematic diagram of an acoustic lens for expressing the V(z) curve by an angular-spectrum approach. 'Plane 0' is the transducer plane, 'Plane 1' is the back focal plane, 'Plane 2' is the front focal plane, 'Plane 3' is the surface of the specimen, and ui±(x, y), where i = 0, 1, 2, and 3, is a spatial distribution.

V(z) Curve. Figure 18 shows a schematic diagram of an acoustic lens for expressing the V(z) curve using an angular-spectrum approach (6). When a plane acoustic wave that has an angular frequency ω is incident on the transducer plane, the acoustic field at the transducer plane is expressed as u0+(x, y)e^(−iωt). Hereafter, we omit e^(−iωt). Acoustic fields of the planes (denoted as i, where i = 0, 1, 2, and 3) are expressed by a spatial distribution ui±(x, y) or a frequency distribution Ui±(kx, ky). Superscripts + and − indicate the direction of field travel, that is, in the +z or −z direction, respectively. The angular spectrum at the transducer plane (z = z0) is expressed as

U0+(kx, ky) = F[u0+(x, y)] = ∫∫ u0+(x, y) exp[−i(kx x + ky y)] dx dy.  (20)

Then, the relationships between the spatial and frequency distributions are generalized by the following equations:

Ui±(kx, ky) = F{ui±(x, y)},  (21)

ui±(x, y) = F⁻¹{Ui±(kx, ky)}.  (22)

When ultrasonic waves travel from the transducer plane (z0) to the back focal plane (z1), only the phase related to the distance traveled (|z1 − z0|) is changed:

U1+(kx, ky) = U0+(kx, ky) exp[ikz(z1 − z0)].  (23)

The pupil function P(x, y), which shows the shape of the lens, is expressed as

P(x, y) = circ[(x² + y²)^(1/2)/R],  (24)

where R is the radius of the pupil and circ is expressed as

circ(r) = 1 for r < 1, 0 for r > 1.  (25)

Using the Fresnel approximation and the thin lens model, u2+(x, y) is expressed as

u2+(x2, y2) = {exp[ik0 f(1 + c²)]/(iλ0 f)} ∫∫ u1+(x1, y1) P1(x1 + x2, y1 + y2) exp[−i(2π/λ0 f)(x1 x2 + y1 y2)] dx1 dy1,  (26)

where k0 is the wave number in the coupling medium, f is the front focal distance, λ0 is the wavelength in the coupling medium, P1 is the pupil function from the lens to the specimen, and c is the ratio expressed by

c = C2/C1,  (27)

where C1 is the longitudinal wave velocity of the buffer rod and C2 is the longitudinal wave velocity of the coupling medium. Supposing that x2 ≪ x1 and y2 ≪ y1 at the front focal plane, the pupil function from the lens to the specimen is expressed as

P1(x1 + x2, y1 + y2) ≅ P1(x1, y1).  (28)

Then Eq. (26) is rewritten as

u2+(x, y) = {exp[ik0 f(1 + c²)]/(iλ0 f)} F[u1+(x, y)P1(x, y)]|kx = k0x/f, ky = k0y/f,  (29)

U2+(kx, ky) = F[u2+(x, y)].  (30)

Combining Eqs. (29) and (30) gives

U2+(kx, ky) = F{{exp[ik0 f(1 + c²)]/(iλ0 f)} F[u1+(x, y)P1(x, y)]|kx = k0x/f, ky = k0y/f}
            = iλ0 f exp[ik0 f(1 + c²)] u1+(−(f/k0)kx, −(f/k0)ky) P1(−(f/k0)kx, −(f/k0)ky).  (31)

Suppose that kx² + ky² ≪ k0². Then, the following approximation is obtained:

kz = (k0² − kx² − ky²)^(1/2) ≅ k0 − (kx² + ky²)/(2k0).  (32)

Then U3+(kx, ky) is written as

U3+(kx, ky) = U2+(kx, ky) exp[ik0 z] exp[−i(kx² + ky²)z/(2k0)],  (33)

z = |z3 − z2|,  (34)

where z2 and z3 are values along the Z axis at the front focal and the specimen planes, respectively. U3−(kx, ky) is the angular spectrum of the reflection wave at the interface between the specimen and the coupling medium. Therefore, it is written as

U3−(kx, ky) = U3+(kx, ky) R(kx/k0, ky/k0),  (35)

where R is the reflectance function [see Eq. (17)]. The angular spectrum of the reflection wave at the front focal plane is written as

U2−(kx, ky) = U3−(kx, ky) exp[ik0 z] exp[−i(kx² + ky²)z/(2k0)].  (36)

From Eqs. (33), (35), and (36), we obtain

U2−(kx, ky) = iλ0 f exp[ik0 f(1 + c²)] u1+(−(f/k0)kx, −(f/k0)ky) P1(−(f/k0)kx, −(f/k0)ky) exp(i2k0 z) exp[−i(kx² + ky²)z/k0] R(kx/k0, ky/k0).  (37)

From Eq. (22), we obtain

u2−(x, y) = F⁻¹[U2−(kx, ky)].  (38)

Similarly to obtaining Eq. (26), we obtain

u1−(x1, y1) = {exp[ik0 f(1 + c²)]/(iλ0 f)} ∫∫ u2−(x2, y2) P2(x1 + x2, y1 + y2) exp[−i(2π/λ0 f)(x1 x2 + y1 y2)] dx2 dy2,  (39)

where P2 is the pupil function from the coupling medium to the lens. We use the same approximation as in Eq. (28):

P2(x1 + x2, y1 + y2) ≅ P2(x1, y1).  (40)

Then, Eq. (39) is written as

u1−(x, y) = {exp[ik0 f(1 + c²)]/(iλ0 f)} P2(x, y) F[u2−(x, y)]|kx = k0x/f, ky = k0y/f,  (41)

u1−(x, y) = {exp[ik0 f(1 + c²)]/(iλ0 f)} P2(x, y) U2−((k0/f)x, (k0/f)y),  (42)

u1−(x, y) = −exp{i2k0[z + f(1 + c²)]} u1+(−x, −y) P1(−x, −y) P2(x, y) exp[−i(k0/f²)z(x² + y²)] R(x/f, y/f).  (43)

The variation of the transducer voltage, as the position of the focal point is varied by the movement of the acoustic lens along the z axis, is expressed as

V(z) = ∫∫ u0+(x1, y1) u0−(x1, y1) dx1 dy1.  (44)

The wave number kl in the buffer rod is expressed as

kz = (kl² − kx² − ky²)^(1/2),  (45)

u1+(x1, y1) = u0+(x1, y1) ∗ F⁻¹[exp(ikz d)],  (46)

u0−(x1, y1) = u1−(x1, y1) ∗ F⁻¹[exp(ikz d)],  (47)

d = |z1 − z0|,  (48)

V(z) = ∫∫∫∫ u1−(ξ, η) u0+(x1, y1) F⁻¹[exp(ikz d)]|x = x1 − ξ, y = y1 − η dx1 dy1 dξ dη.  (49)

Equation (45) is an even function. Therefore, the following relationship is obtained:

F⁻¹[exp(ikz d)]|x = x1 − ξ, y = y1 − η = F⁻¹[exp(ikz d)]|x = ξ − x1, y = η − y1.  (50)

Using Eq. (50), Eq. (49) is rewritten as

V(z) = ∫∫ u1−(ξ, η) {u0+ ∗ F⁻¹[exp(ikz d)]}(ξ, η) dξ dη,  (51)

V(z) = ∫∫ u1−(ξ, η) u1+(ξ, η) dξ dη.  (52)

Then, finally, we obtain

V(z) = −exp{i2k0[z + f(1 + c²)]} ∫∫ u1+(−x, −y) u1+(x, y) P1(−x, −y) P2(x, y) R(x/f, y/f) exp[−i(k0/f²)z(x² + y²)] dx dy.  (53)

Omitting the constant exp{i2k0[z + f(1 + c²)]}, Eq. (53) is rewritten as

V(z) = ∫∫ u1+(−x, −y) u1+(x, y) P1(−x, −y) P2(x, y) R(x/f, y/f) exp[−i(k0/f²)z(x² + y²)] dx dy.  (54)

Using r–θ coordinates, the function of the voltage of the transducer, in accordance with the position of the focal point controlled by the movement of the acoustic lens along the Z axis, is rewritten as

V(z) = 2π ∫ u1+(r)² P1(r) P2(r) R(r/f) exp[−i(k0/f²)z r²] dr.  (55)
Equation (55) shows that the acoustic field of the back focal plane, the lens, the reflectance function, and the defocal distance are the main factors in the output voltage. Although the shape of the amplitude distribution of the acoustic field at the back focal plane is determined by the size of transducer and the frequency of the acoustic wave, generally speaking, its shape forms that of a Gaussian distribution (30). The pupil function is substantially constant. Therefore, the reflectance function affects the V(z), which depends on the elastic properties of the material. Furthermore, the V(z) changes in accordance with the defocusing distance. Figure 19 shows the V(z) curve for fused quartz. An acoustic lens that has an aperture angle of 120° and a
working distance of 310 µm was used at an operating frequency of 400 MHz. The specimen was located in the water tank. The temperature of 22.3 ° C in the coupling medium (distilled water) was measured by a thermocouple. The temperature was substantially stabilized (change less than ±0.1 ° C). The movement of the acoustic lens along the Z axis was monitored by a laser-based measuring instrument. The transducer output voltage was periodic and had axial motion as the acoustic lens advanced from the focal plane toward the specimen. The contrast changed in accordance with the period.
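The behavior seen in Fig. 19 can be reproduced qualitatively by evaluating the V(z) integral of Eq. (55) numerically, with the reflectance function of Eq. (17) supplying R. The Python sketch below does this for a water/fused-quartz interface using a Gaussian pupil field; all material constants, the aperture angle, and the Gaussian width are assumptions chosen only to show the characteristic periodic dips, not the parameters of the instrument used for Fig. 19.

import numpy as np

# Assumed constants: water coupling and a fused-quartz specimen.
C_W, RHO_W = 1490.0, 1000.0                  # water: velocity (m/s), density (kg/m^3)
C_L, C_S, RHO_S = 5970.0, 3760.0, 2200.0     # fused quartz: longitudinal, shear, density
FREQ = 400e6                                 # operating frequency, Hz
K0 = 2.0 * np.pi * FREQ / C_W                # wave number in the coupling medium
F_LENS = 310e-6                              # front focal distance, m (assumed)
THETA_MAX = np.radians(60.0)                 # lens half-aperture (assumed)

def reflectance(theta):
    """Reflectance function R(theta) of Eq. (17) for a liquid/solid interface.
    Beyond the critical angles Snell's law (Eqs. 15-16) gives |sin| > 1,
    so the cosines are continued into the complex plane."""
    sin_l = (C_L / C_W) * np.sin(theta)              # Eq. (15)
    sin_s = (C_S / C_W) * np.sin(theta)              # Eq. (16)
    cos_l = np.sqrt(1.0 - sin_l**2 + 0j)
    cos_s = np.sqrt(1.0 - sin_s**2 + 0j)
    z1 = RHO_W * C_W / np.cos(theta)                 # Eq. (12)
    z2l = RHO_S * C_L / cos_l                        # Eq. (13)
    z2s = RHO_S * C_S / cos_s                        # Eq. (14)
    cos2s = 1.0 - 2.0 * sin_s**2                     # cos(2 theta_s)
    sin2s = 2.0 * sin_s * cos_s                      # sin(2 theta_s)
    z_eff = z2l * cos2s**2 + z2s * sin2s**2
    return (z_eff - z1) / (z_eff + z1)               # Eq. (17)

def v_of_z(z_values, n_rays=500):
    """Numerical evaluation of Eq. (55) with an assumed Gaussian pupil field
    u1(r) and P1 = P2 = 1 inside the aperture."""
    theta = np.linspace(1e-4, THETA_MAX, n_rays)
    d_theta = theta[1] - theta[0]
    r = F_LENS * np.sin(theta)                       # radial pupil coordinate, r/f = sin(theta)
    u1 = np.exp(-(r / (0.7 * r[-1])) ** 2)           # assumed Gaussian field distribution
    refl = reflectance(theta)
    out = []
    for z in z_values:
        phase = np.exp(-1j * (K0 / F_LENS**2) * z * r**2)
        integrand = u1**2 * refl * phase * F_LENS * np.cos(theta)  # dr = f cos(theta) d(theta)
        out.append(2.0 * np.pi * np.sum(integrand) * d_theta)
    return np.abs(np.array(out))

z = np.linspace(-200e-6, 20e-6, 221)
vz = v_of_z(z)
vz_db = 20.0 * np.log10(vz / vz.max())
# For negative z the curve shows periodic dips whose spacing should approximate
# Delta z = lambda_W / [2 (1 - cos theta_R)] of Eq. (62), roughly 19 um for the
# values assumed here.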
Figure 19. V(z) curve for fused quartz. Specimen: fused quartz; coupling medium: distilled water; temperature of the coupling medium: 22.3 °C (change less than ±0.1 °C). Parameters of the acoustic lens are as follows: frequency: 400 MHz; aperture angle: 120°; and working distance: 310 µm.

Figure 20. Reflectance function. (a) Amplitude. (b) Phase. Specimen: fused quartz; coupling medium: distilled water; temperature of the coupling medium: 22.3 °C (change less than ±0.1 °C). Parameters of the acoustic lens are as follows: frequency: 400 MHz; aperture angle: 120°; and working distance: 310 µm.

Figure 21. Cross-sectional geometry of a spherical acoustic lens, explaining the mechanism of the V(z) curves.

Phase Change. Figure 20a,b show the amplitude and phase changes of the reflectance function due to incident angles of the acoustic waves from water to fused quartz, respectively. Figure 20a shows the relationship between the incident angle and the amplitude of the ultrasonic beam emitted from an acoustic lens operating at a frequency of 400 MHz onto the specimen (fused quartz) via the coupling medium (water). When the incident angle is close to the critical angle of the longitudinal wave (14.58°), the amplitude of the reflectance function suddenly becomes strong and becomes the maximum value at the critical angle. After passing the critical angle of the longitudinal wave, the amplitude decreases up to about 0.8 when the incident angle increases. Then, the incident angle is close to the critical angle of the shear wave (23.49°); the amplitude is again suddenly strong and becomes the maximum value at the critical angle. After passing the critical angle of the shear wave, the amplitude is constant. Figure 20b shows the relationship between the incident angle and the phase of the ultrasonic beam emitted from an acoustic lens operating at a frequency of 400 MHz to the specimen (fused quartz) via the coupling medium (water). The phase changes a little at the critical angle of the longitudinal wave but changes significantly in the neighborhood of the Rayleigh critical angle. This significant phase change is the key factor in contrast change. Using a ray-tracing technique, this mechanism is explained as follows. The period of this variation results from interference between the two components. Figure 21 shows that one component is specularly reflected at normal incidence; the second undergoes a lateral shift on incidence and reradiates at the critical phase-matching angle
for the SAW (also referred to as leaky Rayleigh waves). When the acoustic wave is focused onto the surface of the specimen, the phases of the acoustic waves that travel path (I) and path (II) are the same. Let this phase be denoted as Φf. When the acoustic lens is defocused toward the specimen by a distance z, the phase changes of the waves that travel paths (I) and (II) are expressed as

Φ(I) = Φf − 2π(2OC/λW) = Φf − 4πz/λW,  (56)

Φ(II) = Φf − 2π(2AC/λW) + 2π(2AB/λR) + π = Φf − 4πz/(λW cos θR) + 4πz tan θR/λR + π,  (57)

where λW is the wavelength in the coupling medium (water), λR is the Rayleigh wavelength, and θR is the Rayleigh critical angle. The phase difference is calculated by

ΔΦ = Φ(II) − Φ(I) = 4πz[tan θR/λR + (1 − 1/cos θR)/λW] + π.  (58)

By Snell's law, we obtain the following equation:

VW/VR = λW/λR = sin θR,  (59)

where VR is the SAW (e.g., Rayleigh wave) velocity on the surface of the specimen, whose penetration depth is about one wavelength:

λR = λW/sin θR.  (60)

Then, putting Eq. (59) into Eq. (58), we obtain the quantitative contrast factor as

ΔΦ = 4πz(1 − cos θR)/λW + π.  (61)

Theory of the SAW Velocity Measurement. Using the phase difference of two waves, the velocity of a surface acoustic wave (SAW) such as leaky Rayleigh waves, which propagate within a small area of a specimen, can be obtained (4,5). When ΔΦ = Φ(II) − Φ(I) = (2n − 1)π, where n is a natural number, the V(z) curve is at a local minimum. Therefore, the period of the V(z) is obtained as

Δz = λW/[2(1 − cos θR)].  (62)
By rewriting Eq. (60), we obtain

VR = VW/sin θR.  (63)

λW is expressed as

λW = VW/f,  (64)

where f is the frequency of the acoustic lens. Therefore, finally, from Eqs. (62)–(64) we obtain the following equation to calculate the SAW velocity:

VR = VW/√{1 − [1 − VW/(2Δz·f)]²}.  (65)
Many applications using this result have been reported for biological materials (94), thin films (95–101), coated materials (102–106), ceramics (107,108), and other materials (109–113).
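As a worked illustration of Eqs. (62)–(65), the short Python sketch below converts a measured V(z) dip period into a leaky-SAW velocity. The numerical inputs (a water velocity of 1,490 m/s, a 400-MHz lens, and a dip period of about 18.7 µm) are assumed, representative values rather than data quoted in this article.

```python
import math

def saw_velocity(v_w, freq, dz):
    """Leaky-SAW velocity from the V(z) period, following Eqs. (62)-(65).

    v_w  : longitudinal wave velocity in the coupling medium (m/s)
    freq : operating frequency of the acoustic lens (Hz)
    dz   : measured period of the V(z) curve (m)
    """
    cos_theta_r = 1.0 - v_w / (2.0 * freq * dz)       # rearranged Eq. (62)
    sin_theta_r = math.sqrt(1.0 - cos_theta_r ** 2)   # Snell's law, Eq. (59)
    return v_w / sin_theta_r                          # Eq. (63)

# Assumed, representative inputs (not values taken from this article):
v_w, freq, dz = 1490.0, 400e6, 18.7e-6
print(f"V(z) method:            V_R = {saw_velocity(v_w, freq, dz):.0f} m/s")
print(f"sqrt(v_w*f*dz) approx.: V_R = {math.sqrt(v_w * freq * dz):.0f} m/s")
```

With these inputs the result is roughly 3.4 km/s, the order of magnitude expected for Rayleigh-type waves on glassy materials; the square-root approximation lies slightly lower because the assumption behind it is only marginally satisfied here.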
Optimizing Measurement Precision. Assuming that VW/(2fΔz) ≪ 1, Eq. (65) is expressed approximately as

VR ≈ √(VW f Δz).   (66)

To enhance the precision of the SAW velocity measurement by the V(z) curve technique, both sides of Eq. (66) are differentiated after taking the logarithm of both sides. Then, the following equation is obtained:

dVR/VR = (1/2)(dVW/VW) + (1/2)(df/f) + (1/2)(dΔz/Δz).   (67)

Equation (67) shows that the relative error in measuring the SAW velocity is the sum of one-half of the relative errors in the velocity of the coupling medium, the frequency of the acoustic wave, and the period Δz. Therefore, to minimize the measurement error, it is necessary to maintain a constant temperature for the coupling medium, to stabilize the frequency of the acoustic wave, and to measure the movement of the acoustic lens along the Z axis accurately. Many refinements in techniques for precision measurement have been reported (114–116).
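Read numerically, Eq. (67) is a simple error budget. The fragment below assumes, purely for illustration, a 0.1% relative uncertainty in each of VW, f, and Δz.

```python
# Error budget of Eq. (67): dV_R/V_R = (dV_W/V_W + df/f + d(dz)/dz) / 2.
# The 0.1% component errors below are illustrative assumptions only.
rel_err_vw, rel_err_f, rel_err_dz = 1e-3, 1e-3, 1e-3
rel_err_vr = 0.5 * (rel_err_vw + rel_err_f + rel_err_dz)
print(f"relative error in V_R ~ {rel_err_vr * 100:.2f}%")   # about 0.15%
```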
Scattering. The scattering of acoustic waves can be divided into two categories: scattering caused by the condition of the specimen surface and scattering caused by nonhomogeneous particles inside the specimen. The former covers discontinuities such as surface roughness. Although minor roughness (generally λ/5 or less) is acceptable (117), a large degree of roughness affects the image directly (see Fig. 22); the image that is formed gives the impression of being viewed through frosted glass. For scanning acoustic microscopy, it is necessary either to consider the roughness in relation to the size of the area under observation or to lower the frequency to minimize the influence of any roughness; some techniques for removing the effects of surface roughness have been reported (14,118,119). The same considerations apply to wave scattering caused by nonhomogeneous particles and must be kept in mind, especially for subsurface imaging. A specimen that contains a large number of scattering bodies is difficult to observe at high resolution and necessitates lowering the observation frequency.

Figure 22. Surface roughness. (a) Incident acoustic waves are scattered by surface roughness of the specimen; therefore, the interior portions are difficult to observe. (b) SAM image of a specimen that has a polished surface. Specimen: stainless steel (SUS 304); frequency: 800 MHz. Scale bars: 50 µm.

Discontinuities. When a specimen includes an elastic discontinuity, such as an edge, a step, a crack, or a joint interface, an acoustic image of the elastically discontinuous and peripheral portions visualized by the scanning acoustic microscope shows unique contrast, for example, fringes or black stripes. As modeled in Fig. 23, this contrast arises from an interference effect of surface acoustic waves incident on and reflected from elastic discontinuities (120–125). Figure 24a is an image obtained by the scanning electron microscope (SEM) of a Cu/Si3N4 interface after the Si3N4 portion was given a micro-Vickers indentation. Figure 24b is the magnified SEM image of the portion that has a vertical radial crack that reached the Cu/Si3N4 interface. The SEM at higher magnification did not reveal any delamination.
Figure 23. Model of Rayleigh wave excitation at a boundary.
The assumption was that an open delamination at the surface soon closed, owing to the passage of time and the relief of stress, once the indentation was completed. Figures 25a and b are SAM images. A frequency of 400 MHz was chosen for the visualization, and the scanning width was 1.0 mm. In Figs. 25a and b, the acoustic lens was defocused by 40 and 100 µm, respectively. Without further preparation, the SAM imaged the same specimen at a similar magnification and revealed the delamination (see Fig. 25a). Moreover, the delamination was observed with enhanced contrast (Fig. 25b). This can be explained by the SAW scattering mechanism: when an acoustic lens that has a high numerical aperture is defocused, the ultrasonic beam is incident on the specimen over a range of angles. In this experiment, the aperture angle of the acoustic lens was 120°. At a certain incident angle, the SAW is excited, travels on the surface of the specimen, and is reflected back from the interface (see Fig. 23). These reflected SAWs give the enhanced contrast of discontinuities in a specimen in acoustic images.

Resolution

Two types of resolution must be considered for the SAM. One is the lateral resolution (denoted r), and the other is the vertical resolution (denoted ρ).
Figure 24. Scanning electron microscope (SEM) image. (a) Schematic view. (b) Magnified image. The micro-Vickers indentation was made in the Si3N4 portion to create the delamination at the interface between steel and Si3N4.

Figure 25. SAM images. (a) Z = −40 µm; (b) Z = −100 µm; frequency: 400 MHz. Scale bars: 200 µm.

Figure 26. Resolution. (a) Optical image of a standard specimen that has patterns for measuring the resolution of the scanning acoustic microscope in the C-scan mode using a tone-burst wave; (b) frequency: 1.0 GHz; (c) frequency: 400 MHz.
These are expressed as

r = Fλ = F(vw/f),   (68)

ρ = 2F²λ = 2F²(vw/f) = 2(fo/D)²(vw/f) = 2[1/(2 tan θ)]²(vw/f) = vw/[2 f (tan θ)²],   (69)

where F is a constant related to the lens geometry, λ is the wavelength in the coupling medium (water), f is the frequency of the wave generated by the transducer, vw is the longitudinal wave velocity in the coupling medium, fo is the focal distance of the lens, D is the diameter of the lens aperture, and θ is one-half of the aperture angle of the lens. Figure 26 shows surface resolutions at different frequencies. The resolutions in the images tend to be higher than the calculated resolutions.
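A minimal numeric sketch of Eqs. (68) and (69) follows. It assumes water coupling (vw ≈ 1,490 m/s), a 400-MHz lens, a 120° aperture (θ = 60°), and takes the lens constant as F = fo/D = 1/(2 tan θ); the specific value of F for a given lens depends on its geometry, so the printed numbers are illustrative only.

```python
import math

v_w   = 1490.0                         # longitudinal velocity in water (m/s), assumed
f     = 400e6                          # transducer frequency (Hz)
theta = math.radians(60.0)             # half of the assumed 120-degree aperture angle
F     = 1.0 / (2.0 * math.tan(theta))  # assumed lens constant, F = fo/D

lam = v_w / f                                   # wavelength in the coupling medium
r   = F * lam                                   # lateral resolution, Eq. (68)
rho = v_w / (2.0 * math.tan(theta) ** 2 * f)    # vertical resolution, Eq. (69)

print(f"wavelength in water:  {lam * 1e6:.2f} um")
print(f"lateral resolution r: {r * 1e6:.2f} um")
print(f"vertical resolution:  {rho * 1e6:.2f} um")
```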
CONCLUSION

The features of the SAM are summarized as follows:
1. capability of observing both the surface and subsurface of a specimen
2. high resolution
3. high contrast
4. capability of quantitative data acquisition
Although the SAM has unique features, it (especially the tone-burst-wave mode) has not been used in industry as much as conventional microscopes. It is hoped that the dedicated work of acoustic microscopists will clear the obstacles to acceptance and allow the SAM to mature and prosper fully, and that this article, which describes the fundamentals of scanning acoustic microscopy, will be useful to those who join their ranks.
ABBREVIATIONS AND ACRONYMS
AARC    acoustic antireflective coating
A/D     analog-to-digital signal
C-FRP   carbon-fiber-reinforced plastic
D/A     digital-to-analog signal
DMB     double balanced mixer
FRP     fiber reinforced plastic
IC      integrated circuits
LiNbO3  lithium niobate
mV      millivolt
NDE     nondestructive evaluation
PVDF    polyvinyldifluoride
PZT     lead zirconate titanate ceramic
RF      radio frequency
SAM     scanning acoustic microscope
SAW     surface acoustic wave
SEM     scanning electron microscope
SPDT    single pole double throw
V       volt
W.D.    working distance
BIBLIOGRAPHY 1. S. Y. Sokolov, Academia Nauk SSSR Doklady, 64(3), 333–335 (1949). 2. R. A. Lemons and C. F. Quate, Appl. Phys. Lett. 24, 163–165 (1974). 3. A. Atalar, C. F. Quate, and H. K. Wickramasinge, Appl. Phys. Lett. 31, 791 (1977). 4. R. D. Weglein, Appl. Phys. Lett. 34, 179–181 (1979). 5. W. Parmon and H. L. Bertoni, Electron. Lett. 15, 684–686 (1979). 6. A. Atalar, J. Appl. Phys. 49, 5,130–5,139 (1978). 7. B. Hadimioglu and C. F. Quate, Appl. Phys. Lett. 43, 1,006–1,007 (1983). 8. J. S. Foster and D. Rugar, Appl. Phys. Lett. 42, 869–871 (1983). 9. M. S. Muha, A. A. Moulthrop, G. C. Kozlowski, and B. Hadimioglu, Appl. Phys. Lett. 56, 1,019–1,021 (1990). 10. N. Chubachi, J. Kushibiki, T. Sannomia, and Y. Iyama, IEEE, Proc. Ultrasonic Symp., 1979, pp. 415–418. 11. D. A. Davids, P. Y. We, and D. Chizhik, Appl. Phys. Lett. 54(17), 1,639–1,641 (1989). 12. A. Atalar, H. Koymen, and L. Degertekin, Proc. IEEE Ultrasonics Symp., 1990, pp. 359–362. 13. B. T. Khuri-Yakub, C. Cinbis, and P. A. Reinholdtsen, Proc. IEEE Ultrasonics Symp., 1989, pp. 805–807.
22. L. Revay, G. Lindblad, and L. Lind, CERT’90; Component Engineering, Reliability and Test Conference (Electron Components Inst.), 1990, pp. 115–122. 23. R. Bauer, M. Luniak, and L. Rebenklau, Proc. Int. Symp. Microelectronics (IMAPS), 1997, pp. 659–664. 24. T. Adams, Electron. Bull. 16(6), 51–55 (1998). 25. J. C. Bernier and L. Teems, Proc. Int. Symp. Testing Failure Anal. (ISFTA), 1998, pp. 393–398. 26. B. Smith, Am. Lab. 31(22), 18–21 (1999). 27. J. Kushibiki, K. Horii, and N. Chubachi, Electron. Lett. 19, 404–405 (1983). 28. J. Kushibiki, K. Horii, and N. Chubachi, Electron. Lett. 19, 404–405 (1983). 29. K. Liang, G. S. Kino, and B. T. Khuri-Yakub, IEEE Trans. SU-32, 213–224 (1985). 30. T. Endo, Y. Sasaki, T. Yamagishi, and M. Sakai, Jpn. Appl. Phys. 31, 160–162 (1992). 31. A. Kulik, G. Gremaud, and S. Sathish, in Acoustic Imaging, vol. 17, Plenum Press, New York, 1989, pp. 71–78. 32. Y. Tsukahara, E. Takeuchi, E. Hayashi, and Y. Tani, Proc. IEEE 1984 Ultrason. Symp., 1984, pp. 992–996. 33. M. Nikoonahad, Electron. Lett. 32(10), 489–490 (1987). 34. S. W. Meeks et al., Appl. Phys. Lett. 55(18), 1,835–1,837 (1989). 35. J. Attal, in E. A. Ash, ed., Scanned Image Microscopy, Academic Press, London, 1980, pp. 97–118. 36. H. K. Wickramasinghe and C. R. Petts, in E. A. Ash, ed., Scanned Image Microscopy, Academic Press, London, 1980, pp. 57–100. 37. K. Karaki and M. Sakai, K. Toda, ed., Toyohashi Int. Conf. Ultrasonic Technol., Toyohashi, Japan, MYU Research, Tokyo, 1987, pp. 25–30. 38. K. Yamanaka, Y. Nagata, and T. Koda, Ultrasonics Int. 89, 744–749 (1989). 39. J. D. Fox, B. T. Khuri-Yakub, and G. S. Kino, Proc. IEEE 1983 Ultrasonics Symp., 1983, pp. 581–592. 40. T. Yano, M. Tone, and A. Fukumoto, IEEE Trans. Ultrasonics Ferroelectronic Frequency Control 34(2), 222–236 (1987). 41. M. I. Haller and B. T. Khuri-Yakub, Proc. IEEE 1992 Ultrasonics Symp., pp. 937–939. 42. D. Reilly and G. Hayward, Proc. IEEE 1991 Ultrasonics Symp., pp. 763–766.
14. C. Miyasaka, B. R. Tittmann, and M. Ohno, Res. Nondestructive Evaluation 11, 97–116 (1999).
43. D. W. Schindel, D. A. Hutchins, L. Zou, and M. Sayer, IEEE Trans. Ultrasonics Ferroelectronics Frequency Control 42, 42–51 (1995).
15. C. W. W. Daft, J. M. R. Weaver, and G. A. D. Briggs, J. Micros. 139(3), RP3–RP4 (1985).
44. I. Ladabaum, B. T. Khuri-Yakub, and D. Spoliansky, Appl. Phys. Lett. 68, 7–9 (1996).
16. B. T. Khuri-Yakub and C. -H. Chou, Proc. IEEE Ultrasonics Symp., 1986, pp. 741–744.
45. M. C. Bhardwaj, I. Neeson, M. E. Langron, and L. Vandervalk, Proc. 24th Conf. An Int. Conf. Eng. Ceram. Struct., American Ceramic, Materials, and Structures: A, 2000, pp. 163–172.
17. A. Atalar, I. Ishikawa, Y. Ogura, and K. Tomita, Proc. IEEE Ultrasonics Symp., 1993, pp. 613–616. 18. S. Ujihashi, K. Skanoue, T. Adachi, and H. Matsumoto, Proc. Mech. Eng. Fourth Int. Conf. FRC’90, 1990, pp. 157–162. 19. H. Matsumoto, T. Adachi, and S. Ujihashi, Proc. Oji International Seminar on Dynamic Fracture, Chou Technical Drawing Co., Ltd., 1990, pp. 174–182. 20. K. Kasano, T. S. Bea, and C. Miyasaka, Proc. Japan-U.S. CCM-VII, 1995, pp. 601–608. 21. A. M. Howard, IEE Colloquium on ‘NDT Evaluation of Electronic Components and Assemblies’, 1990, p. 3.
46. C. Miyasaka and B. R. Tittmann, Proc. IEEE 1998 Ultrasonics Symp., pp. 1,265–1,268. 47. I. Ihara, C. -K. Jen, and D. R. Franca, Proc. IEEE 1998 Ultrasonics Symp., pp. 803–807. 48. R. Kompfner and R. A. Lemons, Appl. Phys. Lett. 28(6), 295 (1976).
SCANNING ACOUSTIC MICROSCOPY 49. C. E. Yeack, M. Chodorow, and C. C. Cuttler, Appl. Phys. 51(9), 4,631–4,636 (1977). 50. L. Germain and J. D. N. Cheek, J. Acoust. Soc. Am. 83(3), 942–949 (1988). 51. L. Germain, R. Jacques, and J. D. N. Cheek, J. Acoust. Soc. Am. 86(4), 1,560–1,565 (1989). 52. M. R. T. Tan, H. L. Ransom Jr., C. C. Cutler, and M. Chodorow, Appl. Phys. 57, 4,931–4,935 (1985). 53. H. K. Wickramasinghe and C. E. Yeack, Appl. Phys. 48(12), 4,951–4,954 (1977). 54. D. Rugar, Appl. Phys. 56, 1,338–1,346 (1984). 55. K. Kraki, T. Saito, K. Matsumoto, and Y. Okuda, Physica B 165/166, 131–132 (1990). 56. K. Kraki, T. Saito, K. Matsumoto, and Y. Okuda, Appl. Phys. Lett. 59(8), 908–910 (1991). 57. V. A. Burov, I. E. Gurinovich, O. V. Rudenko, and E. Y. Tagunov, in Acoustic Image, vol. 22, Plenum, NY, 1996, pp. 125–130. 58. X. Gong and D. Zhang, in Acoustic Imaging, vol. 23, Plenum, NY, 1997, 601–605. 59. J. Synnevag and S. Holm, Proc. IEEE 1998 Ultrasonics Symp., 1998, pp. 1,885–1,888. 60. R. C. Bray and C. F. Quate, Thin Solid Films 74, 295–302 (1980). 61. C. C. Lee, C. S. Tsai, and X. Cheng, IEEE Trans. Sonics Ultrasonics SU-32(2), 248–258 (1985). 62. R. C. Addison, M. C. Somekh, J. M. Rowe, and G. A. D. Briggs, L. A. Ferrari, ed., SPIE 768, 275–284 (1987). 63. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-30, 40–42 (1983). 64. I. Ishikawa, H. Kanda, and K. Ktakura, IEEE Trans. Sonics and Ultrasonics SU-32(2), 325–331 (1985). 65. C. S. Tsai and C. C. Lee, Proc. SPIE 768, 260–266 (1987). 66. J. Onuki, M. Koizumi, and I. Ishikawa, Materials Trans. JIM 37(9), 1,492–1,496 (1996). 67. I. Ishikawa, H. Kanda, K. Ktakura, and T. Semba, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control UFCC36(6), 587–592 (1989). 68. R. L. Hollis, R. Hammer, and M. Y. Al-Jaroudi, Appl. Phys. 54, 7,016–7,019 (1984). 69. M. Hoppe and J. Bereiter-Hahn, IEEE Trans. SU-32, 289–301 (1985). 70. J. Y. Duquesne et al., Mat. Res. Soc. Symp. Proc. 142, 253–259 (1989). 71. A. F. Fagan, J. M. Bell, and G. A. D. Briggs, in A. C. RoulinMoloney, ed., Fractography and Failure Mechanisms of Polymers and Composites, Elsevier Applied Science, London, 1989, pp. 213–230. 72. A. M. Sinton, G. A. D. Briggs, and Y. Tsukahara, in H. Shimizu, N. Chubachi, and J. Kushibiki, eds., Acoustical Imaging, 17th ed., Plenum, NY, 1989, pp. 87–95. 73. R. C. Addison, M. W. Kendig, and S. J. Jeanjaquet, in H. Shimizu, N. Chubachi, and J. Kushibiki, eds., Acoustical Imaging, 17th ed., Plenum, NY, 1989, pp. 143–152. 74. T. Adachi, M. Okazaki, S. Ujihashi, and H. Matsumoto, JSME Int. J. Series A 38(3), 370–377 (1995). 75. H. Morita, T. Adachi, and H. Matsumoto, J. Reinforced Plastics Composites 16(2), 131–143 (1997). 76. N. Takeda, C. Miyasaka, and K. Nakata, Proc. 5th Int. Symp. Nondestructive Characterization Mater. Karuizawa, 1991, pp. 813–824. 77. K. Yamanaka, Appl. Phys. 24, 184–186 (1984).
78. A. Okada, C. Miyasaka, and T. Nomura, JIM 33(1), 73–79 (1992). 79. B. Nongaillard et al., NDT Int. 19(2), 77–82 (1986). 80. R. D. Weglein, Acoust. Imaging 17, 51–59 (1988). 81. H. Vetters, A. Meyyappan, A. Schlz, and P. Mayr, Mater. Sci. Eng. A122, 9–14 (1989). 82. J. Hahn-Bereiter, C. H. Fox, and B. Thorell, J. Cell Biol. 82, 767–779 (1979). 83. J. A. Hildebrand and D. Rugar, J. Microsc. 134, 245–260 (1984). 84. N. Chubachi et al., Acoust. Imaging 16, 1–9 (1987). 85. J. Bereiter-Hahn et al., Acoust. Imaging 17, 27–38 (1988). 86. N. Akashi, J. Kushibiki, N. Chubachi, and F. Dunn, Acoust. Imaging 17, 183–191 (1989). 87. J. Litniewski and J. Bereiter-Hahn, J. Microsc. 158, 95–107 (1990). 88. J. Bereiter-Hahn and H. Luers, Eur. J. Cell Biol. 53(Suppl. 31), 85 (1990).
89. H. Luers, K. Hillmann, J. Litniewski, and J. Bereiter-Hahn, Cell Biophys. 18, 279–93 (1992). 90. P. A. N. Chandraratna et al., Am. Heart J 124, 1,358–1,364 (1992). 91. G. A. D. Briggs, J. Wang, and R. Gundle, J. Microsc. 172, 3–12 (1993). 92. J. Bereiter-Hahn and H. Luers, in N. Akkas, ed., Mechanics of Actively Locomoting Cells, ASI series 84, Springer, Heidelberg, New York, Berlin, 1994, pp. 181–230. 93. P. A. N. Chandraratna et al., Acous. Imaging 21, 559–564 (1995). 94. T. Kundu, J. Bereiter-Hahn, and K. Hillmann, Biophys. J. 59, 1,194–1,207 (1991). 95. J. Kushibiki, T. Ishikawa, and N. Chubachi, Appl. Phys. Lett. 57(19), 1,967–1,969 (1990). 96. T. Endo, C. Abe, M. Sakai, and M. Ohono, Proc. Ultrasonic Int. 1993 Conf., pp. 45–48. 97. D. Achenbach, J. D. Kim, and Y. C. Lee, in A. Briggs, ed., Advances in Acoustic Microscopy, 1st ed., Plenum, NY, 1995, pp. 153–208. 98. S. Parthasarathi, B. R. Tittmann, and R. J. Ianno, Thin Solid Films 300, 42–52 (1997). 99. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-27(2), 82–86 (1980). 100. J. Kushibiki and N. Chubachi, Electron. Lett. 23(12), 652–4 (1987). 101. Y. Sasaki, T. Endo, T. Yamagishi, and M. Sakai, IEEE Trans. UFFC-39(5), 638–642 (1992). 102. R. D. Weglein, IEEE Trans. Sonics Ultrasonics SU-32(2), 225–234 (1985). 103. R. D. Weglein and A. K. Mal, Surf. Coat. Technol. 47, 667–686 (1991). 104. J. Kushibiki, M. Miyashita, and N. Chubachi, IEEE Photonics xTechnol. Lett. 8(11), 1,516–1,518 (1996). 105. J. Kushibiki, T. Kobayashi, H. Ishiji, and N. Chubachi, Appl. Phys. Lett. 61(18), 2,164–2,166 (1992). 106. J. Kushibiki and M. Miyashita, Jpn J. Appl. Phys. 36(7B), 959–961 (1997). 107. T. Narita, K. Miura, I. Ishikawa, and T. Ishikawa, Jpn. Inst. Met. 1,142–1,146 (in Japanese) (1990). 108. S. Tanaka and C. Miyasaka, Residual Stress III, 1991, pp. 278–283.
109. J. Kushibiki and N. Chubachi, IEEE Trans. Sonics and Ultrasonics SU-32(2), 189–212 (1985). 110. M. Obata, H. Shimada, and T. Mihara, Exp. Mech. 30, 34–39 (1990). 111. Y. C. Lee, J. D. Kim, and D. Achenbach, Ultrasonics 32(5), 359–365 (1995). 112. K. Yamanaka, Electr. Lett. 18(14), 587–589 (1982). 113. S. M. Gracewski, R. C. Waag, and E. A. Schenk, J. Acoust. Soc. Am. 83(6), 2,405–2,409 (1988). 114. Z. L. Li, IEEE Trans. UFCC-40(6), 680–686 (1993). 115. M. Okade and K. Kawashima, Proc. QNDE 14, 1,883–1,889 (1994). 116. J. Kushibiki and M. Arakawa, IEEE Trans. UFCC-45(2), 421–430 (1998). 117. B. T. Khuri-Yakub, P. Reinholdtsen, and C. H. Chou, IEEE 1985 Ultrasonics Symp., pp. 746–749. 118. P. Reinholdtsen and B. T. Khuri-Yakub, Proc. IEEE 1986 Ultrasonics Symp., pp. 759–763. 119. M. Ohno, Appl. Phys. Lett. 55, 832–823 (1989). 120. R. D. Weglein, Electron. Lett. 14(20), 656–657 (1978). 121. C. Iett, M. G Somekh, and G. A. D. Briggs, Proc. R. Soc. London A393, 171–183 (1984). 122. S. Kojima, Jpn. J. Appl. Phys. 26, 233–235 (1987). 123. K. Yamanaka and Y. Enomoto, J. Appl. Phys. 53, 84–50 (1982). 124. T. Ghosh, K. I. Maslov, and T. Kundu, Ultrasonics 35, 337–366 (1997). 125. M. Ohno, C. Miyasaka, and B. R. Tittmann, Wave Motion 33, 309–320 (2001).
SCANNING ELECTROCHEMICAL MICROSCOPY

DAVID O. WIPF
Mississippi State University
Mississippi State, MS
INTRODUCTION

Scanning electrochemical microscopy (SECM) is one of a number of scanned probe microscopy (SPM) techniques invented after the demonstration of the scanning tunneling microscope (1). The use of an electrochemical process for image formation defines SECM. In most applications of the method, an ultramicroelectrode (UME) is used as the probe, and the probe signal is the faradaic current that arises from electrolysis of solution species (2,3). In other applications, using an ion-selective electrode (ISE) as the probe provides a probe signal proportional to the logarithm of the activity of an ion in solution (e.g., pH). In SECM, the primary interaction between the probe tip and the sample is mediated by diffusion of solution species between the sample and the tip of the probe; this distinguishes SECM from other SPM methods that may use an electrochemically active probe. An electrochemically active probe permits a versatile range of experiments, whose essential aspect is chemical sensitivity or control of chemical processes that occur at a substrate surface (4–6). Forming an image in an SPM technique requires reproducibly perturbing the probe signal by some aspect of the
imaged substrate. There are two principal image-forming modes in SECM: feedback and generation/collection (GC). The feedback mode uses the faradaic current that flows from electrolysis of an intentionally added or naturally present mediator species (e.g., Ru(NH3)6 3+ or O2) at an UME probe (4,7,8). Far from the substrate surface, the electrolytic current assumes a characteristic steady-state value, but moving the probe close to the surface disturbs the current. An electrochemical, chemical, or enzymatic reaction at the substrate can restore the mediator to its original oxidation state and produce a positive-feedback signal. The cartoon representation in Fig. 1a illustrates the process: an oxidized mediator undergoes reduction at the probe tip, diffuses to the substrate surface, and then is reoxidized by the substrate. The regeneration of the mediator in the tip–substrate gap causes the probe current to increase as the separation decreases and suggests the term feedback as a description. In contrast, negative feedback occurs when the substrate surface cannot regenerate the mediator and mediator diffusion is blocked. Depletion of the mediator by tip electrolysis in the tip–substrate gap causes the probe signal to decrease as the gap width decreases (Fig. 1b). Imaging in the feedback mode provides topographic images of blocking or conducting surfaces. Images of many types of surfaces have been obtained in the feedback mode, including images of electrodes (9–11), polymer films (12–16), and immiscible liquid interfaces (17). It is possible to manipulate the SECM imaging conditions to produce images that represent chemical and electrochemical activity. In the reaction-rate feedback mode (Fig. 1c), the overall mediator turnover rate at the substrate is limited by the rate of substrate–mediator electron transfer (ET) (18–20). Thus, the probe current signal contains kinetic and tip–substrate separation information. Images in the reaction-rate mode can be used to map variations in the ET rate of the mediator at metallic electrodes (18,21) and at enzymes in biological materials (22). The generation/collection (GC) mode uses the probe to detect changes in the concentration of a chemical species at the surface of the imaged material (Fig. 1d). Ideally, the probe acts as a passive sensor to produce concentration maps of a particular chemical species near the substrate surface. GC mode imaging is described further by the type of sensing probe used. In amperometric GC imaging, the probe is an UME that detects species by electrolysis. Amperometric GC, first reported as a method for mapping electrochemically active areas on electrodes (23), is used to make high-resolution chemical concentration maps of corroding metal surfaces (24–28), biological materials (29–33), and polymeric materials (15,34,35). In addition, measurements of ion fluxes through porous materials (36,37) such as skin (38–40) and dental material (37,41,42) are useful applications. In potentiometric GC imaging, the probe is an ISE, which has the advantage of increased sensitivity to nonelectroactive ions and improved selectivity for imaging a desired ion concentration. The literature contains descriptions of tip electrodes and experiments for measuring many types of ions such as H+ (i.e., pH) (43–45), K+ (46), NH4+, and Zn2+ (45).
Figure 1. Schematic diagram of the image modes in SECM: (a) positive-feedback mode; (b) negative-feedback mode; (c) reaction-rate mode; (d) generation/collection mode.

Figure 2. Theoretical and experimental current vs. distance curves for positive and negative feedback using an embedded-disk tip electrode. Experimental conditions for positive feedback: 2 mM Ru(NH3)6 3+, pH 4.0 phosphate-citrate buffer, 10-µm diameter Pt tip, Pt substrate, tip potential (Etip) −0.3 V vs. Ag/AgCl reference, and substrate potential (Esub) 0.0 V vs. Ag/AgCl. Negative-feedback conditions are as in positive feedback except that a microscope slide was used as the substrate. Experimental current data are normalized by dividing by iT,∞, and the distance scale is divided by the tip radius. (Inset) Cyclic voltammogram for a 10-µm diameter Pt disk in 2 mM Ru(NH3)6 3+, pH 4.0 phosphate-citrate buffer at a scan rate of 100 mV/s.

SIGNAL TRANSDUCTION

The probe in SECM is an electrode in an electrolyte solution, and the tip electrode response can be quantitatively described under most measurement conditions. The most well-understood response arises from a probe geometry in which a disk is embedded in an insulating plane. Other tip electrode geometries are also used, but the experimental data are generally treated semiquantitatively or empirically.

A description of the electrochemical behavior of the UME tip is required to understand the probe signal in the SECM feedback or amperometric GC mode. Voltammetry or amperometry is the basis of a large number of electrochemical techniques in which a potential is applied between an auxiliary and a working electrode (47). Control of the potential is aided by a third, reference, electrode that maintains a constant reference voltage. In either voltammetry or amperometry, the signal is the faradaic current that flows as solution species are oxidized or reduced at the working electrode. A cyclic voltammogram (CV) illustrates this at an embedded-disk UME working electrode (Fig. 2, inset). The CV wave shown is a plot of the reduction current versus applied potential for a commonly used mediator ion, Ru(NH3)6 3+, at a 10-µm diameter tip electrode in unstirred solution. The reduction current reaches a limiting value at potentials negative of the reduction potential, and the forward curve is retraced as the tip electrode potential is swept back to the starting potential. The limiting current iT,∞ occurs when the rate of mediator diffusion to the electrode surface is at its maximum (i.e., when the mediator concentration is zero at the electrode surface) and, for a microdisk electrode embedded in an insulator, is given by the following equation (3):

iT,∞ = 4nFDCa,   (1)

where F is the faraday, C and D are the mediator concentration and diffusion coefficients, n is the number
of electrons transferred in the tip electrode reaction, and a is the disk radius. A characteristic of an UME is that the region of solution perturbed by diffusion is much larger than the electrode itself. This results in an efficient form of diffusional transport where the diffusion field assumes a nearly hemispherical zone around the electrode (3) and is the basis for the feedback mode of SECM imaging.
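Equation (1) is easy to evaluate for the conditions of Fig. 2. In the sketch below the diffusion coefficient is an assumed, typical value for a small redox ion, so the result is only an order-of-magnitude check.

```python
F_CONST = 96485.0   # the faraday (C/mol)

def limiting_current(n, D, C, a):
    """Steady-state limiting current at an embedded microdisk, Eq. (1).

    n : electrons transferred per mediator molecule
    D : diffusion coefficient (cm^2/s)
    C : mediator concentration (mol/cm^3)
    a : disk radius (cm)
    """
    return 4.0 * n * F_CONST * D * C * a

# 2 mM mediator and a 10-um diameter (5-um radius) Pt disk, as in Fig. 2;
# D = 7e-6 cm^2/s is an assumed, typical value for a small redox ion.
i_inf = limiting_current(n=1, D=7e-6, C=2e-6, a=5e-4)
print(f"i_T,inf ~ {i_inf * 1e9:.1f} nA")   # a few nA, the scale seen in Fig. 2
```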
Feedback Mode

In the feedback mode, the tip electrode potential is adjusted so that the electrode current for a mediator is at the limiting current value. As the UME probe approaches the surface, the hemispherical diffusion field of the mediator is disturbed, and the imaging signal iT changes from that observed at infinite distance (i.e., iT,∞). Two limiting cases, referred to as positive or negative feedback, are observed when the substrate either permits or blocks, respectively, ET to the mediator. Although no physical contact between probe and sample occurs in normal feedback imaging, a chemical interaction occurs because a small region of the substrate is subjected to a chemical environment different from the bulk. Imaging conditions are tuned by choosing a mediator that will produce a desirable interaction between the probe and substrate. However, a mediator must be chosen to avoid undesirable interactions with the sample, such as destruction of a sample by an oxidizing mediator. The relationship between probe current and tip–surface separation can be quantitatively calculated for tip electrodes in the shapes of embedded disks (8), cones, and spherical sections (11,48–50). An approximate analytical equation of the current–distance relationship is available for an embedded-disk probe (49,51). For positive feedback,

IT(L) = 0.68 + 0.78377/L + 0.3315 exp(−1.0672/L).   (2)

The probe current and distance are presented in normalized form: IT = iT/iT,∞, and L = d/a is the normalized form of the distance d between the probe tip electrode and the substrate. For negative feedback,

IT(L) = 1/[0.292 + 1.515/L + 0.6553 exp(−2.4035/L)].   (3)

Figure 3. Schematic diagram of probes that have (a) an embedded-disk and (b) a conical (etched) electrode geometry.

Comparing I–L curves between experimental and theoretical data for disk-shaped electrode probes generally produces excellent agreement, and curve fitting can be used to verify proper instrument operation. Figure 2 is a plot of a portion of the theoretical I–L curve compared to experimental data recorded at a 10-µm diameter microdisk probe. Note the increased distance sensitivity in positive feedback compared to negative feedback. Probe geometry strongly influences the probe current in the feedback mode experiment (52). For accurate measurements, the probe generally has the shape of a truncated cone. Insulating material (e.g., glass or polymer) forms the shape of the cone, and the disk electrode is located in the center of the truncated end (Fig. 3). This shape allows a closer probe tip approach to the sample surface than would be otherwise possible with
a large insulating radius. A truncated cone tip gives a different I –L curve than, say, a hemispherical tip. Thus, no single equation for an I –L curve includes all geometric possibilities. Equations (2) and (3) are strictly valid only for disk electrodes whose insulating radius is 10 times the electrode radius (a common configuration). For negative feedback, a smaller or larger insulating radius causes the probe I –L curve to descend less or more steeply, respectively (52,53) because both the sample surface and the insulating material provide the mediator blocking necessary for negative feedback. For example, at L = 1, IT = 0.534, 0.302, and 0.213 for insulator-to-electrode radius ratios of 10, 100, and 1,000, respectively (8). The effect of insulating radius is less critical in positive feedback because the probe current is mainly due to regeneration of the mediator in the region of the substrate directly below the probe tip. Probes that have a tip electrode size less than 1 µm are easier to make in a cone or spherical-segment (e.g., hemispherical) geometry (10,49,54–56). Unfortunately, the I –L sensitivity of these probes in the feedback mode is much less than that of the disk-shaped electrodes. Further, most construction methods for very small probes (e.g., electrochemically etched tips) do not produce a reproducible tip electrode shape (57). Good estimates of the tip electrode shape can be made by fitting the experimental to simulated I –L curves (49), but it is unlikely that fitting will be used for general imaging purposes. Thus, at present, non-disk-shaped tip electrodes should be used only to provide qualitative SECM feedback images. Rastering the probe across the sample surface and recording the probe current produces topographic images of surfaces in the feedback mode. Both resolution and sensitivity increase with a decrease in tip–sample distance. Typically, the probe tip is positioned within one tip electrode radius of the sample. Because the probe response is different for insulating (blocking) and conducting surfaces, areas of both types can be
distinguished in feedback mode imaging. Conducting areas of a surface have a positive-feedback response and are identified by currents larger than iT,∞; insulating areas have a negative-feedback response when tip currents are less than iT,∞. Note that the nonlinear I–L relationship distorts the relationship of images made in the feedback mode to the actual topography. However, a true topographic image can be made from the current image by employing theoretical I–L equations [e.g., Eqs. (2) and (3)] (7,58). The feedback image in Fig. 4a illustrates the concept of positive and negative feedback in a test image taken of an interdigitated electrode array. Au electrodes 3 µm wide are separated by 5-µm wide SiO2 insulators. The primary contrast is the transition from positive to negative feedback; the Au electrodes exhibit positive feedback and the largest current. Topographic features such as damage to the Au electrodes are visible as slight depressions on the second and third bands from the right. Lateral and vertical resolution in the feedback mode depend on the probe electrode size and the tip–sample separation (52,59). The ability to discriminate between insulating and conducting regions can be used as one estimate of lateral resolution. Scanning the probe over a conducting–insulating boundary ideally produces an instantaneous change in probe current (from positive to negative feedback). Experimentally, the change occurs across a region of finite width x. For zero tip–sample separation (L = 0), x equals the tip electrode diameter. For larger separations, the lateral resolution degrades as diffusion broadens the mediator concentration profile. Unfortunately, no comprehensive theoretical answer for lateral resolution has been published. An experimental study reported that the resolution function has the form (9)

x/a = 4.6 + 0.111 L².   (4)
This equation is probably overconservative because it implies that the best resolution is 2.3 times the tip electrode diameter, which is contrary to intuition and other published data [see (60) for resolution superior to that predicted by Eq. (4)]. However, the functional form of the dependence of resolution on distance seems more reasonable. Regions smaller than a tip electrode diameter cannot be resolved, but small active regions embedded in an insulating matrix can be recognized if the regions are separated by at least a tip electrode diameter. The vertical resolution of the SECM depends primarily on the probe tip–sample separation and the dynamic range of the probe current measurement. Equations (2) and (3) allow estimating the minimum resolvable vertical change in distance at a disk-shaped probe tip electrode. For example, a 1% change in I occurs from a change from L = 1 to L = 1.0137. If this difference in current is measurable, it corresponds to a theoretical 0.068-µm change at a 5-µm radius tip electrode and 5-µm tip–substrate separation or a theoretical 0.014-µm change at a 1-µm radius tip electrode and a 1-µm separation. Increasing the resolution of SECM feedback images is, in principle, possible by removing diffusional blurring through digital image processing techniques. Unfortunately, the nonlinear response in the SECM feedback mode has prevented a general formulation of a deconvolution function. It has been shown that approximate deblurring functions improve image contrast (61,62). The ultimate resolution in SECM may have been demonstrated by Fan and Bard, who reported a resolution approaching 1 nm for imaging biological molecules (63). To achieve this high resolution, they used an etched tungsten tip electrode and a very thin layer of water naturally present on the sample in a humid atmosphere. The thin layer of solution minimizes the blurring effect of diffusion.
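Equations (2)–(4) are straightforward to evaluate. The sketch below implements the two approach-curve approximations and the empirical resolution function, and it reproduces the roughly 1% current change quoted above for L = 1 to L = 1.0137 over an insulator.

```python
import math

def it_positive(L):
    """Normalized feedback current over a conductor, Eq. (2)."""
    return 0.68 + 0.78377 / L + 0.3315 * math.exp(-1.0672 / L)

def it_negative(L):
    """Normalized feedback current over an insulator, Eq. (3)."""
    return 1.0 / (0.292 + 1.515 / L + 0.6553 * math.exp(-2.4035 / L))

def lateral_resolution(a_um, L):
    """Empirical conductor/insulator resolution width x in um, Eq. (4)."""
    return a_um * (4.6 + 0.111 * L ** 2)

# Vertical-sensitivity example from the text: ~1% current change between
# L = 1 and L = 1.0137 over an insulating substrate.
i1, i2 = it_negative(1.0), it_negative(1.0137)
print(f"I_T(1.000) = {i1:.3f}, I_T(1.0137) = {i2:.3f}, "
      f"change = {100.0 * (i2 / i1 - 1.0):.2f}%")
print(f"I_T(1.0) over a conductor = {it_positive(1.0):.2f}")
print(f"x at L = 1 for a 5-um radius tip = {lateral_resolution(5.0, 1.0):.1f} um")
```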
Figure 4. SECM images of an interdigitated array electrode that consists of 3-µm wide Au electrodes separated by 5-µm wide strips of SiO2 insulator. Images were acquired by using a 1.3-µm tip–substrate distance at a scan rate of 5 µm/s. Images are presented as shaded surface plots, and larger tip currents appear as higher points on the image. (a) Feedback image, experimental conditions: 2 mM Ru(NH3)6 3+, pH 3.2 phosphate-citrate buffer, 2-µm diameter Pt tip, Etip = −0.3 V vs. Ag/AgCl reference, Esub = 0.0 V, cathodic currents are positive. (b) GC image, experimental conditions: as in (a) except Etip = 0.0 V, Esub = −0.090 V, and anodic currents are positive.

Reaction-Rate Mode
The general case of feedback imaging, known as reaction-rate imaging, exists when the rate of ET at the probe tip electrode or sample is considered. Positive and negative feedback are limiting cases that occur when the ET rate is at the diffusion limit or zero, respectively. Between those limits, the I–L curves change from curves similar to positive feedback to what appears to be negative feedback. Note that deviations from the theoretical positive-feedback curve may arise even when mediators that have very rapid ET kinetics are used. This is made clear by considering that the SECM response for a disk-electrode tip at very small tip–sample separations is approximately that of a twin-electrode, thin-layer cell (64,65). For this configuration, the probe current is

iT = nFDCA/d,   (5)
where A is the tip electrode area. The very small separations in SECM produce impressively large values of the mass transfer coefficient D/d (47). As an example, for d = 0.2 µm and D = 1 × 10−5 cm2 s−1 , the mass transfer coefficient is 0.5 cm s−1 . Heterogeneous rate constants for mediator tip electrodes or substrate ETs must be about 10 times larger to preserve the theoretical response
for positive feedback. Thus, the SECM can measure or detect changes in heterogeneous rates for ET reactions at the tip electrode or the substrate, as long as the opposing electrode is not allowed to limit the overall reaction. This is not difficult for mediators that have reasonably rapid ET rates because the opposing substrate or tip electrode can be poised at a potential sufficiently past the oxidation or reduction potential to drive the reaction to its diffusion-controlled limits. A beneficial consequence of rapid diffusional transport in the positive-feedback mode is that the probe is less sensitive to convection caused by, for example, probe movement during SECM imaging. Because of the high mass-transport rate and the ease of making measurements at steady state, the SECM is a very promising method for fundamental investigation of rapid interfacial ET rate constants. Calculations suggest that it is possible to use the SECM tip electrode to measure a rate constant as high as 10–20 cm s−1 under steady-state conditions (66). When the mediator and sample are chosen to produce a kinetic limitation of the feedback process, a number of interesting experiments become possible. Calculations of I–L curves for quasi-reversible and irreversible ET kinetics are available (67–69) that permit quantitative measurements of ET rates at micron-sized regions of surfaces. At electronically conducting surfaces, kinetic information can be extracted in tip-sized regions, which allows direct imaging of active or passive sites on surfaces. For example, greater ET activity is easily observed on the Au region of a composite carbon/Au electrode (18). Feedback current also supplies potential-independent kinetic information for chemical reactions that involve ET at the interface, such as immobilized enzymatic activity or oxidative metal dissolution. An example of a reaction-rate image is presented in Fig. 5. The SECM was used to prepare a pattern of oxidized regions on a glassy-carbon electrode using a direct-mode oxidation process (see later). Reduction of Fe3+ ions is catalyzed on oxidized carbon regions, and the reaction-rate image is based on the increased reaction rate on oxidized carbon. Successive images of the same region were collected when the carbon electrode potential was at −200 (A), −400 (B), −600 (C), and −800 mV (D). At potentials less negative than −800 mV, the "face" pattern appears on the image due to a higher rate of ET in the oxidized regions compared to the native carbon. At −800 mV, the pattern disappears because the rate of ET in all regions of the carbon surface is at the diffusion limit and the contrast disappears. Note that topography is not the predominant contrast mechanism in the images, although a small pit in the electrode surface is clear in (D) and less so in the other images.

Figure 5. Reaction-rate SECM images of an oxidized carbon pattern on a glassy-carbon electrode. Imaging conditions: 10-µm Pt tip, Etip = 700 mV vs. Hg/Hg2SO4 reference, 2 mM Fe2+/1 M H2SO4, 2.5-µm tip–substrate distance, scan rate of 10 µm/s. Images are presented as gray-scale plots, and larger cathodic tip currents are lighter shades. (A) Esub = −200 mV, S = 3.6–5.0 nA. (B) Esub = −400 mV, S = 5.0–7.3 nA. (C) Esub = −600 mV, S = 6.4–8.7 nA. (D) Esub = −800 mV, S = 7.7–9.3 nA. Conditions for SECM production of the oxidized pattern: 10-µm Pt tip, distilled water solution, tip–substrate separation <1 µm, 0.46 µA passed between tip (cathode) and substrate for 5 s to produce spots or while moving the tip at 2 µm/s to produce linear features.

Generation/Collection Mode

Using the GC mode is most appropriate when the sample itself produces an electrochemically detectable material. GC is ideally a passive mode in which the probe is used as a microscopic chemical sensor. Examples include using the GC mode to investigate corroding metal surfaces (25,70), ionic and molecular transfer through porous material (71–73), and oxygen generation and
uptake in living material (29,31,32). The GC mode has several advantages over the feedback mode: undesirable chemical interactions between mediator and sample are eliminated, chemical species produced by the sample can be identified by voltammetry, and both potentiometric and amperometric tip electrodes can be used to extend the range of detectable chemical species. The major disadvantages of GC imaging are poor vertical sensitivity and lateral resolution. Unlike the feedback mode, there is not an artificially maintained diffusion gradient between probe tip and sample, and thus the concentration gradient of the sample-generated species may extend out to several hundred micrometers (74). In addition, the gradient is ill defined because natural convection, vibration, and probe movement can perturb the diffusion-based concentration profile. Unlike the feedback mode, the probe signal changes little near the surface, and this causes difficulty in locating the tip at the surface. Additional equipment such as an optical microscope is often required for positioning. Diffusional broadening also compromises lateral resolution. Consider that a species produced at a point on the surface will diffuse, on average, a distance x during time t:

x = √(2Dt).   (6)
Thus, for example, sources smaller than 10 µm produce a diffusion cloud larger than themselves at times greater than 0.1 s. Closely spaced features overlap, and a diffuse background signal interferes with image interpretation.
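The scale of the broadening in Eq. (6) is easy to tabulate. The sketch below assumes a typical solution diffusion coefficient of 1 × 10−5 cm²/s, consistent with the 10-µm example just given.

```python
import math

def diffusion_spread(D_cm2_s, t_s):
    """Mean diffusion distance x = sqrt(2 D t), Eq. (6), returned in micrometers."""
    return math.sqrt(2.0 * D_cm2_s * t_s) * 1e4   # cm -> um

# Assumed typical diffusion coefficient for a small solution species.
for t in (0.01, 0.1, 1.0):
    print(f"t = {t:5.2f} s  ->  x ~ {diffusion_spread(1e-5, t):5.1f} um")
```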
Despite its poorer resolution, GC imaging is a popular method. The increased chemical sensitivity and selectivity are advantages. In addition, there is no sensitivity penalty for conical or hemispherical tip electrodes, so probe preparation is simplified. Techniques to improve resolution have also been developed. Imaging when the probe is as close as possible to the sample or stirring the solution minimizes diffusional broadening effects. The use of chemical scavengers can also be effective in reducing the extent to which a species diffuses into solution. An example is the addition of a small concentration of oxidizing agent to a sample solution; a reduced species produced by the sample can diffuse only a short distance before being destroyed by the added oxidant. Adjusting the scavenger concentration produces a much smaller reaction zone than the normal diffusion zone and minimizes diffusional broadening. Although not a general solution to diffusional broadening, a number of scavenging agents have been used or proposed: pH buffers, complexants, redox agents, and enzymes. Recent studies indicate that scavenging agents can also improve resolution in feedback imaging (a chemical lens) (75). The GC image in Fig. 4b can be compared to the image of the identical region acquired in feedback mode (Fig. 4a). The 3-µm wide Au electrodes of the substrate were polarized to produce a small amount of Ru(NH3)6 2+ ion in solution by reducing the Ru(NH3)6 3+ mediator. The probe electrode potential is set to oxidize Ru(NH3)6 2+, and thus the image shows regions of higher Ru(NH3)6 2+ ion concentration at the Au electrodes. The resolution in this image is similar to that of the feedback image, primarily due to the close tip–substrate spacing (about 1.3 µm) during imaging. The potentiometric GC mode relies on an ISE as the probe. The main advantage of the potentiometric GC mode is that the probe is a passive sensor of the target ion activity and does not cause the depletion or feedback effects noted before. In addition, the probe can be used to detect ions such as Ca2+ or NH4+ that would be difficult or inconvenient to detect voltammetrically. The selectivity of the tip electrode for the ion of interest is an advantage, but it requires separate probes for each species. The response of an ISE is, in general, a voltage proportional to the logarithm of the ion's activity, but special consideration must be made for potentiometric GC to provide a rapid response when using microscopically small tip electrodes (76).

INSTRUMENTATION

The scanning electrochemical microscope (also abbreviated SECM) requires a probe positioning system, instrumentation to generate and measure the probe signal, and a data recorder. Many investigators have built their own SECM instruments; however, a commercial instrument designed specifically for SECM is now available from CH Instruments (Austin, TX). A schematic overview of an SECM is shown in Fig. 6. The probe tip is attached to a three-axis positioner. The type of positioner used in the SECM is determined mainly by the size of the probe used. Positioners for probes whose tips are smaller than about 1 µm are generally based on designs used
Figure 6. Schematic diagram of the SECM instrument.
in scanning tunneling or force microscopes, which use piezoelectric tube actuators. These positioners allow precise movements on the nanometer scale and are small, so that they can be placed in small, stiff mounts. This, in turn, makes these positioners less sensitive to positioning errors due to mechanical vibration and thermal drift. Most SECM experiments, however, have used probes whose tip electrodes are larger than 1 µm, and piezotube positioners are unsuitable for these probes. A primary difficulty is that the larger size and mass of the probe are not suited to the piezotube positioner. In addition, the scan size available when using a piezotube positioner is not larger than 100 µm. Stepper-motor positioners are used for larger probes. The positioners in many of the homebuilt instruments and in the commercial example are based on a unique piezoelectric positioner, the Inchworm from Burleigh Instruments (77). These motors can produce probe scan rates greater than 100 µm/s across a 2.5-cm range of motion at a minimum resolution of less than 100 nm. Thus, these motors can act as both coarse and fine positioners. Many positioning motors operate in an open-loop configuration, and operators rely on frequent calibrations for accurate positioning. Some homebuilt instruments use closed-loop positioners that rely on integrated position encoders to provide superior positioning accuracy (at a higher price). A potentiostat is required for feedback and amperometric GC imaging to control the probe electrode potential and to amplify the probe current. The use of a bipotentiostat, which allows simultaneous control of two working electrodes versus common reference and auxiliary electrodes, is convenient when controlling the sample potential is also required. Alternately, a second potentiostat can be used to control a sample electrode. The probe (and, optionally, sample) potentiostat should be sufficiently sensitive to allow measurement of the low current flow at an UME, which may be in the pA range. Many older potentiostat instruments are not suitable without some user modification to decrease noise and increase sensitivity. A Faraday cage around the SECM cell and probe to reduce environmental noise is advisable. For potentiometric SECM operation, an electrometer is required to buffer the high impedance of the microscopic tip electrodes.
A personal computer is used to record and display images and to control the probe positioner. Homebuilt instruments rely on custom software for programming the probe movement and displaying the probe signals in real time (4,78). A video microscope system is often useful to assist in positioning the probe at the surface or at a sample location. When a video microscope is used, the sample cell incorporates a window to allow direct observation of the tip–sample approach.

Probes

It is not an exaggeration to say that the most critical component of the SECM instrument is the probe. Two common probe types, the microdisk and the etched tip electrode, are shown in Fig. 3. Microdisk electrodes are prepared by sealing a microscopic metal wire or carbon fiber in glass or epoxy and polishing or cutting to expose a disk embedded in insulator (3). These electrodes are tapered by carefully grinding away the surrounding insulator to produce a tip whose insulator diameter is no larger than about 10 electrode diameters. Alternately, cone-shaped tips are made by electrochemically etching metal wire or carbon fibers and then coating them with Apiezon wax or other polymers (49,54,56). The surface tension of the coating prevents coating the sharp tip apex. Conical tip electrodes are also available commercially from electrophysiology suppliers such as FHC (Brunswick, ME) and World Precision Instruments (Sarasota, FL). Many potentiometric probes have been proposed for GC SECM. A well-established method is based on micropipets filled with neutral carrier membranes or ionophores. These tips can be made in submicrometer dimensions (76). The literature contains descriptions of probes and experiments used for measuring pH (43–45,74), NH4+, K+ (45), and Zn2+ (45,79). A pH-sensitive probe based on an antimony electrode (80) has the advantage that it can be converted back and forth in situ from a voltammetric electrode to a potentiometric electrode by oxidizing antimony to antimony oxide. The pH-sensitive antimony oxide surface allows operating in a potentiometric GC mode, and the antimony surface allows precision positioning in the feedback mode. Micropipets can also be configured as voltammetric ion-selective electrodes (ISEs) (81,82). In this configuration, an organic solvent plus electrolyte fills the pipet. Application of a potential transfers ions in or out of the organic phase (pipet) and into an aqueous phase that contains the substrate. The parallels between ion transfer at a voltammetric ISE and electron transfer at a metallic electrode are strong, and many of the theoretical results obtained at metal SECM tips can be applied to voltammetric ISE probes. Thus, feedback imaging is possible using electroinactive ions as mediators. The advantage is that micropipets can be made reproducibly small without significant expense. The organic phase contents necessarily vary depending on the ion used as the mediator. For example, K+ ion electrodes whose radii are less than 1 µm were produced by filling borosilicate capillaries with 1,2-dichloroethane containing 10 mM valinomycin and 10 mM of the ionophore ETH 500 (83).
Alternately, crown ethers can be used to facilitate ion transfer (46,57).

Constant-Separation Imaging

Most SECM images are acquired in a constant-height mode in which the probe is scanned in a reference plane above the sample surface. Because the feedback mode requires a probe tip–sample spacing of about one tip electrode diameter for good resolution, care must be taken to prevent probe crashes when examining samples that have high relief or a tilt, particularly for small probes. Constant-separation (or constant-distance) imaging uses a servomechanical mechanism to keep the tip–substrate distance constant while imaging, thus ensuring uniform resolution throughout an image and preventing probe crashes. Although relatively straightforward in scanning tunneling or force microscopy, where the magnitude of the probe signal is related to the separation, constant-separation imaging in SECM is more challenging. In GC mode, the probe signal does not indicate the tip–substrate separation. In the feedback mode, the signal depends on separation and also on the sample conductivity. For surfaces known to be completely insulating or conducting, implementing a constant-current mode is straightforward and similar in methodology to STM (scanning tunneling microscopy) studies (1). For example, a piezoelectric element constantly repositions the probe to minimize the deviation of the probe current from an established reference current. However, when a surface has both insulating and conducting regions, this method cannot work. Any general method for controlling the tip–sample separation requires more information than is supplied by the probe signal alone. Although no completely satisfactory method is available for constant-separation imaging, a number of methods have had limited success. One method is to impose a small-amplitude (100 nm or less), high-frequency vertical modulation of the probe position (tip position modulation, TPM) during feedback imaging (84). The low-frequency normal signal is unaffected by the modulation, but the high-frequency ac current has a 180° phase shift when the probe is moved from an insulating to a conducting surface. This additional information is sufficient to set the proper probe motion and reference current level on the servomechanism. A second method, the picking mode, monitors a transient probe current caused by repeated, rapid up and down approaches over a large probe–substrate separation (85,86). The transient current is affected by the tip–substrate separation but not by the substrate conductivity and, thus, can be used to position the probe automatically. Perhaps the most general method is based on the shear-force mode developed initially for the near-field scanning optical microscope (NSOM). The shear-force method uses a small-amplitude lateral probe vibration. As the probe tip approaches the sample, hydrodynamic forces act to damp the vibratory motion. Using a laser (87) or a tuning-fork sensor (59,88), the vibration intensity can be monitored and used to maintain constant separation. Because a specific electrochemical response is not required, the shear-force method can be suitable for either GC or feedback imaging. The relative
newness of the method, the requirement for special probe sizes, and the exacting experimental technique have so far limited widespread adoption of the shear-force mode.
APPLICATIONS

Corrosion Science

SECM is increasingly valuable in investigating corrosion processes on metallic samples. Voltammetric or amperometric GC SECM can produce concentration map images of electroactive species produced or depleted at corroding surfaces. Examples from the literature include Fe ions (25); O2, Co, and Cr ions (70); S (24); and Cu and Ni ions (89). Alternatively, potentiometric GC SECM images of corroding surfaces can yield information about local pH values. A pH-sensitive probe that employed a neutral-charge-carrier ionophore imaged pH changes with a sensitivity of less than 0.1 pH unit above AlFe3 and MnS inclusions in aluminum and steel alloys, respectively (44,90,91).

SECM images of passive metal surfaces can also be used to identify, in advance, microscopic sites of future corrosion. Corrosion precursor sites on titanium were detected by their ability to oxidize Br− ions to Br2 in solution. These microscopic active sites were identified by SECM maps of Br2 concentration near the titanium surface (28,92,93). Pit formation was subsequently found to coincide with the Br2-forming regions.

The power of the SECM method to examine metal oxide growth is evident in Fig. 7 (94). An SECM image of a tantalum electrode in the presence of I− is featureless at low potentials; however, raising the electrode potential to produce an oxide film generates a current that corresponds to I− and tantalum oxidation. The numbered peaks that appear in the SECM image are regions of high I3− concentration, produced by I− oxidation at oxide-free (or thin-oxide) locations. The SECM images at different tantalum electrode potentials show that oxide formation is not uniform and that active spots form and passivate at different spatial locations and electrode potentials. This behavior is not predicted from the voltammetric behavior of the electrode (21,94).

The SECM can also be used to investigate corrosion by using the probe to generate aggressive conditions in a small region of solution near a metal surface. Electrochemical generation of Cl− ion by the SECM probe can induce localized pitting corrosion on iron, steel, and aluminum (25,95,96).

In a related application, controlled dissolution of material is an area in which the ability of the SECM to produce a defined chemical environment near the material surface provides a significant advance. Rates of dissolution of even very soluble materials can be obtained by using electrolysis at the probe tip electrode to produce a local undersaturation of the dissolving material. Quantitative details about the rate and mechanism of the dissolution reaction are available from the probe current as a function of time and distance to the sample (97). Examples include the dissolution of the (010) surface of potassium ferrocyanide trihydrate (58) and of AgCl (98), Cu (99), and Au (100) crystallites.
Biological and Polymeric Materials

The two broad interests in SECM of biological and polymeric materials are investigation of the transport of molecules or ions across interfaces and observation of chemical and biological reactions on a substrate. The activity of enzymes immobilized on solid or gel supports is readily observed by using the GC mode to image the products of the enzymatic reaction (33,101), and efforts have been made to develop SECM-immunoassay techniques (102).
Figure 7. SECM images of a 300 × 300-µm region of a Ta/Ta2O5 oxide electrode in 10 mM KI/0.1 M K2SO4 at different electrode potentials. The probe is an 8-µm diameter carbon electrode held at a potential of 0.0 V and scanned at 10 µm/s; the tip–sample separation was about 5 µm. The numbered peaks in the SECM images correspond to active regions for oxidation of I− on the Ta/Ta2O5 surface. The voltammetric curves for Ta oxidation with and without KI are shown for comparison. (Reprinted with permission from Anal. Chem., 71, 3,166–3,170 (1999). Copyright 1999 American Chemical Society.)
Living tissue has also been examined by imaging enzyme activity in mitochondria (22) and changes in O2 concentration or pH during metabolic activity or photosynthesis (29,32,103,104). Ion transport through pores or channels in biological membranes can be directly imaged with SECM. In most examples, quantitative models of the local transport properties are available from the data. One modeled system is the transport of ions and molecules through hairless mouse skin, which has applications in transdermal drug delivery (38,71,105). Similarly, SECM was used to examine fluid flow through the dentine tubules of teeth (37,98). SECM is often used to examine the transport properties of ions at polymeric material interfaces, for example, ion ejection or uptake during oxidation and reduction of conducting polymer films (34,106–108). Figure 8 is a graphic illustration of an
SECM image in a fundamental study of size-selective permeation of ions through a patterned, electropolymerized Fe(5-amino-1,10-phenanthroline)3 2+ film (35). Images 8A, B, and C are GC images of a region over four indium tin oxide (ITO) electrodes using three redox probes: Ru(NH3)6 3+, Fe(phenanthroline)3 2+, and Fe(bathophenanthroline(SO3)2)4−, whose diameters are 5.5, 13, and 24 Å, respectively. Images 8A′, B′, and C′ are for the same ions but use the ITO electrodes covered with a ∼12-nm layer of electropolymerized Fe(5-amino-1,10-phenanthroline)3 2+. Size-selective permeation is evident from the reduced signal of Ru(NH3)6 3+ and the nearly complete loss of the signal for the larger ions Fe(phenanthroline)3 2+ and Fe(bathophenanthroline(SO3)2)4−. The spatial uniformity of the polymer can also be assessed from the SECM images, and the effects of confounding phenomena such as pinholes or cracks in the film can be dismissed.

An important application of SECM technology involves fundamental studies of charge or molecular transport across the interface of two phases, such as two immiscible electrolyte solutions (ITIES) (45,81,109–112) or gas–liquid interfaces (113–115). These studies allow all electrodes to be present in one phase, simplifying the experiment. The important aspect of these experiments is that the SECM instrument can place the SECM probe very close to the interface. The SECM probe electrode can be used to access a wide range of mass-transfer coefficients (see earlier) in a small region of the interface without the complications that appear in conventional measurements, such as high solution resistance and large interfacial capacitance (81,83,111,115).
Figure 8. SECM images of a patterned array of 50-µm diameter ITO electrodes. The probe is an 8-µm diameter carbon-fiber electrode held at a potential of 0 V (vs. Ag/AgCl), scanning at a rate of 30 µm/s with a 2-µm tip–sample separation. Left images (A, B, C) are scans of bare ITO. Right images (A′, B′, C′) are scans of the ITO covered by a ∼12-nm film of electropolymerized Fe(5-amino-1,10-phenanthroline)3 2+. Conditions: (A, A′) 4.8 mM Ru(NH3)6 3+, Esub = −0.35 V, S = 4.3 nA; (B, B′) 5.1 mM Fe(phenanthroline)3 2+, Esub = +0.5 V, S = 7.1 nA; (C, C′) 4.3 mM Fe(bathophenanthroline(SO3)2)4−, Esub = +1.0 V, S = 1.2 nA. (Reprinted with permission from Anal. Chem. 72, 3,122–3,128 (2000). Copyright 2000 American Chemical Society.)
Nonimaging Applications

The utility of the SECM extends beyond imaging applications. The electrochemical probe lends itself to many nonimaging experiments that, in many cases, prove synergistic in combination with SECM imaging. A common application is to use the SECM to modify surfaces locally. The microreagent mode uses the SECM tip electrode to produce, via electrolysis, a local chemical environment that is different from the bulk environment. Changes in pH, redox activity, ion activity, and ionic strength are possible. The area of the substrate exposed to the local chemical environment is only slightly larger than the tip electrode diameter, and the volume of solution between the probe and the substrate can be quite small. For example, a volume of only 25 fL exists in the gap formed by a 5-µm radius disk-shaped tip electrode separated from the substrate by 1 µm. Consequently, changing the local chemical environment within this small volume is possible on a millisecond time scale. Rapid diffusion of the probe-generated reagents outside the vicinity of the tip electrode ensures that the reaction remains limited to the tip electrode area.

The other substrate modification method is the direct mode. In contrast to normal procedure in electrochemistry, which employs a large-area auxiliary electrode remotely positioned with respect to the working electrode, the direct mode of SECM employs the tip electrode as a microscopic
auxiliary electrode held at a micrometer distance from the sample electrode. Thus, faradaic current that would normally be distributed across the electrode is confined to a small region near the microauxiliary electrode (116). When the tip is positioned close to the substrate, any faradaic reaction at the substrate is limited to the region near the tip. Any electrochemical reaction that can be produced at the bulk surface can be performed, at least in principle, in a local region of the surface by using the direct mode. For example, metal etching, metal deposition, and polymer deposition are all possible in microscopic regions. Figure 5 illustrates direct-mode oxidation of a carbon electrode. To form a pattern of oxidized spots and lines, a 10-µm diameter Pt tip is positioned close to the carbon surface, and a 0.46-µA current is passed between the tip (cathode) and carbon (anode) in distilled water. Reaction-rate imaging is then used to verify the presence of the oxidized region (see earlier).

Acknowledgments

The author acknowledges the National Science Foundation, the State of Mississippi, and the Mississippi EPSCOR program for support by grants DBI-9987028 and EPS-9874669. The assistance of Mario Alpuche Aviles, Robert Tenent, and Emad El-Deen ElGiar is gratefully acknowledged.
ABBREVIATIONS AND ACRONYMS

A          area
a          electrode radius
Ag/AgCl    silver/silver chloride reference electrode
C          concentration of redox species
CV         cyclic voltammetry
D          diffusion coefficient of redox species
d          distance between tip and substrate in SECM
ET         electron transfer
F          Faraday constant
GC         generation/collection mode of the SECM
Hg/Hg2SO4  mercury/mercury(II) sulfate reference electrode
ISE        ion-selective electrode
iT         steady-state limiting current at an SECM tip electrode
IT         normalized SECM tip current, iT/iT,∞
IT(L)      normalized SECM tip current at the normalized distance L
iT,∞       steady-state limiting current at an SECM tip electrode at infinite tip–substrate separation
IT,ins(L)  normalized SECM tip current over an insulator at the normalized distance L
ITO        indium tin oxide
k          heterogeneous rate constant
L          the normalized distance from tip to substrate, d/a, in SECM
n          number of electrons transferred in a redox event
NSOM       near-field scanning optical microscope
SCE        saturated calomel electrode
SECM       scanning electrochemical microscope or scanning electrochemical microscopy
SPM        scanning probe microscopy
STM        scanning tunneling microscopy
TPM        tip position modulation
UME        ultramicroelectrode
BIBLIOGRAPHY 1. G. Binnig and H. Rohrer, Helv. Phys. Acta 55, 726–735 (1982). 2. R. M. Wightman, Science 240, 415–420 (1988). 3. R. M. Wightman and D. O. Wipf, in A. J. Bard, ed., Electroanalytical Chemistry, vol. 15, Marcel Dekker, NY, 1989, pp. 267–353. 4. A. J. Bard, F. -R. Fan, and M. V. Mirkin, in A. J. Bard, ed., Electroanalytical Chemistry, vol. 18, Marcel Dekker, NY, 1994, pp. 244–370. 5. A. J. Bard and M. V. Mirkin, eds., Scanning Electrochemical Microscopy, Wiley & Sons, NY, (2001). 6. M. V. Mirkin and B. R. Horrocks, Anal. Chim. Acta 406, 119–146 (2000). 7. A. J. Bard, F. -R. F. Fan, D. T. Pierce, P. R. Unwin, D. O. Wipf, and F. M. Zhou, Science 254, 68–74 (1991). 8. J. Kwak and A. J. Bard, Anal. Chem. 61, 1,221–1,227 (1989). 9. G. Wittstock, H. Emons, M. Kummer, J. R. Kirchhoff, and W. R. Heineman, Fresnius’ J. Anal. Chem. 348, 712–718 (1994). 10. C. M. Lee, C. J. Miller, and A. J. Bard, Anal. Chem. 63, 78–83 (1990). 11. A. J. Bard, F. -R. F. Fan, J. Kwak, and O. Lev, Anal. Chem. 61, 132–138 (1989). 12. J. Y. Kwak, C. M. Lee, and A. J. Bard, J. Electrochem. Soc. 137, 1,481–1,484 (1990). 13. C. M. Lee and A. J. Bard, Anal. Chem. 62, 1,906–1,913 (1990). 14. I. C. Jeon and F. C. Anson, Anal. Chem. 64, 2,021–2,028 (1992). 15. K. Borgwarth, C. Ricken, D. G. Ebling, and J. Heinze, Fresnius’ J. Anal. Chem. 356, 288–294 (1996). 16. G. Wittstock, T. Asmus, and T. Wilhelm, Fresnius’ J. Anal. Chem. 367:346–351–(2000). 17. A. L. Barker, P. R. Unwin, S. Amemiya, J. Zhou, and A. J. Bard, J. Phys. Chem. B 103, 7,260–7,269 (1999). 18. D. O. Wipf and A. J. Bard, J. Electrochem. Soc. 138, L4–L6 (1991). 19. D. O. Wipf and A. J. Bard, J. Electrochem. Soc. 138, 469–474 (1991). 20. S. B. Basame and H. S. White, J. Phys. Chem. B 102, 9,812–9,819 (1998). 21. S. B. Basame and H. S. White, Langmuir 15, 819–825 (1999). 22. D. T. Pierce and A. J. Bard, Anal. Chem. 65, 3,598–3,604 (1993). 23. R. C. Engstrom, M. Weber, D. J. Wunder, R. Burgess, and S. Winguist, Anal. Chem. 58, 844–848 (1986). 24. C. H. Paik, S. B. Basame, H. S. White, and R. C. Alkire, Proc. Electrochem. Soc. 99–28, 122–130 (2000). 25. D. O. Wipf, Colloid Surf. A 93, 251–261 (1994). 26. Y. Y. Zhu and D. E. Williams, J. Electrochem. Soc. 144, L43–L45 (1997). 27. S. B. Basame and H. S. White, J. Electrochem. Soc. 147, 1,376–1,381 (2000). 28. N. Casillas, S. J. Charlebois, W. H. Smyrl, and H. S. White, J. Electrochem. Soc. 140, L142–L145 (1993).
29. M. Tsionsky, Z. G. Cardon, A. J. Bard, and R. B. Jackson, Plant Physiol. 113, 895–901 (1997). 30. J. Wang, L. -H. Wu, and R. Li, J. Electroanal. Chem. 272, 285–292 (1989). 31. C. M. Lee, J. Y. Kwak, and A. J. Bard, Proc. Natl. Acad. Sci. U.S.A. 87, 1,740–1,743 (1990). 32. T. Yasukawa, Y. Kondo, I. Uchida, and T. Matsue, Chem. Lett. 767–768 (1998). 33. T. Yasukawa, N. Kanaya, D. Mandler, and T. Matsue, Chem. Lett. 458–459 (2000). 34. M. Arca, M. V. Mirkin, and A. J. Bard, J. Phys. Chem. 99, 5,040–5,050 (1995). 35. M. E. Williams, K. J. Stevenson, A. M. Massari, and J. T. Hupp, Anal. Chem. 72, 3,122–3,128 (2000). 36. B. D. Bath, R. D. Lee, H. S. White, and E. R. Scott, Anal. Chem. 70, 1,047–1,058 (1998). 37. J. V. Macpherson, M. A. Beeston, P. R. Unwin, N. P. Hughes, and D. Littlewood, Langmuir 11, 3,959–3,963 (1995). 38. E. R. Scott, A. I. Laplaza, H. S. White, and J. B. Phipps, Pharmaceut. Res. 10, 1,699–1,709 (1993). 39. E. R. Scott, H. S. White, and J. B. Phipps, Solid State Ionics 53–56, 176–183 (1992). 40. K. C. Melikov and I. A. Ershler, Biol. Membrany 13, 423–429 (1996). 41. N. P. Hughes, D. Littlewood, P. R. Unwin, J. V. Macpherson, and M. A. Beeston, in M.Shimono, T. Maeda, H. Suda, and K. Takahashi, eds., Dentine/Pulp Complex, Quintessence, Carol Stream IL, 1996, pp. 333–335. 42. S. Nugues and G. Denuault, J. Electroanal. Chem. 408, 125–140 (1996). 43. E. Klusmann and J. W. Schultze, Electrochim. Acta 42, 3,123–3,136 (1997). 44. J. O. Park, C. H. Paik, and R. C. Alkire, J. Electrochem. Soc. 143, L174–L176 (1996). 45. C. Wei, A. J. Bard, G. Nagy, and K. Toth, Anal. Chem. 67, 1,346–1,356 (1995). 46. N. J. Evans, M. Gonsalves, N. J. Gray, A. L. Barker, J. V. Macpherson, and P. R. Unwin, Electrochem. Commun. 2, 201–206 (2000). 47. A. J. Bard and L. R. Faulkner, Electrochemical Methods: Fundamentals and Applications, J Wiley, NY, 2000. 48. J. M. Davis, F. -R. F. Fan, and A. J. Bard, J. Electroanal. Chem. 238, 9–31 (1987). 49. M. V. Mirkin, F. R. F. Fan, and A. J. Bard, J. Electroanal. Chem. 328, 47–62 (1992). 50. Q. Fulian, A. C. Fisher, and G. Denuault, J. Phys. Chem. B 103, 4,387–4,392 (1999). 51. J. Galceran, J. Cec´ılia, E. Companys, J. Salvador, and J. Puy, J. Phys. Chem. B 104, 7,993–8,000 (2000). 52. J. L. Amphlett and G. Denuault, J. Phys. Chem. B 102, 9,946–9,951 (1998). 53. J. L. Amphlett and G. Denuault, J. Phys. Chem. B 104, 7,993–8,000 (2000). 54. L. A. Nagahara, T. Thundat, and S. M. Lindsey, Rev. Sci. Instrum. 60, 3,128–3,130 (1989). 55. B. D. Penner, M. J. Heben, and N. S. Lewis, Anal. Chem. 61, 1,630 (1989). 56. C. J. Slevin, N. J. Gray, J. V. Macpherson, M. A. Webb, and P. R. Unwin, Electrochem Commun 1, 282–288 (1999). 57. Y. H. Shao, M. V. Mirkin, G. Fish, S. Kokotov, D. Palanker, and A. Lewis, Anal. Chem. 69, 1,627–1,634 (1997).
58. J. V. Macpherson and P. R. Unwin, J. Phys. Chem. 99, 3,338–3,351 (1995). 59. M. Buchler, S. C. Kelley, and W. H. Smyrl, Electrochem. Solid-State Lett. 3, 35–38 (2000). 60. D. O. Wipf and A. J. Bard, Anal. Chem. 64, 1,362–1,367 (1992). 61. C. M. Lee, D. O. Wipf, A. J. Bard, K. Bartels, and A. C. Bovik, Anal. Chem. 63, 2,442–2,447 (1991).
62. K. A. Ellis, M. D. Pritzker, and T. Z. Fahidy, Anal. Chem. 67, 4,500–4,507 (1995). 63. F. R. F. Fan and A. J. Bard, Proc. Natl. Acad. Sci. U.S.A. 96, 14 222–14 227 (1999). 64. L. B. Anderson and C. N. Reilley, J. Electroanal. Chem. 10, 295–305 (1965). 65. L. B. Anderson, B. McDuffie, and C. N. Reilley, J. Electroanal. Chem. 12, 477–494 (1966). 66. M. V. Mirkin, T. C. Richards, and A. J. Bard, J. Phys. Chem. 97, 7,672–7,677 (1993). 67. M. V. Mirkin and A. J. Bard, J. Electroanal. Chem. 323, 29–51 (1992). 68. A. J. Bard, M. V. Mirkin, P. R. Unwin, and D. O. Wipf, J. Phys. Chem. 96, 1,861–1,868 (1992). 69. A. J. Bard, F. -R. Fan, and M. Mirkin, in I. Rubenstein, ed., Physical Electrochemistry: Principles, Methods, and Applications, Marcel Dekker, NY, 1995, pp. 209–242. 70. J. L. Gilbert, S. M. Smith, and E. P. Lautenschlager, J. Biomed. Mater. Res. 27, 1,357–1,366 (1993). 71. E. R. Scott, H. S. White, and J. B. Phipps, Anal. Chem. 65, 1,537–1,545 (1993). 72. E. R. Scott, J. B. Phipps, and H. S. White, J. Invest. Dermatol. 104, 142–145 (1995). 73. M. Gonsalves, A. L. Barker, J. V. Macpherson, P. R. Unwin, D. O’Hare, and C. P. Winlove, Biophys J. 78, 1,578–1,588 (2000). 74. D. O. Wipf, F. Ge, T. W. Spaine, and J. E. Baur, Anal. Chem. 72, 4,921–4,927 (2000). 75. K. Borgwarth and J. Heinze, J. Electrochem. Soc. 146, 3,285–3,289 (1999). 76. W. E. Morf and N. F. Derooij, Sensor Actuator A Phys. 51, 89–95 (1995). 77. The Piezo Book, Burleigh Instruments, Inc., Fishers, NY. 78. G. Wittstock, H. Emons, T. H. Ridgway, E. A. Blubaugh, and W. R. Heineman, Anal. Chim. Acta 298, 285–302 (1994). 79. K. Toth, G. Nagy, C. Wei, and A. J. Bard, Electroanalysis 7, 801–810 (1995). 80. B. R. Horrocks, M. V. Mirkin, D. T. Pierce, A. J. Bard, G. Nagy, and K. Toth, Anal. Chem. 65, 1,213–1,224 (1993). 81. Y. H. Shao and M. V. Mirkin, Anal. Chem. 70, 3,155–3,161 (1998). 82. T. Solomon and A. J. Bard, J. Phys. Chem. 99, 17,487–17,489 (1995). 83. S. Amemiya, Z. Ding, J. Zhou, and A. J. Bard, J. Electroanal. Chem. 483, 7–17 (2000). 84. D. O. Wipf, A. J. Bard, and D. E. Tallman, Anal. Chem. 65, 1,373–1,377 (1993). 85. K. Borgwarth, D. G. Ebling, and J. Heinze, Ber. Bunsenges. Phys. Chem. 98, 1,317–1,321 (1994). 86. K. Borgwarth, D. Ebling, and J. Heinze, Electrochim. Acta 40, 1,455–1,460 (1995). 87. M. Ludwig, C. Kranz, W. Schuhmann, and H. E. Gaub, Rev. Sci. Instrum. 66, 2,857–2,860 (1995).
88. P. I. James, L. F. Garfias-Mesias, P. J. Moyer, and W. H. Smyrl, J. Electrochem. Soc. 145, L64–L66 (1998). 89. M. Küpper and J. W. Schultze, Electrochim. Acta 42, 3,023–3,032 (1997). 90. J. O. Park, C. H. Paik, Y. H. Huang, and R. C. Alkire, J. Electrochem. Soc. 146, 517–523 (1999). 91. J. O. Park and H. Böhni, Electrochem. Solid-State Lett. 3, 416–417 (2000). 92. N. Casillas, S. Charlebois, W. H. Smyrl, and H. S. White, J. Electrochem. Soc. 141, 636–642 (1994). 93. L. F. Garfias-Mesias, M. Alodan, P. I. James, and W. H. Smyrl, J. Electrochem. Soc. 145, 2,005–2,010 (1998). 94. S. B. Basame and H. S. White, Anal. Chem. 71, 3,166–3,170 (1999). 95. J. W. Still and D. O. Wipf, J. Electrochem. Soc. 144, 2,657–2,665 (1997). 96. K. Fushimi and M. Seo, Proc. Electrochem. Soc. 99–28, 362–369 (2000). 97. P. R. Unwin and J. V. Macpherson, Chem. Soc. Rev. 24, 109–119 (1995). 98. J. V. Macpherson, M. A. Beeston, P. R. Unwin, N. P. Hughes, and D. Littlewood, J. Chem. Soc. Faraday Trans. 91, 1,407–1,410 (1995). 99. J. V. Macpherson and P. R. Unwin, J. Phys. Chem. 98, 11,764–11,770 (1994). 100. B. W. Mao, B. Ren, X. W. Cai, and L. H. Xiong, J. Electroanal. Chem. 394, 155–160 (1995). 101. J. Zaumseil, G. Wittstock, S. Bahrs, and P. Steinrucke, Fresnius' J. Anal. Chem. 367, 352–355 (2000). 102. S. Kasai, A. Yokota, H. Zhou, M. Nishizawa, K. Niwa, T. Onouchi, and T. Matsue, Anal. Chem. 72, 5,761–5,765 (2000). 103. R. B. Jackson, M. Tsionsky, Z. G. Cardon, and A. J. Bard, Plant Physiol. 111, 354–354 (1996). 104. T. Yasukawa, T. Kaya, and T. Matsue, Chem. Lett. 975–976 (1999). 105. B. D. Bath, H. S. White, and E. R. Scott, Pharm. Res. 17, 471–475 (2000). 106. G. Denuault, M. H. T. Frank, and L. M. Peter, Faraday Discuss. Chem. Soc. 23–35 (1992). 107. M. H. T. Frank and G. Denuault, J. Electroanal. Chem. 354, 331–339 (1993). 108. M. H. T. Frank and G. Denuault, J. Electroanal. Chem. 379, 399–406 (1994). 109. Y. H. Shao, M. V. Mirkin, and J. F. Rusling, J. Phys. Chem. B 101, 3,202–3,208 (1997). 110. J. Zhang and P. R. Unwin, J. Phys. Chem. B 104, 2,341–2,347 (2000). 111. J. Zhang, A. L. Barker, and P. R. Unwin, J. Electroanal. Chem. 483, 95–107 (2000). 112. A. L. Barker and P. R. Unwin, J. Phys. Chem. B 104, 2,330–2,340 (2000). 113. C. J. Slevin, S. Ryley, D. J. Walton, and P. R. Unwin, Langmuir 14, 5,331–5,334 (1998). 114. C. J. Slevin, J. V. Macpherson, and P. R. Unwin, J. Phys. Chem. B 101, 10,851–10,859 (1997). 115. A. L. Barker, M. Gonsalves, J. V. Macpherson, C. J. Slevin, and P. R. Unwin, Anal. Chim. Acta 385, 223–240 (1999). 116. O. E. Hüsser, D. H. Craston, and A. J. Bard, J. Electrochem. Soc. 136, 3,222–3,229 (1989).

SILVER HALIDE DETECTOR TECHNOLOGY

RICHARD HAILSTONE
Rochester Institute of Technology
Rochester, NY

INTRODUCTION

Detectors based on silver halides are among the oldest light detectors available — both the daguerreotype and the calotype date to 1839 (1). The invention of the fixing process in 1841 provided a means of stabilizing the image (2), and the applications of photography exploded. It is quite interesting that the basic materials used for silver halide detectors in the twenty-first century are the same ones discovered in 1839. That is not to say that the technology has not been improved over the intervening years. There have been tremendous improvements in sensitivity, image quality, image permanence, the ability to record colors, and convenience. But, because the same basic material is still being used despite numerous attempts to discover ''better'' materials, one must conclude that silver halides have some unique properties — many of which will be discussed in this article.

This article focuses on the basic aspects of the silver halide detector that are common to all of its commercial applications; specific applications are covered in other articles. A ''top-down'' approach is used. First, the detector and methods for characterizing it are described. Next, the preparation of the imaging elements in the detector is discussed. Then, a detailed description of the imaging elements is given, including their physical properties, the mechanism of latent-image formation, factors that affect the efficiency of image recording and the sensitivity of the detector, techniques for improving efficiency, and methods of adjusting its spectral response.

Figure 1. Schematic depiction of the cross section of a simple silver halide detector. The overcoat layer provides abrasion protection, as well as a means of introducing chemicals into the coating. Often, reagents that lead to chemical cross-linking of the gelatin molecules are introduced here. The ''hardened'' film is more robust, particularly to high-temperature image processing.

A discussion follows of the signal amplification step required to achieve the final
image. Finally, a brief discussion of design considerations is included that stresses the optimization of sensitivity, noise, and sharpness. A short discussion of imaging systems is also included. Several books and articles can be recommended to those who would like a more detailed discussion of silver halide detector technology, including applications (3–7). Several review articles focusing on different aspects are also available (8–11).

DETECTOR STRUCTURE AND FUNCTION

The detector consists of silver halide microcrystals coated in a gelatin/water mixture on either a transparent or reflective support, as exemplified by the schematic in Fig. 1. A latent image is formed on the microcrystal via a photochemical change induced by light absorption. The visible image is created through an image processing step in which those microcrystals that have received sufficient light ''blacken'' as they are transformed from silver halide to metallic silver. In color photography, the image processing stage produces a dye image rather than a silver image.

The detector described in Fig. 1 is very simple. For some applications designed for black-and-white pictorial images, this structure might suffice. Usually, however, commercial products have a more complicated structure that involves several layers; some contain microcrystals, and some contain various chemicals to enhance the final image. This is particularly true for color films, where upward of 15 layers are present. Manufacture of such complicated structures requires sophisticated coating technology to produce products of consistent sensitivity and image quality.

DETECTOR CHARACTERIZATION
Basic Concepts

The study of the characteristic response function of an imaging device or material is called ''sensitometry'' (12,13). The response function can be described schematically, as in Fig. 2. Here E is the exposure, which is defined as the product of the irradiance I and the exposure time t. A log scale is used for exposure because many photographic systems are designed to have dynamic ranges of two, three, or even four decades of exposure to capture images faithfully and conveniently. The response curve describes the input–output relationship in an imaging system and is an integral part of system design. The response can be measured in many ways, but in silver halide systems, it is almost always the density of the developed image. There are two modes of measuring density: (1) transmission, which is used for transparent materials; and (2) reflection, which is used for reflective materials like prints. As shown in Fig. 3, transmission density Dt uses the material's transmittance T, whereas reflection density Dr uses the material's reflectance R. Note that I0 and I are related to the light used to view the image, not the light used to make the image. Density is used rather than transmittance or reflectance because the brightness perceived by the human visual system varies approximately linearly with the logarithm of the luminance of the object or image being viewed (14).

The Characteristic Curve

The response or characteristic curve is often sigmoidal, as shown in Fig. 4. Also, as indicated, there are conventional labels for the different regions of the characteristic curve. The characteristic curve can have either a negative character, as shown at the top of the figure, which means that density increases as exposure increases, or a positive character, as shown at the bottom of the figure, which means that density decreases as exposure increases. Systems that respond negatively are used as image capture materials, in which the image will often be subsequently transferred to another negative-type material for image display in a reflection mode (prints).
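The density definitions that appear in Fig. 3 are simple enough to verify directly. The two-line sketch below is only an illustration; the transmittance and reflectance values in it are arbitrary examples, not measurements.

import math

# Transmission and reflection density as defined in Fig. 3:
# D = -log10(I/I0), where I/I0 is the transmittance T or the reflectance R.
def density(I, I0=1.0):
    return -math.log10(I / I0)

print(density(0.01))   # T = 1%  -> Dt = 2.0
print(density(0.18))   # R = 18% (a mid-gray reflectance) -> Dr ~ 0.74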
Figure 2. Input–output relationship for an imaging material (detector input log E, output response).

Figure 3. Schematic defining transmission (top) and reflection (bottom) density: Dt = −log(I/I0) = −log(T) and Dr = −log(I/I0) = −log(R). The input irradiance is I0, and the transmitted or reflected irradiance is I.

Figure 4. D–log E curves for negative- and positive-working materials. Curve labels indicate customary designations of the regions of the D–log E curve (toe, straight line, shoulder, Dmin, Dmax). Identical labels apply to the positive curve.

Figure 5. Definition of ISO speed for a black-and-white photographic material. Film must be processed to match the specified ΔD and Δlog E (ΔD = 0.80 over Δlog E = 1.30, measured from the point 0.10 above Dmin). The exposure Em is used in calculating the ISO speed [Eq. (1)].
Systems that respond positively are also used for image capture, but the intent is that the capture material becomes the display material in transmission mode (slides). The negative or positive nature of the characteristic curve of a given system is usually determined by the details of the image processing stage.

Sensitivity

Often, it is desired to summarize the behavior of a given imaging system in terms of a single number or index so that different systems can be more easily compared. The indexes that are most common are ''sensitivity'' or ''speed'' and ''contrast.'' However, these indexes can be defined in many different ways. In some cases, the speed is simply defined as the log E value required to achieve a certain density. But, in this method, a higher numerical value means that the sensitivity is lower, which can be confusing. Sensitivity defined in this way also depends on the reference density used, so that the relative speed of two systems whose characteristic curves have quite different shapes will be very different in the toe versus the shoulder of the curve. Photographic manufacturers have long recognized these problems in defining sensitivity and, through the International Organization for Standardization (ISO), collectively designed a system called ''ISO speed'' for uniformly measuring sensitivity (15,16). In this system, sensitivity indexes for films are based on a reference density in the toe of the D–log E curve, as indicated in Fig. 5. In this case, the ISO speed for noncolor materials is defined as
SISO = 0.8/Em    (1)

where Em is defined in Fig. 5. Notice that the D–log E curve in Fig. 5 does not go to zero density at very low exposure, but asymptotically approaches some minimum density Dmin, sometimes called ''fog.'' This is often true in imaging systems, and it means that the system can produce some response, even in the absence of exposure. Fog is often an undesirable but unavoidable artifact of many systems. However, some fog can be tolerated in the negative because it can be compensated for in making the final print. But display materials such as prints must have very low fog levels.

Contrast

The contrast of an imaging system is related to how quickly or slowly the response, or density, changes with changes in the input, log E. Because the characteristic curve is nonlinear, again we have several ways of measuring contrast or ''gamma,'' as it is sometimes called. The instantaneous slope of a D–log E curve is defined as

γ = dD/d(log E)    (2)

Obviously, then, the value of γ is a function of where it is measured on the D–log E curve, as illustrated in Fig. 6.

Figure 6. Variation of contrast γ with exposure.
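Both Eq. (1) and Eq. (2) are easy to evaluate from a tabulated D–log E curve. The sketch below is illustrative only: the sigmoidal sample curve is invented, the speed point is taken as the exposure at which density first reaches 0.10 above Dmin (as in Fig. 5), and no attempt is made to reproduce the full ISO processing conditions.

import numpy as np

# Invented sample D-log E curve (not a measured film response).
log_E = np.linspace(-3.0, 0.0, 61)                        # log10 exposure
D = 0.15 + 1.8 / (1.0 + np.exp(-3.0 * (log_E + 1.5)))     # sigmoid with a low-density toe

D_min = D.min()
# Speed point: exposure where density first reaches 0.10 above Dmin (cf. Fig. 5).
idx = np.argmax(D >= D_min + 0.10)
log_Em = np.interp(D_min + 0.10, D[:idx + 1], log_E[:idx + 1])
E_m = 10.0 ** log_Em
S_iso = 0.8 / E_m                                         # Eq. (1)

gamma = np.gradient(D, log_E)                             # Eq. (2), evaluated numerically
print(f"Em ~ {E_m:.4f}, ISO-type speed ~ {S_iso:.0f}")
print(f"maximum gamma ~ {gamma.max():.2f}")

Taking the maximum of the numerical derivative corresponds to the maximum-slope definition of the contrast index discussed below.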
Often, one wants to define a ''gamma index,'' or a ''contrast index,'' to describe the overall steepness of the curve. Just as for sensitivity, there are many ways to define the contrast index. One method is to use the maximum slope to define the system gamma. This is merely the maximum gamma in Fig. 6. ''Dynamic range'' is sometimes used to indicate the contrast characteristics of a system. Dynamic range, also called ''latitude,'' is loosely defined as the range of log E over which the density changes. Clearly, dynamic range and gamma are inversely related. A narrow dynamic range means that the user must be very careful with exposure; otherwise, the detector will be under- or overexposed. Camera negative films are designed to have a wide dynamic range, or long latitude, so that the user will have a much easier time fitting the entire range of scene luminances onto the D–log E curve.

In many situations, we are interested in the way the silver halide responds at different wavelengths. It is common to condense this information into plots of sensitivity and gamma as a function of wavelength, S(λ) and γ(λ). The S(λ) versus λ plot is called a ''spectral sensitivity'' curve. It is common to correct the relative sensitivities for wavelength-dependent irradiance differences at the exposure plane, so that perceived sensitivity differences are truly related to the material and are not influenced by the particular light source used.

Experimental Aspects of Sensitometry

Sensitometry is studied by using devices called ''sensitometers.'' These devices must be capable of controlling the exposure. This may be done by varying both I and t, as in flash lamps, by varying t and holding I constant, or by varying I and holding t constant. The latter is the most common technique. One often uses a ''step tablet'' or ''step wedge'' placed over the imaging material. The step tablet has a series of steps of differing density that absorb (or reflect) differing amounts of the incident light. The step tablet is designed to decrement the irradiance by equal factors, that is, in a geometric rather than an arithmetic progression.

MICROCRYSTAL PREPARATION

The microcrystals, also called ''grains,'' are prepared in a variety of shapes and size distributions to accommodate different applications. Strictly speaking, the resulting suspension of silver halide microcrystals in aqueous gelatin is classified as a dispersion, but early in the development of photography, it was called an ''emulsion.'' We will continue this practice in this article. Across the range of applications, the microcrystal size varies from a few tenths of a micron to several microns in diameter. Sensitivity, or speed, increases with increasing grain size, so higher speed films must necessarily use larger grains. Materials intended for image capture need to produce a distinguishable output over a wide range of inputs, and so a polydisperse size distribution is used. On the other hand, materials for image display, such as prints, need to respond only over a relatively narrow range of inputs, so a monodisperse size distribution is often used. Figure 7 contains electron micrographs of typical grains used as imaging elements in silver halide detectors. In some cases,
the grains are relatively flat (''tabular grains''); in other cases, the grains are more three-dimensional.

Silver halide microcrystals are prepared via a precipitation reaction involving a soluble silver salt and a soluble halide salt to produce a silver halide precipitate. There are various ways by which the reactants can be introduced into the system; two basic schemes are shown in Fig. 8. The single-jet scheme has historical significance, but the double-jet scheme has largely replaced it in current precipitation modes. Note that in both schemes, as in all precipitation modes, gelatin is present. In the absence of gelatin or some suitable synthetic polymer, coagulation of the growing silver halide microcrystals would occur. As will be described later, the technology permits remarkable control over microcrystal shape and its size distribution, which is required for the various applications of silver halide detectors.

Crystal Structure and Shape

Silver chloride (AgCl), silver bromide (AgBr), and silver iodide (AgI) exist as crystals composed of silver (Ag+) and halide (Cl−, Br−, or I−) ions. Pure AgI is not used in photographic materials, so it will not be discussed further. The crystal structure of AgCl and AgBr is face-centered cubic (fcc), which is the ''rock salt'' structure of sodium chloride, as shown in Fig. 9. However, unlike sodium chloride, where the crystal's cohesive energy is entirely due to ionic bonding, the cohesive energy in silver halides is partly due to covalent bonding in addition to substantial amounts of ionic bonding (18). However, an alternative view favors van der Waals forces rather than covalent bonding (19).

Looking at the fcc lattice structure, one can distinguish many different planes of ions, and, in principle, any of these planes could be used to terminate the crystal to form a wide variety of crystal shapes. However, only two morphologies, cubic and octahedral, are sufficiently stable for practical use. The Miller index of the lattice plane leading to cubes is (100), where both ion types are present in an alternating pattern; the plane leading to octahedra is (111), where all of the ions within a plane are of one type. Other crystal shapes would require surface planes that have higher Miller indexes. But such planes would show significant deviations from planarity, leading to high surface energies. As a result, they would reconstruct themselves to achieve a lower Miller index and thereby lower the surface energy.

Particle Stability

One critical feature of silver halides is their extreme insolubility in water, which makes it possible to grow microcrystals that are stable in an aqueous medium. However, silver halides in contact with water do release some ions into solution, and this is symbolized for AgBr by the following equilibrium:

AgBr ⇌ Ag+ + Br−    (3)
Expressing this equilibrium mathematically,

Ksp = [Ag+][Br−]/[AgBrsolid] = [Ag+][Br−]    (4)
Figure 7. Electron micrographs of carbon replicas of emulsion grains. The ‘‘shadows’’ are obtained from metal evaporation at an angle to the plane of the sample and can be used to assess the thickness of the grain.
Figure 8. Basic schemes for precipitating silver halide microcrystals and a view of the particle distribution produced. X− represents a halide ion source, typically the potassium or sodium salt.
Figure 9. Lattice ion arrangement in a face-centered cubic crystal. Small filled circles are Ag+ and large open circles are Br−. a is the dimension of the unit cell (from Ref. 17).
where the simplification on the right-hand side recognizes that a molar concentration cannot be specified for a solid material, so [AgBrsolid] = 1. Ksp is called the ''solubility product.'' Some typical values are given in Table 1. These Ksp values indicate that the equilibrium in reaction (3) lies far to the left. For example, a saturated AgCl solution at room temperature has silver and chloride ion concentrations of approximately 10 µM. The silver ion concentration plays a major role in material preparation and performance. This is expressed in compact notation analogous to pH as

pAg = −log[Ag+]    (5)
Similar equations exist for pBr and pKsp, so that Eq. (4) can be transformed into

pKsp = pAg + pBr    (6)
The ion concentration can be determined electrochemically by using a silver/silver chloride electrode and a reference electrode. Using the Nernst equation (20), one can derive the following equation for room-temperature measurements:

EAg = E°Ag + 0.059 pAg    (7)
where EAg is the measured potential and E°Ag is the standard potential for the reaction Ag+ + e− → Ag. Thus, the pAg can be calculated from the measured potential by using the standard potential. From reaction (3), we expect that when we place AgBr in water, the only ionic species in solution, besides the usual H+ and OH−, should be Ag+ and Br−. Likewise, these two ions should be at the same concentration, or ''equivalence point,'' which can be defined as pAg = pBr = (pKsp)/2. If we add NaBr to this solution, we expect the Ag+ concentration to decrease and the Br− concentration to increase in accordance with the equilibrium description in reaction (3). This is only partially true because Ag+ can form soluble complexes with Br−, described by the following equilibrium:

Ag+ + mBr− ⇌ AgBrm −(m−1)    (8)
Before adding the NaBr, the system was at the equivalence point, where equal numbers of Ag+ and Br− are present. After adding the NaBr, the system is said to be ''on the excess bromide side.'' A similar situation would occur if we had added AgNO3, leading to the excess silver side.
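As a quick orientation to the numbers involved, the short sketch below evaluates the equivalence point and an excess-bromide condition from the 25 °C Ksp of AgBr listed in Table 1. It deliberately ignores the soluble complexes of reaction (8), and the amount of added bromide is an arbitrary illustrative choice.

import math

Ksp_AgBr = 4.85e-13                      # Table 1, 25 C
pKsp = -math.log10(Ksp_AgBr)

# Equivalence point: [Ag+] = [Br-] = sqrt(Ksp), so pAg = pBr = pKsp/2  [Eq. (6)].
pAg_eq = pKsp / 2.0
print(f"pKsp = {pKsp:.2f}; equivalence point: pAg = pBr = {pAg_eq:.2f}")

# Excess-bromide side: suppose enough NaBr is added that [Br-] = 1e-3 M.
Br = 1.0e-3
pBr = -math.log10(Br)
pAg = pKsp - pBr                         # Eq. (6)
print(f"[Br-] = {Br:.0e} M -> pBr = {pBr:.2f}, pAg = {pAg:.2f}")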
Table 1. Solubility Products of Silver Halides

Temperature (°C)    AgCl            AgBr            AgI
25                  1.77 × 10−10    4.85 × 10−13    8.28 × 10−17
50                  1.30 × 10−9     6.45 × 10−12    2.49 × 10−15
Figure 10. Solubility vs. pAg relationship for AgBr at 40 °C. Near the equivalence point the solubility is controlled by Ksp; past the minimum solubility point it is controlled by complex formation.
However, because the latter can lead to the chemical formation of silver clusters (fog), silver halide materials for photographic use are always prepared, handled, stored, and coated on the excess bromide side.

The existence of soluble complexes leads to nonlinear behavior in the dependence of solubility on silver ion concentration. Solubility is defined as the total concentration of silver-containing species in solution. Its dependence on pAg for AgBr is shown in Fig. 10. Starting at the equivalence point (pAg 5.8), the solubility first decreases in accordance with the equilibrium expressed in reaction (3) because the Br− concentration increases as pAg increases [see Eq. (6)]. Then, a minimum solubility point is reached. Finally, the solubility increases as soluble silver complexes begin to form [reaction (8)]. Because the equilibria expressed by reactions (3) and (8) are shifted to the right by increasing temperature, the solubility also increases with temperature, and the minimum solubility point shifts toward lower pAg and increases in value (21).

To stabilize the silver halide microcrystals in aqueous solution and to prevent coalescence, all silver halide materials must be prepared in the presence of a stabilizing polymer. This role is most often filled by gelatin. Gelatin is a biopolymer prepared from animal collagen. It is a linear polymer composed of covalently linked amino acids that provide amide and carboxylic acid side groups, which confer a pH-dependent charge on the polymer. The side groups also provide sites for cross-linking that lead to a three-dimensional polymer of much greater mechanical strength. This latter process, termed ''hardening,'' is carried out during the coating operation in all commercial photographic materials, usually by introducing a suitable cross-linking reagent into the overcoat (see Fig. 1).

Imaging systems based on silver halide technology must possess a high degree of stability. The actual exposure may occur months to years after the system has been coated, and once exposed, the image processing stage may not occur for months to years later. Thus, both the silver halide and the photoproduced silver clusters must be stable for very long time periods. To this end, stabilizing agents are incorporated into the coating (22). The common feature
of these stabilizers is that they form complexes with silver ions, often with Ksp values smaller than those achieved with halide ions. Thus, the silver ion activity is reduced in the coating, minimizing unwanted chemical changes during storage. These stabilizers can also adsorb on the photoproduced silver clusters and increase their stability.

The Gibbs–Thomson effect (23), which deals with nucleation of new phases, for example, formation of rain drops in clouds, is an important phenomenon in emulsion precipitation. This effect describes how the particle radius and the supersaturation affect the system's ability to change its size distribution. A thermodynamic description uses the change in the excess free energy G expressed as

G = 4πr²σs − (4πr³/3V) kT ln S    (9)
where σs is the interfacial free energy, V is the atomic volume, k is Boltzmann's constant, and T is the absolute temperature. The supersaturation is defined as

S = Cs/Ce    (10)
where Cs is the concentration of silver-containing species in solution and Ce is the concentration of silver-containing species that would prevail at equilibrium. When G is positive, there is a strong driving force for the system to move to a more stable state by changing the particle radius until G ≤ 0. In emulsion precipitation, the most important variable is supersaturation, as schematically illustrated in Fig. 11. This figure shows how the excess free energy of the system varies with particle radius and supersaturation. Particles whose radius would place them to the left of the maximum
on a given curve will be unstable because, by becoming smaller, they reduce their excess free energy. Particles whose radii place them to the right of the maximum will grow in size because this also reduces the excess free energy. As shown in the figure, the critical particle size rc, which marks the position of the curve's maximum, varies inversely with the degree of supersaturation (23).

Figure 11. Relationship between the system excess free energy G and the particle size r for different supersaturated conditions S. S3 > S2 > S1. rc is the critical radius for particle stability.

Ostwald ripening (24), which follows directly from the Gibbs–Thomson effect, is a phenomenon that plays an important role in emulsion precipitation. In Ostwald ripening, small particles dissolve, and the material released is deposited on larger particles. The rate of Ostwald ripening is a function of the size difference between particles, growth and dissolution rates, and the transport properties through the solution. Dissolution rates are controlled by the temperature, the concentration of excess halide, and the presence of any silver ion complexing agents. Emulsions whose particle size distribution is narrow will broaden their size distribution when placed in an environment that favors Ostwald ripening (24).

Phases in Crystal Formation

Emulsion precipitation can be partitioned into a nucleation and a growth phase. Nucleation is the process whereby new crystals of extremely small size are formed. The driving force is a high level of supersaturation that encourages the formation of a new solid phase from the soluble species. Just how small a stable crystal is formed has been a question for many years and undoubtedly will be a function of the precipitation conditions. Earlier studies have concluded that the diameter of particles after just the first few seconds of precipitation is on the order of several hundred angstroms (25–28). More recently, multichannel UV spectrophotometry has been used to characterize particle size (29). This technique has millisecond resolution as opposed to the second resolution of earlier methods, and the results indicate that three to four AgBrm −(m−1) molecules take part in forming ultrafine particles. Depending on the pAg, m has integral values between 2 and 4. Ostwald ripening of these ultrafine particles occurred on the time scale of seconds to minutes. Thus, most of the ultrafine particles will dissolve and grow onto the few that survive.

Growth is the addition of silver and halide ions to a growing crystal. The usual observation is that the population of grains grows at more or less the same rate and maintains the same width in their size distribution during precipitation (30). There are several mechanisms by which crystals can grow when immersed in a solution of their components. For silver halide precipitation, the most common mode of growth is by diffusion from solution, and this becomes the rate-determining step when new material is rapidly incorporated into the growing crystal's surface (28,31). When the diffusion layer thickness is much greater than the crystal size, the growth rate G is given by
G = 4D(C − Ci)/(ρL)    (11)
where D is the diffusivity, ρ is the crystal density, L is the crystal diameter, C is the bulk solute concentration, and
Ci is the solute concentration at the surface. Because of its rapid incorporation into the crystal, the concentration of new material at the surface is always less than that in the bulk solution and provides a concentration gradient that acts as a driving force for growth.
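To give a feeling for the magnitudes involved in Eqs. (9)–(11), the sketch below evaluates the critical radius implied by Eq. (9) and a diffusion-limited growth rate from Eq. (11). Every material parameter in it (interfacial energy, atomic volume, diffusivity, solute concentrations, crystal size) is a rough, assumed value for an AgBr-like system, not a number taken from this article.

import math

k = 1.380649e-23          # Boltzmann constant, J/K
T = 343.0                 # 70 C
sigma_s = 0.1             # interfacial free energy, J/m^2 (assumed)
V = 5.5e-29               # volume per AgBr formula unit, m^3 (assumed)

def excess_free_energy(r, S):
    """Eq. (9): G(r) = 4*pi*r^2*sigma_s - (4*pi*r^3/(3*V)) * k*T*ln(S)."""
    return 4*math.pi*r**2*sigma_s - (4*math.pi*r**3/(3*V))*k*T*math.log(S)

def critical_radius(S):
    """Radius at the maximum of Eq. (9), found by setting dG/dr = 0."""
    return 2*sigma_s*V/(k*T*math.log(S))

for S in (10.0, 1e3, 1e6):
    rc = critical_radius(S)
    print(f"S = {S:9.0e}: rc ~ {rc*1e9:.2f} nm, G(rc) ~ {excess_free_energy(rc, S):.2e} J")

# Eq. (11): diffusion-limited linear growth rate of a crystal of diameter L.
NA = 6.022e23
D = 1e-9                  # solute diffusivity, m^2/s (assumed)
C = 1.0e-5 * 1e3 * NA     # bulk solute concentration, formula units per m^3 (10 uM, assumed)
Ci = 1.0e-6 * 1e3 * NA    # solute concentration at the crystal surface (1 uM, assumed)
rho = 1.0 / V             # formula units per m^3 of solid AgBr
L = 0.2e-6                # crystal diameter, m
G_rate = 4 * D * (C - Ci) / (rho * L)
print(f"diffusion-limited growth rate ~ {G_rate*1e9:.1f} nm/s for a {L*1e6:.1f} um crystal")

The trend matters more than the exact numbers: higher supersaturation lowers the critical radius, so more nuclei survive, and a larger bulk-to-surface concentration difference drives faster diffusion-limited growth.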
Precipitation Scheme
Figure 12 shows the basic arrangement of a double-jet precipitation. The reactants are typically at 1 to 4 M concentrations, so that the supersaturation is very high just at the point of emergence from the supply tubes near the mixing head. For example, at 70 °C and pAg 7.5, the equilibrium solubility of silver ion species is 10⁻⁶ M. If the AgNO3 concentration in the supply tube is 1 M, then the supersaturation is 10⁶, which provides a large driving force for crystal formation. Thus, transient nuclei are continuously formed in this highly supersaturated region of the reactor and then are fed into the bulk solution by the mixing head, where their survival will depend on the prevailing level of supersaturation (24). In the early stage (less than one minute), the bulk supersaturation is high, and the nuclei survive. These initial nuclei grow rapidly and reduce the bulk supersaturation so that additional nuclei cannot survive. Newly generated nuclei from the mixing head region will dissolve by Ostwald ripening and provide growth material for the crystals initially formed. Nucleation ceases when the rate of Ostwald ripening equals the rate of reactant addition.

A model has been developed to understand quantitatively the parameters that control the outcome of the precipitation (32). Figure 13 shows a qualitative view of the precipitation. In the quasi-steady-state region, a dynamic mass balance between material added and the total mass of the microcrystals formed leads to expressions for the number of stable growing microcrystals.
Figure 13. Qualitative representation of supersaturation ratio [Eq. (10)] and concentration of microcrystals vs. precipitation time. Axes are nonlinear. (From Ref. 32. Reprinted with permission of the American Chemical Society.)
In the usual diffusion-controlled growth mode, the number of growing microcrystals Z is given by

Z = ART r/[Cs(r/rc − 1)]    (12)
where R is the reactant addition rate, r is the number-average crystal radius, rc is the critical radius below which the crystal will tend to dissolve due to the Gibbs–Thomson effect, and A is a constant. This equation accounts for the observed effects of reactant addition rate, pAg, solubility, and temperature on the number of microcrystals formed.

In many applications of silver halide technology, the grains contain more than one type of halide ion. In some cases, the second halide ion is added at a selected point during the precipitation, or even after all the first halide is added. In such situations, recrystallization will occur, and the system will tend toward a uniform composition, but cannot usually achieve it due to kinetic factors (24). The driving force for recrystallization is increased entropy, which is fundamentally different from that of Ostwald ripening.
Figure 12. Diagram of an emulsion precipitation apparatus based on the double-jet scheme. The computer-controlled pumps allow the reactants to enter the reaction zone near the mixing head at the proper rate for microcrystal size control. The pAg electrode signal is fed to the computer and is part of a feedback loop for adjusting the bromide pump speed to maintain the desired pAg.
Twinning

In earlier applications of silver halide technology, the grains had a more or less three-dimensional structure. But in more recent applications, tabular grains have been used. To form tabular grains, a phenomenon called ''twinning'' must occur. To understand twinning, we must first discuss how ions are placed within an individual layer during crystal growth. Consider a crystal growing by successive (111) layers — one layer of silver ions, the next bromide ions, the next layer silver ions, etc. Figure 14 shows the position of the ions in the first layer A.
Figure 14. Layer arrangement for successive (111) planes. The bottom layer is denoted by A. The second layer is placed at the B position, the third layer at the C position, and the fourth layer would adopt the A position to start the sequence again.
All of the ions in the next layer can go in either position B or position C. Let us assume it is B. Then, the ions in the third layer would have to go into position C. The fourth layer would repeat the pattern, where the ions go directly over those in position A in the first layer. Because there are two different ions in a growing AgBr crystal, we can represent one type of ion by an uppercase letter and the other type by a lowercase letter. Then, the stacking pattern described above could be expressed as AbCaBcAbC...

Twinning consists of a reversal of the usual stacking sequence within the crystal, as shown in Fig. 15 for twinning on a (111) plane, which is the most common case in silver halide technology. The twinned portion may be thought of as the mirror image of the parent crystal. In practice, twinning often occurs when the precipitation is carried out at pAg > 10. A possible explanation of twinning suggests that occasionally a Br− (111) layer is added in the wrong (twin) position due to the predominance of complex ions such as AgBr3 2− (33). Other mechanisms that involve supersaturation-induced dominance of a surface integration growth mechanism (34) or coalescence of colloidal nuclei (35) have also been suggested to explain twinning.

Tabular morphology requires that at least two twinning events occur in a microcrystal. In a hexagonal tabular grain, double twinning would result in six troughs, as shown in Fig. 16.
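The stacking notation just introduced is easy to generate programmatically, which can help in visualizing what a (111) twin plane does to the sequence. The tiny script below is only a notational illustration: it alternates letter case for the two ion types and cycles the A/B/C positions, reversing the cycle at a chosen layer. It is not a crystallographic simulation.

def stacking(n_layers, twin_at=None):
    """Return the (111) stacking string; reverse the positional cycle after
    layer index twin_at to mimic a twin plane."""
    positions = "ABC"
    seq, idx, step = [], 0, 1
    for layer in range(n_layers):
        letter = positions[idx % 3]
        seq.append(letter if layer % 2 == 0 else letter.lower())
        if twin_at is not None and layer == twin_at:
            step = -step            # mirror the stacking sequence here
        idx += step
    return "".join(seq)

print("normal: ", stacking(9))            # AbCaBcAbC
print("twinned:", stacking(9, twin_at=4)) # cycle reverses after the fifth layer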
Figure 15. Normal stacking sequence of (111) planes (left) and stacking fault (right) in AgBr crystals. The stacking fault leads to the formation of a twin plane.
Figure 16. A view of a hexagonal tabular grain that shows four of the troughs (arrows) created at the edges by double twinning. The other two troughs are hidden in the rear of the grain.
It is thought that addition of new material to these trough regions occurs much more readily than addition to the large flat faces and results in much higher growth rates in the lateral direction (36). As a result, grains that have a high aspect ratio (diameter divided by thickness) are formed. The photographic advantages of tabular morphology will be discussed later.

Controlling Crystal Shape and Size

As already mentioned, a variety of crystal shapes are used in silver halide imaging. The shape is controlled primarily by the pAg during precipitation (37). Cubic grains are formed at very low [Br−] (10−3 to 10−4 M), octahedra are formed at somewhat higher concentrations (10−2 M), and tabular grains are formed at very high concentrations (10−1 M). The explanation for the pAg effect on shape is based on the silver-containing species in solution at the prevailing pAg. It is known that Br− is more strongly adsorbed to (111) faces than to (100) faces (38,39). At relatively high pAg, say 9, the predominant form of soluble silver is AgBr3 2−. As more Br− is adsorbed to the (111) faces, these faces will repel the negatively charged complexes and therefore will grow more slowly than the (100) faces. The (100) faces will grow themselves out of existence, leaving only (111) faces (octahedra). At even higher pAg, twinning will occur, and tabular grains are formed, as discussed before. Other crystal shapes are possible for the face-centered cubic system (40) but, as discussed before, they are unstable and rapidly convert to cubes, octahedra, or cubooctahedra, depending on the pAg. Growth modifiers can be used to stabilize these other shapes (40), but such compounds are not practical because they interfere with subsequent sensitization steps.

There are two basic classes of growth modifiers. The first class is known as ''restrainers'' because they slow crystal growth, sometimes preferentially on a particular face, as just mentioned. They adsorb on the crystal surface and typically form complexes with silver ions that have very small Ksp values. The second class of growth modifier is known as ''ripeners.'' These form soluble complexes with silver ions and are primarily used to increase grain size. We see from Eq. (12) that if the solubility of the system is increased, then the number of growing crystals is decreased. For a fixed mass of silver halide added during the precipitation, each grain will be larger if the solubility is increased by
using a ripener. Studies of ripeners have shown that they, indeed, decrease the number of growing crystals (41). In addition to ripeners, other parameters of the emulsion precipitation can be used to adjust grain size. We see, by referring again to Eq. (12), that increasing the reactant addition rate will increase the number of crystals and thereby decrease grain size for a fixed mass of silver halide added. Factors that affect solubility, such as pAg or halide type, will also affect the number of crystals and ultimately the final grain size. Many of these variables will also affect crystal shape (42,43).

One obvious method for increasing grain size is increasing the amount of material added. However, in a scheme of constant reactant flow rate, the grain volume increases linearly with time, so that the edge length increases as the cube root of time. Therefore, doubling the edge length will require an increase in precipitation time by a factor of 8. Under practical conditions, an upper limit to grain diameter is about 0.4 µm. Larger grain sizes can be achieved in several ways. One is to use a seeding technique in which the reactor is charged with small seeds that are then grown to a desired size. Another method is to use ripeners, as mentioned before. A third technique is to use accelerated flow rates. For example, if the flow rates are changed continuously, so that a constant flow rate per grain surface area is maintained, grain diameter will vary linearly with time.
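The cube-root scaling mentioned above follows directly from a mass balance over a fixed number of grains. The short illustration below uses an arbitrary volumetric growth rate simply to show the factor-of-8 time penalty for doubling the edge length under constant flow; the accelerated-flow case is noted in a comment.

def edge_constant_flow(t_min, rate_um3_per_min=1e-4):
    """Constant reactant flow into a fixed number of cubic grains: grain volume
    grows linearly with time, so the edge length grows as t**(1/3)."""
    return (rate_um3_per_min * t_min) ** (1.0 / 3.0)

for t in (10, 80):
    print(f"t = {t:3d} min -> edge ~ {edge_constant_flow(t):.3f} um")

# Doubling the edge (0.100 -> 0.200 um here) takes 8x as long, as stated above.
# If instead the flow is ramped so that it stays proportional to the grain
# surface area (dV/dt ~ L^2), the edge or diameter grows linearly with time.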
optical microscope if the grains are large enough. Next, the grain size must be determined. Grains in calibrated micrographs can be measured. There are also a variety of particle size distribution methods that can be employed for this purpose. It is also possible to use electrolytic reduction of the grains and, by counting the number of electrons required to reduce the grains, the volume can be calculated (44,45).

It is also useful to know the thickness of tabular grains. Optical micrographs can sometimes be used for this purpose qualitatively because the grains are thin enough to cause interference effects, so that different grain thicknesses will have different colors. Quantitative measurements can be made by obtaining SEM micrographs from tilted images. Another possibility is using shadowed micrographs from a transmission electron microscope (see Fig. 7). Knowing the length of the shadow and the angle at which the shadowing was done, it is possible to calculate the thickness. However, for very thin grains, the thickness of any adsorbed gelatin layer can obscure the true thickness.

The composition of the grains is another property of interest. There are several bulk methods for characterizing the grain as a whole, such as neutron activation analysis, X-ray powder diffraction, spark source mass spectrometry, atomic absorption, and plasma emission. Each technique has its own limitations in detection limits and sample preparation. In some cases, it is useful to do compositional analysis in different parts of the grains to test for homogeneity or, more usually, to see if a desired heterogeneity has been achieved. Techniques here include analytical electron microscopy coupled with cross-sectioning techniques (46), luminescence microscopy at low temperatures (47,48), and secondary ion mass spectrometry (49,50).

PHYSICS OF THE SILVER HALIDE MICROCRYSTAL

Crystal Defects

Similar to other crystalline materials, silver halides can possess crystal defects. One defect of importance photographically is the dislocation. An edge dislocation occurs when a partial plane of ions is missing from the crystal, creating a line imperfection (17). Dislocations can be introduced during crystal growth or can be induced by applying pressure to a perfect crystal. The defect induces inefficiencies in the image formation stage of the photographic process because it leads to internal latent-image formation, which is inaccessible to the developer during image processing. Thus, great care is taken to avoid this defect.

Other crystal defects are known as point defects because they involve just one of the lattice ions, rather than a plane of ions. An example is the formation of an interstitial silver ion, as shown in Fig. 17. Note that a complementary species, a vacancy, is also formed and is assigned a negative charge to take into account the surrounding six Br− ions. The process is reversible and allows interstitials to recombine with vacancies to re-form lattice silver ions. Thus, an equilibrium is set up between lattice silver ions
Ag+lattice ⇌ Ag+interstitial + V−Ag (vacancy)
Figure 17. Two-dimensional view of a silver halide crystal showing the formation of an interstitial silver ion and a vacancy. Also shown is the equilibrium between lattice silver ions and interstitials and vacancies. Once created, the interstitial migrates throughout the crystal, driven by thermal energy.
and interstitial silver ions and vacancies. Such a defect is also known as a Frenkel disorder, after one of the pioneers in studying them. The energy requirement for such a process is very low, so it is quite favorable in AgBr, but less so in AgCl. As will be discussed in detail later, much of the image formation and detection occurs at the surface of the silver halide microcrystals making the structure of the surface very important. A smooth surface plane would not be realizable under practical crystal growth and storage conditions. Rather, a more realistic view of the (100) crystal surface consists of steps and terraces and jogs along the steps. An example of these jogs, known as ‘‘kink’’ sites, is shown in Fig. 18. Because of the ionic nature of silver halides, the kink site has a partial charge associated with it because it is missing three of its normal six neighbors of opposite sign. The kink site illustrated in Fig. 18 has a formal charge of +0.5 e, where e is the electrostatic unit of charge, and is known as a ‘‘positive-kink site.’’ By interchanging the positive and negative charges in Fig. 18, one obtains a negative-kink site whose formal charge is −0.5 e. However, the ions around these sites of excess
charge can move to some extent to accommodate these excess charges and thereby reduce the charge to less than the indicated formal charge. This movement by the lattice ions is called ‘‘lattice relaxation.’’ These sites of partial charge will take on importance when we discuss image formation later. A microcrystal bounded by (111) planes would be very unstable. It would have a large excess of charge because all of the surface ions would be the same; they carry uncompensated charge because, like kink sites on (100) surfaces, they are missing three of their normal oppositely charged neighbors. As a result, the crystal surface reconstructs to achieve a lower energy configuration. The exact nature of the reconstructed surface is not known. Models of the surface suggest a partial outermost layer of the ions of the same type where ions form either a hexagonal (51) or alternate row (52–54) pattern, as illustrated in Fig. 19. Some of these surface ions may be missing, thus forming ‘‘kink-like’’ sites.
Ionic Properties The interstitial silver ion discussed previously is not a static entity but can repeatedly move from one interstitial position to another because the free energy requirement for such a process is very low. On the other hand, the complementary vacancy formed with the interstitial ion has a large free energy requirement for motion that reduces its mobility at room temperature by several orders of magnitude relative to the interstitial silver ion. The movement of interstitial silver ions and vacancies through silver halide crystals corresponds to a movement of charge and imparts electrical conductivity to the material. Because silver halides are electrical insulators in the dark, measurements of conductivity under such conditions relate to the ionic properties of the material.
Figure 18. Illustration of a positive-kink site on a AgBr (100) surface.
Figure 19. Two possible models of a reconstructed AgBr (111) surface. (a) hexagonal model. (b) alternate-row model. Shaded symbols are the top layer; open symbols are the second layer. The shaded circles can be either Ag+ or Br− , and the crosshatched squares are ion vacancies. (From Ref. 54. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
Ionic conductivity σ is determined by the product of mobility µ and concentration n and is expressed as

σ = e(µi ni + µv nv)   (13)

where the subscripts i and v refer to the interstitial and vacancy, respectively. At equilibrium,

KF = ni nv = 2N^2 exp(−GF/kT)   (14)

where KF is the Frenkel equilibrium constant, N is the number of lattice sites, GF is the free energy of formation of the Frenkel pair, k is Boltzmann's constant, and T is the temperature in absolute units. Table 2 summarizes the thermodynamic data related to Frenkel pair formation in AgCl and AgBr, which can be used to calculate concentrations of defects, also given in Table 2. The mobility part of the ionic conductivity can be calculated as

µ = (el^2 ν/6kT) exp(−Gµ/kT)   (15)
where l is the jump distance, ν is the vibrational frequency of the defect, and Gµ is the free energy of the jump. Table 3 summarizes the thermodynamics of the vacancy and interstitial jumps, as well as the calculated mobilities.

Table 2. Thermodynamic Constants for Frenkel Defect Formation in Silver Halides (Ref. 7)

Property                        AgCl            AgBr
Formation enthalpy (eV)         1.49 ± 0.02     1.16 ± 0.02
Formation entropy (e.u.)        11.13 ± 0.20    7.28 ± 0.58
Mole fraction defects           4.92 × 10^−11   4.71 × 10^−9
Defect concentration (cm^−3)    1.15 × 10^12    9.80 × 10^13

Table 3. Thermodynamic Constants for Frenkel Defect Motion in Silver Halides (Ref. 7)

Property                        AgCl             AgBr
Interstitial
  Enthalpy (eV)                 0.018 ± 0.008    0.042 ± 0.011
  Entropy (e.u.)                −3.83 ± 0.12     −3.34 ± 0.28
  Mobility (cm^2 V^−1 s^−1)     3.3 × 10^−3      7.7 × 10^−4
Vacancy
  Enthalpy (eV)                 0.306 ± 0.008    0.325 ± 0.011
  Entropy (e.u.)                −0.65 ± 0.12     1.01 ± 0.28
  Mobility (cm^2 V^−1 s^−1)     6.1 × 10^−7      1.7 × 10^−6

Figure 20. Schematic of the movement of an interstitial silver ion in a AgBr crystal. The ions are not drawn to scale. The interstitial silver ion I displaces a lattice silver ion into another interstitial position, taking its place in the lattice. The silver ion must undergo quadrupolar deformation to squeeze through the triad of halide ions (crosshatched).

A model consistent with experimental data has been developed that describes the mechanism by which the interstitial and vacancy move (55–57). Figure 20 illustrates that both the interstitial silver ion and the lattice ion that it will eventually replace move in a concerted process. The lattice silver ion moves into an
interstitial position at the same time that the interstitial silver ion is moving into the lattice ion position. The energetics involved in the movement of both ions is the same. Key to the operation of this mechanism is the ability of the interstitial silver ion to squeeze through the triad of bromide ions (crosshatched ions in Fig. 20). The actual effective radii of the bromide ions are much larger than those illustrated in Fig. 20 to the point that they are almost touching each other. The remaining space for the silver ion to squeeze through the bromides is smaller than the free silver ion. What saves the day is that the silver ion can rather easily undergo quadrupolar deformation to form an ellipsoidal ion that can squeeze through the available space between the bromide triad. The mechanism for vacancy motion is illustrated in Fig. 21. In the simple diagonal movement at the bottom of the figure, the silver ion moves from its lattice site toward the vacancy, so that when the jump is completed, the lattice silver ion and the vacancy have merely switched places. In this case, even after the quadrupolar deformation of the silver ion, there is significant spatial overlap between the deformed ion and the two bromide ions, leading to the higher enthalpies of motion indicated in Table 3. However, Fig. 21 also shows an alternative, presumably lower energy pathway that involves two sequential noncollinear jumps, where the normal interstitial position acts as an intermediate state. But such a low energy pathway is not consistent with experimental data, and the reasons for this discrepancy have been discussed (56,58). Impurities can be incorporated within the grain, and this is sometimes intentionally done as in doping. Whether impurities or dopants, these species are most often incorporated in their ionic form, and when they are polyvalent ions, the Frenkel equilibrium can be altered due to the requirement for overall electrical neutrality. So, for example, if an impurity in a +2 valence
state is incorporated, a vacancy must also be present to compensate for the extra charge on the impurity. Depending on the impurity, the vacancy may be bound to it or may dissociate from the impurity to migrate elsewhere in the crystal. Ionic conductivity is an important property that affects both the storage and image capture properties of the silver halide detector. Thus, it is important to be able to measure this property in photographic materials. A contactless technique is required because of the small size of the microcrystal. This is done by measuring the dielectric properties of thick emulsion layers and then using a theory based on ionic relaxation of conducting particles in an insulating medium (59–68). In this technique, the imaginary part of the dielectric constant of the coating is measured at different frequencies of an ac field. The theory predicts that the frequency at which the peak occurs is proportional to the ionic conductivity. An example of the data obtained by this technique is shown in Fig. 22 which pertains to AgCl grains doped with CdCl2 (61). As the Cd2+ concentration increases, the peak first shifts to a lower frequency that indicates lower ionic conductivity. This can be attributed to a lowering of the interstitial silver ion concentration and a corresponding increase in the less mobile vacancy concentration. However, a further increase in the Cd2+ concentration shifts the peak to higher frequencies which indicates an increase in ionic conductivity. In this region, the Frenkel equilibrium is dominated by vacancies and, even though they are less mobile than interstitials, their concentration is high enough that conductivity increases. Problems with this theory and associated theories (69) have been noted when multiple peaks are observed in the dielectric loss spectra of silver halide (70,71). Nevertheless, the reasonable correlation between observed and expected trends makes the technique at least qualitatively useful.
Figure 21. Schematic illustrating the movement of a vacancy V in a AgBr crystal. The lattice silver ion can move diagonally to a new lattice position (dashed arrow), thereby exchanging places with the vacancy. The ion must undergo quadrupolar deformation to squeeze through the halide dyad (crosshatched). Also shown is a presumably lower energy pathway that involves two sequential noncollinear jumps through the normal interstitial position (solid arrows).
Figure 22. Dielectric loss spectra for CdCl2 doped AgCl. Curve labels indicate the mole fraction of dopant. Curves are displaced vertically for clarity. (From Ref. 61. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
Table 4. Ionic Conductivity of Silver Halides at Room Temperature (Ref. 61)

         Ionic conductivity (Ω^−1 cm^−1)
         Microcrystal      Bulk crystal
AgCl     0.8 × 10^−8       2 × 10^−9
AgBr     1 × 10^−6         2 × 10^−8
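As a rough numerical cross-check of Eq. (13), the short sketch below (an illustration, not part of the original article; room temperature and the bulk AgBr values of Tables 2 and 3 are assumed) multiplies the tabulated defect concentration by the tabulated mobilities. The result is of order 1 × 10^−8 Ω^−1 cm^−1, the same order of magnitude as the bulk AgBr entry in Table 4.

E_CHARGE = 1.602e-19          # elementary charge, C

# Bulk AgBr values taken from Tables 2 and 3
n_i = n_v = 9.80e13           # Frenkel defect concentration, cm^-3 (Table 2)
mu_i = 7.7e-4                 # interstitial mobility, cm^2 V^-1 s^-1 (Table 3)
mu_v = 1.7e-6                 # vacancy mobility, cm^2 V^-1 s^-1 (Table 3)

# Eq. (13): ionic conductivity in ohm^-1 cm^-1
sigma = E_CHARGE * (mu_i * n_i + mu_v * n_v)
print(f"sigma(AgBr, bulk) ~ {sigma:.1e} ohm^-1 cm^-1")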
It is of interest to compare those results obtained by techniques for measuring ionic conductivity in emulsion grains with measurements on large crystals using more conventional techniques. This comparison, given in Table 4, shows that microcrystals have 10 to 100 times higher ionic conductivity than large crystals. These results are explained by the large surface-area-to-volume ratio in microcrystals and the realization that interstitial silver ions can be created independently of vacancies at the surface via positive-kink sites, as illustrated in Fig. 23. This mechanism lowers the free energy requirements relative to the bulk process (7,72) and leads to higher concentrations of interstitial silver ions. The vast majority
Figure 23. An edge-on view of the formation of an interstitial silver ion from a surface positive-kink site.
of the interstitial ions in emulsion grains are created by this surface generation process, and the following equilibrium is appropriate:

Ag+kink ⇌ Ag+interstitial   (16)

The increase in the density of negative kinks at the surface is another important feature associated with the surface generation process of interstitials. In fact, there is an increase of one negative kink for every interstitial silver ion created by the surface generation process and a corresponding decrease of one positive kink. Thus, an excess negative charge is built up on the surface, which is balanced by a subsurface region of interstitial silver ions. This spatially inhomogeneous charge distribution is known as the ‘‘ionic space charge region.’’ The concentration gradients are illustrated in Fig. 24. Note that the concentration of interstitials and vacancies is quite different near the surface, but they gradually approach equal concentrations as one moves toward the center of the crystal where the normal bulk process dominates. The potential profile that arises from the ionic space charge region is also shown in Fig. 24. Theories that treat the ionic space charge region have been developed (73–77), and they predict how the concentration of interstitials and, based on knowledge of mobilities (Table 3), ionic conductivity will vary with grain size (78). These calculations agree reasonably with experimental data and show that ionic conductivity decreases with increasing microcrystal diameter. Using space charge theory, it is possible to calculate the number of interstitials per grain. A 1-µm diameter spherical AgBr grain would contain about 5,000 interstitial silver ions (78). Because of the higher energy requirements in AgCl (see Table 2), the number of interstitials in the same size AgCl grain would be two to three orders of magnitude lower.

Electronic Properties
The electronic properties of a material are controlled by the energy levels accessible to its electrons. The relevant energy levels in molecules are those of the highest filled and lowest vacant molecular orbitals. In crystalline solids, these energy levels broaden into bands called the ‘‘conduction band’’ and ‘‘valence band.’’ These bands arise because the Pauli exclusion principle dictates that all of the electrons in the material be distinguishable (i.e., have a set of four quantum numbers that is unique). These bands in AgBr are derived from the Ag+ 4d and 5s and the Br− 4p and 5s atomic orbitals (79,80), as shown schematically in Fig. 25. Of special interest is the energy gap between the two bands, called the ‘‘forbidden gap’’ or the ‘‘band gap.’’ Electrons are quantum mechanically forbidden to have energies encompassed by this gap. Insulators like AgBr and AgCl are characterized by a filled valence band and an empty conduction band. Light absorption occurs when a photon transfers its energy
Figure 24. Concentration profiles (a) of interstitials and vacancies and electric potential (b) as a function of distance x from the surface. The space charge region is indicated. ‘‘Bulk levels’’ are the concentrations expected in the absence of the ionic space charge effect.
Figure 25. Energy vs. interionic distance schematic illustrating the formation of the conduction and valence bands in AgBr. On the right are the orbital energies of the isolated ions. These orbitals mix together as the interionic distance decreases to form the indicated bands.
to an electron in the valence band and promotes it to the conduction band. Once in the conduction band, the electron is free to migrate throughout the crystal, whereas those electrons in the valence band are confined to a very localized region. The energy of the band gap defines the lowest photon energy that can cause light absorption. Based on photographic data, the effective band gap is about 3 eV (410 nm) for AgCl and about 2.6 eV (475 nm) for AgBr. Mixed halide crystals are often used in practical applications. In AgClBr crystals, the photon threshold is intermediate between that for AgCl and AgBr, whereas the threshold wavelength in AgIBr crystals extends to somewhat longer wavelengths than those in AgBr. Typical absorption spectra of large crystals shown in Fig. 26 (81) indicate that absorption extends to a slightly longer wavelength than indicated by the photographic data. But this absorption is too weak to be photographically detected. The absorption spectra shown in Fig. 26 indicate a rather significant problem in recording color images. The AgBr detector responds only with sufficient sensitivity in the UV and blue regions of the spectrum, and incorporation of iodide extends the absorption only into the short green region. This deficiency is resolved by adsorbing spectral sensitizing dyes on the silver halide surface, which extends
their absorption range through the red and even into the near IR. We will discuss this feature in a later section. Light absorption also leads to an empty electron position in the valence band which is referred to as a ‘‘hole.’’ Neighboring electrons can move to occupy the empty electron position, but this creates a vacant position elsewhere; the hole appears to move. Although the electrons in the valence band are the actual charge carriers, it is mathematically valid and simpler to focus on the hole as the moving entity. The hole has a characteristic mobility and mass, as well as a positive charge to complement the negative charge on the electron. Electrons and holes are confined to the energy bands, but localized regions in the crystal lattice, known as ‘‘traps,’’ can exist at energies within the forbidden gap. These traps are sites that may localize the electron or hole, as illustrated in Fig. 27. An electron is localized within a trap by losing enough energy, so that its energy is now coincident with that of the trapping level. The excess energy is dissipated to the surrounding lattice ions as thermal energy. An escape from the trap requires the trapped electron to acquire sufficient thermal energy so that its energy is now coincident with the bottom of the conduction band. Note that the holes in this diagram are trapped ‘‘upward’’ because the energy direction refers to that of the electron. Bear in mind that when a hole moves either spatially or energetically in one direction, it can be viewed as an electron moving in the opposite direction. Traps in silver halide materials are impurities (such as dopants) and lattice defects (such as dislocations and kink sites). Lattice defects generally have an associated uncompensated charge and interact with charged carriers (electron or hole) via Coulomb’s law. The situation is analogous to that of a hydrogen atom with a central charge and an orbiting particle of opposite charge. An equation to calculate the binding energy of the carrier to the trapping site comes from the Bohr treatment of the H atom, modified for the effective mass m∗ of the carrier and the dielectric constant ε of the material (82,83):
E = qt^2 e^2 m∗/(2h^2 ε^2)   (17)

where qt is the trap charge and h is Planck's constant divided by 2π. The effective mass arises because the carriers within the crystal appear to move with a mass different from their real mass. This effect is induced by the carrier interaction with the surrounding lattice ions as it moves through the crystal. An analogous equation is available for calculating the orbital radius r:

r = εh^2/(qt e^2 m∗)   (18)

Figure 26. Absorption spectra of pure and mixed halides at room temperature. (1) AgCl; (2) AgBr; (3) AgCl containing 10 mole% AgBr; (4) AgBr containing 3 mole% AgI. (From Ref. 81.)

Figure 27. Illustration of electron and hole trapping levels in a crystal. The energy arrow applies only to electrons; an oppositely directed one exists for holes, but is not shown.
Typical binding energies and orbital radii are given in Table 5. The binding energies for the electron are small (kT or less), and the radii are correspondingly large (tens of lattice spacings). Because of its much higher effective mass, the hole has a much higher binding energy and smaller orbital radii than the electron. However, the trapped carrier in both cases is in a delocalized state characteristic of coulombic traps.

Impurity traps are derived from the atomic energy levels of the specific impurity but are modified by their environment within the crystal lattice. Examples of such traps are I− and Cu+. These are isoelectronic with the lattice ions that they replace and therefore provide no long-range coulombic effects. However, their atomic energy levels are different from the surrounding lattice ions and provide the requisite trapping level. In contrast with the coulombic traps, these impurity traps provide high localization of the trapped carrier. In addition to these two types of trapping sites in the silver halide lattice, a third possibility is derived by combining these two; an example is Ir3+.

The probability of trapping by a particular trap is determined by its effective trapping radius or cross-sectional area (84). The traps present a ‘‘sphere of influence’’ to a carrier diffusing through the crystal, and trapping occurs if the carrier passes through this sphere. The effective trapping radius is determined by the amount of energy that must be dissipated as a carrier falls from the free band to the trapping level. If the amount of energy to be dissipated is only on the order of a few kT or less, then the transition has unit probability of occurring, and the trapping radius is determined solely by the charge on the trap. This is typical for coulombic traps such as lattice defects. Then, the capture cross section of these so-called ‘‘shallow traps’’ is determined by equating the binding energy to the average thermal energy available (kT), and the resulting trap cross section for a trap with charge 0.5 e is of the order of 10^−13 cm^2. Although obviously a small number, this cross section is about 100 times larger than that calculated for an uncharged site, which would have a trapping radius of the order of a lattice spacing.
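The hydrogenic estimate of Eqs. (17) and (18) can be evaluated directly. The short sketch below (an illustration, not from the original article) rewrites the two equations in the scaled form E = 13.6 eV · qt^2 (m∗/m)/ε^2 and r = 0.529 Å · ε/(qt m∗/m) and uses the AgBr electron parameters of Table 5; the values it prints agree with the tabulated 39 meV and 15 Å (qt = 1) and 10 meV and 30 Å (qt = 0.5) to the rounding shown there.

RYDBERG_EV = 13.606           # hydrogen-atom binding energy, eV
BOHR_ANGSTROM = 0.529         # hydrogen-atom orbital radius, Angstrom

def coulombic_trap(m_eff_ratio, eps, q_t):
    """Binding energy (meV) and orbital radius (Angstrom) of a carrier
    bound to a trap of charge q_t, per Eqs. (17) and (18)."""
    energy_mev = 1000.0 * RYDBERG_EV * q_t**2 * m_eff_ratio / eps**2
    radius_ang = BOHR_ANGSTROM * eps / (q_t * m_eff_ratio)
    return energy_mev, radius_ang

# AgBr electron values from Table 5: m*/m = 0.43, dielectric constant 12.3
print(coulombic_trap(0.43, 12.3, q_t=1.0))
print(coulombic_trap(0.43, 12.3, q_t=0.5))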
When the amount of energy to be dissipated is larger than a few kT (>0.05 eV), a multistep process is required in which the energy is transferred to the surrounding lattice in small increments (85). This process requires the presence of intermediate energy levels that act like rungs on a ladder. These so-called ‘‘deep traps’’ are usually also associated with a shallow trapping coulombic level. For example, a silver atom located at a positive-kink site would be characterized by a shallow trapping level due to the coulombic nature of the kink site and by a deep trapping level associated with the 5s atomic energy level of the silver atom. This is depicted schematically in Fig. 28. The effective trapping cross section σt is given by (86)

σt = σc η = σc t/(t + i)   (19)
where σc is the trapping cross section of the coulombic trap and η is a ‘‘sticking coefficient’’ that describes the probability that the carrier will transit to the deep level t, as opposed to returning to the conduction band i. Trapping of the electron (or hole) causes a lattice distortion (lattice relaxation) around the localized carrier because the charge density has been altered (87). Basically, the ions rearrange themselves to accommodate the higher (lower) charge, thereby lowering the energy and deepening the trap relative to the case in which there is no lattice relaxation. These processes are represented as configuration coordinate diagrams. An example of such a diagram is given in Fig. 29 for electron trapping at a deep trap such as a silver atom, which also has an associated coulombic level. The three-dimensional nature of the crystal is collapsed into a one-dimensional lattice distortion Q, where Q = 0 is the equilibrium configuration in the absence of the trapped carrier. Once the electron is trapped in the coulombic level, it can move to the relaxed state, provided that the energy barrier Ea is less than the coulombic trap depth Ec . Transition to the deep level occurs where the energy curves of the relaxed coulombic level and the deep level intersect (point C in Fig. 29). As drawn, this transition is favored, but, if the crossing point C lies above the barrier for thermal escape (point B), then detrapping is favored.
Table 5. Trap Characteristics of Charged Centers in Silver Halides

                         AgCl              AgBr
                     e−        h+        e−
m∗/m                 0.29      1.71      0.43
ε                    13.1      13.1      12.3
qt = 1:   E (meV)    23        140       39
          r (Å)      24        4         15
qt = 0.5: E (meV)    6         45        10
          r (Å)      48        8         30

Figure 28. Schematic showing the capture of an electron at a deep trap. Transition t represents the movement of the electron to the deep level, whereas transition i represents thermal release to the conduction band from the shallow trapping level.
Table 6. Trap Escape Times

Trap depth (eV)    Escape time a    Comments
0.01               0.4 ps           Intrinsic defect traps
0.05               1.8 ps           Intrinsic defect traps
0.10               13.6 ps          Intrinsic defect traps
0.20               0.75 ns          Chemical sensitization traps
0.30               41 ns            Chemical sensitization traps
1.0                67 h             Latent image

a Assuming ν = 10^12 s^−1.
Escape of a carrier from a trap involves transferring thermal energy from the surrounding lattice ions to the carrier. The rate of transfer depends on the lattice vibrational frequency, which is about 10^12 s^−1 in AgBr. The probability that a certain amount of energy will be available is given by the usual Boltzmann factor, so that the overall rate of escape of a carrier is given by

rateescape = ν exp(−E/kT)   (20)

where ν is the attempt-to-escape frequency, synonymous with the lattice vibrational frequency, and E is the trap depth, which is measured from the edge of the free band to the trap energy level. The inverse of the release rate gives the time for escape of the carrier from the trap, so that

τ = ν^−1 exp(E/kT)   (21)
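Equation (21) spans an enormous range of timescales for modest changes in trap depth. The sketch below (illustrative only; the attempt frequency of 10^12 s^−1 follows the text, and the temperature is an assumption) evaluates Eq. (21) for the trap depths listed in Table 6. It reproduces the picosecond-to-hours spread of that table, although the individual entries depend somewhat on the temperature assumed.

import math

K_B_EV = 8.617e-5             # Boltzmann constant, eV/K
NU = 1.0e12                   # attempt-to-escape frequency, s^-1 (from the text)
T = 293.0                     # temperature, K (assumed)

for depth_ev in (0.01, 0.05, 0.10, 0.20, 0.30, 1.0):
    tau = math.exp(depth_ev / (K_B_EV * T)) / NU   # Eq. (21)
    print(f"E = {depth_ev:4.2f} eV  ->  tau = {tau:.2e} s")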
Table 6 summarizes some typical escape times for a range of trap depths and indicates the kinds of entities that are associated with these trap depths. A major loss process in the photochemistry of silver halides is recombination between electrons and holes. Quantum mechanical selection rules prevent recombination between free electrons and free holes (7). Nevertheless, recombination does occur, provided that one of the carriers is trapped, as illustrated in Fig. 30.
Figure 29. Configuration coordinate diagram for trapping at a deep trap. Following capture of the electron at the shallow trap (coulombic level), a thermally assisted process allows formation of the relaxed state. From this state, the transition to the deep level occurs at crossing point C, also requiring thermal assistance. Ec is the coulombic trap depth of the shallow trap, Ea is the energy barrier at point B to relaxation, and Eb is the thermal trap depth of the relaxed state. Q is a one-dimensional simplification of the three-dimensional rearrangement of lattice ions as they accommodate the extra charge of the electron.
Figure 30. Electron-hole recombination pathways in AgBr. Dotted lines depict trapping levels for carriers. Dashed lines indicate the recombination step. In the right-hand diagram, the hole is undergoing trapping at a filled level of the center that is providing the electron trapping level, after which the recombination occurs.
Under most conditions, the pathway on the left is favored because holes have a higher effective mass than electrons and therefore are more deeply trapped and because the electron’s mobility is much higher than that of the hole. The recombination often leads to luminescence at low temperatures (88,89), but this process is quenched at room temperature where nonradiative processes dominate. The photochemistry of silver halides involves the formation of atomic species — silver and halogen. The silver atom is formed when an interstitial silver ion migrates to the site of a trapped electron. The energy level scheme associated with this transformation is shown in Fig. 31. The electron is initially trapped in a shallow coulombic trap that is characteristic of a positive kink. Following this initial trapping, the lattice surrounding the trapped electron relaxes to accommodate this new added charge, thus deepening the trap. Then, the interstitial silver ion is attracted to this site which now bears a partial negative charge. Finally, the electron is localized at the deep level corresponding to the 5s level of the silver atom. There is a corresponding process for the hole, except that now a lattice silver ion must leave the site
Figure 31. Sequence of events leading to the formation of a silver atom. Circled numbers indicate the sequence of events. Arrow depicts the ionic step in which a silver ion interstitial is captured at the relaxed trap.
Figure 32. Schematic representation of the three-way equilibria between the free, trapped, and atom states for both the hole and electron.
adjacent to a trapped hole at a negative-kink site and move into an interstitial position, leaving behind a halogen atom (90). Both the silver and halogen atoms are unstable species. Most of the evidence for the instability of the silver atom will be discussed in later chapters. Now, we can merely realize that for the silver cluster formation process to be efficient, which it often is, there must be a pathway for collecting all of the photoproduced silver atoms in one or a few sites on the grain surface. Although diffusion of silver atoms on the grain surface is one way of accomplishing this, the general consensus is that silver clusters form by alternate addition of electrons and interstitial silver ions, as discussed in the next section. Thus, the silver atoms must decay and re-form at a preferred growth site. There is no direct way to measure the lifetime of the silver atom, but indirect methods provide an upper limit of about 1 s (91–96). In the case of halogen atoms, experiments in which large silver halide crystals are placed in a chamber of halogen gas have shown increased conductivity of the crystals; this could happen only if the adsorbed halogen gas injected a hole into the valence band, resulting in the formation of a halide ion (97).

In this section we have described three effective states for the electron — free, trapped, and atom. Similar states exist for the hole. These states are all transient; it is useful to use the schematic in Fig. 32 to describe the transitions between the different states. The fraction of time spent in each state is symbolized by Greek letters, so that φ + β + α = 1 and similarly for the three hole states. Because of the high interstitial silver ion concentration in silver bromide, the electron spends most of its time in the atom state. The low mobility of the hole coupled with its higher effective mass results in a deeply trapped state, which means that the hole spends most of its time in the trapped state. We will return to this partitioning of the electron and hole into transitory states in the next section.

LATENT-IMAGE FORMATION
Nucleation-and-Growth Model

A number of models have been developed to explain latent-image formation in silver halides (7,98–114). The most successful of them is based on ideas first proposed by Gurney and Mott in 1940 (99,100), shortly after the new ideas of quantum mechanics had been applied to crystalline solids. The Gurney–Mott model is based on the following assumptions:

• Silver atoms form when photoelectrons combine with preexisting mobile interstitial silver ions.
• Silver atoms are unstable, and their diffusion to give silver clusters is not possible.
• Aggregation of silver atoms proceeds by the addition of photoelectrons and mobile interstitial silver ions to the initial trapping site.
• Trapping of a number of electrons occurs at a given site in the microcrystal and then a like number of interstitial silver ions is added.

It was recognized rather quickly that the Gurney–Mott picture had a serious flaw. After trapping the first electron, a trapping site would acquire a negative charge that would repel subsequent electrons. To avoid this problem, it was suggested that each trapping of an electron should be followed by the addition of a silver ion (115). Thus, the charge should alternate between 0 and −e. Later, it was pointed out that the sites where the silver clusters form are most likely to be physical defects such as kink sites that have a partial electrostatic charge (116). The charge on the site should alternate between partial negative and partial positive as electrons and silver ions are added alternately. These ideas have become the basis of a model first proposed by Hamilton and Bayer (117) and subsequently referred to as the nucleation-and-growth model (7,106) because it includes a two-stage concept of nucleation and growth (118–123). It also considers that the trapping of electrons at trapping sites and atom
e− + (+kink)+1/2 ⇌ (trapped e−)−1/2

(trapped e−)−1/2 + Agi+ ⇌ (Ag0)+1/2

Figure 33. The initial electronic (a) and ionic (b) events in the formation of the latent image. Figures give a band diagram view, whereas the inset shows the chemical equilibrium between the two states.
formation are reversible, as shown in Fig. 33. These two reversible events form the basis for the three-way cycle diagrams seen earlier in Fig. 32. Following the formation of a silver atom, it is possible for a second electron to be trapped at this site followed by the addition of a second silver ion. These two events constitute the nucleation stage and result in a two-atom center. Unlike the silver atom, the two-atom center is considered a permanently stable species during latent-image formation. Because nucleation cannot occur without a silver atom, nucleation requires at least two electrons in the grain. Now, we can add nucleation to the other events depicted in Fig. 31 to get Fig. 34. Growth is the continuation of this scheme of alternate addition of electrons and interstitial silver ions at the trapping site. The growth stage continues as long as both reactants are available.
Interstitial silver ions are continually replenished via the Frenkel equilibrium (Fig. 17), so the terminating step of the growth sequence is exhaustion of the electron supply. A basic premise of the model is that the various states alternate in charge and provide an electrostatic driving force for each step (7), as shown in Fig. 35. As the state of the grain moves from left to right along this diagram, larger and larger silver clusters are produced. As we will see shortly, a certain size cluster is required to promote the development of the grain, so that growth need only achieve this threshold size for the grain to be developable and thereby contribute to the macroscopic image.

From what has been said so far, one might conclude that, for example, if a five-atom silver cluster is required to initiate the development reaction, then the grain need absorb only five photons to produce such an occurrence. However, we have not yet included the major inefficiency in the model that will dramatically increase the number of absorbed photons required for the grain to achieve developability. This inefficiency involves recombination between electrons and holes, and, as discussed earlier, there are two possible pathways. For free electrons recombining with trapped holes, we recognize that a negative-kink site would be a possible hole trap that would then act as a recombination center whose formal charge is +0.5 e. Likewise, for free holes recombining with trapped electrons, we recognize that an electron trapped at a positive-kink site would be a recombination center whose charge is −0.5 e. Thus, both pathways have an electrostatic driving force that allows them to compete readily with nucleation and growth. As discussed earlier, the dominant pathway is recombination between free electrons and trapped holes.

Figure 36 summarizes all of the events in latent-image formation that we have discussed. On the left are the two three-way cycles that characterize the transient states of the electron and hole. Also included within this region are the two recombination pathways. To the right are the irreversible events of nucleation and growth. Note that nucleation involves a free electron combining with an electron in an atom state to produce an electron-at-an-atom state. Then, moving diagonally from this state represents the addition of a silver ion to form a two-atom center. Similar processes occur to produce larger
Figure 34. A continuation of Fig. 31 showing the formation of Ag2.
Figure 35. The sequence of events that lead to the formation of a developable silver cluster. Note the alternation of charge on the site which is given at the top and bottom of the diagram. (From Ref. 106. Reprinted with permission from World Scientific Publishing Co. Pte. Ltd.)
Figure 36. A schematic representation of the essential events in the nucleation-and-growth model. Light absorption is symbolized by hν. Additional events may be needed to treat specific cases. (From Ref. 124. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
silver clusters. The scheme depicted in Fig. 36 is the minimum number of events needed to explain latent-image formation. The sensitometric behavior of silver halides can be explained largely by understanding how the forward-moving processes of nucleation and growth compete with the backward-moving recombination process, as well as how nucleation and growth compete with each other. There will be many examples in the remaining sections to illustrate these concepts. In some situations, additional events may be needed to explain latent-image formation.

The ideas expressed before concerning latent-image formation can be put into equations that can be used as a basis for a simulation program to predict the photographic consequences of various factors (117,125–127). The three-way cycles can be treated with a steady-state approximation. Equations for the irreversible events can be derived by assuming that the ionic event — the addition of an interstitial silver ion — is a fast process and electron or hole capture is rate-limiting. Generically, the rates λct of the irreversible processes can be expressed as

λct = nc nt vc σt V^−1   (22)
    = nc nt ct   (23)

where nc is the number of carriers, nt is the number of traps, vc is the thermal velocity of the carrier, σt is the capture cross section of the trap, ct is the rate constant, and V is the grain volume, which is used to put the rates in per-grain terms. If we let e be the number of electrons partitioned among the free, trapped, and atom states, and let h be the number of holes similarly partitioned, then, using the template expressed by Eqs. (22,23), the rate of nucleation can be expressed as

λn = [free electrons][silver atoms](rate constant)   (24)
   = (φe)α(e − 1)ve σAg V^−1   (25)
   = (e)(e − 1)n   (26)

where

n = φαve σAg V^−1   (27)

Note how the concentration of silver atoms is expressed. The (e − 1) factor is used so that the nucleation rate cannot be greater than zero unless there are two or more electrons in the three-way electron cycle. Physically, this factor indicates that an electron can participate with all other electrons in silver cluster formation, but not with itself. Similarly, a growth rate at an n-atom silver cluster can be expressed as

λg(n) = [free electrons][n-atom centers](rate constant)   (28)
      = (φe)(Agn)ve σAgn V^−1   (29)
      = eAgn g(n)   (30)

where

g(n) = φve σAgn V^−1   (31)

and Agn is the number of n-atom centers. Equation (30) gives the rate of growth at just one size silver cluster. The total growth rate is the sum of rate expressions for all of the silver clusters present. Note that the rate constant for growth differs from that for nucleation primarily in the capture cross section used — that for a silver atom versus that for a silver cluster. We will shortly see that this is a very fundamental difference. We can develop similar equations for recombination. However, now there are two rates — one for free-electron/trapped-hole recombination and the other for free-hole/trapped-electron recombination. For the first pathway,

λr1 = [free electrons][trapped holes](rate constant)   (32)
    = (φe)(γh)ve σγ V^−1   (33)
    = ehr1   (34)

where σγ is the cross section for electron capture at a trapped-hole site. For the second pathway,

λr2 = [free holes][trapped electrons](rate constant)   (35)
    = (ψh)(βe)vh σβ V^−1   (36)
    = ehr2   (37)

where σβ is the cross section for hole capture at a trapped-electron site,

r1 = φγve σγ V^−1   (38)

and

r2 = ψβvh σβ V^−1   (39)

The rate constant for the total recombination is

r = r1 + r2 = [φγve σγ + ψβvh σβ]V^−1   (40)

Equation (40) can be simplified by the following definition:

f = vh σβ/(ve σγ)   (41)

so that now

r = φ[γ + fψβ/φ]ve σγ V^−1   (42)

Finally, to make the expression for r similar to that for n and g(n), we make the following definition:

ω = γ + fψβ/φ   (43)

which then leads to

r = φωve σγ V^−1   (44)

Now, let us summarize the rates for the three irreversible events of interest:

λn = e(e − 1)n   (45)
λg(n) = eAgn g(n)   (46)
λr = ehr   (47)

where

n = (φve V^−1)ασAg   (48)
g(n) = (φve V^−1)σAgn   (49)

and

r = (φve V^−1)ωσγ   (50)

These rate equations can be further simplified by making the following assumptions. First, we recognize that the shallow coulombic levels involved in nucleation, growth, and recombination have similar charges, so that the shallow-level trapping cross section σc is the same for all three events (128). Second, because charge is the predominant factor that determines the trap cross section, the size of the silver cluster has little effect on σc. A further assumption is that the transition to the deep level is inefficient for the silver atom, but efficient for silver clusters. Figure 37 is a configuration coordinate diagram, similar to that shown in Fig. 29, that compares the electron transition to the deep level for these two centers. Based on molecular orbital calculations, it is postulated that there are intermediate levels for two-atom silver clusters and larger that facilitate the dissipation of energy, but no such levels exist for electron capture into the silver-atom site (106). Recall that earlier we defined the trapping cross section in terms of the shallow-level trapping cross section and the sticking coefficient η [Eq. (19)],

σt = ησc   (51)

Figure 37. A configuration coordinate diagram, similar to that in Fig. 29, that illustrates the trapping of an electron at a silver atom (a) and a silver cluster (b) in an unsensitized grain. The ‘‘excited state’’ curves in (b) indicate postulated intermediate orbital energies available in a silver cluster that are not available in the silver atom. (From Ref. 106. Reprinted with permission from World Scientific Publishing Co. Pte. Ltd.)

Following the preceding discussion, we can identify the sticking coefficient for nucleation (Fig. 37a) as ηn, and that for growth (Fig. 37b) as ηg. We can also think of these sticking coefficients as efficiencies and also note that ηn ≪ ηg based on the previous arguments. If we assume that the transition to the deep level for both growth and recombination is highly efficient, then ηg = ηr = 1. Based on the previous simplifications, we can now rewrite the rate constant expressions [Eqs. (48–50)] using

σAg = ηn σc   (52)
σAgn = ηg σc   (53)

and

σr = ηr σc   (54)

to give

n = (φve V^−1)αηn σc   (55)
g = (φve V^−1)ηg σc   (56)

and

r = (φve V^−1)ωηr σc   (57)

Furthermore, because we have already argued in the previous section that α ≈ 1 and have now assumed that ηg = ηr = 1, then

n = (φve V^−1)ηn σc   (58)
g = (φve V^−1)σc   (59)

and

r = (φve V^−1)ωσc   (60)
From these equations, we can define the nucleation efficiency ηn and the recombination index ω as

ηn = n/g   (61)

and

ω = r/g   (62)
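Because the common factor (φve V^−1)σc in Eqs. (58)–(60) cancels from any ratio of the rates in Eqs. (45)–(47), the competition among nucleation, growth, and recombination is governed by ηn and ω together with the instantaneous populations e, h, and Agn. The short sketch below (illustrative numbers only, not taken from the article) makes that cancellation explicit.

def relative_rates(e, h, ag_n, eta_n, omega):
    """Nucleation and recombination rates relative to growth,
    from Eqs. (45)-(47) with the rate constants of Eqs. (58)-(60)."""
    common = 1.0                            # (phi * v_e / V) * sigma_c; cancels in the ratios
    lam_n = e * (e - 1) * common * eta_n    # Eq. (45) with (58)
    lam_g = e * ag_n * common               # Eq. (46) with (59)
    lam_r = e * h * common * omega          # Eq. (47) with (60)
    return lam_n / lam_g, lam_r / lam_g

# Hypothetical snapshot of one grain: 4 electrons, 3 holes, one growing center
print(relative_rates(e=4, h=3, ag_n=1, eta_n=0.01, omega=2.0))

With a small ηn, nucleation is strongly suppressed relative to growth, whereas a large ω lets recombination dominate; this is the competition explored in the sections that follow.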
Silver Clusters and Development

We will discuss development of the latent image in much detail later, but some discussion is required at this point to show how the latent image confers developability upon the emulsion grain. Developers are multicomponent solutions, but the key ingredient pertinent to our present discussion is the developing agent itself. These are organic molecules that can transfer electrons to appropriate acceptors, here, the latent image. A typical energy-level diagram for the initiation of development is shown in Fig. 38. On the left of the diagram is an exposed grain in which the electron accepting level of the photoproduced silver cluster is within the band gap and is energetically coincident with the highest filled (HF) molecular orbital of the developer molecule. Thus, electron transfer from the developer molecule to the latent image is energetically favorable. Also shown in Fig. 38, on the right, is the situation for an unexposed grain. Electron transfer from the developer to the grain in this case can occur only by transfer to one of the many energy levels within the conduction band. Such a transition has low probability because of the high energy barrier. However, if given enough time, electron transfer will occur and lead to the formation of a silver cluster whose energy levels are within the band gap. Now, further electron transfers become much more likely and cause the unexposed grain to develop. This grain will contribute to the fog density.
Figure 38. An energy-band diagram depicting the development stage of the photographic process for an exposed and an unexposed grain. The energy level of the highest filled molecular orbital of the developing agent is depicted in the center of the diagram.
It should be apparent from the preceding discussion that the discrimination between exposed and unexposed grains is due to the different kinetics of development of these two types of grains. The presence of the latent image increases the development rate and allows detecting the latent image before significant fog arises. This large difference in development kinetics makes the photographic process viable. There is evidence that there is a critical silver cluster size, below which the development reaction cannot be initiated within the normal development time. Indirectly, experiments with vacuum-deposited silver clusters on SiO2 followed by physical development have shown that development is initiated by four-atom and larger silver clusters (129). Also, light latensification experiments, in which the film is given an overall uniform exposure at low irradiance following an initial exposure through an irradiance-modulating step wedge, have led to sensitivity increases up to a factor of 10 (130,131). The uniform exposure in these experiments provides additional electrons that allow the subdevelopable silver cluster (‘‘latent subimage’’) to grow to a latent image. The number of developable grains increases and results in the observed speed increase. These experiments indicate there are stable silver clusters that are not detectable under the usual development conditions. Direct techniques have recently been devised for determining the minimum developable size which involve selecting the size of silver clusters emitted from an oven and directing them toward a target that contains silver bromide grains from which a major part of the adsorbed gelatin has been removed (132,133). Initial concerns about fragmenting the clusters have been decreased by using a buffer gas that reduces the kinetic energy of the clusters when they arrive at the grain. Results show that grains become developable when four-atom and larger clusters are used. Concerns still remain that residual gelatin is present and that the deposited clusters reside on top of the gel layer, where their catalytic activity may be different from that of clusters adsorbed directly on the grain surface. The preceding discussion has focused on a single threshold size for catalyzing the development reaction. In fact, studies at different development times show that the sensitivity continually increases as the development time increases, until a fog limit is reached. One cannot explain such results by assuming a fixed threshold size or a threshold size that changes by one-atom jumps. Rather, it is necessary to assume that there is a distribution of developable sizes that varies continuously with increasing development time and allows detection of smaller and smaller clusters (134). In establishing a threshold size for development, we are really saying that the energy level of the silver cluster is such that it acts as an acceptor for electrons transferred from the HF level of the developer molecule. Early attempts to explain these results hypothesized that the electron accepting levels of the silver clusters continually decreased with increasing size relative to the CB bottom until the level matched that of the developer HF level, whereupon development would be initiated (135).
Because bonding in silver clusters is expected to involve primarily the 5s electrons, simple molecular orbital theory can also be used to calculate occupied and unoccupied states (5). A typical result is shown in Fig. 39. Note the odd–even oscillating nature of the electronic properties. Clusters that contain an even number of atoms have either filled or empty orbitals; this results in a large difference between their electron affinity and ionization potentials. Clusters that contain an odd number of atoms have half-filled, as well as filled and empty, orbitals; this results in a near coincidence of the electron affinity and ionization potentials. This alternating odd–even character has been confirmed in gas-phase metal clusters that have filled d shells and lone s electrons (136). Such behavior has also been found in photoproduced silver clusters on emulsion grains (137). To be relevant to the photographic process, an energy-level picture of silver clusters adsorbed on AgBr is needed. This has been attempted in calculations using a model that consists of silver clusters in virtual lattice sites but allows no ionic relaxation of the lattice (138). Electronic interactions between the AgBr substrate and the silver clusters, however, were allowed, and the energy-level picture that emerges is similar to that in Fig. 39. A feature missing in these results is an indication of a critical threshold size. More recent studies using more sophisticated calculations that allow for lattice relaxation have produced similar results (139).

Inefficiencies
Recombination has already been discussed previously as a dominant inefficiency, but other inefficiencies can also be important in certain situations. Inefficiencies can be classified into three categories (140). The first category deals with the loss of light-generated electrons, which includes recombination as the dominant inefficiency. Other pathways for electron loss involve reactions with oxygen and/or water in the film coating and irreversible trapping at impurities either within the grain or adsorbed on the surface. The second category deals with loss of silver which includes reaction with oxygen and/or water, thermal dissociation, and oxidation by reaction with holes. The latter process is sometimes called ‘‘regression’’ because it is the reverse of the cluster-forming reaction.
Figure 39. An energy-level diagram from simple molecular orbital theory for isolated linear silver clusters. Only those orbitals formed from 5s atomic orbitals are shown. Filled circles represent electrons. (Reprinted with permission from the American Institute of Physics.)
The third category focuses on the inefficient use of electrons that involve either internal latent-image formation or dispersity. In the former, competition between internal sites provided by lattice defects or perhaps impurities and surface sites leads to internal latent-image formation. All developers are formulated to detect only the latent image at the surface of the grain, or at best slightly subsurface, so some of the silver is produced in a region that is inaccessible to the developer and results in a speed loss. When all of the surface silver is not located at the same site, this is termed a ‘‘dispersity inefficiency.’’ Any inefficiency from the dispersity of silver may be equated specifically to the undevelopable fraction of grains, which would be developable if the silver atoms they contain could be concentrated into a single center per grain (141). Thus, the inefficiency depends strongly on the minimum developable size. Dispersity occurs when the rate of nucleation becomes comparable to that for growth. We see from Eqs. (45,46,55,56) that this would occur when e and Agn are large and ηn ≈ ηg . From what was said earlier about nucleation efficiency, it seems that the latter condition is not possible, but we will see later that such a condition is approached in chemically sensitized emulsions. The discussion that follows assumes this. If the minimum developable size were two atoms, then dispersity would be nonexistent because the initial nucleation makes the grain developable and any other nucleation would be superfluous. If the minimum developable size is three atoms, the dispersity inefficiency would correspond to the fraction of grains that has two or more two-atom clusters, but none larger. Simple probability considerations indicate that this fraction will be very small except in the shoulder region of the D– log E curve, where it would cause a loss in sensitivity of about 40%. When the minimum developable size is four or more atoms, the number of ways of configuring the silver in stable but undevelopable centers on the surface increases (142) and leads to strong dependence of the degree of dispersity inefficiency on the threshold size for development. In an ideal situation leading to one cluster per grain, only one nucleation should occur followed by a series of growth events to produce a developable cluster. Thus, nucleation must not be competitive with growth. This is usually the case at low irradiance (exposure times of 100’s of seconds or longer) where electrons are being produced very slowly and the rate of nucleation, given by Eq. (45), is zero most of the time. This ideal is also approached at intermediate irradiances (exposure times from 0.1 to 10 s). However, at high irradiance (ms or shorter exposure times), the rate of nucleation can be comparable to that of growth, provided that the nucleation efficiency is reasonably high (say, greater than about 0.2). Under these conditions and assuming a minimum developable size greater than three atoms, the dispersity inefficiency will cause a significant loss of sensitivity. This leads to high-irradiance reciprocity failure which will be discussed in a later section.
Computer Simulation

In its general form, the nucleation-and-growth model is too complicated for analytical study, but computer simulation is an alternative way to study the model's predictions. Simulation forms a bridge between the experimental and the theoretical worlds and gives us insight into the ways the physical properties of the silver halide grain determine its photographic properties. The simulation is based on the Monte Carlo technique (117,125,126). It is assumed that events occur at random, but the sequence is weighted by the relative probabilities of the events. The rates of the various events are found from Eqs. (45–47) plus an additional one for light absorption, λa:

λa = kI,   (63)

where k is a constant and I is the irradiance. To derive appropriate values for the rates of the various processes, we also need values for the rate constants [Eqs. (58–60)], and the relative values of these will in turn depend on ηn and ω. Once the appropriate rates are available, they are normalized by the total rate λtot (the sum of all the rates) to form relative probabilities. Then a uniform random number r1 is generated, and the next event u to occur is determined by the following inequality:

Σ_{i=1}^{u−1} λi/λtot < r1 < Σ_{i=1}^{u} λi/λtot.   (64)

Then, a second random number r2 is generated and used to determine the time between events τ using the equation

τ = (1/λtot) ln(1/r2).   (65)
The rates are then updated by changing the values of e, h, and Agn in accordance with the event that just occurred, and the process is repeated until the specified exposure time is reached, whereupon I = 0 and the process continues until λtot = 0. At this point, the simulation for one grain is complete. But, because the events are chosen at random and because the experimental data for comparison are the mean responses of a large ensemble of grains, the entire process must be repeated for each of many grains. By surveying the outcome for this ensemble of grains, we can determine the fraction of grains F made developable for a given minimum developable size at a particular exposure level. To simulate the entire response curve, we need to increase the number of absorbed photons/grain and repeat the entire process. This yields an F–log P curve, where P is the mean number of absorbed photons per grain. Finally, if we increase the irradiance and decrease the exposure time in steps repeatedly, while keeping the total exposure constant, we can simulate the reciprocity-failure characteristics.
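The event loop just described is essentially a Gillespie-type stochastic simulation and is easy to prototype. The sketch below (Python) is not the authors' code: the rate expressions for nucleation, growth, and recombination are illustrative stand-ins for Eqs. (45)–(47), which appear earlier in this article, and the rate constants are arbitrary values chosen only so that the example runs. Only the event-selection rule [Eq. (64)] and the waiting-time rule [Eq. (65)] are taken directly from the text.

import math
import random

# Illustrative rate constants (arbitrary values, not fitted to any real emulsion).
K_ABS, K_NUC, K_GRO, K_REC = 5.0, 1.0, 1.0, 1.0

def event_rates(e, h, n_clusters, irradiance):
    # Stand-ins for Eqs. (45)-(47); only the absorption term, lambda_a = k*I [Eq. (63)],
    # is taken directly from the text.
    return [
        ("absorb",    K_ABS * irradiance),
        ("nucleate",  K_NUC * e * max(e - 1, 0)),   # needs two electrons (illustrative form)
        ("grow",      K_GRO * e * n_clusters),      # an electron adds to an existing cluster
        ("recombine", K_REC * e * h),               # an electron meets a trapped hole
    ]

def simulate_grain(irradiance, exposure_time, n_min=4, rng=random):
    """One grain: returns True if any silver cluster reaches the minimum developable size."""
    e = h = 0
    clusters = []                 # sizes (atoms) of the silver clusters on this grain
    t, I = 0.0, irradiance
    while True:
        rates = event_rates(e, h, len(clusters), I)
        total = sum(r for _, r in rates)
        if total == 0.0:                                   # lambda_tot = 0: simulation done
            break
        t += math.log(1.0 / (1.0 - rng.random())) / total  # Eq. (65)
        if I > 0.0 and t >= exposure_time:
            I = 0.0                                        # exposure over; remaining carriers react
            continue
        x, acc, event = rng.random() * total, 0.0, None
        for name, r in rates:                              # Eq. (64): pick the next event
            acc += r
            if x <= acc:
                event = name
                break
        if event == "absorb":
            e += 1; h += 1
        elif event == "nucleate":
            e -= 2; clusters.append(2)                     # Ag2, the smallest stable cluster
        elif event == "grow":
            e -= 1; clusters[rng.randrange(len(clusters))] += 1
        else:                                              # recombination
            e -= 1; h -= 1
    return any(size >= n_min for size in clusters)

# Fraction of a 2,000-grain ensemble made developable at one exposure level.
developable = sum(simulate_grain(0.5, 1.0) for _ in range(2000))
print("F =", developable / 2000)

Repeating the loop for an ensemble of grains over a range of mean exposures gives the F–log P curve; repeating it at several irradiance–time combinations of equal total exposure sketches the reciprocity behavior discussed later.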
Quantum Sensitivity

In the absence of any other inefficiency besides recombination, a simple analytical model (143) can be used to derive an equation for the mean number of absorbed photons EN required to produce a minimum developable size N:

EN = N + 2ω/ηn + ωΣ,   (66)

where Σ is the number of holes accumulated during the growth stage, i.e.,

Σ = 3 + 4 + · · · + N   (67)

(for N = 2 the sum is empty and Σ = 0).
The first term in Eq. (66) represents the number of absorbed photons required to create a cluster of size N. The second and third terms of Eq. (66) represent the number of absorbed photons, and therefore electrons, that are lost to recombination during nucleation and growth, respectively. EN is equivalent to the quantum sensitivity (QS) of the emulsion, which is the mean number of absorbed photons/grain required to make 50% of the grains developable. Note that this definition means that lower QS values lead to higher efficiency. A situation which will produce the maximum efficiency is ηn = ω = 1, and Table 7 summarizes the QS values for this case for different N values.

Table 7. Quantum Sensitivities Calculated from Eq. (66)
N    2ω/ηn    ωΣ    QS, absorbed photons/grain
2    2        0     4
3    2        3     8
4    2        7     13
5    2        12    19

An assumption in deriving Eq. (66) is that the quantum yield of carrier production by light absorption is unity. Every absorbed photon generates a conduction band electron, so that any deviation of QS from the ideal value is due to recombination inefficiency. This assumption is supported by measurements on AgBr (144), which show that the quantum yield is greater than 0.9, and other work on AgCl, which indicates even higher values (145).

Figure 40. Schematic showing the pathways available to a conduction-band electron at various stages of latent-image formation.

An interesting exercise that illustrates the role of recombination in reducing efficiency can be done assuming ηn = ω = 1. Under these conditions, the rate constants for nucleation, growth, and recombination [Eqs. (58–60)] are all the same, and the relative rates of these processes [Eqs. (45–47)] are simply determined by the concentrations of electrons, holes, and silver clusters. It is also assumed that holes are immediately trapped upon creation. Imagine that one silver atom has been produced and that a second photon absorption has just produced a conduction-band electron and a trapped hole. Then, the situation would look like the top panel of Fig. 40, where the second hole is the companion to the electron that led to the formation of the silver atom. Note that the electron can be removed from the conduction band either by trapping at the silver atom to produce Ag2 (after addition of an interstitial silver ion) or recombination with either of the two trapped holes. As indicated before, these three events
have equal probability, and therefore recombination is twice as likely as nucleation. After a number of absorptions, successful nucleation occurs. This brings us to the middle panel of Fig. 40, where after absorption of an additional photon, there are one Ag2 center and three trapped holes. Now, recombination is three times as likely as growth. After a number of tries, successful growth occurs and we move to the bottom panel of Fig. 40. Here, we see that after an additional photon absorption, recombination is four times as likely as growth. By this point, the pattern has become obvious. If we assume that development can be catalyzed by a four-atom cluster, then the total number of electrons required is four to make Ag4, plus an average of two electrons lost to recombination during nucleation, plus an average of three electrons lost to recombination in growing from Ag2 to Ag3, plus an average of four electrons lost to recombination during growth from Ag3 to Ag4, a total of 13 electrons. This means that the QS for this case is 13 absorbed photons/grain, consistent with the results summarized in the third row of Table 7. Furthermore, the schematics in Fig. 40 clearly indicate where the electrons are being lost and that the efficiency of latent-image formation decreases as the cluster size required for development increases. This is a feature of the photographic process that is well known experimentally (134,146).
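The electron bookkeeping above can be checked directly against Eq. (66). The following minimal sketch (Python; the function name is ours) evaluates the equation for the entries of Table 7 and for the unsensitized case treated later in this article.

def quantum_sensitivity(N, eta_n=1.0, omega=1.0):
    """Eq. (66): mean absorbed photons/grain to reach a developable cluster of N atoms.
    The last term is omega times the holes accumulated during growth [Eq. (67)]."""
    holes_during_growth = sum(range(3, N + 1))     # 3 + 4 + ... + N (zero for N = 2)
    return N + 2.0 * omega / eta_n + omega * holes_during_growth

# Table 7 (eta_n = omega = 1): N = 2, 3, 4, 5 -> 4, 8, 13, 19 absorbed photons/grain.
for N in (2, 3, 4, 5):
    print(N, quantum_sensitivity(N))

# Unsensitized emulsion discussed later (eta_n = 0.01, N = 5): about 217 photons/grain.
print(quantum_sensitivity(5, eta_n=0.01))

For N = 4 this reproduces the 13 absorbed photons/grain tallied above from Fig. 40.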
Other Models of Latent-Image Formation

Moisar and Granzer favored a phase-formation model of latent-image formation (110–112). Their approach uses the Gibbs–Thomson treatment of forming a new phase under supersaturated conditions [see Eq. (9)], as shown in Fig. 41. It is assumed that a solute/solvent system applies to silver cluster formation in AgBr, where AgBr is the solvent and silver, in the form of interstitial silver ions and electrons, is the solute. Exposure produces a nonequilibrium concentration of electrons and holes. Electrons form silver atoms, and holes attack silver so that the system can return to equilibrium. A hole-removal pathway must be available to produce a stable photoproduct. To calculate the supersaturation, it is necessary to know the number of electrons and holes in the material before exposure. This was calculated using equations from semiconductor physics, which lead to a value of 4.5 × 10^-15/µm^3 when the band gap of AgBr is used. If the generation rate of electrons and holes during exposure is 10^0–10^3/grain-s and the lifetime of the electron is 10^-7 s, then the concentration is 10^-7 to 10^-4/µm^3. This leads to a supersaturation of about 10^8 to 10^11. As seen in Fig. 41, these supersaturation values lead to unstable small clusters (left of the curve peak) and stable large clusters (right of the curve peak). Furthermore, the peak position along the x axis is a function of the supersaturation, which could be correlated with the exposure irradiance. Atomically, the small clusters are attacked by holes and decrease in size, whereas the larger clusters are on the other side of the peak, where they grow in size because they are not attacked by holes.

Figure 41. Free energy vs. cluster size diagram based on the phase-formation model. Labels on curves give supersaturation values (10^7 to 10^12). (From Ref. 112. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)

The main criticism of the phase-formation model is that an average value of supersaturation is used, but what is really needed is the instantaneous supersaturation (7). If two electrons are generated in a 10 µm^3 crystal, then the instantaneous supersaturation is about 10^17. A supersaturation of this magnitude would produce a curve in Fig. 41 that would simply have a maximum at n = 1 and decrease at larger n. Thus, all silver centers,
including the silver atom, would be stable and grow. There would be no threshold size for stability. Additionally, an equilibrium solubility of 10−15 /µm3 does not make sense physically because it implies that most grains will have zero electrons. The first photon absorbed will produce a supersaturation of infinity! The phase-formation model also rules out the existence of any latent subimage because it would dissolve in the host AgBr phase. Finally, the Gibbs–Thomson theory does not deal with the tendency of the silver clusters to react with holes. Thermodynamically, all cluster sizes should react. Another model of latent-image formation has been proposed by Mitchell (101–104). Silver atoms in this model form by simultaneously trapping electrons and interstitial silver ions at a trapping site. The resulting silver atom is thermally unstable and is not a trap for electrons. Ag2 forms by the simultaneous capture of an electron and an interstitial silver ion at the silver atom. Because Ag2 is not stable, the Ag3 cluster forms similarly to Ag2 , except, of course, now it is at the Ag2 site. Ag3 is the cluster of minimum size that can extract a silver ion from a neighboring lattice position to form Ag4 + , which is a stable cluster and, by virtue of its positive charge, is a good trap for electrons. Mitchell calls it a ‘‘concentration speck.’’ It is considered that it has a very long range effect on electrons because of its charge, and provides a large trap depth. Once this stage of latentimage formation is reached, the silver cluster grows by alternate addition of electrons and lattice silver ions. The main inefficiency in this model is the attack of holes on silver atoms and dimers, the positive charge on Ag4 + and larger clusters preventing such a regression reaction. Recombination between electrons and holes is ignored. There are several criticisms of the Mitchell model. For example, it assumes that no traps are present due to lattice defects, yet in emulsion grains there is a large difference between the mobility measurements of electrons that are designed to include the time spent in traps and those that are designed just to measure the mobility between trapping events (147). Such a difference is not found in large crystals. This difference is best explained by the large surface-area-to-volume ratio in emulsion grains and the inherent trapping sites associated with charged surface defects. Quantum mechanical treatments of the trapping problem also lead to the type of binding energies and trap radii discussed earlier using the hydrogenic model (148–150). The inability of Ag, Ag2 , and Ag3 to act as traps is based on the assumption that the electron affinity of the free clusters is pertinent. But a large downward shift of energy levels occurs when the substrate effect is considered (139,148–150). Furthermore, the large effective capture radius attributed to the concentration speck is untenable in view of the large dielectric constant of silver halides. Because the Ag, Ag2 , and Ag3 form at uncharged sites in the Mitchell model, they should have a small cross section for capturing holes. The known inefficiency of latent-image formation is difficult to explain in these terms. Reciprocity failure, an inherent inefficiency in the photographic process, is also largely ignored by the Mitchell model.
EFFICIENCY AND SPEED

Quantum Sensitivity Measurement

In the previous section, we introduced the concept of quantum sensitivity (QS) in connection with the nucleation-and-growth model of latent-image formation. This is a property that can also be measured experimentally. To accomplish this, a transformation of the characteristic curve of the emulsion from D–log E to F–log P space must be possible. In monodisperse emulsions, this is relatively straightforward, and the log E transformation can be accomplished by measuring the appropriate physical properties of the coating and emulsion that will be described shortly. To convert the density axis into the fraction of grains developable, we rely on the Nutting equation, which says that density is proportional to the number of developed grains multiplied by their average projective area (151). Therefore, normalization of the D values produces the required F values. This normalization procedure is expected to work well for monodisperse grains and development in which the entire grain is converted to silver (134).

A further assumption in QS measurements is that all of the grains in the emulsion coating have equal access to the incoming photon flux. This assumption can be approximated only in practice because a finite layer thickness is required to achieve densities that can be measured accurately. Under these conditions, there is an inevitable decrease in irradiance depthwise through the coating, although light scattering by the grains (152) tends to diminish the magnitude of the effect that is predicted by the Bouguer–Lambert law (153–155). In practice, a compromise is struck between accurate density measurement and uniform exposure by coating the emulsion to give a maximum density of about one. As an example, an emulsion composed of 0.5-µm diameter AgBr grains and exposed at 400 nm would have a decrease of light irradiance of about 25% between the top and bottom of the coating, when coated to achieve a Dmax of one (156). Using the preceding assumptions and approximations, the following equation (156) is used to calculate QS from experimental data:

QS = (E/Ec)(AITVρ)/(SW),   (68)

where E is the exposure at the reference density point on the D–log E curve, Ec is the exposure at the clear step of the step tablet (the point of maximum exposure), A is the absorptance of the coating, I is the irradiance at the film plane, T is the exposure time, V is the grain volume, ρ is the density of silver halide, S is the silver coating weight in the emulsion layer, and W is a factor for converting silver coating weight to silver halide coating weight. A propagation-of-error analysis using this equation has shown that the typical uncertainty (two sigma) in the measured QS is about 15% (156). An example with typical values is shown in Fig. 42.

Figure 42. A sample calculation of quantum sensitivity using Eq. (68) for the indicated experimental data: A = 0.29, I = 1.86 × 10^11 photons/cm^2-s, T = 1 s, V = 0.051 × 10^-12 cm^3, ρ = 6.47 g/cm^3, S = 0.97 g/m^2, W = 1.74, E = 550, Ec = 10,000. The result is QS = 5.8 ± 0.9 photons/grain.
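Because the quantities entering Eq. (68) are conventionally quoted in mixed units (irradiance per cm^2, coating weight per m^2), the unit conversion is worth writing out. The sketch below (Python; the function name is illustrative) reproduces the Fig. 42 calculation.

def qs_from_coating(E, Ec, A, I, T, V, rho, S, W):
    """Eq. (68): QS = (E/Ec) * (A*I*T*V*rho)/(S*W).
    Units as in Fig. 42: I in photons/cm^2-s, V in cm^3, rho in g/cm^3, S in g/m^2."""
    S_g_per_cm2 = S / 1.0e4          # convert coating weight from g/m^2 to g/cm^2
    return (E / Ec) * (A * I * T * V * rho) / (S_g_per_cm2 * W)

print(qs_from_coating(E=550, Ec=10_000, A=0.29, I=1.86e11, T=1.0,
                      V=0.051e-12, rho=6.47, S=0.97, W=1.74))
# about 5.8 absorbed photons/grain, as quoted in Fig. 42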
Measuring QS in polydisperse emulsions is considerably more involved. The transformation from density to fraction developable using the Nutting equation is no longer possible because different size grains make different contributions to the measured density. Likewise, transformation from log E to log P is not straightforward because different size grains have different mean absorption values. As a result, the D–log E curve is a convolution of many D–log E curves, one for each size class of grains in the emulsion. These considerations indicate that the emulsion response must be analyzed in terms of individual grain size classes, usually using some form of microscopy: either optical microscopy for large grains (>2 µm diameter) or electron microscopy for smaller grains. The fraction of grains developable can be determined directly by sizing developed and undeveloped grains or indirectly by using residual grain analysis, in which only the undeveloped grains are sized and the developable fraction is inferred by reference to the unprocessed emulsion. The latter technique is more common because development often destroys any remnants of the grain before development. Once these data are available and the absorbed light is apportioned among the grains according to their size, it is a simple matter to generate the F–log P curves for each size class.

Theoretical Limits

The theoretical limits of QS were derived by Silberstein many years ago (157). Assuming no inefficiencies in latent-image formation, one latent image per grain, and a very thin coating, so that all grains have equal access to the incoming light, it is possible to derive theoretical curve shapes for various values of exposure, expressed as the mean number of absorbed photons/grain, for different threshold values M to induce developability by using the following equation:

F = 1 − e^−P [1 + P + P^2/2! + P^3/3! + · · · + P^(M−1)/(M − 1)!].   (69)

Examples of the resulting curves are shown in Fig. 43. Note that the response curve is not a step function, as might be expected if all the grains respond at the same photon threshold, because light absorption within a coating follows Poisson statistics and the abscissa in Fig. 43 has units of mean absorbed photons/grain.

Figure 43. Theoretical response curves (fraction developable vs. log mean absorbed photons/grain) from the Silberstein model [Eq. (69)]. Curve labels (1, 2, 4, 8, 16, 32) indicate the photon threshold to reach the developable state. (Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)

A summary of QS data is shown in Fig. 44. Note that most of the data points fall between 10 and 30 photons/grain. The filled symbols represent special cases where the sensitization was done to eliminate all recombination. In this case, the QS is 2–3 photons/grain, which is consistent with the expected minimum developable size of three atoms for these types of emulsions (124). The remaining data show that some inefficiency still remains in commercial emulsions, indicating that there is still room for efficiency improvement in the silver halide detector.

Figure 44. A summary of available QS data (absorbed photons/grain vs. grain volume in µm^3). The several cases with multiple data points that have the same symbol represent the different size classes in a polydisperse emulsion. Inverted triangles (high-speed X-ray emulsion); squares (high-speed negative); diamonds (medium-speed negative); and + (low-speed negative) (158); × (160); ∗ (159); filled circle (161); filled and empty triangle (156). (Figure taken from Ref. 156. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
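Equation (69) is simply the upper tail of a Poisson distribution, so the Silberstein curves can be generated in a few lines. The sketch below (Python; names are ours) evaluates F for the thresholds labeled on the curves of Fig. 43 at one mean exposure.

import math

def fraction_developable(P, M):
    """Eq. (69): fraction of grains absorbing at least M photons when the mean
    number of absorbed photons/grain is P (Poisson statistics)."""
    partial_sum = sum(P**k / math.factorial(k) for k in range(M))
    return 1.0 - math.exp(-P) * partial_sum

for M in (1, 2, 4, 8, 16, 32):
    print(M, round(fraction_developable(4.0, M), 3))

Sweeping P on a logarithmic scale reproduces the S-shaped curves of Fig. 43, whose spread about the threshold reflects the Poisson statistics of photon absorption rather than any grain-to-grain variation.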
Factors Affecting Speed

Photographic speed is a measure of the amount of light that must fall on a film to form a latent image. Speed is a function of three factors: (1) the amount of light absorbed, (2) the efficiency with which that absorbed light is used, and (3) the minimum developable size of the latent image. We have already discussed the second two factors in detail. Light absorption by AgBr grains is proportional to their size: to volume if the exposure is in the intrinsic region of absorption, to surface area if the exposure is in the region of absorption by adsorbed dye. As a result, we expect speed to increase linearly with grain size (either volume or surface area, depending on the exposure wavelength), assuming that factors (2) and (3) are independent of grain size. The shape of the grain also can affect speed. Consider ‘‘flattening’’ a cubic grain into a tabular one, while maintaining its volume. The tabular grain would have a much higher surface area than the cubic grain, leading to
more dye adsorbed on the surface. Thus, light absorption at wavelengths longer than that absorbed by the silver halide grain would be enhanced, which would translate into higher speed, assuming that the efficiency of latent-image formation is independent of shape. For this reason, tabular grains are used in most systems designed for image capture. Experimental studies have consistently shown that speed is proportional to grain size only over a limited size range (162–165), as shown schematically in Fig. 45.

Figure 45. Speed vs. grain size relationship typical for photographic materials compared with the theoretical relationship.

At large grain sizes, the measured speed is less than that predicted by the expected relationship between speed and size based only on the absorption factor. This means that
either factors (2) or (3) or both are grain-size dependent. There is no indication that (3) is grain-size dependent, so the key factor in explaining this discrepancy between theory and experiment is that the efficiency of latent-image formation decreases at large grain sizes. There is no general agreement on what grain-size-dependent inefficiency is causing this behavior. Some think that the effective range over which the electron can diffuse is limited, so that different parts of a large grain compete with each other for the available electrons (160,162–164,166,167). However, exactly what process limits the range of the electron is not known. Others have suggested that internal image formation is more likely in large grains, making it more difficult to form a surface image efficiently (168). Other factors might involve enhanced recombination because internal lattice defects, assumed to be more prevalent in larger grains, provide additional recombination centers or impurities that act as irreversible electron traps (165). Finally, because the experimental data pertain to chemically sensitized grains, a fog increase induced by sensitization reactions (to be discussed later) may prevent optimization of sensitization. Limited experimental data suggest that larger grains are more prone to fog formation during chemical sensitization (162).

RECIPROCITY FAILURE

A general law of photochemical reactions proposed by Bunsen and Roscoe indicates that the amount of photoproduct depends only on the total exposure, not on the individual components (irradiance and time) (169). Because of the reciprocal relationship between irradiance and time in determining a constant value of exposure and, therefore, a photoproduct, this law is known as the reciprocity law (105). Photochemical systems that deviate from this law are said to ‘‘fail the reciprocity law,’’ or more tersely, to exhibit ‘‘reciprocity failure.’’ In photography, density is proportional to the amount of photoproduct. Long time, low irradiance exposures are expected to give the same density as short time, high irradiance exposures if the reciprocity law is obeyed. Deviations from the reciprocity law are quite common in silver halides because the amount of photoproduct, that is, the fraction of developable grains, depends on the rate of arrival of photons, in addition to the total number of photons.

There are many different ways to plot experimental data to show whether the reciprocity law is obeyed. One example is shown in Fig. 46, in which speed is plotted against the logarithm of exposure time. Implicit in this plot is that irradiance varies reciprocally with exposure time, so that the total number of photons incident on the film remains constant. If the reciprocity law were obeyed, the speed should be constant with exposure time, and the plot should simply be a horizontal line. The example in Fig. 46 shows deviations from ideal behavior at both long (low-irradiance reciprocity failure, or LIRF) and short (high-irradiance reciprocity failure, or HIRF) exposure times.
Figure 46. Reciprocity-failure plot (speed vs. log exposure time) showing LIRF, HIRF, and the high-irradiance plateau. The optimum position indicates the exposure time that leads to maximum speed.
Low-Irradiance Reciprocity Failure

The appearance of LIRF at low irradiance indicates instability. The photoproduct of some initial number of absorbed photons fades unless there are at least S successful absorptions within the fading period. Analysis shows that the ultimate slope of the LIRF curve is −(S − 1), where S is the number of absorptions required for stability (170). Comparison with experimental reciprocity data indicates that the S = 2 theoretical curve gives the best fit to the experimental data. One concludes from this result that two-atom clusters are the smallest stable size and that silver atoms are unstable. These ideas have been incorporated into the nucleation-and-growth model discussed earlier. At low irradiance (minutes to hours), the photon arrival rate will be low. As a result, the silver atom goes through many formative and decay reactions, leading to inefficiency in the nucleation stage of latent-image formation. The decay of silver atoms is not in itself a problem because the released electron could simply be trapped elsewhere in the grain and the atom re-forms. This cycle could be repeated many times until a second electron becomes available through photon absorption, making the nucleation event possible [recall Eq. (45)]. However, while the released electron is in the CB, it is also possible that it will encounter a trapped hole and undergo recombination. Emulsions that have been sensitized so as to destroy holes efficiently show virtually no LIRF (156,171–173).

High-Irradiance Reciprocity Failure

HIRF can appear at high irradiance (ms to ns exposures). High-irradiance exposures in chemically sensitized emulsions, where the nucleation efficiency is enhanced relative to unsensitized emulsions, lead to the formation of many small silver clusters on the emulsion grain. Unless the minimum developable size is small, such grains will not develop, and speed is decreased relative to intermediate irradiance conditions where the photon arrival rate is lower and only one or a few silver clusters are formed.
The inefficiency at work here is the dispersity inefficiency discussed earlier. Because of the large number of electrons generated by high-irradiance exposures in a very short time, the nucleation rate is enhanced relative to the growth rate [Eq. (45) vs. (46)] and leads to multiple cluster formation and inefficient growth. As might be expected, the degree of HIRF depends strongly on the minimum developable size detected and the nucleation efficiency of the emulsion (124). If two-atom centers can be detected by the developer, then there would be no HIRF because the smallest stable size would be detectable and any additional clusters formed would be inconsequential. If a three-atom center constitutes the minimum developable size, then there could be some competition between nucleation and growth. However, we saw earlier that the dispersity inefficiency is negligible in this case and HIRF would be undetectable. As the minimum developable size increases beyond three atoms, the degree of the dispersity inefficiency increases (141), and the degree of HIRF becomes increasingly greater. Nucleation efficiency comes into play because this factor determines the degree of competition between nucleation and growth for high-irradiance exposures. At low nucleation efficiency, there is little chance of a second nucleation after the first at any irradiance level, and all the silver is collected in one site. As nucleation efficiency increases, nucleation and growth become more competitive, eventually leading to more than one nucleation per grain at high irradiance and dispersity. As we discussed earlier, the nucleation stage is considered inefficient, so that there really should be no dispersity. However, in the next section, we will see that chemical sensitization improves the efficiency of nucleation, which will make HIRF observable, depending on the minimum developable size N. The connection between HIRF, the minimum size, and nucleation efficiency can be made more quantitative. When there is a high-irradiance plateau, such as in Fig. 46, the magnitude of HIRF δ can be defined as the speed difference between the high-irradiance plateau and the optimum irradiance. Figure 47 shows how δ, computed by computer simulation based on the nucleation-and-growth model, depends on N and ηn (124). As expected from the previous discussions, δ = 0 for N = 2, and almost so for N = 3. But for N > 3, δ becomes progressively greater as both N and ηn increase. It may be thought that recombination might be the cause of HIRF because high-irradiance exposures lead to large concentrations of electrons and holes. However, Eqs. (45) and (47) show that both nucleation and recombination have a second-order dependence on carrier density. Thus, it is generally believed that competition between nucleation and growth is the cause of HIRF (105). However, Eq. (46) shows that growth has a first-order dependence on carrier density, indicating that recombination will dominate over growth at high irradiance. So, the extent of recombination will influence the magnitude of HIRF. This can be seen in Fig. 47 where an increase in the recombination index ω produces higher δ values.
Figure 48. An illustration of the results of double-exposure experiments using a reciprocity-failure plot (speed vs. log exposure time); the three curves are ‘‘normal exposure,’’ ‘‘normal exposure followed by a uniform exposure,’’ and ‘‘uniform exposure followed by a normal exposure.’’ ‘‘Normal exposure’’ means that a step tablet was used. When the uniform exposure follows the normal exposure, it has low irradiance, but when it precedes the normal exposure, it has high irradiance.
Figure 47. The dependence of the degree of HIRF δ on nucleation efficiency ηn and minimum developable size N. Solid line, ω = 1; dashed line, ω = 4. (From Ref. 124. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
Double-Exposure Experiments

Now that we know something about the mechanistic causes of LIRF and HIRF, we can discuss double-exposure experiments (118–123). These experiments, summarized in Fig. 48, involve reciprocity studies but use either a uniform pre- or postexposure in conjunction with the usual step-tablet exposure. In the absence of limitations due to exposure-induced fog, these experiments can eliminate LIRF by a short, high-irradiance preexposure and eliminate HIRF by a long, low-irradiance postexposure. In the less than ideal case when the pre- or postexposure creates fog centers, there is still a substantial reduction in the degree of LIRF and HIRF. These results indicate that there are two distinct stages in latent-image formation, each with its own inefficiencies, and they are among the foundation experiments that led to applying the nucleation-and-growth concept to latent-image formation.

The results of the double-exposure experiments can easily be explained by the nucleation-and-growth model. At low irradiance, nucleation is inefficient relative to growth. Thus, a low-irradiance uniform exposure following a high-irradiance step-tablet exposure will primarily provide electrons that can be used to enlarge a latent subimage to a developable size, with minimal formation of new clusters (fog centers). This procedure, known as ‘‘light latensification,’’ will eliminate HIRF. At high irradiance, nucleation is efficient, whereas growth is not. Therefore,
a uniform preexposure at high irradiance will produce many small subdevelopable centers. When followed by a low-irradiance step-tablet exposure, these centers will be made developable by a few growth steps. The normally inefficient nucleation at low irradiance is avoided, and LIRF is removed.

CHEMICAL SENSITIZATION

Chemical sensitization (5,7) involves treating the grain surface with low levels (100 ppm or less) of chemical reagents to increase the efficiency of latent-image formation. In some cases, chemical sensitization may have an alternative goal such as modifying the shape of the D–log E curve, reciprocity failure, or the response to changes in development time. Because chemical sensitization occurs at the grain surface, this process must necessarily occur after the emulsion has been washed but before the coating operation. In contrast, dopants, which are the subject of the next section, are placed in the emulsion reactor so that they can be incorporated into the grain during crystal growth.

There are two principal inefficiencies to be minimized in improving the efficiency of latent-image formation. The dominant one that is always present is recombination. In addition, some grains may have a tendency to form an internal latent image in the unsensitized state, but the surface modifications by chemical sensitization are intended to favor surface over internal latent-image formation.

It will be useful first to review the state of the emulsion before any chemical sensitization. Such unsensitized grains are characterized by very poor quantum sensitivity of 100 or more photons/grain (159,175). This is explained by postulating inefficient nucleation, as done in an earlier section. Using the quantum sensitivity equation [Eq. (66)], taking the recombination index ω = 1, and assuming that
the minimum developable size is five atoms,

QS = 5 + 2/ηn + 12.   (70)

Now assume that ηn is 0.01, which leads to

QS = 5 + 200 + 12 = 217 photons/grain.   (71)

Thus, the low efficiency agrees with experimental observations. Furthermore, this simple analysis shows that most of the inefficiency occurs in nucleation. This calculation assumes that recombination is the only inefficiency. Internal latent-image formation would simply increase the number of photons required. Unsensitized emulsions display no HIRF but have significant LIRF. They have primarily one latent-image center/grain at all exposure irradiances and show little speed change when the development time is varied over a wide range. All of these effects can be explained by inefficient nucleation. Such inefficiency would prevent nucleation from competing with growth, which avoids the dispersity inefficiency and the associated HIRF. Because there is no dispersity inefficiency, there is only one silver cluster/grain, even at high irradiance. It follows that, if there is only one cluster/grain, increasing the development time will not increase speed because there are no grains that contain only subdevelopable centers that might be revealed by lengthening the development time and effectively decreasing N.

Sulfur Sensitization

Historically, sulfur sensitization is probably the first type of sensitization used for silver halides. Early gelatin materials had many impurities, including those containing sulfur. Simply heating the emulsion would produce a speed increase and a fog increase if overdone (176,177). Modern gelatin is quite pure, and sulfur sensitization is accomplished by intentionally adding reagents that contain labile sulfur atoms. Examples are thiourea derivatives and thiosulfate; we will use the latter as an example in our discussions because it is a commonly used reagent.

The schematic in Fig. 49 illustrates the practical aspects of chemical sensitization. The emulsion is in a gelled state at room temperature, so typically it is heated to 35–40 °C to produce a homogeneous liquid state. Then, the sulfur-sensitizing reagent is added after adjusting the pH and pAg to the desired levels. Most sulfur-sensitizing reagents react slowly at this temperature, so the emulsion is heated to accelerate the reaction. Typical reaction temperatures are 50–70 °C. After the desired reaction time at the elevated temperature, the emulsion is cooled to 40 °C, and stabilizers are added. The emulsion is now ready for coating.

Figure 49. Schematic (temperature vs. time) illustrating the practical steps involved in the chemical sensitization of an emulsion: bring the emulsion to 40 °C; add the sensitizer at the chosen pH and pAg; raise the temperature (about 5 °C per 3 min); hold for the desired time at the desired temperature; reduce the temperature; add coating doctors; coat; chill set to store.

Figure 50. The chemistry involved in silver sulfide formation on a AgBr surface using sodium thiosulfate (overall: Na2S2O3 + 2AgBr + H2O = Ag2S + HSO4− + 2Na+ + 2Br− + H+).

The overall reaction between thiosulfate and AgBr is shown in Fig. 50. Only the outer sulfur atom is involved in the reaction. It is thought that the reaction occurs in three stages (178–183). The first stage involves adsorption of the thiosulfate anion. The second stage involves the actual reaction to form sulfide anions
adsorbed on the grain surface. Formally, we denote the species Ag2 S, although the exact structure of the formed species is unknown. Experiments that limit the reaction to these first two stages result in little, if any, speed increase (178–180,184). A third stage occurs upon normal sulfur sensitization, which involves formation of the active species. The exact nature of the active species is unknown, but it is often assigned to an aggregate of silver sulfide. That is, the site contains two or more S2− which somehow improve the efficiency of nucleation and increase the sensitivity of the emulsion. Some experiments, as well as modeling, suggest that the active site contains two S2− (184–186). It has been suggested that the number of these sites needed to reach maximum speed is about 1,000/µm2 (184), based on radiotracer determination of the deposited S2− and the resulting speed. The energy levels of the sulfur-containing species on the silver halide surface obviously play a critical role in their sensitometric effects. Because bulk silver sulfide is black, we know that its band gap is much smaller than that of silver halides. We might surmise that there is a gradual decrease in the energy gap as one goes from very small centers containing just a few sulfide ions to very large centers whose structure approaches that of bulk silver sulfide. Quantum mechanical calculations on free single- and double-sulfide molecules show this effect (187). Consistent with these ideas is the knowledge that sulfur sensitization induces photographic sensitivity at longer wavelengths where silver halide does not absorb (188–191). Assuming a general pattern of decreasing energy gap as the number of sulfides increases, the electronic properties of the sulfide centers will depend on the way their vacant and filled levels align with the silver halide conduction and valence bands.
Unfortunately, direct physical measurement of these levels is not possible. Indirect measurements combining photoconductivity and dielectric loss measurements have indicated that the lowest vacant level of the sulfide centers is about 0.3 eV below the conduction band (192). Studies using luminescence modulation spectroscopy have also suggested a similar value (185,186). Photographic measurements of the temperature dependence of the previously mentioned long wavelength sensitivity also indicate trapping levels of several tenths of an eV (189–191). Some researchers have attributed a hole trapping property to the sulfide centers (101), and the measured and calculated energy levels are consistent with this (185–187,189–191). However, it is likely that sulfide centers are formed at positive-kink or kink-like sites, and would, as such, provide a very small cross section for hole trapping. The relationship between speed and sensitizer level or reaction time is shown schematically in Fig. 51. There is a clear optimum sensitizer level or reaction time, beyond which speed decreases. Associated with the speed decrease is an increase in fog, which in unfavorable situations begins to increase before the maximum speed is reached. These two effects are correlated. The speed loss at high sensitizer concentration or long reaction times is called ‘‘oversensitization,’’ but we will postpone a discussion of the mechanism of this effect until after a general discussion of the sulfur-sensitization mechanism. The source of fog upon sulfur sensitization is often attributed to overly large aggregates of Ag2 S (193,194). The proposed energy-level scheme given in Fig. 52 shows that small aggregates provide electron-trapping levels below the CB that will improve nucleation, as will be discussed shortly. However, it is hypothesized that large aggregates have even lower lying electron-accepting levels that can accept electrons from the developer, much like that of the latent image. As a result, a grain that contains such a cluster will develop without exposure and lead to an increase in fog.
Figure 52. Energy-band diagram that shows the suggested difference between sensitization centers and fog centers from sulfur sensitization of AgBr. Shaded rectangles depict the range of electron-accepting levels. The energy level in the developer solution phase is that of the highest filled molecular orbital of the developing agent.
Figure 53. Changes in reciprocity failure due to sulfur sensitization. U denotes the unsensitized emulsion, Sopt the optimum level of sensitizing reagent, and Sover an excessive amount of sensitizing reagent.
Figure 51. Schematic showing how speed and fog depend on either thiosulfate concentration at a fixed reaction time or reaction time at a fixed thiosulfate concentration.
Sulfur sensitization has profound effects on the reciprocity characteristics of silver halides (131,195–198). As demonstrated schematically in Fig. 53, sulfur sensitization, relative to the unsensitized emulsion, shifts the onset of LIRF to longer exposure times but also introduces HIRF. The extent of these changes increases with increasing sulfur level. The speed of sulfur-sensitized emulsions exposed for short times is very sensitive to the development conditions (time, temperature, activity of the developer) (195), as discussed earlier in connection with dispersity inefficiency. Centers arising from sulfur sensitization direct the location of the latent-image centers (197). It is possible to
interrupt the emulsion precipitation, carry out chemical sensitization, and then to continue precipitation so as to ‘‘bury’’ the sulfide centers. Emulsion grains in which the sulfide centers are located internally form an internal latent image (detected with a special developer for developing an internal latent image). Emulsion grains in which the sulfide centers are located on the surface form a surface latent image. This latent-image directing property is most consistent with the idea that sulfide centers act as electron traps. An understanding of the many sensitometric effects of sulfur sensitization can be found in the nucleation-andgrowth model (127). All of the effects can be explained by enhanced electron-trapping ability of the grain. From the QS analysis of the unsensitized emulsion, we know that nucleation is very inefficient, so it is reasonable to suggest that sulfur sensitization somehow enhances this stage of latent-image formation. This enhancement is due to the creation of a deeper trapping level associated with the silver sulfide center, as shown on the righthand side of Fig. 54 (106). Here, we see that the sulfide center increases the trap depth and thereby lowers the crossing point on the configuration coordinate diagram, so that the transition to the deep level associated with the silver atom becomes more likely. As a result, nucleation becomes more efficient. Experimental quantum sensitivity measurements yield a value of about 30 photons/grain for optimally sulfur-sensitized emulsions (131). Using Eq. (66) and a nucleation efficiency of 0.2 yields a QS of 27 photons/grain. The effect of sulfur sensitization on reciprocity failure follows similar reasoning. As discussed earlier, the probability of the dispersity inefficiency, which is the cause of HIRF, increases with increasing nucleation efficiency. As nucleation becomes more efficient, this stage competes more favorably with growth at high irradiance and leads to a dispersed silver cluster distribution and HIRF. The effect on LIRF is understood by realizing that the onset of LIRF is proportional to the fraction of time that the electron spends free in the CB (127). Sulfur sensitization causes the electron to spend more time in the deeper traps
associated with the sulfide centers and less time free in the CB, requiring a longer exposure time to observe LIRF.

When the sulfur-sensitizer concentration is increased beyond the optimum level, oversensitization results, as illustrated in Fig. 51. Recalling Eq. (43), we see that the recombination index ω depends on γ, ψ, β, and φ. We have already argued that the hole spends most of its time in the trapped state, so γ ≅ 1, which means that ψ ≪ 1. In the unsensitized state, the electrons spend most of their time in the atom state, but the remaining time is partitioned more toward the free state than the trapped state because the intrinsic defect traps are very shallow. Thus, β < φ and the second term of Eq. (43) is negligible, leading to ω ≅ 1, as we assumed before. As more and more traps are created by the increased sulfur-sensitizer concentration, the electron spends less and less time free in the conduction band, causing φ to decrease and β to increase, so that the second term in Eq. (43) eventually becomes larger than one. At this point, ω begins to increase (131,198). This latter parameter is present in both the second and third terms of the quantum sensitivity equation [Eq. (66)], so we see that the efficiency decreases. This happens because now the second pathway for recombination involving free holes and trapped electrons, whose extent is determined by the second term of Eq. (43), has become important. Thus, the optimum level of sensitizer or optimum sensitization time that leads to maximum speed corresponds to the minimization of the rate of both recombination pathways.

These considerations show why an optimum sensitizer level exists. Increasing sensitizer concentration leads to more traps or deeper traps or some combination of the two. As a result, nucleation efficiency increases and quantum sensitivity improves. However, at some sensitizer concentration, the parameter ω may start to increase, leading to poorer efficiency. The possible speed versus sensitizer level behavior is shown in Fig. 55. C corresponds to the case in which ω remains at unity for any sensitizer level, whereas B corresponds to the optimum sensitization condition in which ηn and ω are both one at the sensitizer level that leads to the maximum sensitivity point. When ω increases before
Figure 54. Configuration coordinate diagram similar to Fig. 29 that illustrates the effect of sulfur sensitization on electron trapping at a silver atom (nucleation): (a) unsensitized (η ≪ 1); (b) sensitized (η ≅ 1). Note the lower crossing point C for sulfur sensitization. (From Ref. 106. Reprinted with permission from World Scientific Publishing Co. Pte. Ltd.)
Figure 55. Hypothetical speed–sensitizer concentration relationships (speed vs. sulfur-sensitizer level). See the text for an explanation of the labels. (From Ref. 198. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
ηn reaches one, A results. The quantum sensitivities at the highest speed values for B and C are those already discussed in Table 7. Note, however, that these quantum sensitivity values are greater than those expected in the fully efficient case considered by Silberstein in which QS = N. Evidently there is a residual recombination inefficiency that cannot be removed by optimizing the sulfur sensitization (131,198). Sulfur-Plus-Gold Sensitization Most commercial silver halide materials are chemically sensitized with gold- and sulfur-containing reagents. Typical gold reagents are KAuCl4 and Na3 Au(S2 O3 )2 . In the first reagent, the gold is in the +3 valence state, whereas in the second, it is in the +1 valence state. Nevertheless, the gold in the sensitizer center is in the +1 oxidation state, indicating that Au3+ is reduced during sensitization, most likely by components of the gelatin, particularly methionine (199,200). Studies of heavily sulfur-sensitized emulsions that have been treated with gold solutions show a change in the composition of the surface species where Au+ replaces Ag+ (201). In addition, the reflection spectra take on the characteristics of AgAuS, as opposed to Ag2 S. It has been suggested that the sensitizer center contains metallic gold, but the preexposure treatment of emulsion coatings with bleaches that are known to oxidize metallic gold produces very little change in photographic sensitivity (202). Chemical oxidation studies of the latent image formed in sulfur-plus-gold sensitized emulsions, however, suggest that the latent image is composed partly of gold atoms (203). In some cases, the presence of gold leads to an antifogging effect (201,204). Apparently, overly large (Ag2 S)n that act as fog centers during development are converted to (AgAuS)n centers that do not. Two sources of fog have been suggested for sulfur-plusgold sensitization (205). The first is due to overly large (AgAuS)n , similar to the situation with Ag2 S. The second is attributed to those centers that contain metallic nuclei composed of silver, gold, or silver–gold. Many of the sensitometric effects of sulfur-plus-gold (S + Au) sensitization are similar to those found with sulfur (S) sensitization, except that S + Au sensitization routinely results in a higher speed than S sensitization (159). Furthermore, S + Au sensitization leads to little, if any, HIRF (202). This effect is ascribed to the incorporation of metallic gold in the latent image and a corresponding change in its catalytic activity (124). Indirect evidence from deposited metal clusters on mica has shown that development is initiated by two gold atoms versus four silver atoms (129). Additionally, the ionization potentials of silver versus gold suggest that the presence of gold should lower the electron-accepting level of the latent image and make smaller clusters more developable. Finally, molecular orbital calculations focused on small clusters have also predicted a lowering of energy levels when gold is present (139,206). All of this suggests that one of the effects of gold is to reduce the minimum developable size, which can be called a ‘‘development effect.’’
The development effect does not fully explain the increase in speed for S + Au sensitization versus S sensitization. The development effect can be simulated by treating exposed sulfur-sensitized emulsion coatings with a gold solution, which allows incorporating metallic gold into the latent image. Large speed increases and the reduction or elimination of HIRF are observed, but the speed is still slower than S + Au sensitized emulsions (207). This suggests that gold exerts an electronic effect in addition to its development effect when it is present in the sensitizer center during exposure. In the nucleation-and-growth model, this electronic effect is attributed to an increase in nucleation efficiency over that possible by S sensitization alone. The magnitude of the development and electronic effects can be illustrated by the sensitivity equation [Eq. (66)]. Assuming that ω = 1, we arrive at the data in Table 8. In the first row, we calculate the QS of a S + Au sensitized emulsion where ηn = 1 and N = 3 atoms, leading to QS = 8 absorbed photons/grain. In the second row, we calculate the QS for a S-sensitized emulsion where ηn = 0.2, but N = 3 atoms, so as to mimic just the electronic effect of gold, leading to QS = 16 absorbed photons/grain. Finally, the last row corresponds to a Ssensitized emulsion where ηn = 0.2 and N = 5 atoms, leading to QS = 27 absorbed photons/grain. The difference between the first and second rows is the electronic effect, which is 0.30 log E, whereas the difference between the second and last Rows indicates the development effect, which is 0.23 log E. So, the two effects have similar magnitudes. These results are for the optimum position on the reciprocity plot (see Fig. 46). At high irradiance, the development effect is much greater than the electronic effect because of HIRF in the sulfur-sensitized emulsions. A minimum developable size of three atoms was used in the preceding calculations because this size will eliminate HIRF according to the results from the nucleation-andgrowth model given in Fig. 47. As discussed before, experimentally, S + Au sensitized emulsions show little, if any, HIRF. However, the data in Fig. 47 also show that a minimum size of two atoms eliminates HIRF, so why not choose this as the minimum size? The reason is that the two-atom cluster is the smallest stable cluster and if this were the minimum developable size, there would be no latent subimage, contrary to experimental data (130). An atomic explanation is not available for the increase in nucleation efficiency when gold is part of the sensitization. Measurements of trap depths show that they are not deeper than when sulfur sensitization is used alone (190). Possibly, the atom state is now a gold atom that might provide unoccupied levels between the trapping
Table 8. Calculated QS for Different Sensitizations and Minimum Developable Sizes

Sensitization     ηn     N    2/ηn            QS, absorbed photons/grain
S + Au            1.0    3      2       3                  8
S with N = 3      0.2    3     10       3                 16
S                 0.2    5     10      12                 27
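The speed differences quoted in the text can be checked directly against the Table 8 entries. The short calculation below only recomputes those ratios on a log E scale; it does not reproduce the full sensitivity equation [Eq. (66)], whose remaining parameters are not repeated here.

    import math

    # Quantum sensitivities (absorbed photons/grain) taken from Table 8
    qs_s_au = 8    # S + Au:        eta_n = 1.0, N = 3
    qs_s_n3 = 16   # S with N = 3:  eta_n = 0.2, N = 3
    qs_s = 27      # S:             eta_n = 0.2, N = 5

    # Speed differences expressed on a log exposure (log E) scale
    electronic_effect = math.log10(qs_s_n3 / qs_s_au)    # ~0.30 log E
    development_effect = math.log10(qs_s / qs_s_n3)      # ~0.23 log E

    print(f"electronic effect:  {electronic_effect:.2f} log E")
    print(f"development effect: {development_effect:.2f} log E")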
level and the gold 6s atomic orbital and lead to a higher probability of electron transition to the deep level during nucleation.

Reduction Sensitization

The species formed from this mode of chemical sensitization are silver clusters (208). Thus, typical reagents are reducing agents such as boranes, stannous chloride, and thiourea dioxide. The electronic properties of these silver clusters have been of keen interest (138,165,209–216). Do they act as latent-image precursors and trap electrons or do they react with holes and decrease in size? The answer to this question seems to depend on the concentration of sensitizing reagent used. At low and intermediate concentrations, the sensitizer centers produced cannot direct latent-image location (217,218), unlike sulfur or sulfur-plus-gold sensitization. This behavior would suggest they have hole-trapping properties. But at higher concentrations, at least some of the centers apparently possess latent-image directing properties (217,218) and behave as latent-image precursors (electron traps) (210,216). Because at least some of these centers can be made developable by treatment with a gold solution, which converts the silver cluster to the more easily developed silver–gold cluster, they can be observed in the electron microscope after mild development (219,220). The number of silver clusters detected by this technique decreases upon increasing exposure, that is, they are photobleached, as would be expected from the suggested hole-trapping properties. Fog occurs when these silver clusters become large enough to trigger the development reaction, suggesting that their properties are the same as those of latent-image centers. Reduction sensitization increases speed, particularly for low-irradiance exposures, so that LIRF is reduced (221). There is no HIRF introduced by this type of sensitization. Oversensitization is not usually possible with reduction sensitization. Rather, fog limits the amount of sensitizer that can be used (210–213). An interesting possibility arises when the extent of hole removal by the reduction sensitization centers is high enough that there is no recombination. Then, of course, we should expect the quantum sensitivities to be consistent with the theoretical limits (Fig. 43). Such behavior has, in fact, been observed in model systems (156,161). An interesting aspect of chemically produced silver clusters is their ability to produce electrons upon hole trapping via the following reactions:

Ag₂ + h⁺ → Ag₂⁺   (72)

Ag₂⁺ → Ag + Agᵢ⁺   (73)

Ag → Agᵢ⁺ + e⁻   (74)
This mechanism has been verified experimentally (222–224). The theoretical quantum yield of conduction-band electrons is two per absorbed photon. Given that sulfur-plus-gold and reduction sensitization have complementary functions, it might be expected that the best of both worlds could be obtained by using both. Sadly, this is not usually possible because significant
fog arises when they are combined (156,225), most likely due to an interaction between the gold and the silver clusters that produces developable metal clusters. One way to prevent this might be to place the reduction sensitization inside the grain to avoid this interaction. However, it appears that, in at least some cases, the properties of internal chemically produced silver clusters are different from those created on the surface and they actually trap electrons and desensitize the surface image (226). There are also problems with stability of the reduction sensitization over time. All this means that reduction sensitization is usually not a viable option to the photographic film manufacturer. DOPANTS Dopants differ from chemical sensitizers in that they are introduced during the emulsion precipitation, so that they can be incorporated in desired regions of the grain. They are used to modify the intrinsic sensitometry such as reciprocity failure or contrast (227,228). Additionally, they are used to reduce the desensitization that is sometimes observed when spectral sensitizing dyes are used. The electronic effects of dopants are controlled by the atomic orbitals of the dopant but modified by the crystal potential, crystal field effects, Jahn–Teller distortions, and relaxation effects. Many of the dopants used today are transition-metal ions that have positive valence states. These ions exist as complexes, most often of octahedral symmetry. Their chemistry is controlled by the d electrons, and because of their directionality, not all of the d orbitals have the same energy. In an octahedral complex of six negative ligands, the dxy , dyz , dzx orbitals (the t2g levels) have lower energy than the free ion because they point between the ligands, whereas the remaining d orbitals dz2 and dx2 −y2 (the eg levels) have higher energy than the free ion because they point toward the negative ligands. When these ions (and their ligands) are incorporated into the silver halide lattice, they often have a charge (+2, +3, or +4) which is greater than the silver ion that they replace. To maintain electrical neutrality, a vacancy must also be incorporated for every unit of charge greater than one. These vacancies can associate with the transition-metal complex, in which case the symmetry of the octahedral field is destroyed (229,230) and causes distortion of the complex and a shift in energy levels. Because these dopants take part in one-electron trapping during exposure (or one-electron release in hole trapping), they undergo valence changes that, depending on the dopant, produce or eliminate paramagnetic species upon exposure. Thus, the photochemistry can often be followed by magnetic resonance techniques. An example of the way these ionic effects can influence the electronic properties of the dopant is illustrated by Pd(Cl)6 , which contains the d8 transition-metal ion Pd2+ (231). The dopant would, of course, take the place of a silver ion, and the six Cl− ligands would occupy the positions normally occupied by the halide ions that surround the silver ion. The energy-level diagram for this dopant, shown in Fig. 56, indicates that it can trap
Figure 56. Hypothetical energy-level scheme for PdCl6 -doped AgCl without vacancy association. Braces indicate orbital energy degeneracy. Filled circles represent the orbital electrons.
electrons in one of the eg levels. There is also no possibility of trapping holes because the t2g levels are in the VB. However, because of the excess charge on the Pd ion, a vacancy may be associated with the Pd(Cl)₆, which would distort the complex and shift the energy levels to the new positions shown in Fig. 57. Now, the dopant can no longer trap an electron, but it may trap a hole in the dxy orbital. Evidence has been found for both electron-trapping and hole-trapping species that coexist in emulsion grains (231). Ir³⁺ is a widely used dopant in photographic systems; at less than ppm concentrations it decreases both LIRF and HIRF (232,233). At higher concentrations it leads to the formation of an internal latent image located adjacent to the dopant (234). In contrast to palladium discussed before, Ir³⁺ exists as a fully vacancy-compensated center [(IrCl₆)³⁻ · 2V]⁰, where V represents a vacancy and the outer superscript indicates the overall charge on the site (227,235). During exposure, the following reactions can occur at this center:

[(IrCl₆)³⁻ · 2V]⁰ + e⁻ → [(IrCl₆)³⁻ · 2V · e⁻]¹⁻   (75)

[(IrCl₆)³⁻ · 2V · e⁻]¹⁻ → [(IrCl₆)⁴⁻ · 2V]¹⁻   (76)

[(IrCl₆)⁴⁻ · 2V]¹⁻ → [(IrCl₆)⁴⁻ · V]⁰ + [Vfree]¹⁻   (77)
Figure 57. Effect of vacancy association on the hypothetical energy-level scheme of Fig. 56. Braces indicate orbital energy degeneracy.
The first reaction depicts shallow electron trapping brought about by the coulombic attraction of the iridium ion, followed on a longer time scale by an electronic relaxation in which the electron becomes localized in an iridium eg level [reaction (76)]. On an even longer time scale, ionic relaxation occurs [reaction (77)] as one of the vacancies diffuses from the complex to become a free vacancy and produces a deeply trapped electron whose trap depth is about 0.45 eV. Atom formation occurs when an interstitial silver ion diffuses to the iridium site and the electron transfers to it. In this way, the interstitial silver ions help to reset the trap for subsequent trapping of another electron. Latent-image formation follows this initial atom formation by an alternating series of electron and interstitial silver ion captures. Fe(CN)₆, which contains Fe²⁺, a d6 ion, is another dopant. This dopant, which can be incorporated in AgCl emulsions with all six ligands intact, is associated with one vacancy (236). The t2g levels are filled and lie above the valence band, whereas the eg levels are empty and lie within the conduction band. This is an example of a dopant that can trap a hole, after which a second vacancy can be associated with the complex. These two steps are shown in reaction (78).

Fe(CN)₆⁴⁻ · V + h⁺ → Fe(CN)₆³⁻ · 2V   (78)
Although the eg levels of this dopant are above the CB minimum and therefore it has no electron trapping levels, it is still possible to localize an electron at the dopant by virtue of its +e charge with respect to the surrounding lattice. It behaves as a coulombic trap whose depth is about 65 meV (236). The CN− ions act as ballasts to prevent diffusion of the Fe2+ through the lattice, as can happen with other dopants. SPECTRAL SENSITIZATION As discussed earlier, silver halides absorb light only in the UV or blue region of the spectrum. To enable image capture and reproduce full color scenes, it is necessary to extend the spectral sensitivity to the green and red regions of the spectrum. This is accomplished with spectral sensitization in which adsorbed dyes are used to absorb in the spectral regions where the silver halide cannot and make that photon energy available for latent-image formation (5,237). The energetics of the spectral sensitization process are illustrated in Fig. 58 for a red-absorbing dye. Spectral sensitization is an example of photoinduced electron transfer. To be an efficient sensitizer (high quantum yield of conduction-band electrons), the energy of the lowest vacant (LV) molecular orbital of the dye must be at or above the conduction-band minimum of the silver halide, as shown in Fig. 58. Otherwise, the processes of radiative and nonradiative return to the ground state dominate. These latter processes occur on a nanosecond or
Figure 58. Illustration of the energetic requirements for spectral sensitization by a red dye. The excited state is the energy level of the lowest vacant (LV) molecular orbital of the dye, and the ground state is the energy level of the highest filled (HF) molecular orbital of the dye.
faster time scale, so electron transfer must be at least this fast to produce a useful quantum yield of electrons.

Dye Structure

Spectral sensitizing dyes are taken from the cyanine dye class (238). An example of a structure is shown in Fig. 59. The chromophore is the conjugated chain (alternating single and double bonds) between the nitrogen atoms, whose length is a major determinant of the spectral region where the dye will absorb. Cyanine dyes dominate other dye classes for spectral sensitization because (1) they have the highest extinction coefficient of almost any class of organic compounds, (2) their ionic charge promotes strong adsorption to the silver halide surface, (3) structural modifications to achieve the desired characteristics are relatively easy, and (4) they are very efficient in electron transfer to the silver halide conduction band. Cyanine dyes are an example of molecules to which the familiar one-dimensional particle-in-a-box model from quantum mechanics can be applied. With some minor adjustments, the shift in absorption wavelength caused by increasing the length of the chromophore can be predicted by this model with surprising accuracy (239). A general rule is that each additional double bond inserted into the chromophore (making the box longer) shifts the absorption maximum about 100 nm toward a longer wavelength (238). Additional changes to the absorption maximum can be made by the choice of the ring structure on each end of the chromophore, as well as by substituents on the outer ring. Using these ''knobs,'' it is relatively easy to design structures that absorb at any desired wavelength within the visible or near-infrared spectrum. The charge on the dye can be varied by modifying the alkyl groups on the nitrogen atoms with groups that have carboxylic or sulphonic acid functionalities; this leads to three possible charge states for the dye — cationic, anionic, and zwitterionic. The first two are the most commonly used. The cationic form of the dye is usually the more strongly adsorbed but is less water soluble (37). The solubility difference becomes important in materials for color photography because dyes that are less soluble in aqueous systems are more easily removed from the grain during storage by the incorporated image dye precursors needed to form a color image. The image dye precursors are organic molecules that are contained in small droplets dispersed throughout the coating. These droplets provide an organic phase where a wandering dye molecule would prefer to be. As a result, the anionic form of the dye is preferred in color imaging systems.
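A minimal sketch of the particle-in-a-box estimate mentioned above is given below. The bond length and the box-length convention are generic assumptions rather than the adjusted parameters of Ref. 239, so the output should be read only as showing that each added double bond (two more π electrons) shifts the predicted absorption maximum toward longer wavelength by roughly 100 nm.

    # Free-electron ("particle-in-a-box") estimate of a cyanine absorption maximum.
    # The 0.139 nm bond length and the (n_pi + 1)-bond box are assumed conventions,
    # not values taken from this article.
    H_PLANCK = 6.626e-34   # Planck constant, J s
    M_E = 9.109e-31        # electron mass, kg
    C_LIGHT = 2.998e8      # speed of light, m/s
    BOND = 0.139e-9        # average bond length along the chromophore, m (assumed)

    def lambda_max(n_pi: int) -> float:
        """HOMO -> LUMO absorption wavelength (nm) for n_pi pi electrons in a 1-D box."""
        box = (n_pi + 1) * BOND
        delta_e = (n_pi + 1) * H_PLANCK**2 / (8 * M_E * box**2)
        return H_PLANCK * C_LIGHT / delta_e * 1e9

    for n_pi in (6, 8, 10):   # each added double bond contributes two pi electrons
        print(f"{n_pi} pi electrons: lambda_max ~ {lambda_max(n_pi):.0f} nm")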
Dye Adsorption

The dye must be adsorbed on the silver halide surface to be effective. This property is often measured by using a technique in which the dye and emulsion are first allowed to equilibrate with each other, followed by a centrifugation step. The amount of dye in the supernatant is measured spectrophotometrically using a prior calibration of the dye in a gelatin/water solution. The amount of adsorbed dye is calculated by difference and then plotted as an adsorption isotherm, as shown in Fig. 60. Adsorption often follows the Langmuir adsorption equation, and only single monolayer adsorption is usually observed (240). Factors that can affect dye adsorption include the nature of the grain halide and morphology, the dye charge and substituents, and the environment such as pH, pAg, gelatin, and stabilizers.
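Because the isotherms usually follow the Langmuir equation, the saturation coverage in Fig. 60 can be extracted from a simple linearized fit. The concentrations and binding constant below are invented solely to illustrate the procedure; they are not data from the article.

    import numpy as np

    # Hypothetical measurements: dye remaining in solution vs. dye adsorbed
    # (arbitrary units); these numbers are made up for illustration.
    c_solution = np.array([0.01, 0.02, 0.05, 0.10, 0.20, 0.50])
    adsorbed = np.array([0.9, 1.6, 2.9, 3.9, 4.7, 5.4])

    # Langmuir isotherm: adsorbed = g_max * K * c / (1 + K * c)
    # Linearized form:   c / adsorbed = c / g_max + 1 / (K * g_max)
    slope, intercept = np.polyfit(c_solution, c_solution / adsorbed, 1)
    g_max = 1.0 / slope            # saturation (monolayer) coverage
    k_ads = slope / intercept      # adsorption equilibrium constant

    print(f"saturation coverage ~ {g_max:.1f}, K ~ {k_ads:.0f}")
    # With the grain surface area known, g_max gives the area occupied per dye molecule.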
Figure 59. Typical structure of a cyanine dye. The chromophore of the dye is in bold. Hydrogen atoms not shown. Et is the –CH2CH3 group. The sulfur atom S indicates the heteroatom position in the dye structure.
Figure 60. An adsorption isotherm (amount of dye adsorbed versus dye remaining in solution) typical of a cyanine dye. The saturation coverage allows determining the molecular area of the dye, provided that the surface area of the grain is known.
1296
SILVER HALIDE DETECTOR TECHNOLOGY
Driving forces for adsorption are van der Waals and coulombic forces. Dyes tend to be more strongly adsorbed on the more polarizable AgBr than on the less polarizable AgCl (241–244). Dyes whose atoms are more polarizable, particularly those in the heteroatom position (see Fig. 59), are also more strongly adsorbed. Dye adsorption strength tends to increase with increasing Br− concentration at the surface, suggesting coulombic forces (245,246). Adsorption isotherm measurements allow determining the molecular area (see Fig. 60), and these values are considerably smaller than that expected for flat-on adsorption (245,246). The conclusion is that the dyes are adsorbed edge-on in a vertical or almost vertical orientation, where the long axis of the dye molecule is parallel to the grain surface. Furthermore, the effects of changing the heteroatom on adsorption suggest that this part of the dye molecule is near the grain surface and the N atoms of the chromophore are away from the surface. These results indicate that there is a dye–dye interaction at work that also drives adsorption (cooperative adsorption). The effect of the crystallographic nature of the grain surface on dye adsorption is mixed (247). In some cases, the dye adsorbs equally well on both (100) and (111) surfaces. In other cases the dye has a preference for the surface on which it adsorbs. Adsorption also affects the absorption maximum of the dye and routinely shifts it about 20 nm to a longer wavelength in the absence of aggregation effects which will be discussed next. This shift is due to the refractive index change of the dye environment upon adsorption on the silver halide surface (248,249). Dye Aggregation Cyanine dyes also tend to form aggregates in which the individual dye molecules form a larger two-dimensional ‘‘supermolecule’’ (240). The structure of the aggregate can take on several different forms, as shown in Fig. 61. The ordered nature of these aggregates leads to dye–dye electronic interactions that shift the absorption maximum. These aggregates are distinguished by their so called ‘‘slip angle,’’ as defined in Fig. 61. Variation of the slip angle changes the nature of the dye–dye interaction and leads to different spectral shifts of the absorption peak (5). Slip angles of 60° are associated with H aggregates, where the absorption maximum is shifted to shorter wavelengths, whereas slip angles of 30° or less are associated with J aggregates, in which the absorption maximum is shifted to longer wavelengths. J aggregates are particularly interesting because their absorption band is significantly narrowed with respect to the unaggregated dye. This leads to a high extinction coefficient that makes them desirable for many photographic film applications. Factors that control the degree of J aggregation include the dye structure and the grain morphology. As might be surmised from Fig. 61, dyes whose structure leads to a high degree of planarity are more likely to aggregate (251). Similarly, dyes that have long chromophores (three or more double bonds) rarely form aggregates because such dyes have a large number of conformations that interfere with the ordered nature of the aggregate.
Figure 61. Arrangement of dye molecules in H and J aggregates. View is from the top of the grain, and the filled circles represent ions in the surface layer. The dye molecules are adsorbed edge-on and depicted as oblong structures. The slip angle is indicated. (From Ref. 250. Reprinted with permission of The Society for Imaging Science and Technology, sole copyright owners of the Journal of Imaging Science and Technology.)
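The direction of these spectral shifts can be rationalized with the standard molecular-exciton (point-dipole) picture, in which the dye-dye coupling varies as (1 − 3 cos²α) with slip angle α and changes sign near 54.7°. That model is not developed in this article, so the sketch below is only a qualitative illustration.

    import math

    def aggregate_type(slip_angle_deg: float) -> str:
        """Sign of the point-dipole coupling term (1 - 3 cos^2 a) for slip angle a.

        Positive -> the allowed transition moves up in energy (blue shift, H aggregate);
        negative -> it moves down in energy (red shift, J aggregate).
        The crossover is the "magic angle" of about 54.7 degrees.
        """
        a = math.radians(slip_angle_deg)
        term = 1.0 - 3.0 * math.cos(a) ** 2
        return "H-like (blue shift)" if term > 0 else "J-like (red shift)"

    for angle in (60, 30, 20):
        print(f"slip angle {angle} deg: {aggregate_type(angle)}")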
Dye Energy Levels

As would be expected from Fig. 58, dye energy levels play a critical role in the efficiency of spectral sensitization. Unless the dye excited state is at or above the CB minimum, the return of the electron to the ground state dominates, and the probability of electron transfer to the CB per photon absorbed is small. Thus, it is important to be able to correlate dye energy levels with spectral sensitizing efficiency. It is difficult to measure these energy levels directly for the dye adsorbed on the grain in a film coating, so what has been done is to measure the electrochemical properties of the dye in solution (252) and then correlate the solution redox potentials with photographic behavior. The electrochemical reduction potential Ered is related to the LV level of the dye, whereas the oxidation potential Eox is related to its HF level. Earlier measurements of redox potentials were hampered by the irreversibility of the measurement, which influences the measured potentials in ways not easily predicted (253). More recently, a phase-selective second-harmonic ac voltammetric technique has been described which makes the measurements on a shorter time scale and avoids most of the problems of secondary reactions of the reduced or oxidized dye (253). All potentials given in this article are based on this technique, are for dyes in acetonitrile, and are measured against a Ag/AgCl reference electrode.

Spectral Sensitizing Efficiency

The efficiency of spectral sensitization can be compared in a relative sense to the efficiency of latent-image formation for intrinsic exposures, leading to what is known as the ''relative quantum efficiency'' (RQE) (237). The RQE φr is the ratio of the number of photons absorbed by the silver halide to produce a specified reference density on the
D–log E curve to the number of photons absorbed by the dye to produce the same density. An equation symbolizing this definition is

φr = (400 · E400 · A400)/(λ · Eλ · Aλ)   (79)
where E400 is the incident irradiance at 400 nm and A400 is the fraction of incident radiation at 400 nm absorbed by the grain. Analogous definitions apply at wavelength λ absorbed by the dye. Of course, absolute efficiencies can also be measured using the quantum sensitivity procedures discussed earlier, in which case the ratio of the QS at 400 nm to that at wavelength λ would give r . Using this more quantitative measurement of photographic efficiency, we can now discuss the correlation of electrochemical reduction potential with photographic behavior. This comparison is shown in Fig. 62 (254). The RQE is very closely coupled to the solution Ered . The solid curve in this figure is a fit from Marcus electron-transfer theory (255). Note that the efficiency varies by a factor of 100 for a very narrow range of Ered , indicating how critical the positioning of the dye LV level is for efficient electron transfer. We see from these data that ‘‘good’’ spectral sensitizers should have an Ered more negative than about −1.3 V. It is also of interest from the mechanistic point of view to consider the dye’s ability to transfer a hole to a silver halide grain. This can be done by studying how dyes bleach internal fog (256). With this technique, it is necessary to include a strong electron-trapping agent to prevent the electron from contributing to latent-image formation, so
Figure 62. The effect of Ered (V vs. SCE) on the RQE φr for cyanine dyes. Numbers refer to the dye structure given in Ref. 254. (Reprinted with permission of the American Chemical Society.)
that only bleaching can occur. Then, the results from these experiments can be compared with the same dye’s ability to form a surface negative image by the usual electron transfer process. These hole bleaching experiments showed that there is a threshold of about Eox = +1.0 V below which the hole cannot be transferred into the AgX valence band. The surface negative experiments showed that an electron cannot be transferred to the AgX conduction band when Ered is more positive than about −1.0 V. Also of note was that some dyes could also transfer both an electron and a hole to a grain. Of course, because the excitation energy of the dye is smaller than the band gap of silver halide, it is energetically impossible to transfer both carriers simultaneously. These dyes must first transfer an electron or hole from the excited state, depending on their HF and LV levels, and then transfer the complementary carrier later. Correlations such as these have allowed generating energy-level diagrams for spectral sensitization that are qualitatively useful for understanding the mechanisms of spectral sensitizing dyes. One drawback of such electrochemical/photographic studies is that the dyes often aggregate. This shifts their energy relative to the adsorbed monomer, the state that would be expected to correlate best with the solution redox potentials. A possible way around this has been demonstrated by measuring redox potentials of an adsorbed dye in an emulsion coating by a combination of reflectance spectroscopy and treating a dyed-emulsion coating with a redox couple solution (257). The work showed that for the particular infrared dye studied, the major energetic effect in going from the monomer to the J aggregate of the dye was a lowering of the dye LV level. The generality of such an in situ measurement has yet to be demonstrated. A general model of spectral sensitization has been developed that is based on the concepts expressed in Marcus electron-transfer theory (5,258). The model recognizes that there are two classes of spectral sensitizing dyes based on their mode of charge transfer. Some dyes function by first transferring an electron from the excited dye and then transferring a hole later. Others transfer a hole from the excited dye and then transfer an electron later. As would be expected, the class in which a particular dye falls depends on its energy levels relative to the conduction and valence bands of the silver halide. Using this model, very good correlation is found between the RQE of the dye and its solution redox potentials, as illustrated in Fig. 62 (254). This model showed that a dye whose LV level is coincident with the minimum of the silver bromide conduction band had an Ered = −0.92 V. This value differs from the −1.3 V in Fig. 62 that corresponds to a spectral sensitizing dye with RQE approximately unity because one must also consider the reorganization energy in Marcus theory. This energy was found to be 0.42 eV (5,258), so that a dye with Ered = −1.34(= −0.92 − 0.42) V would have approximately unit efficiency of injecting an electron into the conduction band. Spectral sensitization by hole-injecting dyes can be efficient if the dye ground-state energy level is at or
below the valence-band energy level, leading to a dye anion radical. The odd electron in this species lies higher in energy than it would if it were in the LV level of the excited dye (237). This is qualitatively explained by postulating electron–electron repulsion effects. Calculated energy differences between these two states are on the order of 0.2 eV (5), which makes transferring the odd electron more energetically favorable. In addition, there are no competing radiative or nonradiative de-excitation processes as in the case of an excited dye. A more circuitous route for transferring the odd electron to the grain involves transfer to a surface state, which forms a silver atom. Then, decay of the silver atom allows the electron to enter the conduction band. Dye Desensitization Dyes can decrease or even eliminate the intrinsic or blue sensitivity of the silver halide, depending on their energy levels and concentration (5,237). Although such effects are most easily quantified by comparing the intrinsic response with and without adsorbed dye, whatever desensitizing properties are at work also affect the response of the emulsion when exposed in the dye absorption region. As a result, there is a limit to the amount of dye that can be adsorbed on the grain, and if this limit is exceeded, the speed actually decreases. Thus, dye desensitization processes limit the maximum spectral speed that can be obtained from spectral sensitization. Dye desensitization has it origins in several possible processes. Dyes whose LV level lies below the conduction band (‘‘electron-trapping’’ dyes) can trap an electron from the conduction band. The trapping itself is often reversible, but secondary processes lead to irreversible loss of the electron. One secondary process involves recombination with a hole in the valence band, so that the dye acts as an additional recombination center. Another secondary process is loss of the trapped electron from the dye to O2 in the coating which subsequently reacts with a hole and provides yet another recombination pathway (259). Evidence for such a process comes from exposures of films in vacuum that show markedly reduced dye desensitization (260). As mentioned earlier, dyes that would normally be chosen for practical applications are those whose Ered is more negative than −1.3 V, or as close to this criterion as practically possible. As a result, electron-trapping desensitization is very unlikely for such dyes, but there is an alternative desensitization pathway for them. Because the dye LV level is positioned near or above the conduction band edge to maximize the yield of transferred electrons, by definition such dyes will introduce a new hole-trapping level associated with their HF level; these dyes can trap holes from the valence band, as shown in Fig. 63. Such dyes can desensitize by two pathways. The first is complementary to that for electron-trapping dyes. The oxidized dye can act as a recombination center for electrons in the conduction band (237). The second pathway is based on the ability of the dye-trapped hole to migrate across the surface of a grain by hopping from one dye molecule to another (261). When the dye-trapped hole encounters a latent image or latent subimage, there is an electron
Figure 63. Ideal energy-level diagram for spectral sensitization by green- and red-absorbing cyanine dyes for maximum electron transfer efficiency (AgBr band gap approximately 2.6 eV). The dye HF levels provide additional hole-trapping sites for intrinsically generated holes that lead to desensitization.
transfer to the oxidized dye. This could be considered an alternative pathway for recombination, but it is more traditionally called regression. Mechanistic studies of dye desensitization by good spectral sensitizing dyes have shown such dyes produce much lower desensitization when the location of the latent image is separated from that of the adsorbed dye (261–263). One way to accomplish this is to use emulsions that produce an exclusively internal image, for example, by sulfur-plus-gold sensitizing an emulsion and then precipitating more silver halide to produce a core–shell structure. Although not practically useful because the usual development is limited to a surface image, special developers can be formulated to reveal the internal image and allow such systems to provide mechanistic insight into dye desensitization processes. In fact, quantitative studies have shown that dye desensitization is completely eliminated in such systems even up to monolayer coverage of dyes that are very good hole traps (Eox < 0.60 V) (261). Such results are strong evidence for the regression mechanism of dye desensitization. Supersensitization It should be clear by now that one of the important factors in achieving efficient spectral sensitization is that the dye LV level should be at or above the CB minimum, so that the quantum yield of CB electrons approaches one. However, such requirements become problematical, particularly for red and infrared dyes, because of dye desensitization mechanisms. Even though the dye leads to a high yield of CB electrons, most of them are eventually lost through desensitization processes. Another problem with this energy level criterion is that it forces the HF level to lie very high within the silver halide band gap. Such
dyes can be unstable because now they can be oxidized by other species in the coating, for example, O2 (264). From the standpoint of efficient electron transfer, such considerations suggest that it is necessary to use less than ideal dyes, for spectral sensitization. The electron transfer efficiencies of such dyes can be improved by a process known as supersensitization (265–267). This process uses an additional molecule, which may be another dye, whose HF level lies higher in energy than the spectral sensitizing dye HF level. The supersensitizer can transfer an electron to the half-filled HF level of the excited dye, which creates a reduced dye molecule. The advantage of this process is that now the electron residing in the former dye LV level is situated some 0.2 eV higher than an electron in the same level of the excited dye, as discussed before. The electron is more energetically disposed to transfer to the CB than it would be if just the usual excitation process was at work.
IMAGE PROCESSING

In image processing, the latent image is used as a catalyst to initiate development of the grain, that is, the silver ions in the crystal are reduced to silver atoms (4,5,268–270). In ''black-and-white'' development, the developed image consists of silver, whose extent determines the ''blackness'' of the image. Black-and-white development is an all-or-nothing process. Either the grain has received sufficient exposure to generate a latent image and all the silver ions of that grain are converted to silver atoms, or the grain does not develop at all. Here is where the silver halide detector derives its tremendous amplification capability. Considering that the minimum developable size of the latent image is three atoms and that a 1 µm cubic grain contains 2 × 10¹⁰ ion pairs, an amplification of the order of 10¹⁰ is achieved! In color photography, a dye image, rather than a silver one, is produced (271–273). To accomplish this, a by-product of the development reaction, the oxidized developer, is used by allowing it to react with an image dye precursor to form the image dye. The dye forms a ''cloud'' around the grain. In this type of development, a bleaching step must be part of the postdevelopment process, so that the developed silver can be removed from the final image. Otherwise, desaturation of the colors results. For the most part, grains act independently of each other during exposure of a film layer. That is to say, absorption and subsequent latent-image formation in one grain does not affect the electronic and ionic processes in surrounding grains. During development, however, there can be considerable intergrain interactions, particularly in color development (271). These interactions can be fine-tuned to enhance the final image in terms of sharpness and/or noise (see Detector Design Considerations section). In color photography, development is also restricted to only a small fraction of the grain so as to control the size of the dye clouds and reduce graininess in the final image. A comparison of grains developed in a black-and-white developer and a color developer is given in Fig. 64. Microscopic examinations of developed grains have shown that they are usually masses of filamentary silver (269,274), often 15–25 nm in diameter and several
Figure 64. Micrographs of developed grains that show the difference between black-and-white development (a) and color development (b).
times that in length. The filament size has little dependence on the initial grain size. Developed silver is initially spherical, but the developing site elongates into a filament at some point during development. There is no general agreement on the reason for this switchover, but some researchers have suggested that the reaction becomes limited by either diffusion of silver atoms (275) or silver ions (276,277), so the spherical geometry can no longer be maintained.

Electrochemistry of Development

Silver ions are reduced to silver atoms by reducing agents that are part of the developer formulation. As such, the process is best described by an electrochemical approach. The development reaction can be described by the following reaction:

Dred + nAg⁺ → nAg + Dox + mH⁺   (80)

where Dred is the reduced form of the developer, Dox is the oxidized form of the developer, n is the number of electrons transferred (1 or 2), and m is the number of protons produced (1 or 2). This reaction is composed of two half-cell reactions — one involves the silver ion, and the other involves the developing agent. These can be written as

Ag⁺ + e⁻ ⇌ Ag,   EAg   (81)

Dox + mH⁺ + ne⁻ ⇌ Dred,   Edev   (82)

where EAg and Edev are the electrochemical potentials of the half-cell reactions. The overall potential Ecell for the development reaction is given by

Ecell = EAg − Edev   (83)

A thermodynamic relationship between Ecell and the free energy change ΔG is given by ΔG = −nFEcell, where F is Faraday's constant. Thus, when Ecell is positive, the free energy change is negative and reaction (80) occurs as written, that is, reduction occurs. But when Ecell is negative, the free energy change is positive, and reaction (80) occurs in reverse, that is, oxidation occurs. When Ecell is not 0, the reaction proceeds until Ecell = 0. Of course, if the by-products Dox and H⁺ can be removed, then Ecell never reaches 0 until one of the reactants is consumed. The silver potential EAg is, of course, given by the Nernst equation, which for a one-electron reaction at room temperature is expressed as

EAg = E°Ag − 0.059 log([Ag]/[Ag⁺])   (84)

and because [Ag] is assigned a value of unity and the solubility product Ksp controls [Ag⁺], we can recast the Nernst equation into

EAg = E°Ag + 0.059 log Ksp − 0.059 log[Br⁻]   (85)

This equation indicates that the silver potential for developing AgBr is controlled by the Br⁻ concentration. Thus, most developer formulations contain Br⁻ to adjust the silver potential for optimum development. The developing agents are usually organic compounds (268) that generally lose both an e⁻ and H⁺ during development and can often lose a total of two of each, depending on the structure. It is useful to consider just the ionic equilibria first, described by the following reversible reactions:

H₂R ⇌ HR⁻ + H⁺   (86)

HR⁻ ⇌ R²⁻ + H⁺   (87)

All three forms of the developing agent (H₂R, HR⁻, and R²⁻) can be reductants, although with different electron transfer abilities. For the two deprotonation reactions, the equilibrium constants are given by

K₁ = [H⁺][HR⁻]/[H₂R];   K₂ = [H⁺][R²⁻]/[HR⁻]   (88)

Now, if we also consider the electron transfer reaction, the Nernst equation would look like

Edev = E°dev + 0.0295 log([H⁺]²[Dox]/[Dred])   (89)

where [Dox] is the Dox concentration and [Dred] is the Dred concentration, which for this example would be given by

[Dred] = [H₂R] + [HR⁻] + [R²⁻]   (90)

Now substituting expressions from the ionic equilibria [reactions (86) and (87)], we get for Edev

Edev = E°dev + 0.0295 log([Dox]/[Dred]) + 0.0295 log([H⁺]² + K₁[H⁺] + K₁K₂)   (91)

This equation indicates that Edev will be affected by [Dred], [Dox], and the pH.
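A minimal numeric sketch of Eqs. (85) and (91) follows. The standard potentials, the AgBr solubility product, and the hydroquinone pK values are typical handbook figures inserted here as assumptions (they are not given at this point in the article), so the absolute numbers matter less than the trends: more bromide lowers EAg, while higher pH or less Dox lowers Edev and therefore raises the driving force Ecell.

    import math

    # Assumed constants (typical handbook values, not specified in this article)
    E0_AG = 0.799         # standard Ag+/Ag potential, V
    KSP_AGBR = 5.0e-13    # solubility product of AgBr near room temperature
    E0_DEV = 0.70         # standard potential of the quinone/hydroquinone couple, V
    PK1, PK2 = 9.9, 11.6  # hydroquinone deprotonation constants (see Fig. 66)
    K1, K2 = 10.0**-PK1, 10.0**-PK2

    def e_ag(br: float) -> float:
        """Eq. (85): silver potential fixed by the bromide concentration."""
        return E0_AG + 0.059 * math.log10(KSP_AGBR) - 0.059 * math.log10(br)

    def e_dev(ph: float, dox: float, dred: float) -> float:
        """Eq. (91): developer potential as a function of pH and [Dox]/[Dred]."""
        h = 10.0**-ph
        return (E0_DEV + 0.0295 * math.log10(dox / dred)
                + 0.0295 * math.log10(h**2 + K1 * h + K1 * K2))

    br, ph, dox, dred = 1e-3, 10.0, 1e-4, 0.05   # assumed bath composition
    e_cell = e_ag(br) - e_dev(ph, dox, dred)     # Eq. (83)
    print(f"EAg = {e_ag(br):.3f} V, Edev = {e_dev(ph, dox, dred):.3f} V, "
          f"Ecell = {e_cell:.3f} V (positive, so development proceeds)")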
Developing Agents

Most developing agents used today are based on appropriately substituted benzene derivatives and can be classified into the following three categories: (1) di- and polyhydroxy compounds (Fig. 65a), (2) amino hydroxy compounds (Fig. 65b), and (3) di- and polyamino compounds (Fig. 65c). The first two categories are used in developers for black-and-white images, whereas the third category is used in color developers. Some developers are based on ring structures that contain noncarbon nuclei, such as in Fig. 65d. A primary example of the polyhydroxy category is hydroquinone, a component in the majority of black-and-white developers. It exists in three forms, depending on the pH, as shown in Fig. 66. The pK values are useful because they indicate the pH at which the particular form will be half protonated. Many black-and-white developers work at around pH 10, so the two forms of hydroquinone on the left and in the center of Fig. 66 will predominate. However, the ability to transfer electrons increases as the degree of deprotonation increases. The electron transfer reactions, illustrated in Fig. 67 for the fully deprotonated form, show that two electrons can be transferred to form the quinone molecule, Dox. Similar reactions are at work for the other developing agent categories.

Developer Solutions

The driving force for development is the potential difference between the silver and developer half-cells, Ecell. If Ecell changes during development, then the rate of development will change and perhaps become so small that development ceases. Ecell changes because [H⁺] and [Dox] increase during development, whereas [Dred] decreases, causing Edev to increase. During development,
Figure 67. Electron-transfer reactions of fully deprotonated hydroquinone. The semiquinone is an intermediate stage of oxidation. The quinone form is depicted at the lower right.
Figure 65. Structures of developing agents: (a) the polyhydroxy compounds pyrogallol, catechol, and hydroquinone; (b) the amino hydroxy compound p-aminophenol; (c) the diamino compound p-phenylenediamine; (d) developing agents whose rings contain noncarbon nuclei, ascorbic acid and phenidone.
Figure 66. The deprotonation reactions of hydroquinone. The two pKa values are pK1 = 9.9 and pK2 = 11.6.
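A short calculation with the pK values in Fig. 66 makes the statement about developer pH concrete: at pH 10, the neutral and singly deprotonated forms dominate, and the fully deprotonated, most strongly reducing form is only a minor species. No assumptions beyond the two pK values are involved.

    # Fractions of H2Q, HQ-, and Q2- for hydroquinone at a given pH,
    # using the pK values shown in Fig. 66.
    PK1, PK2 = 9.9, 11.6
    K1, K2 = 10.0**-PK1, 10.0**-PK2

    def fractions(ph: float):
        h = 10.0**-ph
        denom = h**2 + K1 * h + K1 * K2
        return h**2 / denom, K1 * h / denom, K1 * K2 / denom

    h2q, hq, q = fractions(10.0)
    print(f"pH 10: H2Q {h2q:.2f}, HQ- {hq:.2f}, Q2- {q:.3f}")
    # -> roughly 0.44, 0.55, and 0.01, respectively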
as silver ions are reduced to silver atoms, Br− is also released into solution to maintain electrical neutrality within the developing grain. As a result, EAg decreases. These features of development are shown schematically in Fig. 68. The driving force obviously decreases over time. The developer contains at least two other components besides the developing agent to minimize some of these effects. The developer contains a buffer that will react with the protons, released from the developing agent as a result of electron transfer, to prevent the local pH from decreasing. An example is Na2 CO3 , which buffers best at pH 10–11. This reagent reacts with water to give HCO3 − , which
Figure 68. Schematic showing the effect of Br− and H+ release and Dox formation on the silver and developer half-cell potentials (dashed curves).
undergoes a further reaction to give OH− and H2 CO3 . As OH− is consumed by the H+ released from the oxidized developer, more HCO3 − dissociates to maintain [OH− ] constant. Other buffers are metaborate/tetraborate for pH 8–10 and phosphate/monohydrogen phosphate for pH 11–12. Most developers contain sulfite that reacts with oxidized developing agents to produce inert forms, thereby preventing the buildup of Dox and an increase in Edev . Because dissolved O2 can also react with the developing agent, which can catalyze further formation of Dox , sulfite also acts as a preservative to allow developer solutions to remain stable for prolonged storage periods. All developers also contain halide ion to adjust the silver potential, usually Br− but sometimes Cl− if AgCl emulsions are processed. The halide ion retards both development of image and fog, but more so the latter. Thus, it acts as an antifoggant. The antifoggant effect is primarily kinetic and allows substantially complete development of the image before the onset of fog, provided
that the proper development time is chosen. However, there is nothing in the developer to counteract the release of bromide during development. As a result, the silver potential decreases over time, but not to the extent that stops development. However, subsequent film developed in the same developer will probably not develop at the same speed because the halide concentration is higher than that for the first set of films. In commercial developing and printing, often called ‘‘photofinishing,’’ film or paper is continually being run through processing machines. Developing agent, sulfite, and buffer are continually consumed, and halide is continually generated. Under such conditions, it is impossible to maintain a constant Ecell , and the development rate slows down. To avoid this problem, a procedure called ‘‘replenishment’’ is employed in which a ‘‘replenisher’’ solution that contains fresh developing agent, sulfite, and buffer is continually added to the developer solution at a rate sufficient to compensate for the loss of developer components during development. To maintain a constant volume of developer, an equivalent amount of used developer flows out. The replenisher contains no halide, so the halide in the working developer solution is diluted. All of this allows Ecell to remain constant and provides constant developer activity during many processing cycles. Types of Development Development can be classified by the way the silver ions that take part in the development reaction are provided to the reacting site. In ‘‘chemical’’ development, the silver ions in the crystal lattice diffuse through the lattice to the developing site and take part in the reduction process, as shown in Fig. 69a. In ‘‘physical’’ development, the silver ions are made available by diffusion through the solution phase, and within this class are further distinguished as (1) solution physical development when the silver ions are provided by the crystal but diffuse through solution to arrive at the developing site; (2) prefixation physical development when the silver ions are provided in the developer formulation and the silver ions in the crystal are not used (Fig. 69b); and (3) postfixation physical development when the silver halide grains are first removed from the film layer, but not the latent image, and then the remaining film is subjected to a developer that contains the requisite silver ions. Solution physical development occurs because components of the developer solution such as sulfite can solubilize lattice silver ions and provide transport through the solution to developing sites. Because Br− is released during development and because it can increase the solubility of AgBr (see Fig. 10), there is always some solution physical development, even in the absence of any developer-induced effects. Initiation of Development After inserting of the film into the developer solution, there is a time during which nothing seems to be happening. The extent of this period will, of course, be a function of the detection method. This apparent period of inactivity is
Figure 69. Illustration of the Ag+ transport stage in development. In (a) the silver is transported either through the grain (chemical development) or by complexation of the silver ion and subsequent diffusion through the solution phase (solution physical development). Bromide must leave the grain to maintain electrical balance and leads to dissolution of the grain. (b) represents another form of physical development in which the silver ions are supplied via the developer. In this case, the silver ions of the grain are not used, and the grain does not dissolve.
usually due to the small size of the latent image, although partial burying of the latent image in the grain can be an additional factor. This latter effect might happen if some recrystallization occurs during the process of chemical sensitization and leaves the sulfur-plus-gold sensitizer site slightly buried beneath the surface. In the early stages of development, the silver nuclei are small, and the equation for EAg derived earlier, which was based on macroscopic quantities of silver, does not apply. The Gibbs–Thomson concept can be used to arrive at an EAg that would be useful in understanding initiation by using the following equation (269):

EAg = E∞ − 2σsVm/(rF)   (92)
where E∞ is the potential of bulk silver, r is the radius of the silver nucleus, σs is the free surface energy, Vm is the molar volume, and F is Faraday's constant. Here we see that as r becomes larger, the second term tends toward zero and the calculated EAg is just that from Eq. (85). But for small nuclei, the second term becomes significant, and the calculated EAg, and therefore the driving force for development, is smaller than expected. Thus, the initial rate of development will be smaller than that observed at later times.
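The size dependence in Eq. (92) can be made concrete with rough numbers. The surface free energy and molar volume used below are generic estimates for silver, not values from the article, so the printout should be read only as showing how strongly the correction term depresses EAg for nanometer-sized nuclei.

    # Gibbs-Thomson correction to the silver potential, Eq. (92).
    # sigma_s is an assumed round number; only the size-dependent shift is printed.
    F_CONST = 96485.0   # Faraday constant, C/mol
    SIGMA_S = 1.0       # free surface energy of silver, J/m^2 (assumed)
    V_M = 1.03e-5       # molar volume of silver, m^3/mol

    def e_ag_shift(radius_m: float) -> float:
        """Amount (in volts) by which EAg is lowered for a nucleus of the given radius."""
        return 2.0 * SIGMA_S * V_M / (radius_m * F_CONST)

    for r_nm in (0.5, 1.0, 2.0, 10.0):
        print(f"r = {r_nm:4.1f} nm  ->  EAg lowered by {e_ag_shift(r_nm * 1e-9):.3f} V")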
Rate of Development

The rate of development can be measured by plotting density versus development time for a fixed exposure. Alternatively, the mass of developed silver can be determined and then plotted versus development time. The degree of swelling of the film layer, which is mainly determined by the extent of hardening of the layer, is one factor that affects the rate of development. Other factors are the rate of diffusion of the developer components into the layer; the adsorption of the reduced form of the developing agent on the developing silver; the electron-transfer rate from the developing agent to the developing silver; and the desorption, diffusion, or reaction of the oxidized form of the developing agent. Additionally, hydration and diffusion of the halide ion can be important, as is the dissolution rate of the silver halide grain and the diffusion rate of soluble silver-ion complexes. Revisiting Eq. (85), we see that the EAg is controlled by the Ksp for a fixed molarity of the halide ion. So, as one moves from AgCl to AgBr and then to AgI, the EAg decreases, which, in turn, decreases the driving force for development, provided that no changes are made to the developer half-cell reaction. Thus, it is well known that AgCl emulsions develop more rapidly than AgBr emulsions. AgI emulsions develop too slowly to be photographically useful.

Superadditivity

Two developing agents are actually present in all black-and-white developers. This is done because it has been found that several developing agent combinations produce synergistic effects (278). The development rate is greater in a certain concentration range than a simple additive effect would predict. The mechanism of this superadditivity effect appears to be a regeneration of one of the developing agents by the other. Thus, in a superadditive combination A/B, as developing agent A is converted into Aox via development, it is converted back into A by an electrochemical reaction with B in the solution phase.

Postdevelopment Processes

After development, there are developed and undeveloped grains in the coating. Without additional processing, the image will be unstable because the undeveloped grains are still photosensitive and will gradually blacken through the photolytic formation of silver. These undeveloped grains must be removed from the film layer to make the image stable; this process is called ''fixing'' (279). Silver halide is removed from the film by forming soluble silver complexes. The complexation must be strong enough to prevent reprecipitation during the final washing phase. The steps involved in fixing are (1) diffusion of the fixing agent into the film, (2) dissolution of silver halide to form soluble silver-ion complexes, and (3) diffusion of soluble silver-ion complexes out of the film into the bulk solution. The most common fixing agent is the thiosulfate anion. The 1:1 complex is insoluble, but the 1:2 and 1:3 complexes are water soluble. The sequence of reactions is as follows:

Ag⁺(surface) + S₂O₃²⁻ ⇌ AgS₂O₃⁻(surface)   (93)

AgS₂O₃⁻(surface) + S₂O₃²⁻ ⇌ Ag(S₂O₃)₂³⁻(surface)   (94)

Ag(S₂O₃)₂³⁻(surface) ⇌ Ag(S₂O₃)₂³⁻(solution)   (95)
Surprisingly, the specific cation used with the thiosulfate anion affects the fixing rate because the diffusion rate of the cations is in the order NH₄⁺ > K⁺ > Na⁺ > Li⁺. To avoid charge buildup, the cation must diffuse into the film at the same rate as the anion. It turns out that NH₄⁺ can accelerate the fixing process when added to a sodium thiosulfate solution, although there is an optimum ratio, beyond which the acceleration effect can be lost. The final stage of image processing is the wash step. Washing removes salts and other species accumulated during other stages in image processing. A major component to be removed is thiosulfate. If it remains in the dried film or print, it will attack the silver image over time. The principle of washing is to provide a driving force for thiosulfate to diffuse out of the film or paper by providing a solution of very low thiosulfate concentration. This principle can be met by continually providing changes of water by using a flowing water setup. The degree of washing can be predicted by using the Bunsen–Ostwald dilution law (279),

Xn = [v/(V + v)]ⁿ Xo   (96)

where Xo is the initial concentration of thiosulfate, Xn is the thiosulfate concentration after n washings, v is the volume of washing fluid in and on the materials between stages, and V is the volume used in each wash bath. Commercial acceptability requires Xo/Xn = 100, but archival acceptability requires Xo/Xn = 1,000.
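As a small worked example of Eq. (96), the script below estimates how many complete changes of wash water are needed to meet the commercial and archival criteria; the carry-over and bath volumes are invented for illustration.

    import math

    # Bunsen-Ostwald dilution law, Eq. (96): Xn = Xo * (v / (V + v))**n
    v_carry = 0.05   # liters carried in and on the material between baths (assumed)
    v_bath = 1.0     # liters of fresh water in each wash bath (assumed)
    per_wash = v_carry / (v_bath + v_carry)

    def washes_needed(reduction: float) -> int:
        """Smallest number of washes n for which Xo/Xn >= reduction."""
        return math.ceil(math.log(reduction) / -math.log(per_wash))

    print("commercial (100x): ", washes_needed(100), "washes")
    print("archival (1,000x):", washes_needed(1000), "washes")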
DETECTOR DESIGN CONSIDERATIONS

What has been discussed so far focuses entirely on how the silver halide grain is optimized for the desired (usually maximum) sensitivity. But the performance of any detector is based on its achievable signal-to-noise ratio (SNR). Noise in the silver halide detector is observed as graininess in an image (4,280,281). The image of an object whose radiance is uniform may or may not display the expected uniformity, depending on the characteristics of the detector. When the image does not reproduce the object's uniform radiance faithfully, it has a granular appearance. There are fluctuations in the image density that should not be present. The appearance of graininess has to do with the randomness by which the silver halide grains are placed within the detector during the coating operation. Following development, the ''silver grains'' are also randomly placed and lead to areas of image density fluctuations. The human visual system is very sensitive to this randomness. A similar degree of density fluctuation but in an ordered array would be less noticeable. Although the human visual system cannot resolve the individual silver grains, their random placement does translate into the appearance of graininess in the final image. A further feature of this randomness in silver halide detectors is that it increases with grain size. Consider an
emulsion coating that contains small grains and another that contains large grains. If both coatings are made so that they have comparable maximum density, then the large grain coating will have higher graininess because the larger grains can achieve the required maximum density with fewer grains. As a result, the intergrain distance will be larger with correspondingly larger fluctuations in density. The correlation of graininess with grain size leads to a fundamental problem in designing photographic systems whose SNR is optimum. As discussed before, higher speed is achieved most often by increasing the grain size. But, now we see that the noise also increases with the sensitivity increase, so that the improvement in SNR will be less than expected. The exact relationship between SNR and grain size is related to a host of emulsion, coating, and design factors, so no simple rule of thumb can be given. Nevertheless, it is a fundamental issue in optimizing photographic systems for image capture. Yet another feature of the silver halide detector is how well it reproduces the sharpness of the object being imaged (4,280,281). If we imagine that the object reflects a very narrow beam of light, we are concerned whether this beam is blurred in the final image. Imaging scientists use the concept of a ‘‘point-spread function’’ to characterize the amount of blurring seen in the image. Grain size is an important factor in determining the point-spread function in a silver halide detector. Because the sizes of the grains are on the order of the wavelength of light used in image capture, there is considerable scattering of light. The point of light is spread out laterally so that there is blurring. Although a bit of oversimplification, the tendency is for the spread to be greater for larger grains. Taking these three system design factors — sensitivity, graininess, and sharpness — we can relate them by reference to the ‘‘photographic space’’ shown in Fig. 70 (282). The volume under the triangle represents the space available to the system designer. If higher sensitivity is needed, then the grain size can be increased. This will move the apex of the triangle further up
Figure 70. Schematic that illustrates detector design space. The area under the triangle is the ‘‘photographic space.’’
the vertical axis, but will also pull the points inward where the triangle intercepts the horizontal axes — the volume under the triangle remains constant, and one design attribute is optimized at the expense of others. Equal but opposite effects occur when grain size is decreased to reduce graininess and increase sharpness. One way to overcome these design constraints is to improve the efficiency of latent-image formation. Using this approach, the sensitivity can be increased without increasing the grain size. Thus, the volume under the triangle increases, and the system designer has more degrees of freedom. If higher sensitivity is not needed for a particular application, then the efficiency-increasing technology can be applied to smaller grains to increase their sensitivity to the design requirements, and the smaller grain size ensures improved sharpness and less graininess. For this reason, photographic film manufacturers are constantly looking for ways to improve the efficiency of latent-image formation.

IMAGING SYSTEMS

This section briefly describes imaging systems based on silver halide detector technology; much more information can be found in other articles. The detector in systems designed to produce black-and-white images would be prepared much as described earlier (4). Because the detector must have spectral sensitivity that spans the major part of the visible spectrum, both green and red dyes must be adsorbed on the grain surface. The emulsion would have a polydisperse grain-size distribution, so that the characteristic curve has a low-contrast (about 0.6), long-latitude response to minimize the risk of over- or underexposure. The final print image is made by exposing through the negative onto a negative-working silver halide emulsion on a reflective support. The characteristic curve of the emulsion intended for print viewing must have high contrast (about 1.8) to produce a positive image that has the correct variation of image gray value (tone scale) with scene luminance (4,14). Systems designed to produce color prints also use a low-contrast emulsion to capture the image and a high-contrast emulsion to print the image. However, such systems are designed to produce dye images rather than silver images (271–273). As mentioned in the Image Processing section, this is done by allowing the oxidized developer to react with dye precursors to produce the dye image and then bleaching the developed silver back to silver halide for removal during fixing. Movie films are designed similarly to still camera films, except that the final-stage ''print'' is made using a transparent support, so that the image can be projected on a screen in a darkened room. For color reproduction, the image must be separated into its blue, green, and red components using separate layers in the film that are sensitive to these spectral regions. This color separation is accomplished by using spectral sensitization to produce separate layers that are sensitive to blue through a combination of the silver halide
grain and an adsorbed blue spectral sensitizing dye (B), blue plus green by using a green spectral sensitizing dye (G), and blue plus red by using a red spectral sensitizing dye (R), as shown in Fig. 71. Representative spectral sensitivity curves for the different layers are shown in Fig. 72. By coating the B layer on the top and coating underneath it a yellow filter layer (which absorbs blue light), the G and R layers receive only minus-blue light. Then, by coating the G layer on top of the R layer, the latter receives only red light. Thus, the image has been separated into its B, G, and R components. Systems that produce slides also use similar color technology, but now the image capture medium and the image display medium are the same film. Positive images are obtained by first developing the negative image in a black-and-white developer, then chemically treating the remaining undeveloped grains to make them developable, and finally developing those grains with a color developer. Then all the developed silver is bleached back to silver halide and fixed out. To produce the right tone scale in the final image, the film latitude must be considerably shorter than that in systems designed to produce prints, and much more care is required in selecting optimum exposure conditions. Systems designed for medical diagnosis using X rays produce negative black-and-white images. Because film is a very poor absorber of X rays, these systems
use a screen–film combination to minimize patient exposure (283). The screens are composed of heavy-element particles that are good X-ray absorbers and also emit radiation in the near-UV or visible region. In these systems, the film need be sensitive only to the particular wavelength region where the screen emits. Films are designed with different contrasts to optimize the image for the particular diagnosis performed.
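The pairing of a low-contrast (about 0.6) capture emulsion with a high-contrast (about 1.8) print emulsion mentioned above can be made concrete with a rough tone-reproduction calculation. The multiplication rule below is the standard cascaded-gamma argument from photographic tone-reproduction theory, offered here only as an illustrative sketch; it is not a formula stated in this article.

```latex
% Approximate end-to-end contrast of a negative-positive system as the
% product of the individual characteristic-curve contrasts (cascaded gammas).
\gamma_{\mathrm{system}} \approx \gamma_{\mathrm{negative}} \times \gamma_{\mathrm{print}}
                         \approx 0.6 \times 1.8 = 1.08
% A product near 1 reproduces scene tones roughly faithfully: the low-contrast
% negative provides long exposure latitude, and the high-contrast print stage
% restores the overall tone scale.
```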
Figure 71. Arrangement of layers in a simple color film: a blue-sensitive layer on top, a yellow filter layer beneath it, then a green (+ blue)-sensitive layer and a red (+ blue)-sensitive layer on the coating support.

Figure 72. Spectral sensitivity (log E) of the blue, green, and red layers in color film as a function of wavelength (400–700 nm). Data were obtained on separately coated layers.

ABBREVIATIONS AND ACRONYMS

E       exposure
t       exposure time
T       transmittance
R       reflectance
I       irradiance
D       image density
ISO     international standards organization
γ       contrast
AgCl    silver chloride
AgBr    silver bromide
AgI     silver iodide
Ag+     silver ion
Br−     bromide ion
I−      iodide ion
Ksp     solubility product
pAg     negative logarithm of the silver ion concentration
pBr     negative logarithm of the bromide ion concentration
S       supersaturation
G       free energy
Q       one-dimensional representation of a lattice distortion
ηn      nucleation efficiency
ω       recombination index
N       minimum number of silver/gold atoms in the developable latent image
QS      quantum sensitivity
F       fraction of grains developable
P       mean absorbed photons/grain
LIRF    low-irradiance reciprocity failure
HIRF    high-irradiance reciprocity failure
S       sulfur sensitized
S + Au  sulfur plus gold sensitized
LV      lowest vacant molecular orbital
HF      highest filled molecular orbital
RQE     relative quantum efficiency
Ered    electrochemical reduction potential
Eox     electrochemical oxidation potential
Dred    reduced form of the developing agent
Dox     oxidized form of the developing agent
EAg     electrochemical silver potential
Edev    electrochemical developer potential
Ecell   difference between EAg and Edev (= EAg − Edev)
SNR     signal-to-noise ratio
BIBLIOGRAPHY
1. E. Ostroff, ed., Pioneers of Photography. Their Achievements in Science and Technology, SPSE, The Society for Imaging Science and Technology, Springfield, VA, 1987.
2. W. H. F. Talbot, The Process of Calotype Photogenic Drawing (communicated to the Royal Society, June 10, 1841) J. L. Cox & Sons, London, 1841. 3. G. F. Dutton, Photographic Emulsion Chemistry, Focal Press, London, 1966. 4. B. H. Carroll, G. C. Higgins, and T. H. James, Introduction to Photographic Theory, J Wiley, NY, 1980. 5. T. Tani, Photographic Sensitivity, Oxford University Press, NY, 1995. 6. T. H. James, T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977. 7. J. F. Hamilton, Adv. Phys. 37, 359 (1988). 8. D. M. Sturmer and A. P. Marchetti, in J. Sturge, V. Walworth, and A. Shepp, eds., Imaging Materials and Processes, Neblette’s 8th ed., Van Nostrand Reinhold, NY, 1989, Chap. 3. 9. D. J. Locker, in Kirk Othmer Encyclopedia of Chemical Technology, vol. 18, 4th ed., J Wiley, NY, 1996, pp. 905–963. 10. R. S. Eachus, A. P. Marchetti, and A. A. Muenter, in H. L. Strauss, G. T. Babcock, and S. R. Leone, eds., Annual Review of Physical Chemistry, vol. 50, Annual Reviews, Palo Alto, CA, 1999, pp. 117–144. 11. H. A. Hoyen and X. Wen, in C. N. Proudfoot, ed., Handbook of Photographic Science and Engineering, 2nd ed., The Society for Imaging Science and Technology, Springfield, VA, 1997, pp. 201–224. 12. J. H. Altman, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 17. 13. P. Kowalski, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 18. 14. C. N. Nelson, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 19, Sec. I. 15. American National Standard PH2.5-1972. 16. American National Standard PH2.27-1965. 17. C. Kittel, Introduction to Solid Sate Physics, 4th ed., J Wiley, NY, 1971. 18. E. Vogl and W. Waidelich, Z. Agnew. Phys. 25, 98 (1968). 19. M. Bucher, Phys. Rev. B 35, 2,923 (1987). 20. W. Nernst, Z. Physik. Chem. 4, 129 (1889). Consult any analytical chemistry textbook. 21. I. H. Leubner, R. Jaganathan, and J. S. Wey, Photogr. Sci. Eng. 24, 268 (1980). 22. E. J. Birr, Stabilization of Photographic Silver Halide Emulsions, Focal Press, London, 1974. 23. J. W. Mullin, Crystallization, 2nd ed., Butterworths, London, 1972, p. 222. 24. C. R. Berry and D. C. Skillman, J. Photogr. Sci. 16, 137–147 (1968). 25. C. R. Berry and D. C. Skillman, J. Phys. Chem. 68, 1,138 (1964). 26. C. R. Berry and D. C. Skillman, J. Appl. Phys. 33, 1,900 (1962). 27. P. Claes and H. Borginon, J. Photogr. Sci. 21, 155 (1973). 28. J. S. Wey and R. W. Strong, Photogr. Sci. Eng. 21, 14–18 (1977). 29. T. Tanaka and M. Iwasaki, J. Imaging Sci. 29, 86 (1985). 30. C. R. Berry and D. C. Skillman, Photogr. Sci. Eng. 6, 159–165 (1962).
31. J. S. Wey and R. W. Strong, Photogr. Sci. Eng. 21, 248 (1977). 32. I. H. Leubner, J. Phys. Chem. 91, 6,069 (1987) and references cited therein. 33. C. R. Berry, S. J. Marino, and C. F. Oster Jr., Photogr. Sci. Eng. 5, 332–336 (1961). 34. R. Jagannathan, J. Imaging Sci. 35, 104–112 (1991). 35. M. Antonaides and J. S. Wey, J. Imaging Sci. Technol. 39, 323–331 (1995). 36. C. R. Berry, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 3. 37. A. H. Herz and J. Helling, J. Colloid Sci. 17, 293 (1962). 38. W. L. Gardiner, D. Wrathall, and A. H. Herz, Photogr. Sci. Eng. 21, 325–330 (1977). 39. S. Boyer, J. Cappelaere, and J. Pouradier, Chim. Phys. 56, 495 (1959). 40. J. E. Maskasky, J. Imaging Sci. 30, 247 (1986). 41. I. H. Leubner, J. Imaging Sci. 31, 145 (1987). 42. M. J. Harding, J. Photogr. Sci. 27, 1–12 (1979). 43. W. Markocki and A. Zaleski, Photogr. Sci. Eng. 17, 289–294 (1973). 44. A. B. Holland and J. R. Sawers, Photogr. Sci. Eng. 17, 295–298 (1973). 45. A. B. Holland and A. D. Feinerman, J. Appl. Photogr. Eng. 84, 165 (1982). 46. D. L. Black and J. A. Timmons, J. Imaging Sci. Technol. 38, 10–13 (1994). 47. J. E. Maskasky, J. Imaging Sci. 31, 15–26 (1987). 48. J. E. Maskasky, J. Imaging Sci. 32, 15–16 (1988). 49. B. K. Furman, G. H. Morrison, V. I. Saunders, and Y. T. Tan, Photogr. Sci. Eng. 25, 121 (1981). 50. T. J. Maternaghan, C. J. Falder, R. Levi-Setti, and J. M. Chabala, J. Imaging Sci. 34, 58–65 (1990). 51. J. F. Hamilton and L. E. Brady, Surface Sci. 23, 389 (1970). 52. R. C. Baetzold, Y. T. Tan, and P. W. Tasker, Surface Sci. 195, 579 (1988). 53. P. Tangyunyong, T. N. Rhodin, Y. T. Tan, and K. J. Lushington, Surface Sci. 255, 259 (1991). 54. Y. T. Tan, K. J. Lushington, P. Tangyunyong, and T. N. Rhodin, J. Imaging Sci. Technol. 36, 118 (1992). 55. P. W. M. Jacobs, J. Corish, and C. R. A. Catlow, J. Phys. C Solid State Phys. 13, 1977 (1980). 56. W. G. Kleppmann and H. Bilz, Commun. Phys. 1, 105 (1976). 57. H. Bilz and W. Weber, in A. Baldereschi, W. Czaja, E. Tosati, and M. Tosi, eds., The Physics of Latent Image Formation in Silver Halides, World Scientific, Singapore, 1984, p. 25. 58. R. J. Friauf, in A. Baldereschi, W. Czaja, E. Tosati, and M. Tosi, eds., The Physics of Latent Image Formation in Silver Halides, World Scientific, Singapore, 1984, p. 79. 59. J. Van Biesen, J. Appl. Phys. 41, 1,910 (1970). 60. S. Takada, J. Appl. Phys. Jpn. 12, 190 (1973). 61. S. Takada, Photogr. Sci. Eng. 18, 500 (1974). 62. M. E. van Hull and W. Maenhout-van der Vorst, Phys. Stat. Sol. 39(a), 253 (1977). 63. M. E. van Hull and W. Maenhout-van der Vorst, Phys. Stat. Sol. 40(a), K57 (1977). 64. M. E. van Hull and W. Maenhout-van der Vorst, Int. Cong. Photogr. Sci., Rochester, NY, 1978, Paper I 8. 65. M. E. van Hull and W. Maenhout-van der Vorst, Phys. Stat. Sol. 52(a), 277 (1979).
66. F. Callens and W. Maenhout-van der Vorst, Phys. Stat. Sol. 50(a), K175 (1978).
101. J. W. Mitchell, Rep. Prog. Phys. 20, 433 (1957).
67. F. Callens and W. Maenhout-van der Vorst, Phys. Stat. Sol. 71(a), K61 (1982).
103. J. W. Mitchell, Photogr. Sci. Eng. 22, 249 (1978).
68. F. Callens, W. Maenhout-van der Vorst, and L. Kettellapper, Phys. Stat. Sol. 70(a), 189 (1982).
105. J. F. Hamilton, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 4.
69. H. Pauly and H. P. Schwan, Zeitsch. f. Naturforschung 14b, 125–131 (1950). 70. F. Callens, D. Vandenbroucke, L Soens, M. Van den Eeden, and F. Cardon, J. Photogr. Sci. 41, 72–73 (1993).
102. J. W. Mitchell, J. Phys. Chem. 66, 2,359 (1962). 104. J. W. Mitchell, Photogr. Sci. Eng. 25, 170 (1981).
106. J. F. Hamilton, in A. Baldereschi, W. Czaja, E. Tosati, and M. Tosi, eds., The Physics of Latent Image Formation in Silver Halides, World Scientific, Singapore, 1984, p. 203.
71. J. Heick and F. Granzer, J. Imaging Sci. Technol. 38, 464–474 (1995).
107. J. Malinowski, Photogr. Sci. Eng. 14, 112 (1970).
72. R. C. Baetzold, Phys. Rev. B 52, 11,424–11,431 (1995).
109. J. Malinowski, Photogr. Sci. Eng. 23, 99 (1979).
73. K. Lehovec, J. Chem. Phys. 21, 1,123 (1953).
110. E. Moisar, Photogr. Sci. Eng. 26, 124–132 (1982).
74. K. L. Kliewer, J. Phys. Chem. Solids 27, 705, 719–(1966).
111. E. Moisar, Photogr. Sci. Eng. 25, 45–56 (1981).
75. R. B. Poeppel and J. M. Blakely, Surface Sci. 15, 507 (1969).
112. E. Moisar and F. Granzer, Photogr. Sci. Eng. 26, 1–14 (1982).
76. Y. T. Tan and H. A. Hoyen Jr., Surface Sci. 36, 242 (1973).
108. J. Malinowski, J. Photogr. Sci. 18, 363 (1974).
77. H. A. Hoyen, in A. Baldereschi, W. Czaja, E. Tosati, and M. Tosi, eds., The Physics of Latent Image Formation in Silver Halides, World Scientific, Singapore, 1984, p. 151.
113. M. R. V. Sahyun, Photogr. Sci. Eng. 27, 171–177 (1983).
78. H. A. Hoyen Jr. and Y. T. Tan, J. Colloid Interface Sci. 79, 525–534 (1981).
116. F. Seitz, Rev. Mod. Phys. 23, 328 (1951).
79. F. Bassini, R. S. Knox, and W. B. Fowler, Phys. Rev. A 137, 1,217 (1965).
114. M. R. V. Sahyun, Photogr. Sci. Eng. 28, 157–161 (1984). 115. W. F. Berg, Trans. Faraday Soc. 39, 115 (1943). 117. B. E. Bayer and J. F. Hamilton, J. Opt. Soc. Am. 55, 439–452 (1965). 118. P. C. Burton and W. F. Berg, Photogr. J. 86B, 2 (1946).
80. W. B. Fowler, Phys. Stat. Sol.(b) 52, 591 (1972).
119. P. C. Burton, Photogr. J. 86B, 62 (1946).
81. F. Moser and R. Ahrenkiel, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 1, Sect. IV.
120. P. C. Burton and W. F. Berg, Photogr. J. 88B, 84 (1948).
82. R. C. Brandt and F. C. Brown, Phys, Rev. 181, 1,241 (1969). 83. A. M. Stoneham, Theory of Defects in Solids, Clarendon, Oxford, 1975. 84. P. Langevin, Ann. Chem. Phys. 28, 433 (1903).
121. P. C. Burton, Photogr. J. 88B, 13 (1948). 122. P. C. Burton, Photogr. J. 88B, 123 (1948). 123. W. F. Berg, Rep. Prog. Phys. 11, 248 (1948). 124. R. K. Hailstone and J. F. Hamilton, J. Imaging Sci. 29, 125–131 (1985).
85. M. Lax, Phys. Rev. 119, 1,502 (1960).
125. J. F. Hamilton and B. E. Bayer, J. Opt. Soc. Am. 55, 528–533 (1965).
86. R. M. Gibb, G. J. Rees, B. W. Thomas, B. L. H. Wilson, B. Hamilton, D. R. Wight, and N. F. Mott, Philos. Mag. 36, 1,021 (1977).
126. J. F. Hamilton and B. E. Bayer, J. Opt. Soc. Am. 56, 1,088–1,094 (1966).
87. Y. Toyozawa, Semicond. Insulators 5, 175 (1983) and references therein.
128. J. F. Hamilton, Radiat. Effects 72, 103–106 (1983).
88. H. Kanzaki and S. Sakuragi, J. Phys. Soc. Jpn 27, 109 (1969). 89. F. Moser, R. K. Ahrenkeil, and S. L. Lyu, Phys. Rev. 161, 897 (1967). 90. V. Platikanova and J. Malinowsi, Phys. Stat. Sol. 47, 683 (1978). 91. R. E. Maerker, J. Opt. Soc. Am. 44, 625 (1954). 92. M. Tamura, H. Hada, S. Fujiwara, and S. Ikenoue, Photogr. Sci. Eng. 15, 200 (1971).
127. J. F. Hamilton, Photogr. Sci. Eng. 26, 263–269 (1982). 129. J. F. Hamilton and P. C. Logel, Photogr. Sci. Eng. 18, 507–512 (1974). 130. R. K. Hailstone, N. B. Liebert, M. Levy, and J. F. Hamilton, J. Imaging Sci. 31, 185–193 (1987). 131. R. K. Hailstone, N. B. Liebert, M. Levy, and J. F. Hamilton, J. Imaging Sci. 31, 255–262 (1987). 132. P. Fayet, F. Granzer, G. Hegenbart, E. Moisar, B. Pischel, and L. Wöste, Phys. Rev. Lett. 55, 3,002 (1985). 133. T. Leisner, C. Rosche, S. Wolf, F. Granzer, and L. Wöste, Surf. Rev. Lett. 3, 1,105–1,108 (1996).
93. M. Kawasaki and H. Hada, J. Soc. Photogr. Sci. Technol. Jpn. 44, 185 (1981).
134. R. K. Hailstone and J. F. Hamilton, J. Imaging Sci. 31, 229–238 (1987).
94. H. Hada and M. Kawasaki, J. Appl. Phys. 54, 1,644 (1983).
135. F. Trautweiler, Photogr. Sci. Eng. 12, 138–142 (1968).
95. M. Kawasaki and H. Hada, J. Imaging Sci. 29, 132 (1985). 97. G. W. Luckey, Discuss. Faraday Soc. 28, 113 (1959).
136. D. E. Powers, S. G. Hamsen, M. E. Geusic, D. L. Michalopoulos, and R. E. Smalley, J. Chem. Phys. 78, 2,866–2,881 (1983).
98. S. E. Sheppard, A. P. H. Trivelli, R. P. Loveland, J. Franklin Inst. 200, 15 (1925).
137. M. Kawaski, Y. Tsujimura, and H. Hada, Phys. Rev. Lett. 57, 2,796–2,799 (1986).
99. R. W. Gurney and N. F. Mott, Proc. R. Soc. London A 164, 151 (1938).
138. J. F. Hamilton and R. C. Baetzold, Photogr. Sci. Eng. 25, 189–197 (1981).
96. M. Kawasaki and H. Hada, J. Imaging Sci. 31, 267 (1987).
100. N. F. Mott and R. W. Gurney, Electronic Processes in Ionic Crystals, Clarendon, Oxford, 1940.
139. R. C. Baetzold, J. Phys. Chem. 101, 8,180–8,190 (1997). 140. T. H. James, J. Photogr. Sci. 20, 182–186 (1972).
141. R. K. Hailstone and J. F. Hamilton, J. Photogr. Sci. 34, 2–8 (1986). 142. J. F. Hamilton, Photogr. Sci. Eng. 14, 122–130 (1970). 143. J. F. Hamilton, Photogr. Sci. Eng. 14, 102–111 (1970). 144. V. I. Saunders, R. W. Tyler, and W. West, Photogr. Sci. Eng. 16, 87 (1972). 145. R. S. Van Heyingen and F. C. Brown, Phys. Rev. 111, 462 (1958). 146. H. E. Spencer and R. E. Atwell, J. Opt. Soc. Am. 54, 498–505 (1964). 147. R. Deri and J. Spoonhower, J. Appl. Phys. 57, 2,806 (1985).
181. M. Ridgway and P. J. Hillson, J. Photogr. Sci. 23, 153 (1975). 182. P. H. Roth and W. H. Simpson, Photogr. Sci. Eng. 24, 133 (1975). 183. D. A. Pitt, D. L. Rachu, and M. R. V. Sahyun, Photogr. Sci. Eng. 25, 57 (1981). 184. J. E. Keevert and V. V. Gokhale, J. Imaging Sci. 31, 243 (1987). 185. H. Kamzaki and Y. Tadakuma, J. Phys. Chem. Solids 55, 631 (1994). 186. H. Kanzaki and Y. Tadakuma, J. Phys. Chem. Solids 58, 221 (1997).
148. J. Flad, H. Stoll, and H. Preuss, Z. Phys. D — At. Mol. Clusters 6, 193–198 (1987).
187. R. C. Baetzold, J. Imaging Sci. Technol. 43, 375 (1999).
149. J. Flad, H. Stoll, and H. Preuss, Z. Phys. D — At. Mol. Clusters 6, 287–292 (1987).
189. J. F. Hamilton, J. M. Harbison, and D. L. Jeanmaire, J. Imaging Sci. 32, 17 (1988).
150. J. Flad, H. Stoll, and H. Preuss, Z. Phys. D — At. Mol. Clusters 15, 79–86 (1990).
190. D. Zhang and R. K. Hailstone, J. Imaging Sci. Technol. 37, 61 (1993).
151. P. G. Nutting, Philos. Mag. 26(6), 423 (1913).
191. K. Morimura and H. Mifune, J. Soc. Photogr. Sci. Technol. Jpn. 61, 175 (1998).
188. H. Frieser and W. Bahnmuller, J. Photogr. Sci. 16, 38 (1968).
152. J. Gasper and J. J. DePalma, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 20.
192. T. Kaneda, J. Imaging Sci. 33, 115–118 (1989).
153. J. H. Webb, J. Opt. Soc. Am. 38, 27 (1948).
194. T. Tani, J. Imaging Sci. Technol. 39, 386 (1995).
154. G. C. Farnell, J. Photogr. Sci. 7, 83 (1959).
195. H. E. Spencer and R. E. Atwell, J. Opt. Soc. Am. 54, 498–505 (1964).
155. G. C. Farnell, J. Photogr. Sci. 8, 194 (1960).
193. T. Tani, Photogr. Sci. Eng. 27, 75 (1983).
156. R. K. Hailstone, N. B. Liebert, M. Levy, R. T. McCleary, S. R. Girolmo, D. L. Jeanmaire, and C. R. Boda, J. Imaging Sci. 32, 113–124 (1988).
196. H. E. Spencer and M. Levy, J. Soc. Photogr. Sci. Technol. Jpn. 46, 514–524 (1983).
157. L. Silberstein, Philos. Mag 45, 1,062 (1923).
198. R. K. Hailstone, N. B. Liebert, and M. Levy, J. Imaging Sci. 34, 169–176 (1990).
158. G. C. Farnell and J. B. Chanter, J. Photogr. Sci. 9, 73 (1961). 159. H. E. Spencer, Photogr. Sci. Eng. 15, 468 (1971). 160. G. C. Attridge, J. Photogr. Sci. 30, 197 (1982). 161. T. A. Babcock and T. H. James, J. Photogr. Sci. 24, 19 (1976). 162. G. C. Farnell, J. Photogr. Sci. 17, 116 (1969). 163. P. Broadhead and G. C. Farnell, J. Photogr. Sci. 30, 176 (1982). 164. T. Tani, J. Imaging Sci. 29, 93 (1985). 165. A. G. DiFrancesco, M. Tyne, C. Pryor, and R. Hailstone, J. Imaging Sci. Technol. 40, 576–581 (1996). 166. T. Tani, J. Soc. Photogr. Sci. Technol. Jpn. 43, 335 (1980). 167. J. W. Mitchell, Photogr. Sci. Eng. 25, 170 (1981). 168. R. K. Hailstone, J. Appl. Phys. 86, 1,363–1,369 (1999). 169. R. W. Bunsen and H. E. Roscoe, Ann. Phys. Chem. 2(117), 529 (1862). 170. J. H. Webb, J. Opt. Soc. Am. 40, 3 (1950). 171. T. A. Babcock, P. M. Ferguson, W. C. Lewis, and T. H. James, Photogr. Sci. Eng. 19, 49–55 (1975). 172. T. A. Babcock, P. M. Ferguson, W. C. Lewis, and T. H. James, Photogr. Sci. Eng. 19, 211–214 (1975).
197. E. Moisar, Photogr. Sci. Eng. 25, 45 (1981).
199. J. Pouradier, A. Maillet, and B. Cerisy, J. Chim. Phys. 63, 469 (1966). 200. K. Tanaka, Nippon Kagaku Kaishi 12, 2,264 (1973). 201. H. Hirsch, J. Photogr. Sci. 20, 187 (1972). 202. H. E. Spencer, J. Imaging Sci. 32, 28–34 (1988). 203. D. Spracklen, J. Photogr. Sci. 9, 145–(1961). 204. P. Faelens, Photogr. Korr. 104, 137 (1968). 205. D. Cash, Photogr. Sci. Eng. 27, 156 (1983). 206. R. C. Baetzold, J. Photogr.Sci. 28, 15–22 (1980). 207. J. M. Harbison and J. F. Hamilton, Photogr. Sci. Eng. 19, 322 (1975). 208. G. W. Lowe, J. E. Jones, and H. E. Roberts, in J. W. Mitchell, ed., Fundamentals of Photographic Sensitivity (Proc. Bristol Symp.), Butterworths, London, 1951, p. 112. 209. T. Tani, Photogr. Sci. Eng. 15, 181 (1971). 210. T. Tani and M. Murofushi, J. Imaging Sci. Technol. 38, 1 (1994). 211. T. Tani, J. Imaging Sci. Technol. 41, 577 (1997). 212. T. Tani, J. Imaging Sci. Technol. 42, 402 (1998).
173. G. A. Janusonis, Photogr. Sci. Eng. 22, 297–301 (1978)
213. S. Guo and R. Hailstone, J. Imaging Sci. Technol. 40, 210 (1996).
174. J. M. Harbison and H. E. Spencer, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 5.
214. M. Kawasakki and Y. Oku, J. Imaging Sci. Technol. 42, 409 (1998).
175. H. E. Spencer, J. Photogr. Sci. 24, 34–39 (1976).
215. A. P. Marchetti, A. A. Muenter, R. C. Baetzold, and R. T. McCleary, J. Phys. Chem. B 102, 5,287–5,297 (1998).
176. S. Sheppard, Photogr. J. 65, 380 (1925).
216. T. Tani, Imaging Sci. J. 47, 1 (1999).
177. S. Sheppard, Photogr. J. 66, 399 (1926).
217. E. Moisar, Photogr. Korr. 106, 149 (1970).
178. D. J. Cash, J. Photogr. Sci. 20, 19 (1972).
218. S. S. Collier, Photogr. Sci. Eng. 23, 113 (1979).
179. D. J. Cash, J. Photogr. Sci. 20, 77 (1972).
219. H. E. Spencer, L. E. Brady, and J. F. Hamilton, J. Opt. Soc. Am. 57, 1,020 (1967).
180. D. J. Cash, J. Photogr. Sci. 20, 223 (1972).
220. H. E. Spencer, R. E. Atwell, and M. Levy, J. Photogr. Sci. 31, 158 (1983).
254. T. Tani, T. Suzumoto and K. Ohzeki, J. Phys. Chem. 94, 1,298–1,301 (1990).
221. H. E. Spencer, Photogr. Sci. Eng. 11, 352 (1967).
255. R. A. Marcus, Annu. Rev. Phys. Chem. 15, 155 (1964).
222. T. Tani, J. Imaging Sci. 29, 93 (1985). 223. T. Tani, J. Imaging Sci. 30, 41 (1986).
256. R. W. Berriman and P. B. Gilman Jr., Photogr. Sci. Eng. 17, 235–244 (1973).
224. R. K. Hailstone, N. B. Liebert, M. Levy, and J. F. Hamilton, J. Imaging Sci. 35, 219–230 (1991).
257. J. R. Lenhard and B. R. Hein, J. Phys. Chem. 100, 17,287 (1996).
225. S. S. Collier, Photogr. Sci. Eng. 26, 98 (1982).
258. A. A. Muenter, P. B. Gilman Jr., J. R. Lenhard, and T. L. Penner, The Int. East-West Symp. Factors Influencing Photogr. Sensitivity, 1984. Mauii, Hawaii, Paper C-4.
226. A. G. DiFrancesco, M. Tyne, and R. Hailstone, IS & T 49th Annual Conf., Minneapolis, MN, 1996, p. 222. 227. R. S. Eachus and M. T. Olm, Crystl. Lattice Deformation Amorphous Mater. 18, 297 (1989). 228. R. S. Eachus and M. T. Olm, Annu. Rep. Prog. Chem. C 83, 3 (1989). 229. D. A. Corrigan, R. S. Eachus, R. E. Graves, and M. T. Olm, J. Chem. Phys. 70, 5,676 (1979). 230. A. P. Marchetti and R. S. Eachus, in D. Volman, G. Hammond, and D. Neckers, eds., Advances in Photochemistry, vol. 17, J Wiley, NY, 1992, pp. 145–216. 231. R. S. Eachus and R. E. Graves, J. Chem. Phys. 65, 5,445–5,452 (1976).
259. T. H. James, Photogr. Sci. Eng. 18, 100–109 (1974). 260. W. C. Lewis and T. H. James, Photogr. Sci. Eng. 13, 54–64 (1969). 261. J. M. Simson and W. S. Gaugh, Photogr. Sci. Eng. 19, 339–343 (1975). 262. F. J. Evans and P. B. Gilman Jr., Photogr. Sci. Eng. 19, 333–339 (1975). 263. D. M. Sturmer and W. S. Gaugh, Photogr. Sci. Eng. 19, 344–351 (1975). 264. J. R. Lenhard, B. R. Hein, and A. A. Muenter, J. Phys. Chem. 97, 8,269–8,280 (1993).
232. W. Bahnmüller, Photogr. Korr. 104, 169 (1968).
265. P. B. Gilman Jr., Photogr. Sci. Eng. 11, 222 (1967).
233. H. Zwickey, J. Photogr. Sci. 33, 201 (1985).
266. P. B. Gilman Jr., Photogr. Sci. Eng. 12, 230 (1968).
234. B. H. Carroll, Photogr. Sci. Eng. 24, 265–267 (1980).
267. P. B. Gilman Jr., Photogr. Sci. Eng. 18, 418 (1974).
235. R. S. Eachus and R. E. Graves, J. Chem. Phys. 65, 1,530 (1976).
268. W. E. Lee and E. R. Brown, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 11.
236. M. T. Olm, R. S. Eachus, and W. S. McDugle, Bulg. Chem. Comm. 26, 350–367 (1993). 237. W. West and P. B. Gilman Jr., in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 10. 238. D. M. Sturmer and D. W. Heseltine, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 8. 239. H. Kuhn, J. Chem. Phys. 17, 1,198–1,212 (1949). 240. A. H. Herz, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 9. 241. T. Tani and S. Kikuchi, Bull. Soc. Sci. Photogr. Jpn. 18, 1 (1968). 242. T. Tani and S. Kikuchi, J. Photogr. Sci. 17, 33 (1969). 243. H. Matsusaki, H. Hada, and M. Tamura, J. Soc. Photogr. Sci. Technol. Jpn. 31, 204 (1968). 244. J. F. Padday, Trans. Faraday Soc. 60, 1,325 (1964).
246. E. Günther and E. Moisar, J. Photogr. Sci. 13, 280 (1965). 247. H. Phillippaerts, W. Vanassche, F. Cleaes, and H. Borginon, J. Photogr. Sci. 20, 215 (1972). 248. W. West and A. L. Geddes, J. Phys. Chem. 68, 837 (1964). 249. T. Tani and S. Kikuchi, Bull. Soc. Sci. Photogr. Jpn. 17, 1 (1967).
251. L. G. S. Brooker, F. L. White, D. W. Heseltine, G. H. Keyes, S. G. Dent Jr., and E. J. Van Lare, J. Photogr. Sci. 1, 173 (1953). 252. R. L. Large, in R. Cox, ed., Photographic Sensitivity, Academic Press, NY, 1973, pp. 241–263. 253. J. Lenhard, J. Imaging Sci. 30, 27–35 (1986).
270. G. Haist, Modern Photographic Processing, vols. 1 and 2, J Wiley, NY, 1979. 271. J. Kapecki and J. Rodgers, in Kirk Othmer Encyclopedia of Chemical Technology, vol. 6, 4th ed., Wiley, NY, 1993, pp. 965–1002. 272. J. R. Thirtle, L. K. J. Tong, and L. J. Fleckenstein, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 12. 273. P. Krause, in J. Sturge, V. Walworth, and A. Shepp, ed., Imaging Materials and Processes, Neblette’s 8th ed., Van Nostrand Reinhold, New York, 1989, Chap. 4. 274. J. F. Hamilton, Appl. Opt. 11, 13 (1972). 275. C. R. Berry, Photogr. Sci. Eng. 13, 65 (1969). 276. H. D. Keith and J. W. Mitchell, Philos. Mag. 44, 877 (1953).
245. B. H. Carroll and W. West, in J. W. Mitchell, ed., Fundamentals of Photographic Sensitivity (Proc. Bristol Symp.), Butterworths, London, 1951, p. 162.
250. G. R. Bird, K. S. Norland, A. E. Rosenoff, and H. B. Michaud, Photogr. Sci. Eng. 12, 196–206 (1968).
269. T. H. James, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chaps. 13, 14.
277. D. C. Skillman, Photogr. Sci. Eng. 19, 28 (1975). 278. W. E. Lee, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 14, Sect. II. 279. G. I. P. Levensen, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 15. 280. M. A. Kriss, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 21. 281. J. Dainty and R. Shaw, Image Science, Academic Press, NY, 1974. 282. M. R. Pointer and R. A. Jeffreys, J. Photog. Sci. 39, 100 (1991). 283. L. Erickson and H. R. Splettstosser, in T. H. James, ed., The Theory of the Photographic Process, 4th ed., Macmillan, NY, 1977, Chap. 23, Sect. III.
SINGLE PHOTON EMISSION COMPUTED TOMOGRAPHY (SPECT)

MARK T. MADSEN
University of Iowa
Iowa City, IA
INTRODUCTION

Single photon emission computed tomography (SPECT) is a diagnostic imaging modality that produces tomographic slices of internally distributed radiopharmaceuticals. It is routinely used in diagnosing coronary artery disease and in tumor detection. Projection views of the radiopharmaceutical distribution are collected by one or more scintillation cameras mounted on a gantry designed to rotate about a patient lying horizontally on a pallet. The projection information is mathematically reconstructed to obtain the tomographic slices. Most clinical SPECT studies are qualitative and have simplistic corrections for attenuation and scattered radiation. Quantitative SPECT requires corrections for attenuation, scatter, and spatial resolution, although these have not been routinely implemented in the past because of their computational load. SPECT instrumentation has evolved to include coincidence imaging of positron-emitting radiopharmaceuticals, specifically 18F-fluorodeoxyglucose.

RADIOTRACERS

Much of medical imaging depends on anatomic information. Examples include radiographs, X-ray computed tomography (CT), and magnetic resonance imaging (MRI). In SPECT imaging, functional information is obtained about tissues and organs from specific chemical compounds labeled with radionuclides that are used as tracers. These radiotracers, or radiopharmaceuticals, are nearly ideal tracers because they can be externally detected and they are injected in such small quantities that they do not perturb the physiological state of the patient. A radionuclide is an unstable atomic nucleus that spontaneously emits energy (1). As part of this process, it may emit some or all of the energy as high-energy photons called gamma rays. Because gamma rays are energetic, a significant fraction of them are transmitted from their site of origin to the outside of the body where they can be
detected and recorded. Only a small number of radionuclides are suitable as radiotracers. The most commonly used radionuclides in SPECT imaging are summarized in Table 1. The phrase ''single photon'' refers to the fact that in this type of imaging, gamma rays are detected as individual events. The term is used to distinguish SPECT from positron emission tomography (PET), which also uses radionuclides but relies on coincidence imaging. The radionuclides used in PET emit positrons, which quickly annihilate with electrons to form two collinear 511-keV photons. Both of the annihilation photons have to be detected simultaneously by opposed detectors to record a true event, as discussed in more detail in the SPECT/PET Hybrid section. Diagnostic information is obtained from the way the tissues and organs of the body process the radiopharmaceutical. For example, some tumor imaging uses radiopharmaceuticals that have affinity for malignant tissue. In these scans, abnormal areas are characterized by an increased uptake of the tracer. In nearly all instances, the radiopharmaceutical is administered to the patient by intravenous injection and is carried throughout the body by the circulation where it localizes in tissues and organs. Because SPECT studies require 15–30 minutes to acquire, we are limited to radiopharmaceuticals whose distribution will remain relatively constant over that or longer intervals. Ideally, we also want the radiopharmaceutical to distribute only in abnormal tissues. Unfortunately, this is never the case, and the abnormal concentration of the radiotracer is often obscured by normal uptake of the radiopharmaceutical in surrounding tissues. This is why tomographic imaging is crucial. It substantially increases the contrast of the abnormal area, thereby greatly improving the likelihood of detection. The widespread distribution of the radiopharmaceutical in the body has other implications; the most important is radiation dose. The radiation burden limits the amount of radioactivity that can be administered to a patient, and for most SPECT studies, this limits the number of detected emissions and, thereby, the quality of the SPECT images. SPECT studies are performed for a wide variety of diseases and organ systems (2–6). Although myocardial perfusion imaging (Fig. 1) and tumor scanning (Fig. 2) are by far the most common SPECT applications, other studies include brain perfusion for evaluating stroke (Fig. 3) and dementias, renal function, and the evaluation of trauma.
Table 1. SPECT Radionuclides

Radionuclide | Decay Mode | Production Method | Half-Life | Principal Photon Emissions (keV)
99mTc | IT | 99Mo generator | 6 h | 140
67Ga | EC | 68Zn(p,2n)67Ga | 78 h | 93, 185, 296
111In | EC | 111Cd(p,n)111In | 67 h | 172, 247
123I | EC | 124Te(p,5n)123I | 13 h | 159
131I | β− | fission by-product | 8 days | 364
133Xe | β− | fission by-product | 5.3 days | 30 (X rays), 80
201Tl | EC | 201Hg(d,2n)201Tl | 73 h | 60–80 (X rays), 167
18F | β+ | 18O(p,n)18F | 110 min | 511
Figure 1. Myocardial perfusion SPECT. This common SPECT procedure is used to evaluate coronary artery disease. SPECT images show regional blood flow in the heart muscle under resting and stress conditions. Both bullet (a) and bull's eye displays (b) are used to compare the 3-D rest and stress images. Myocardial SPECT studies can also be gated to evaluate wall motion (c) and ejection fraction (d); in the example shown, the gated analysis gives EF 69%, EDV 182 mL, ESV 56 mL, SV 126 mL, and a myocardial mass of 189 g.
Figure 2. Tumor scanning. These images are from a scan for prostate cancer. Because of the difficulty in distinguishing abnormal uptake from circulating tracer, additional studies using a nonspecific radiotracer are acquired simultaneously. The upper set of images (a) shows tumor uptake (arrows); there is no corresponding uptake in the blood pool images in the lower set (b).
Table 2 summarizes several of the most common SPECT studies along with radiation dose estimates.

Gamma Ray Interactions

To understand the detection and imaging of gamma rays, we must first review gamma ray interactions
with different materials (7). The intensity of a gamma ray beam decreases as it traverses a material because of interactions between the gamma rays and the electrons in the material. This is referred to as attenuation. Attenuation is an exponential process described by

I(x) = I0 exp(−µx),    (1)

where I0 is the initial intensity, I(x) is the intensity after traveling a distance x through the material, and µ is the linear attenuation coefficient of the material.

Figure 3. Brain perfusion study. These images show the asymmetric distribution of blood flow in the brain resulting from a stroke. Selected transverse (a), sagittal (b), and coronal views (c) are shown.

Table 2. SPECT Radiopharmaceuticals

Radiopharmaceutical | Application | Effective Dose (rem)
99mTc Medronate (MDP), Oxidronate (HDP) | Bone scan | 0.75
99mTc Exametazime (HMPAO), Bicisate (ECD) | Brain perfusion | 1.2
99mTc Arcitumomab (CEA Scan) | Colon cancer | 0.75
99mTc Sestamibi, Tetrofosmin | Myocardial perfusion, breast cancer | 1.2
123I Metaiodobenzylguanidine (MIBG) | Neuroendocrine tumors | 0.7
67Ga Citrate | Infection, tumor localization | 2.5
111In Capromab Pendetide (ProstaScint) | Prostate cancer | 2.1
111In Pentetreotide (OctreoScan) | Neuroendocrine tumors | 2.1
201Tl Thallous Chloride | Myocardial perfusion | 2.5
18F Fluoro-2-deoxyglucose (FDG) | Tumor localization, myocardial viability | 1.1

Over the range of gamma ray energies used in radionuclide imaging, the two primary interactions that contribute to
the attenuation coefficient are photoelectric absorption and Compton scattering. Photoelectric absorption refers to the total absorption of the gamma ray by an inner shell atomic electron. It is not an important interaction in body tissues, but it is the primary interaction in high
Z materials such as sodium iodide (the detector material used in the scintillation camera) and lead. Photoelectric absorption is inversely proportional to the cube of the gamma ray energy, so that the efficiency of detection falls sharply as photon energy increases. Compton scattering occurs when the incoming gamma ray interacts with a loosely bound outer shell electron. A portion of the gamma ray energy is imparted to the electron, and the remaining energy is left with the scattered photon. The amount of energy lost in scattering depends on the angle between the gamma ray and the scattered photon. The cross section of Compton scattering is inversely proportional to the gamma ray energy, and it is the dominant interaction in body tissues.
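As a simple numerical illustration of Eq. (1), the short sketch below computes the transmitted fraction of a 140-keV photon beam through soft tissue. The linear attenuation coefficient used (about 0.15 cm⁻¹) is a typical textbook value assumed for illustration; it is not a number given in this article.

```python
import math

def transmitted_fraction(mu_per_cm: float, thickness_cm: float) -> float:
    """Fraction of the initial beam intensity remaining after a path of
    `thickness_cm` through a material with linear attenuation coefficient
    `mu_per_cm` (Eq. 1: I/I0 = exp(-mu * x))."""
    return math.exp(-mu_per_cm * thickness_cm)

# Assumed, typical value for soft tissue at 140 keV (99mTc); not from this article.
MU_TISSUE_140KEV = 0.15  # cm^-1

for depth in (5.0, 10.0, 20.0):
    frac = transmitted_fraction(MU_TISSUE_140KEV, depth)
    print(f"{depth:4.0f} cm of tissue -> {frac:.2f} of the photons transmitted")
```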
Scintillation Cameras

All imaging studies in nuclear medicine (SPECT and conventional planar) are acquired on scintillation cameras (also referred to as Anger or gamma cameras) invented by H. O. Anger in 1953 (8,9). The detector of the scintillation camera is a large, thin sodium iodide crystal (Fig. 4). Typical dimensions of the crystal are 40 × 50 cm and 9.5 mm thick. Sodium iodide, NaI(Tl), is a scintillator; it converts absorbed gamma ray energy into visible light. The magnitude of the light flash is proportional to the energy absorbed, so that information about the event energy as well as location is available. Photomultiplier tubes, which convert the scintillation into an electronic pulse, are arranged in a close-packed array that covers the entire sensitive area of the crystal. Approximately sixty 7.5-cm photomultiplier tubes are required for the scintillation camera dimensions given before. The location of the detected event is determined by the position-weighted average of the electronic pulses generated by the photomultiplier tubes in the vicinity of the event. This approach yields an intrinsic spatial resolution in the range of 3–4 mm. In addition to estimating the position of the event, the photomultiplier tube signals are also combined to estimate the energy absorbed in the interaction. The energy signal is used primarily to discriminate against Compton scattered radiation that occurs in the patient and to normalize the position signals so that the size of the image does not depend on the gamma ray energy. It also makes it possible to image distributions of radiotracers labeled with different radionuclides simultaneously. This is often referred to as dual isotope imaging; however, modern gamma cameras can acquire simultaneous images from four or more energy ranges. Because the response of the crystal and photomultiplier tubes is not uniform, additional corrections are made for position-dependent shifts in the energy signal (referred to as Z or energy correction) and in determining the event location (referred to as L or spatial linearity correction). Thus, when a gamma ray is absorbed, the scintillation camera must determine the position and energy of the event, determine if the energy signal falls within a selected pulse height analyzer window, and apply spatial linearity correction. At this point, the location within the image matrix corresponding to the event has its count value increased by one. A scintillation camera image is generated from the accumulation of many (10^5–10^6) detected events.
Figure 4. Scintillation camera. Virtually all SPECT imaging is performed with scintillation cameras. The scintillation camera determines the location and energy of each gamma ray interaction through the weighted averaging of photomultiplier signals. A collimator is required to form the image on the NaI(Tl) detector.
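The weighted-average (Anger) position logic and the energy-window test described above can be illustrated with the toy sketch below. The PMT coordinates, the PMT signal values, and the 20% window width are invented for illustration; a real camera also applies the energy and spatial-linearity corrections mentioned in the text.

```python
def anger_event(pmt_xy, pmt_signals, window_center=140.0, window_frac=0.20):
    """Toy Anger logic: estimate the event position as the signal-weighted
    centroid of the photomultiplier (PMT) outputs and sum the signals to
    estimate the absorbed energy.  Returns (x, y, energy, accepted)."""
    energy = sum(pmt_signals)                                   # energy signal
    x = sum(s * px for (px, _), s in zip(pmt_xy, pmt_signals)) / energy
    y = sum(s * py for (_, py), s in zip(pmt_xy, pmt_signals)) / energy
    half_window = 0.5 * window_frac * window_center
    accepted = abs(energy - window_center) <= half_window       # pulse-height analyzer test
    return x, y, energy, accepted

# Hypothetical 3-PMT neighborhood around an absorbed 140-keV photon.
pmt_positions = [(0.0, 0.0), (7.5, 0.0), (0.0, 7.5)]   # cm
pmt_outputs = [80.0, 40.0, 22.0]                       # arbitrary units, sums to ~142 "keV"
print(anger_event(pmt_positions, pmt_outputs))
```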
The time it takes to process an event is ultimately limited by the scintillation relaxation time [t = 250 ns for NaI(Tl)]. For most SPECT imaging, this does not present any problem. However, it becomes a severe constraint for coincidence imaging, discussed in detail later. Typical performance specifications of scintillation cameras are given in Table 3. Gamma rays cannot be focused because of their high photon energy. Therefore, a collimator must be used to project the distribution of radioactivity within the patient onto the NaI(Tl) crystal (10). A collimator is a multihole lead device that selectively absorbs all gamma rays except those that traverse the holes (Fig. 5). This design severely restricts the number of gamma rays that can be detected. Less than 0.05% of the gamma rays that hit the front
Table 3. Scintillation Camera Specifications

Parameter | Specification
Crystal size | 40 × 50 cm
Crystal thickness | 9.5 mm
Efficiency at 140 keV | 0.86 (photopeak); 0.99 (total)
Efficiency at 511 keV | 0.05 (photopeak); 0.27 (total)
Energy resolution | 10%
Intrinsic spatial resolution | 3.5 mm
System count sensitivity(a) | 250 counts/min/µCi
System spatial resolution at 10 cm(a) | 8.0 mm
Maximum count rate (SPECT) | 250,000 counts/s
Maximum count rate (coincidence) | >1,000,000 counts/s
(a) Specified for high-resolution collimator.
Figure 5. Collimators. Collimators are the image-forming apertures of the scintillation camera. They can be configured in parallel, fan-beam, and cone-beam geometries.
surface of the collimator are transmitted through to the crystal. Several parameters enter into the design of collimators. Most collimators have parallel holes that map the gamma ray distribution one-to-one onto the detector. Trade-offs are made in optimizing the design for count sensitivity and spatial resolution. The sensitivity of the collimator is proportional to the square of the ratio of the hole size d to the hole length l (ε ∝ d²/l²). The spatial resolution (Rcol), characterized by the full-width-at-half-maximum (FWHM) of the line spread function, is proportional to d/l. The desire is to maximize ε while minimizing Rcol. Because the optimal design often depends on the specific imaging situation, most clinics have a range of collimators available. Some typical examples are given in Table 4. For low-energy studies (Eγ < 150 keV), either high-resolution or ultrahigh resolution collimators are typically used. Because the lead absorption of gamma rays is inversely proportional to the gamma ray energy, the design of collimators is influenced by the gamma ray energies that are imaged. As the photon energy increases, thicker septa are required, and to maintain count sensitivity, the size of the holes is increased, which compromises spatial resolution. Parallel hole geometry is not the most efficient arrangement. Substantial increases in count sensitivity are obtained by using fan- and cone-beam geometries (11) (Fig. 5). The disadvantage of these configurations
is that the field of view becomes smaller as the source-to-collimator distance increases. This presents a problem for SPECT imaging in the body where portions of the radiopharmaceutical distribution are often truncated. Fan-beam collimators are routinely used for brain imaging, and hybrid cone-beam collimators are available for imaging the heart. In addition to the dependence on hole size and length, the spatial resolution of a collimator depends on the source-to-collimator distance, as shown in Fig. 6. The overall system spatial resolution Rsys can be estimated from

R_{sys} = \sqrt{R_{col}^{2} + R_{int}^{2}}.    (2)
For most imaging, the collimator resolution is substantially larger than the intrinsic spatial resolution (Rint ∼ 3.5 mm) and is the dominant factor. Therefore, it is very important for the collimator to be as close to the patient as possible.

SPECT Systems

A SPECT system consists of one or more scintillation cameras mounted on a gantry that can revolve about a fixed axis in space, the axis of rotation (8,9,12,13) (Fig. 7).
Table 4. Collimator Specifications

Collimator | Energy (keV) | Hole Size (mm) | Hole Length (mm) | Septal Thickness (mm) | Relative Sensitivity | Rcol at 10 cm (mm)
Parallel general purpose | 140 | 1.4 | 24 | 0.20 | 1.0 | 8.3
Parallel high-sensitivity | 140 | 2.0 | 24 | 0.20 | 2.0 | 11.9
Parallel high-resolution | 140 | 1.1 | 24 | 0.15 | 0.7 | 6.3
Parallel ultrahigh resolution | 140 | 1.1 | 36 | 0.15 | 0.3 | 4.5
Ultrahigh fan-beam | 140 | 1.4 | 35 | 0.20 | 0.8 | 7.1
Cone-beam | 140 | 1.9 | 41 | 0.25 | 1.2 | 7.1
Parallel medium-energy | 300 | 3.5 | 50 | 1.30 | 0.8 | 11.6
Parallel high-energy | 364 | 4.2 | 63 | 1.30 | 0.5 | 13.4
Parallel ultrahigh energy | 511 | 4.2 | 80 | 2.40 | 0.3 | 12
Figure 6. Collimator spatial resolution as a function of source distance. The spatial resolution of a collimator falls continuously as the source distance increases. This is shown in the quality of the images (a) and the plot of the calculated FWHM (b).
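The distance dependence plotted in Fig. 6 and the combination rule of Eq. (2) can be sketched numerically. The geometric formula Rcol ≈ d(l_eff + z)/l_eff, with an effective hole length shortened to account for septal penetration, is the standard parallel-hole approximation from the collimator literature rather than a formula given in this article; the hole dimensions are taken from the high-resolution entry in Table 4, and the lead attenuation coefficient is an assumed value.

```python
import math

def collimator_fwhm_mm(hole_d_mm, hole_len_mm, source_dist_mm, mu_pb_per_mm=2.2):
    """Approximate parallel-hole collimator resolution (FWHM).
    Standard textbook approximation, not a formula from this article:
    R_col ~ d * (l_eff + z) / l_eff, with l_eff = l - 2/mu accounting for
    septal penetration of the lead (mu_pb_per_mm is an assumed value)."""
    l_eff = hole_len_mm - 2.0 / mu_pb_per_mm
    return hole_d_mm * (l_eff + source_dist_mm) / l_eff

def system_fwhm_mm(r_col_mm, r_int_mm=3.5):
    """Eq. (2): quadrature combination of collimator and intrinsic resolution."""
    return math.sqrt(r_col_mm**2 + r_int_mm**2)

# High-resolution collimator from Table 4: d = 1.1 mm, l = 24 mm.
for z in (0.0, 100.0, 200.0):   # source-to-collimator distance in mm
    r_col = collimator_fwhm_mm(1.1, 24.0, z)
    print(f"z = {z:5.0f} mm:  Rcol = {r_col:4.1f} mm,  Rsys = {system_fwhm_mm(r_col):4.1f} mm")
```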
SPECT studies are usually acquired over a full 360° arc. This yields better quality images than 180° acquisitions because it tends to compensate somewhat for the effects of attenuation. One exception to this practice is myocardial perfusion studies, which are acquired using views from
only 180° (see later). SPECT acquisitions are performed either in step-and-shoot mode or in a continuous rotational mode. In the step-and-shoot mode, the detector rotates to its angular position and begins collecting data after the detector stops for a preselected frame duration. In the
Figure 7. SPECT systems. SPECT systems consist of one or more scintillation cameras mounted on a gantry that allows image collection from 360° around a patient. The most common configuration has two scintillation cameras. To accommodate the 180° sampling of myocardial perfusion studies, many systems can locate the scintillation cameras at either 90° or 180° .
continuous rotational mode, the duration of the entire study is selected, and the detector rotational speed is adjusted to complete one full orbit. Data are collected continually and are binned into a preselected number of projections. Typically 60 to 120 projection views are acquired over 360° . Another feature of SPECT acquisition is body contouring of the scintillation cameras. Because spatial resolution depends on the source-to-collimator distance, it is crucial to maintain close proximity to the body as the detector rotates about the patient. Although a number of different approaches have been used to accomplish this, the most common method moves the detectors radially in and out as a function of rotational angle. Myocardial perfusion studies are the most common SPECT procedures. Because the heart is located in the left anterior portion of the thorax, gamma rays originating in the heart are highly attenuated for views collected from the right lateral and right posterior portions of the arc. For this reason, SPECT studies of the heart are usually collected using the 180° arc that extends from the left posterior oblique to the right anterior oblique view (14) (Fig. 7c). This results in reconstructed images that have the best contrast, although distortions are often somewhat more pronounced than when 360° data are used (15). Because of the widespread use of myocardial perfusion imaging, many SPECT systems have been optimized for 180° acquisition by using two detectors arranged at ∼90° (Fig. 7c). This reduces the acquisition time by a factor of 2 compared to single detectors and is approximately 30% more efficient than triple-detector SPECT systems. Positioning the detectors at 90° poses some challenges for maintaining close proximity. Most systems rely on the motion of both the detectors and the SPECT table to accomplish this.
The heart is continually moving during the SPECT acquisition, and this further compromises spatial resolution. Because the heart beats many times per minute, it is impossible to acquire a stop-action SPECT study directly. However, the heart's motion is periodic, so it is possible to obtain this information by gating the SPECT acquisition (16). In a gated SPECT acquisition, the cardiac cycle is subdivided, and a set of eight images that span the ECG R–R interval is acquired for each angular view. These images are placed into predetermined time bins based on the patient's heart rate, which is monitored by the ECG R wave interfaced to the SPECT system. As added benefits of gating, the motion of the heart walls can be observed, and ventricular volumes and ejection fractions can be determined (17) (Fig. 1). Although most SPECT imaging samples a more or less static distribution of radionuclides, some SPECT systems can perform rapid sequential studies to monitor tracer clearance. An example of this is determining regional cerebral blood flow from the clearance of 133Xe (18). Multiple 1-minute SPECT studies are acquired over a 10-minute interval. When one acquisition sample is completed, the next begins automatically. To minimize time, SPECT systems that perform these studies can alternately reverse the acquisition direction, although at least one SPECT system uses slip-ring technology, so that the detectors can rotate continuously in the same direction.

SPECT Image Reconstruction

The projection information collected by the SPECT system has to be mathematically reconstructed to obtain tomographic slices (8,19–21). The information sought is the distribution of radioactivity for one selected transaxial plane, denoted by f(x, y). A projection through this distribution consists of a set of parallel line integrals, p(s, θ), where s is the independent variable of the projection and θ is the angle along which the projection is collected. Ignoring attenuation effects for the moment,

p(s, \theta) = \int_{-\infty}^{\infty} f(x', y')\, dt,    (3)

where

x' = s \cos\theta - t \sin\theta; \qquad y' = s \sin\theta + t \cos\theta.    (4)

If a complete set of projections can be collected over θ, then one can analytically reconstruct f(x, y) by using several different equivalent methods. The most common approach used in SPECT is filtered backprojection:

f(x, y) = \int_{0}^{2\pi} p^{*}[x \cos(\theta) + y \sin(\theta), \theta]\, d\theta,    (5)

where p* is the measured projection altered by a reconstruction filter [R(ω)]:

p^{*}(s, \theta) = \mathrm{FT}^{-1}\{\mathrm{FT}[p(s, \theta)] \times R(\omega)\}.    (6)

For an ideal projection set (completely sampled and no noise), R(ω) = |ω| and is commonly referred to as a ramp filter. The amplification of high-frequency information by this filter requires adding a low-pass filter when real projections are reconstructed, as discussed later. Operationally, SPECT imaging proceeds as follows. Projection views are collected with a scintillation camera at multiple angles about the patient. The field of view of the scintillation camera is large, so that information is acquired from a volume where each row of the projection view corresponds to a projection from a transaxial plane. To reconstruct a tomographic slice, the projections associated with that plane are gathered together, as shown in Fig. 8. This organization of the projections by angle is often referred to as a sinogram because each source in the plane completes a sinusoidal trajectory. The reconstruction filter is usually applied to the projections in the frequency domain, and the filtered projections are then backprojected to generate the tomographic slice (Fig. 9). The noise level of the acquired projections is typically high. When the ramp reconstruction filter is applied, amplification of the noise-dominant higher frequencies overwhelms the reconstructed image. To prevent this, the reconstruction filter is combined with a low-pass filter (apodization). Many different low-pass filters have been used in this application. One common example is the Butterworth filter,

B(\omega) = \frac{1}{1 + (\omega/\omega_{c})^{2N}},    (7)

and the apodized reconstruction filter is

R(\omega) = \frac{|\omega|}{1 + (\omega/\omega_{c})^{2N}}.    (8)

The adjustable parameters, the cutoff frequency (ωc) and the order (N), allow configuring the reconstruction filter for different imaging situations. A low cutoff frequency is desirable when the projections are noisy. When the count density is high, a low cutoff yields an overly smoothed result, as shown in Fig. 10.
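To make Eqs. (5)–(8) concrete, the sketch below filters each projection with a Butterworth-apodized ramp in the frequency domain and then backprojects the result. It is a minimal NumPy illustration only; the 64 × 64 grid, the 60 views, the point-source test, and the cutoff of 0.4 Nyquist are arbitrary choices, not values prescribed by this article, and a clinical implementation would add the corrections discussed below.

```python
import numpy as np

def apodized_ramp(n_bins, cutoff=0.4, order=5):
    """Eq. (8): ramp |w| multiplied by a Butterworth window (Eq. 7).
    `cutoff` is expressed as a fraction of the Nyquist frequency."""
    freqs = np.fft.fftfreq(n_bins)              # cycles/bin; Nyquist = 0.5
    butter = 1.0 / (1.0 + (np.abs(freqs) / (cutoff * 0.5)) ** (2 * order))
    return np.abs(freqs) * butter

def filtered_backprojection(sinogram, angles_deg, cutoff=0.4, order=5):
    """Minimal filtered backprojection (Eqs. 5-8).
    `sinogram` has shape (n_angles, n_bins); returns an n_bins x n_bins slice."""
    n_angles, n_bins = sinogram.shape
    filt = apodized_ramp(n_bins, cutoff, order)
    # Filter each projection in the frequency domain (Eq. 6).
    p_star = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * filt, axis=1))
    # Backproject: accumulate p*(x cos(theta) + y sin(theta)) over angles (Eq. 5).
    center = (n_bins - 1) / 2.0
    ys, xs = np.mgrid[0:n_bins, 0:n_bins] - center
    image = np.zeros((n_bins, n_bins))
    for proj, theta in zip(p_star, np.deg2rad(angles_deg)):
        s = xs * np.cos(theta) + ys * np.sin(theta) + center
        s_idx = np.clip(np.round(s).astype(int), 0, n_bins - 1)
        image += proj[s_idx]
    return image * np.pi / n_angles

# Self-test: analytic sinogram of a point source, then reconstruct it.
n = 64
angles = np.arange(0.0, 180.0, 3.0)                    # 60 views
x0, y0 = 25 - (n - 1) / 2.0, 40 - (n - 1) / 2.0        # point at (row 40, col 25)
sino = np.zeros((len(angles), n))
for i, th in enumerate(np.deg2rad(angles)):
    s = x0 * np.cos(th) + y0 * np.sin(th) + (n - 1) / 2.0
    sino[i, int(round(s))] = 1.0
recon = filtered_backprojection(sino, angles)
print("peak found at", np.unravel_index(np.argmax(recon), recon.shape))  # ~ (40, 25)
```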
Figure 8. Projection data sets (0°, 90°, 180°, and 270° projection images and the corresponding sinogram). The scintillation camera collects information from multiple projections in each view. A projection set consists of a stack of image rows from each of the angular views. The organization of projections by angle is commonly referred to as a sinogram. Madsen, M.T. Introduction to emission CT. RadioGraphics 15:975–991, 1995.
Figure 9. Filtered backprojection reconstruction (the diagram shows the sinogram being Fourier transformed, multiplied by the ramp filter, transformed back, and then backprojected). The projections are modified by the reconstruction filter and then backprojected to yield the tomographic image. For an ideal projection set, the ramp filter provides an exact reconstruction.
Figure 10. SPECT reconstruction filters and noise (reconstructions with cutoff frequencies of 0.2, 0.4, 0.6, and 0.8 Nyquist). Because of the statistical fluctuations in the projection views, it is necessary to suppress the high-frequency components of the ramp reconstruction filter by selecting appropriate filter parameters. Because noise suppression also reduces detail, the optimal filter choice depends on the organ system imaged and the count density. Madsen, M.T. Introduction to emission CT. RadioGraphics 15:975–991, 1995.
An accurate description of SPECT imaging requires including attenuation. A more appropriate model of the measured projections (excluding scatter and resolution effects) is

p(s, \theta) = \int_{-\infty}^{\infty} f(x', y') \exp\left[-\int_{t}^{\infty} \mu(x', y')\, dt'\right] dt.    (9)
This formulation is known as the attenuated Radon transform. Unfortunately, there is no analytic solution to this problem. Until recently, there were two ways of handling this problem for clinical studies. The first, and still very common, is simply to reconstruct the acquired projections using filtered backprojection and accept the
artifacts that accompany the inconsistent data set. The second is to apply a simple attenuation correction. The most commonly used attenuation correction is the first-order Chang method in which a calculated correction map is applied to the reconstructed images (22). The correction factors are calculated by assuming that activity is uniformly distributed in a uniformly attenuating elliptical contour. The size of the ellipse is determined from the anterior and lateral projections. This attenuation correction is fairly adequate for parts of the body such as the abdomen and head but is not useful for the thorax where the assumptions are far from valid.
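The first-order Chang correction can be sketched as follows: for each reconstructed pixel, the attenuation factor exp(−µL) is averaged over projection angles for an assumed uniform attenuator, and the reciprocal of that average is applied as a multiplicative correction. The circular (rather than elliptical) contour, the pixel size, and µ = 0.12 cm⁻¹ (a typical broad-beam value at 140 keV) are simplifying assumptions for illustration, not parameters given in this article.

```python
import numpy as np

def chang_correction_map(n_pix, radius_pix, mu_per_pix, n_angles=64):
    """First-order Chang correction factors for a uniform circular attenuator.
    For each pixel, average exp(-mu * L(theta)) over projection angles, where
    L(theta) is the path length from the pixel to the boundary; the correction
    is the reciprocal of that average (applied to the reconstructed slice)."""
    center = (n_pix - 1) / 2.0
    ys, xs = np.mgrid[0:n_pix, 0:n_pix] - center
    inside = xs**2 + ys**2 <= radius_pix**2
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    atten_sum = np.zeros((n_pix, n_pix))
    for th in thetas:
        # Distance from (x, y) to the circle boundary along direction theta.
        proj = xs * np.cos(th) + ys * np.sin(th)        # component along the ray
        perp2 = xs**2 + ys**2 - proj**2                 # squared off-axis distance
        path = np.sqrt(np.maximum(radius_pix**2 - perp2, 0.0)) - proj
        atten_sum += np.exp(-mu_per_pix * np.maximum(path, 0.0))
    return np.where(inside, n_angles / atten_sum, 1.0)

# Illustrative numbers: 128-pixel grid, 0.45 cm pixels, 11 cm radius attenuator.
pix_cm = 0.45
cmap = chang_correction_map(n_pix=128, radius_pix=11.0 / pix_cm,
                            mu_per_pix=0.12 * pix_cm)
print("correction at center:", round(float(cmap[63, 63]), 2))   # roughly exp(mu*R) ~ 3.7
```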
The other approach to image reconstruction uses iterative algorithms (23,24) (Fig. 11). Iterative algorithms are more time-consuming than filtered backprojection, but they have several important advantages. These include the elimination of radial streak artifacts, often seen in images reconstructed using filtered backprojection; accurate correction of physical degradations such as scatter, attenuation, and spatial resolution; and better performance where a wide range of activities is present or where limited angle data are available. Iterative algorithms for image reconstruction were introduced in the 1970s following the advent of X-ray computed tomography. These algorithms were extensions of general approaches to solving linear systems by using sparse matrices. Significant progress in iterative algorithms for emission computed tomography was made in 1982 when the maximum likelihood expectation maximization (ML-EM) algorithm of Shepp and Vardi was introduced (25). In the ML-EM approach, the Poisson nature of the gamma ray counting statistics is included in the derivation. The likelihood that the measured projections are consistent with the estimated emission distribution is maximized to yield

\lambda_{j}^{\mathrm{new}} = \frac{\lambda_{j}}{\sum_{i} c_{ij}} \sum_{i} c_{ij} \frac{y_{i}}{b_{i}},    (10)

where

b_{i} = \sum_{k} c_{ik} \lambda_{k}.    (11)
In this formulation, λj is the emission distribution (i.e., the SPECT image), y is the set of measured projections, and b is the set of calculated projections from the current estimate of λ. The cij are backprojection weighting factors that can also encompass appropriate factors for other physical effects such as attenuation, spatial resolution, and scatter. This yields an algorithm with several nice features. First, it is easy to see that because the updating of the estimate in each iteration depends on a ratio, it automatically restricts results to positive numbers. Second, the algorithm conserves the total image counts in each iteration. Unfortunately, the ML-EM algorithm converges slowly, and 20–50 iterations are often required for a satisfactory result. One reason for the slow convergence of the ML-EM algorithm is that the SPECT estimate is updated only at the end of each iteration. One way of significantly reducing the number of iterations is the ordered subset (OS-EM) approach introduced by Hudson and Larkin (26). In OS-EM, the projection set is split into multiple equal-sized subsets. For example, a projection set of 64 angular samples might be split into eight subsets of eight samples each.
Figure 11. Iterative reconstruction. In iterative reconstruction, the initial uniform estimate of the tomographic slice is continually updated by backprojecting the ratio of the measured and calculated projections from the latest estimate. Although computationally intensive, iterative reconstructions allow accurate correction for attenuation and other physical degradations. They also reduce streak artifacts and perform better than filtered backprojection when the projection set is undersampled.
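The ML-EM update of Eqs. (10) and (11) and its ordered-subset acceleration can be written compactly when the system matrix c_ij is available as a dense array, as in the sketch below. The tiny random system matrix, the subset count, and the iteration count are purely illustrative assumptions; practical implementations compute c_ij on the fly with projector and backprojector operators.

```python
import numpy as np

def osem(c, y, n_subsets=4, n_iterations=5, eps=1e-12):
    """Ordered-subset ML-EM (Eqs. 10 and 11).
    c : (n_projection_bins, n_pixels) system matrix, c[i, j] >= 0
    y : (n_projection_bins,) measured projections
    Returns the estimated emission distribution (n_pixels,)."""
    n_bins, n_pixels = c.shape
    lam = np.ones(n_pixels)                                  # uniform initial estimate
    subsets = [np.arange(s, n_bins, n_subsets) for s in range(n_subsets)]
    for _ in range(n_iterations):
        for idx in subsets:                                  # one ML-EM pass per subset
            c_s = c[idx]
            b = c_s @ lam + eps                              # Eq. 11: calculated projections
            lam *= (c_s.T @ (y[idx] / b)) / (c_s.sum(axis=0) + eps)   # Eq. 10 update
    return lam

# Toy problem: random nonnegative system matrix and a known "true" image.
rng = np.random.default_rng(0)
c = rng.random((240, 60))
true_image = rng.random(60)
measured = c @ true_image                                    # noise-free projections
estimate = osem(c, measured)
print("relative error:", np.linalg.norm(estimate - true_image) / np.linalg.norm(true_image))
```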
The members of each set are chosen to span the angular range. Set one would consist of projections 1, 9, 17, 25, 33, 41, 49, and 57, and set two would have projections 2, 10, 18, 26, 34, 42, 50, and 58, and so on. The ML-EM algorithm is applied sequentially to each subset, and one iteration is completed when all the subsets have been operated on. Thus, the estimated emission distribution is updated multiple times in each iteration. This approach decreases the number of required iterations by a factor approximately equal to the number of subsets. As a result, five or fewer iterations of the OS-EM algorithm are sufficient for most SPECT reconstructions. Although strict convergence has not been demonstrated for the OS-EM algorithm, it is now the most commonly used iterative algorithm for emission tomographic applications. The reconstruction process produces a set of contiguous transaxial slices. These slices can be viewed individually, in a group, or even in a cine format. However, often the best way to visualize the information is by using views that are parallel to the long axis of the patient. These views can be generated directly from the transaxial slices. Sagittal slices are oriented at 0° (the side of the body) and proceed laterally from the right to the left side. Coronal slices are oriented at 90° (the front of the body) and proceed from posterior to anterior. These orientations are useful because many of the organs are aligned with the long axis of the body. An exception is the heart, which points down and to the left. Oblique views parallel and perpendicular to the long axis of the heart are generated for myocardial perfusion studies (see Figs. 1 and 14). Because myocardial
Because myocardial SPECT is common, automatic routines exist to generate these views. The transverse, sagittal, and coronal views are very useful, but they require that the observer view multiple slices. Another useful display is the maximum pixel intensity reprojection. Projections through the reconstructed slice volume are calculated for typically 20–30 viewing angles over 360°. Instead of summing the information along each ray, the highest count pixel value along the ray is projected; often this value is also distance weighted. The set of reprojected images is then viewed in a cine format, yielding a high-contrast, three-dimensional display. Maximum pixel reprojection displays are most useful for radiopharmaceuticals that accumulate in abnormal areas. Examples of these SPECT displays are shown in Fig. 12. SPECT imaging is susceptible to many artifacts if it is not performed carefully (27,28). Many of the artifacts are a direct consequence of the fundamental assumptions of tomography. The primary assumption is that the external measurements of the distribution reflect true projections, i.e., line integrals. It has already been noted that attenuation and scatter violate this assumption. In addition, it is critical that an accurate center of rotation be used in the backprojection algorithm. The center of rotation is the point on a projection plane that maps to the center of the image field, and it must be known to within one-half pixel. Larger errors distort each reconstructed point into a ‘‘doughnut’’ shape (Fig. 13a).
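The maximum pixel intensity reprojection described above can be sketched in a few lines. The example below is illustrative only: the function name is hypothetical, distance weighting is omitted, and the reconstructed volume is assumed to be a NumPy array indexed (slice, y, x).

```python
import numpy as np
from scipy.ndimage import rotate

def max_intensity_reprojections(volume, n_views=24):
    """Return n_views maximum-pixel reprojection images over 360 degrees for cine display."""
    views = []
    for angle in np.linspace(0.0, 360.0, n_views, endpoint=False):
        # rotate each transaxial plane about the patient's long axis
        rot = rotate(volume, angle, axes=(1, 2), reshape=False, order=1)
        views.append(rot.max(axis=2))   # project the highest value along each ray
    return np.stack(views)
```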
Figure 12. SPECT tomographic displays. Because SPECT uniformly samples a large volume, multiple transverse slices are available. These data can be resliced (a) to yield sagittal views (side views parallel to the long axis of the body), coronal views (front views parallel to the long axis of the body), or any oblique view. Another useful display is (b) the maximum pixel reprojection set, which is often viewed in cine mode.
It is also a fundamental assumption of emission tomography that the detector field is uniform. Nonuniformities in the detector result in ring-shaped artifacts (Fig. 13b). This problem is especially acute for nonuniformities that fall near the center of rotation, where deviations as small as 1% can result in a very distinct artifact. A change of the source distribution from radiotracer kinetics or patient motion also induces artifacts (Fig. 13c). Although much of SPECT imaging today uses the methods described before, the equations presented earlier oversimplify the actual imaging problem. Both the spatial resolution of the SPECT system and the scatter contributions mix information from other planes into each slice. An accurate description of SPECT requires a 3-D formulation such as (29)

$$p(s, \theta) = c \int_{-\infty}^{\infty} h(t, \omega; \mathbf{r})\, f(\mathbf{r})\, \exp\!\left[ -\int_{\mathbf{r}}^{\infty} \mu(u)\, du \right] dt\, d\omega. \qquad (12)$$

Here h(t, ω; r) represents a three-dimensional system transfer function that includes both the effects of spatial resolution and scattered radiation, f(r) is the emission distribution, and µ(u) is the attenuation distribution. This more accurate model has not been routinely implemented for clinical situations because of its high computational cost. However, investigations have shown measurable improvements in image quality, and it is likely that the 3-D formulation will become standard in the near future. This is discussed in greater detail in the Quantitative SPECT section.
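To make the attenuation factor in Eq. (12) concrete, the sketch below computes a single attenuated parallel projection of a 2-D emission slice, ignoring the transfer function h (i.e., ideal resolution and no scatter). It is an illustrative NumPy example rather than the formulation used by any particular system; the function name and the pixel-based discretization are assumptions.

```python
import numpy as np

def attenuated_projection(f, mu, pixel_size=1.0):
    """One attenuated parallel projection, detector at the right edge of the grid.

    f  : 2-D emission slice (counts per pixel)
    mu : 2-D attenuation map (linear attenuation coefficient per unit length)
    """
    # attenuation path from each pixel to the detector along +x,
    # counting only half of the emitting pixel itself
    path = np.cumsum(mu[:, ::-1], axis=1)[:, ::-1] - 0.5 * mu
    # each pixel's contribution is weighted by exp(-integral of mu along the ray)
    return np.sum(f * np.exp(-path * pixel_size), axis=1)
```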
SPECT SYSTEM PERFORMANCE
The system performance of SPECT is summarized in Table 5. The scintillation cameras and the associated collimation determine the count sensitivity of a SPECT system. SPECT spatial resolution is generally isotropic and has an FWHM of 8–10 mm for brain imaging and 12–18 mm for body imaging. Spatial resolution is affected by the collimation, the organ system imaged, and the radiopharmaceutical used. This becomes clear when the components of the spatial resolution are examined.
Figure 13. SPECT artifacts. SPECT images are susceptible to a variety of artifacts. (a) Inaccurate center of rotation values (here, off by 3 pixels) blur each image point. (b) Nonuniformities in the scintillation camera cause ring artifacts. (c) Motion during SPECT acquisition can cause severe distortions.

Table 5. SPECT System Performance (High-Resolution Collimator)

Parameter                               Specification
Number of scintillation cameras         1, 2, or 3
Count sensitivity per camera            250 cpm/µCi per detector
Matrix size                             64 × 64; 128 × 128
Pixel size                              6 mm; 3 mm
Spatial resolution (brain studies)      8 mm
Spatial resolution (heart studies)      14 mm
SPECT uniformity                        15%
Contrast of 25.4-mm sphere (a)          0.45

(a) Measured in a cylindrical SPECT phantom of 22-cm diameter at a detector orbit radius of 20 cm (58).
SPECT spatial resolution is quantified by

$$R_{\mathrm{SPECT}} = \sqrt{R_{\mathrm{col}}^{2} + R_{\mathrm{filter}}^{2} + R_{\mathrm{int}}^{2}}. \qquad (13)$$
The intrinsic spatial resolution is a relatively minor factor in this calculation. The collimator resolution depends on the type of collimator selected and on how close the scintillation camera can approach the patient during acquisition. The type of collimation depends on the expected gamma ray flux and the energy of the gamma rays emitted. If a 99mTc radiopharmaceutical is used that has high uptake in the organ of interest, then a very high-resolution collimator can be used to minimize the Rcol component. However, if a high-energy gamma emitter such as 131I is used, the appropriate collimator performs significantly more poorly. Keeping the collimator close to the patient during acquisition is critical for maintaining good spatial resolution.
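A quick numerical illustration of Eq. (13), using assumed component values that are not taken from the text, shows how the collimator term dominates and how little the intrinsic term contributes:

```python
import numpy as np

# illustrative component values in mm (assumptions, not measured data)
R_col, R_filter, R_int = 9.0, 6.0, 3.5
R_spect = np.sqrt(R_col**2 + R_filter**2 + R_int**2)
print(round(R_spect, 1))   # 11.4 mm; dropping R_int entirely changes the result by only ~0.5 mm
```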
The best results are obtained for head imaging, where a radius of less than 15 cm is possible for most studies. In the trunk of the body, it is difficult to maintain close proximity, and there is a corresponding loss of spatial resolution. In addition, count density is a major consideration. Low count density requires more smoothing within the reconstruction filter, and this imposes additional losses in spatial resolution.

QUANTITATIVE SPECT

As stated before, until recently, most SPECT imaging relied on either filtered backprojection with no corrections or on corrections that use simple physical models that often are poor descriptors of the actual imaging situation (Fig. 14a,b). Routines are available in clinical SPECT systems for enhancing contrast and suppressing noise by using Metz or Wiener filters in conjunction with filtered backprojection (30,31).
Figure 14. SPECT attenuation correction. Accurate attenuation correction is important for myocardial perfusion imaging because the heterogeneous distribution of tissues differs from the assumptions used in the simplified correction schemes (a) and (b). Accurate attenuation correction requires an independent transmission study using an external gamma ray source such as that shown in (c). Attenuation correction removes artifacts that mimic coronary artery disease (d). Photos courtesy of GE Medical Systems.
These filters are most often applied to the projection set before reconstruction and have a resolution-restoration component. The resulting projections can then be reconstructed by using a ramp filter because the noise has already been suppressed. This prefiltering improves image quality significantly, but it still does not accurately correct for attenuation, distance-dependent spatial resolution, or scatter. An iterative approach is required to accomplish that (14,32,33). In the past, iterative algorithms were not used because of the computational load required to implement the corrections accurately. However, the combination of faster computers and improved reconstruction algorithms in recent years has made these corrections feasible. Gamma ray attenuation by the body destroys the desired linear relationship between the measured projections and the true line integrals of the internally distributed radioactivity. Reconstructing the measured projections without compensating for attenuation results in artifacts (34). This is an especially serious problem in the thorax, where the artifacts from diaphragmatic and breast attenuation mimic the perfusion defects associated with coronary artery disease (Fig. 14d). To correct accurately for attenuation, the attenuation distribution needs to be known for each slice. Many different approaches have been investigated to obtain attenuation maps. These range from using information in the emission data to acquiring transmission studies (35,36). Transmission acquisition is the standard approach used today. In this approach, an external source (or sources) is mounted on the gantry opposite a detector, and transmission measurements are acquired at the same angles as the emission data (Fig. 14c). All of the manufacturers of SPECT systems offer options for obtaining transmission studies that use external sources to measure the attenuation distribution of cardiac patients directly, using the scintillation camera as a crude CT scanner. Most of the transmission devices allow simultaneous acquisition of emission and transmission information. Therefore, the transmission sources must have energy emissions different from those of the radiotracers used in the clinical study. Radionuclides that have been used as sources for transmission studies include Am-241, Gd-153, Ba-133, and Cs-137, and at least one vendor uses an X-ray tube. Different source configurations have been used to collect the transmission studies (37), but all of the commercial systems use one or more line sources. In some systems, the line source is translated across the camera field of view (38). One vendor uses an array of line sources that spans the field of view (39). The information collected from these transmission measurements is corrected for cross talk from the emission gamma rays, and the transmission views are reconstructed to yield an attenuation map. If the photon energy of the transmission source is significantly different from that of the emission radionuclide, the map has to be scaled to the appropriate attenuation coefficients. This is a relatively easy mapping step and can be done with sufficient accuracy. The scaled attenuation map can then be used in the iterative algorithm. Because of time, count rate, and sensitivity constraints, the quality of the attenuation maps is poor.
However, there is generally sufficient information to provide useful attenuation compensation. It is desirable to minimize noise in the attenuation map, but because the correction is integral, the propagation of noise is considerably less than in multiplicative corrections such as those in PET studies. Compton scattered radiation accounts for about 30–40% of the acquired counts in SPECT imaging. This occurs despite the energy discrimination available in all SPECT systems. This is illustrated in Fig. 15a, which shows a plot of the energy of a Compton scattered photon as a function of the scattering angle for gamma rays of different energies. Future SPECT systems may have substantially better energy resolution than the 9–10% that is available from NaI(Tl) detectors, but for now, it is necessary to correct for this undesirable information.
Figure 15. Scattered radiation. Compton scattered radiation degrades contrast and compromises attenuation correction. (a) A plot of the energy loss of scattered photons as a function of angle and energy. (b) Scattered radiation can be compensated for by acquiring additional data simultaneously from an energy window below the photopeak.
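The dual-energy-window compensation of Fig. 15b, which is the subtraction method described in the following discussion, can be sketched as follows. This is an illustrative example only: the function name is hypothetical, and the normalization factor k must be established for the particular system and window settings (the value 0.5 is shown purely as a placeholder).

```python
import numpy as np

def dual_window_scatter_correction(photopeak, scatter_window, k=0.5):
    """Subtract a scaled scatter-window acquisition from the photopeak projections.

    photopeak, scatter_window : projection arrays acquired simultaneously in the
    photopeak window and in a window centered below the photopeak.
    k : normalization factor (system- and window-dependent; 0.5 is a placeholder).
    """
    corrected = photopeak - k * scatter_window
    return np.clip(corrected, 0, None)   # avoid negative counts before reconstruction
```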
Scattered radiation decreases contrast and can impact other corrections. For example, when attenuation correction is applied without also correcting for scattered radiation, the heart walls near the liver may be overenhanced. Scatter has been corrected in several different ways (33,40–44). The easiest to implement is the subtraction method, in which information is simultaneously acquired in a second energy window centered below the photopeak in the Compton scatter region of the energy spectrum (Fig. 15b). After establishing an appropriate normalization factor, the counts from the scatter window are subtracted from the photopeak window. The corrected projections are then used in the reconstruction algorithm. The disadvantages of this approach are that it increases noise and that it is difficult to establish an accurate normalization factor. An alternative approach is to model the scatter as part of the forward projection routine using a photon-transport model of varying complexity (30,31,42,45,46). This requires information about tissue densities, which is available from transmission measurements. This approach has the potential to provide the best correction; however, it is computationally intensive. It is likely that this problem will be overcome by improvements in computer performance and by more efficient algorithms. For example, it has already been shown that the scatter fractions can be calculated on a coarser grid than the emission data because the scatter distribution has inherently low resolution (47). In addition, because of its low spatial frequency, the scatter estimate converges rapidly, and it is not necessary to update the calculated scatter data at every iteration. Correction must also be made for the distance-dependent spatial resolution discussed earlier (Fig. 6). Although a number of approaches have been investigated for applying this correction by using filtered backprojection, the best results have been achieved with iterative algorithms. Like scatter correction, accurate modeling of spatial resolution requires a three-dimensional approach. This is computationally intensive because specific convolution kernels are required for each projection distance. The Gaussian diffusion method is a simpler and faster alternative (48). In Gaussian diffusion, a convolution kernel is chosen that is applied sequentially at each row of the forward projection matrix. The repeated convolutions reproduce the distance dependence of the collimation fairly accurately.

SPECT/PET HYBRID SYSTEMS

As previously stated, the motivating force behind SPECT imaging is the availability of radiopharmaceuticals that provide crucial diagnostic information. In recent years, it has become apparent that the premier cancer imaging agent is 18F fluorodeoxyglucose (18F FDG). Fluorodeoxyglucose is a glucose analog that reveals metabolic activity, and it has high sensitivity and specificity for detecting a large number of cancers, including lung, colon, breast, melanoma, and lymphoma. However, 18F is a positron emitter. This makes it ideally suited for positron emission tomography but unfortunately much less suited for SPECT. The main reason for this is the
high energy of the annihilation radiation resulting from the positron emission. High-energy photons are a problem in SPECT for two reasons. First, the relatively thin NaI(Tl) crystals have low efficiency for detection. At 511 keV, the photopeak efficiency is less than 10% for a 9.6-mm crystal. The second problem is that it is difficult to collimate these high-energy photons (49–51). Because thicker septa are required, the count sensitivity is very low. As a result, the spatial resolution is 30–50% worse than that of collimators used with 99mTc. This poor spatial resolution reduces the sensitivity of the test. There is one SPECT application where 18F FDG performs adequately, and that is heart imaging. Fluorodeoxyglucose provides information about the metabolic activity of the heart muscle and is a good indicator of myocardial viability. However, the imaging of 18F FDG in tumors is substantially worse in SPECT than in PET tomographs. As stated previously, the dual-detector SPECT system is the most common configuration. One obvious solution to the problem of collimated SPECT using positron-emitting radiotracers is to resort to coincidence imaging (Table 6 and Fig. 16). When a positron is emitted from a nucleus during radioactive decay, it dissipates its energy over a short distance and captures an electron. The electron-positron pair annihilates very quickly and produces two collinear 511-keV photons. This feature of annihilation radiation can be exploited in coincidence detection, where simultaneous detection by opposed detectors is required. Two opposed scintillation cameras whose collimators are removed can have additional electronics added to enable coincidence detection and essentially turn a SPECT system into a PET tomograph (52,53). Although this may sound easy, there are many problems to overcome. Detection of the annihilation photons at the two scintillation cameras represents independent events, so the overall efficiency for detection is equal to the product of the individual efficiencies. With a singles efficiency of about 10% (i.e., the efficiency for either detector to register one event), the coincidence efficiency drops to 1%. Although this is very low compared to the detection efficiency at 140 keV (86%), the overall coincidence efficiency is actually very high compared to system efficiency using collimators. But there are still problems. The detection efficiency for detecting only one photon (the singles efficiency) is an order of magnitude higher than the coincidence efficiency. This leads to problems with random coincidences. Random coincidences are registered coincidence events that do not result from a
Table 6. SPECT/PET Hybrid System Performance

Parameter                                  Specification
Number of scintillation cameras            2
NaI(Tl) thickness (mm)                     15.9–19
Matrix size                                128 × 128
Pixel size                                 3 mm
Maximum singles rate (counts/s)            1,000,000–2,000,000
Maximum coincidence rate (counts/s)        10,000
Spatial resolution (mm)                    5
single annihilation. These randoms present a background that needs to be subtracted. Because of the low coincidence efficiency, the random rates in coincidence scintillation cameras are quite high and further compromise image quality. One way to mitigate this problem is to increase efficiency by using thicker crystals. All of the vendors have done this. The crystals have been increased from 9.6 mm to at least 15.9 mm and to as much as 19 mm. As NaI(Tl) crystals increase in thickness, there is a loss of intrinsic spatial resolution that limits the thickness of the crystals that can be used. In most SPECT imaging studies, there are essentially no count rate losses from finite temporal resolution. The amount of radioactivity that can be safely administered and the low sensitivity of the collimation put the observed count rate well within the no-loss range. In fact, most SPECT studies would benefit from higher gamma ray flux. This is not so in coincidence imaging. Once the collimators are removed, the wide-open NaI(Tl) crystals are exposed to very large count rates. These count rates are so high that count rate losses are unavoidable, and they become the limiting factor in performance. Much effort has been devoted to improving this situation. In the early 1990s, the maximum observed counting rate for a scintillation camera was in the 200,000–400,000 counts/second range. As SPECT systems were redesigned for coincidence imaging, this rate was extended to more than 1,000,000 counts/second by shortening the integration time on the pulses and implementing active baseline restoration. The limiting count rate factor in the scintillation camera is the persistence of the scintillation. The 1/e scintillation time for NaI(Tl) is 250 nanoseconds. At the lower energies at which the scintillation camera typically operates, it is advantageous to capture the entire scintillation signal to optimize energy resolution and intrinsic spatial resolution. In coincidence imaging, shortening the pulse is mandatory. Fortunately, the
Figure 16. Coincidence imaging. Opposed scintillation cameras can acquire PET studies by the addition of coincidence electronics to record the locations of simultaneously detected events. Photos courtesy of GE Medical Systems.
increased signal obtained from a 511-keV interaction (compared to the typical 140 keV) allows shortening the pulse integration without extreme degradation. Note that, even with these efforts, the coincidence cameras are still limited by count rate, and the amount of activity that can be in the camera field of view at the time of imaging is restricted to less than 3 mCi. Thus, one cannot make up for the loss in sensitivity by giving more activity to the patient. Other measures have been taken to help reduce the count rate burden. One of these is a graded absorber. The scintillation camera has to process every event that the crystal absorbs. The only events of interest are the 511-keV photons, but many low-energy photons that result from scatter within the subject also interact with the detector. The photoelectric cross section is inversely proportional to the cube of the gamma ray energy. This means that a thin lead shield placed over the detector will freely pass most 511-keV photons but will strongly absorb low-energy photons. If one uses only lead, there is a problem with the lead characteristic X rays that are emitted as part of the absorption process. These can be absorbed by a tin filter. In turn, the tin characteristic X rays are absorbed by a copper filter and the copper characteristic X rays by an aluminum filter. Even though the detectors are thin, the uncollimated detectors present a large solid angle to the annihilation photons. To achieve maximum sensitivity, it is desirable to accept all coincidences, even those at large angles. Several problems are associated with this. First, the camera sensitivity becomes highly dependent on position. Sources at the central axis of the detectors subtend a large solid angle, whereas those at the edge subtend a very small solid angle. In addition, including the large-angle coincidence events drastically increases the scatter component to more than 50%. Because of this problem, many manufacturers use lead slits aligned perpendicular
to the axial direction to restrict the angular extent of the coincidences. This reduces the scatter component to less than 30% and also reduces the solid angle variability. The intrinsic spatial resolution of hybrid systems is comparable to that of dedicated PET systems, whose FWHM is 4–5 mm. However, the count sensitivity is at least an order of magnitude lower. This, along with the maximum count rate constraint, guarantees that the coincidence camera data will be very count poor and therefore require substantial low-pass filtering when reconstructed. As a result, the quality of the reconstructed images is perceptibly worse than that of dedicated PET images (Fig. 17). In head-to-head comparisons, it has been found that the hybrid systems perform well on tumors greater than 2 cm in diameter located in the lung (54–56). Tumors smaller than 1.5 cm and those located in high-background areas are detected with much lower sensitivity. These results are important because they provide a guide for the useful application of the coincidence camera. Other improvements have also been made in scintillation camera performance for coincidence imaging. As discussed before, the conventional method for determining the location of an interaction on the detector is through a weighted average of PMT signals. At high count rates, this produces unacceptable positioning errors. For example, if two gamma rays are absorbed simultaneously in opposing corners of the detector, the Anger logic will place a single event in the center of the camera. New positioning algorithms have been developed that use maximum likelihood calculations and can correctly handle that scenario. The projection information collected by coincidence cameras requires correction for random coincidences, scatter, and attenuation to produce accurate tomographic images. Typically, randoms are either monitored in a separate time window or are calculated from the singles count rate and are subtracted. Scatter correction is sometimes ignored or is accomplished by subtracting data collected in a separate scatter energy window, as discussed for SPECT imaging. Attenuation correction requires information about the transmission of gamma rays through the body along the coincidence lines of response. Some systems ignore this correction and just reconstruct the random- and scatter-corrected projections. This creates rather severe artifacts
Figure 17. Comparison of PET images from a SPECT/PET hybrid system and a dedicated PET system. (a) The top row of images shows several slices obtained by a dedicated PET system from a patient who has lung cancer. (b) The bottom row shows the corresponding images obtained from a SPECT/PET hybrid system. The intensified body edge seen in both the dedicated PET and SPECT/PET hybrid system is an attenuation artifact.
but still shows the accumulation of 18F FDG in tumor sites (Fig. 17). When attenuation is corrected, a separate transmission study is performed using an external source, and attenuation maps are formed in a manner similar to that discussed for myocardial SPECT. Cs-137 has been used for this purpose in coincidence cameras. Attenuation correction factors for coincidence imaging are very large for annihilation radiation going through thick portions of the body, where they can approach a value of 100.

SUMMARY

SPECT imaging is expected to play a continuing and important role in medical imaging. Future improvements in SPECT instrumentation are likely to include new detectors and collimation schemes. The coincidence scintillation cameras will also continue their evolution by adding more cameras and multidetector levels optimized for SPECT and coincidence imaging. Improvements in reconstruction algorithms will include prior information about the tomographic image such as smoothness constraints and anatomic distributions. The primary motivating factor in SPECT imaging will continue to be the creation and implementation of new radiopharmaceuticals. SPECT will continue in wide use for myocardial perfusion imaging, but SPECT use in tumor imaging will probably experience the largest growth. Applications will include treatment planning for internal radiation therapy as well as diagnostic studies (57).

ABBREVIATIONS AND ACRONYMS

FWHM   full-width-at-half-maximum
MRI    magnetic resonance imaging
ML-EM  maximum likelihood expectation maximization
OS-EM  ordered subsets expectation maximization
PMT    photomultiplier tube
PET    positron emission tomography
SPECT  single photon emission computed tomography
CT     X-ray computed tomography
BIBLIOGRAPHY

1. J. A. Patton, Radiographics 18, 995–1,007 (1998).
2. T. R. Miller, Radiographics 16, 661–668 (1996).
3. R. Hustinx and A. Alavi, Neuroimaging Clin. N. Am. 9, 751–766 (1999).
4. B. L. Holman and S. S. Tumeh, JAMA 263, 561–564 (1990).
5. R. E. Coleman, R. A. Blinder, and R. J. Jaszczak, Invest. Radiol. 21, 1–11 (1986).
6. J. F. Eary, Lancet 354, 853–857 (1999).
7. D. J. Simpkin, Radiographics 19, 155–167; quiz 153–154 (1999).
8. R. J. Jaszczak and R. E. Coleman, Invest. Radiol. 20, 897–910 (1985).
9. F. H. Fahey, Radiographics 16, 409–420 (1996).
10. S. C. Moore, K. Kouris, and I. Cullum, Eur. J. Nucl. Med. 19, 138–150 (1992).
11. B. M. Tsui and G. T. Gullberg, Phys. Med. Biol. 35, 81–93 (1990).
12. J. M. Links, Eur. J. Nucl. Med. 25, 1,453–1,466 (1998).
13. W. L. Rogers and R. J. Ackermann, Am. J. Physiol. Imaging 7, 105–120 (1992).
14. B. M. Tsui et al., J. Nucl. Cardiol. 5, 507–522 (1998).
15. K. J. LaCroix, B. M. Tsui, and B. H. Hasegawa, J. Nucl. Med. 39, 562–574 (1998).
16. M. P. White, A. Mann, and M. A. Saari, J. Nucl. Cardiol. 5, 523–526 (1998).
17. D. S. Berman and G. Germano, J. Nucl. Cardiol. 4, S169–171 (1997).
18. P. Bruyant, J. Sau, J. J. Mallet, and A. Bonmartin, Comput. Biol. Med. 28, 27–45 (1998).
19. H. Barrett, in A. Todd-Pokropek and M. Viergever, eds., Medical Images: Formation, Handling and Evaluation, Springer-Verlag, NY, 1992, pp. 3–42.
20. M. T. Madsen, Radiographics 15, 975–991 (1995).
21. T. F. Budinger et al., J. Comput. Assist. Tomogr. 1, 131–145 (1977).
22. L. Chang, IEEE Trans. Nucl. Sci. NS-25, 638–643 (1978).
23. J. W. Wallis and T. R. Miller, J. Nucl. Med. 34, 1,793–1,800 (1993).
24. B. F. Hutton, H. M. Hudson, and F. J. Beekman, Eur. J. Nucl. Med. 24, 797–808 (1997).
25. L. Shepp and Y. Vardi, IEEE Trans. Med. Imaging 1, 113–122 (1982).
26. H. Hudson and R. Larkin, IEEE Trans. Med. Imaging 13, 601–609 (1994).
27. L. S. Graham, Radiographics 15, 1,471–1,481 (1995).
28. H. Hines et al., Eur. J. Nucl. Med. 26, 527–532 (1999).
29. T. Budinger et al., Mathematics and Physics of Emerging Biomedical Imaging, National Academy Press, Washington, D.C., 1996.
30. J. M. Links et al., J. Nucl. Med. 31, 1,230–1,236 (1990).
31. M. A. King, M. Coleman, B. C. Penney, and S. J. Glick, Med. Phys. 18, 184–189 (1991).
32. M. A. King et al., J. Nucl. Cardiol. 3, 55–64 (1996).
33. B. M. Tsui, X. Zhao, E. C. Frey, and W. H. McCartney, Semin. Nucl. Med. 24, 38–65 (1994).
34. S. L. Bacharach and I. Buvat, J. Nucl. Cardiol. 2, 246–255 (1995).
35. M. T. Madsen et al., J. Nucl. Cardiol. 4, 477–486 (1997).
36. A. Welch, R. Clack, F. Natterer, and G. T. Gullberg, IEEE Trans. Med. Imaging 16, 532–541 (1997).
37. M. A. King, B. M. Tsui, and T. S. Pan, J. Nucl. Cardiol. 2, 513–524 (1995).
38. R. Jaszczak et al., J. Nucl. Med. 34, 1,577–1,586 (1993).
39. A. Celler et al., J. Nucl. Med. 39, 2,183–2,189 (1998).
40. I. Buvat et al., J. Nucl. Med. 36, 1,476–1,488 (1995).
41. F. J. Beekman, C. Kamphuis, and E. C. Frey, Phys. Med. Biol. 42, 1,619–1,632 (1997).
42. F. J. Beekman, H. W. de Jong, and E. T. Slijpen, Phys. Med. Biol. 44, N183–192 (1999).
43. D. R. Haynor, M. S. Kaplan, R. S. Miyaoka, and T. K. Lewellen, Med. Phys. 22, 2,015–2,024 (1995).
44. M. S. Rosenthal et al., J. Nucl. Med. 36, 1,489–1,513 (1995).
45. A. Welch et al., Med. Phys. 22, 1,627–1,635 (1995).
46. A. Welch and G. T. Gullberg, IEEE Trans. Med. Imaging 16, 717–726 (1997).
47. D. J. Kadrmas, E. C. Frey, S. S. Karimi, and B. M. Tsui, Phys. Med. Biol. 43, 857–873 (1998).
48. V. Kohli, M. A. King, S. J. Glick, and T. S. Pan, Phys. Med. Biol. 43, 1,025–1,037 (1998).
49. W. E. Drane et al., Radiology 191, 461–465 (1994).
50. J. S. Fleming and A. S. Alaamer, J. Nucl. Med. 37, 1,832–1,836 (1996).
51. P. K. Leichner et al., J. Nucl. Med. 36, 1,472–1,475 (1995).
52. T. K. Lewellen, R. S. Miyaoka, and W. L. Swan, Nucl. Med. Commun. 20, 5–12 (1999).
53. P. H. Jarritt and P. D. Acton, Nucl. Med. Commun. 17, 758–766 (1996).
54. P. D. Shreve et al., Radiology 207, 431–437 (1998).
55. P. D. Shreve, R. S. Steventon, and M. D. Gross, Clin. Nucl. Med. 23, 799–802 (1998).
56. R. E. Coleman, C. M. Laymon, and T. G. Turkington, Radiology 210, 823–828 (1999).
57. P. B. Zanzonico, R. E. Bigler, G. Sgouros, and A. Strauss, Semin. Nucl. Med. 19, 47–61 (1989).
58. L. S. Graham et al., Med. Phys. 22, 401–409 (1995).

STEREO AND 3-D DISPLAY TECHNOLOGIES

DAVID F. MCALLISTER
North Carolina State University, Raleigh, NC
INTRODUCTION Recently, there have been rapid advancements in 3-D techniques and technologies. Hardware has improved and become considerably cheaper, making real-time and interactive 3-D available to the hobbyist, as well as to the researcher. There have been major studies in areas such as molecular modeling, photogrammetry, flight simulation, CAD, visualization of multidimensional data, medical imaging, teleoperations such as remote vehicle piloting and remote surgery, and stereolithography. In computer graphics, the improvements in speed, resolution, and economy make interactive stereo an important capability. Old techniques have been improved, and new ones have been developed. True 3-D is rapidly becoming an important part of computer graphics, visualization, virtual-reality systems, and computer gaming. Numerous 3-D systems are granted patents each year, but very few systems move beyond the prototype stage and become commercially viable. Here, we treat the salient 3-D systems. First, we discuss the major depth cues that we use to determine depth relationships among objects in a scene.
DEPTH CUES
The human visual system uses many depth cues to eliminate the ambiguity of the relative positions of objects in a 3-D scene. These cues are divided into two categories: physiological and psychological.
Physiological Depth Cues Accommodation. Accommodation is the change in focal length of the lens of the eye as it focuses on specific regions of a 3-D scene. The lens changes thickness due to a change in tension of the ciliary muscle. This depth cue is normally used by the visual system in tandem with convergence. Convergence. Convergence, or simply vergence, is the inward rotation of the eyes to converge on objects as they move closer to the observer. Binocular Disparity. Binocular disparity is the difference in the images projected on the retinas of the left and right eyes in viewing a 3-D scene. It is the salient depth cue used by the visual system to produce the sensation of depth, or stereopsis. Any 3-D display device must be able to produce a left- and right-eye view and present them to the appropriate eye separately. There are many ways to do this as we will see. Motion Parallax. Motion parallax provides different views of a scene in response to movement of the scene or the viewer. Consider a cloud of discrete points in space in which all points are the same color and approximately the same size. Because no other depth cues (other than binocular disparity) can be used to determine the relative depths of the points, we move our heads from side to side to get several different views of the scene (called look around). We determine relative depths by noticing how much two points move relative to each other: as we move our heads from left to right or up and down, the points closer to us appear to move more than points further away. Psychological Depth Cues Linear Perspective. Linear perspective is the change in image size of an object on the retina in inverse proportion to the object’s change in distance. Parallel lines moving away from the viewer, like the rails of a train track, converge to a vanishing point. As an object moves further away, its image becomes smaller, an effect called perspective foreshortening. This is a component of the depth cue of retinal image size. Shading and Shadowing. The amount of light from a light source that illuminates a surface is inversely proportional to the square of the distance from the light source to the surface. Hence, the surfaces of an object that are further from the light source are darker (shading), which gives cues of both depth and shape. Shadows cast by one object on another (shadowing) also give cues to relative position and size. Aerial Perspective. Distant objects tend to be less distinct and appear cloudy or hazy. Blue has a shorter wavelength and penetrates the atmosphere more easily than other colors. Hence, distant outdoor objects sometimes appear bluish. Interposition. If one object occludes, hides, or overlaps (interposes) another, we assume that the object doing
the hiding is closer. This is one of the most powerful depth cues. Retinal Image Size. We use our knowledge of the world, linear perspective, and the relative sizes of objects to determine relative depth. If we view a picture in which an elephant is the same size as a human, we assume that the elephant is further away because we know that elephants are larger than humans. Textural Gradient. We can perceive detail more easily in objects that are closer to us. As objects become more distant, the textures become blurred. Texture in brick, stone, or sand, for example, is coarse in the foreground and grows finer as distance increases. Color. The fluids in the eye refract different wavelengths at different angles. Hence, objects of the same shape and size and at the same distance from the viewer often appear to be at different depths because of differences in color. In addition, bright-colored objects will appear closer than dark-colored objects. The human visual system uses all of these depth cues to determine relative depths in a scene. In general, depth cues are additive; the more cues, the better the viewer can determine depth. However, in certain situations, some cues are more powerful than others, and this can produce conflicting depth information. Our interpretation of the scene and our perception of the depth relationships result from our knowledge of the world and can override binocular disparity. A TECHNOLOGY TAXONOMY The history of 3-D displays is well summarized in several works. Okoshi (1) and McAllister (2) each present histories of the development of 3-D technologies. Those interested in a history beginning with Euclid will find (3) of interest. Most 3-D displays fit into one or more of three broad categories: stereo pair, holographic, and multiplanar or volumetric. Stereo pair-based technologies distribute left and right views of a scene independently to the left and right eyes of the viewer. Often, special viewing devices are required to direct the appropriate view to the correct eye and block the incorrect view to the opposite eye. If no special viewing devices are required, then the technology is called autostereoscopic. The human visual system processes the images and if the pair of images is a stereo pair, described later, most viewers will perceive depth. Only one view of a scene is possible per image pair which means that the viewer cannot change position and see a different view of the scene. We call such images ‘‘virtual.’’ Some displays include head tracking devices to simulate head motion or ‘‘look around.’’ Some technologies allow displaying multiple views of the same scene providing motion parallax as the viewer moves the head from side to side. We discuss these technologies here. In general, holographic and multiplanar images produce ‘‘real’’ or ‘‘solid’’ images, in which binocular disparity, accommodation, and convergence are consistent with the
apparent depth in the image. They require no special viewing devices and hence, are autostereoscopic. Holographic techniques are discussed elsewhere. Multiplanar methods are discussed later. Stereo Pairs The production of stereoscopic photographs (stereo pairs or stereographs) began in the early 1850s. Stereo pairs simulate the binocular disparity depth cue by projecting distinct (normally flat) images to each eye. There are many techniques for viewing stereo pairs. One of the first was the stereoscope, which used stereo card images that can still be found in antique shops. A familiar display technology, which is a newer version of the stereoscope, is the ViewMaster and its associated circular reels (Fig. 1). Because some of the displays described are based on the stereo pair concept, some stereo terminology is appropriate. Terminology. Stereo pairs are based on presenting two different images, one for the left eye (L) and the other for the right eye (R). Stereo images produced photographically normally use two cameras that are aligned horizontally and have identical optics, focus, and zoom. To quantify what the observer sees on the two images, we relate each image to a single view of the scene. Consider a point P in a scene being viewed by a binocular viewer through a window (such as the film plane of a camera). A point P in the scene is projected on the window surface, normally a plane perpendicular to the observer’s line of sight, such as the camera film plane, the face of a CRT, or a projection screen. This projection surface is called the stereo window or stereo plane. We assume that the y axis lies in a plane that is perpendicular to the line through the observer’s eyes.
The distance between the eyes is called the interocular distance. Assigning a Cartesian coordinate system to the plane, the point P will appear on the left eye view at coordinates (xL , yL ) and in the right eye view at coordinates (xR , yR ). These two points are called homologous. The horizontal parallax of the point P is the distance xR − xL between the left- and right-eye views; the vertical parallax is yR − yL (Fig. 2). Positive parallax occurs if the point appears behind the stereo window because the left-eye view is to the left of the right-eye view. Zero parallax occurs if the point is at the same depth as the stereo window; zero parallax defines the stereo window, and negative parallax occurs if the point lies in front of the stereo window (Fig. 3). Given the previous geometric assumptions, vertical parallax or vertical disparity should always be zero. Misaligned cameras can produce nonzero vertical parallax. Observers differ about the amount they can tolerate before getting side effects such as headache, eye strain, nausea,
Figure 2. Horizontal parallax.
Figure 1. ViewMaster.

Figure 3. Positive/negative parallax. P1 appears behind the window, P3 at the plane of the window, and P2 in front of the window.
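The window-projection geometry behind Figs. 2 and 3 is easy to compute. The sketch below is illustrative only; the eye positions, the interocular distance, the viewing distance, and the units (centimeters) are assumptions chosen for the example.

```python
import numpy as np

def horizontal_parallax(point, interocular=6.5, window_distance=60.0):
    """Project a 3-D point onto the stereo window for each eye and return x_R - x_L.

    Eyes at (-e/2, 0, 0) and (+e/2, 0, 0); the stereo window is the plane z = d.
    """
    X, Y, Z = point
    e, d = interocular, window_distance
    t = d / Z                               # ray parameter where the ray crosses the window
    xL = -e / 2 + (X + e / 2) * t
    xR = +e / 2 + (X - e / 2) * t
    return xR - xL                          # equals e * (Z - d) / Z

# behind the window -> positive, at the window -> zero, in front -> negative
for z in (120.0, 60.0, 40.0):
    print(z, round(horizontal_parallax((0.0, 0.0, z)), 2))
```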
or other uncomfortable physical symptoms. Henceforth, the term parallax will mean horizontal parallax. If the horizontal parallax is too large and exceeds the maximum parallax, to view the points, our eyes must go wall-eyed, a condition where the eyes each move to the outside to view the image. After lengthy exposure, this can produce disturbing physical side effects. Images in which the parallax is reversed are said to have pseudostereo. Such images can be very difficult to fuse; the human visual system will have difficulty recognizing the binocular disparity. Other depth cues compete and overwhelm the visual system. Parallax and convergence are the primary vehicles for determining perceived depth in a stereo pair; the observer focuses both eyes on the plane of the stereo window. Hence, accommodation is fixed. In such cases accommodation and convergence are said to be ‘‘disconnected,’’ and the image is ‘‘virtual’’ rather than ‘‘solid’’ (see the section on volumetric images later). This inconsistency between accommodation and convergence can make stereo images difficult for some viewers to fuse. If you cannot perceive depth in a stereo pair, you may be a person who is ‘‘stereo-blind’’ and cannot fuse stereo images (interpret as a 3-D image rather than two separate 2-D images). There are many degrees of stereo-blindness, and the ability or inability to see stereo may depend on the presentation technique, whether the scene is animated, color consistency between the L/R pair, and many other considerations. Computation of Stereo Pairs. Several methods have been proposed for computing stereo pairs in a graphics environment. Certain perception issues eliminate some techniques from consideration. A common technique for computing stereo pairs involves rotating a 3-D scene about an axis parallel to the sides of the viewing screen, followed by a perspective projection. This process can cause vertical displacement because of the foreshortening that occurs in a perspective projection. Hence, the technique is not recommended. Although parallel projection will not produce vertical displacement, the absence of linear perspective can create a ‘‘reverse’’ perspective as the result of a perceptual phenomenon known as Emmert’s law: objects that do not obey linear perspective can appear to get larger as the distance from the observer increases. The preferred method for computing stereo pairs is to use two off-axis centers of perspective projection (corresponding to the positions of the left and right eyes). This method simulates the optics of a stereo camera where both lenses are parallel. For further details, see (2). OVERVIEW OF DISPLAY TECHNOLOGIES Separating Left- and Right-Eye Views When viewing stereo pairs, a mechanism is required so that the left eye sees only the left-eye view and the right eye sees only the right-eye view. Many mechanisms have been proposed to accomplish this. The ViewMaster uses two images each directed to the appropriate eye by lenses. The images are shown in parallel, and there is no way one eye can see any part of the other eye view.
It is common in display technologies to use a single screen to reflect or display both images either simultaneously (time parallel) or in sequence (time multiplexed or field sequential). The technologies used to direct the appropriate image to each eye while avoiding mixing the left- and right-eye images require sophisticated electro-optics or shuttering. Some of the more common methods are described here. Cross Talk Stereo cross talk occurs when a portion of one eye view is visible in the other eye. In this case, the image can appear blurred, or a second or double image appears in regions of the scene being viewed that creates a phenomenon called ghosting. Cross talk can create difficulty in fusing L/R views. When using the same display surface to project both eye views, cross talk can be a problem. When stereo displays are evaluated, the cross talk issue should be addressed. Field-Sequential Techniques A popular method for viewing stereo by a single display device is the field-sequential or time-multiplexed technique. The L/R views are alternated on the display device, and a blocking mechanism to prevent the left eye from seeing the right eye view and vice versa is required. The technology for field-sequential presentation has progressed rapidly. Historically, mechanical devices were used to occlude the appropriate eye view during display refresh. A comparison of many of these older devices can be found in (4). Newer technologies use electro-optical methods such as liquid-crystal plates. These techniques fall into two groups: those that use active versus passive viewing glasses. In a passive system, a polarizing shutter is attached to the display device, as in a CRT, or the screen produces polarized light automatically as in an LCD panel. The system polarizes the left- and right-eye images in orthogonal directions (linear or circular), and the user wears passive polarized glasses where the polarization axes are also orthogonal. The polarizing lenses of the glasses combine with the polarized light from the display device to act as blocking shutters to each eye. When the left eye view is displayed, the light is polarized along an axis parallel to the axis of the left-eye lens and the left eye sees the image on the display. Because the axis is orthogonal to the polarizer of the right eye, the image is blocked to the right eye. The passive system permits several people to view the display simultaneously and allows a user to switch viewing easily from one display device to another because no synchronization with the display device is required. It also permits a larger field of view (FOV). The drawback is that the display device must produce a polarized image. Projector mechanisms must have polarizing lenses, and a CRT or panel display must have a polarizing plate attached to or hanging in front of the screen or the projector. When projecting an image on a screen, the screen must be coated with a material (vapor-deposited aluminum) that does not depolarize the light (the commercially available ‘‘silver’’
screen). Polarization has the added disadvantage that the efficiency or transmission is poor; the intensity of the light to reach the viewer compared to the light emitted from the display device is very low, often in the range of 30%. Hence, images appear dark. LCDs can also be used as blocking lenses. An electronic pulse provided by batteries or a cable causes the lens to ‘‘open’’ or admit light from the display device. When no electronic pulse is present, the lens is opaque and blocks the eye from seeing the display device. The pulses are alternated for each eye while the display device alternates the image produced. The glasses must be synchronized to the refresh of the display device, normally using an infrared signal or a cable connection. For CRT-based systems, this communication is accomplished using the stereo-sync or Left/Right (L/R) signal. In 1997, the Video Equipment Standards Association (VESA) called for the addition of a standard jack that incorporates the L/R signal along with a + 5 volt power supply output. Using this new standard, stereo equipment can be plugged directly into a stereo-ready video card that has this jack. Active glasses have an advantage that the display device does not have to polarize the light before it reaches the viewer. Hence, efficiency is higher and back-projection can be used effectively. The disadvantage is obviously the requirement for synchronization. Though the initial cost of the passive system is higher, the cost to add another user is low. This makes the passive system a good choice for theaters and trade shows, for example, where one does not want to expose expensive eyewear to abuse. If the images in both systems are delivered at a sufficiently fast frame rate (120 Hz) to avoid flicker, the visual system will fuse the images into a three-dimensional image. Most mid- to high-end monitors can do this. A minimum of 100 Hz is acceptable for active eyewear systems. One may be able to use 90 Hz for a passive system without perceiving flicker, even in a well-lit room.
user wore polarized glasses that distributed the correct view to each eye. Polarizing filters can also be attached to glass-mounted slides. Incorrect positioning of the projectors relative to the screen can cause keystoning, in which the image is trapezoidal caused by foreshortening that results in vertical parallax. If more than one projector is used, as is often the case when projecting 35-mm stereo slides, for example, orthogonal polarizing filters are placed in front of each projector, and both left- and right-eye images are projected simultaneously onto a nondepolarizing screen. Hence, the technique is time parallel. The audience wears passive glasses in this case. Using more than one projector always brings with it the difficulties of adjusting the images. L/R views should be correctly registered; there must be minimal luminosity differences, minimal size differences, minimal keystoning, minimal vertical parallax, minimal ghosting, and so forth. Most nonautostereoscopic display systems use one of these methods. Following, we indicate which method. 3D DISPLAYS VIEWING DEVICES REQUIRED Hard Copy
Time-Parallel Techniques
Anaglyphs. The anaglyph method has been used for years to represent stereo pairs, and it was a salient technique in old 3-D movies and comic books. Colored filters cover each eye; red/green, red/blue, or red/cyan filters are the most common. One eye image is displayed in red and the other in green, blue, or cyan, so that the appropriate eye sees the correct image. Because both images appear simultaneously, it is a time-parallel method. The technique is easy to produce using simple image processing techniques, and the cost of viewing glasses is very low. Gray-scale images are most common. Pseudocolor or polychromatic anaglyphs are becoming more common. If correctly done, anaglyphs can be an effective method for presenting stereo images.
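A gray-scale red/cyan anaglyph of the kind just described can be assembled with a few lines of array code. This is an illustrative sketch with a hypothetical function name; it assumes the common convention of a red filter over the left eye and performs no color adjustment or registration.

```python
import numpy as np

def red_cyan_anaglyph(left, right):
    """Combine a left/right gray-scale stereo pair (2-D arrays, 0-255) into one anaglyph image."""
    left = np.asarray(left, dtype=np.uint8)
    right = np.asarray(right, dtype=np.uint8)
    rgb = np.zeros(left.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = left          # red channel carries the left-eye view
    rgb[..., 1] = right         # green and blue (cyan) carry the right-eye view
    rgb[..., 2] = right
    return rgb
```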
Time-parallel methods present both eye views to the viewer simultaneously and use optical techniques to direct each view to the appropriate eye. Often, 3-D movies used the anaglyph method that requires the user to wear glasses that have red and green lenses or filters. Both images were presented on a screen simultaneously; hence, it is a time-parallel method. Many observers suffered headaches and nausea when leaving the theater, which gave 3-D, and stereo in particular, a bad reputation. (A phenomenon called ghosting or cross talk was a significant problem. Colors were not adjusted correctly, and the filters did not completely eliminate the opposite-eye view, so that the left eye saw its image and sometimes part of the right-eye image as well. Other problems included poor registration of the left- and righteye images that caused vertical parallax and projectors out of sync.) The ViewMaster is another example of a time-parallel method. An early technique for viewing stereo images on a CRT was the half-silvered mirror originally made for viewing microfiche (4). The device had polarizing sheets, and the
Vectographs. Polaroid’s Vectograph process was introduced by Edwin Land in 1940. The earliest Vectograph images used extensively were black-and-white polarizing images formed by iodine ink applied imagewise to oppositely oriented polyvinyl alcohol (PVA) layers laminated to opposite sides of a transparent base material. The iodine forms short polymeric chains that readily align with the oriented polymeric molecules and stain the sheet. The chemistry is analogous to that of uniformly stained iodine polarizers, such as Polaroid H-sheet, used in polarizing filters for stereo projection and in 3-D glasses used for viewing stereoscopic images [see (2) for more details]. In 1953, Land demonstrated three-color Vectograph images formed by successive transfer of cyan, magenta, and yellow dichroic dyes from gelatin relief images to Vectograph sheet. Unlike StereoJet digital ink-jet printing described next, preparation of Vectograph color images required lengthy, critical photographic and dye transfer steps. Although the process produced excellent images, it was never commercialized.
StereoJet. The StereoJet process, developed at the Rowland Institute for Science in Cambridge, Massachusetts, provides stereoscopic hard copy in the form of integral, full-color polarizing images. StereoJet images are produced by ink-jet printing that forms polarizing images by using inks formulated from dichroic dyes. Paired left-eye and right-eye images are printed onto opposite surfaces of a clear multilayer substrate, as shown in Fig. 4. The two outer layers, formed of an ink-permeable polymer such as carboxymethylcellulose, meter the ink as it penetrates the underlying image-receiving layers. The image-receiving layers are formed of polyvinyl alcohol (PVA) molecularly oriented at 45° to the edge of the sheet. As the dye molecules are adsorbed, they align with the oriented polymer molecules and assume the same orientation. The two PVA layers are oriented at 90° to one another, so that the images formed have orthogonal polarization. StereoJet transparencies are displayed directly by rear illumination or projected by overhead projector onto a nondepolarizing screen, such as a commercially available lenticular ‘‘silver’’ screen. No attachments to the projector are needed because the images themselves provide the polarization. StereoJet prints for viewing by reflected light have aluminized backing laminated to the rear surfaces of StereoJet transparencies. ChromaDepth. Chromostereoscopy is a phenomenon in optics commercialized by Richard Steenblik (2). The technique originally used double prism-based glasses that slightly deflect different colors in an image, laterally displacing the visual positions of differently colored regions of an image by different amounts. The prisms are oriented in opposite directions for each eye, so that different images are presented to each eye, thereby creating a stereo pair (Fig. 5). Production chromostereoscopic glasses, marketed under the name
Figure 4. StereoJet imaging. One of the two superimposed images appears in full contrast to the left eye and is invisible to the right eye; the other appears in full contrast to the right eye and is invisible to the left eye.
Figure 5. Superchromatic glasses.
ChromaDepth 3-D, use a unique micro-optic film that performs the same optical function as double-prism optics without the attendant weight and cost. Images designed for viewing with ChromaDepth 3-D glasses use color to encode depth information. A number of color palettes have been successfully employed; the simplest is the RGB on Black palette: on a black background, red will appear closest, green in the middleground, and blue in the background. Reversal of the optics results in the opposite depth palette: BGR on Black. A peculiar feature of the ChromaDepth 3-D process is that the user does not have to create a stereo pair. A single ChromaDepth 3-D color image contains X, Y, and Z information by virtue of the image contrast and the image colors. The stereo pair seen by the user is created by the passive optics in the ChromaDepth 3-D glasses. The primary limitation of the ChromaDepth 3-D process is that the colors in an image cannot be arbitrary if they are to carry the image’s Z dimension; so the method will not work on arbitrary images. The best effects are obtained from images that are specifically designed for the process and from natural images, such as underwater reef photographs, that have natural coloring fitting the required palette. Another limitation is that some color ‘‘fringing’’ can occur when viewing CRT images. The light emitted from a CRT consists of different intensities of red, green, and blue; any other color created by a CRT is a composite of two or more of these primary colors. If a small region of a composite color, such as yellow, is displayed on a CRT, the optics of the ChromaDepth 3-D glasses may cause the composite color to separate into its primary components and blur the region. ChromaDepth 3-D high definition glasses reduce this problem by placing most of the optical power in one eye, leaving the other eye to see the image clearly. The ChromaDepth 3-D technique can be used in any color medium. It has found wide application in laser shows and in print, video, television, computer graphic, photographic slide, and Internet images. Many areas of research have benefited from ChromaDepth 3-D, including interactive visualization of geographic and geophysical data.
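The RGB on Black palette described above (red nearest, green in the middleground, blue in the background) can be approximated by a simple depth-to-color mapping. The sketch below is only an illustration of the idea; real ChromaDepth 3-D artwork is usually tuned by hand, and the linear ramps used here are an assumption.

```python
import numpy as np

def chromadepth_palette(depth):
    """Map normalized depth (0 = nearest, 1 = farthest) to an RGB-on-Black style color."""
    depth = np.clip(np.asarray(depth, dtype=float), 0.0, 1.0)
    r = np.clip(1.0 - 2.0 * depth, 0.0, 1.0)     # red fades out by mid-depth
    g = 1.0 - np.abs(2.0 * depth - 1.0)          # green peaks at mid-depth
    b = np.clip(2.0 * depth - 1.0, 0.0, 1.0)     # blue fades in past mid-depth
    return np.stack([r, g, b], axis=-1)
```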
Transparency Viewers. Cheap plastic and cardboard slide viewers are available from many companies, such as Reel 3-D Enterprises (http://stereoscopy.com/reel3D/index.html), for viewing 35-mm stereo slides. The user places the left-eye view in the left slot and the right-eye view in the right slot and then holds the viewer up to the light. This is a standard technique for checking the mounting of slides for correct registration.

Field-Sequential Devices

StereoGraphics Systems. Although there are many manufacturers of active and passive glasses systems, StereoGraphics is a well-known company that has produced high-quality CRT- and RGB-projector-based stereo systems for years. The quality of their hardware is excellent, and we report on it here. Active StereoGraphics shutters called CrystalEyes (Fig. 6) are doped, twisted-nematic devices. They ‘‘open’’ in about 3 ms and ‘‘close’’ in about 0.2 ms. The shutter transition occurs within the vertical blanking period of the display device and is all but invisible. The principal figure of merit for such shutters is the dynamic range, the ratio of the transmission of the shutter in its open state to that in its closed state. The CrystalEyes system has a ratio in excess of 1000 : 1. The transmission of the shutters is commonly 32%, but because of the 50% duty cycle, the effective transmission is half that. The transmission should be neutral and impart little color shift to the image being viewed. The field of view (FOV) also varies; ninety-seven degrees is typical. On SGI workstations, the system can operate at up to 200 fields per second. The cost for eyewear and emitter is $1000. Passive systems have a lower dynamic range than active eyewear systems. The phosphor afterglow on the CRT causes ghosting, or image cross talk, in this type of system. Electrode segmentation can be used to minimize the time during which the modulator is passing an unwanted image. The modulator’s segments change state moments before the CRT’s scanning beam arrives at that portion of the screen. The consequence of this action is a modulator that changes state just as the information is changing. This increases the effective dynamic range of the system and produces a high-quality stereo image.
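As a worked illustration of the figures quoted above: a shutter that transmits 32% of the light in its open state and runs at a 50% duty cycle delivers an effective transmission of 0.32 × 0.5 = 16% to each eye, and a dynamic range in excess of 1000 : 1 means that the closed shutter leaks no more than about 32%/1000 = 0.032% of the incident light.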
Figure 6. Active glasses CrystalEyes system.
Figure 7. Passive glasses ZScreen system.
This segmentation technique is used by StereoGraphics in their ZScreen system (Fig. 7). A Monitor ZScreen system costs $2200. The above-and-below format is used on personal computers that do not have a stereo sync output. The left image is placed on the top half of the CRT screen and the right image on the bottom half, thus reducing the resolution of the image. Chasm Graphics makes a software program called Sudden Depth that will format the images this way. The stereo information now exists but needs an appropriate way to send each left or right image to the proper eye. The StereoGraphics EPC-2 performs this task. The EPC-2 connects to the computer’s VGA connector and intercepts the vertical sync signal. When enabled, the unit adds an extra vertical sync pulse halfway between the existing pulses, which causes the monitor to refresh at twice the original rate. In effect, this stretches the two images to fill the whole screen and show field-sequential stereo. The EPC-2 acts as an emitter for CrystalEyes or can be used to create a left/right signal to drive a liquid-crystal modulator or other stereo product. The EPC-2 is the same size as the other emitters and has approximately the same range. Its cost is $400.

The Pulfrich Technique. Retinal sensors require a minimum number of photons to fire and send a signal to the visual system. When one eye is covered by a neutral density filter (like a lens in a pair of sunglasses), the signal from that eye reaches the visual system slightly later. Hence, if an object is in motion in a scene, the eye behind the filter sees the position of the object later than the uncovered eye. Therefore, the images perceived by the left and right eyes will be slightly different, and the visual system will interpret the result as a stereo pair.
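As a rough back-of-the-envelope sketch of this effect (the delay and motion values below are assumed for illustration and are not taken from the text), the filter-induced delay can be converted into an equivalent screen disparity:

    # Pulfrich effect: a neutral density filter delays one eye's signal, so a
    # horizontally moving object acquires an effective left/right disparity.
    filter_delay_s = 0.01          # assumed perceptual delay from the dark filter (s)
    object_speed_px_per_s = 400.0  # assumed horizontal speed of the object on screen

    # The delayed eye sees the object where it was filter_delay_s earlier, so the
    # equivalent disparity (in pixels) is simply speed x delay.
    disparity_px = object_speed_px_per_s * filter_delay_s
    print(f"equivalent disparity: {disparity_px:.1f} pixels")

    # The sign of the motion (and which eye is filtered) decides whether the
    # parallax is positive (object appears behind the screen) or negative
    # (object appears in front), as described in the following paragraph.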
If the motion of an object on a display device is right to left and the right eye is covered by the filter, then a point on the object will be seen by the left eye before the right eye. This will be interpreted by the visual system as positive parallax, and the object will appear to move behind the stereo window. Similarly, an object moving from left to right will appear in front of the display device. The reader can implement the technique easily using one lens of a pair of sunglasses while watching TV.

The Fakespace PUSH Display. Fakespace Lab’s PUSH desktop display uses a box-shaped binocular viewing device that has attached handles and is mounted on a triad of cylindrical sensors (Fig. 8). The device allows the user to move the viewing device and simulate limited movement within a virtual environment. The field of view can be as large as 140° on CRT-based systems. The cost is US $25,000 for the 1024 × 768 CRT and US $9,995 for the 640 × 480 LCD version. A variation that permits more viewer movement is the Boom (Fig. 9). The binocular viewing device is attached to a large arm configured like a 3-D digitizer that signals the position of the viewer using sensors at the joints of the arm. The viewer motion is extended to a circle 6 ft in diameter. Vertical movement is limited to 2.5 ft. The Boom sells for US $60,000. A hands-free version is available for US $85,000.
Figure 9. Fakespace Lab’s Boom.
Workbench Displays. Smaller adjustable table-based systems such as the Fakespace ImmersaDesk R2 (Fig. 10) and ImmersaDesk M1 are available. The systems use the active glasses stereo technique. The fully portable R2 sells for approximately US $140,000, including tracking. The M1 sells for US $62,995.
Figure 10. Fakespace ImmersaDesk R2.
Figure 8. The Fakespace Lab’s PUSH desktop display.
VREX Micropolarizers. VREX has patented what they call µPol (micropolarizer) technology, an optical device that can change the polarization of an LCD display line by line. It is a periodic array of microscopically small polarizers that spatially alternate between mutually perpendicular polarizing states. Each micropolarizer can be as small as 10 millionths of a meter. Hence, a µPol could have more than 6 million micropolarizers of alternating polarization states per square inch in a checkerboard configuration, or more than 2,500 lines per inch in a one-dimensional configuration. In practice, the µPol encodes the left-eye image on even lines and the right-eye image on odd lines. Passive polarized glasses are needed to view the image.
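A minimal sketch of this line-by-line encoding is shown below (illustrative only; the array names and the convention that row 0 counts as an even line are assumptions):

    import numpy as np

    def interleave_stereo(left, right):
        """Combine two equal-sized RGB images into a single row-interleaved frame.

        Even rows (0, 2, 4, ...) carry the left-eye image and odd rows carry the
        right-eye image, matching the muPol encoding described above.
        """
        if left.shape != right.shape:
            raise ValueError("left and right images must have the same shape")
        composite = np.empty_like(left)
        composite[0::2] = left[0::2]    # even lines: left-eye view
        composite[1::2] = right[1::2]   # odd lines: right-eye view
        return composite

    # Example with two synthetic 600 x 800 RGB frames.
    left = np.zeros((600, 800, 3), dtype=np.uint8)
    right = np.full((600, 800, 3), 255, dtype=np.uint8)
    frame = interleave_stereo(left, right)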
The µPol approach requires a single-frame stereoscopic format that combines a left-eye perspective view with a right-eye perspective view to form a composite image containing both left- and right-eye information alternating line by line. VREX provides software to combine left- and right-eye views into a single image, and all VREX hardware supports this image format. The advantages of µPol include the ability to run at lower refresh rates because both eyes are presented with a (lower resolution) image simultaneously, and hence the presentation is time parallel.

LARGE FORMAT DISPLAYS

One of the objectives of virtual reality is to give the user the feeling of immersion in a scene. This has been accomplished in various ways. Head-mounted displays are a common solution, but in general they have a limited field of view and low resolution. In addition, allowing the user to move in space requires position tracking, which has been a difficult problem to solve. Position tracking results in image lag, the result of the time required to sense that the viewer’s position has changed, signal the change to the graphics system, render the scene change, and then transmit it to the head-mounted display. Any system that must track the viewer and change the scene accordingly must treat this problem. The lag can produce motion sickness in some people. Projection systems have been developed that use large projection surfaces to simulate immersion. In some cases, the user is permitted to move about. In others, the user is stationary, and the scene changes.
Figure 11. Fakespace CAVE, front view.
IMAX

Most readers are familiar with the large-screen IMAX system, which employs a large flat screen to give the illusion of peripheral vision. When projecting stereo, IMAX uses the standard field-sequential polarized projection mechanism in which the user wears passive glasses. Similar techniques are used in the Kodak flat-screen 3-D movies at Disney.

Fakespace Systems Displays

Fakespace Systems markets immersive displays that are similar to immersive technologies produced by several other companies. The walk-in, fully immersive CAVE is an extension of flat-screen stereo. The CAVE system was developed at the Electronic Visualization Lab of the University of Illinois; the user stands in a 10 × 10 ft room that has flat walls (Figs. 11 and 12). A separate stereo image is back-projected onto each wall, and onto the floor and possibly the ceiling, to give the user the feeling of immersion. Image management is required so that the scenes on each wall fit together seamlessly to replicate the single surrounding environment. Because the system uses back-projection, it requires active shuttering glasses. The user can interact with the environment using 3-D input devices such as gloves and other navigational tools. The system sells for
Figure 12. Fakespace CAVE, inside.
approximately US $325,000 to $500,000, depending on the projection systems used. Fakespace also produces an immersive WorkWall whose screen size is up to 8 × 24 ft (Fig. 13). The system uses two or more projectors whose images are blended to create a seamless display. As in the CAVE, the user can interact with the image using various 2-D and 3-D input devices. The cost is approximately US $290,000 for an 8 × 24 ft three-projector system.

The VisionDome

Elumens Corporation Vision Series displays (5–8) use a hemispherical projection screen that has a single projection lens. Previous dome-based systems relied on multiple projectors and seamed-together output from multiple computers, making them both complicated to configure and prohibitively expensive. The high cost, complexity, and nonportability of these systems made them suitable for highly specialized military and training applications, but they were impractical and out of reach for most corporate users. Available in sizes from 1.5 to 5 meters in diameter, which accommodate from one to forty
people, the VisionDome systems range in price from US $15,000 to US $300,000. The projector is equipped with a patented ‘‘fish-eye’’ lens that provides a 180° field of view. This single projection source completely fills the concave screen with light. Unlike other fish-eye lenses, whose projections produce focal ‘‘hot spots’’ and nonlinear distortions, the Vision Series lens uses linear angular projection to provide uniform pixel distribution and uniform pixel size across the entire viewing area. The lens also provides an infinite depth of field, so images remain in focus on screens from 0.5 meters away to theoretical infinity at all points on the projection surface. The single-user VisionStation displays 1024 × 768 pixels at 1000 lumens; larger 3- to 5-meter VisionDomes display up to 1280 × 1024 pixels at 2000 lumens (Fig. 14). Elumens provides an application programming interface called SPI (Spherical Projection of Images). Available for both OpenGL and DirectX applications, SPI is an image-based methodology for displaying 3-D data on a curved surface. It enables off-axis projection that permits arbitrary placement of the projector on the face plane of the screen. The image-based projection depends on the viewer position; if the viewer moves, the image must change accordingly, or straight lines become curved. The number of viewers within the viewing ‘‘sweet spot’’ increases as the screen diameter increases. Field-sequential stereo imaging with synchronized shutter glasses is supported on Elumens products. The maximum refresh rate currently supported is 85 Hz (42.5 Hz per stereo pair). Passive stereo that projects left- and right-eye images simultaneously but of opposite polarization is currently under development.

Figure 13. The Fakespace WorkWall.

AUTOSTEREOSCOPIC DISPLAYS: NO VIEWING DEVICES REQUIRED
Hard Copy

Free Viewing. With practice, most readers can view stereo pairs without the aid of blocking devices by using a technique called free viewing. There are two types of free viewing, distinguished by the way the left- and right-eye images are arranged. In parallel, or uncrossed, viewing, the left-eye image is to the left of the right-eye image. In transverse, or cross, viewing, they are reversed, and crossing the eyes is required to form an image in the center. Some people can do both types of viewing, some only one, some neither. In Fig. 15, the eye views have been arranged in left/right/left order. To parallel view, look at the left two images. To cross view, look at the right two images. Figure 16 is a random dot autostereogram in which the scene is encoded in a single image, as opposed to a stereo pair (9). There are no depth cues other than binocular disparity. Using cross viewing, merge the two dots beneath the image to view the functional surface. Crossing your eyes even further will produce other images. [See (10) for a description of the method for generating these interesting images.]

Holographic Stereograms. Most readers are familiar with holographic displays, which reconstruct solid images. Normally, a holographic image of a three-dimensional scene has the ‘‘look around’’ property. A popular combination of holography and stereo pair technology, called a holographic stereogram, involves recording a set of 2-D images, often perspective views of a scene, on a piece of holographic film. The film can be bent to form a cylinder, so that the user can walk around the cylinder to view the scene from any aspect. At any point, the left eye will see one view of the scene and the right eye another; that is, the user is viewing a stereo pair.
Figure 14. The VisionDome.

Figure 15. Free viewing examples (left-eye view, right-eye view, left-eye view).
Figure 16. A random dot autostereogram of cos[(x² + y²)^(1/2)] for −10 ≤ x, y ≤ 10.
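A simplified version of the usual single-image random-dot stereogram construction can be sketched as follows. This is an illustrative toy only; the separation values are assumptions, and it omits the hidden-surface and symmetry refinements discussed in (10):

    import numpy as np

    def random_dot_stereogram(depth, base_sep=90, depth_sep=30, seed=0):
        """Build a single-image random-dot stereogram from a depth map in [0, 1].

        Dots are repeated horizontally at a separation that shrinks where the
        surface is nearer, so the eyes fuse each row into a 3-D surface.
        """
        rng = np.random.default_rng(seed)
        h, w = depth.shape
        img = np.zeros((h, w), dtype=np.uint8)
        for y in range(h):
            for x in range(w):
                s = int(round(base_sep - depth_sep * depth[y, x]))
                if x >= s:
                    img[y, x] = img[y, x - s]             # copy the dot one period to the left
                else:
                    img[y, x] = 255 * rng.integers(0, 2)  # seed the row with random dots
        return img

    # Surface in the spirit of Fig. 16: z = cos((x^2 + y^2)^(1/2)), -10 <= x, y <= 10.
    y, x = np.mgrid[-10:10:300j, -10:10:300j]
    z = np.cos(np.sqrt(x**2 + y**2))
    stereogram = random_dot_stereogram((z - z.min()) / (z.max() - z.min()))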
Conventional display holography has long been hampered by many constraints such as limitations with regard to color, view angle, subject matter, and final image size. Despite the proliferation of holographic stereogram techniques in the 1980s, the majority of the constraints remained. Zebra Imaging, Inc. expanded on the developments in one-step holographic stereogram printing techniques and has developed the technology to print digital full-color reflection holographic stereograms that have a very wide view angle (up to 110° ), are unlimited in size, and have full parallax. Zebra Imaging’s holographic stereogram technique is based on creating an array of small (1- or 2-mm) square elemental holographic elements (hogels). Much like the pixels of two-dimensional digital images, hogel arrays can be used to form complete images of any size and resolution. Each hogel is a reflection holographic recording on panchromatic photopolymer film. The image recorded in each hogel is of a two-dimensional digital image on a spatial light modulator (SLM) illuminated by laser light in the three primary colors: red, green, and blue (Fig. 17).
Parallax Barrier Displays. A parallax barrier (2) consists of a series of fine vertical slits in an otherwise opaque medium. The barrier is positioned close to an image that has been recorded in vertical slits and backlit. If the vertical slits in the image have been sampled at the correct frequency relative to the slits in the parallax barrier and the viewer is the required distance from the barrier, then the barrier will occlude the appropriate image slits to the right and left eyes, respectively, and the viewer will perceive an autostereoscopic image (Fig. 18). The images can be made panoramic to some extent by recording multiple views of a scene. As the viewer changes position, different views of the scene will be directed by the barrier to the visual system. The number of views is limited by the optics and, hence, moving horizontally beyond a certain point will produce ‘‘image flipping’’ or cycling of the different views of the scene. High resolution laser printing has made it possible to produce very high quality images: the barrier is printed on one side of a transparent medium and the image on the other. This technique was pioneered by Artn in the early 1990s to produce hard-copy displays and is now being used by Sanyo for CRT displays. Lenticular Sheets. A lenticular sheet (1,2) consists of a series of semicylindrical vertical lenses called ‘‘lenticles,’’ typically made of plastic. The sheet is designed so the parallel light that enters the front of the sheet will be focused onto strips on the flat rear surface (Fig. 19). By recording an image in strips consistent with the optics of the lenticles, as in the parallax barrier display, an autostereoscopic panoramic image can be produced. Because the displays depend on refraction versus occlusion, the brightness of a lenticular sheet display is usually superior to the parallax barrier and requires no backlighting. Such displays have been mass produced for many years for such hard-copy media as postcards. In these two techniques, the image is recorded in strips behind the parallax barrier or the lenticular sheet. Although the techniques are old, recent advances in printing and optics have increased their popularity for both hard-copy and autostereoscopic CRT devices. In both the lenticular and parallax barrier cases, multiple views of a scene can be included to provide
Figure 17. Zebra Imaging holographic stereogram recording.
motion parallax as viewers move their heads from side to side, creating what is called a panoramagram. Recently, parallax barrier liquid-crystal imaging devices have been developed that can be driven by a microprocessor and used to view stereo pairs in real time without glasses. Some of these techniques are discussed later.

Figure 18. Parallax barrier display.

Figure 19. Lenticular sheet display.

Alternating Pairs

The output from two vertically mounted video cameras is combined. An integrating circuit was designed to merge the two video streams by recording a fixed number of frames from one camera, followed by the same number of frames from the other camera. The technique imparts a vertical rocking motion to the image. If the scene has sufficient detail and the speed of the rocking motion and the angle of rotation are appropriate for the individual viewing the system, most viewers will fuse a 3-D image. The system was commercialized under the name VISIDEP. The technique can be improved using graphical and image-processing methods. More details can be found in (2).

Moving Slit Parallax Barrier

A variation of the parallax barrier is a mechanical moving-slit display popularized by Homer Tilton, who called it the Parallactiscope (2). A single vertical slit is vibrated horizontally in front of a point-plotting output display such as a CRT or oscilloscope. The image on the display is synchronized with the vibration to produce an autostereoscopic image. Many variants have been proposed, but to date the author knows of no commercially viable products using the technique.

The DTI System

The Dimension Technologies, Inc. (DTI) illuminator is used to produce what is known as a multiperspective autostereoscopic display. Such a display produces multiple images of a scene; each is visible from a well-defined region of space called a viewing zone. The images are all 2-D perspective views of the scene as it would appear from the center of the zones. The viewing zones are of such a size and position that an observer sitting in front of the display always has one eye in one zone and the other eye in another. Because the two eyes see different images in different perspectives, a 3-D image is perceived. The DTI system is designed for use with an LCD or other transmissive display. The LCD is illuminated from behind, and the amount of light passing through individual elements is controlled to form a full-color image. The DTI system uses an LCD backlight technology which they call parallax illumination (11). Figures 20 and 21 illustrate the basic concept. As shown in Fig. 20, a special illuminator is located behind the LCD. The illuminator generates a set of very thin, very bright, uniformly spaced vertical lines. The lines are spaced with respect to pixel columns such that (because of parallax) the left eye sees all the lines through the odd columns of the LCD and the right eye sees them through even columns. There is a fixed relation between the distance of the LCD to the illumination plate and the distance of the viewer from the display. This in part determines the extent of the ‘‘viewing zones.’’ As shown in Fig. 21, viewing zones are diamond-shaped areas in front of the display where all of the light lines are seen behind the odd or even pixel columns of the LCD. To display 3-D images, left- and right-eye images of a stereoscopic pair are placed in alternate columns of elements. The left image appears in the odd columns, and the right image is displayed in even columns. Both left and right images are displayed simultaneously, and hence

Figure 20. DTI illuminator.
Figure 21. Viewing zones.
the display is time parallel. Because the left eye sees the light lines behind the odd columns, it sees only the left-eye image displayed in the odd columns. Similarly, the right eye sees only the right-eye image displayed in the even columns.

The 2-D/3-D Backlight System. There are many ways to create the precise light lines described before. One method used in DTI products is illustrated in Fig. 22 (12,13). The first component is a standard off-the-shelf backlight of the type used for conventional 2-D LCD monitors. This type of backlight uses one or two miniature fluorescent lamps as light sources in combination with a flat, rectangular light guide. Two straight lamps along the top and bottom of the guide are typically used for large displays; a single U-shaped lamp is typically used for smaller displays. An aluminized reflector is placed around the lamp(s) to reflect light into the light guide.
The flat, rectangular light guide is typically made of acrylic or some other clear plastic. Light from the lamp enters the light guide from the sides and travels through it by total internal reflection from the front and back surfaces of the guide. The side of the light guide facing away from the LCD possesses a pattern of reflective structures designed to reflect light into the guide and out the front surface. Several possible choices for such structures exist, but current manufacturers usually use a simple pattern of white ink dots applied to the rear surface of the light guide in combination with a white reflective sheet placed behind the light guide. The second component is a simple, secondary LCD which, in the ‘‘on’’ state, displays a pattern of dozens of thin, transparent lines that have thicker opaque black stripes between them. These lines are used for 3-D imaging as described in the previous section. The third major component is a lenticular lens, again shown in Fig. 22. This lens consists of a flat substrate on whose front surface hundreds of vertical, parallel cylindrical lenslets are molded. Light coming through the dozens of thin transparent lines on the secondary LCD is reimaged into thousands of very thin, evenly spaced vertical lines by the lenticular lens array spaced apart from and in front of the secondary LCD. The lines can be imaged onto an optional front diffuser located in a plane at one focal length from the lenticular lenslets. The pitch (center-to-center distance) of the lines on the light guide and the lenticular lenses must be chosen so that the pitch of the light lines reimaged by the lenticular lenslets bears a certain relationship to the pitch of the LCD pixels. Because the displays are likely to be used for conventional 2-D applications (such as word processing and spreadsheets) as well as 3-D graphics, the system must be capable of generating illumination so that each eye sees all of the pixels of the LCD and a conventional full-resolution 2-D image can be displayed by using conventional software. Note that when the secondary LCD is off, in other words in the clear state where the lines are not generated, the even, diffuse light from the backlight passes through it freely and remains even and diffuse after being focused by the lenticular lens. Therefore, when the secondary LCD is off, no light lines are imaged, the observer sees even, diffuse illumination behind all of the pixels of the LCD, each eye can see all of the pixels, and full-resolution 2-D images can be viewed. DTI sells two displays: a 15-inch model at $1,699, with optional video input at $300 extra, and an 18.1-inch model at $6,999, video included. Both have 2-D and 3-D modes and accept the standard stereo formats (field sequential, frame sequential, side by side, top/bottom).
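The parallax geometry that fixes these pitches can be sketched with similar triangles. The following minimal calculation is an illustration only; the eye separation, viewing distance, and pixel pitch are assumed example values, not DTI design data:

    # Parallax illumination geometry (similar triangles), illustrative values only.
    eye_sep = 65.0          # assumed interocular distance (mm)
    view_dist = 600.0       # assumed design viewing distance from the LCD (mm)
    pixel_pitch = 0.10      # assumed LCD column pitch (mm)

    # Gap between the LCD and the plane of light lines: chosen so that the two
    # eyes see the same light line through adjacent pixel columns.
    gap = pixel_pitch * view_dist / (eye_sep - pixel_pitch)

    # Pitch of the light lines: one line behind every second column, as seen
    # from the design viewing distance.
    line_pitch = 2.0 * pixel_pitch * (view_dist + gap) / view_dist

    print(f"illuminator gap  : {gap:.3f} mm behind the LCD")
    print(f"light-line pitch : {line_pitch:.4f} mm")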
Figure 22. Backlight system.

Seaphone Display
Figure 23 shows a schematic diagram of the Seaphone display (14–16). A special transparent µPol-based color liquid-crystal imaging plate (LCD : SVGA 800 × 600) that has a lenticular sheet and a special backlight unit is used to produce a perspective image for each eye. The lenticular
Figure 23. A schematic of the Seaphone display.

Figure 25. Plan view of a backlight unit.
sheet creates vertical optical scattering. Horizontal strips of two types of micropolarizers that have orthogonal polarization axes are applied to the odd and even lines of the LCD. The backlight unit consists of a large format convex lens and a white LED array filtered by polarizers whose axes of polarization are the same as those of the µPol array. The large format convex lens is arranged so that an image of the viewers is focused on the white LED array. The light from the white LED array illuminates the right half of the viewer’s face through the odd (or even) field of the LCD when the geometrical condition is as indicated in Fig. 25. The viewers’ right eyes perceive the large convex lens as a full-size bright light source, and the viewers’ left eyes perceive it as a dark one; the other field works similarly for the left eye (see Fig. 24). For head tracking, the viewers’ infrared image is focused on the diffuser by the large format convex lens and is captured by the infrared camera. An image circuit modulates the infrared image and produces binary half right and left face images of each viewer. The binary half-face images are displayed on the appropriate cells of the white LED array. The infrared image is captured by using the large format convex lens. There is no parallax
in the captured infrared image when the image is focused on the white LED array. Hence, the viewers’ displayed binary half right-face image (the appropriate cells) and the viewers’ image that is focused by the large format convex lens are automatically superimposed on the surface of the white LED array. The bright areas of the binary half-face images (the appropriate cells) are distributed to the correct eye of the viewers. On the Seaphone display, several viewers can perceive a stereo pair simultaneously, and they can move independently without special attachments. The display currently costs 1,492,000 yen.

The Sanyo Display

The Sanyo display uses LC technology for both image presentation and a parallax barrier (17). Because the thermal expansion coefficients are the same, registration is maintained under different operating conditions. They call the parallax barrier part of the display the ‘‘image splitter,’’ and they use two image splitters, one on each side of the LC (image presentation) panel (Fig. 26). The splitter on the backlight side consists of two-layer thin films of evaporated aluminum and chromium oxide. The vertical stripes are produced by etching, and the stripe pitch is slightly larger than twice the dot pitch of the LC panel. The viewer-side splitter is a low-reflection layer whose stripe pitch is slightly smaller than twice the dot pitch of the LC image presentation panel. Each slit corresponds to a column of the LC panel. They claim that the technique produces no
Figure 24. Each perspective backlight.

Figure 26. A double image splitter.
ghosting. They also have a head-tracking system in which the viewer does not have to wear any attachments.

The HinesLab Display

An autostereoscopic display using motion parallax (18–20) has been developed by HinesLab, Inc. (www.hineslab.com) of Glendale, California. The display uses live or recorded camera images, or computer graphics, and displays multiple views simultaneously (Fig. 27). The viewer stands or sits in front of the display, where the eyes fall naturally into two of the multiple viewing positions. If the viewer shifts position, the eyes move out of the two original viewing positions into two different positions where views that have the appropriate parallax are prepositioned. This gives a natural feeling of motion parallax as the viewer moves laterally. An advantage of this approach is that multiple viewers can use the display simultaneously. The technology provides from 3 to 21 eye positions that give lateral head freedom and look-around ability, confirming the positions and shapes of objects. The device is NTSC compatible, and all images can be projected on a screen simultaneously in full color without flicker. The display is built around a single liquid-crystal panel, from which multiple images are projected to a screen where they form the 3-D image. The general approach used to create the autostereo display was to divide the overall area of the display source into horizontal rows. The rows were then filled by the maximum number of images, while maintaining the conventional 3 : 4 aspect ratio; no two images have the same lateral position (Fig. 28). The optical design for these configurations is very straightforward. Identical projection lenses are mounted
Figure 27. HinesLab autostereoscopic computer display — video arcade games.
on a common surface in the display housing, and they project each image to the back of a viewing screen from unique lateral angles. Working in conjunction with a Fresnel field lens at the viewing screen, multiple exit pupils, or viewing positions, are formed at a comfortable viewing distance in front of the display. Figure 29 shows an arrangement of seven images displayed in three horizontal rows on the LCD panel.

VOLUMETRIC DISPLAYS

One technique used in computer visualization to represent a 3-D object uses parallel planar cross sections of the object, for example, CAT scans in medical imaging. We call such a representation a multiplanar image. Volumetric or multiplanar 3-D displays normally depend on moving mirrors, rotating LEDs, or other optical techniques to project or reflect light at points in space. Indeed, aquariums full of Jell-O that have images drawn in ink inside the Jell-O have also been used for such displays. A survey of such methods can be found in (2,21). A few techniques are worth mentioning. First, we discuss the principle of the oscillating mirror.

Oscillating Planar Mirror

Imagine a planar mirror that can vibrate or move back and forth rapidly along a track perpendicular to the face of a CRT, and assume that we can flash a point (pixel) on the CRT that decays very rapidly (Fig. 30). Let the observer be on the same side of the mirror as the CRT, so that the image on the CRT can be seen reflected by the mirror. If a point is rendered on the surface of the CRT when the mirror reaches a given location in its vibration and the rate of vibration of the mirror is at least fusion frequency (30 Hz), the point will appear continuously in the same position in space. In fact, the point would produce a solid image in the sense that, as we changed our position, our view of the point would also change accordingly. If the point is not extinguished as the mirror vibrates, then the mirror would reflect the point at all positions on its track, and the viewer would see a line in space perpendicular to the face of the CRT. Any point plotted on the surface of the CRT would appear at a depth depending on the position of the mirror at the instant the point appears on the CRT. The space that contains all possible positions of points appearing on the CRT defines what is called the view volume. All depth cues would be consistent, and there would be no ‘‘disconnection’’ of accommodation and vergence as for stereo pairs. The optics of the planar mirror produce a view volume depth twice that of the mirror excursion or
Figure 28. Possible image arrangements on the liquid-crystal projection panel: 2 rows (75% efficiency), 3 rows (78% efficiency), and 4 rows (81% efficiency).
Figure 29. The seven-lens autostereo display.

Figure 30. Vibrating mirror.

Figure 31. A varifocal mirror.
displacement depth. If the focal length of the mirror is also changed during the oscillation, a dramatic improvement in view volume depth can be obtained.

Varifocal Mirror

The varifocal mirror was a commercially available multiplanar display for several years. The technique uses
a flexible circular mirror anchored at the edges (Fig. 31). A common woofer driven at 30 Hz is used to change the focal length of the mirror. A 3-D scene is divided into hundreds of planes, and a point-plotting electrostatic CRT plots a single point from each. The mirror reflects these points, and the change in the focal length of the mirror affects their apparent distance from the viewer. A software
Figure 32. Omniview volumetric display.
program determines which point from each plane is to be rendered, so that lines appear to be continuous and uniform in thickness and brightness. The resulting image is solid. The view volume depth is approximately 72 times the mirror displacement depth at its center. The images produced by the CRT would have to be warped to handle the varying focal length of the mirror. Such a mirror was produced by several companies in the past. At that time, only a green phosphor existed which had a sufficiently fast decay rate to prevent image smear.
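The scheduling idea behind such multiplanar displays can be sketched as a toy calculation. The example below assumes a sinusoidal 30-Hz mirror motion and evenly spaced depth planes; it illustrates the timing principle only and is not the software used in the commercial devices:

    import math

    MIRROR_HZ = 30.0       # mirror (woofer) drive frequency from the text
    NUM_PLANES = 200       # assumed number of depth planes in the scene

    def plot_time_for_plane(k, num_planes=NUM_PLANES, freq=MIRROR_HZ):
        """Time within one mirror cycle at which plane k should be plotted.

        Assumes the mirror displacement follows z(t) = A*cos(2*pi*f*t), so a
        plane at normalized depth u in [0, 1] is presented when cos(2*pi*f*t)
        equals 1 - 2*u (first half of the cycle).
        """
        u = k / (num_planes - 1)
        return math.acos(1.0 - 2.0 * u) / (2.0 * math.pi * freq)

    # The nearest, middle, and farthest planes of a 200-plane scene:
    for k in (0, NUM_PLANES // 2, NUM_PLANES - 1):
        print(f"plane {k:3d}: plot at t = {plot_time_for_plane(k) * 1e3:.2f} ms into the cycle")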
Rotating Mirror

A variant of this approach developed by Texas Instruments, which uses RGB lasers for point plotting and a double-helix mirror rotating at 600 rpm as the reflecting device, was also commercially available for a time under the name Omniview (Fig. 32). Some recent efforts have included LCD displays, but the switching times are currently too slow to produce useful images.

Problems and Advantages

A major advantage of multiplanar displays is that they are ‘‘solid.’’ Accommodation and convergence are not disconnected, as in viewing stereo pairs, where the visual system always focuses at the same distance. Users who are stereo-blind can see the depth, and the image is viewable by several people at once. The primary problem of these mirror-oriented technologies is that the images they produce are transparent. The amount of information they can represent before the user becomes confused is small because of the absence of hidden surface elimination, although head trackers could be implemented for single-view use. In addition, they are limited to showing computer-generated images. Another major disadvantage of multiplanar displays has been that the electro-optics and point-plotting devices used to produce the image are not sufficiently fast to produce more than a few points at a time on a 3-D object, and laser grids are far too expensive to generate good raster displays. Hence, multiplanar or volumetric displays have been limited to wire-frame renderings.

Acknowledgments
The author thanks the following individuals who contributed to this article: Marc Highbloom, Denise MacKay, VREX; Shihoko Kajiwara, Seaphone, Inc.; Jesse Eichenlaub, Dimension Technologies, Inc.; Jeff Wuopio, StereoGraphics, Inc.; Richard Steenblik, Chromatek; David McConville, Elumens Corporation; Vivian Walworth, The Rowland Institute for Science; Shunichi Kishimoto, Sanyo Corporation; Jeff Brum, Fakespace Systems, Inc.; Michael Starks, 3-DTV Corp.; Stephen Hines, HinesLab, Inc.; and Mark Holzbach, Zebra Imaging, Inc.

BIBLIOGRAPHY

1. T. Okoshi, Three-Dimensional Imaging Techniques, Academic Press, NY, 1976.
2. D. F. McAllister, ed., Stereo Computer Graphics and Other True 3-D Technologies, Princeton University Press, Princeton, NJ, 1993.
3. H. Morgan and D. Symmes, Amazing 3-D, Little, Brown, Boston, 1982.
4. J. Lipscomb, Proc. SPIE: Non-Holographic True 3-D Display Techniques, Vol. 1083, pp. 28–34, Los Angeles, 1989.
5. R. W. Zobel Jr. et al., Multi-pieced, portable projection dome and method of assembling the same, US Pat. 5,724,775, March 10, 1998.
6. D. Colucci et al., Tiltable hemispherical optical projection systems and methods having constant angular separation of projected pixels, US Pat. 5,762,413, June 9, 1998.
7. R. L. Idaszak et al., Systems, methods and computer program products for converting image data to nonplanar image data, US Pat. 6,104,405, August 15, 2000.
8. R. W. Zobel Jr. et al., Visually seamless projection screen and methods of making same, US Pat. 6,128,130, October 3, 2000.
9. C. W. Tyler and M. B. Clarke, Proc. SPIE: Stereoscopic Displays and Applications, Vol. 1256, p. 187, Santa Clara, 1990.
10. D. Bar-Natan, Mathematica 1(3), 69–75 (1991).
11. J. Eichenlaub, Autostereoscopic display with illuminating lines and light valve, US Pat. 4,717,949, January 5, 1988.
12. J. Eichenlaub, Autostereoscopic display illumination system allowing viewing zones to follow the observer’s head, US Pat. 5,349,379, September 20, 1994.
13. J. Eichenlaub, Stroboscopic illumination system for video displays, US Pat. 5,410,345, April 25, 1995.
14. T. Hattori, T. Ishigaki, et al., Proc. SPIE, Vol. 3639, pp. 66–75, San Jose, 1999.
15. D. Swinbanks, Nature 385(6616), 476 (1997).
16. T. Hattori, Stereoscopic display, US Pat. 6,069,649, May 30, 2000.
17. K. Mashitani, M. Inoue, R. Amano, S. Yamashita, and G. Hamagishi, Asia Display 98, 151–156 (1998).
18. S. Hines, J. Soc. Inf. Display 7(3), 187–192 (1999).
19. S. Hines, Autostereoscopic imaging system, US Pat. 5,430,474, July 4, 1995.
20. S. Hines, Multi-image autostereoscopic imaging system, US Pat. 5,614,941, March 25, 1997.
21. B. Blundell and A. Schwarz, Volumetric Three Dimensional Display Systems, Wiley, NY, 2000.
STILL PHOTOGRAPHY

RUSSELL KRAUS
Rochester Institute of Technology, Rochester, NY
INTRODUCTION

Camera still imaging: the use of a lighttight device that holds a light-sensitive detector and permits controlled exposure to light by using a lens that has diaphragm control and a shutter, a device that controls the length of exposure. Controlled exposure is simply an amount of light delivered during a continuous period of time that produces a desired amount of density on the film after development. The desired amount of exposure is typically determined either by experimental methods known as sensitometry or through trial and error. The shutter range for exposure can be between several hours and 1/8000th of a second, excluding the use of a
stroboscopic flash that permits exposure times shorter than a millionth of a second. The image has a size or format of standard dimensions: 16-mm (subminiature), 35-mm (miniature), and 60-mm (medium format). These are commonly referred to as roll film formats. Four by five inches, 5 × 7 inches, and 8 × 10 inches are three standard sheet film sizes. A view camera, monorail camera, or folding type is used to expose each sheet one at a time. Folding-type technical cameras that have a baseboard can expose single sheets or can adapt to a roll film back of reduced dimensions. All current camera systems can either replace their film detectors with digital detectors or be replaced entirely by a digital version. The nature of current still photography can be seen in its applied aspects: documentary, reportage, scientific recording, commercial/advertising, and fine art shooting are the primary realms of the professional photographer. In each activity, care is given to the materials, equipment, and processes by which a specific end is achieved. Scientific and technical photographers use the photographic process as a data collection tool where accuracy in time and space is of paramount importance. Photography for the commercial, documentary, and fine arts photographers has never been an objective and simple recording of an event or subject. Documentary photography before the Farm Security Administration's attempts at reporting the Dust Bowl of the 1930s illustrated the point of view of the photographer. Photography was not simply a moment of captured time, an opening of a window blind to let in the world through a frame; rather, the photograph was the result of a complex social and political view held by the image maker. The documentary/reportage photography of Hine, Evans, and Peress represents their unique vision and understanding of their times, not a naïve recording of events. This type of photography has as much purpose and artifice as commercial/advertising shooting. The visual selling of products by imagery designed to elicit an emotional response has been an integral part of advertising for more than 100 years. The psychology of photography has remained relatively the same during this time, albeit photocriticism has had many incarnations, but the technology that has made photography the tool it is has changed very rapidly and innovatively during the last century. The current digital evolution in photography will further advance the tools available and alter the way images are captured and displayed.
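As a brief worked illustration of the exposure quantities introduced above (the numbers are arbitrary examples): photographic exposure is the product of the illuminance at the film plane and time, H = E × t, so 100 lux falling on the film for 1/100 s delivers H = 100 × 0.01 = 1 lux·second; halving the time to 1/200 s and opening the diaphragm by one stop (doubling E to 200 lux) delivers the same 1 lux·second and, reciprocity failure aside, the same density after development.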
A BRIEF HISTORY

The history of photography traces the coincidence of two major technical tracks. The first track is the optical track. It includes the history of the camera obscura. Mentioned by Aristotle in the fourth century B.C., the camera obscura is basically a dark chamber useful for projecting the world through a pinhole into a closed, dark space. Described for the viewing of a solar eclipse, this pinhole device remained the basis for ‘‘camera imaging’’ for more than 1000 years. By the thirteenth century A.D., a lens had been added to the
device, at least in theory. Three hundred years later, at the height of the Italian Renaissance, a number of scientist-artisans (Cardano and della Porta) mention or describe the camera obscura using a lens. By the mid-sixteenth century, Barbaro describes the use of a lens in conjunction with a diaphragm. The diaphragm allows the camera to project a finely sharpened image by virtue of stopping down to minimize aberrations and to control the intensity of the light passing through the lens. The diaphragm, or stop, and the lens limit the width of the beam of light passing through the lens. The changeover from a double convex lens to a meniscus-type lens at the end of the eighteenth century limited the aberrations produced by the camera obscura. The prototype of the modern camera was born. Other modifications gradually came into existence. Optical achievements in the areas of astronomy and microscopy contributed toward contemporary photographic equipment. Sturm, a mathematician, produced the forerunner of the modern single-lens reflex camera by the later part of the seventeenth century. Photography had to wait approximately 150 years before the idea of permanently capturing an image became a practicality, and modern photography began its journey into the twenty-first century. The idea of permanently fixing an image captured by the camera obscura must have been in the ether for hundreds of years. Successful fixing took the independent efforts of Louis Daguerre and Joseph Niepce, who approached the problem of permanent imaging from different vantage points. Both Daguerre and Niepce followed the trail broken by others to permanent pictures. The chemist Johann Heinrich Schulze is credited with the discovery of the light sensitivity of silver nitrate (AgNO3), and later, in England, Thomas Wedgwood of the Wedgwood china family employed AgNO3 to make temporary photograms. These silhouettes were made by exposing silver nitrate coated paper to sunlight. Silver nitrate darkened upon exposure to sunlight, and those areas of the coated paper covered by an object remained white. Unfortunately, Wedgwood could not prevent the AgNO3 paper from darkening in toto after time. While Niepce and Daguerre worked in France, Sir John Herschel and Wm. Henry Fox Talbot contributed to the emerging science of photography from across the Channel. Talbot is credited with the invention of the negative-positive process resulting in the Calotype. A coating of silver nitrate and potassium iodide, which form silver iodide, is applied to heavy paper and dried. Before exposure, a solution of silver nitrate, acetic acid, and gallic acid is applied. Immediately after exposure to bright sunlight, the paper is developed in a silver nitrate and gallic acid solution. Washing, fixing in sodium thiosulfate, and drying completed the processing of a Calotype paper negative. Herschel’s chemical tinkering led to his formulation of ‘‘hyposulfite.’’ ‘‘Hypo’’ was a misnomer for sodium thiosulfate; however, the term hypo has remained a photographer’s shortspeak for fixer. The invention of this chemistry permitted fixing the image. The removal of light-sensitive silver nitrate by this solution prevented the
image from darkening totally under continued exposure to light. Daguerre proceeded with his investigations into the direct positive process that was soon to be known by his name, the daguerreotype, and by 1837, he had produced permanent direct positive images. The daguerreotype depended on exposing silvered metal plates, generally copper, that had previously been in close proximity to iodine vapors. The sensitized plate is loaded into a camera for a long exposure, up to one hour. Development was managed by subsequent fuming of the plate in hot mercury vapors (60 °C). The plate is finally washed in a solution of hot salts that removes excess iodine, permanently fixing the image (Fig. 1). For 15 years, the daguerreotype achieved worldwide recognition. During these years, major improvements were made in the process. Higher sensitivity was achieved by using faster lenses and bromide-chlorine fuming. Images that formerly required 30 minutes or more of exposure could be made in 2 minutes. This process produced an amazing interest in photography, and an untold number of one-of-a-kind images were produced, but the daguerreotype eventually gave way to the negative-positive process of Talbot and Herschel. Modern still photography had been born. Contemporary still photography begins where the daguerreotype ends. The ascendancy of the negative-positive process is the fundamental basis of film-based photography.

IMAGE RECORDING

Photography is distinguished by the idea that the image that will be recorded is of a real object. The recording of the image is the result of the optics involved in collecting the reflected light, and the recorded density range is a relative recording of the tones of the subject created by the source light falling on the subject, the illuminance (luminous flux per unit area). The establishment of a viewpoint controls perspective. The photographer must first choose
Figure 1. A Giroux camera explicitly designed for the daguerreotype.
the location from which the photograph will be taken. This establishment of perspective is essential to creating the appearance of a three-dimensional image on a two-dimensional surface. The image projected by the lens is seen on a ground glass placed at the focal plane or reflected by a mirror arrangement, as in a single-lens reflex camera. The ground glass organizes or frames the image. In miniature and medium format photography, the photographer looks at the image through a viewfinder. The image produced in the viewfinder shows objects in the scene in the same relative positions as the images produced by the lens. Distance is suggested in the image by the relative image sizes of objects in the field of view.

PERSPECTIVE
Photographic perspective is also controlled by the choice of optics. A short focal length lens images a wide-angle view of the scene in the viewfinder. The angle of view determines how much of the image will be projected by the lens onto the frame. In practice, the frame is considered to be the film size, the horizontal × vertical dimensions. The lens focal length and the film size determine the angle of view (Fig. 2). The wider angle permits projecting a greater area, and objects in the scene will appear smaller in image size. A change to a longer than normal focal length lens limits the area projected due to a narrower angle of view, but the relative size of the image is greater than normal. In both cases, the change in relative size of the images depends solely on the distance of the objects from the camera. Both the position of the camera and the lens focal length control perspective. Image size is directly proportional to focal length and inversely proportional to the distance from the viewpoint and camera position. For a given lens, a change in film size alters the angle of view; for example, a 4 × 5 inch reducing back used in place of a 5 × 7 inch film plane decreases the angle of view. Perspective creates the illusion of depth on the two-dimensional surface of a print. Still photography employs several tools to strengthen or weaken perspective. Control of parallel lines can be considered part of the issue of perspective. Linear perspective is illustrated by the convergence of parallel lines that give visual cues to the presentation of implied distance in the photograph. This
is shown through the classic example of the illusion of converging railroad tracks that seem to come together at a vanishing point in the distance. The image size of the cross ties becomes smaller as their distance from the camera increases. We are more used to seeing horizontal receding lines appear to converge than vertical lines. In photographing a building from a working distance that requires the camera to be tilted upward to capture the entire building, the parallel lines of the building converge to a point beyond the film frame. When the film format is presented to the scene in portrait fashion, so that the vertical dimension of the film is perpendicular to the lens axis, parallel vertical lines in an architectural structure will cause the image of the building to appear to fall away from the viewer. The resultant image is known as keystone distortion. This can be controlled by the photographer by using a view or field camera whose front and rear standards can tilt or rise. On a smaller camera, 35-mm or 6-cm, a perspective control lens can be used for correction. This type of lens allows an 8- to 11-mm shift (maximum advantage occurs when the film format is rotated so that the longer dimension is in the vertical direction) and a rotation of 360°. Small and medium format camera lenses permit tilting the lens relative to the film plane. At a relatively short object distance, the tilt-shift or perspective control (PC) lens is very helpful. The PC lens has a greater angle of view than the standard lens of the same focal length. This larger angle of view allows the photographer to photograph at a closer working distance. The angle of view can be calculated as 2 tan−1[d/(2f)], where d is the dimension of the film format and f is the focal length of the lens. This larger angle of view is referred to in relation to its diagonal coverage. These lenses have a greater circle of good definition. This larger circle of good definition permits shifting the lens by an amount equal to the difference between the standard circle of good definition (which is usually equal to the diagonal of the image frame) and the circle of good definition for the PC lens that has the greater angle of view (Fig. 3). Further, assume that the image of a building has a 0.001 scale of reproduction for a 35-mm lens at a working distance of 35 m. Then every 1-mm upward shift of the PC lens corresponds to a 1-m shift at the object. The effect generated by the perspective control lens is the same as that of the rising front on a view/field camera. Objects far away subtend a smaller angle and create an image that is so small that the details of the object are unresolvable. The photographer can also use this lack
Figure 2. Angle of view of a lens–film format combination.
Figure 3. Format and circle of good definition.
of detail by controlling the depth of field (DOF). A narrow DOF causes objects beyond a certain point of focus to appear unsharp, thus fostering a sense of distance. Camera lenses focus on only one object plane but can render other objects acceptably sharp. These other objects in front of and behind the plane of focus are not as sharp. These zones of sharpness are referred to as depth of field. Acceptable sharpness depends on the ability of the eye to accept a certain amount of blur. The lens does not image spots of light outside the focused zone as sharply as objects that are focused. These somewhat out-of-focus spots are known as circles of confusion. If small enough, they are acceptable to the average viewer in terms of perceived sharpness. When the sharpness of objects in the zones before and after the zone of focus cannot be made any sharper, the circles of confusion are commonly referred to as permissible circles of confusion. In 35-mm format, these permissible circles of confusion typically have a diameter of 0.03 mm in the negative. This size permits magnification of the negative to 8 to 10× and maintains acceptable sharpness when the print is viewed at a normal viewing distance. When the lens renders permissible circles of confusion in front of and behind the focused zone, these zones are referred to as the near distance sharp and the far distance sharp. At long working camera-to-subject distances, near and far sharpness determine depth of field (distance far sharp minus distance near sharp). In practice, the hyperfocal distance, the near distance rendered sharp when the lens is focused at infinity for a given f-number, is used to determine the near distance sharp and the far distance sharp. The hyperfocal distance is the focal length squared divided by the f-number times the circle of confusion, H = fl²/(N × c), where fl is the focal length, N is the f-number, and c is the permissible circle of confusion. The near and far distances sharp can then be calculated as Dn = Hu/[H + (u − fl)] and Df = Hu/[H − (u − fl)], where u is the distance from the camera lens to the subject. In studio photography, where space is limited, a sense of depth can be fostered by the placement of lighting. The casting of foreground and background shadows and the creation of tonality on curved surfaces suggest greater depth than is actually there. In outdoor landscape photography, the inclusion of near-camera objects, such as a tree branch, in relationship to a distant scene cues the viewer to the distances recorded.

LENS CHOICE AND PERSPECTIVE

Photographs have either a strong or a weak perspective that is permanently fixed. In three-dimensional viewing of a scene, the perspective and image size change in response to a change in viewing distance. This does not occur in photographs of two-dimensional objects. But in photographs containing multiple objects at different image planes or converging parallel lines, viewing distance can have an effect. Changes in viewing distance influence the sense of perspective. The correct viewing distance is equal to the focal length of the camera’s lens when the print is made by contact exposure with the negative. When an enlarger is used, the focal length of the camera lens must be multiplied by the amount of magnification. Thus, if a 20-mm lens is used on a 35-mm camera and the
LENS CHOICE AND PERSPECTIVE
Photographs have either a strong or a weak perspective that is permanently fixed. In three-dimensional viewing of a scene, the perspective and image size change in response to a change in viewing distance. This does not occur in photographs of two-dimensional objects, but in photographs containing multiple objects at different image planes, or containing converging parallel lines, viewing distance does influence the sense of perspective. The correct viewing distance is equal to the focal length of the camera's lens when the print is made by contact exposure with the negative. When an enlarger is used, the focal length of the camera lens must be multiplied by the amount of magnification. Thus, if a 20-mm lens is used on a 35-mm camera and the negative is magnified by projection to an image size of 4 × 6 inches, the correct viewing distance is 80 mm. This has the effect of changing the perspective from a strong to a normal perspective. People tend to view photographs from a distance that is approximately the diagonal measurement of the print being viewed, thereby accepting the perspective determined by the photographer. In the previous example, 80 mm is too close for people to view a photograph; 7 to 8 inches is most likely to be chosen as the viewing distance. This maintains the photographer's point of view and the strong perspective chosen. The choice of a wide-angle lens to convey strong perspective carries certain image limitations. Wide-angle close-up photography of a person's face can present a distorted image of the person; the nose will appear unduly large in relation to the rest of the face. This occurs when the photographer wishes to fill the film frame with the face and does so by using a short object distance and a wide-angle lens. To maintain a more naturalistic representation, an experienced photographer will use a telephoto lens from a greater distance. Other distortions arise when using wide-angle lenses. In large group portraits, where the wide-angle lens has been chosen because of the need to include a large group and the object distance is limited by physical constraints, the film receives light rays at an oblique angle. Spherical objects such as the heads in the group portrait will be stretched. The amount of stretching is determined by the angle, measured from the lens axis, of the light that forms the image; the image size of the heads changes in proportion to the reciprocal of the cosine of the angle of deviation from the lens axis. This type of distortion also occurs with normal focal length lenses, but the amount of stretch is much less, and because the final print is viewed at a normal viewing distance, the viewer's angle of view approximates the original lens' angle of view and corrects for the distortion.
NEGATIVE-POSITIVE PROCESS
The process of exposing silver halide light-sensitive materials in camera relies on the range of density formed on a developed negative. Light reflected from a subject, its luminance (intensity per unit area), is collected by a lens and focused as an image at a given distance behind the lens, the focal plane. The purposeful act of exposure and the concomitant development given to the exposed film achieve a tonal response, or range of density, that captures the relative relationships of tones in the original scene, albeit in a reversed or negative state. Whites of uniform brightness in a scene, considered highlights, appear as uniform areas of darkness, or relatively greater density, in the negative. Blacks of uniform darkness in a scene, considered shadows, appear as uniform areas of lightness, or lesser density, in the negative. Other tones between whites and blacks are rendered as relative densities between these extremes. This is the continuous tone, or gray scale, of photography. The negative's density range must be exposed onto another silver-sensitized material to convert the reversed tones into a positive. The negative is made on a transparent base
that permits its subsequent exposure to a positive, either by contact or projection. There are obvious exceptions to the general practice of the negative-positive photographic approach to image making. Positives can be made directly on film for projection, direct duplication, or scanning; these transparencies may be either black-and-white or color. Likewise, direct positives can be made on paper for specific purposes. This process of representing real-world objects by a negative image on film and then by a positive image on paper is basic to traditional still photography.
CAMERAS
Comments on the breadth of available camera styles, formats, features, advantages, and disadvantages are beyond the scope of this article. However, certain specific cameras will be mentioned in terms of their professional photographic capabilities. The miniature camera format is 35-mm, represented by two types, compact and ultracompact. There are fixed-lens and interchangeable-lens 35-mm cameras. The latter type, unlike the former, constitutes the basis of an imaging system and can be further subdivided into two basic types: rangefinder and single-lens reflex cameras. Both single-lens reflex and rangefinder style cameras are available as medium and special format cameras as well (Fig. 4). Professional 35-mm camera systems are characterized by their interchangeable lens systems, focal plane shutters, motorized film advance and rewind, electronic flash systems, extra-length film magazines, and highly developed autofocusing and complex exposure metering systems. Lenses from ultrawide-angle focal lengths, fish-eye (6-mm), to extremely long focal lengths (2,000-mm) can replace the typical normal (50-mm) focal length lens. Special purpose lenses, such as zoom lenses of various focal lengths and macro lenses that permit close-up photography at image-to-object ratios greater than 1 : 1, are two of the most common specialty lenses.
Figure 4. Film/camera formats (24 × 36 mm, 2¼ × 2¼ inches, and 4 × 5 inches) showing relative differences among them.
Among the specialty lenses of interest is the fish-eye lens. A wide-angle lens can be designed to give a greater angle of view if the diagonal format covered is reduced relative to the standard format. This unique lens projects a circular image whose diameter is 21 to 23 mm onto the 43-mm diagonal of the 35-mm film format. Hence, it provides a circular image of the scene. The true fish-eye angle of view is between 180 and 220°. A normal wide-angle lens that has a rectilinear projection can achieve a very short focal length (15 mm) before its peripheral illumination noticeably decreases according to the cos⁴ law, and a change in lens design to a retrofocus type does not alter this loss of illumination. Other projection geometries are used to permit recording angles of view greater than 180°: equidistant, orthographic, and equisolid angle projections are used to increase the angle of view. The image projection formula for a rectilinear projection is y = f tan θ; for an equidistant projection, y = fθ; and for an equisolid angle projection, y = 2f sin(θ/2).
Motor Drive
A motor drive permits exposing individual frames of film at relatively high film advance speeds. Unlike motion picture cameras, which expose film continuously at a rate of 24 frames per second, motor drives typically advance film at three to eight frames per second. An entire roll of 35-mm, 36-exposure film may be rewound by the motor in approximately 4 seconds. State-of-the-art 35-mm camera systems provide built-in film advance systems and do not require separate add-on hardware. When coupled with advanced fast autofocusing, the photographer has an extraordinary tool for recording sports action, nature, and surveillance situations.
Electronic Flash
Specialized flash systems are another feature of the advanced camera system. The fundamental components are the power supply, electrolytic capacitor, reflector design, triggering system, and flash tube. Some triggering systems control the electronic flash exposure through the lens and are referred to as TTL. TTL cameras permit evaluating the light that reaches the film plane. Variations of the basic TTL approach are the A and E versions of TTL. The A-TTL approach uses a sensor on the flash unit to evaluate the flash in conjunction with the aperture and shutter settings of the camera. E-TTL uses the camera's internal sensor to evaluate the flash and to set the camera aperture. Another variant uses the camera-to-subject distance determined by the camera's focusing system to control the flash duration. Modern flash units communicate with the camera via a hot shoe and/or cable (PC) connection. Professional cameras can program electronic flash to allow high-speed flash at a shutter speed of 1/300 of a second. This is achieved by dividing the current discharged by the capacitor into a series of very short pulses. There are also occasions for flash exposure at the end of the shutter travel: rear-curtain synchronization fires the flash at the end of the exposure so that, for a moving subject under continuous illumination, any associated movement blur trails behind the subject. X-synchronization flash occurs when the shutter is
first fully opened. In X synchronization, the flash fires when the first curtain is fully open and the second curtain has not yet begun to move; the blur of a moving image then appears in front of the image. Electronic flash systems produce high-intensity, ultrashort bursts of light from the discharge of an internal capacitor. This discharge occurs in a quartz envelope filled with xenon gas. Typical effective flash excitation occurs within 1 millisecond. When used with a thyristor that controls exposure, the flash duration can be as short as 0.02 milliseconds (Fig. 5). The thyristor switches off the flash when a fast photodiode indicates that sufficient exposure has occurred. This system allows fast recharging of the capacitor to the appropriate level if an energy-efficient design is used; this type of design uses only the exact level of charge in the capacitor needed for a specific exposure. Fast recycling times are available, albeit at power levels well below the maximum available. The quenching-tube design permits dumping excess current to a secondary, low-resistance flash tube after the primary flash tube has been fired. While the accuracy of the exposure is controlled, the current is completely drained from the capacitor, so recycling times and battery life are fixed. The output of an electronic flash is measured in lumens per second, a measure of the rate of flux. A lumen is the amount of light falling on a uniform surface of 1 ft² when the source is one candela at a distance of 1 foot from the surface. Because an electronic flash emits light in a defined direction, a lumen can be considered the amount of light given off into a solid angle. A more photographic measure is beam-candle-power seconds (BCPS), a measure of the flash's output at the beam position of the tube. Beam-candle-power seconds are used to determine the guide number for flash effectiveness. An electronic flash's guide number can be expressed as GN = K × (ISO × BCPS)^0.5, where K is a constant, 0.25 if distance is measured in feet or 0.075 if distance is measured in meters. Guide numbers express the relationship between object distance and aperture number. If the guide number is 88 for a given film speed/flash combination, the photographer can use an aperture of f/8 at a camera-to-subject distance of 11 feet or f/11 at 8 feet. At 16 feet, the aperture would need to be set two stops wider, or f/5.6. The inverse square law governs this relationship.
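The guide-number arithmetic above can be sketched in a few lines of Python. This is an illustration only; the BCPS value chosen below is an assumed example, not one given in the text.

```python
def guide_number(iso, bcps, metric=False):
    """GN = K * (ISO * BCPS)**0.5, K = 0.25 for feet, 0.075 for meters."""
    k = 0.075 if metric else 0.25
    return k * (iso * bcps) ** 0.5

def aperture_for(gn, distance):
    """Guide number = aperture number x distance, so f# = GN / distance."""
    return gn / distance

gn = guide_number(iso=100, bcps=1250)   # assumed flash output of 1,250 BCPS
print(round(gn))                        # about 88
print(aperture_for(88, 11))             # f/8 at 11 feet
print(aperture_for(88, 16))             # about f/5.5, i.e., set f/5.6
```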
Figure 5. Flash curve and time (output versus time in milliseconds, showing the peak and the effective flash duration at the 50% level).
The law states that illumination decreases as the square of the distance from the subject to the point source of light, E = I/d². Guide numbers are approximations, or initial exposure recommendations; the environment of the subject can alter the guide number, plus or minus, by as much as 30%. The type of electronic flash and the placement of the flash have a significant effect on the photographic image. Flash units mounted on the reflex prism of the camera are also nearly on axis (2° or less) with the lens. The combination of this location and the camera-to-subject distance is the source of ''red-eye,'' the term given to the reflection of the flash by the subject's retina; the reddish hue is caused by the blood vessels. This particularly unpleasing photographic effect is also accompanied by flat and hard lighting. Moving the flash head well off center and bouncing the flash off a reflector or nearby wall can provide more pleasing lighting. When shadowless lighting is desired, a ring flash can be used. A ring flash uses a circular flash tube and reflector that fit around the front of the lens. Ring flash systems generally have a modeling light to assist in critical focusing. Scientific, technical, and forensic applications use this device.
Single-lens Reflex Camera
The single-lens reflex (SLR) camera is the most widely used miniature format professional camera system. Having supplanted the rangefinder as the dominant 35-mm system by the late 1970s, its distinctive characteristic is direct viewing and focusing of the image by a mirror located behind the lens and in front of the film gate. This mirror reflects the image formed by the lens to a ground glass viewing screen. The mirror is constructed of silver- or aluminum-coated thin glass or metal. The coatings prevent optical distortions and increase the illuminance reflected to the viewing screen. The mirror is characterized by its fast return to its rest position after each exposure. The viewing screen is viewed through a pentaprism viewfinder that corrects the lateral reversal of the mirror image, so the image is seen in correct vertical and horizontal relationship to the subject. Mirror shape and length are two important design considerations that affect performance. Maximum reflectance depends on the intersection of the light path exiting the lens and the mirror location. Mirrors can be trapezoidal, rectangular, or square and are designed to intersect the exiting cone of light best. Mirror length can affect the overall size of the camera housing and/or the viewing of the image. When the mirror is too short, images from telephoto lenses are noticeably darker at the top and bottom of the viewing screen; overly long mirrors necessitate deep camera bodies or a lens system that uses a retrofocus design. Some manufacturers use a hinged mirror that permits upward and rearward movement during exposure. Secondary mirrors hinged to the main mirror are used to permit autofocusing and exposure metering through the lens (Fig. 6). Almost all professional SLR systems have a mirror lockup feature that eliminates vibrations during exposure. This is very useful for high-magnification photography, where slight vibration can cause a loss of image quality, and for very long telephoto use at slow shutter speeds, where a similar loss of image quality can occur.
Figure 6. Lens, mirror, and pentaprism arrangement.
It is noteworthy that long focal lengths (telephoto) from 150 to 2,000 mm are more accurately focused in an SLR system than in a rangefinder system.
The Rangefinder
In 1925, Leica introduced a personal camera system that established the 35-mm roll format and the rangefinder camera as a professional photographic tool. The modern-day rangefinder camera is compact, quiet, and easy to focus, even in low light, because it does not require a reflex mirror mechanism, and it permits using wide-angle lenses of nonretrofocus design. The rangefinder establishes the focal distance to the subject by viewing the subject along two separate lines of sight. These lines converge at different angles, depending on the working distance; at near distances, the angle is greater than at far distances. The subject is viewed through one viewing window and through a mirror that sees the subject through a second window. The distance between the two windows is the base length. By virtue of a sliding or rotating mirror, the subject is seen as two images that can be made to coincide. The mirror or mirrors may be fully or half silvered and can be made to image the subject in two halves, one image half (upper) above the other image half (lower), or one image half alongside the other. This vertical or horizontal coincidence is the basis for focusing the camera lens. There are several design variations for constructing a rangefinder system. One variation maintains both mirrors in a fixed position, and the appropriate deviation is achieved by inserting a sliding prism in the light path between the two mirrors (Fig. 7). Given that tan x = b/d, where b is the base length of the rangefinder, d is the distance of the object, and x is the angle of rotation of the mirror, it is apparent that a minimal rotation can accommodate focusing from near to distant subjects. When the rangefinding system is mechanically coupled to the lens (by cams and gears), visual focusing through the eyepiece of the viewfinder and lens focusing of the subject act in unison: when the superimposed images of the rangefinder are coincident, the lens is correctly focused on the subject.
Figure 7. Drawing of rangefinder types.
Focusing accuracy depends on a number of factors: base length, focusing error tolerance, mechanical couplings, and the image scale of reproduction in the viewfinder. Accuracy of focus is limited by the resolving ability of the eye, taken as an angle of 1 minute of arc. Because the acuity of the eye can be influenced by a magnifier in the eyepiece of the optical viewfinder, the rangefinder error is described as 2D²a/(Rb), where Rb is the scale of reproduction times the base length, D is the distance, and a is the angle of 1 minute of arc. Rb is usually referred to as the true base length. Therefore, if the base length of a rangefinder camera is 150 mm and the scale of reproduction is 0.8, the true base length is 120 mm. Note that the scale of reproduction in all miniature cameras and smaller formats is less than one; this is necessary to fit the rangefinder image in the viewfinder window. Some systems permit attaching an accessory that gives a scale of reproduction larger than one. This extends the true base length without expanding the physical dimensions of the camera. Telephoto lenses can then focus on a subject less than 5 feet from the camera. The coupled rangefinder permits focusing a 50-mm lens on a 35-mm camera from infinity to 30 inches (0.75 m).
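The true base length and the rangefinder error formula above can be tried numerically. The Python fragment below is only a sketch: it uses the 150-mm base and 0.8 viewfinder scale quoted in the text, while the 5-m subject distance is a hypothetical example.

```python
import math

ONE_ARC_MINUTE = math.radians(1 / 60)   # resolving limit of the eye, in radians

def true_base_length(base_mm, scale):
    """Rb = viewfinder scale of reproduction x physical base length."""
    return scale * base_mm

def rangefinder_error(distance_mm, base_mm, scale):
    """Focusing error, approximately 2 * D**2 * a / (R * b)."""
    return 2 * distance_mm**2 * ONE_ARC_MINUTE / true_base_length(base_mm, scale)

print(true_base_length(150, 0.8))          # 120 mm
print(rangefinder_error(5000, 150, 0.8))   # error at 5 m: roughly 120 mm
```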
However, if the focusing error is too great, it will exceed the DOF for a given lens, aperture, and distance. The appropriate base length has a practical limit because there is a maximum focal length for a given base length. The base length becomes unwieldy, or the camera would require a large-magnification viewfinder, for focal lengths in excess of 150 mm in a 35-mm system. This is expressed in the formula Rb = fl²/(f# × C), where C is the permissible circle of confusion in the negative. At the end of the 1990s, the rangefinder camera had a resurgence in the issuance of several new 35-mm systems, in the marketing of a large number of new APS (Advanced Photo System, a sub-35-mm format) roll film cameras, and in the establishment of ''prosumer'' digital cameras.
Medium Format Cameras
These cameras are chiefly considered professional by virtue of their extensive accessories, specialized attachments, and larger film format. The larger film format has a distinct advantage in that the magnification needed to reach the final print is generally less than that for a smaller format. Consequently, microimage characteristics of the film that may detract from overall print quality (definition) are limited by the reduced magnification required. The film is generally unperforated roll stock in sizes 120 or 220, but specialized cameras in this classification using 120 roll film can produce images that range from 2¼ × 6¾ inches down to 6 × 4.5 cm. Other film formats derived from 120 or 220 film are 6 × 6 cm, 6 × 7 cm, 6 × 8 cm, 6 × 9 cm, and 6 × 12 cm. Seventy-millimeter perforated film stock used in long-roll film magazines is also considered medium format. Two-twenty film essentially provides double the number of exposures of 120 film and has no backing paper. Nonperforated 120 roll film is backed by yellow and black paper. Exposure numbers and guide markings printed on the yellow side are visible through a viewing area (red safety window) on the camera back. When there is no viewing window, the guide marks are aligned with reference marks on the camera's film magazine, the film back is shut, and the film is advanced by a crank until the advancing mechanism locks into the first frame exposure position. The film is acetate, approximately 3.6 mils thick, and must be held flat in the film gate by a pressure plate similar to that employed in 35-mm cameras. Medium format cameras are generally SLR types, but rangefinder types and the twin-lens reflex are also available. Many of the available medium format cameras offer only a few lenses, viewfinders, and interchangeable backs, but a few offer complete and unique systems that provide all of the advantages of a large format camera within the smaller and more transportable design of the medium format camera. More recently introduced in the 2¼ × 2¼-inch format are systems that provide tilt and shift perspective control, a modified bellows connection between the front lens board and the rear film magazine, and interchangeable backs for different film formats and for Polaroid film. In some cameras, the backs can rotate to either portrait or landscape mode. These backs protect the film by a metal slide that is placed between the film gate and the camera body. The slide acts as an interlock, thereby preventing exposure or accidental fogging of the film when in place.
This feature also permits exchanging film backs in mid-roll. Other forms of interlocks prevent accidental multiple exposures. The twin-lens reflex camera is a unique 2¼-inch square, medium format camera design. This design permits viewing the image at actual film format size. The image is in constant view, unaffected by the action of the shutter or the advancing of the film. However, its capabilities are affected by parallax error when it is used for close-up photography. The viewing lens of the camera is directly above the picture-taking lens. The mirror that reflects the image for viewing to a ground glass focusing screen is fixed. The camera is designed to frame the subject from a low viewpoint, generally at the photographer's waist level. The shutter is located within the picture-taking lens elements and is typical of other medium format shutter mechanisms.
Shutters
All types of shutter mechanisms may be found in all format cameras, although typically one type of shutter is mostly associated with a specific format. Focal plane shutters are found mostly in miniature format cameras, although some medium format cameras use this design. Likewise, leaf shutters, so called because of the leaflike design of their blades and also known as between-the-lens shutters, are used mostly in medium and large format cameras, although they are found in some fixed-lens 35-mm cameras and in smaller format cameras as well (Fig. 8). Specialized shutters, such as the revolving shutters generally considered typical of aerial technical cameras, have been adapted to 35-mm systems. Leaf shutters are located between lens elements or behind the lens itself. Ideally, the shutter should be at the optical center of the lens, parallel with the diaphragm. The blades of the shutter can be made to function as an iris diaphragm as well. The shutter and aperture are mostly part of one combined mechanism, the compound shutter, exemplified by manufacturers such as Compur, Copal, and Prontor. These shutters are numbered from 00 to 3, reflecting an increasing size necessitated by the increasing exit pupil diameter of the lens. For the most part, these shutters are limited to a top speed of 1/500th of a second. X synchronization is available at all shutter speeds. When fired, the shutter opens from the center to the outer edges of the lens. Because the shutter does not move infinitely fast, the center of the aperture is uncovered first and stays uncovered the longest. The illuminance changes as the shutter opens and closes.
Figure 8. Illustration of leaf shutter design (leaf shutter opening; enlarged leaf shutter closed).
Consequently, if exposure times were based on the full travel path of the shutter blades, the resulting image would be underexposed. To correct for this, the shutter speed is measured from the one-half open position to the one-half closed position; this is known as the effective exposure time. Note that small apertures are fully uncovered sooner and for longer than larger apertures, so the effective exposure time is longer for smaller apertures. This poses no problem at slower shutter speeds; however, as shutter speeds increase and apertures decrease, exposures move in the direction of overexposure. In a test of a Copal shutter, a set shutter speed of 1/500th of a second and an aperture of f/2.8 produced a measured exposure time of 1/400th of a second. The difference between the set and measured exposure times was minimal, less than one-third of a stop. However, when the diaphragm was stopped down to f/32 and the 1/500th of a second shutter speed was left untouched, the measured shutter speed was 1/250th of a second. This difference is attributed solely to the fact that the smaller aperture is left uncovered for a longer time. This one-stop difference is significant (Fig. 9). Between-the-lens shutters increase the cost of lenses for medium and large format cameras because a shutter must be built into each lens. These shutters may be fully mechanical, using clockwork-type gears and springs, or electromechanical, using a battery, resistor-based timing circuits, and even quartz crystal timers. Resistors can be coupled in a circuit with a photocell, thereby creating an autoexposure system in which the shutter speed is determined by the amount of light sensed by the photocell relative to the aperture setting, that is, aperture priority. Focal plane shutters have a distinct advantage because they are part of the camera, so the cost of lenses does not reflect the need for a built-in shutter. Higher shutter speeds are possible, up to 1/8,000th of a second, and advanced electronics can synchronize flash to shutter speeds as fast as 1/300th of a second.
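The overexposure seen in the Copal shutter test above can be expressed in stops. The short Python sketch below merely performs that arithmetic on the measured times quoted in the text; it is an illustration, not a measurement procedure.

```python
import math

def stops_of_overexposure(measured_s, nominal_s):
    """Exposure error in stops = log2(measured time / nominal time)."""
    return math.log2(measured_s / nominal_s)

nominal = 1 / 500          # set shutter speed, 2 ms
print(stops_of_overexposure(1 / 400, nominal))  # f/2.8: about 0.32 stop
print(stops_of_overexposure(1 / 250, nominal))  # f/32: a full stop
```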
Figure 9. Oscilloscope trace of two apertures at a fast shutter speed (shutter constant at 1/500th of a second), an example of effective exposure time: total open time of 4 ms at f/32 and 2.5 ms at f/2.8.
The heart of modern focal plane shutter design is the Copal vertical-travel shutter, although horizontal-travel shutters are still used. Focal plane shutters derive their name from the fact that they are located close to the focal plane, at the film or detector. This closeness avoids the effective-exposure-time problem that small apertures cause with between-the-lens shutters. Historically, the shutter was a slit in a curtain, comprising a leading and a trailing edge; the film was exposed by the action of the slit scanning across the film as the curtain traveled horizontally. Exposure time is the width of the slit in the curtain divided by the velocity of the traveling curtain. The slit used for the exposure may in fact not be a slit in a curtain but rather a set of titanium or metal alloy blades, or the travel difference between two metallic curtains or blinds. High-speed shutters require blades made of durable, lightweight materials such as carbon fiber, and they require complex systems to control vibrations (shutter brakes and balancers) and to prevent damage to the blades. State-of-the-art shutters may include a self-monitoring system to ensure accuracy of speed and reliability. These systems are made even more complex by the electronic components used to control exposure. The Hasselblad 200 series camera systems offer electronically controlled focal plane shutters that permit exposure times from 34 minutes to 1/2,000th of a second and flash synchronization up to 1/90th of a second. These electronic shutters permit aperture priority mode exposures that couple an atypical shutter speed (e.g., 1/325th of a second) to a selected aperture; normal shutter speeds are established on a ratio scale whereby the speeds increase or decrease by a constant factor of 2. Unique to focal plane shutters is the capability of metering changing illuminance values for exposure control off the film plane (OTF). Focal plane shutters are identified mostly with professional 35-mm systems and with rarer medium format systems. They are rarer still in large format camera systems but are found in barrel-lens camera systems.
Large Format Cameras
The view camera, a large format camera, is the most direct approach for producing photographic exposures. A view camera is essentially a lens mounted on a board connected by a bellows to a ground glass for composing the image. The frame is supported by a monorail or, for a field or technical camera, by folding flatbed guide rails. Field cameras may use rangefinder focusing in addition to direct viewing or an optical viewfinder. The view camera is capable of a series of movements; the lens plane and film plane can move independently. These movements are horizontal and vertical shifts of the lens and film planes, forward and rearward tilting of the lens and film planes, clockwise and counterclockwise swinging of the lens plane about its vertical axis, and clockwise and counterclockwise swinging of the film plane about its vertical axis. These movements control image shape, sharpness of focus, and the location of the image on the film plane. Simply shifting the film plane or lens plane places the image appropriately on the film; this movement can avoid the need to tilt the camera (upward or downward) to include the full object. Because the film and lens planes can be shifted independently, a shift of the lens plane in one direction
is equivalent to shifting the film plane an equal distance in the opposite direction. This type of shift, as well as the other movements, functions purposefully as long as the image formed by the lens covers the diagonal of the film plane format. The projection of the image circle within which sharpness is maintained is known as the circle of good definition. The circle of good definition increases in size when the lens to film-plane distance is increased to maintain good focus. Stopping down, that is, reducing the aperture, can also increase the circle of good definition. The larger the circle of good definition, the greater the latitude of view camera movement. Often the angle of coverage is used as another measure of the covering power of the lens; the angle of coverage is unaffected by changes in the image distance. Movements can be done in tandem. When shifting the lens plane to include the full object being photographed, the shift alone may not be sufficient; tilting the lens plane back and tilting the film plane forward can position the image within the area of good definition. Swinging the film plane around its vertical axis, so that it parallels the object being photographed, can eliminate the convergence of horizontal lines. This has the effect of making the scale of reproduction equal at the ends of all horizontal lines. The effect is explained by the formula R = V/U, where R is the scale of reproduction, V is the image distance, and U is the object distance. Therefore, given an object whose horizontal lines have their two ends at object distances of 20 feet (left side) and 10 feet (right side), the image distance V must be adjusted by swinging the film plane so that VR/UL is equal to VL/UR. Swinging the lens plane will not produce the same effect because UR is increased and VL is increased; consequently, the ratio V/U is unchanged, and R is constant (Fig. 10). Film plane movements can be used to improve the plane of sharp focus. However, when film-plane swing is being used to control image shape, the lens plane swing must be used to control sharp focus, because adjusting the film plane around its vertical axis affects sharpness and shape simultaneously, whereas swinging the lens plane does not affect image shape. Focus control by the lens plane is limited by the covering power of the lens. Movement of the lens plane moves the image location relative to the film plane. Excessive movement can move the image
beyond the circle of good definition and even beyond the circle of illumination, that is, into optical vignetting. An oblique pencil of light is reduced in illumination compared to an axial beam from the same source, and physical features of the lens, such as its length, can further increase optical vignetting. Vignetting can be reduced by using smaller apertures, but some loss of illumination occurs as a matter of course (natural vignetting) because the illumination falls off as the distance from the lens to the film increases; illumination is inversely proportional to the square of the lens-to-film distance and, off axis, follows the cos⁴ law. It has been calculated that, for a normal focal length lens that has an angle of view of 60°, the cos⁴ law produces a 40 to 50% loss of illumination at the edges of the film plane. For lenses of greater angle of view, such as wide-angle lenses, a 90° lens could have as much as a 75% loss of illumination at the edges. A reverse-telephoto lens is a design that permits a greater lens to film-plane distance than a normal lens design of the same focal length. The swings, shifts, and tilts achieved by the view camera provide a powerful tool for capturing a sharp image. When these tools are unavailable, the photographer can focus only at a distance where DOF can be employed to achieve overall acceptable sharpness in the negative. The relationship of the lens to film-plane distance, expressed by the formula 1/f = (1/U) + (1/V), determines that objects at varying distances U are brought into focus as the lens to film-plane distance V is adjusted. The view camera that has independent front and back movements may use either front focusing, when the lens to film-plane distance is adjusted by moving the lens, or rear focusing, when the film plane is moved closer to or further from the lens. Back focusing controls image focus, whereas front focusing alters image size as well as focus. It can be seen from the formula that U and V are conjugates and vary inversely. Back focus is required when copying to form an image of exact size. The plane of the lens, the plane of the film, and the plane of the object are related by the Scheimpflug rule (Fig. 11). When the object is at an oblique angle to the film plane, the lens plane can be swung about its vertical axis or tilted around its horizontal axis so that the three planes, object, film, and lens, meet at a common line.
Figure 10. Diagram of the film plane swing: view camera not parallel to the object, and view camera with its back parallel to the object (UL/VR : UR/VL = 1 : 1).
Figure 11. Illustration of the Scheimpflug rule (object plane, lens plane, and back plane meeting at a common line).
When the lens plane is parallel to the object plane, the film plane is swung or tilted in the opposite direction to achieve the three-plane convergence. The correct order of action is to adjust the back plane to ensure correct image shape and then to adjust the lens plane to ensure sharpness. It is apparent that the image plane has different foci, left to right when swung and top to bottom when tilted; DOF must be used to further the creation of an overall sharp image. There are limitations in using DOF. DOF calculations depend on the formula C = f²/(NH). Both f, the focal length, and N, the aperture number, alter the DOF. Doubling the size of the permissible circle of confusion would require a change of one stop of aperture. Depth of field is directly proportional to the f-number. DOF increases as the object distance U increases, expressed as D₁/D₂ = (U₁)²/(U₂)², with the caveat that the hyperfocal distance must not exceed the object distance. DOF increases as focal length decreases for a given image format; when comparing the DOF of two lenses for the same image format, the DOF ratio is equal to the focal length ratio squared.
Lenses and Image Forming Principles
Focal length is defined by the basic formula 1/f = (1/U) + (1/V). This is the foundation for a series of equations that describe basic image formation. I (image size) is equal to O (object size) × V/U. Except for near objects at less than twice the focal length, practically I = O × f/U, I/O = f/U, and the scale of reproduction R is equal to f/U. R is determined by focal length for any given distance, and for a specific focal length, R is determined by distance. Focal length determines the size of the image of an object located at infinity for any given film/detector size. The measured distance between the lens and the focused image is expressed as the focal length. In camera systems where the lens is focused by varying the location of an element within the lens, the focal length is dynamic. The ability of the lens to project a cone of light of differing brightness is a function of the aperture control, or iris diaphragm. The ratio of the focal length to the maximum diameter of the diaphragm (entrance pupil) is the lens' f-number (f#): f# = focal length/D, where D is the diameter of the entrance pupil.
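The image-forming relations just given lend themselves to a short numeric sketch in Python; the 50-mm lens and 5-m subject distance below are assumed example values.

```python
def image_distance(f, u):
    """Thin-lens relation 1/f = 1/u + 1/v, solved for the image distance v."""
    return 1 / (1 / f - 1 / u)

def scale_of_reproduction(f, u):
    """R = f/u, the practical approximation for subjects beyond twice the focal length."""
    return f / u

def f_number(f, pupil_diameter):
    """f# = focal length / entrance pupil diameter."""
    return f / pupil_diameter

# A 50-mm lens focused on a subject 5,000 mm away
print(image_distance(50, 5000))        # about 50.5 mm behind the lens
print(scale_of_reproduction(50, 5000)) # R = 0.01
print(f_number(50, 25))                # a 25-mm entrance pupil gives f/2
```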
f-numbers form a ratio scale (1 : 1.4); from one f-number to the next, the image illuminance changes by a factor of 2. In photographic parlance, this factor of change is referred to as a stop. The smaller the f#, the relatively brighter the image. The intensity of the image will be less than the intensity of the light falling on the lens; the transmission of the reflected light depends on a number of lens properties, namely absorption and reflection factors. When the focal length of the lens equals the diagonal of the image format of the camera, the focal length is considered ''normal.'' The normal focal length lens has an angle of view of approximately 47 to 53°, which is akin to the angle of view of the human eye. The 50-mm lens is the ''normal'' standard for 35-mm format photography. Medium format lenses have been standardized at 75 or 80 mm, and 4 × 5-inch cameras have a standardization range between 180 and 210 mm. There is a difference between the calculated normal lens determined by the diagonal of the film format and those actually found on cameras; the actual normal lenses are of a somewhat longer focal length than the diagonal would require. Wide-angle lenses are characterized by a larger angle of view. The focal length of these lenses is much less than the diagonal of the image format that they cover. Because of the short lens-to-focal-plane distance, problems of camera function may occur: in the SLR camera, the mirror arrangement may be impeded by short focal lengths, and in view cameras, camera movements may be hindered. Reverse-telephoto (retrofocus) wide-angle lens designs overcome such problems. The design places a negative element in front of a positive element, thereby spacing the lens at a greater distance from the image plane. When the focal length of a lens is much greater than the diagonal of the film, the term telephoto is applied. The angle of view is narrower than that of the normal focal length. The telephoto design should not be confused with a long-focus lens (Fig. 12). In the telephoto design, the placement of a negative element or group behind the positive objective brings the cone of light to a focus as if it had come from a positive objective of greater focal length. The back nodal plane is located to give a shorter lens-to-film distance than that of a lens of normal design of the same focal length. The opposite of this is the short focal length wide-angle design, in which the distance is increased by placing a negative element in front of the objective. This greater lens-to-film distance permits full use of the SLR mirror and more compact 35-mm camera designs. Macro lenses, primarily used in scientific and medical photography, have found their way into other photographic venues. This has been made possible by the availability of ''telemacro'' lenses. These are not true macro lenses;
Figure 12. Normal design lens and telephoto design, comparing the back focal distance (BFD) for the same focal length f1.
although they allow a close working distance, the image delivered is generally of the order of 0.5× magnification. True photomacrography ranges from 1 to 40×. The ''telemacros'' and some normal macros permit close-up photography at working distances of approximately 3 to 0.5 feet and magnifications of 0.10 to 1.0. These are close-up photography lenses, although they are typically misnamed macro-zoom lenses. Close-up photography requires racking out the lens to achieve a unit of magnification. The helical focusing mount limits the near focusing distance, so other means must be found to extend the lens-to-film distance, increase magnification, and shorten the working distance. Many of these close-up lenses require supplemental lenses or extension tubes to achieve magnification beyond 0.5×. The classic Micro Nikkor 55-mm lens for the 35-mm format can magnify by 0.5×; an extension tube permits the lens to render 1× magnification. The 60-mm, 105-mm, and 200-mm Micro Nikkors achieve 1 : 1 reproduction without extension tubes. Long focal length macro lenses for miniature and medium format cameras require supplemental lenses to achieve magnifications up to 3×. Positive supplemental lens powers are designated in diopters. Diopter (D) power can be converted into focal length by the formula f = 1 (meter)/D. It is common practice to add supplemental lenses to each other to increase their power and magnification; the useful focal length of the combined optical system is given by 1/f = (1/f1) + (1/f2) + ··· + (1/fn), that is, the powers of the lenses add. When supplemental lenses are used with rangefinder or twin-lens reflex cameras, an optional viewfinder that corrects for parallax must be used. The working distance's relationship to the focused distance is determined by the formula uc = ufs/(u + fs), where uc is the close-up working distance, fs is the focal length of the system, and u is the focused distance of the main lens. Extension tubes are placed between the lens and the camera body and may be replaced by a bellows that provides variable magnification. The bellows system offers continuous magnification, the option of attaching different lenses to the bellows to achieve different results, and the ability to use a reversing ring that reverses the lens position so that the front element faces toward the camera. As with lenses that use an internal floating element or group to achieve increased magnification, the bellows attachment makes good use of TTL metering for optimum exposure control.
Autoexposure
Automated systems for exposure and focusing are the hallmarks of modern camera systems. Photographic exposure is defined as H = E × T, where H is the exposure in meter-candle-seconds, E is the illuminance in meter-candles, and T is the time in seconds. Autoexposure systems are designed to determine the optimum H, range of apertures, and choice of shutter speed for a given film speed. When the aperture is predetermined by the photographer, the camera's autoexposure system will select the appropriate shutter speed. When the image at the selected speed may show camera shake, a warning signal may be given, or a flash is activated in those cameras that incorporate a built-in flash. This aperture priority system is found in
many autoexposure cameras. The nature of autoexposure depends on the calculated relationship between the luminance of the subject and the sensitivity of the film. Film sensitivity is defined by the International Standards Organization (ISO). The ISO designation has two parts, a logarithmic Deutsche Industrie Norm (DIN) value and an arithmetic (ASA) value. Both designations represent the same minimum exposure necessary to produce a density level above the base + fog of the developed film for a given range of exposure. The relationship between the two components is DIN = 10 × log(ASA) + 1. Thus an ASA of 100 is also a DIN of 21 [log(100) = 2; 2 × 10 + 1 = 21]. The most advanced autoexposure systems measure the subject luminances passed by the lens aperture and determine the shutter speed. Conversely, a shutter speed may be set, and the aperture of the camera is then automatically determined. Illuminance is measured by a photometer located within the light path projected by the lens. Because focusing and viewing are done through a wide-open aperture and metering is for a working aperture, a stop-down method is required if a full-aperture method is not offered. Off-the-film metering is a very useful stop-down approach. A number of devices, from secondary reflex mirrors, beam splitters, prisms, and multiple (segmented) photocells to specially designed reflex mirrors, allow a photocell to measure the light that reaches the film plane; the ideal location of the cell for measurement is the film plane. Photocell choices for measurement have specific advantages and certain disadvantages. For example, a selenium photocell does not require a battery, is slow, and has low sensitivity, but its spectral sensitivity matches that of the eye. A gallium arsenide phosphide cell requires a battery and an amplifier but is fast and very sensitive to low light; its spectral sensitivity is limited to the visual spectrum. The calculation of exposure depends on the assumption that the luminance transmitted by the lens integrates to 18%, the average reflectance of a typical scene. This is not always the case. Consequently, in-camera metering systems apply certain patterns that vary the nature of the calculation for exposure. The patterns are selectable and can cover the range from a 3° spot to an overall weighted average (Fig. 13). These metering approaches are found in miniature and medium format cameras; view cameras, however, can use direct measurement by a special fiber optic or other type of probe directly on the ground glass. Without such a device, large format camera users must resort to handheld meters. Unique to the handheld meter is the incident, or illuminance, meter. An integrating, hemispheric diffuser covers the photocell.
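The ASA–DIN relationship quoted above is easy to check; this small Python sketch is purely illustrative.

```python
import math

def din_from_asa(asa):
    """DIN = 10 * log10(ASA) + 1."""
    return 10 * math.log10(asa) + 1

for asa in (50, 100, 200, 400):
    print(asa, round(din_from_asa(asa)))   # 50 -> 18, 100 -> 21, 200 -> 24, 400 -> 27
```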
Figure 13. Illustration of metering patterns: an eight-segment meter cell pattern with spot (Pentax) and an eight-segment pattern with spot metering (Nikon).
The meter is held at the subject and aimed at the camera. It is assumed that the subject is not very dark or very light; the meter is designed on the assumption that the subject approximates a normal range of tones. When the subject is very dark or very light, exposure must be adjusted by one-half to one stop. The location of the light source is also of importance. Studio lighting close to the subject requires that the photographer compensate for any loss of luminance that results from an increase in the subject-to-light-source distance, following the inverse square law. Handheld meters do not compensate for other photographic exposure influences, such as reciprocity law failure or the use of filters for contrast or color control; nor can handheld illuminance meters be used for light-emitting sources. The relationship among the various factors, film speed, shutter speed, f#, and illuminance, is expressed by the formula foot-candles = 25 (a constant) × f#²/(arithmetic film speed × shutter speed). A measurement of the illuminance in foot-candles can therefore be used in this formula to solve for the f# or the shutter speed.
Autofocusing
Coupled with autoexposure systems are advanced autofocusing systems. Autofocusing can be applied to a number of photographic systems, and a number of approaches can be used: phase detection, sound ranging, and image contrast comparison have been used in different camera systems. Electromechanical coupling racks the lens forward or rearward for correct focus. Using a linear array containing up to 900 photosites, adjacent photosites on the array are compared. Distance may be measured by correlation, based on the angle subtended between the two zones. Multiple-zone focusing, in which a number of fixed zones are preset for focusing distances from infinity to less than 2 ft, is found in a number of current prosumer digital cameras and nonprofessional miniature cameras. Other similar camera systems offer infrared (IR) focusing. IR focusing involves scanning an IR beam emitted through the lens by using a beam splitter. The returning IR reflection is read by a separate photo array through a nearby lens, and this array sets the focus. IR beam-splitting focusing fails for subjects that have very low or very high IR reflectance. An IR lock can be set when one is photographing through glass. If the camera is equipped with autozooming, the IR diode detector can drive the zoom and maintain constant focus at a given magnification. The use of CCD arrays, photodiodes, and electronic focusing controls (micromotors) is made possible by incorporating high-quality, miniaturized analog–digital circuitry and attendant CPUs and ROM chips. Such advanced technology permits focusing by measuring image contrast or phase difference. The state-of-the-art Nikon F5 and its digital equivalent use a phase detection system that has a specifically designed array. Phase, or image shift, is measured from the illuminance exiting the pupil in two distinct zones: two images are projected onto the focal plane, and their displacement is measured. This is very much like a rangefinder system, but instead of a visual split image seen by the photographer, a digital equivalent is detected by the CCD array (Fig. 14).
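As a rough illustration of the displacement measurement described above (not Nikon's actual algorithm), the following Python sketch slides one sampled intensity profile against the other and picks the shift that minimizes their difference, much as a phase-detection array compares the two pupil images. The profiles are invented data.

```python
def best_shift(profile_a, profile_b, max_shift=10):
    """Return the shift (in photosites) that best aligns two 1-D intensity profiles."""
    best, best_err = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(profile_a[i], profile_b[i + s])
                 for i in range(len(profile_a))
                 if 0 <= i + s < len(profile_b)]
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best

# Two hypothetical readouts of the same edge, displaced by 3 photosites
a = [0, 0, 0, 10, 50, 90, 100, 100, 100, 100, 100, 100, 100]
b = [0, 0, 0, 0, 0, 0, 10, 50, 90, 100, 100, 100, 100]
print(best_shift(a, b))   # 3; the lens is driven until the shift reaches the in-focus value
```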
Figure 14. Nikon autofocus arrays (autofocus CCD array with a 150-CCD spot segment and 50-CCD segments).
CAMERA FILM
Camera film is a light-sensitive material that, upon exposure, creates a latent image whose susceptibility to development (amplification) is proportional to the exposure received. Camera film is made of a colloidal suspension commonly referred to as an emulsion. A polyester, acetate, or other base is coated with a suspension of a compound of silver and one or more halides, as well as other addenda such as spectral sensitizing dyes. A film may have several different emulsions coated on it, and the differences among the emulsions give rise to the characteristics of the film. The pictorial contrast of a film is the result of coating the substrate with a number of emulsions that have varying sizes of silver halide grains. Silver halides, denoted AgX, can be made of silver (Ag) and any or all of the three halides bromide, iodide, and chloride. The combination of the three halides extends the spectral response of the film beyond the film's inherent UV–blue sensitivity. The other two halides, astatide and fluoride, are not used because of radioactivity and water solubility, respectively. The size of the grain, a microscopic speck of AgX, is the primary determinant of the speed, or sensitivity, of the film: the larger the grain, the greater the response to a photon. Other addenda are added to increase the speed of the film further. Photographic speed (black-and-white) is determined by the formula ASA = 0.8/Hm, where Hm is the exposure in lux-seconds that produces a density of 0.1 above base + fog and 0.8 is a safety factor to guard against underexposure. Overall, films exhibit a variety of properties, such as speed, spectral sensitivity, exposure latitude, and a unique, nonlinear characteristic curve response to exposure and development combinations. A film's overall image quality is referred to as definition, which comprises three components: resolution, graininess, and sharpness. Black-and-white (panchromatic) films used in pictorial photography can be made of three to nine emulsion layers, have an exposure latitude of 1,000 : 1, and have spectral sensitivity from the UV to the deep red. Black-and-white pan films are sensitive to all visible wavelengths and render them as various tones of gray; the color information of the object is recorded only in terms of luminance. Films can also be designed to limit their spectral response to the UV or IR portions of the spectrum (Fig. 15).
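A minimal sketch of the black-and-white speed criterion above (Python; the exposure values used as input are assumed examples, not data-sheet figures):

```python
def asa_speed(h_m_lux_seconds):
    """ASA speed = 0.8 / Hm, where Hm yields a density 0.1 above base + fog."""
    return 0.8 / h_m_lux_seconds

print(round(asa_speed(0.008)))   # an Hm of 0.008 lux-second corresponds to ASA 100
print(round(asa_speed(0.002)))   # a faster film reaching the criterion at 0.002 -> ASA 400
```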
Color films are essentially monochrome emulsions that record the blue, green, and red content of an object or scene on three discrete, superimposed emulsions, a ''tripack.'' Development causes these layers to hold the image as layers of yellow, magenta, and cyan dyes, from which a positive or full color print may be made. Negative color film requires reexposing the negative onto a similar negative emulsion coated on paper; this negative-to-negative process produces a full color positive image. If the film selected for exposure is a transparency film, the image captured after processing will be positive, that is, the colors of the image will be the same as the colors of the object.
Figure 15. Cross section of panchromatic film, showing the topcoat, emulsion (AgX and dye addenda), antihalation layer, base, and noncurl coat.
Density
The most useful approach to determining the effect of exposure and development on a given film is measuring the film's density. The nature of film and light produces three interactive effects during exposure: scattering, reflection, and absorption. Film density is the result of these three exposure outcomes, and it is defined as D = log(1/T), where T is the transmittance, the ratio of transmitted to incident light. Density can be measured and provides a logarithmic value. The relationship between density, as an outcome of photographic exposure and development, and exposure is graphically expressed in the D–log H curve. The curve typically illustrates four zones of interest to the photographer: the base + fog region, the toe, the straight line, and the shoulder are the four sections that provide useful information about exposure and development. The aim of exposure is to produce the shadow detail of the subject as a density in the negative that has a value of 0.1 above the density of the film's base + fog. The midtone and highlight reflectances of the scene follow the placement of the shadow on the curve. This correct placement, or exposure, permits proper transfer of these densities in the negative to paper during the printing process. Because a film's density range can greatly exceed the capacity of a paper's density range, the correct placement of the shadow (exact exposure) is the fundamental first step in tone reproduction. Underexposure results in a lower than useful density in the shadow detail; graphically, this would place the shadow density below 0.1 and possibly into the base + fog of the film. Consequently, no tonal detail would be captured. In a severe overexposure, the shadow detail would be placed further to the right, up the curve. Though shadow detail would still exist further up the curve and at greater density, it is quite likely that the highlights would move onto the shoulder, and loss of highlight detail would result. The permissible error range between the shadow and highlight shifts is known as exposure latitude. Consider a subject whose luminance range is 1 : 160; its log luminance range is 2.2. If the film–developer combination's exposure range is log 2.8, the difference of 0.6 is the exposure latitude, or two stops. This margin for adjusting exposure, or for error, is not equidistant on the curve. The two stops of adjustment are primarily in favor of overexposure because the distance up the curve is greater than that toward the base + fog region. The exposure latitude of negative films is much greater than that of transparency film. The characteristic curve that illustrates these relationships is not fixed but can exhibit different slopes, or measures of contrast, depending on development factors such as time, agitation, dilution, and type of developer. Sensitometric studies and densitometric plots graphically illustrate for the photographer the possible outcomes of film and developer combinations and their exposure latitude.
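The latitude arithmetic in the preceding paragraph can be expressed in a few lines of Python; this is purely illustrative.

```python
import math

def density(transmittance):
    """D = log10(1/T)."""
    return math.log10(1 / transmittance)

def exposure_latitude_stops(subject_ratio, film_log_range):
    """Latitude in stops = (film log-exposure range - subject log range) / 0.3."""
    subject_log_range = math.log10(subject_ratio)
    return (film_log_range - subject_log_range) / 0.3

print(round(density(0.01), 2))                  # 1% transmittance -> density 2.0
print(round(exposure_latitude_stops(160, 2.8))) # about 2 stops, as in the text
```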
MEASURES OF IMAGE QUALITY
The overall measure of image quality is termed definition. It comprises three major components: graininess, sharpness, and resolution. The transfer of tonal information from the original scene or subject to the negative and through to the final print is of great importance, but a lack of definition, in whole or in part, can contribute to an overall bad image. Sharpness and resolution are attributes of the optical system and the detector; graininess (noise) is a characteristic of the detector. The quality of an edge can be described as a subjective evaluation of sharpness. When measured on a microlevel as a change of density across an edge, it is known as acutance. The rate of change of density across the edge, or slope, determines the image's appearance of sharpness. Many factors can influence sharpness: imprecise focusing, low contrast, poor optics, camera vibration, and bad developing technique can all result in a loss of sharpness. After exposure, silver halides are transformed during development into silver grains whose structure and size change. The increase in size causes the grains to overlap and clump into an irregular pattern that is detectable at higher magnifications, such as in enlargement printing. This noticeable pattern is referred to as graininess. It is not objectionable in its own right, but it can obfuscate detail. As density increases in the negative, the perception of graininess decreases; graininess is most visible in the midtone region. It is inherent in the material and cannot be eliminated simply. Options to minimize graininess are to use film sizes that require minimal magnification and to print on lower contrast or matte paper. A microdensitometer can measure graininess and provide a trace of density fluctuations across distance. This granularity is considered a measure of the standard deviation around the mean, or average, density measured. Because the standard deviation is computed as a root mean square deviation, this measure is known as rms granularity. Manufacturers' measures of rms granularity correlate well with perceived graininess, but the measures of granularity do not correlate well among the various manufacturers.
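A toy version of the rms granularity calculation is sketched below in Python; the density trace is invented for illustration and does not represent any particular film.

```python
import math

def rms_granularity(density_trace):
    """Standard deviation of a microdensitometer density trace about its mean."""
    mean = sum(density_trace) / len(density_trace)
    return math.sqrt(sum((d - mean) ** 2 for d in density_trace) / len(density_trace))

# A hypothetical trace across a uniformly exposed midtone patch
trace = [0.74, 0.69, 0.72, 0.75, 0.68, 0.71, 0.73, 0.70]
print(round(rms_granularity(trace), 3))   # about 0.023
```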
Resolution is the ability of the optical and detection systems to reproduce fine detail. All of the components in the imaging system combine to produce an overall measure of resolution known as resolving power. Resolving power is expressed as 1/RS = 1/RL + 1/RD. Greater accuracy can be achieved by taking the second moment, that is, 1/(RS)^2 = 1/(RL)^2 + 1/(RD)^2. Every component in the system contributes to the overall measure, and this measure cannot be higher than that of the poorest component. Overall photographic definition describes the total image and can consist of many isolated measures that affect the image. Such measures are the point-spread function, as indicated by the size of the Airy disk or diffraction effects, the emulsion spread function of a particular film, and the line spread function, which measures the ability of the image to separate adjacent lines. It would be onerous for the photographer to collect these various measures and attempt to correlate them. There is an overall measure, made available by manufacturers, that eliminates such a task. The modulation transfer function (MTF) represents the overall contrast transfer from the object to the image (Fig. 16). If the contrast of the object were matched totally by the image, the transfer would be 100%: all detail, or frequencies, of the object would be maintained at a 1 : 1 contrast ratio regardless of how fine the detail became. Modulation is determined as MO = (Emax − Emin)/(Emax + Emin) for the object and MI = (Emax − Emin)/(Emax + Emin) for the image; the MTF is the ratio Mimage/Mobject. Individual MTF measures for various imaging components can be multiplied to produce one MTF factor for the system.

DIGITAL PHOTOGRAPHY
The application of CCD or CMOS detectors in place of film at the plane of focus has quickly changed photography. As the resolving power of the detectors has improved and the firmware in the digital camera or back has improved its algorithms for image reconstruction, the usefulness and availability of digital hardware have increased as well. The basic principles of image formation, lens types, and image quality also hold true for digital imaging. Because digital image data are easily manipulated by computer-based software, digital pictures offer enormous advantages: after exposure, photographic data can be eliminated, improved, edited, or added. Images may be sent directly to video monitors, satellites, and remote sites or may be printed on hard copy via a number of devices that do not require any darkroom or projection device. Ink-jet printers, thermal dye imagers, and other devices can produce images that are virtually indistinguishable from traditional photographic images.
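A small Python sketch of the relations just given: modulation and MTF, the resolving-power combination, and the cascading of component MTFs into a system MTF (the exposure and resolving-power values are hypothetical):

import math

def modulation(e_max, e_min):
    # Modulation (contrast) of a sinusoidal exposure pattern
    return (e_max - e_min) / (e_max + e_min)

def mtf(m_image, m_object):
    return m_image / m_object

def system_resolving_power(r_lens, r_detector):
    # 1/Rs = 1/Rl + 1/Rd, as given above
    return 1.0 / (1.0 / r_lens + 1.0 / r_detector)

def cascade_mtf(*component_mtfs):
    # System MTF at one frequency is the product of the component MTFs
    return math.prod(component_mtfs)

print(round(mtf(modulation(80, 20), modulation(100, 0)), 2))   # 0.6 transfer at this frequency
print(round(system_resolving_power(100, 80), 1))               # ~44.4, below either component
print(round(cascade_mtf(0.9, 0.8, 0.7), 3))                    # ~0.504 for a three-element system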
Figure 16. Modulation transfer function. (Modulation transfer curves, in percent, for an ideal system and for two films, A and B, plotted against spatial frequency.)
The crossover from analog to digital in the professional arena began with the photojournalist and soon extended into catalog photography. This was driven by improved quality, ease of application, and cost effectiveness compared to film. Publication (matrix reproduction) of images in magazines, newspapers, journals, and other media has become more digital than analog. Hybrid approaches that use film as the capture medium and scanners that convert the image to digital data have almost totally ended the use of the process camera in the printing industry. Large format scanning backs are readily available for the view camera, and medium format camera manufacturers provide digital back options for all major systems. Surveys of digital camera sales at the consumer and professional levels show a steady upward trend. The indications are that digital photography will not disappear and may become the preferred method of basic still photography.

BIBLIOGRAPHY
1. C. R. Arnold, P. J. Rolls, and J. C. J. Stuart, in D. A. Spencer, ed., Applied Photography, Focal Press, London, 1971.
2. M. J. Langford, Advanced Photography, Focal Press, London, 1972.
3. S. Ray, Applied Photographic Optics, Focal Press, Boston, 1994.
4. S. Ray, Camera Systems, Focal Press, Boston, 1983.
5. L. Stroebel, J. Compton, I. Current, and R. Zakia, Basic Photographic Materials and Processes, 2nd ed., Focal Press, Boston, 1998.
6. L. Stroebel, View Camera Techniques, Focal Press, Boston, 1992.
7. The Encyclopedia of Photography, Eastman Kodak Co., Rochester, 1981.
T

TELEVISION BROADCAST TRANSMISSION STANDARDS
ALAN S. GODGER
JAMES R. REDFORD
ABC Engineering Lab
New York, NY

Since the invention of television, the images and sound have been captured, processed, transmitted, received, and displayed using analog technology, where the picture and sound elements are represented by signals that are proportional to the image amplitude and sound volume. More recently, as solid-state technology has developed, spurred primarily by the development of computers, digital technology has gradually been introduced into handling the television signal, both for image and sound. The digital electric signal representing the various elements of the image and sound is composed of binary numbers that represent the image intensity, color, and so on, and the sound characteristics. Many portions of television systems are now hybrid combinations of analog and digital, and it is expected that eventually all television equipment will be fully digital, except for the transducers, cameras, and microphones (whose inputs are analog) and the television displays and loudspeakers (whose outputs are analog). The currently used broadcast television transmission standards [National Television Systems Committee (NTSC), phase alternate line (PAL), and sequential and memory (SECAM)] for 525- and 625-line systems were designed around analog technology, and although significant portions of those broadcast systems are now hybrid analog/digital or digital, the "over the air" transmission system is still analog. Furthermore, other than for "component" processed portions of the system, the video signals take the same "encoded" form from studio camera to receiver and conform to the same standard. The recently developed ATSC Digital Television Standard, however, uses digital technology for "over the air" transmission, and the digital signals used from the studio camera to the receiver represent the same image and sound but differ in form in portions of the transmission system. In the studio, maximum image and sound information is coded digitally, but during recording, special effects processing, distribution around a broadcast facility, and transmission, the digital signal is "compressed" to an increasing extent as it approaches its final destination at the home. This permits practical and economical handling of the signal.

ANALOG TELEVISION SYSTEMS

Black-and-White Television

The purpose of all conventional broadcast television systems is to provide instantaneous vision beyond human sight, a window into which the viewer may peer to see activity at another place. Not surprisingly, all of the modern systems evolved to have similar characteristics. Basically, a sampling structure is used to convert a three-dimensional image (horizontal, vertical, and temporal variations) into a continuous time-varying broadband electrical signal. This modulates a high-frequency carrier with the accompanying sound, and it is broadcast over the airwaves. Reasonably inexpensive consumer television sets recover the picture and sound in the viewer's home.
Image Representation. The sampling structure first divides the motion into a series of still pictures that are sequenced rapidly enough to restore an illusion of movement. Next, each individual picture is divided vertically into sufficient segments so that enough definition can be retrieved in this dimension at the receiver. This process is called scanning. The individual pictures generated are known as frames; each contains scanning lines from top to bottom. The number of scanning lines necessary was derived from typical room dimensions and practical display sizes; based on the acuity of human vision, a viewing distance of four to six picture heights is intended. The scanning lines must be capable of enough transitions to resolve comparable definition horizontally. The image aspect ratio (width/height) of all conventional systems is 4 : 3, taken from the motion picture industry "academy aperture." All systems sample the picture from top left to bottom right. In professional cinema, the projection rate of 48 Hz is sufficient to make flicker practically invisible. Long-distance electric power distribution networks throughout the world use slightly higher rates of 50-60 Hz alternating current. To minimize the movement of vertical "hum" in the picture caused by marginal filtering in direct-current power supplies, the picture repetition rate was made equal to the power line frequency. A variation of this process used by all conventional systems is interlaced scanning, whereby every other line is scanned to produce a picture of half the vertical resolution, known as a field. The following field "fills in" the missing lines to form the complete frame. Each field illuminates a sufficient portion of the display so that flicker is practically invisible, yet only half the information is being generated; this conserves bandwidth in transmission. For both fields to start and stop at the same point vertically, one field must have a half scanning line at the top, and the other field must have a half scanning line at the bottom of the picture. This results in an odd number of scanning lines for the entire frame.
Mechanical systems using rotating disks that have spiral holes to scan the image were investigated in the 1920s and 1930s, but these efforts gave way to "all-electronic" television. Prior to World War II, developers in the United States experimented with 343-line and 441-line systems. Developers in Great Britain began a 405-line service, and after the war the French developed an 819-line system, but these are no longer in use.

Synchronization. In most of North and South America and the Far East, where the power line frequency is 60 Hz, a 525-line system became the norm. This results in an interlaced scanning line rate of 15.750 kHz. The development of color television in Europe led to standardization on 625 lines in much of the rest of the world; the resulting line frequency at a 50 Hz field rate is 15.625 kHz. The similar line and field rates enable the use of similar picture tube deflection circuitry and components. The horizontal and vertical frequencies must be synchronous and phase-locked, so they are derived from a common oscillator. Synchronization pulses are inserted between each scanning line (Fig. 1) and between each field to enable the television receiver to present the picture details in the same spatial orientation as that of the camera. The sync pulses are of opposite polarity from the picture information, permitting easy differentiation in the receiver. The line sync pulses, occurring at a faster rate, are narrower than the field sync pulses, which typically last the duration of several lines. Sync separation circuitry in the receiver discriminates between the two time constants. Sync pulses cause the scanning to retrace rapidly from right to left and from bottom to top.
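A one-line Python check of the scanning line rates quoted above, from lines per frame and field rate (two interlaced fields per frame):

def line_rate(lines_per_frame, field_rate_hz):
    # Two interlaced fields make one frame, so the frame rate is half the field rate
    return lines_per_frame * field_rate_hz / 2

print(line_rate(525, 60))   # 15750.0 Hz for the 525-line, 60 Hz system
print(line_rate(625, 50))   # 15625.0 Hz for the 625-line, 50 Hz system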
Blanking. To provide time for the scanning circuits to reposition and stabilize at the start of a line or field, the picture signal is blanked, or turned off. This occurs just before (front porch) and for a short time after (back porch) the horizontal sync pulse, as well as for several lines before and after vertical sync. During vertical sync, serrations are inserted to maintain horizontal synchronization. Shorter equalizing pulses are added in the several blanked lines before and after vertical sync (Fig. 2). All of these pulses occur at twice the rate of normal sync pulses, so that the vertical interval of both fields (which are offset by one-half line) can be identical, simplifying circuit design. Additional scanning lines are blanked before the active picture begins; typically there is a total of 25 blanked lines per field in 625-line systems and 21 blanked lines per field for the 525-line system M. Modern television receivers complete the vertical retrace very soon after the vertical sync pulse is received. The extra blanked lines now contain various ancillary signals, such as those for short-time and line-time distortion and noise measurement, ghost cancellation, source identification, closed captioning, and teletext. Fields and lines of each frame are numbered for technical convenience. In the 625-line systems, field 1 is the one whose active picture begins with a half line of video. In the 525-line system M, the active picture of field 1 begins with a full line of video. Lines are numbered sequentially throughout the frame, beginning at the vertical sync pulse in the 625-line systems. For the 525-line system M, the line numbering begins at the first complete line of blanking for each field. Field 1 continues halfway through line 263, at which point field 2 begins, continuing through line 262.
Figure 1. The 525-line system M: line-time signal specifications. (The drawing gives the line-time limits in IRE units and microseconds: blanking level 0 IRE = 0 V ± 50 mV, sync level −40 ± 2 IRE, setup/picture black 7.5 ± 2 IRE, maximum luminance 100 +0/−2 IRE, maximum chrominance excursions +120/−20 IRE; horizontal blanking 10.9 ± 0.2 µs, front porch 1.5 ± 0.1 µs, sync 4.7 ± 0.1 µs with 140 ± 20 ns rise time, breezeway 0.6 ± 0.1 µs, color back porch 1.6 ± 0.1 µs, color burst of 9 ± 1 cycles at 3.58 MHz and 40 ± 2 IRE p-p starting 5.3 ± 0.1 µs after the sync leading edge, active line time 52.7 µs, total line time 63.6 µs.)
Figure 2. The 525-line system M: field-time and vertical interval signal specifications. (The drawing shows the 19-21-line vertical blanking interval of both fields, including the pre- and postequalizing pulses, the serrated vertical sync pulse, the 9H burst blanking, and the ancillary-signal allocations on the blanked lines: ghost cancellation reference, NABTS teletext and test signals, closed captioning/extended data service, source identification, Intercast, test, cue, and control signals, and VIRS.)
Signal Levels. During blanking, the video signal is at 0 V, the reference used to measure picture (positive-going) and sync (negative-going) levels. Signal amplitudes are measured directly in millivolts, except that, because of changes made during the conversion to color, the 525-line system uses IRE units. A specialized oscilloscope, the waveform monitor, is used to monitor the amplitude and timing characteristics of the signal. Its voltage scale is calibrated in millivolts (or IRE units for 525-line applications), and its time base is calibrated to the scanning line and picture field rates, as well as in microseconds (Fig. 3).

Figure 3. Typical NTSC-M waveform monitor graticule. A number of additional markings are for measuring various signal distortions.
Originally, the 525-line system used an amplitude of 1 V peak-to-peak (p-p) for the picture information and 0.4 V for sync. So that color information (modulated onto a subcarrier that can extend above the peak white level) could be accommodated within the same dynamic range as existing equipment, the 1.4 V p-p scaling was compressed in amplitude to 1 V p-p. This created fractional voltage levels for peak white (714.3 mV) and sync (−285.7 mV), so a 1 V scale of 140 IRE units was adopted to simplify measurement. The 625-line standards did not have this historical complication; the peak white level is 700 mV, and the sync level is −300 mV. Another anachronism of the 525-line system is the use of a direct-current (dc) offset of picture black from the blanking level. This was done to ensure that during retrace the electron beam in the display tube was completely cut off, so retrace lines did not appear in the picture. This setup level originally varied between 5 and 10 IRE units above blanking but was standardized at 7.5 IRE for color TV, although it has been discarded altogether in Japan. Setup, or "lift," was used to some extent in earlier systems but was abandoned by the advent of 625-line services. The electrical-to-optical transfer characteristic (gamma) of the cathode-ray picture tube is nonlinear. Doubling the video signal level applied to the control grid of the picture tube does not cause the light output to double; rather, the light output follows a power law of approximately 2.5. To correct for this, the video signal itself is made nonlinear by applying an opposite transfer characteristic of approximately 0.4 (1/2.5). This correction is applied at the camera in all systems.

Resolution. Resolution in the vertical direction is determined by taking the total number of scanning lines and subtracting those used for vertical blanking. This is multiplied by 0.7, a combination of the Kell factor (a correction for slight overlap between adjacent lines) and an additional correction for imperfect interlace. By convention, television resolution is expressed in television (TV) lines per picture height, in contrast to photographic "line pairs per millimeter." Because the resolution is fixed by the scanning system, picture size is immaterial. Note that a vertical "line pair" in television requires two scanning lines. To compute the bandwidth necessary for equal horizontal resolution, the vertical resolution is multiplied by the aspect ratio of 4 : 3 and by the ratio of total scanning line time to active picture (unblanked) line time. This number is halved because an electric cycle defines a line pair, whereas a "TV line of resolution" is really only one transition. Multiplying the number of cycles per scanning line by the total number of scanning lines in a field and then by the number of fields per second gives the bandwidth of the baseband video signal.
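A rough Python sketch of the bandwidth estimate just described, applied to the 525-line system M; the line counts, 0.7 factor, and line timings are the values quoted in this article, and the result is only approximate:

active_lines = 525 - 2 * 21            # scanning lines carrying picture (21 blanked per field)
vert_res = active_lines * 0.7          # TV lines per picture height (Kell/interlace factor)
horiz_res = vert_res * 4 / 3           # TV lines needed across the width for equal resolution
total_line_us = 63.6                   # total scanning line time, microseconds
active_line_us = 52.7                  # active (unblanked) line time, microseconds
cycles_per_line = (horiz_res / 2) * (total_line_us / active_line_us)
bandwidth_hz = cycles_per_line * 262.5 * 59.94   # lines per field x fields per second
print(round(bandwidth_hz / 1e6, 2))    # ~4.3 MHz, close to the 4.2 MHz video bandwidth of system M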
Broadcasting Standards. The various systems have been assigned letter designations by the International Telecommunications Union (ITU). The letters were assigned as the systems were registered, so alphabetical order bears no relation to system differences (Table 1), but a rearrangement highlights similarities (Table 2). Only scanning parameters and radio-frequency (RF) characteristics are defined; color encoding is not specified. Systems A, C, E, and F are no longer used. Portions of the very high frequency (VHF) and ultrahigh-frequency (UHF) RF spectrum are divided into channels for television broadcast. Modern channel assignments are 6 MHz wide in the Americas and the Far East; elsewhere, they are generally 7 MHz in VHF and 8 MHz in UHF, and the carrier is a fixed distance from one channel edge. Because the picture carrier in most systems is near the lower edge and the audio signals are at the upper end, channels for which the opposite is true are called inverted. As a bandwidth-saving technique, the amplitude-modulated RF signal is filtered so that only one sideband is fully emitted; the other sideband is vestigial, or partially suppressed, which aids in fine-tuning to the correct carrier frequency at the receiver. The full sideband, which represents the video bandwidth, extends in the direction of the audio carrier(s), but a sufficient guard band is included to prevent interference. The bandwidth of the vestigial sideband varies among systems, as does the placement of the audio carrier in relation to the picture carrier (Fig. 4). These factors complicate receiver design in areas where signals of two or more systems may exist. The main audio signal is sent via an amplitude-modulated or, more commonly, frequency-modulated carrier. Peak deviation in frequency modulation (FM) is ±50 kHz with 50 µs preemphasis, except ±25 kHz and 75 µs for systems M and N. Preemphasis for improving the signal-to-noise ratio is common in FM systems; it was used in some amplitude-modulation (AM) systems to simplify receivers that could accommodate both modulation schemes. Amplitude modulation is used in all systems for the video waveform, which, unlike audio, is not sinusoidal. The majority of systems employ a negative sense of modulation, such that negative excursions of the baseband signal produce an increase in the amplitude of the modulated carrier. This allows the constant-amplitude sync pulses to serve as an indication of the received RF signal level for automatic gain control. Interfering electrical energy also tends to produce less noticeable black flashes in the received picture, and the duty cycle of the signal is reduced, which consumes less power at the transmitter.

Multichannel Sound

Early attempts to provide stereo sound for special TV events involved simulcasting, whereby an FM radio station in the same coverage area broadcast the stereo audio for the program. Owing to the high profitability of FM radio today, this scheme has become impractical. For the 525-line system M, whose channels are only 6 MHz wide, a multiplexing scheme is used on the existing single audio carrier. Because of the wider channel bandwidths of 625-line systems, multiple sound carriers emerged there as the solution.
Table 1. Principal Characteristics of World Television Systems
(Columns, in order: A United Kingdom | B/G Western Europe | C Luxembourg | D/K Eastern Europe | E France | F Belgium | H | I United Kingdom | K French Overseas Territories | L France | L France (Band I) | M North America/Far East | N South America)

Frequency band: VHF | V/U | VHF | V/U | VHF | VHF | UHF | V/U | V/U | V-3/U | VHF-1 | V/U | V/U
Channel B/W (MHz): 5 | 7/8 | 7 | 8 | 14 | 7 | 8 | 8 | 8 | 8 | 8 | 6 | 6
Visual/edge separation (MHz): −1.25 | +1.25 | +1.25 | +1.25 | ±2.83 | +1.25 | +1.25 | +1.25 | +1.25 | +1.25 | −1.25 | +1.25 | +1.25
Video B/W (MHz): 3 | 5 | 5 | 6 | 10 | 5 | 5 | 5.5 | 6 | 6 | 6 | 4.2 | 4.2
Vestigial sideband (MHz): +0.75 | −0.75 | −0.75 | −0.75 | ±2.0 | −0.75 | −1.25 | −1.25 | −1.25 | −1.25 | +1.25 | −0.75 | −0.75
Visual modulation polarity (Wht =): Pos. | Neg. | Pos. | Neg. | Pos. | Pos. | Neg. | Neg. | Neg. | Pos. | Pos. | Neg. | Neg.
Picture/synchronization ratio: 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 7/3 | 10/4 | 7/3
Black pedestal (%): 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7.5 | 0
Visual/aural separation (MHz): −3.5 | +5.5 | +5.5 | +6.5 | ±11.15 | +5.5 | +5.5 | +6.0 | +6.5 | +6.5 | −6.5 | +4.5 | +4.5
Aural modulation: AM | FM | AM | FM | AM | AM | FM | FM | FM | AM | AM | FM | FM
Aural peak deviation (kHz): — | ±50 | — | ±50 | — | — | ±50 | ±50 | ±50 | — | — | ±25 | ±25
Aural preemphasis (µs): — | 50 | 50 | 50 | — | 50 | 50 | 50 | 50 | — | — | 75 | 75
Lines per frame: 405 | 625 | 625 | 625 | 819 | 819 | 625 | 625 | 625 | 625 | 625 | 525 | 625
Field blanking (lines): 13 | 25 | 25 | 25 | 33 | 33 | 25 | 25 | 25 | 25 | 25 | 21 | 25
Field synchronization (lines): 4 | 2.5 | 2.5 | 2.5 | 20 µs | 3.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 3 | 2.5
Equalization pulses (lines): 0 | 2.5 | 2.5 | 2.5 | 0 | 3.5 | 2.5 | 2.5 | 2.5 | 2.5 | 2.5 | 3 | 2.5
Vertical resolution (L/ph): 254 | 413 | 413 | 413 | 516 | 516 | 413 | 413 | 413 | 413 | 413 | 350 | 413
Horizontal resolution (L/ph): 270 | 400 | 400 | 450 | 635 | 400 | 400 | 430 | 450 | 450 | 450 | 320 | 320
Field rate (Hz): 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 50 | 59.94 | 50
Line rate (Hz): 10,125 | 15,625 | 15,625 | 15,625 | 20,475 | 20,475 | 15,625 | 15,625 | 15,625 | 15,625 | 15,625 | 15,734 | 15,625
Line blanking (µs): 18 | 12 | 12 | 12 | 9.5 | 9.2 | 12 | 12 | 12 | 12 | 12 | 10.9 | 12
Front porch (µs): 1.75 | 1.5 | 1.4 | 1.5 | 0.6 | 1.1 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5 | 1.5
Line synchronization (µs): 9.0 | 4.7 | 5.0 | 4.7 | 2.5 | 3.6 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7 | 4.7
Figure 4. Television RF channel spectra for the various world television systems (frequencies in MHz; parentheses denote obsolete standards). For simplicity, only a single illustration of the lower and upper portions of the channel spectra is shown for the 8 MHz wide channels; therefore, for systems D, H, I, and K, the lower and upper illustrations are not adjacent to each other.
Multiplex Sound Systems. In the United States, Zenith Electronics developed multichannel television sound (MTS), a pilot-tone system in which the sum of the two stereo channels (L + R) modulates the main TV audio FM carrier and provides a monophonic signal to conventional receivers. A difference signal (L − R) is dbx-TV companded and amplitude-modulated, with the carrier suppressed, onto an audio subcarrier at twice the line scanning frequency (2fH), and a pilot is sent at the line frequency as a reference for demodulation. A second audio program (SAP) may be frequency modulated at 5fH, and nonpublic voice or data may be included at 6.5fH. Japan developed a similar FM/FM stereo system using frequency modulation of the L − R subchannel at 2fH; a control subcarrier at 3.5fH is tone-modulated to indicate whether stereo or second-audio programming is being broadcast.

Multiple Carrier Systems. In Germany, the IRT introduced Zweiton, a dual-carrier system for transmission standards B and G. In both standards, the main carrier is frequency-modulated at 5.5 MHz above the vision carrier; for stereo, this carrier conveys the sum of the two channels (L + R). The second frequency-modulated sound carrier is placed 15.5 times the line scanning frequency above the main carrier, or 5.742 MHz above the vision carrier; for stereo, this carrier conveys only the right audio channel. For transmission system D, a similar relationship exists to the 6.0 MHz main carrier. A reference carrier at 3fH is tone-modulated, and the particular tone frequency switches receivers between stereo and second-audio-channel sound. A variant is used in Korea, where the second carrier is placed 4.72 MHz above the vision carrier and conveys L − R information.

The British Broadcasting Corporation (BBC) developed near-instantaneous companded audio multiplex (NICAM), a digital sound carrier system. Both audio channels are sampled at 32 kHz at 14-bit resolution. Each sample is compressed to 10 bits and then arranged into frame packages 728 bits long. These are rearranged, and the data are then scrambled to disperse the effect of noise at the receiver. Finally, two bits at a time are fed to a QPSK modulator for transmission. Either stereo or two independent sound signals may be transmitted; other possible combinations are (1) one sound and one data channel or (2) two data channels. The original analog-modulated single-channel sound carrier is retained for older receivers. The digital carrier is 6.552 MHz above the vision carrier for transmission system I, or 5.85 MHz above it for systems B, G, and L.
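A short Python sketch of the sound-carrier placements quoted above, derived from the line frequencies:

f_h_525 = 15_734          # Hz, system M line rate
f_h_625 = 15_625          # Hz, 625-line systems

# MTS (system M): pilot at fH, L-R subcarrier at 2fH, SAP at 5fH
print(f_h_525, 2 * f_h_525, 5 * f_h_525)

# Zweiton (systems B/G): second sound carrier 15.5 x fH above the 5.5 MHz main carrier
print((5.5e6 + 15.5 * f_h_625) / 1e6)   # ~5.742 MHz above the vision carrier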
NTSC Color Television

It has long been recognized that color vision in humans results from three types of photoreceptors in the eye, each sensitive to a different portion of the visible spectrum. The ratio of excitation creates the perception of hue and saturation, and the aggregate evokes the sensation of brightness. Stimulating the three types of photoreceptors using three wavelengths of light can produce the impression of a wide gamut of colors. For television, the image is optically divided into three primary color images, and this information is delivered to the receiver, which spatially integrates the three pictures, something like tripack color film and printing.
Sequential Color Systems. The first attempts at commercial color TV involved transmitting the three color images sequentially. Compatibility with existing transmitters was essential from an economic standpoint. Because line-sequential transmission caused crawl patterning, field-sequential transmission was preferred. However, there was no separate luminance signal for existing black-and-white receivers to use. To reduce flicker caused by the apparent brightness differences among the three primary colors, the field rate had to be increased, and maintaining an equivalent channel bandwidth required that the number of scanning lines be reduced. These changes aggravated the compatibility problem with existing sets. A field-sequential system developed by the Columbia Broadcasting System (CBS) network was briefly commissioned in the United States during 1951. To be compatible, a color television system must have the same channel bandwidth as existing monochrome transmitters and receivers, use equivalent scanning parameters, and supply the same luminance signal, as if the picture were black-and-white. An all-industry body, the National Television Systems Committee (NTSC), was set up in the United States to devise such a color TV system.

Separate Luminance and Mixed Highs. The human visual system senses shapes and edges from brightness variations. Color fills in only the larger areas, much like a child's coloring book. At the suggestion of Hazeltine Electronics, the existing wide-bandwidth luminance signal of black-and-white television was retained. The color information is limited to a much narrower bandwidth, of the order of 1 MHz, which restricts its resolution in the horizontal direction. This first led to a dot-sequential system that sampled the three colors many times along each scanning line to form a high-frequency chrominance signal. The frequency of sampling may be likened to a subcarrier signal whose amplitude and phase change according to the color variations along the line. At the receiver, the "dot" patterns of each primary resulting from sampling are averaged in low-pass frequency filters. The result is a continuous but low-resolution full-color signal. Equal amounts of the higher frequency components of the three primaries are summed to form a mixed-highs signal for fine luminance detail (Y = 1/3R + 1/3G + 1/3B), an idea from Philco.

Quadrature Modulation. The dot-sequential concept formed the basis for a more sophisticated simultaneous system. The luminance signal contains both high- and low-frequency components. Only two lower resolution color signals are needed (the third can be derived by subtracting their sum from the low-frequency portion of luminance). The spectral composition of green is nearest to that of luminance, so transmitting the red and blue signals improves the signal-to-noise performance. These low-frequency color signals are sampled by using a time-multiplexing technique proposed by Philco, known as quadrature modulation. The chrominance signal is formed by the addition of two subcarriers, which are locked at the same frequency
but differ in phase by 90°. The two subcarriers are modulated by separate baseband signals such that each is sampled when the other carrier is at a null. This results in modulating the subcarrier in both amplitude and phase. The amplitude relates to the saturation of the color, and the phase component corresponds to the hue (Fig. 5).
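A minimal Python sketch of quadrature modulation as just described: two baseband color values modulate two subcarriers 90° apart, and the sum is a single subcarrier whose amplitude carries saturation and whose phase carries hue (the I and Q levels here are hypothetical constants):

import numpy as np

f_sc = 3.579545e6                     # NTSC subcarrier, Hz
t = np.arange(0, 2e-6, 1e-9)          # two microseconds of signal

i_val, q_val = 0.3, 0.2               # hypothetical baseband I and Q levels
chroma = q_val * np.sin(2 * np.pi * f_sc * t) + i_val * np.cos(2 * np.pi * f_sc * t)

amplitude = np.hypot(i_val, q_val)                  # relates to saturation
phase_deg = np.degrees(np.arctan2(i_val, q_val))    # relates to hue
print(round(amplitude, 3), round(phase_deg, 1))     # 0.361 and 56.3 degrees
print(round(chroma.max(), 3))                       # peak of the summed subcarrier, ~= amplitude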
Figure 5. Quadrature modulation. Addition of two amplitude-modulated signals whose carrier frequencies are in phase quadrature (same frequency but offset in phase by 90 ° ) produces an output signal whose carrier is modulated in both amplitude (AM) and phase (PM) simultaneously. This method of combining two baseband signals onto a single carrier is called quadrature modulation. In the case of NTSC or PAL encoding for color television, the two baseband components represent the two chrominance signals (I and Q for NTSC, U and V for PAL). The resulting amplitude of the subcarrier relates to the saturation, and the phase conveys the hue information. The frequency of the subcarrier is unchanged.
Frequency Multiplexing. The sampling rate is more than twice the highest frequency of the color signals after low-pass filtering, so the chrominance information shares the upper part of the video spectrum with luminance. This frequency-multiplexing scheme was put forward by General Electric. The scanning process involves sampling the image at line and field rates; therefore, energy in the video signal is concentrated at intervals of the line and field frequencies. These sidebands leave pockets between them where very little energy exists. The exact subcarrier frequency was made an odd multiple of one-half the line scanning frequency. This causes the sidebands containing the color information to fall likewise in between those of the existing luminance signal (Fig. 6). It also means that the phase of the subcarrier signal is opposite from line to line, which prevents the positive and negative excursions of the subcarrier from lining up vertically in the picture and results in a less objectionable "dot" interference pattern between the subcarrier and the luminance signal. Comb filtering to separate luminance and chrominance may be employed by examining the phase of information around the subcarrier frequency on adjacent lines. The dot pattern is further concealed because the subcarrier phase is also opposite from frame to frame. The two interlaced picture fields, together with the alternating phase of the subcarrier on sequential frames, establish a four-field sequence. Sources to be intercut or mixed must be properly timed, and editing points must be chosen to preserve the sequence of the four color fields.
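A small Python check of the half-line-frequency relationship just described. The exact color line rate of 4.5 MHz/286 is the standard NTSC definition rather than a value spelled out at this point in the article, so treat it as an assumption here:

f_h = 4.5e6 / 286            # color line rate, Hz (~15,734.27)
f_sc = (455 / 2) * f_h       # subcarrier: odd multiple (455) of half the line frequency
print(round(f_h, 2), round(f_sc, 2))   # ~15734.27 Hz and ~3579545.45 Hz
print((f_sc / f_h) % 1)                # ~0.5: the subcarrier phase is opposite line-to-line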
Figure 6. Frequency spectrum of composite NTSC-M color television signal showing relationships between the baseband and channel spectra and between sidebands of the picture carrier and color subcarrier.
A slight modification of the line and field scanning frequencies was necessary because one of the sidebands of the new color subcarrier fell right at 4.5 MHz, the rest frequency of the FM sound carrier for system M. Existing black-and-white receivers did not have adequate filtering to prevent an annoying buzz when the program sound was low and color saturation was high. By reducing the scanning frequencies by a mere 0.1%, the sidebands of luminance and chrominance remained interleaved, but shifted to eliminate the problem. Hence, the field frequency became 59.94 Hz, and the line frequency became 15.734 kHz. Color-Difference Signals. Another suggestion came from Philco: Interference with the luminance signal is minimized by forming the two color signals as the difference between their respective primary and luminance (i.e., R − Y, B − Y). This makes the color-difference signals smaller in amplitude because most scenes have predominantly pale colors. The subcarrier itself is suppressed, so that only the sidebands are formed. When there is no color in the picture, the subcarrier vanishes. This necessitates a local oscillator at the receiver. A color-burst reference is inserted on the back porch of the horizontal sync pulse that synchronizes the reference oscillator and provides an amplitude reference for color saturation automatic gain control. Constant Luminance. In the constant-amplitude formulation (Y = 1/3R + 1/3G + 1/3B), the luminance signal does not represent the exact scene brightness. Part of the brightness information is carried by the chrominance channels, so unwanted irregularities in them, such as noise and interference, produce brightness variations. Also, the gray-scale rendition of a color broadcast on a black-and-white receiver is not correct. Hazeltine Electronics suggested weighting the contributions of the primaries to the luminance signal according to their actual addition to the displayed brightness. The color-difference signals will then represent only variations in hue and saturation because they are ‘‘minus’’ the true brightness (R − Y, B − Y). A design based on this principle is called a constant-luminance system. For the display phosphors and white point originally specified, the luminance composition is Y = 30%R + 59%G + 11%B. Scaling Factors. The two low-bandwidth color-difference signals modulate a relatively high-frequency subcarrier superimposed onto the signal level representing luminance. However, the peak subcarrier excursions for some hues could reach far beyond the original black-and-white limits, where the complete picture signal is restricted between levels representing blanking and peak white picture information. Overmodulation at the transmitter may produce periodic suppression of the RF carrier and/or interference with the synchronizing signals. If the overall amplitude of the composite (luminance level plus superimposed subcarrier amplitude) signal were simply lowered, the effective power of the transmitted signal would be significantly reduced.
A better solution was to reduce the overall amplitude of only the modulated subcarrier signal. However, such an arbitrary reduction would severely impair the signal-to-noise ratio of the chrominance information. The best solution proved to be selective reduction of each of the baseband R − Y and B − Y signal amplitudes to restrict the resulting modulated subcarrier excursions to ±4/3 of the luminance signal levels. The R − Y signal is divided by 1.14, and B − Y is divided by 2.03. It was found that the resulting 33.3% overmodulation beyond both the peak white and blanking levels was an acceptable compromise because the incidence of highly saturated colors is slight (Fig. 7).

Proportioned Bandwidths. RCA proposed shifting the axes of modulation from R − Y, B − Y to conform to the greater and lesser acuity of human vision for certain colors. The new coordinates, called I and Q, lie along the orange/cyan and purple/yellow-green axes. This was done so that the bandwidths of the two color signals could be proportioned to minimize cross talk (Fig. 8). Early receivers used the wider bandwidth of the I signal; however, it became evident that a very acceptable color picture could be reproduced when the I bandwidth is restricted to the same as that of the Q channel. Virtually all NTSC receivers now employ "narrowband" I channel decoding. A block diagram of NTSC color encoding is shown in Fig. 9. These recommendations were adopted by the U.S. Federal Communications Commission in late 1953, and commercial color broadcasting began in early 1954.
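A minimal Python check of the composite excursion for a fully saturated yellow bar, using the luminance weights and the chrominance scaling factors quoted above:

r, g, b = 1.0, 1.0, 0.0                  # 100% yellow
y = 0.30 * r + 0.59 * g + 0.11 * b       # constant-luminance weighting
u = (b - y) / 2.03                       # scaled B - Y
v = (r - y) / 1.14                       # scaled R - Y
chroma_amplitude = (u ** 2 + v ** 2) ** 0.5
peak = y + chroma_amplitude
print(round(y, 3), round(peak, 3))       # 0.89 and ~1.34: about 33% above peak white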
Figure 7. (a) 100% NTSC color bars (100/7.5/100/7.5). (b) 100% PAL color bars (100/0/100/0). (c) Standard 75% "EIA" color bars (75/7.5/75/7.5). (d) Standard 75% "EBU" color bars (100/0/75/0). (The drawings give the luminance and chrominance excursions of each bar in IRE units and millivolts.)
Sequential and Memory (SECAM)

The economic devastation of World War II delayed the introduction of color television to Europe and other regions. Because differences between the 525- and 625-line scanning standards made video tapes incompatible anyway, and satellite transmission was unheard of, there seemed little reason not to explore possible improvements to the NTSC process.

Sequential Frequency Modulation. The most tenuous characteristic of NTSC proved to be its sensitivity to distortion of the phase component of the modulated subcarrier. Because the phase component imparts the color hue information, errors are quite noticeable, especially in skin tones. Also of concern were variations in the subcarrier amplitude, which affect color saturation. Most long-distance transmission circuits in Europe did not have the phase and gain linearity to cope with the added color subcarrier requirements. A solution to these drawbacks was devised by the Compagnie Française de Télévision in Paris. By employing a one-line delay in the receiver, quadrature modulation of the subcarrier could be discarded and the color-difference signals (called DR and DB in SECAM) sent sequentially on alternate lines. This reduces the vertical resolution in color by half; however, it is sufficient to provide only coarse color detail vertically, as is already the case horizontally. In early development, AM was contemplated; however, the use of FM also eliminated the effects of subcarrier amplitude distortion. In addition, FM allowed recording the composite signal on conventional black-and-white tape machines because precise time-base correction was not necessary.

Compatibility. In FM, the subcarrier is always present, superimposed on the luminance signal at constant amplitude (unlike NTSC, in which the subcarrier produces noticeable interference with the luminance only on highly saturated colors). To reduce its visibility, a number of techniques are employed. First, preemphasis is applied to the baseband color-difference signals to lessen their amplitudes at lower saturation but preserve an adequate signal-to-noise ratio (low-level preemphasis; see Fig. 10). Second, different subcarrier frequencies are employed that are integral multiples of the scanning line frequency: foB is 4.25 MHz (272 H), and foR is 4.40625 MHz (282 H). The foR signal is inverted before modulation so that the maximum deviation is toward a lower frequency, reducing the bandwidth required for the dual subcarriers. Third, another level of preemphasis is applied to the modulated subcarrier around a point between the two rest frequencies, known as the "cloche" frequency of 4.286 MHz (high-level preemphasis, the so-called "antibell" shaping shown in Fig. 11). Finally, the phase of the modulated subcarrier is reversed on consecutive fields and, additionally, on every third scanning line or, alternatively, every three lines.
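A short Python check of the SECAM subcarrier rest frequencies quoted above, which are integer multiples of the 625-line scanning rate:

f_h = 15_625                 # Hz, 625-line scanning rate
f_ob = 272 * f_h             # 4,250,000 Hz: rest frequency of the D'B subcarrier
f_or = 282 * f_h             # 4,406,250 Hz: rest frequency of the D'R subcarrier
print(f_ob, f_or)            # the "cloche" reference of 4.286 MHz lies between them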
Line Identification. Synchronizing the receiver to the alternating lines of color-difference signals is provided in one of two ways. Earlier specifications called for nine lines of vertical blanking to contain a field identification sequence formed by truncated sawteeth of the color-difference signals, running from the white point to the limiting frequency (the so-called "bottles"; see Fig. 12). This method is referred to as SECAM-V. As use of the vertical interval for ancillary signals increased, receiver demodulators were instead designed to sample the unblanked subcarrier immediately following the horizontal sync pulse, providing an indication of the line sequence from the rest frequency. Where this method is employed, it is called SECAM-H. An advantage of this method is near-instantaneous recovery from momentary color field sequence errors, whereas SECAM-V receivers must wait until the next vertical interval.

Issues in Program Production. High-level preemphasis causes the chrominance subcarrier envelope to increase in amplitude at horizontal transitions, as can be seen on a waveform monitor (Fig. 13).
Figure 8. Vector relationship among chrominance components and corresponding dominant wavelengths. 75% color bars with 7.5% setup. Linear NTSC system, NTSC luminophors, illuminant C. Hexagon defines maximum chrominance subcarrier amplitudes as defined by 100% color bars with 7.5% setup. Caution: the outer calibration circle on vectorscopes does not represent exactly 100 IRE p-p.
Unlike NTSC, the subcarrier amplitude bears no relation to saturation, so, except for testing purposes, a luminance low-pass filter is employed on the waveform monitor. A vectorscope presentation of saturation and hue is implemented by decoding the FM subcarrier into baseband color-difference signals and applying them to an X, Y display. Unfortunately, the choice of FM for the color subcarrier means that conventional studio production switchers cannot be employed for effects such as mixing or fading from one scene to another, because reducing the amplitude of the subcarrier does not reduce the saturation. This necessitates switching in component form and encoding afterward. When the signal has already been encoded to SECAM (such as from a prerecorded video tape), it must be decoded ahead of the component switcher and then reencoded. Like NTSC, program editing must be done in two-frame increments. Although the subcarrier phase is reversed on
a field-and-line basis, establishing a "12-field sequence," it is the instantaneous frequency, not the phase, that defines the hue. However, the line-by-line sequence of the color-difference signals must be maintained. The odd number of scanning lines means that each successive frame begins with the opposite color-difference signal. As described before, mixes or fades are never done by using composite signals. Because the instantaneous frequency of the subcarrier is not related to the line scanning frequency, it is impossible to employ modern comb-filtering techniques to separate the chrominance and luminance in decoding. Increasingly, special effects devices rely on decoding the composite TV signal to components for manipulation and then reencoding.
Figure 9. Block diagram of RGB to NTSC encoding (and related circuitry). (Gamma-corrected R, G, and B signals feed the Y, I, and Q matrices; the Q and I signals are low-pass filtered to 0.5 MHz and 1.5 MHz, modulate the subcarrier at 33° and 123°, and are added to the delayed luminance together with sync, blanking, and the burst from the subcarrier and burst generators.)
Figure 10. SECAM baseband (low-level) preemphasis. (The curves give the D'B and D'R subcarrier amplitudes, as a percentage of the peak-to-peak luminance interval, versus instantaneous frequency for the color-bar levels, from the modulation limits at 3.900 and 4.756 MHz through the 4.250 and 4.406 MHz center frequencies and the identification-line levels.)
Every operation of this sort impairs luminance resolution because a notch filter must be used around the subcarrier frequency. These concerns have led many countries that formerly adopted SECAM to switch to PAL for program production and to transcode to SECAM only for RF broadcasting.
Phase Alternate Line (PAL)

To retain the ease of NTSC-style program production, yet correct for phase errors, the German Telefunken Company developed a system more comparable to the NTSC that retains quadrature modulation. Because of the wider channel bandwidth of 625-line systems, the color subcarrier could be positioned so that the sidebands from both color-difference signals have the same bandwidth. This means that the R − Y and B − Y signals could be used directly, rather than I and Q as in the NTSC. Identical scaling factors are used, and the signals are known as V and U, respectively.

Color Phase Alternation. In the PAL system, the phase of the modulated V component of the chrominance signal is reversed on alternate lines to cancel chrominance phase distortion acquired in equipment or transmission. A reference indicating which lines have +V or −V phase is provided by also shifting the phase of the color burst signal by ±45° on alternate lines. Any phase shift encountered will have the opposite effect on the displayed hue on adjacent lines in the picture. If the phase error is limited to just a few degrees, the eye integrates the error, because more chrominance detail is provided in the vertical direction than can be perceived at normal viewing distances. Receivers based on this principle are said to have "simple PAL" decoders. If the phase error is more than a few degrees, the difference in hue produces a venetian-blind effect, called Hanover bars. Adding a one-line delay permits integrating chrominance information from adjacent scanning lines electrically, and there is only a slight reduction in saturation for large errors. Color resolution is reduced by half in the vertical direction but more closely matches the horizontal resolution, owing to band-limiting in the encoder. This technique of decoding is called "deluxe PAL."
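A minimal Python sketch of why the V-axis switch cancels a phase error: a hue transmitted at a given angle arrives rotated by the error, the V-switched line is mirrored, and averaging the two adjacent lines (as a delay-line decoder does) restores the original angle at a small cost in saturation. The hue and error values are hypothetical:

import numpy as np

hue, err = np.radians(60.0), np.radians(10.0)    # hypothetical hue and phase distortion

line1 = np.exp(1j * (hue + err))                 # normal line, with phase error
line2 = np.conj(np.exp(1j * (-hue + err)))       # V-switched line with the same error, re-inverted in the decoder
average = (line1 + line2) / 2                    # delay-line averaging of two adjacent lines

print(round(np.degrees(np.angle(average)), 1))   # 60.0 -> the hue is recovered
print(round(abs(average), 3))                    # 0.985 -> slight loss of saturation (cos 10 deg)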
Compatibility. The line-by-line phase reversal results in an identical phase on alternate lines for hues on or near the V axis. To sustain a low-visibility interference pattern in PAL, the subcarrier frequency is made an odd multiple of one-quarter of the line frequency (creating eight distinct color fields). This effectively offsets the excursions of the V component by 90° from line to line and those of the U component by 270°. Because this 180° difference would cause the excursions of one component to line up vertically with those of the other in the next frame, an additional 25 Hz offset (fV/2) is added to the PAL subcarrier frequency to reduce its visibility further. In early subcarrier oscillator designs, the reference was derived from the mean phase of the alternating burst signal.
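A short Python check of the PAL subcarrier relationship just described: an odd multiple (1,135) of one-quarter of the line frequency plus the 25 Hz (half field rate) offset:

f_h = 15_625                      # Hz, 625-line scanning rate
f_sc = (1135 / 4) * f_h + 25      # 4,433,618.75 Hz = 4.43361875 MHz
print(f_sc)
print(round((f_sc / f_h) % 1, 4)) # ~0.7516: each line ends roughly a quarter-cycle offset from the last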
Figure 11. SECAM RF (high-level) preemphasis. (Attenuation in dB of the complementary shaping curves versus frequency, centered on fc = 4.286 MHz.)
Figure 12. SECAM field identification "bottles." (Characteristic D'B and D'R identification signals on the lines of the end-of-frame blanking interval.)
Figure 13. SECAM color bar waveforms (line D'R and line D'B).
Interlaced scanning causes a half-line offset between fields with respect to the vertical position, so that the number of bursts actually blanked during the 7½ H vertical interval would differ between the odd and even fields. Because the phase of the burst alternates from line to line, the mean phase would then appear to vary in this region, causing disturbances at the top of the picture. This is remedied by a technique known as "Bruch blanking." The burst blanking is increased to a total of nine lines and repositioned in a four-field progression to include the 7½ H interval, such that the first and last burst of every field has a phase corresponding to (−U + V), or +135°. The burst signal is said to "meander," so that color fields 3 and 7 have the earliest bursts.
Issues in Program Production. Because the subcarrier frequency in PAL is an odd multiple of one-quarter of the line frequency, each line ends on a quarter-cycle. This, coupled with the whole number plus one-half lines per field, causes the phase of the subcarrier to be offset by 45° in each field. Thus, in PAL, the subcarrier phase repeats only every eight fields, creating an "eight-field sequence." This complicates program editing because edit points occur only every four frames, which is slightly less than 1/6 s. Comb filtering to separate chrominance and luminance in decoding is somewhat more complicated in PAL; however, it has become essential for special picture effects. On a waveform monitor, the composite PAL signal looks very much like NTSC, except that the 25 Hz offset causes a slight phase shift from line to line, so that when the entire field is viewed, the sine-wave pattern is blurred. Because of the reversal in the phase of the V component on alternate lines, the vectorscope presentation has a mirror image about the U axis (Fig. 14).
Variations of PAL. The differences between most 625-line transmission standards involve only RF parameters (such as the sound-to-picture carrier spacing). For 625-line PAL program production, a common set of technical specifications may be used. These standards are routinely referred to in the production environment as "PAL-B," although the baseband signals may be used with any 625-line transmission standard.
Several countries in South America have adopted the PAL system. The 6 MHz channel allocations in that region meant that the color subcarrier frequency had to be suitably located, about 1 MHz lower in frequency than for 7 MHz or 8 MHz channels. The exact frequencies are close to, but not the same as, those for the NTSC. The 625-line version is known as PAL-N. Studio production for this standard is done in conventional "PAL-B" and then converted to PAL-N at the transmitter. The 525-line PAL is known as PAL-M, and it requires studio equipment unique to this standard, although the trend is toward using conventional NTSC-M equipment and likewise transcoding to PAL-M at the transmitter. PAL-M does not employ a 25 Hz offset of the subcarrier frequency, as all other PAL systems do.

Similarities of Color Encoding Systems

The similarities of the three basic color television encoding systems are notable (see Table 3). They all rely on the concept of a separate luminance signal that provides compatibility with black-and-white television receivers. The "mixed-highs" principle combines high-frequency information from the three color primaries into luminance, where the eye is sensitive to fine detail; only the relatively low-frequency information is used for the chrominance channels. All three systems use the concept of a subcarrier, located in the upper frequency spectrum of luminance, to convey the chrominance information (Fig. 15). All systems use color-difference signals, rather than the color primaries directly, to minimize cross talk with the luminance signal, and all derive the third color signal by subtracting the other two from luminance. The constant-luminance principle is applied in all systems, based on the original NTSC picture tube phosphors, so the matrix formulations for luminance and color-difference signals are identical
Figure 14. Typical PAL vectorscope graticule.
(some recent NTSC encoders use equal-bandwidth R − Y and B − Y signals instead of the proportioned-bandwidth I and Q signals). All of the systems use scaling factors to limit excessive subcarrier amplitude (NTSC/PAL) or deviation (SECAM) excursions. Finally, all three systems use an unmodulated subcarrier sample on the back porch of the horizontal sync pulse as reference information for the decoding process. Because of these similarities, conversion of signals between standards for international program distribution is possible. Early standards converters were optical, essentially using a camera of the target standard focused on a picture tube operating at the source standard. Later, especially for color, electronic conversion became practical. The most serious issue in standards conversion involves motion artifacts due to the different field rates of the 525- and 625-line systems. Simply dropping or repeating fields and lines creates disturbing discontinuities, so interpolation must be done. In modern units, the composite signals are decoded into components, using up to three-dimensional adaptive comb filtering, converted using motion prediction, and then reencoded to the new standard. Table 4 lists the transmission and color standards used in various territories throughout the world.

Component Analog Video (CAV)

The advent of small-format video tape machines that recorded luminance and chrominance on separate tracks led to interest in component interconnection. Increasingly, new equipment decoded and reencoded the composite signal to perform manipulations that would be impossible, or would cause significant distortions, if done in the composite environment. It was reasoned that if component signals (Y, R − Y, B − Y) could be taken from the camera and encoding to NTSC, PAL, or SECAM could be done just before transmission, then technical quality would be greatly improved.
Table 3. Principal Characteristics of Color Television Encoding Systems

Display primaries: NTSC: FCC; PAL: EBU; SECAM: EBU.
White reference: NTSC: CIE Ill. C; PAL: CIE Ill. D65; SECAM: CIE Ill. C.
Display gamma: NTSC: 2.2; PAL: 2.8; SECAM: 2.8.
Luminance: NTSC: EY = +0.30ER + 0.59EG + 0.11EB; PAL and SECAM: EY = +0.299ER + 0.587EG + 0.114EB.
Chrominance signals: NTSC: Q = +0.41(B − Y) + 0.48(R − Y), I = −0.27(B − Y) + 0.74(R − Y); PAL: U = 0.493(B − Y), V = 0.877(R − Y); SECAM: DB = +1.505(B − Y), DR = −1.902(R − Y).
Chrominance baseband video preemphasis: NTSC and PAL: none; SECAM: D*B = A × DB, D*R = A × DR, where A = [1 + j(f/85)]/[1 + j(f/255)], f in kHz.
Modulation method: NTSC and PAL: amplitude modulation of two suppressed subcarriers in quadrature; SECAM: frequency modulation of two sequential subcarriers.
Axes of modulation: NTSC: Q = 33°, I = 123°; PAL: U = 0°, V = ±90°; SECAM: not applicable.
Chroma bandwidth/deviation (kHz): NTSC: Q = 620, I = 1,300; PAL: U, V = 1,300; SECAM: foB = ±230 (+276/−120), foR = ±280 (+70/−226).
Vestigial sideband (kHz): NTSC: +620; PAL: +570 (PAL-B, G, H), +1,070 (PAL-I), +620 (PAL-M, N); SECAM: not applicable.
Composite color video signal (CCVS): NTSC: EM = EY + EQ sin(ωt + 33°) + EI cos(ωt + 33°); PAL: EM = EY + EU sin ωt ± EV cos ωt; SECAM: EM = EY + GSC cos 2π(foB + D*B foB)t or EY + GSC cos 2π(foR + D*R foR)t, line sequentially.
Modulated subcarrier amplitude/preemphasis: NTSC: GSC = √(EQ² + EI²); PAL: GSC = √(EU² + EV²); SECAM: GSC = 0.115EY (D*B or D*R) × [1 + j(16)F]/[1 + j(1.26)F], where F = f/f0 − f0/f and f0 = 4.286 ± 0.02 MHz.
SC/H frequency relationship: NTSC: fSC = (455/2)fH; PAL: fSC = (1,135/4)fH + fV/2 (PAL-B, G, H, I), (909/4)fH (PAL-M), (917/4)fH + fV/2 (PAL-N); SECAM: foB = 272fH, foR = 282fH.
Subcarrier frequency: NTSC: 3.579545 MHz ± 10 Hz; PAL: 4.43361875 MHz ± 5 Hz (B, G, H), ±1 Hz (I), 3.57561149 MHz ± 10 Hz (M), 3.58205625 MHz ± 5 Hz (N); SECAM: foB = 4.250000 MHz ± 2 kHz, foR = 4.406250 MHz ± 2 kHz.
Phase/deviation of SC reference: NTSC: 180°; PAL: +V = +135°, −V = −135°; SECAM: DB = −350 kHz, DR = +350 kHz.
Start of SC reference (µs): NTSC: 5.3 ± 0.1; PAL: 5.6 ± 0.1 (B, G, H, I, N), 5.2 ± 0.5 (M); SECAM: 5.7 ± 0.3.
SC reference width (cycles): NTSC: 9 ± 1; PAL: 10 ± 1 (B, G, H, I), 9 ± 1 (M, N); SECAM: not applicable.
SC reference amplitude (mV): NTSC: 286 (40 IRE ± 4); PAL: 300 ± 30; SECAM: DB = 167, DR = 215.

However, component signal distribution required that some equipment, such as switchers and distribution amplifiers, have three times the circuitry, and interconnection required three times the cable and connections of composite systems. This brought about consideration of multiplexed analog component (MAC) standards, whereby the luminance and chrominance signals are time-multiplexed into a single, higher bandwidth signal. No single standard for component signal levels emerged (Table 5), and the idea was not widely popular. Interest soon shifted to the possibility of digital signal distribution.
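The subcarrier/line-frequency relationships listed in Table 3 can be checked numerically. The short sketch below is an illustration only; it uses the familiar NTSC relationship fH = 4.5 MHz/286 and the nominal PAL line and field rates.

    # NTSC: fH = 4.5 MHz / 286, fSC = (455/2) fH
    fh_ntsc = 4.5e6 / 286.0
    fsc_ntsc = 455.0 / 2.0 * fh_ntsc
    print(round(fsc_ntsc, 1))    # 3579545.5 Hz, i.e., 3.579545 MHz

    # PAL-B/G/H: fH = 15,625 Hz, fV = 50 Hz, fSC = (1135/4) fH + fV/2
    fh_pal, fv_pal = 15625.0, 50.0
    fsc_pal = 1135.0 / 4.0 * fh_pal + fv_pal / 2.0
    print(fsc_pal)               # 4433618.75 Hz, i.e., 4.43361875 MHz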
Digital Video

Devices such as time-base correctors, frame synchronizers, and standards converters process the TV signal in the digital realm but use analog interfaces. The advent of digital video tape recording set standards for signal sampling and quantization to the extent that digital interconnection became practical.

Component Digital. The European Broadcasting Union (EBU) and the Society of Motion Picture and Television Engineers (SMPTE) coordinated research and conducted demonstrations in search of a component digital video standard that would lend itself to the exchange of programs worldwide. A common luminance sampling rate of 13.5 MHz, line-locked to both the 525- and 625-line standards, was chosen. This allows analog video frequencies of better than 5.5 MHz to be recovered and is an exact harmonic of the scanning line rate for both standards, enabling great commonality in equipment. A common image format of static orthogonal shape is also employed, whereby the sampling instants on every line coincide with those on previous lines and fields and also overlay the samples from previous frames. There are 858 total luminance samples per line for the 525-line system and 864 samples for the 625-line system, but 720 samples during the picture portion for both systems. This image structure facilitates filter design, special effects, compression, and conversion between standards.

Figure 15. Four-stage color television frequency spectrum, showing the compression of three wideband color-separation signals from the camera through bandwidth limiting and frequency multiplexing into the same channel bandwidth used for black-and-white television. (The four stages are the camera signals (G, B, R); the matrixed baseband luminance and color-difference components (Y with I/Q, U/V, or PB/PR); the baseband composite signal with the encoded chrominance on the color subcarrier; and the RF transmission channel with picture carrier, color subcarrier, and audio carrier.)
For studio applications, the color-difference signals are sampled at half the rate of luminance, or 6.75 MHz, cosited with every odd luminance sample, yielding a combined rate of 27 million samples (words) per second. This provides additional resolution for the chrominance signals, enabling good special effects keying from color detail. The sampling ratio for luminance and the two chrominance channels is designated ''4 : 2 : 2.'' Other related ratios are possible (Table 6). Quantization is uniform (not logarithmic) for both luminance and color-difference channels. Eight-bit quantization, which provides 256 discrete levels, gives an adequate signal-to-noise ratio for videotape applications. However, the 25-pin parallel interface selected can accommodate two extra bits, because 10-bit quantization was foreseen as desirable in the future. Only the active picture information is sampled and quantized, allowing better resolution of the signal amplitude. Sync and blanking are coded by special signals (Figs. 16 and 17).
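A back-of-the-envelope check of the 4 : 2 : 2 numbers quoted above (sampling rates, samples per line, and the resulting serial bit rate); the figures are implied by the text, and the script is only an illustration.

    f_luma = 13.5e6           # luminance sampling rate, Hz
    f_chroma = f_luma / 2     # each color-difference signal in 4:2:2
    words_per_second = f_luma + 2 * f_chroma
    print(words_per_second / 1e6)        # 27.0 million samples (words) per second

    # 858 total luminance samples per line (525-line system) implies the line rate:
    print(round(f_luma / 858, 2))        # 15734.27 Hz

    serial_bit_rate = words_per_second * 10   # 10-bit quantization on the serial interface
    print(serial_bit_rate / 1e6)         # 270.0 Mbps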
Table 4. National Television Transmission Standards
Table 4. (Continued)
Territory
Territory
VHF
Eritrea Estonia Ethiopia Faeroe Islands Falkland Islands Fernando Po Fiji Finland France French Guyana French Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guadeloupe Guam Guatemala Guinea Guinea-Bissau Guyana, Republic of Haiti Honduras Hong Kong Hungary Iceland India Indonesia Iran Iraq Ireland Israel Italy Ivory Coast = Cˆote d’Ivoire Jamaica Japan Johnston Islands Jordan Kampuchea = Cambodia Kazakhstan Kenya Korea, Democracy of (N) Korea, Republic of (S) Kuwait Kyrgyzstan Laos Latvia Lebanon Leeward Islands = Antigua Lesotho Liberia Libya Lichtenstein Lithuania Luxembourg Macao Macedonia Madagascar
B D B B I B M B L K K K B D B B B B B M K M M K I M M M
VHF
UHF
Afars and Isaas = Djibouti Afghanistan D Albania B G Algeria B Andorra B Angola I Antigua and Barbuda M Argentina N Armenia D K Ascension Islands I Australia B B Austria B G Azerbaijan D K Azores B Bahamas M Bahrain B G Bangladesh B Barbados M Belarus D K Belgium B H Benin K Bermuda M Bolivia M M Bosnia and Herzegovina B G Botswana I Brazil M Brunei Darussalam B Bulgaria D K Burkina Faso K Burma = Myanmar Burundi K Cambodia B Cameroon B Canada M M Canary Islands B G Cape Verde Islands I Cayman Islands M Central African Republic K Ceylon = Sri Lanka Chad K Channel Islands I Chile M China D D Colombia M Commonwealth of Independent States: see state Comores K Congo K Costa Rica M Cˆote d’Ivoire K Croatia B G Cuba M Curacao M M Cyprus B G Czech Republic D K Dahomey = Benin Denmark B G Diego Garcia M Djibouti K Dominican Republic M Ecuador M Equtorial Guinea = Fernando Po Egypt B G El Salvador M
Color
PAL PAL PAL PAL PAL NTSC PAL SECAM PAL PAL SECAM PAL NTSC PAL PAL NTSC SECAM PAL SECAM NTSC NTSC PAL PAL PAL PAL P&S SECAM SECAM PAL PAL NTSC PAL PAL NTSC SECAM SECAM PAL NTSC PAL NTSC
SECAM NTSC SECAM PAL NTSC NTSC P&S SP PAL NTSC SECAM NTSC NTSC P&S NTSC 1376
D B B B B B I B B M M M B D B D M B D B D B I B B B D B B K
UHF
K G
G L
K G G G
M
I K G
I G G
M G K
M G K K G
G K G/L I G
Color PAL SECAM PAL PAL PAL PAL NTSC PAL SECAM SECAM SECAM SECAM PAL SECAM PAL PAL PAL SECAM PAL NTSC SECAM NTSC NTSC PAL NTSC NTSC NTSC PAL P&S PAL PAL PAL SECAM SECAM PAL PAL PAL NTSC NTSC NTSC PAL SECAM PAL PAL NTSC PAL SECAM PAL SECAM SECAM PAL PAL SECAM PAL SECAM P P/S PAL PAL SECAM
TELEVISION BROADCAST TRANSMISSION STANDARDS Table 4. (Continued)
Table 4. (Continued) Territory
VHF
Madeira B Malawi B Malaysia B Maldives B Mali K Malta B Martinique K Mauritania B Mauritius B Mayotte K Mexico M Micronesia M Moldova D Monaco L Mongolia D Montserrat M Morocco B Mozambique Myanmar M Namibia I Nepal B Netherlands B Netherlands Antilles M New Caledonia K New Zealand B Nicaragua M Niger K Nigeria B Norway B Oman B Pakistan B Palau M Panama M Papua New Guinea B Paraguay N Peru M Philippines M Poland D Portugal B Puerto Rico M Qatar B Reunion K Romania D Russia D Rwanda K St. Helena I St. Pierre et Miquelon K St. Kitts and Nevis M Samoa (American) M Samoa (Western) B B Sao Tom´e e Princ´ıpe San Andres Islands M San Marino B Saudi Arabia B Senegal K Serbia B Seychelles B Sierra Leone B Singapore B Slovakia B Slovenia B Society Islands = French Polynesia Somalia B
UHF
M K G/L
B
G
G
G G
G M K G M G G/K K K
G G G
G/K G
Color PAL PAL PAL PAL SECAM PAL SECAM SECAM SECAM SECAM NTSC NTSC SECAM S P/S SECAM NTSC SECAM PAL NTSC PAL PAL PAL NTSC SECAM PAL NTSC SECAM PAL PAL PAL PAL NTSC NTSC PAL PAL NTSC NTSC P&S PAL NTSC PAL SECAM PAL SECAM
SECAM NTSC NTSC PAL PAL NTSC PAL S/P S SECAM PAL PAL PAL PAL PAL PAL PAL
Territory
VHF
UHF
South Africa S. West Africa = Namibia Spain Sri Lanka Sudan Suriname Swaziland Sweden Switzerland Syria Tahiti = French Polynesia Taiwan Tajikistan Tanzania Thailand Togo Trinidad and Tobago Tunisia Turks and Caicos Turkey Turkmenistan Uganda Ukraine USSR: see independent state United Arab Emirates United Kingdom United States Upper Volta = Burkina Faso Uruguay Uzbekistan Venezuela Vietnam Virgin Islands Yemen Yugoslavia: see new state Zaire Zambia Zanzibar = Tanzania Zimbabwe
I
I
PAL
B B B M B B B B
G
PAL PAL PAL NTSC PAL PAL PAL P&S
G G G G
Color
M D B B K M B M B D B D
M K I M
K
NTSC SECAM PAL PN SECAM NTSC PS NTSC PAL SECAM PAL SECAM
B
G I M
PAL PAL NTSC
M N D M D/M M B
M G G K
K
PAL SECAM NTSC S/N NTSC PAL
K B
SECAM PAL
B
PAL
These specifications were standardized in ITU-R BT.601, hence the abbreviated reference ''601 video.'' The first component digital tape machine standard was designated ''D1'' by SMPTE, and this term has come to be used loosely in place of the more correct designations. For wide-screen applications, a 360-Mbps standard scales up the number of sampling points for a 16 : 9 aspect ratio. Interconnecting digital video equipment is vastly simplified by using a serial interface. Originally, an 8/9 block code was devised to facilitate clock recovery by preventing long strings of ones or zeros in the code; this would have resulted in a serial data rate of 243 Mbps. To permit serializing 10-bit data, scrambling is employed instead, and complementary descrambling is performed at the receiver. NRZI coding is used, so the fundamental frequency is half the 270 Mbps bit rate. Composite Digital. Time-base correctors for composite 1 in. videotape recorders had been developed that had several lines of storage capability. Some early devices sampled at three times the color subcarrier frequency
TELEVISION BROADCAST TRANSMISSION STANDARDS Table 5. Component Analog Video Format Summary Color Bar Amplitudes (mV)
Format R/G/B/Sa G/B/R Y/I/Q (NTSC) Y/Q/I (MI ) Y/R - Y/B - Y∗ Y/U/V (PAL) Betacam 525 2 CH Y/CTDM Betacam 625 2 CH Y/CTDM MII 525 2 CH Y/CTCM MII 625 2 CH Y/CTCM SMPTE/EBU (Y/PB /PR ) a
Peak Excursions (mV)
Channel 1
Channel 2
Channel 3
100% 75%
100% 75%
100% 75%
+1V/+750 +700/+525 +714/+549 +934/+714 +700/+525 +700/+525 +714/+549 +714/+549 +700/+525 +700/+525 +700/+538 +714/+549 +700/+525 +700/+525 +700/+525
+1V/+750 +700/+525 ±393/ ± 295 ±476/ ± 357 ±491/ ± 368 ±306/ ± 229 ±467/ ± 350 ±467/ ± 350 ±467/ ± 350 ±467/ ± 350 ±324/ ± 243 ±350/ ± 263 ±350/ ± 263 ±350/ ± 263 ±350/ ± 263
+1V/+750 +700/+525 ±345/ ± 259 ±476/ ± 357 ±620/ ± 465 ±430/ ± 323 ±467/ ± 350 ±467/ ± 350 ±324/ ± 243 ±350/ ± 263 ±350/ ± 263
Synchronization Channels/Signals S = −4V G, B, R = −300 Y = −286 Y = −286 Y = −300 Y = −300 Y = −286 Y = ±286 Y = −300 Y = ±300 Y = −300 Y = −286 Y = −300 Y = −300 Y = −300
Setup
I = −600
No No Yes Yes No No Yes
C = −420 No C = −420 Yes C = −650 No C = −650 No
Other levels possible with this generic designation.
Table 6. Sampling Structures for Component Systems

The patterns below show five successive samples (pixels) on successive lines. Y = luminance sample; Cb, Cr = chrominance samples; YCbCr = cosited samples. (In the original diagram, boldface indicated the bottom field when interlaced.)

4 : 4 : 4 (every line):   YCbCr  YCbCr  YCbCr  YCbCr  YCbCr
4 : 2 : 2 (every line):   YCbCr  Y      YCbCr  Y      YCbCr
4 : 1 : 1 (every line):   YCbCr  Y      Y      Y      YCbCr
4 : 2 : 0 (line pairs):   Y      Y      Y      Y      Y
                          CbCr          CbCr
                          Y      Y      Y      Y      Y

In 4 : 2 : 0, the chrominance samples occur at every other pixel horizontally and are vertically interpolated so that each Cb, Cr pair sits between two adjacent scanning lines.
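The sampling structures in Table 6 can be sketched as simple array operations. The following illustration assumes a full-resolution color-difference plane held as a NumPy array and uses simple decimation and averaging in place of the standards' interpolation filters; it is a sketch of the idea, not a standard-conforming filter.

    import numpy as np

    def to_422(cb):
        # Keep chroma on every line, but only at every other pixel (cosited samples)
        return cb[:, ::2]

    def to_420(cb):
        # Subsample horizontally, then average each pair of lines so the resulting
        # chroma sample sits between two adjacent scanning lines (cf. Table 6)
        half_h = cb[:, ::2]
        return 0.5 * (half_h[0::2, :] + half_h[1::2, :])

    cb = np.random.rand(480, 720)    # a full-resolution color-difference plane
    print(cb.size, to_422(cb).size, to_420(cb).size)   # 345600 172800 86400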
(3fSC); however, better filter response could be obtained by 4fSC sampling. The sampling instants correspond to peak excursions of the I and Q subcarrier components in NTSC. The total number of samples per scanning line is 910 for NTSC and 1,135 for PAL. To accommodate the 25 Hz offset in PAL, lines 313 and 625 each have 1,137 samples. The active picture portion of a line consists of 768 samples in NTSC and 948 samples in PAL. These specifications are standardized as SMPTE 244M (NTSC) and EBU Tech. 3280 (PAL).
Unlike component digital, nearly the entire horizontal and vertical blanking intervals are sampled and quantized, which degrades the amplitude resolution (Fig. 18). However, in PAL, no headroom is provided for sync level, and the sampling instants are specified at 45 ° from the peak excursions of the V and U components of the subcarrier (Fig. 19). This allows ‘‘negative headroom’’ in the positive direction. Thus, an improvement of about 0.5 dB in the signal-to-noise ratio is obtained.
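The composite sample counts above follow directly from 4fSC sampling; a quick check, for illustration only:

    # NTSC: fSC = 227.5 fH, so 4 fSC sampling gives 4 x 227.5 = 910 samples per line
    print(4 * 227.5)             # 910.0

    # PAL: the +25 Hz subcarrier offset makes 4 fSC slightly more than 1,135 fH;
    # the few extra samples per frame are absorbed by giving lines 313 and 625
    # 1,137 samples each instead of 1,135.
    fh, fv = 15625.0, 50.0
    fsc = 1135.0 / 4.0 * fh + fv / 2.0
    print(4 * fsc / fh)          # about 1135.0064 samples per line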
Figure 16. Quantizing levels for component digital luminance.

Waveform location   Voltage level        8-bit (dec/hex)   10-bit (dec/hex)
Excluded (high)     766.3 / 763.9 mV     255 / FF          1023 / 3FF and 1020 / 3FC
Peak                700.0 mV             235 / EB          940 / 3AC
Black               0.0 mV               16 / 10           64 / 040
Excluded (low)      −48.7 / −51.1 mV     0 / 00            3 / 003 and 0 / 000
Figure 17. Quantizing levels for component digital color difference.

Waveform location   Voltage level        8-bit (dec/hex)   10-bit (dec/hex)
Excluded (high)     399.2 / 396.9 mV     255 / FF          1023 / 3FF and 1020 / 3FC
Max positive        350.0 mV             240 / F0          960 / 3C0
Black               0.0 mV               128 / 80          512 / 200
Max negative        −350.0 mV            16 / 10           64 / 040
Excluded (low)      −397.7 / −400.0 mV   0 / 00            3 / 003 and 0 / 000
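The code values in Figures 16 and 17 follow from a simple linear scaling: 219 quantizing steps for luminance between black and peak, 224 steps for the color-difference signals centered on mid-scale, and 10-bit codes equal to four times the 8-bit codes. A minimal Python sketch, offered as an illustration rather than text from the standard, reproduces the tabulated values.

    def luma_code(mv, bits=8):
        # 0 mV -> 16 and 700 mV -> 235 in 8 bits; 10-bit codes are 4x the 8-bit codes
        scale = 4 if bits == 10 else 1
        return round(scale * (16 + 219 * mv / 700.0))

    def chroma_code(mv, bits=8):
        # -350 mV -> 16, 0 mV -> 128, +350 mV -> 240 in 8 bits
        scale = 4 if bits == 10 else 1
        return round(scale * (128 + 224 * mv / 700.0))

    print(luma_code(700), luma_code(0))              # 235 16
    print(luma_code(700, 10), luma_code(0, 10))      # 940 64
    print(chroma_code(350), chroma_code(-350))       # 240 16
    print(chroma_code(350, 10), chroma_code(0, 10))  # 960 512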
Figure 18. Quantizing levels for composite digital NTSC.

Waveform location   Voltage level (IRE units)             8-bit (dec/hex)   10-bit (dec/hex)
Excluded (high)     998.7 / 994.9 mV (139.8 / 139.3)      255 / FF          1023 / 3FF and 1020 / 3FC
100% chroma         907.7 mV (131.3)                      244 / F4          975 / 3CF
Peak white          714.3 mV (100)                        200 / C8          800 / 320
Blanking            0.0 mV (0)                            60 / 3C           240 / 0F0
Sync tip            −285.7 mV (−40)                       4 / 04            16 / 010
Excluded (low)      −302.3 / −306.1 mV (−42.3 / −42.9)    0 / 00            3 / 003 and 0 / 000
Rate conversion between component and composite digital television signals involves different sampling points and quantizing levels. Each conversion degrades the picture because exact levels cannot be reproduced in each pass, and an important advantage of digital coding is thereby lost. In addition, decoding composite signals requires filtering to prevent cross-luminance and cross-color effects. This permanently removes a part of the information; therefore, this process must be severely limited in its use.

Ancillary data may be added to digital component and composite video signals. AES/EBU-encoded digital audio can be multiplexed into the serial bit stream. Four channels are possible in the composite format, and 16 channels are possible in component digital video.

Component Video Standards

The video signals from a camera before encoding to NTSC, PAL, SECAM, or the ATSC Digital Standard are normally green (G), blue (B), and red (R). These are described as component signals because they are parts, or components, of the whole video signal. It has been found more efficient in bandwidth use for distribution, and sometimes for processing, to convert these signals into a luminance signal (Y) and two color-difference signals, blue minus luminance (B − Y) and red minus luminance (R − Y), where the color-difference signals use 1/2 or 1/4 of the bandwidth of the luminance signal. The adopted SMPTE/EBU Standard N10 has a uniform signal specification for all 525/60 and 625/50 television systems. When the color-difference signals in this standard are digitally formatted, they are termed Cb and Cr, respectively. At the same time, due to the lower sensitivity of the human eye to fine detail in color, it is possible to reduce the bandwidth of the color-difference signals compared to that of the luminance signal. When these signals are digitized according to International Telecommunication Union, Radiocommunication Sector (ITU-R) Recommendation 601, for both 525/60 and 625/50 systems, several modes of transmission may be used, all based on multiples of a 3.375 MHz sampling rate. For the ATSC standard, 4 : 2 : 0 is used (see below and the ATSC digital television standard). Either eight or, more frequently, 10 bits per sample are used.

4 : 4 : 4 Mode. The G, B, R or Y, Cb, Cr signal at an equal sampling rate of 13.5 MHz for each channel is termed the 4 : 4 : 4 mode of operation, and it yields 720 active samples per line for both 525/60 and 625/50 standards. This mode is frequently used for postproduction. If a (full-bandwidth) key signal must also be carried with the video, this combination is known as a 4 : 4 : 4 : 4 signal.

4 : 2 : 2 Mode. The 4 : 2 : 2 mode is more frequently used for distribution, where Y is sampled at 13.5 MHz, and the color-difference signals are each sampled at a 6.75 MHz rate, corresponding to 360 active samples per line.

4 : 1 : 1 Mode. The 4 : 1 : 1 mode is used where bandwidth is at a premium, and the color-difference signals are each sampled at a 3.375 MHz rate, corresponding to 180 samples per line.
Figure 19. Quantizing levels for composite digital PAL.

Waveform location             Voltage level        8-bit (dec/hex)   10-bit (dec/hex)
Excluded (high)               913.1 / 909.5 mV     255 / FF          1023 / 3FF and 1020 / 3FC
100% chroma, highest sample   886.2 mV             250 / FA          1000 / 3E8
Peak white                    700.0 mV             211 / D3          844 / 34C
Blanking                      0.0 mV               64 / 40           256 / 100
Sync tip                      −300.0 mV            1 / 01            4 / 004
Excluded (low)                −301.2 / −304.8 mV   0 / 00            3 / 003 and 0 / 000
4 : 2 : 0 Mode. A further alternative, 4 : 2 : 0 mode, whose structure is not self evident, is derived from a 4 : 2 : 2 sampling structure but reduces the vertical resolution of the color-difference information by 2 : 1 to match the reduced color-difference horizontal resolution. Four line (and field sequential if interlaced) cosited Cb , Cr samples are vertically interpolated weighted toward the closest samples, and the resultant sample is located in between two adjacent scanning lines. This mode is used in MPEG bit-rate reduced digital signal distribution formats, and hence in the ATSC digital television standard. These four modes are illustrated in Table 6. ADVANCED TELEVISION SYSTEMS, CURRENT AND FUTURE ATSC Digital Television Standard Overview. From 1987 to 1995, the Advisory Committee on Advanced Television Service (ACATS) to the Federal Communications Commission, with support from Canada and Mexico, developed a recommendation for an Advanced Television Service for North America. The ACATS enlisted the cooperation of the best minds in the television industry, manufacturing, broadcasting, cable industry, film industry, and federal regulators, in its organization to develop an advanced television system that would produce a substantial improvement in video images and in audio performance over the existing NTSC, 525-line system. The primary video goal was at least a doubling of horizontal and vertical resolution and a widening in picture aspect ratio from the current 4 (W) × 3 (H) to 16 (W) × 9 (H), and this was named ‘‘high-definition television.’’ Also included was a digital audio system consisting of five channels plus a low-frequency channel (5.1). Twenty-one proposals were made for terrestrial transmission systems for extended-definition television (EDTV) or high-definition television (HDTV), using varying amounts of the RF spectrum. Some systems augmented the existing NTSC system by an additional channel of 3 MHz or 6 MHz, some used a separate simulcast channel of 6 MHz or 9 MHz bandwidth, and all of the early systems used hybrid analog/digital technology in signal processing by an analog RF transmission system. Later proposals changed the RF transmission system to digital along with all-digital signal processing. It was also decided that the signal would be transmitted in a 6 MHz RF channel, one for each current broadcaster of the (6 MHz channel) NTSC system, and that this new channel would eventually replace the NTSC channels. The additional channels were created within the existing UHF spectrum by improved design of TV receivers, so that the previously taboo channels, of which there were many, could now be used. In parallel with this effort, the Advanced Television Systems Committee (ATSC) documented and developed the standard known as the ATSC Digital Television Standard, and it is subsequently developing related implementation standards. In countries currently using 625-line, 4 : 3 aspect ratio television systems, plans are being developed to use a 1,250-line, 16 : 9 aspect ratio system eventually, and
the ITU-R has worked successfully to harmonize and provide interoperability between the ATSC and 1,250-line systems. Figure 20 shows the choices by which the signals of the various television standards will reach the consumer. Other articles detail satellite, cable TV, and asynchronous transfer mode (ATM) common carrier networks. The ATSC and the ITU-R have agreed on a digital terrestrial broadcasting model, which is shown in Fig. 21. Video and audio sources are coded and compressed in separate video and audio subsystems. The compressed video and audio are then combined with ancillary data and control data in a service multiplex and transport, in which form the combined signals are distributed to the terrestrial transmitter. The signal is then channel-coded, modulated, and fed at appropriate power to the transmission antenna. The receiver reverses the process, demodulates the RF signal to the transport stream, then demultiplexes the audio, video, ancillary, and control data into their separate but compressed modes, and the individual subsystems then decompress the bit streams into video and audio signals that are fed to display screen and speakers, and the ancillary and control data are used if and as appropriate within the receiver. Information Service Multiplex and Transport System. These subsystems provide the foundation for the digital communication system. The raw digital data are first formatted into elementary bit streams, representing image data, sound data, and ancillary data. The elementary bit streams are then formed into manageable packets of information (packetized elementary stream, PES), and a mechanism is provided to indicate the start of a packet (synchronization) and assign an appropriate identification code (packet identifier, PID) within a header to each packet. The packetized data are then multiplexed into a program transport stream that contains all of the information for a single (television) program. Multiple program transport streams may then be multiplexed to form a system level multiplex transport stream. Figure 22 illustrates the functions of the multiplex and transport system and shows its location between the application (e.g., audio or video) encoding function and the transmission subsystem. The transport and demultiplex subsystem functions in the receiver in the reverse manner, being situated between the RF modem and the individual application decoders.
Fixed-Length Packets. The transport system employs the fixed-length transport stream packetization approach defined by the Moving Picture Experts Group (MPEG), which is well suited to the needs of terrestrial broadcast and cable television transmission of digital television. The use of moderately long, fixed-length packets matches well with the needs for error protection, and it provides great flexibility to multiplex audio, video, and data for the initial needs of the service, while providing backward compatibility for the future and maximum interoperability with other MPEG-based media. Packet Identifier. The use of a PID in each packet header to identify the bit stream makes it possible to have a mix
Figure 20. Television service model. (Standard TV, wide-screen television, and HDTV program sources feed their respective encoders; the service multiplex and transport layer delivers MPEG-2 packets at the broadcaster program interface; terrestrial, satellite, and cable modulators, switched-network (ATM) distribution, and physical delivery (disks, tapes) form the distribution layer; and antenna, dish, or cable tuners and demodulators feed standard TV receivers, HDTV receivers, and consumer recorders at the consumer interface.)
of audio, video, and auxiliary data that is not specified in advance.
Scalability and Extensibility. The transport format is scalable in that more elementary bit streams may be added at the input of the multiplexer or at a second multiplexer. Extensibility for the future could be achieved without hardware modification by assigning new PIDs for additional elementary bit streams.
Figure 21. Block diagram showing ATSC and ITU-R terrestrial television broadcasting model.
Robustness. After detecting errors during transmission, the data bit stream is recovered starting from the first good packet. This approach ensures that recovery is independent of the properties of each elementary bit stream. Transport Packet Format. The data transport packet format, shown in Fig. 23, is based on fixed-length packets (188 bytes) identified by a variable-length header, including a sync byte and the PID. Each header identifies a
particular application bit stream (elementary bit stream) that forms the payload of the packet. Applications include audio, video, auxiliary data, program and system control information, and so on.

Figure 22. Organization of functionality within a transport system for digital TV programs. (Application encoders feed PID-labeled elementary bit streams to a multiplexer that forms program transport streams; a system-level multiplex and modem carry them to the receiver, where a demultiplexer and transport depacketizer recover the elementary bit streams, with error signaling and clock control, for the application decoders.)

Figure 23. Transport packet format: a 188-byte packet comprising a 4-byte ''link'' header, a variable-length adaptation header, and the payload (not to scale).
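The fixed 188-byte packet and PID mechanism described here are the standard MPEG-2 transport stream constructs, so a header parser takes only a few lines. The sketch below is an illustration (it assumes a Python bytes object holding one packet), not text from the ATSC documents.

    def parse_ts_header(packet: bytes):
        """Parse the 4-byte link header of a 188-byte MPEG-2 transport packet."""
        if len(packet) != 188 or packet[0] != 0x47:       # 0x47 is the sync byte
            raise ValueError("not a valid transport packet")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]       # 13-bit packet identifier
        return {
            "payload_unit_start": bool(packet[1] & 0x40),
            "pid": pid,
            "adaptation_field": (packet[3] >> 4) & 0x03,  # 01 = payload only
            "continuity_counter": packet[3] & 0x0F,
        }

    # Example: a null packet (PID 0x1FFF) carrying payload only
    null_packet = bytes([0x47, 0x1F, 0xFF, 0x10]) + bytes(184)
    print(parse_ts_header(null_packet))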
PES Packet Format. The elementary bit streams themselves are wrapped in a variable-length packet structure called the packetized elementary stream (PES) before transport processing (Fig. 24). Each PES packet for a particular elementary bit stream then occupies a variable number of transport packets, and data from the various elementary bit streams are interleaved with each other at the (fixed-length) transport packet layer. New PES packets always start a new transport packet, and stuffing bytes (i.e., null bytes) are used to fill partially filled transport packets.
Channel Capacity Allocation. The entire channel capacity can be reallocated to meet immediate service needs. As an example, ancillary data can be assigned fixed amounts depending on a decision as to how much to allocate to video; or if the data transmission time is not critical, then it can be sent as opportunistic data during periods when the video channel is not fully loaded.
Figure 24. Structural overview of a packetized elementary stream (PES) packet: a 3-byte packet start code prefix, a 1-byte stream ID, a 2-byte PES packet length, flag bits, optional PES header fields, and the PES packet data block, giving a packet of variable overall length.
Figure 25. Variable-length PES packets and fixed-length transport packets. (Audio and video PES packets of differing lengths are mapped into the continuous sequence of fixed-length transport stream packets.)
Figure 25 illustrates how the variable-length PES packets relate to the fixed-length transport packets. The transport system provides other features, including decoder synchronization, conditional access, and local program insertion. Issues relating to the storage and playback of programs are also addressed, and the appropriate hooks are provided to support the design of consumer digital products based on recording and playback of these bitstreams, including the use of ‘‘trick modes’’ such as slow motion and still frame, typical of current analog video cassette recorders (VCRs).
Local Program Insertion. This feature is extremely important to permit local broadcast stations to insert video, audio, or data unique to that station. As shown in Fig. 26 to splice local programs, it is necessary to extract (by demultiplexing) the transport packets, identified by the PIDs of the individual elementary bit streams, which make up the program that is to be replaced, including the program map table, which identifies the individual bit streams that make up the program. Program insertion can then take place on an individual PID basis, using the fixed-length transport packets.
Figure 26. Example of program insertion architecture. (The input program transport stream is demultiplexed using the program_map_PID and the PIDs of its elementary bit streams; a splicing operation substitutes the source program to be spliced in, terminates the replaced elementary bit streams, updates the program_map table, and passes the remaining packets through to the output program transport stream.)
Presentation Time Stamp and Decoding Time Stamp. Both of these time stamps occur within the header of the PES packet, and they are used to determine when the data within the packet should be read out of the decoder. This process ensures the correct relative timing of the various elementary streams at the decoder relative to the timing at which they were encoded. Interoperability with ATM. The MPEG-2 transport packet size (188 bytes) is such that it can easily be partitioned for transfer in a link layer that supports asynchronous transfer mode (ATM) transmission (53 bytes per cell). The MPEG-2 transport layer solves MPEG2 presentation problems and performs the multimedia multiplexing function, and the ATM layer solves switching and network adaptation problems. Video Systems
Compressed Video. Compression in a digital HDTV system is required because the bit rate required for an uncompressed HDTV signal approximates 1 Gbps, even when the luminance/chrominance sampling has already been reduced to the 4 : 2 : 2 mode. The total transmitted data rate over a 6 MHz channel in the ATSC digital television standard is approximately 19.4 Mbps. Therefore, a compression ratio of 50 : 1 or greater is required. The ATSC Digital Television Standard specifies video compression using a combination of compression techniques which, for compatibility, conform to the algorithms of MPEG-2 Main Profile, High Level. The goal of the compression and decompression process is to produce an approximate version of the original image sequence, such that the reconstructed approximation is imperceptibly different from the original for most viewers, for most images, and for most of the time. Production Formats. A range of production format video inputs may be used. These include the current NTSC format of 483 active lines, 720 active samples/line, 60 fields, 2 : 1 interlaced scan (60I); the Standard Definition format of 480 active lines, 720 active samples/line, 60 frames progressive scan (60P); and high-definition formats of 720 active lines, 1,280 active samples/line, 60P, or 1,080 active lines, 1,920 active samples/line, 60I. Compression Formats. A large range of 18 compression formats is included to accommodate all of those production formats. The 30P and 24P formats are included primarily to provide efficient transmission of film images associated with these production formats. The VGA Graphics format is also included at 480 lines and 640 pixels
Table 7. ATSC Compression Formats: A Hierarchy of Pixels and Bits

Uncompressed payload bit rates in Mbps (8-bit, 4 : 2 : 2 sampling) at each picture (frame) rate:

Active lines × pixels/line (pixels/frame)    60P      60I    30P    24P    Aspect ratio and notes
1,080 × 1,920 (2,073,600)                    future   995    995    796    16 : 9 only
  720 × 1,280   (921,600)                    885      —      442    334    16 : 9 only
  480 ×   704   (337,920)                    324      162    162    130    16 : 9 and 4 : 3
  480 ×   640   (307,200)                    295      148    148    118    4 : 3 only (VGA)

(Spatial resolution increases up the table; temporal resolution increases from 24P to 60P.) Data courtesy of Patrick Griffis, Panasonic, NAB, 1998.
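The uncompressed rates in Table 7 follow from pixels per frame, frame rate, and 16 bits per pixel (8-bit 4 : 2 : 2 carries one luminance sample and, on average, one color-difference sample per pixel). A quick illustrative check:

    def payload_mbps(active_lines, pixels_per_line, frame_rate, bits_per_pixel=16):
        return active_lines * pixels_per_line * frame_rate * bits_per_pixel / 1e6

    print(round(payload_mbps(1080, 1920, 30)))   # ~995 (1,080-line 60I or 30P)
    print(round(payload_mbps(720, 1280, 60)))    # ~885 (720-line 60P)
    print(round(payload_mbps(480, 704, 24)))     # ~130 (480 x 704 at 24P)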
(see later for pixel definition). Details of these compression formats are found in Table 7. Colorimetry. The Digital Television Standard specifies SMPTE 274M colorimetry (same as ITU-R BT.709, 1990) as the default and preferred colorimetry. This defines the color primaries, transfer characteristics, and matrix coefficients. Sample Precision. After preprocessing, the various luminance and chrominance samples will typically be represented using 8 bits per sample of each component. Film Mode. In the case of 24 fps film which is sent at 60 Hz rate using a 3 : 2 pull-down operation, the processor may detect the sequences of three nearly identical pictures followed by two nearly identical pictures and may encode only the 24 unique pictures per second that existed in the original film sequence. This avoids sending redundant information and permits higher quality transmission. The processor may detect similar sequencing for 30 fps film and may encode only the 30 unique pictures per second. Color Component Separation and Processing. The input video source to the video compression system is in the form of RGB components matrixed into luminance (Y) (intensity or black-and-white picture) and chrominance (Cb and Cr ) color-difference components, using a linear transformation. The Y, Cb , and Cr signals have less correlation with each other than R, G, and B and are thus easier to code. The human visual system is less sensitive to high frequencies in the chrominance components than in the luminance components. The chrominance components are low-pass-filtered and subsampled by a factor of 2 in both horizontal and vertical dimensions (4 : 2 : 0 mode) (see section entitled Component Video Standards). Representation of Picture Data. Digital television uses digital representation of the image data. The process of digitization involves sampling the analog signals and their components in a sequence corresponding to the scanning raster of the television format and representing each sample by a digital code. Pixels. The individual samples of digital data are referred to as picture elements, ‘‘pixels’’ or ‘‘pels.’’ When
the ratio of active pixels per line to active lines per frame is the same as the aspect ratio, the format is said to have ''square pixels.'' The term refers to the spacing of samples, not the shape of the pixel.

Blocks, Macroblocks, and Slices. For further processing, pixels are organized into 8 × 8 blocks, representing either luminance or chrominance information. Macroblocks consist of four blocks of luminance (Y) and one each of Cb and Cr. Slices consist of one or more macroblocks in the same row, and they begin with a slice start code. The number of slices affects compression efficiency; a larger number of slices provides for better error recovery but uses bits that could otherwise be used to improve picture quality. The slice is the minimum unit for resynchronization after an error.

Removal of Temporal Information Redundancy: Motion Estimation and Compensation. A video sequence is a series of still pictures shown in rapid succession to give the impression of continuous motion. This usually results in much temporal redundancy (picture sameness) among adjacent pictures. Motion compensation attempts to delete this temporal redundancy from the information transmitted. In the standard, the current picture is predicted from the previously encoded picture by estimating the motion between the two adjacent pictures and compensating for the motion. This ''motion-compensated residual'' is encoded rather than the complete picture, eliminating repetition of the redundant information.
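Motion estimation of this kind is usually implemented as block matching: for each block of the current picture, find the offset into the previous picture that minimizes a distortion measure, then encode the residual. The sketch below is an illustration using an exhaustive search and the sum of absolute differences (SAD); it is not the algorithm specified by the standard, which leaves the search method to the encoder designer.

    import numpy as np

    def best_motion_vector(cur_block, ref_frame, top, left, search=8):
        """Full-search block matching: return the (dy, dx) that minimizes the SAD."""
        h, w = cur_block.shape
        best, best_sad = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = top + dy, left + dx
                if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                    continue
                sad = np.abs(cur_block - ref_frame[y:y + h, x:x + w]).sum()
                if sad < best_sad:
                    best_sad, best = sad, (dy, dx)
        return best, best_sad

    ref = np.random.randint(0, 256, (64, 64)).astype(np.int32)
    cur = np.roll(ref, (2, -3), axis=(0, 1))        # simulate simple global motion
    block = cur[24:32, 24:32]
    # The block at (24, 24) in the current frame came from (22, 27) in the reference,
    # so the search returns the vector (-2, 3) with zero residual.
    print(best_motion_vector(block, ref, 24, 24))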
Pictures, Groups of Pictures, and Sequences. The primary coding unit of a video sequence is the individual video frame or picture, which consists of the collection of slices constituting the active picture area. A video sequence consists of one or more consecutive pictures, and it commences with a sequence header that can serve as an entry point. One or more pictures or frames in sequence may be combined into a group of pictures (GOP), optional within MPEG-2 and the ATSC Standard, to provide boundaries for interpicture coding and registration of a time code. Figure 27 illustrates a time sequence of video frames consisting of intracoded pictures (I-frames), predictive
Figure 27. Video frame order, group of pictures, and typical I-frames, P-frames, and B-frames. (The figure contrasts the encoding and transmission order of a group of pictures with its source and display order.)

Figure 28. Video structure hierarchy: block (8 pels × 8 lines), macroblock (four luminance blocks plus one Cb and one Cr block in the 4 : 2 : 0 structure), slice, picture (frame), group of pictures (GOP), and video sequence.
coded pictures (P-frames), and bidirectionally predictive coded pictures (B-frames). (a)
I-, P-, and B-Frames. Frames that do not use any interframe coding are referred to as I-frames (where I denotes intraframe coded). All of the information for a complete image is contained within an I-frame, and the image can be displayed without reference to any other frame. (The preceding frames may not be present or complete for initialization or acquisition, and the preceding or following frames may not be present or complete when noncorrectable channel errors occur.) P-frames (where P denotes predicted) are frames where the temporal prediction is only in the forward direction (formed only from pixels in the most recently decoded I- or P-frame). Interframe coding techniques improve the overall compression efficiency and picture quality. P-frames may include portions that are only intraframecoded. B-frames (where B denotes bidirectionally predicted) include prediction from a future frame as well as from a previous frame (always I- or P-frames). Some of the consequences of using future frames in the prediction are as follows: The transmission order of frames is different from the displayed order of frames, and the encoder and decoder must reorder the video frames, thus increasing the total latency. B-frames are used for increasing compression efficiency and perceived picture quality. Figure 28 illustrates the components of pictures, as discussed before. Removal of Spatial Information Redundancy: The Discrete Cosine Transform. As shown in Fig. 29, 8 × 8 blocks of spatial intensity that show variations of luminance and chrominance pel information are converted into 8 × 8 arrays of coefficients relating to the spatial frequency content of the original intensity information. The transformation method used is the discrete cosine transform (DCT). As an example, in Fig. 29a, an 8 × 8 pel array representing a black-to-white transition is shown as increasing levels of a gray scale. In Fig. 29b, the grayscale steps have been digitized and are represented by pel amplitude numerical values. In Fig. 29c, the grayscale block is represented by its frequency transformation
Figure 29. Discrete cosine transform. (a) An 8 × 8 pel array representing a black-to-white transition, shown as increasing levels of a gray scale. (b) The digitized pel amplitudes: each row reads 0, 12.5, 25, 37.5, 50, 62.5, 75, 87.5. (c) The scaled DCT coefficient array: the first row is 43.8, −40, 0, −4.1, 0, −1.1, 0, 0, where the top-left value is the DC component, and all remaining coefficients are 0.
coefficients, appropriately scaled. The DCT compacts most of the energy into only a small number of the transform coefficients. To achieve a higher decorrelation of the picture content, two-dimensional (along two axes) DCT coding is applied. The (0,0) array position (top left), represents the DC coefficient or average value of the array. Quantizing the Coefficients. The goal of video compression is to maximize the video quality for a given bit rate. Quantization is a process of dividing the coefficients by a value of N which is greater than 1, and rounding the answer to the nearest integral value. This allows scaling the coefficient values according to their importance in the overall image. Thus high-resolution detail to which the human eye is less sensitive may be more heavily scaled (coarsely coded). The quantizer may also include a dead zone (enlarged interval around zero) to core to zero small noise-like perturbations of the element value. Quantization in the compression algorithm is a lossy step (information is discarded that cannot be recovered). Variable Length Coding, Codeword Assignment. The quantized values could be represented using fixed-length codewords. However, greater efficiency can be achieved in bit rate by employing what is known as entropy coding. This attempts to exploit the statistical properties of the signal to be encoded. It is possible to assign a shorter code word to those values that occur more frequently and a longer code word to those that occur less frequently. The Morse code is an example of this method. One optimal code word design method, the Huffman code, is used in the Standard. Note that many zero value coefficients are produced, and these may be prioritized into long runs of zeros by zigzag scanning or a similar method. Channel Buffer. Motion compensation, adaptive quantization, and variable-length coding produce highly variable amounts of compressed video data as a function of time. A buffer is used to regulate the variableinput bit rate into a fixed-output bit rate for transmission. The fullness of the buffer is controlled by adjusting the amount of quantization error in each image block (a rate controller driven by a buffer state sensor adjusts the quantization level). Buffer size is constrained by maximum tolerable delay through the system and by cost. Audio System Overview. The audio subsystem used in the ATSC Digital Television Standard is based on the AC-3 digital audio compression standard. The subsystem can encode from one to six channels of source audio from a pulse-code modulation (PCM) representation (requiring 5.184 Mbps for the 5.1 channel mode) into a serial bit stream at a normal data rate of 384 kbps. The 5.1 channels are left (front), center (front), right (front), left surround (rear), right surround (rear) (all 3 Hz to 20 kHz), and lowfrequency subwoofer (normally placed centrally) (which
represents the 0.1 channel, 3 Hz to 120 Hz). The system conveys digital audio sampled at a frequency of 48 kHz, locked to the 27 MHz system clock. In addition to the 5.1 channel input, monophonic and stereophonic inputs and outputs can be handled. Monophonic and stereophonic outputs can also be derived from a 5.1 channel input, permitting backward compatibility. The audio subsystem, illustrated in Fig. 30, comprises the audio encoding/decoding function and resides between the audio inputs/outputs and the transport system. The audio encoder(s) is (are) responsible for generating the audio elementary stream(s), which are encoded representations of the baseband audio input signals. The transport subsystem packetizes the audio data into PES packets which are then further packetized into (fixedlength) transport packets. The transmission subsystem converts the transport packets into a modulated RF signal for transmission to the receiver. Transport system flexibility allows transmitting multiple audio elementary streams. The encoding, packetization, and modulation process is reversed in the receiver to produce reconstructed audio. Audio Compression. Two mechanisms are available for reducing the bit rate of sound signals. The first uses statistical correlation to remove redundancy from the bit stream. The second uses the psychoacoustical characteristics of the human hearing system such as spectral and temporal masking to reduce the number of bits required to recreate the original sounds. The audio compression system consists of three basic operations, as shown in Fig. 31. In the first stage, the representation of the audio signal is changed from the time domain to the frequency domain, which is more efficient, to perform psychoacoustical audio compression. The frequency domain coefficients may be coarsely quantized because the resulting quantizing noise will be at the same frequency as the audio signal, and relatively low signal-tonoise ratios (SNRs) are acceptable due to the phenomena of psychoacoustic masking. The bit allocation operation determines the actual SNR acceptable for each individual frequency coefficient. Finally, the frequency coefficients are coarsely quantized to the necessary precision and formatted into the audio elementary stream. The basic unit of encoded audio is the AC-3 sync frame, which represents six audio blocks of 256 frequency coefficient samples (derived from 512 time samples), a total of 1,536 samples. The AC-3 bit stream is a sequence of AC-3 sync frames. Additional Audio Services. Additional features are provided by the AC-3 subsystem. These include loudness normalization, dynamic range compression that has an override for the listener, and several associated services; dialogue, commentary, emergency, voice-over, help for the visually impaired and hearing-impaired (captioning), and multiple languages. Some of these services are mutually exclusive, and multilanguage service requires up to an extra full 5.1 channel service for each language (up to an additional 384 kbps).
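The AC-3 framing described above fixes the basic bookkeeping: 1,536 samples per sync frame at a 48 kHz sampling rate gives 31.25 frames per second, so a 384 kbps elementary stream carries 12,288 bits (1,536 bytes) per frame. A small illustrative calculation (the 18-bit PCM assumption below is only there to reproduce the 5.184 Mbps figure quoted in the text):

    sample_rate = 48_000          # Hz, locked to the 27 MHz system clock
    samples_per_frame = 6 * 256   # six audio blocks of 256 frequency coefficients
    frames_per_second = sample_rate / samples_per_frame
    print(frames_per_second)                       # 31.25

    bit_rate = 384_000            # bps, normal 5.1-channel service
    bits_per_frame = bit_rate / frames_per_second
    print(bits_per_frame, bits_per_frame / 8)      # 12288.0 bits = 1536.0 bytes

    pcm_bits_per_sec = 48_000 * 6 * 18   # 18-bit PCM x 6 channels -> 5.184 Mbps input
    print(pcm_bits_per_sec / 1e6, pcm_bits_per_sec / bit_rate)   # 5.184, 13.5:1 reduction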
Figure 30. Audio subsystem within the digital television system. (The left, right, center, left-surround, right-surround, and LFE channels enter the audio signal encoder; the resulting elementary streams pass through the transport subsystem, the 8-VSB modulator, and the RF channel, and are demodulated, depacketized, and decoded back into the six audio channels at the receiver.)
Ancillary Data Services Several data services have been included in the ATSC Standard. Other services can be added in the future. Currently, program subtitles (similar to closed captioning in NTSC), emergency messages (mixed into baseband video in NTSC), and program guide information are included. Possible Future Data Services. Information data related to the following may be desired: conditional access, picture structure, colorimetry, scene changes, local program insertion, field/frame rate and film pull-down, pan/scan, multiprogram, and stereoscopic image. Transmission Characteristics The transmission subsystem uses a vestigial sideband (VSB) method: (1) 8-VSB for simulcast terrestrial
Figure 31. Overview of audio compression system.
broadcast mode and (2) a 16-VSB high data rate mode. VSB includes a small part of the lower sideband and the full upper sideband. Sloped filtering at the transmitter and/or the receiver attenuates the lower end of the band. The 8-VSB coding maps three bits into one of eight signal levels. The system uses a symbol rate of 10.76 Msymbols/s, capable of supporting a data stream payload of 19.39 MBits/s. See Fig. 32 VSB in a 6 MHz channel. Modulation techniques for some other planned broadcast systems use orthogonal frequency division multiplexing (OFDM) or coded OFDM (COFDM), which is a form of multicarrier modulation where the carrier spacing is selected, so that each subcarrier within the channel is orthogonal to the other subcarriers; this mathematically ensures that during the sampling time for one carrier, all other carriers are at a zero point.
Figure 32. Vestigial sideband (VSB) in a 6 MHz channel for digital transmission. (The pilot sits at the suppressed-carrier frequency; the flat portion of the spectrum is 5.38 MHz wide within the 6.00 MHz channel, and the transition regions, d, are 0.31 MHz each.)
The 8-VSB subsystem takes advantage of a pilot, segment sync, and a training sequence for robust acquisition and operation. To maximize service area, an NTSC rejection filter (in the receiver) and trellis coding are used. The system can operate in a signal-to-additive-white-Gaussian-noise (S/N) environment of 14.9 dB. The transient peak power to average power ratio, measured on a low-power transmitted signal that has no nonlinearities, is no more than 6.3 dB for 99.9% of the time. A block diagram of a generic transmitter subsystem is shown in Fig. 33. The incoming data (19.39 Mbps) are randomized and then processed for forward error correction (FEC) in the form of Reed-Solomon coding (20 RS parity bytes are added to each packet, known as outer error correction). Data interleaving then reorganizes the data stream so that it is less vulnerable to bursts of errors, interleaving to a depth of about 1/6 of a data field (4 ms). The second stage, called inner error correction, consists of 2/3-rate trellis coding: one bit of each two-bit pair is encoded into two output bits using a 1/2 convolutional code, whereas the other input bit is retained as precoded. Along with the trellis encoder, the data packets are precoded into data frames and mapped into a signaling waveform using an eight-level (3-bit), one-dimensional constellation (8 VSB). Data segment sync (4 symbols = 1 byte) at the beginning of a segment of 828 data-plus-parity symbols, and data field sync at the beginning of a data field of 313 segments (24.2 ms), are then added. Data field sync includes the training signal used for setting the receiver equalizer.
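The 19.39 Mbps payload quoted earlier follows from the symbol rate and the framing just described (4 sync symbols per 832-symbol segment, one field-sync segment per 313-segment field, and 20 Reed-Solomon parity bytes per 207-byte coded packet). A short illustrative derivation:

    symbol_rate = 10.762e6        # 8-VSB symbols per second, 3 bits per symbol
    symbols_per_segment = 832     # 4 segment-sync symbols + 828 data symbols
    segments_per_field = 313      # 1 field-sync segment + 312 data segments

    # Each data segment carries one 188-byte transport packet: the sync byte is
    # replaced by segment sync, and 20 RS parity bytes are added (187 + 20 = 207
    # bytes = 1,656 bits, which become 828 symbols after 2/3-rate trellis coding).
    segment_duration = symbols_per_segment / symbol_rate
    packets_per_second = (1 / segment_duration) * (312 / 313)
    payload = packets_per_second * 188 * 8
    print(round(payload / 1e6, 2))      # about 19.39 Mbps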
A small in-phase pilot is then added to the data signal at a power of 11.3 dB below the average data signal power. The data are then modulated onto an IF carrier, which is the same frequency for all channels. The RF up-converter then translates the filtered, flat IF data signal spectrum to the desired RF channel. It is then amplified to the appropriate power for the transmitting antenna. For the same approximate coverage as an NTSC transmitter (at the same frequency), the average power of the ATV signal is approximately 12 dB less than the NTSC peak sync power. The frequency of the RF up-converter oscillator will typically be the same as that for NTSC (except for offsets). For extreme cochannel situations, precise RF carrier frequency offsets with respect to the NTSC cochannel carrier may be used to reduce interference into the ATV signal. The ATV signal is noise-like, and its interference into NTSC does not change with precise offset. The ATV cochannel pilot should be offset in the RF upconverter from the dominant NTSC picture carrier by an odd multiple of half the data segment rate. An additional offset of 0, +10 kHz, or −10 kHz is required to track the principal NTSC interferer. For ATV-into-ATV cochannel interference, precise carrier offset prevents the adaptive equalizer from misinterpreting the interference as a ghost. The Japanese High-Definition Television Production System This television production system was developed by the Japanese Broadcasting Corporation (NHK). It was standardized in 1987 by the Broadcast Technology Association (BTA), now renamed the Association of Radio Industries and Business (ARIB), in Japan and in the United States by SMPTE (240M and 260M Standards). It uses a total of 1,125 lines (1,035 active lines), is interlaced at a field rate of 60 Hz, and has an aspect ratio of 16 : 9. It requires a bandwidth of 30 MHz for the luminance signal (Y), and 15 MHz for each of the two color difference signals PB and PR When digitized at eight bits per sample, it uses 1,920 pixels per line, and it requires a total bit rate of 1.2 Gbps. Note that this production system is similar to the interlaced system used in the ATSC standard, except that the latter uses 1,080 active lines.
Figure 33. 8-VSB transmitter subsystem block diagram. (The blocks shown are the transport layer interface, outer error correction, data interleaver, inner error correction with pre-coder, mapper and multiplexer with synchronization-signal insertion, optional pre-equalizer filter, pilot insertion, modulator, RF up-converter, and amplifier.)
Japanese MUSE Transmission Systems

A range of transmission systems were developed by NHK based on the multiple sub-Nyquist encoding (MUSE) transmission scheme (see Table 8). MUSE (8.1 MHz bandwidth) was developed for DBS broadcasting, and MUSE-T (16.2 MHz bandwidth) was developed for satellite transmission. MUSE-6 was designed to be compatible with a 6 MHz channel and NTSC receivers. MUSE-9 uses a 3 MHz augmentation channel in addition to the standard 6 MHz channel and is NTSC receiver-compatible.

Table 8. MUSE Transmission Systems
Transmission System | Type of Transmission | Bandwidth | Channel Compatible | Compatible with NTSC
MUSE | Direct broadcast by satellite (DBS) | 8.1 MHz | NA | No
MUSE-T | Satellite | 16.2 MHz | NA | No
MUSE-6 | Terrestrial broadcast | 6 MHz | Yes | Yes
MUSE-9 | Terrestrial broadcast | 6 + 3 MHz augmentation | Yes, with 2nd 3 MHz channel | Yes
MUSE-E | Terrestrial broadcast | 8.1 MHz | No | No

Japanese Hi-Vision System

This system incorporates the 1,920 × 1,035 television production system and the MUSE-E transmission system. MUSE-E uses an 8.1 MHz bandwidth and is incompatible with standard NTSC receivers and channel allocations. Four audio channels are time-division-multiplexed with the video signals in the blanking intervals. The encoding and decoding processes are both very complex and require many very large scale integration (VLSI) chips. This system requires a MUSE-E receiver, or a set-top box equipped with a MUSE decoder that feeds either a 16 : 9 display or a 4 : 3 aspect ratio conventional receiver. In the near term, NHK will use simultaneous Hi-Vision/NTSC program production. The MUSE systems are not receiver-compatible with either the North-American ATSC system or the European DVB system (see later).

The Japanese Enhanced Definition Television System (EDTV-II)

EDTV-II is an NTSC-compatible letterbox analog transmission system standardized by the ARIB in Japan. The input signal is a 525-line, 60-frame progressive scan (525P) that has a 16 : 9 aspect ratio. A 525-line, 30-frame interlaced scan (525I) can be up-converted as an input signal. Note that the 525P signal is one of the SDTV signal formats defined in the ATSC Standard (720 × 480 at 60 P). It is also defined as a production format in the SMPTE 293M and SMPTE 294M standards documents. Compared with the current 525I standard, the frame rate has been doubled from 30 to 60. The sampling frequency in the format has been doubled to 27 MHz compared to 525I, and the aspect ratio has been changed from 4 : 3 to 16 : 9. This increase in sampling frequency permits maintaining comparable resolution along the horizontal and vertical axes. The production system is effectively an 8 : 4 : 4 digital system that has production interfaces at 540 Mbps. A 4 : 2 : 0 system can also be used in production and would require interfacing at 360 Mbps. Horizontal blanking is shrunk to achieve this bit rate. The EDTV-II analog transmission system is used for both terrestrial and satellite broadcasting. It requires the same bandwidth as the NTSC system, and no changes are needed in transmitter implementations. The image is displayed on an EDTV-II receiver progressively, using 480 lines and a 16 : 9 aspect ratio. It is compatible with existing NTSC receivers, except that the display image has a 16 : 9 aspect ratio and so appears in a letterbox format that has black bars at top and bottom. The 525P signal requires a video bandwidth of approximately 6.2 MHz. The EDTV-II system creates three enhancement signals in addition to an NTSC signal, with which they are then frequency-domain-multiplexed.

Main Picture (MP). The 525P 16 : 9 signal is reduced from 6.2 MHz to 4.2 MHz bandwidth, and the 480 lines are decimated to 360 lines to produce a letterbox display on the NTSC 4 : 3 receiver. Black bars at top and bottom are each 60 lines wide. Thus, horizontal and vertical resolution are reduced to conform to the NTSC format, but the 16 : 9 aspect ratio is maintained.

Horizontal High (HH, 4.2 MHz to 6.2 MHz). A frequency enhancement signal is extracted from the original 525P image and is multiplexed into the MP signal to increase the horizontal bandwidth to 6.2 MHz in the EDTV-II receiver. For transmission, the HH signal is downshifted to 2 to 4 MHz and frequency-division-multiplexed into an unused vertical temporal frequency domain in the conventional NTSC system called the Fukinuki hole. The Fukinuki hole may be used only for correlated video information, which applies in this case. In the EDTV-II receiver, a motion detector multiplexes the HH signal only onto the still parts of the picture, where there is more need for high resolution to satisfy human vision characteristics. Two enhancement signals are frequency-division-multiplexed together into the top
and bottom panels, which together occupy one-third as much area as the main picture. As these are generated in a 360-line format, they must be compressed by a 3 to 1 pixel downsampling decimation process to fit into the 120 lines of the top and bottom panels.

Vertical High Frequency (VH). The VH signal enhances the vertical still-picture resolution back up to 480 lines. The signal is transmitted only for stationary areas of the image, and temporal averaging is applied.

Vertical Temporal Frequency (VT). The VT enhancement signal is derived from the progressive-to-interlace scan conversion at the encoder and improves the interlace-to-progressive scan (360/2 : 1 to 360/1 : 1) conversion in the receiver. The EDTV-II receiver performs the reverse of the encoding process. The NTSC receiver uses the MP signal directly.

The European DVB System

The Digital Video Broadcast (DVB) system has been designed for MPEG-2-based digital delivery systems for satellite, cable, community cable, multichannel multipoint distribution (MMDS), and terrestrial broadcasting. Service information, conditional access, and teletext functions are also available. All DVB systems are compatible. DVB-T, the terrestrial broadcasting standard, is similar in many respects to the ATSC standard. However, there are a number of significant differences. DVB-T uses coded orthogonal frequency division multiplexing (COFDM). This technique is already being used for digital audio broadcasting (DAB). Either 1,704 (2 k) or 6,816 (8 k) individual carriers may be used. The 8-k system is more robust but increases receiver complexity and cost. Some broadcasters have already adopted the 2-k system, although it will not be compatible with the 8-k system. DVB-T uses the MPEG-2 Layer II Musicam audio standard, a 50 Hz frame rate, and aspect ratios of 4 : 3, 16 : 9, or 20 : 9.

The European PALplus System

This is an analog delivery system that uses a current TV channel to transmit an enhanced wide-screen version of the PAL signal. A conventional receiver displays the PALplus picture as a letterbox in a 4 : 3 aspect ratio. A wide-screen receiver shows the same transmitted picture in a 16 : 9 format at higher resolution. European broadcasters are divided on whether to use this format. The PALplus concept is similar to the Japanese EDTV-II format described before.

Acknowledgments

The authors sincerely thank the following for permission to use portions of their work in this article:
• The Advanced Television Systems Committee (ATSC) and its Executive Director, Craig Tanner, for text and figures from Standards A/52, A/53, and A/54.
• Mr. Stanley N. Baron for text and figures from his book Digital Image and Audio Communications, Toward a Global Information Infrastructure.
• Mr. Patrick Griffis for the data in a figure from his article Bits = Bucks, Panasonic, paper presented at NAB, 1998, unpublished.
TERAHERTZ ELECTRIC FIELD IMAGING
X.-C. ZHANG
Rensselaer Polytechnic Institute, Troy, NY
INTRODUCTION TO THE TERAHERTZ WAVE

Various frequencies are spaced along the frequently used electromagnetic spectrum, including microwaves,
infrared, visible light, and X rays. Terahertz radiation lies between the microwave and infrared frequencies (Fig. 1). In the electromagnetic spectrum, radiation at 1 THz has a period of 1 ps, a wavelength of 300 µm, a wave number of 33 cm−1, a photon energy of 4.1 meV, and an equivalent temperature of 47.6 K. In the same way that visible light can create a photograph, radio waves can transmit sound, and X rays can see shapes within the human body, terahertz waves (T rays) can create pictures and transmit information. Until recently, however, the very large terahertz portion of the spectrum has not been particularly useful because there were neither suitable emitters to send out controlled terahertz signals nor efficient sensors to collect them and record information. Recent developments in terahertz time-domain spectroscopy and related terahertz technologies now lead us to view the world in a new way. As a result of this developing research, terahertz radiation now has widespread potential applications in medicine, microelectronics, agriculture, forensic science, and many other fields.

Three properties of THz wave radiation triggered research to develop this frequency band for applications:

• Terahertz waves have low photon energies (4 meV at 1 THz) and thus cannot lead to photoionization in biological tissues.
• Many molecules exhibit strong absorption and dispersion rates at terahertz frequencies due to dipole-allowed rotational and vibrational transitions. These transitions are specific to the molecule and therefore enable terahertz wave fingerprinting.
• Coherent terahertz wave signals can be detected in the time domain by mapping the transient of the electrical field in amplitude and phase. This gives access to absorption and dispersion spectroscopy.

Coherent terahertz time-domain spectroscopy that has an ultrawide bandwidth provides a new method for characterizing the electronic, vibronic, and compositional properties of solid, liquid, and gas phase materials, as well as flames and flows. In theory, many biological and chemical compounds have distinct signature responses to terahertz waves due to their unique molecular vibrations and rotational energy levels; this implies that their chemical compositions might be examined
Figure 1. The terahertz gap: a scientifically rich but technologically limited frequency band between microwave and optical frequencies. (The frequency axis spans 10^0 to 10^24 Hz — kilo through yotta — with the electronics side, MF through EHF microwaves, below the gap and the photonics side, visible light, X rays, and γ rays, above it.)
using a terahertz beam. Such capability could be used for diagnosing disease, detecting pollutants, sensing biological and chemical agents, and controlling the quality of food products. It is also quite possible that plastic explosives could be distinguished from suitcases, clothing, common household materials, and equipment based on molecular structure. Detecting the binding state of genetic materials (DNA and RNA) by directly using terahertz waves, without requiring markers, provides a label-free method for genetic analysis for future biochip technologies. A T-ray imaging modality would produce images that have "component contrast," enabling analysis of the water content and composition of tissues in biological samples. Such capability presents tremendous potential to identify early changes in composition and function as a precursor to specific medical investigations and treatment. Moreover, in conventional optical transillumination techniques that use near-infrared pulses, large amounts of scattering can spatially smear out the objects to be imaged. T-ray imaging techniques, due to their longer wavelengths, can provide significantly enhanced contrast as a result of low scattering (Rayleigh scattering).
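The 1 THz figures quoted earlier (1 ps period, 300 µm wavelength, 33 cm−1, 4.1 meV, ~48 K) follow directly from physical constants; the short sketch below reproduces them as a consistency check.

# Sketch verifying the quoted properties of 1 THz radiation from physical constants.
C = 2.998e8        # speed of light, m/s
H = 6.626e-34      # Planck constant, J*s
KB = 1.381e-23     # Boltzmann constant, J/K
EV = 1.602e-19     # joules per electron-volt

f = 1e12  # 1 THz
period_ps = 1e12 / f
wavelength_um = C / f * 1e6
wavenumber_cm = f / C / 100            # cm^-1
photon_energy_meV = H * f / EV * 1e3
equiv_temp_K = H * f / KB

print(f"period        : {period_ps:.1f} ps")          # 1.0 ps
print(f"wavelength    : {wavelength_um:.0f} um")       # ~300 um
print(f"wave number   : {wavenumber_cm:.0f} cm^-1")    # ~33 cm^-1
print(f"photon energy : {photon_energy_meV:.1f} meV")  # ~4.1 meV
print(f"equiv. temp.  : {equiv_temp_K:.1f} K")         # ~48 K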
TECHNICAL BACKGROUND

The development of terahertz time-domain spectroscopy has recently stimulated applications of this unexplored frequency band. Hu and Nuss first applied THz pulses to imaging applications (1). Far-infrared images (T-ray imaging) of tree leaves, bacon, and semiconductor integrated chips have been demonstrated. In an imaging system that has a single terahertz antenna, the image is obtained by pixel-scanning the sample in two dimensions (2–4). As a result, the time for acquiring an image is typically of the order of minutes or hours, depending on the total number of pixels and the lowest terahertz frequency components of interest. Although it is highly desirable to improve the data acquisition rate further for real-time imaging by fabricating a focal plane antenna array, technical issues such as high optical power consumption and limits on the antenna packaging density would hinder such a device (5). Recently, a free-space electro-optic sampling system has been used to characterize the temporal and 2-D spatial distribution of pulsed electromagnetic radiation (6–8). A T ray can be reflected as a quasi-optical beam, collimated by metallic mirrors, and focused by a plastic or high-resistivity silicon lens. The typical powers of a T-ray sensing (single pixel) and an imaging (2-D array) system are microwatts and milliwatts, respectively. Thus a terahertz imaging system, based on the electro-optic sampling technique, shows promise for 2-D real-time frame imaging using terahertz beams.
GENERATION OF TERAHERTZ BEAMS

Currently, photoconduction and optical rectification are the two basic approaches for generating terahertz beams
by using ultrafast laser pulses. The photoconductive approach uses high-speed photoconductors as transient current sources for radiating antennas (9). These antennas include elementary hertzian dipoles, resonant dipoles, tapered antennas, transmission lines, and large-aperture photoconducting antennas. The optical rectification approach uses electro-optic crystals as a rectification medium (10). The rectification can be a second-order (difference frequency generation) or higher order nonlinear optical process, depending on the optical fluence. The physical mechanism for generating a terahertz beam by photoconductive antennas is the following: a laser pulse (ħω ≥ Eg) creates electron–hole pairs in the photoconductor, the free carriers then accelerate in the static field to form a transient photocurrent, and the fast time-varying current radiates electromagnetic waves. In the far field, the electrical component of the terahertz radiation is proportional to the first time derivative of the photocurrent. The waveform is measured by a 100-µm photoconducting dipole that can resolve subpicosecond electrical transients. Because the radiating energy comes mainly from stored surface energy, the terahertz radiation energy can scale up with the bias and the optical fluence (11). Optical rectification is the inverse process of the electro-optic effect (12). In contrast to photoconducting elements, where the optical beam functions as a trigger, the energy of terahertz radiation during transient optical rectification comes from the exciting laser pulse. The conversion efficiency depends on the value of the nonlinear coefficient and the phase-matching condition. In the optical rectification mode, the terahertz pulse duration is comparable to the optical pulse duration, and the frequency spectrum is limited mainly by the spectral broadening of the laser pulse, as determined by the uncertainty principle. Materials used for terahertz sources have been adapted from conventional electro-optic crystals and include semiconductor and organic crystals. Enhancement and sign change of the second-order susceptibility by optically exciting electronic resonance states have been reported (13).
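The far-field relation stated above — the radiated terahertz field is proportional to the first time derivative of the transient photocurrent — can be illustrated numerically. The sketch below uses an assumed model photocurrent (0.1-ps rise, 0.5-ps decay); these values and the functional form are illustrative assumptions, not taken from the article.

import numpy as np

# Model photocurrent: fast rise (carrier generation by the optical pulse) and a
# slower carrier decay. Rise and decay times are illustrative assumptions.
t = np.linspace(-2e-12, 8e-12, 2000)          # time axis, seconds
rise, decay = 0.1e-12, 0.5e-12
J = (1 + np.tanh(t / rise)) * np.exp(-np.clip(t, 0, None) / decay)

# Far-field THz transient is proportional to dJ/dt (up to constants).
E_thz = np.gradient(J, t)
E_thz /= np.abs(E_thz).max()                  # normalize for comparison

print("peak of the radiated transient at t =", t[np.argmax(E_thz)] * 1e12, "ps")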
FREE-SPACE ELECTRO-OPTIC DETECTION

Fundamentally, the electro-optic effect is a coupling between a low-frequency electrical field (terahertz pulse) and a laser beam (optical pulse) in the sensor crystal. Free-space electro-optic sampling via the linear electro-optic effect (Pockels effect) offers a flat frequency response across an ultrawide bandwidth. Because field detection is purely an electro-optic process, the system bandwidth is limited mainly by either the pulse duration of the probe laser or the lowest transverse optical (TO) phonon frequency of the sensor crystal. Furthermore, because electro-optic sampling is purely an optical technique, it does not require electrode contact or wiring on the sensor crystal (14,15). Figure 2 is a schematic of the experimental setup for using optical rectification and electro-optic effects. Nonlinear optics forms the basis of the terahertz system.
A mode-locked Ti:sapphire laser is used as the optical source. Several different gigahertz/terahertz emitters can be used, including photoconductive antennas (transient current source) and a (111) GaAs wafer at normal incidence (optical rectification source) (16–18). Generally, an optical rectification source emits terahertz pulses whose duration is comparable to that of the optical excitation pulse, and a transient current source radiates longer terahertz pulses.

Figure 3 shows the details of the sampling setup. Simple tensor analysis indicates that using a (110) oriented zincblende crystal as a sensor gives the best sensitivity. The polarizations of the terahertz beam and the optical probe beam are parallel to the [1,−1,0] crystal direction. Modulating the birefringence of the sensor crystal via an applied electrical field (terahertz) modulates the polarization ellipticity of the optical probe beam that passes through the crystal. The ellipticity modulation of the optical beam can then be analyzed for polarization to provide information on both the amplitude and phase of the applied electrical field. The detection system will analyze a polarization change from the electro-optic crystal and correlate it with the amplitude and phase of the electrical test field. For weak field detection, the power of the laser beam Pout(E) modulated by the electrical field of the terahertz pulse (E = V/d) is

Pout(E) = P0 [1 + πE/Eπ],    (1)

where P0 is the output optical probe power at zero applied field and Eπ is the half-wave field of the sensor crystal of a certain thickness. By measuring Pout from a calibrated voltage source as a function of the time delay between the terahertz pulse and the optical probe pulse, the time-resolved sign and amplitude of V can be obtained, and a numerical FFT provides frequency information. For a 3-mm thick ZnTe sensor crystal, the shot-noise limit gives a minimum detectable field of 100 nV cm−1 Hz−1/2 and a frequency range from near dc to 4 THz.

Figure 4 is a plot of the temporal electro-optic waveform of a terahertz pulse whose half-cycle duration is 1 ps, as measured by a balanced detector using a (110) ZnTe sensor crystal. The time delay is provided by changing the relative length of the optical beam path between the terahertz pulses and the optical probe pulses. Detection sensitivity is significantly improved by increasing the interactive length of the pulsed field and the optical probe beam within the crystal. The dynamic range can exceed 10,000 : 1 using unfocused beams, 100,000 : 1 using unamplified focused beams, and 5,000,000 : 1 using focused amplified beams and a ZnTe sensor crystal. Figure 5 is a plot of the signal and noise spectra, where the SNR > 50,000 from 0.1 to 1.2 THz, corresponding to the waveform in Fig. 4. A linear response in both generation and detection of the terahertz pulses is crucial. Figure 6 is a plot of the electro-optic signal versus peak terahertz field strength. Excellent linearity is achieved. By increasing the optically illuminated area of the photoconductor on the terahertz emitter, the total emitted terahertz power scales linearly with the illumination area (assuming a nonsaturating optical fluence). A shorter laser pulse (<15 fs) and a thinner sensor (<30 µm) allow us to enter the mid-IR range. Figure 7 shows a typical mid-IR waveform.
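Equation (1) can be applied directly to recover the field from the measured probe power. The sketch below implements the forward and inverse forms; the probe power and half-wave field values are placeholders, since both depend on the laser and on the sensor thickness.

import math

def probe_power(E, P0, E_pi):
    """Eq. (1): detected probe power for an applied terahertz field E (small-signal regime)."""
    return P0 * (1 + math.pi * E / E_pi)

def field_from_power(P_out, P0, E_pi):
    """Invert Eq. (1): recover the time-resolved field from the measured probe power."""
    return (P_out / P0 - 1) * E_pi / math.pi

# Illustrative numbers only: P0 and E_pi are placeholder assumptions.
P0 = 1.0e-3            # 1 mW average probe power (assumed)
E_pi = 1.0e5           # half-wave field of the sensor, V/cm (assumed)
E = 10.0               # applied terahertz field, V/cm
P = probe_power(E, P0, E_pi)
print(f"modulation depth: {(P - P0) / P0:.2e}")                 # pi*E/E_pi
print(f"recovered field : {field_from_power(P, P0, E_pi):.2f} V/cm")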
Figure 2. A simplified terahertz system (fiber laser, terahertz emitter, time-delay stage, λ/4 plate, and ZnTe sensor).

Figure 3. Details of a ZnTe sensor setup (terahertz beam and probe beam polarized along [1,−1,0]; pellicle, ZnTe sensor, compensator, Wollaston polarizer, and detector).

Figure 4. Temporal electro-optic signal of a 1-ps terahertz pulse measured by a ZnTe sensor (EO signal in nA versus time in ps).

Figure 5. Amplitude spectrum of the waveform in Fig. 4 that shows an SNR > 100,000 from 0.1 to 1.2 THz (amplitude versus frequency, with the noise floor shown).

Figure 6. Electro-optic signal versus terahertz electric field (EO signal in nA versus field in V/cm; optical probe power = 85 µW, 25-µm ZnTe).

Figure 7. Temporal waveform of the terahertz radiation measured by a 30-µm ZnTe sensor. The shortest oscillation period is 31 fs (corresponding to 8 µm).
A 30-µm and a 27-µm thick ZnTe crystal are used as an emitter and a sensor, respectively. The shortest period of the oscillation is 31 fs. The Fourier transform of the mid-IR pulse after it passes through a GaAs wafer is shown in Fig. 8, where the highest frequency response reaches more than 30 THz. The dip at 5.3 THz is due to phonon absorption by the ZnTe crystals. The preliminary result demonstrates the advantages of using the electro-optic sampling technique for measuring ultrafast far-infrared to mid-infrared pulsed electromagnetic radiation. Current research has demonstrated that by using a thin ZnTe crystal (of the order of 10 µm), a detection bandwidth of more than 40 THz is feasible (19,20).

ELECTRO-OPTIC MEASUREMENT USING A CHIRPED PROBE PULSE

Using a linearly chirped optical probe pulse in electro-optic measurements, a temporal waveform of an electric field is linearly encoded onto the frequency spectrum of the probe pulse and then decoded by dispersing the probe pulse from a grating to a detector array (21). Real-time and single-shot acquisition of picosecond terahertz field pulses is achieved without using a mechanical time-delay device (22).
Figure 8. Frequency spectrum of the terahertz field in Fig. 7 (transmission versus frequency, 0–60 THz). The absorption dip at 5.3 THz is due to the phonon band. By using GaP, this dip can be moved to more than 10 THz.
Free-space terahertz electro-optic detection employing a chirped pulse technique may provide a solution for measuring freely propagating electromagnetic pulses (23) at unprecedented acquisition speed. A terahertz imaging system using a free-space electro-optic field sensor provides the capability of real-time and single-shot recording (24,25). Figure 9 schematically illustrates the concepts of electro-optic measurement using the standard sampling method and a chirped optical probe beam. Figure 10 shows the chirped pulse measurement setup. The geometry is similar to the conventional free-space electro-optic sampling setup, except for the use of a grating pair for chirping and stretching the optical probe beam and a grating–lens combination using a detector array for measuring the spectral distribution. Due to the negative chirp of the grating (the pulse has decreasing frequency versus time), the blue component of the pulse leads the red component. The fixed delay line is used only to position the terahertz pulse within the duration of the synchronized probe pulse (acquisition window) and for temporal calibration.
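A minimal sketch of the decoding step described above: because the chirp maps time to wavelength, the terahertz waveform can be recovered from the normalized difference of the chirped-probe spectra recorded with and without the terahertz pulse. The linear pixel-to-time mapping and the synthetic CCD traces below are assumptions of this sketch.

import numpy as np

def thz_from_chirped_spectra(spec_with, spec_without, t_window_ps):
    """Recover a THz waveform from chirped-probe spectra (with / without THz).

    The normalized difference of the two spectra is proportional to the THz
    field; a linear pixel-to-time mapping over t_window_ps is assumed here.
    """
    spec_with = np.asarray(spec_with, dtype=float)
    spec_without = np.asarray(spec_without, dtype=float)
    diff = (spec_with - spec_without) / np.where(spec_without > 0, spec_without, 1)
    t = np.linspace(0, t_window_ps, diff.size)   # pixel index -> time delay (ps)
    return t, diff

# Usage with synthetic 256-pixel traces (placeholders for CCD line scans):
pixels = np.arange(256)
reference = np.exp(-((pixels - 128) / 60.0) ** 2)           # unmodulated spectrum
signal = reference * (1 + 0.05 * np.sin((pixels - 128) / 8.0)
                      * np.exp(-((pixels - 128) / 30.0) ** 2))
t, waveform = thz_from_chirped_spectra(signal, reference, t_window_ps=25.0)
print(f"peak of recovered waveform: {waveform.max():.3f}")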
Figure 9. Schematics for (top) time and (bottom) wavelength division multiplexing and demultiplexing of the terahertz pulse. The wavelength division technique uses a chirped pulse.

Figure 11. (Left) Imaging traces with/without a terahertz signal (counts versus CCD pixel). (Right) 2-D plots of spectral distribution reconstructing the temporal profile of the terahertz signal.
Figure 10. Schematic experimental setup of electro-optic measurement using a chirped optical probe beam (laser, grating, delay line, polarizer, terahertz emitter, ZnTe, spectrometer, and CCD). In the images of the chirped probe pulses, S and R denote the image traces of the signal and reference beams, respectively.
The temporal terahertz signal can be extracted by measuring the difference between the spectral distributions of the probe pulse with and without terahertz pulse modulation. Figure 11 shows the spectral distributions of the chirped probe pulse measured by the CCD camera with and without a terahertz signal. The normalized distribution reconstructs both the amplitude and phase of the temporal waveform of the terahertz pulse. Electro-optic measurement using a chirped optical probe pulse is capable of single-shot measurement of a full terahertz waveform. Figure 12 is a plot of a single-shot image from a GaAs photoconductive dipole antenna. The total time for wavelength division multiplexing and demultiplexing is a few picoseconds. The dipole length is 7 mm, and the bias voltage is 5 kV. The one-dimensional spatial distribution across the dipole and its temporal terahertz waveform are obtained simultaneously in a single laser pulse. The size of the spatiotemporal image is 10 mm by 25 ps.
Figure 12. A single-shot spatiotemporal (10 mm–25 ps) image of a terahertz pulse measured by a chirped optical probe pulse. The emitter is a 7-mm dipole antenna.
Typical oscillatory features and the symmetrical spatial distribution of the far-field pattern are obtained from a dipole photoconductive emitter.

TERAHERTZ IMAGING

One of the most advantageous applications of free-space electro-optic detection is two-dimensional terahertz wave imaging (26–30). Figure 13 schematically illustrates an experimental arrangement for free-space electro-optic terahertz imaging. Silicon lenses are used to focus the terahertz radiation on a (110) oriented ZnTe crystal. An optical readout beam whose diameter is larger than that of the terahertz beam probes the electric field distribution within the crystal via the Pockels effect. The 2-D terahertz field distribution in the sensor crystal is converted into a 2-D optical intensity distribution after the readout beam passes through a crossed polarizer (analyzer), and the optical image is then recorded by a linear diode array
or a digital CCD camera. Figure 14 shows one of the ZnTe crystals used for the terahertz imaging system and several electrical and optical parameters. By illuminating an electro-optic crystal using a terahertz beam and an optical readout beam, then detecting the optical beam using a linear diode array or CCD camera, time-resolved 1-D or 2-D images of pulsed far-infrared radiation can be achieved, respectively. This system can noninvasively image moving objects, turbulent flows, or explosions. Furthermore, real-time 2-D terahertz imaging is feasible using the electro-optic technique. By chopping the terahertz beam, the change in optically recorded gray levels can be observed in real time by the human eye. In this system, the temporal resolution is about 50 fs, and the frame transfer rate is 38 images per second. Astonishingly, the transfer rate can be extended to as much as 190 images/second (digital out) and 2,000 images/second (video out)! Figure 15 shows terahertz images of a mammographic phantom, which is designed to test the performance of a mammographic system by quantitatively evaluating the system's ability to image small structures similar to those found clinically. Images of fibers, masses, and specks as thin as 0.24 mm have been resolved. The diameter of the current image is 1 cm; using an intense laser system, an imaging size of 20 cm is possible. Figure 16 shows a standard photo and a terahertz image of compressed breast tissue obtained by the scanning image system. A 0.6 mm abnormal wire structure is clearly distinguished as a dark line in the breast tissue in the image range of 1 × 1 cm2. Figure 17 shows a standard photo, a transmitted terahertz image, and the terahertz transmission waveform through normal and cancerous tissues (breast cancer phantom). The dense tumor has a lower water concentration than the normal tissue, so the transmitted waveform has a temporal delay and provides less attenuation compared with normal tissue.
Figure 13. Setup for converting a terahertz image into an optical image (terahertz beam, pellicle, ZnTe, readout beam, polarizer, analyzer, CCD camera, and computer). The 2-D field distribution in the sensor crystal is converted into a 2-D optical intensity distribution after the readout beam passes through a crossed analyzer.

Figure 14. One of the ZnTe crystals used in terahertz imaging. The useful area is more than 3 × 3 cm2. (Parameters listed in the figure: Eg = 2.2 eV; nphonon = 5.3 THz; Ep = 89 V/cm; e = 11; ng = 3.2; r41 = 4 pm/V; vg(800 nm) = vp(150 µm); f > 40 THz; t < 30 fs.)

Figure 15. Terahertz images of fibers, mass, and specks. Small structures less than 0.5 mm thick and diameter less than 0.24 mm can be resolved.

Figure 16. Photo of human breast tissue and a T-ray image of a 0.6 mm abnormal structure (shadow).
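A sketch of the chopped-beam differential imaging described above: frames recorded with the terahertz beam on and off are averaged and subtracted to isolate the terahertz-induced change in gray level behind the crossed analyzer. The frame sizes and noise levels below are synthetic placeholders.

import numpy as np

def thz_image(frames_on, frames_off):
    """Differential electro-optic image: mean(THz on) - mean(THz off).

    frames_on / frames_off are stacks of CCD frames recorded with the terahertz
    beam chopped on and off; the difference isolates the field-induced
    modulation of the readout beam behind the crossed analyzer.
    """
    on = np.mean(np.asarray(frames_on, dtype=float), axis=0)
    off = np.mean(np.asarray(frames_off, dtype=float), axis=0)
    return on - off

# Synthetic example: 64x64 frames with a weak Gaussian "THz spot" plus noise.
rng = np.random.default_rng(0)
y, x = np.mgrid[0:64, 0:64]
spot = 5.0 * np.exp(-(((x - 32) ** 2 + (y - 32) ** 2) / 80.0))
off_frames = [1000 + rng.normal(0, 3, (64, 64)) for _ in range(50)]
on_frames = [1000 + spot + rng.normal(0, 3, (64, 64)) for _ in range(50)]
print(thz_image(on_frames, off_frames)[32, 32])   # ~5, the THz-induced signal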
Significant spectral difference between the normal and cancerous tissues can be seen between 30 GHz and 0.2 THz. The electro-optic imaging system can image fast-moving subjects; real-time imaging of living insects is one example. Figure 18 demonstrates real-time in vivo terahertz images of insects, such as a fly, worm, ant, and ladybug. The antennae and legs of the ant and the organs in the ladybug can be resolved.
Figure 17. A breast tissue sample for terahertz measurement (standard photo, terahertz image, and terahertz waveforms). The light spot near the center is a cancerous tumor. The transmitted terahertz waveforms are from normal tissue and cancerous tissue (EO signal in nA versus time delay in ps).

Figure 18. In vivo terahertz images of insects, such as a fly, worm, ant, and ladybug. An image rate as fast as 30 frames/s is achieved.
Terahertz radiation has no ionizing effect, and the spectrum of a terahertz wave falls within the heat range; therefore, it is safe for medical applications. Figure 19 shows the schematic for terahertz imaging of currency watermarks. Unlike intensity images viewed by a visible beam, the watermark images in Fig. 19 are obtained purely by the phase difference of the terahertz pulse transmitted through the watermarks. The maximum phase shift is less than 60 fs. The terahertz absorption is less than 1%. Clearly, these watermark images in the terahertz spectrum show an alternative method for detecting counterfeiting. Electro-optic imaging makes it possible to see terahertz wave images of electrical fields, diseased tissue, the chemical composition of plants, and much more that is undetectable by other imaging systems. Real-time monitoring of a terahertz field supports real-time diagnostic techniques.
THZ WAVE TRANSCEIVER

In a conventional experimental setup of terahertz time-domain spectroscopy, a separate terahertz transmitter and terahertz receiver are used to generate and detect the terahertz signal. However, because electro-optic detection
is the reverse of rectified generation, the transmitter and the receiver can be the same crystal (31). Therefore, a terahertz transceiver, which alternately transmits pulsed electromagnetic radiation (optical rectification) and receives the returned signal (electro-optic effect), is feasible. The use of a transceiver has its advantages for terahertz range remote sensing and tomographic imaging. Theoretically and experimentally, it has also been demonstrated that the working efficiency of an electro-optic transceiver constructed from a (110) zincblende crystal is optimized when the pump beam polarization is 26° counterclockwise from the crystallographic Z axis of the crystal. An experimental setup of a terahertz imaging system using an electro-optic transceiver is shown in Fig. 20. Compared to the traditional terahertz tomographic setup in reflective geometry, this imaging system using an electro-optic transceiver is simpler and easier to align. In addition, normal incidence of the terahertz beam on the sample can be maintained. Free-space terahertz generation, propagation, and detection over more than 50 meters has been demonstrated by using this transceiver.
Figure 19. Terahertz image based on the phase difference in a currency watermark structure (gigahertz/terahertz beam, lenses, ZnTe, polarizer, analyzer, CCD camera, and computer). A phase shift as small as a few femtoseconds can be resolved.

Figure 21. Terahertz waveforms reflected from (a) the metal handle of a razor, (b) the razor surface, and (c) the metal mirror (terahertz signal versus time delay in ps).
Terahertz tomographic imaging using an electro-optic transceiver is illustrated by using a razor pasted on a metal mirror. There are three different reflective metal layers in this sample: the first is the metal handle of the razor, the second is the razor surface, and the third is the metal mirror. Figure 21 shows the terahertz waveforms reflected from these three different layers; the timing differences in the peak intensity spatially separate these layers, which can be used to construct a three-dimensional tomographic image of a razor, as shown in Fig. 22. Using the same imaging system, terahertz tomographic images of a quarter dollar and a 50-pence piece are shown in Fig. 23. The image contrast is limited by the terahertz beam focal size and the flatness of the background metal surface. The width of the short timing window is determined by the degree of "unflatness" of the target. If two images are from two different reflective layers and their spatial separation (depth) is large enough, the image can be displayed in this fashion at two different timing positions; the timing difference is proportional to the depth difference between the two layers.
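The statement that the timing difference is proportional to the depth difference can be made concrete: for reflection at near-normal incidence in air, a round-trip delay Δt corresponds to a depth difference of cΔt/2. The peak timings used below are placeholders, not values measured from Fig. 21.

C_UM_PER_PS = 299.8  # speed of light in air, micrometers per picosecond

def depth_from_delay(delta_t_ps):
    """Depth separation of two reflecting layers from the round-trip time difference."""
    return C_UM_PER_PS * delta_t_ps / 2.0   # micrometers

# Placeholder peak timings (ps) for three reflective layers, as in Fig. 21(a)-(c):
peaks_ps = {"razor handle": 1.2, "razor surface": 3.0, "metal mirror": 5.6}
ref = peaks_ps["metal mirror"]
for name, t in peaks_ps.items():
    print(f"{name:14s}: {depth_from_delay(ref - t):7.1f} um above the mirror")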
Figure 20. Schematic experimental setup of an electro-optic terahertz transceiver (pump and probe beams, chopper, ZnTe, and sample). The terahertz signal is generated and detected by the same ZnTe crystal.

Figure 22. Terahertz tomographic image of a razor; the gray level represents the timing of the peak intensity (lateral axes in cm, depth axis in ps).
Three-dimensional terahertz imaging can still be realized without displaying the image in terms of the timing of peak intensity.

TERAHERTZ WAVE NEAR-FIELD IMAGING

The terahertz wave near-field imaging technique can greatly improve the spatial resolution of a terahertz wave sensing and imaging system (32).
Figure 23. Terahertz image of (a) a quarter dollar; (b) a fifty-pence piece; the gray level represents the peak intensity within a certain temporal window (axes in cm).
Dr. Klass Wynne in the United Kingdom has demonstrated 110-µm and 232-µm spatial resolution for λ = 125 µm and λ = 1 mm, respectively (33). The improvement factor is about 2 to 4. O. Mitrofanov and John Federici at the New Jersey Institute of Technology and Bell Laboratory reported the use of collection-mode near-field imaging to improve spatial resolution (34–36). The best reported result is 7-µm imaging resolution using 0.5-THz pulses, which is about 1/100 of the wavelength. The limitation of such a system is the low throughput of the terahertz wave past the emitter tip; the transmitted terahertz field is inversely proportional to the third power of the aperture size of the emitter tip. A newly developed dynamic-aperture method that introduces a third gating beam can image objects at a
subwavelength resolution (λ/100); however, the drawback of this method is the difficulty in coating a gating material on the surface of biomedical samples such as cells and tissues (37). Dr. Wynne’s method (the use of an electro-optic crystal as a near-field emitter) led to the development of the terahertz wave microscope. Biomedical samples are mounted directly on the surface of the crystal. Figure 24 shows the terahertz wave near-field microscope for 2D microscopic imaging. In this case, terahertz waves are generated in the crystal by optical rectification and are detected by a terahertz wave detector crystal by the electro-optic effect. The spatial resolution is limited only by the optical focal size of the laser on the crystal (less than 1 µm due to the large refractive index of 2.8 for ZnTe) under moderate optical power, and it is independent of the wavelength of the terahertz wave. A coated thin ZnTe plate (antireflective coating for the bottom surface and highly-reflective coating for the top surface) is placed at the focal plane of the microscope as the terahertz wave emitter. The coating prevents optical loss in the crystal and leakage of the optical beam into the tissue sample. The tissue can be monitored by the optical microscope. A laser beam is guided from the bottom of the microscope into the terahertz emitter. Terahertz waves generated by the emitter can be detected in the transmitted mode (a terahertz wave sensor is mounted on top of the microscope) and/or reflection mode (transceiver). The emitter, the sample, or the terahertz beam can be scanned laterally to obtain a 2-D image. Submicron spatial resolution is expected, even though the imaging wavelength is about 300 µm at 1 THz. In the transmitted mode shown in Fig. 25, a separate ZnTe sensor crystal (terahertz detector) is required, and a probe beam is required to sample the terahertz wave in the sensor crystal. This construction also applies the concept of the terahertz wave transceiver, which combines the emitter and receiver in one crystal in the nearfield range, as shown in Fig. 26. Both transmitted and reflected terahertz wave microscopic images can therefore be obtained from the proposed system. When a Ti : sapphire laser where λ = 0.8 µm is used as the optical source, the smallest optical focal spot
a in air is calculated from the standard equation a = 1.22λ(2f/D) (the 1.22 factor comes from the diffraction limit under the Gaussian beam approximation), where λ is the wavelength, f is the focal length, D is the beam diameter, and D/2f is the numerical aperture NA of the microscope's objective lens. Assuming the ideal case where NA = 1, then a = 1 µm. A possible way of achieving submicron lateral resolution is to focus the optical beam into a high refractive index medium. The refractive index of ZnTe is greater than 1; therefore, the focal spot in ZnTe must be smaller than that in air by the factor of the refractive index value. However, when directly focusing a laser beam from air into a ZnTe plate, as shown in Fig. 25, it is difficult to achieve a much smaller focal spot because of the change in the numerical aperture after optical refraction at the ZnTe interface by Snell's law. This can be improved by using a high-index hemispherical lens, as shown in Fig. 26. The numerical apertures of the first focal lens and the hemispherical lens must be identical. A thin ZnTe plate is placed on the top of the hemispherical lens, which has the same refractive index as that of the ZnTe (n = 2.8). The optical beam is focused in the ZnTe through the matching refractive index lens to a spot size comparable to 1.22λ/n (assume NA = 1). If λ = 0.8 µm and n = 2.8, in theory, a focal spot can be as small as 0.35 µm. By using a shorter optical wavelength, such as the second-harmonic wave from a Ti:sapphire laser, a smaller focal spot is expected.

An electro-optic terahertz transceiver can be used in the reflective mode of the near-field terahertz wave microscope. In this case, the terahertz wave is generated and detected at the same focal spot within a thin crystal (ZnTe). The target sample (biomedical tissue) is placed on top of the crystal (terahertz transceiver). The measured area of the tissue is comparable to the optical focal size. Due to the intense power density at an optical focal spot (micron or submicron), higher order nonlinear phenomena other than optical rectification have to be considered. Some effects may limit T-ray generation and detection. For example, two-photon absorption (a third-order nonlinear optical effect) in ZnTe generates free carriers. In a tight focal spot, extremely high free-carrier density changes the ZnTe local conductivity, screens the T rays, and saturates the T-ray field. A possible solution is to reduce the optical peak power and increase the pulse repetition rate. This method can maintain the same average power.

Figure 24. The concept of 2-D near-field inverted terahertz wave microscope imaging (left) and the schematic of a terahertz wave microscope system (right). A tissue sample is placed on top of a terahertz wave emitter.

Figure 25. The T-ray is generated and detected in one ZnTe crystal in the reflected geometry (parabolic mirror, thin tissue, terahertz emitter, terahertz detector, microscope objective, and laser). The T-ray imaging spot on the tissue is comparable to the focal spot of the optical beam.

Figure 26. The T-ray is generated and detected in one ZnTe crystal in the reflected geometry (tissue on (110) ZnTe, index-matched lens, beam splitter, and diode laser).

CONCLUSION
The terahertz band occupies an extremely broad spectral range between the infrared and microwave bands. However, due to the lack of efficient terahertz emitters and sensors, far less is known about spectra in the terahertz band than about those in the rest of the electromagnetic spectrum. Recently developed photoconductive antennas and free-space electro-optic sampling provide measurement sensitivity several orders of magnitude better than conventional bolometer detection, but it is still far from the detection resolution achieved in other frequency bands. The development of instruments impacts physics and basic science; recent examples are the scanning tunneling microscope and near-field optical microscope, which opened new fields to the physics community.

Terahertz System for Spectroscopy

The powerful capabilities of the time-domain terahertz spectroscopic technique result from the ability to provide 20-cm diameter real-time images at a variable frame rate up to 2,000 frames/second and the ability to image moving objects, turbulent flows, and explosions noninvasively. Furthermore, the imaging system will also have a subwavelength spatial resolution (1/1,000 λ), 50-femtosecond temporal resolution, sub-mV/cm field sensitivity, and be capable of single-shot measurements.

New Terahertz Sources

New terahertz beam sources emphasize tunable narrowband terahertz lasers using novel semiconductor structures. For example, a terahertz laser was recently developed using p-type Ge, which operates at the temperature of liquid nitrogen. In this laser, a novel unipolar-type
population inversion is realized by the streaming motion of carriers in the semiconductor. It may be possible to improve these solid-state terahertz laser sources further by using strained SiGe.

Terahertz BioChip and Spectrometer

The detection of nanolayers using terahertz techniques is a challenge that cannot be solved by conventional transmission techniques. The fact that the thickness of the layer is orders of magnitude smaller than the wavelength of the terahertz radiation leads to layer-specific signatures that are so small that they are beyond any reasonable detection limit. An alternative method for detecting nanolayers is grating couplers. Evanescent waves that travel on the grating enlarge the interactive length between the nanolayers and the terahertz radiation from nanometers to several tens of micrometers.

Quantum Terahertz Biocavity Spectroscopy

The concept is to design and fabricate photonic band-gap structures in the terahertz regime and place materials such as DNA, biological and chemical agents, or quantum dots in the active region of the cavity. This configuration will be useful for enhanced absorption as well as enhanced spontaneous emission spectroscopy. Previous research has indicated that DNA (and other cellular materials) possess large numbers of unique resonances due to localized phonon modes that arise from DNA base-pair interactions, which are absent from far-infrared data.
Terahertz Molecular Biology Spectroscopy

Understanding how genes are regulated is one of the grand challenges in molecular biology. Recent reports indicate that pulsed terahertz spectroscopy provides a new handle on the interactions of biopolymers. The use of pulsed terahertz spectroscopy in the study of the binding of transcription factors to their cognate DNA binding sites will be explored.

Near-Field Terahertz Imaging

The current near-field terahertz imaging system has a spatial resolution of 1/50 λ. The microscopic imaging system will achieve submicron spatial resolution (<1/1,000 λ resolution). If this can be done, it will provide direct spectroscopy and imaging of biological cells at terahertz frequencies, which has not been possible in the past.

New Terahertz Source and Detector (Plasma Wave Generator)

Existing terahertz sources have limited power and typically deliver very broadband radiation. Existing terahertz detectors are also broadband. To achieve breakthroughs in the physics and applications of terahertz technology, we need to develop much more efficient and powerful sources of terahertz radiation and realize resonant and tunable detectors of terahertz radiation. To achieve these goals, we will need to develop entirely new physical approaches. The new physics will be based on using collective electronic motion (plasma waves), rather than relying on transit and/or tunneling phenomena for individual electrons. Just as sound waves propagating in air achieve velocities that are orders of magnitude higher than the velocities of individual molecules propagating from the sound source to a sound detector, such as the human ear, in the same way electron plasma waves can propagate at much higher velocities and can generate and detect terahertz radiation. A preliminary theoretical foundation of this approach to terahertz generation and detection has already been established, and the first experimental results have been obtained for detecting and generating terahertz radiation by a two-dimensional electron gas.

Acknowledgments

This work was supported by the U.S. Army Research Office and the U.S. National Science Foundation.
ABBREVIATIONS AND ACRONYMS

THz   terahertz
GaAs  gallium arsenide
ZnTe  zinc telluride
2-D   two-dimensional
IR    infrared
DNA   deoxyribonucleic acid
RNA   ribonucleic acid
CCD   charge-coupled device
ps    picosecond
fs    femtosecond
BIBLIOGRAPHY
1. B. B. Hu and M. C. Nuss, Opt. Lett. 20, 1,716–1,718 (1995). 2. M. C. Nuss, Circuits & Devices 12, 25–30 (1996). 3. D. M. Mittleman, R. H. Jacobsen, and M. C. Nuss, IEEEJSTQE 3, 678–692 (1996). 4. D. M. Mittleman, J. Cunningham, M. C. Nuss, and M. Geva, Appl. Phys. Lett. 71, 16–18 (1997). 5. S. Mickan, D. Abbott, and X. -C. Zhang, Microelectronics J. 31, 503–514 (2000). 6. Q. Wu and X. -C. Zhang, Appl. Phys. Lett. 67, 3,523–3,525 (1995). 7. A. Nahata, D. H. Auston, T. F. Heinz, and C. Wu, Appl. Phys. Lett. 68, 150–152 (1996). 8. P. Uhd Jepsen et al., Phys. Rev. E 53, 3,052–3,054 (1996). 9. Ch. Fattinger and D. Grischkowsky, Appl. Phys. Lett. 53, 1,480–1,482 (1988). 10. A. Rice et al., Appl. Phys. Lett. 64, 1,324–1,326 (1994). 11. J. T. Darrow, X. -C. Zhang, D. H. Auston, and J. D. Morse, IEEE J. Quantum Electron. 28, 1,607–1,616 (1992). 12. M. Bass, P. A. Franken, J. F. Ward, and G. Weireich, Phys. Rev. Lett. 9, 446–448 (1962). 13. X. -C. Zhang, Y. Jin, K. Yang, and L. J. Schowalter, Phys. Rev. Lett. 69, 2,303–2,306 (1992). 14. Q. Wu and X. -C. Zhang, IEEE-JSTQE 3, 693 (1996); Opt. Quantum Electron. 28, 945–951 (1996). 15. A. Nahata and T. F. Heinz, IEEE-JSTQE 3, 701–708 (1996). 16. J. T. Darrow, B. B. Hu, X. -C. Zhang, and D. H. Auston, Opt. Lett. 15, 323–325 (1990). 17. X. -C. Zhang and D. H. Auston, J. Electromagnetic Waves Appl. 6, 85–106 (1992).
18. X. -C. Zhang and D. H. Auston, J. Appl. Phys. 71, 326–338 (1992). 19. Q. Wu and X. -C. Zhang, Appl. Phys. Lett. 70, 1,784–1,786 (1997). 20. Q. Wu and X. -C. Zhang, Appl. Phys. Lett. 71, 1,285–1,286 (1997). 21. Z. Jiang and X. -C. Zhang, Appl. Phys. Lett. 72, 1,945–1,947 (1998). 22. Z. Jiang and X. -C. Zhang, Engineering Laboratory Note, Optic & Photonic News 9(1-2), (1998). Also in Appl. Opt. 37, 8,145–8,146 (1998). 23. F. G. Sun, Z. Jiang, and X. -C. Zhang, Appl. Phys. Lett. 73, 2,233–2,235 (1998). 24. Z. Jiang and X. -C Zhang, Opt. Lett. 23, 1,114–1,116 (1998). 25. Z. Jiang and X. -C. Zhang, Opt. Express 5(11), 243–248 (1999). 26. X. -C. Zhang and Q. Wu, Opt. & Photonics News 12, 9 (1996). 27. X. -C. Zhang, Q. Wu, and T. D. Hewitt, in Ultrafast Phenomena X, Springer Series in Chemical Physics, 54, 463–465 Berlin, Heidelberg, 1996. 28. Q. Wu, F. G. Sun, P. Campbell, and X. -C. Zhang, Appl. Phys. Lett. 68, 3,224–3,226 (1996). 29. Q. Wu, T. D. Hewitt, and X. -C. Zhang, Appl. Phys. Lett. 69, 1,026–1,028 (1996). 30. Z. G. Lu, P. Campbell, and X. -C. Zhang, Appl. Phys. Lett. 71, 593–595 (1997). 31. Q. Chen, Z. Jiang, M. Tani, and X. -C. Zhang, Electron. Lett. 36, 1,298–1,299 (2000). 32. S. Hunsche, M. Koch, I. Brener, and M. C. Nuss, Opt. Commun. 150, 22–26 (1998). 33. K. Wynne and D. A. Jaroszynske, Opt. Lett. 24, 25–27 (1998). 34. O. Mitrofanov et al., Appl. Phys. Lett. 78, 252–254 (2001). 35. O. Mitrofanov et al., Appl. Phys. Lett. 77, 3,496–3,498 (2000). 36. O. Mitrofanov et al., Appl. Phys. Lett. 77, 591–593 (2000). 37. Q. Chen, Z. Jiang, G. X. Xu, and X. -C. Zhang, Opt. Lett. 25, 1,122–1,124 (2000).
TOMOGRAPHIC IMAGE FORMATION TECHNIQUES
GABOR T. HERMAN
The Graduate Center, CUNY, New York, NY
THE RECONSTRUCTION PROBLEM

Computerized tomography is the process of reconstructing the interiors of objects from collected data based on transmitted or emitted radiation. This section explains the nature of the reconstruction problem, and the following sections present computer algorithms used for achieving the reconstructions and approaches for evaluating the efficacy of such algorithms.

In the reconstruction problem, there is a three-dimensional structure of unknown internal composition. The structure is subjected to some kind of radiation, either by transmitting the radiation through the structure or by introducing the emitter of the radiation into the structure. The radiation transmitted through or emitted
from the structure is measured at a number of points. Computerized tomography (CT) is the process of obtaining from these measurements the distribution of the physical parameter(s) inside the structure that have an effect on the measurements. The problem occurs in a wide range of areas, such as X-ray CT, emission tomography, photon migration imaging, and electron microscopic reconstruction; see, for example, (1,2). All of these are inverse problems of various sorts; see, for example, (3). A snapshot of the recent state of the art regarding volumetric reconstruction of medical images is provided by the May 2000 Special Issue of the IEEE Transactions on Medical Imaging.

An important special case that provides useful insights into all variations of the reconstruction problem is estimating a function of two variables (in other words, a planar object) from estimates of its line integrals. As is quite reasonable for any application, it is assumed that the domain of the function is contained in a finite region of the plane. Suppose that f is a function of the two polar variables r and φ. Let [Rf](ℓ, θ) denote the line integral of f along the line that is at distance ℓ from the origin and makes an angle θ with the vertical axis. This operator R is usually referred to as the Radon transform. [An alternative name for the operator is the X-ray transform. In three dimensions (3D), the two operators differ from each other: the X-ray transform still maps the function to be reconstructed into its line integrals, but the Radon transform in 3D becomes an operator that maps a function f into a function that associates with each plane the integral of f across that plane. Such planar integrals are natural for certain applications, such as magnetic resonance imaging (4).] The input data to a reconstruction algorithm are estimates (based on physical measurements) of the values of [Rf](ℓ, θ) for a finite number of pairs (ℓ, θ); its output is an estimate, in some sense, of f. More precisely, suppose that the estimates of [Rf](ℓ, θ) are known for I pairs: (ℓi, θi), 1 ≤ i ≤ I. The notation y is used for the I-dimensional column vector (called the measurement vector) whose ith component, yi, is the available estimate of [Rf](ℓi, θi). The task of a reconstruction algorithm is, given the data y, to estimate the function f. Following (1), reconstruction algorithms are classified as either transform methods or series expansion methods. The following sections discuss the underlying ideas of these two approaches and give detailed descriptions of two algorithms from each category.

TRANSFORM METHODS

The Radon transform has an inverse, R−1, defined as follows. For a function p of ℓ and θ,

[R−1p](r, φ) = (1/2π2) ∫(θ=0 to π) ∫(ℓ=−∞ to ∞) p1(ℓ, θ) / [r cos(θ − φ) − ℓ] dℓ dθ,    (1)
where p1(ℓ, θ) denotes the partial derivative of p with respect to its first variable ℓ. [Note that it is intrinsically assumed in this definition that p is sufficiently smooth for the existence of the integral in Eq. (1).] It is known (1) for any function f that satisfies some physically reasonable conditions (such as continuity and boundedness), for all points (r, φ), that

[R−1Rf](r, φ) = f(r, φ).    (2)
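As a concrete illustration of the measurement vector y defined above, the sketch below samples [Rf](ℓi, θi) for a simple phantom, a uniform disk centered at the origin, whose line integral has the closed form 2√(R2 − ℓ2) for |ℓ| < R and is independent of θ. This is only a sketch for generating test data; it is not an algorithm described in this article.

import numpy as np

def radon_disk(ell, radius=0.8, value=1.0):
    """Line integral of a uniform disk centered at the origin.

    For a line at distance ell from the origin (any angle theta, by symmetry),
    the integral is the chord length 2*sqrt(radius^2 - ell^2), scaled by value.
    """
    ell = np.asarray(ell, dtype=float)
    chord = np.zeros_like(ell)
    inside = np.abs(ell) < radius
    chord[inside] = 2.0 * np.sqrt(radius**2 - ell[inside] ** 2)
    return value * chord

# Measurement vector y sampled at (ell_i, theta_i) = (n*d, m*Delta), as in the text.
N, M = 64, 90
d, Delta = 1.0 / 64, np.pi / 90
ell = np.arange(-N, N + 1) * d
y = np.tile(radon_disk(ell)[:, None], (1, M))   # same profile for every view angle
print(y.shape)                                  # (2N+1, M) = (129, 90)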
Transform methods are numerical procedures that estimate values of the double integral on the right-hand side of Eq. (1) from given values of p(ℓᵢ, θᵢ), for 1 ≤ i ≤ I. Examples of such methods are the widely adopted filtered backprojection (FBP) algorithm, called the convolution method in (1), and the more recently developed linogram method.

Filtered Backprojection (FBP)

In this algorithm, the right-hand side of Eq. (1) is approximated by a two-step process [for derivational details, see (1,5) or, in a more general context, (3)]. First, for fixed values of θ, convolutions defined by

$$[p *_Y q](\ell', \theta) = \int_{-\infty}^{\infty} p(\ell, \theta)\, q(\ell' - \ell, \theta)\, d\ell, \quad (3)$$
are carried out, using a convolving function q (of one variable) whose choice has an important influence on the appearance of the final image. Second, the estimate f* of f is obtained by backprojection as follows:

$$f^*(r, \phi) = \int_0^{\pi} [p *_Y q](r \cos(\theta - \phi), \theta)\, d\theta. \quad (4)$$
To make the implementation of this explicit for a given measurement vector, suppose that the data function p is known at points (nd, mΔ), −N ≤ n ≤ N, 0 ≤ m ≤ M − 1, and MΔ = π. Assume further that the function f is to be estimated at points (r_j, φ_j), 1 ≤ j ≤ J. The computer algorithm operates as follows. A sequence f₀, ..., f_{M−1}, f_M of estimates is produced; the last of these is the output of the algorithm. First, let

$$f_0(r_j, \phi_j) = 0, \quad (5)$$

for 1 ≤ j ≤ J. Then, for each value of m, 0 ≤ m ≤ M − 1, let the (m + 1)st estimate be produced from the mth estimate by a two-step process:

1. For −N ≤ n′ ≤ N, calculate

$$p_c(n'd, m\Delta) = d \sum_{n=-N}^{N} p(nd, m\Delta)\, q[(n' - n)d], \quad (6)$$
using the measured values of p(nd, mΔ) and precalculated values (the same for all m) of q[(n′ − n)d]. This is a discretization of Eq. (3). One possible choice is
$$q(nd) = \begin{cases} \dfrac{1}{4d^2}, & \text{if } n = 0,\\[6pt] -\dfrac{1}{(\pi d n)^2}, & \text{if } n \text{ is odd}, \end{cases} \quad (7)$$

and q(nd) = 0, otherwise. [This is often referred to as the Ram–Lak filter (6); it is the sampled inverse Fourier transform of the ramp function, Q(X) = |X|, between −1/(2d) and 1/(2d).] The actual choice of the convolving function q should depend on a number of things, especially on the noise level in the measurement data; see Section 8.6 of (1).

2. For 1 ≤ j ≤ J, set

$$f_{m+1}(r_j, \phi_j) = f_m(r_j, \phi_j) + p_c[r_j \cos(m\Delta - \phi_j), m\Delta]. \quad (8)$$

This is a discretization of Eq. (4). To obtain the values needed in Eq. (8), the first variable of p_c has to be interpolated from the values calculated in Eq. (6). There are many ways to do this interpolation (7). One recommended methodology is to use modified cubic splines (8), that is, to use a cubic to interpolate between any pair of neighboring sample points so that the slope of the interpolated function at the sample points is the same as the slope of the straight line that connects the values at the two sample points on either side of it. An alternative is to use sinc interpolation, which can be achieved as follows: instead of convolving, as in Eq. (6), calculate the Fourier transform of p, multiply it pointwise by the ramp function Q, and pad the product with many zeros before taking the inverse Fourier transform. The result of this will be a finely sampled version of the sinc interpolation.

In practice, once f_{m+1}(r_j, φ_j) has been calculated, f_m(r_j, φ_j) is no longer needed, and the computer can reuse the same memory location for f₀(r_j, φ_j), ..., f_{M−1}(r_j, φ_j), f_M(r_j, φ_j). In a complete execution of the algorithm, the use of Eq. (6) requires M(2N + 1)² multiplications and additions. However, the computational demand can be reduced by taking the discrete Fourier transforms (DFTs) of both p and q, multiplying these transforms pointwise, and then taking the inverse DFT. The DFT can always be implemented (possibly after some padding by zeros) very efficiently (9) by using the fast Fourier transform (FFT), which reduces the order of the required operations to M(2N + 1) log(2N + 1). On the other hand, all of the uses of Eq. (8) require MJ interpolations and additions. Because J is typically of the order of N² and N itself in typical applications is between 100 and 1,000, the computational cost of backprojection is likely to be much higher than the cost of convolution. In any case, reconstruction of a cross section consisting of 1,000 × 1,000 pixels (picture elements) from data collected by a typical X-ray CT device is not a challenge to state-of-the-art computational capabilities; it is routinely done in a time of the order of a second and can be done, using a pipeline architecture, in a fraction of a second (10).
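As an illustration of Eqs. (5)–(8), the sketch below (an assumed, simplified implementation, not code from the article) convolves each projection with the Ram–Lak filter of Eq. (7) and backprojects with linear interpolation rather than the modified cubic splines recommended above; the sinogram layout and the y_meas, offsets, and angles arrays are taken from the earlier radon_samples sketch.

```python
import numpy as np

def ram_lak(N, d):
    """Sampled ramp filter q(nd) of Eq. (7), for -N <= n <= N."""
    n = np.arange(-N, N + 1)
    q = np.zeros(2 * N + 1)
    q[n == 0] = 1.0 / (4.0 * d * d)
    odd = (n % 2) != 0
    q[odd] = -1.0 / (np.pi * d * n[odd]) ** 2
    return q

def fbp(sino, offsets, angles, grid):
    """Filtered backprojection evaluated at an array of (x, y) grid points.

    sino[n, m] holds p(n*d, theta_m); grid is an array of (x, y) pairs.
    """
    d = offsets[1] - offsets[0]
    N = (len(offsets) - 1) // 2
    q = ram_lak(N, d)
    recon = np.zeros(len(grid))
    for m, theta in enumerate(angles):
        # Eq. (6): discrete convolution of the m-th projection with q
        p_c = d * np.convolve(sino[:, m], q, mode="same")
        # Eq. (8): backproject, interpolating p_c at l = x*cos(theta) + y*sin(theta)
        l = grid[:, 0] * np.cos(theta) + grid[:, 1] * np.sin(theta)
        recon += np.interp(l, offsets, p_c, left=0.0, right=0.0)
    return recon * (np.pi / len(angles))   # weight for the discretized integral over theta

# Reconstruct on the same 64 x 64 grid as the phantom (assumed setup).
xs, ys = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
grid = np.column_stack([xs.ravel(), ys.ravel()])
image = fbp(y_meas, offsets, angles, grid).reshape(64, 64)
```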
The Linogram Method

This method was proposed in (11), and the reason for its name can be found there. The basic result that justifies the method is the well-known projection theorem, which says that "taking the two-dimensional Fourier transform is the same as taking the Radon transform and then applying the Fourier transform with respect to the first variable" (1). (This theorem leads to great insight in many areas of tomography.) The main motivation for the method is its speed of execution. In the description that follows, the approach of (12) is used. That paper deals with the fully three-dimensional problem; here it is simplified to the two-dimensional case.

For the linogram approach, it is assumed that the data were collected at particular points whose locations are specified next; if collected otherwise, one needs to interpolate before reconstruction. If the function is to be estimated at an array of points that have rectangular coordinates {(id, jd) | −N ≤ i ≤ N, −N ≤ j ≤ N} (it is assumed that this array covers the object to be reconstructed), then the data function p needs to be known at points

$$\{(n d_m, \theta_m) \mid -2N - 1 \le n \le 2N + 1,\; -2N - 1 \le m \le 2N + 1\} \quad (9)$$

and at points

$$\left\{\left(n d_m,\; \frac{\pi}{2} + \theta_m\right) \,\middle|\, -2N - 1 \le n \le 2N + 1,\; -2N - 1 \le m \le 2N + 1\right\}, \quad (10)$$

where

$$\theta_m = \tan^{-1}\left(\frac{2m}{4N + 3}\right) \quad \text{and} \quad d_m = d \cos\theta_m. \quad (11)$$
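The sampling pattern of Eqs. (9)–(11) is easy to tabulate; the short sketch below (the values of N and d are illustrative assumptions, not taken from the article) computes the view angles θ_m and radial spacings d_m.

```python
import numpy as np

N, d = 64, 1.0   # assumed image half-size and detector spacing
m = np.arange(-2 * N - 1, 2 * N + 2)
theta_m = np.arctan(2.0 * m / (4.0 * N + 3.0))   # Eq. (11): angles of the first family of views
d_m = d * np.cos(theta_m)                        # Eq. (11): radial sample spacing for each view
# The second family of views, Eq. (10), uses the same spacings at angles pi/2 + theta_m.
theta_second = np.pi / 2.0 + theta_m
print(theta_m.min(), theta_m.max())   # spans roughly (-pi/4, pi/4), so the two families cover all directions
```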
The linogram method produces estimates of the function values from such data at the desired points by using a multistage procedure. The most expensive computation that is used in any of the stages is the taking of DFTs, which can always be implemented very efficiently by using the FFT. The output of any stage produces estimates of function values at exactly those points where they are needed for the discrete computations of the next stage; there is never any need to interpolate between stages. These two properties indicate why the linogram method is both computationally efficient and accurate. The stages of the procedure are as follows.

1. Fourier transforming of the data. For each value of the second variable, take the DFT of the data with respect to the first variable in Eqs. (9) and (10). By the projection theorem, this provides the estimates of the two-dimensional Fourier transform F of the object at points (in a rectangular coordinate system)

$$\left\{\left(\frac{k}{(4N+3)d},\; \frac{k}{(4N+3)d}\tan\theta_m\right) \,\middle|\, -2N-1 \le k \le 2N+1,\; -2N-1 \le m \le 2N+1\right\} \quad (12)$$

and at points (also in a rectangular coordinate system)

$$\left\{\left(\frac{k}{(4N+3)d\,\tan\left(\frac{\pi}{2}+\theta_m\right)},\; \frac{k}{(4N+3)d}\right) \,\middle|\, -2N-1 \le k \le 2N+1,\; -2N-1 \le m \le 2N+1\right\}. \quad (13)$$

2. Windowing. At this point, those frequencies that are considered noise-dominated may be suppressed by multiplication by a window function (corresponding to the convolving function in the FBP method).

3. Separating into two functions. The sampled Fourier transform F of the object to be reconstructed is written as the sum of two functions G and H. G has the same values as F at all of the points specified in Eq. (12), except at the origin, and is zero-valued at all other points. H has the same values as F at all of the points specified in Eq. (13), except at the origin, and is zero-valued at all other points. Clearly, except at the origin, F = G + H. The idea is that by first taking the two-dimensional inverse Fourier transforms of G and H separately and then adding the results, an estimate of f is obtained [except for an additive constant, called the DC term, that has to be estimated separately; see (12)]. What needs to be done with G is described now; the treatment of H is analogous.

4. Chirp z-transforming in the second variable. Note that the way the θ_m were selected implies that, for a fixed k, the sampling in the second variable of Eq. (12) is uniform. Furthermore, it is known that G is zero outside the sampled region. Hence, for each fixed k, 0 < |k| ≤ 2N + 1, the chirp z-transform (13) can be used to estimate the inverse DFT in the second variable at points

$$\left\{\left(\frac{k}{(4N+3)d},\; jd\right) \,\middle|\, -2N-1 \le k \le 2N+1,\; -N \le j \le N\right\}. \quad (14)$$

The chirp z-transform can be implemented by using three FFTs; see (14).

5. Inverse transforming in the first variable. Now, the inverse Fourier transform of G can be estimated at the required points by taking, for every fixed j, the inverse DFT in the first variable of the values at the points of Eq. (14).

SERIES EXPANSION METHODS

This approach assumes that the function f to be reconstructed can be approximated by a linear combination of a finite set of known and fixed basis functions b_j(r, φ), that is,

$$f(r, \phi) \approx \sum_{j=1}^{J} x_j\, b_j(r, \phi), \quad (15)$$
and the task is to estimate the unknowns x_j. If it is assumed that the measurements depend linearly on the object to be reconstructed (certainly true in the special case of line integrals) and that one can calculate (at least approximately) what the measurements would be if the object to be reconstructed were one of the basis functions (r_{i,j} is used to denote the value of the ith measurement of the jth basis function), then it can be concluded (1) that the ith of the measurements of f is approximately

$$\sum_{j=1}^{J} r_{i,j}\, x_j. \quad (16)$$
Then, the problem is to estimate the x_j from the measured approximations (for 1 ≤ i ≤ I) of Eq. (16). The estimate can often be selected as one that satisfies some optimization criterion. To simplify notation, the image is represented by a J-dimensional image vector x (whose components are x_j), and the data form an I-dimensional measurement vector y. There is an assumed projection matrix R (that has entries r_{i,j}). Let r_i denote the transpose of the ith row of R (1 ≤ i ≤ I), so that the inner product ⟨r_i, x⟩ is the same as the expression in Eq. (16). Then, y is approximately Rx, and there may be further information that x belongs to a subset C of R^J, the space of J-dimensional real vectors. In this formulation, R, C, and y are known, and x is to be estimated. Then, substituting the estimated values of x_j in Eq. (15) will give an estimate of the function f.

The simplest way of selecting the basis functions is by subdividing the plane into pixels (or 3-D space into voxels) and choosing basis functions whose value is one inside a specific pixel (or voxel) and is zero everywhere else. However, other choices may be preferable; for example, Ref. (15) uses spherically symmetrical basis functions that are spatially limited and can also be chosen to be very smooth. The smoothness of the basis functions then results in smoothness of the reconstructions, and the spherical symmetry allows easy calculation of the numbers r_{i,j}. It has been demonstrated (16) for fully three-dimensional positron emission tomography (PET) reconstruction that such basis functions lead to statistically significant improvements in the task-oriented performance of series expansion reconstruction methods.

In many situations, only a small proportion of the numbers r_{i,j} are nonzero. (For example, if the basis functions are based on voxels in a 200 × 200 × 100 array and the measurements are approximate line integrals, then the percentage of nonzero r_{i,j} is less than 0.01 because a typical line will intersect fewer than 400 voxels.) This makes certain types of iterative methods surprisingly efficient for estimating the x_j, because one can use a subroutine that returns, for any i, a list of those values of j for which r_{i,j} is not zero, together with the values of the r_{i,j} (1,17). Two such iterative approaches are discussed now: the so-called algebraic reconstruction techniques (ART) and the use of expectation maximization (EM).
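For pixel (or voxel) basis functions, the entries r_{i,j} are simply the intersection lengths of the ith line with the jth pixel. The sketch below (an illustrative approximation, not the article's code: it samples each line densely instead of computing exact intersection lengths) builds such a sparse matrix row by row, using the offsets and angles arrays from the earlier sketch.

```python
import numpy as np
from collections import defaultdict

def sparse_projection_rows(offsets, angles, n_pix=64, n_steps=256):
    """Approximate r_{i,j}: length of line i inside pixel j, for an n_pix x n_pix grid on [-1,1]^2."""
    rows = []                                  # rows[i] is a dict {j: r_ij} of the nonzero entries only
    t = np.linspace(-1.5, 1.5, n_steps)
    dt = t[1] - t[0]
    for theta in angles:
        for ell in offsets:
            x = ell * np.cos(theta) - t * np.sin(theta)
            y = ell * np.sin(theta) + t * np.cos(theta)
            inside = (np.abs(x) < 1) & (np.abs(y) < 1)
            col = ((x[inside] + 1) / 2 * n_pix).astype(int)
            row = ((y[inside] + 1) / 2 * n_pix).astype(int)
            j = row * n_pix + col              # linear pixel index
            entries = defaultdict(float)
            for jj in j:
                entries[jj] += dt              # accumulate approximate intersection length
            rows.append(dict(entries))
    return rows

R_rows = sparse_projection_rows(offsets, angles)
print(len(R_rows), "measurements; nonzeros in row 0:", len(R_rows[0]))
```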
Algebraic Reconstruction Techniques (ART)

The basic version of ART operates as follows (1). The method cycles through the measurements repeatedly, considering only one measurement at a time. Only those x_j are updated for which the corresponding r_{i,j} for the currently considered measurement i is nonzero, and the change made to x_j is proportional to r_{i,j}. The factor of proportionality is adjusted so that if Eq. (16) is evaluated for the resulting x_j, then it will match the ith measurement exactly. Other variants will use a block of measurements in one iterative step and will update the x_j in different ways to ensure that the iterative process converges according to a chosen optimization criterion. Only one specific optimization criterion (and the associated algorithm) is discussed in detail here. [It is one of the several optimization criteria that are presented and justified in Section 6.4 of (1).] The task is to find the x in R^J that minimizes

$$r^2\,\|y - Rx\|^2 + \|x - \mu_X\|^2, \quad (17)$$
(‖·‖ indicates the usual Euclidean norm) for a given constant scalar r (called the regularization parameter) and a given constant vector μ_X. The algorithm, which uses an I-dimensional vector u of additional variables (one for each measurement), works as follows. First, define u^(0) as the I-dimensional zero vector and x^(0) as μ_X. Then, for k ≥ 0, calculate

$$u^{(k+1)} = u^{(k)} + c^{(k)} e_{i_k}, \qquad x^{(k+1)} = x^{(k)} + r\, c^{(k)} r_{i_k}, \quad (18)$$

where e_i is the I-dimensional vector whose ith component is one and all other components are 0, and

$$c^{(k)} = \lambda^{(k)}\, \frac{r\left(y_{i_k} - \langle r_{i_k}, x^{(k)} \rangle\right) - u^{(k)}_{i_k}}{1 + r^2 \|r_{i_k}\|^2}, \quad (19)$$
where i_k = k (mod I) + 1.

Theorem. [See (1) for a proof.] Let y be any measurement vector, r be any real number, and μ_X be any vector in R^J. Then, for any real numbers λ^(k) satisfying

$$0 < \varepsilon_1 \le \lambda^{(k)} \le \varepsilon_2 < 2, \quad (20)$$
for arbitrary fixed ε₁ and ε₂, the sequence x^(0), x^(1), x^(2), ..., determined by the algorithm given before, converges to the unique vector x that minimizes the expression in Eq. (17).

The implementation of this algorithm is as simple as that of basic ART, which is described at the beginning of this subsection. An additional sequence of I-dimensional vectors u^(k) is needed, but only one component of u^(k) is used or altered in the kth iterative step. Because the indices i_k are defined in a cyclic order, the components of the vector u^(k) (just as the components of the measurement vector y) can be sequentially accessed. (The exact choice of this order, often referred to as the data access ordering, is very
important for fast initial convergence; one such efficient ordering is described in (18). The underlying principle there is that the individual actions in any subsequence of steps should be as independent as possible.) Also, for every integer k ≥ 0, a positive real number λ^(k) needs to be selected. [These are the so-called relaxation parameters; they are free parameters of the algorithm and in practice need to be optimized (18).] The vectors r_i are usually not stored at all; instead, the location and size of their nonzero components are calculated as and when needed. Hence, the algorithm described by Eqs. (18) and (19) shares the storage-efficient nature of basic ART, and its computational requirements are essentially the same. Assuming, as is reasonable, that the number of nonzero r_{i,j} is of the same order as J, the cost of cycling once through the data using ART is of the order IJ, which is approximately the same as the cost of a reconstruction using the FBP method. [That this is so is confirmed by the timings reported in (19).] An important thing to note about the Theorem is that there is no consistency requirement on the system y = Rx in its statement. Hence, the algorithm of Eqs. (18) and (19) will converge to the minimizer of Eq. (17), the so-called regularized least squares solution, using the real data collected in any application.
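To make Eqs. (18) and (19) concrete, here is a minimal sketch of one sweep of the regularized ART iteration, assuming the sparse rows produced by the earlier sparse_projection_rows sketch and the simulated measurement vector from the radon_samples sketch; the relaxation parameter, regularization parameter, starting vector, and data ordering are illustrative choices, not values prescribed by the article.

```python
import numpy as np

def art_sweep(rows, y, x, u, reg=1.0, lam=1.0):
    """One cycle through all measurements using Eqs. (18) and (19).

    rows[i] is {j: r_ij} (nonzero entries of the ith row of R); reg is the
    regularization parameter r and lam the relaxation parameter lambda^(k).
    """
    for i, row in enumerate(rows):                      # i_k runs cyclically over the data
        idx = np.fromiter(row.keys(), dtype=int)
        val = np.fromiter(row.values(), dtype=float)
        residual = y[i] - np.dot(val, x[idx])           # y_{i_k} - <r_{i_k}, x^{(k)}>
        c = lam * (reg * residual - u[i]) / (1.0 + reg * reg * np.dot(val, val))   # Eq. (19)
        u[i] += c                                       # Eq. (18), update of u
        x[idx] += reg * c * val                         # Eq. (18), update of x
    return x, u

# Illustrative usage: start from mu_X = 0 and cycle a few times through the data.
y = np.asarray(y_meas).ravel(order="F")                 # assumed ordering: matches the row loop above
x = np.zeros(64 * 64)
u = np.zeros(len(y))
for _ in range(3):
    x, u = art_sweep(R_rows, y, x, u, reg=1.0, lam=0.5)
```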
ART can also be used as a maximum entropy technique. The entropy of a J-dimensional vector x is defined as

$$-\sum_{j=1}^{J} \frac{x_j}{J\bar{x}} \ln \frac{x_j}{J\bar{x}}, \quad (21)$$

where x̄ is the average value of the x_j for 1 ≤ j ≤ J. The use of this is usually justified by arguments (20) aimed at showing that, of all x that satisfy some model equations, the one that maximizes entropy has the least information content, and so the resulting function f, defined by the right-hand side of Eq. (15), is the least likely to mislead the user by the presence of spurious features. There is a multiplicative version of ART [as opposed to the additive one, which is called so because of the additions on the right-hand side of Eq. (18)], which finds the maximum entropy solution of Rx = y, provided that there is a solution at all (21). A typical iterative step of such an algorithm (without a relaxation parameter, which may also be introduced here if desired) is

$$x_j^{(k+1)} = x_j^{(k)} \left(\frac{y_{i_k}}{\langle r_{i_k}, x^{(k)} \rangle}\right)^{r_{i_k, j}}, \quad \text{for } 1 \le j \le J. \quad (22)$$

There are versions of this algorithm (22) that do not require exact satisfaction of a system of equations for convergence; such versions allow a theory that is applicable to real data.

Expectation Maximization (EM)

A reasonable alternative optimization task is to seek the x that maximizes the likelihood of observing the actual measurements, based on the assumption that the ith measurement comes from a Poisson distribution whose mean is given by Eq. (16). An iterative method to do exactly that, based on the so-called EM (expectation maximization) approach, was proposed in (23). The following variant of this approach was designed for a somewhat more complicated optimization criterion (24) that enforces smoothness of the results where the original maximum likelihood criterion may result in noisy images. Let R^J_+ denote those vectors of R^J in which all components are nonnegative. The task is to find the x in R^J_+ that minimizes

$$\sum_{i=1}^{I} \left( \langle r_i, x \rangle - y_i \ln \langle r_i, x \rangle \right) + \frac{\gamma}{2} \langle x, Sx \rangle, \quad (23)$$

where ln is the natural logarithmic function, γ is a real regularization parameter, and the J × J matrix S (where entries are denoted by s_{j,u}) is a modified smoothing matrix (1) that has the following property. (This definition is applicable only if pixels are used to define the basis functions.) Let N denote the set of indexes corresponding to pixels that are not on the border of the digitization. Each such pixel has eight neighbors; let N_j denote the indexes of the pixels associated with the neighbors of the pixel indexed by j. Then,

$$\langle x, Sx \rangle = \sum_{j \in N} \left( x_j - \frac{1}{8} \sum_{k \in N_j} x_k \right)^2. \quad (24)$$

Consider the following rules for obtaining x^(k+1) from x^(k):

$$p_j^{(k)} = \frac{1}{9\gamma s_{j,j}} \sum_{i=1}^{I} r_{i,j} - x_j^{(k)} + \frac{1}{9 s_{j,j}} \sum_{u=1}^{J} s_{j,u}\, x_u^{(k)}, \quad (25)$$

$$q_j^{(k)} = \frac{x_j^{(k)}}{9\gamma s_{j,j}} \sum_{i=1}^{I} r_{i,j} \frac{y_i}{\langle r_i, x^{(k)} \rangle}, \quad (26)$$

$$x_j^{(k+1)} = \frac{1}{2} \left( -p_j^{(k)} + \sqrt{\left(p_j^{(k)}\right)^2 + 4\, q_j^{(k)}} \right). \quad (27)$$
Because the first term of Eq. (25) can be precalculated, executing Eq. (25) requires essentially no more effort than multiplying x^(k) by the modified smoothing matrix. As explained in (1), there is a very efficient way of doing this. Executing Eq. (26) requires approximately the same effort as cycling once through the data set using ART; cf. Eq. (19). Algorithmic details of efficient computations of Eq. (26) appear in (25). Clearly, executing Eq. (27) requires a trivial amount of computing. Thus, one iterative step of the EM algorithm of Eqs. (25)–(27) requires approximately the same total computing effort as cycling once through the data set using ART, which costs about the same as a complete reconstruction by the FBP method. A basic difference between the ART method of the previous subsection and the EM method of the current section is that the former updates its estimate based on one measurement at a time, whereas the latter deals with all measurements simultaneously.
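The update of Eqs. (25)–(27) is easy to express in vectorized form; the sketch below is an assumed implementation that uses a dense projection matrix R and an identity stand-in for the smoothing matrix S for readability (a practical code would exploit their sparsity, as discussed above), with γ, the problem size, and the starting point chosen arbitrarily.

```python
import numpy as np

def em_smooth_step(R, S, y, x, gamma):
    """One iteration of Eqs. (25)-(27) for minimizing Eq. (23)."""
    s_diag = np.diag(S)
    col_sum = R.sum(axis=0)                                                  # sum_i r_{i,j}, precalculable
    p = col_sum / (9.0 * gamma * s_diag) - x + (S @ x) / (9.0 * s_diag)      # Eq. (25)
    ratio = y / (R @ x)                                                      # y_i / <r_i, x>
    q = x * (R.T @ ratio) / (9.0 * gamma * s_diag)                           # Eq. (26)
    return 0.5 * (-p + np.sqrt(p * p + 4.0 * q))                             # Eq. (27)

# Illustrative usage on a tiny made-up problem (all values are arbitrary assumptions).
rng = np.random.default_rng(0)
J, I = 16, 40
R = rng.random((I, J))
x_true = rng.random(J) + 0.1
y = rng.poisson(R @ x_true).astype(float) + 1e-6
S = np.eye(J)            # stand-in for the modified smoothing matrix of Eq. (24)
x = np.ones(J)           # positive starting vector, as required by the Theorem below
for _ in range(50):
    x = em_smooth_step(R, S, y, x, gamma=0.1)
```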
Theorem. [See (24) for a proof.] For any x^(0) that has only positive components, the sequence x^(0), x^(1), x^(2), ..., generated by the algorithm of Eqs. (25)–(27), converges to the nonnegative minimizer of the expression in Eq. (23).

COMPARISON OF THE PERFORMANCE OF ALGORITHMS

Four very different-looking algorithms have been discussed previously, and the scientific literature is full of many others, only some of which are surveyed in books such as (1,5,26). Many of the algorithms are available in general-purpose image reconstruction software packages, such as SNARK93 (17). The novice faced with a problem of reconstruction is justified in being puzzled as to which algorithm to use. Unfortunately, there is not a generally valid answer: the right choice may very well depend on the area of application and on the instrument used for gathering the data. This section contains only some general comments regarding the four approaches discussed before, followed by some discussion of the methodologies that are available for comparative evaluation of reconstruction algorithms for a particular application.

Concerning the two transform methods discussed, the linogram method (essentially an N² log N method) is faster than the FBP method (an N³ method) and, when the data are collected according to the geometry expressed by Eqs. (9) and (10), the linogram method is also likely to be more accurate because it requires no interpolations. However, data are not normally collected in this way, and the need for an initial interpolation, together with the more complicated-looking expressions that need to be implemented for the linogram method, may steer some users toward the FBP method, despite its extra computational requirements.

The advantages of series expansion methods over transform methods include their flexibility (no special relationship needs to be assumed between the object to be reconstructed and the measurements taken, such as that the latter are samples of the Radon transform of the former) and the ability to control the desired type of solution by specifying the exact sense in which the image vector is to be estimated from the measurement vector; see Eqs. (17) and (23). The major disadvantage is that it is computationally much more intensive to find these precise estimators than to perform a numerical evaluation of Eq. (1). If the model (the basis functions, the projection matrix, and the optimization criterion) is not well chosen, then the resulting estimate may also be inferior to that provided by a transform method. The recent literature has demonstrated that usually there are models that make the efficacy of a reconstruction provided by a series expansion method at least as good as that provided by a transform method. To avoid the problem of computational expense, one usually stops the iterative process involved in the optimization long before the method has converged to the mathematically specified optimizer. Practical experience indicates that this can be done very efficaciously. For example, as reported in (19), in the area of fully three-dimensional PET, the reconstruction times for FBP are slightly longer than for cycling just once through the data using a version of ART and spherically symmetric basis functions, and the accuracy of FBP is significantly worse
than that obtained by this very early iterate produced by ART. Similar experience has been reported in the area of fully three-dimensional electron microscopy (27). Because the iterative process in practice is stopped early, in evaluating the efficacy of the result of a series expansion method, one should look at the actual outputs rather than the ideal mathematical optimizer. Reported experiences comparing an optimized version of ART with an optimized version of EM (16,18) indicate that the former can obtain reconstructions as good as or better than the latter, but at a fraction of the computational cost. This computational advantage results from not trying to use all of the measurements in each iterative step.

The proliferation of image reconstruction algorithms imposes a need to evaluate the relative performance of these algorithms and to understand the relationship between their attributes (free parameters) and their performance. In a specific application of an algorithm, choices have to be made regarding its parameters (such as the basis functions, the optimization criterion, constraints, relaxation parameters, etc.). Such choices affect the performance of the algorithm, and there is a need for an efficient and objective evaluative procedure for selecting the best variant of an algorithm for a particular task and for comparing the efficacy of different algorithms for that task. An approach to evaluating an algorithm is to start with a specification of the task for which the reconstructed image is to be used and then define a figure of merit (FOM) that determines quantitatively how helpful the image, and hence the reconstruction algorithm, is for performing that task. In the numerical observer approach (18,28,29), a task-specific FOM is computed for each image. Based on the FOMs for all of the images produced by two different techniques, one can calculate the statistical significance at which one can reject the null hypothesis that the methods are equally helpful for solving a particular task in favor of the alternative hypothesis that the method that has the higher average FOM is more helpful for solving that task. Different imaging techniques can then be rank-ordered on the basis of their average FOMs. It is strongly advised that a reconstruction algorithm should not be selected on the basis of the appearance of a few sample reconstructions, but rather that a study be carried out along the lines indicated before. In addition to the efficacy of the images produced by the various algorithms, one should also be aware of the computational time and costs (possibly using special purpose hardware) needed for executing them. A survey from this point of view can be found in (2).
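The statistical comparison described above can be carried out with a paired test on the per-image FOMs; the following sketch is an assumed recipe, not a procedure prescribed by the article, and uses made-up FOM values and a paired t test as one possible way of assessing significance.

```python
import numpy as np
from scipy import stats

def compare_methods(fom_a, fom_b):
    """Paired comparison of two reconstruction methods from per-image FOMs."""
    fom_a, fom_b = np.asarray(fom_a), np.asarray(fom_b)
    t_stat, p_value = stats.ttest_rel(fom_a, fom_b)      # null hypothesis: equal mean FOM
    better = "A" if fom_a.mean() > fom_b.mean() else "B"
    return fom_a.mean(), fom_b.mean(), better, p_value

# Illustrative made-up FOM values for 20 reconstructions by each method.
rng = np.random.default_rng(1)
fom_method_a = 0.80 + 0.05 * rng.standard_normal(20)
fom_method_b = 0.75 + 0.05 * rng.standard_normal(20)
print(compare_methods(fom_method_a, fom_method_b))
```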
FIGURES OF MERIT

In this section, some examples of the FOMs mentioned in the last section are reviewed. Because this is a general discussion, only FOMs of general usefulness are presented; in a particular application, one may often wish to define FOMs that are much more task-specific (e.g., an FOM that measures how well a stenosis of a blood vessel can be seen in the reconstruction).

Resolution

In the reconstruction, it is generally desirable to be able to distinguish objects that are distinct in reality. To measure this ability, one can, for example, consider the following. If there are two identical small objects of uniform density (say, in PET, two small positron-emitting spheres) near each other, then to be able to recognize them as separate, they need to be spaced at more than the full width at half maximum (FWHM) along the line connecting their centers. The FWHM is defined as the distance between the points on the two sides of the center of a single object at which the reconstructed density is half of the reconstructed density at the center of the object. If there are two objects, the reconstructed densities of each add up in the reconstruction, and if they are placed at less than the FWHM from each other, then the reconstructed value at the halfway point between their centers is likely to be actually larger than the reconstructed values at the centers themselves; this makes the two objects indistinguishable from a single elongated object. Because the FWHM in many situations depends on the direction of the line along which it is measured, it is advisable to report its value for multiple directions, such as axial and tangential; see (30). The FWHM may also be location dependent, especially for fully three-dimensional modes of data collection.

Signal-to-Noise Ratio

Practical image reconstruction is inevitably a noisy process. Noise (defined in the very general sense, including both systematic effects and stochastic processes, as the difference between the actual reconstruction and what we would like to achieve in an absolutely ideal situation) is due to the counting statistics of the emitted or transmitted radiation, inaccuracies of the instrument, imperfections of the reconstruction algorithm, etc. It is clearly desirable to be able to distinguish actual signals in the reconstructions (such as radiation-emitting hot spots in the brain) from noise that may give rise to something of a similar appearance. The signal-to-noise ratio (SNR) is an FOM that measures this capability. One way of defining SNR (30) is the following. Suppose that the values in the signal are higher than those in the background. Let V be the average reconstructed value in the region of the signal, and let B and σ_B be the mean and standard deviation, respectively, of the average reconstructed values in a number of regions (similar in size to the signal region) that are in the background. Then SNR is defined as (V − B)/σ_B. Such a measure is also not a single property of the reconstruction algorithm; for example, it is reported in (30) that the SNR is considerably larger when the signal regions are spheres 25 mm in diameter than when they are spheres 18 mm in diameter.
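Computed directly from a reconstruction, this FOM needs only the pixel values of the signal region and of several background regions; the sketch below is a minimal illustration, assuming the regions are supplied as boolean masks over the FBP image computed earlier (the region shapes and sizes are arbitrary choices, not given by the article).

```python
import numpy as np

def snr_fom(recon, signal_mask, background_masks):
    """SNR = (V - B) / sigma_B as defined above, from boolean region masks."""
    V = recon[signal_mask].mean()                       # average value in the signal region
    region_means = np.array([recon[m].mean() for m in background_masks])
    B, sigma_B = region_means.mean(), region_means.std(ddof=1)
    return (V - B) / sigma_B

# Illustrative usage: signal region around the small disk of the phantom, four background regions.
yy, xx = np.mgrid[-1:1:64j, -1:1:64j]
signal = (xx - 0.2) ** 2 + yy ** 2 <= 0.05
backgrounds = [(xx - cx) ** 2 + (yy - cy) ** 2 <= 0.05
               for cx, cy in [(-0.4, 0.3), (-0.4, -0.3), (0.2, 0.4), (0.0, -0.4)]]
print(snr_fom(image, signal, backgrounds))
```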
Artifact Limitations

Based on the general definition of noise given in the last subsection, "artifacts" are those components of the noise that are not due to statistical fluctuations. For example, a specific choice of the convolving function q in FBP may result in ripples coming off sharp edges of the image; see Fig. 8.8(a) in (1). Another choice of q may not suffer from such an artifact to the same extent, but usually there is a price to pay, such as reduced SNR; see Fig. 8.8(c) in (1), but compare also Fig. 8.9(c) with Fig. 8.9(a).

One kind of artifact that occurs frequently in practice is due to missing angles in the data collection. In the terminology of the previously introduced notation, this means that if the θ_i are arranged in increasing order, then there is an i for which θ_{i+1} − θ_i is considerably larger than its typical value for 1 ≤ i ≤ I (using the definition that θ_{I+1} = θ_1 + 2π). Then, the reconstructions will typically appear to be blurred in the direction of the missing angles. One example of an application where this happens is the electron microscopic reconstruction of certain kinds of particles from data collected (due to the nature of these particles) using the so-called conical tilting geometry. This is a three-dimensional mode of data collection in which projections are not collected for any of the directions within a cone of directions around an axis. This results in blurring along this axis, and the resolution in the direction of this axis can be measured by an appropriately defined FOM (27). Using such an FOM, it can be shown that some appropriate version of ART is better than some version of FBP from the point of view of blurring in the direction of missing angles. However, there are techniques to correct the output of FBP-like algorithms for data that have missing angles. For example, one can use prior knowledge, such as the region within which the object to be reconstructed must lie and the range within which the reconstructed values should be; see Chapter 4 of Part I of (3).

SUMMARY

This article has covered some of the basic principles behind tomographic imaging, including such topics as the Radon transform, interpolation for backprojection (both modified cubic spline and sinc), the Ram–Lak filter, missing angle techniques, maximum entropy techniques, resolution, the signal-to-noise ratio, and artifact limitations. Further details may be found in the references.

BIBLIOGRAPHY

1. G. T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography, Academic Press, N.Y., 1980.
2. G. T. Herman, J. Real-Time Imaging 1, 3–18 (1995).
3. G. T. Herman, H. K. Tuy, K. J. Langenberg, and P. C. Sabatier, Basic Methods of Tomography and Inverse Problems, Adam Hilger, Bristol, England, 1987.
4. P. Lauterbur, Nature 242, 190–191 (1973).
5. A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging, SIAM, Philadelphia, 2001.
6. G. N. Ramachandran and A. V. Lakshminarayanan, Proc. Natl. Acad. Sci. U.S.A. 68, 2,236–2,240 (1971).
7. E. H. W. Meijering, W. J. Niessen, and M. A. Viergever, Med. Imaging Anal. 5, 111–126 (2001).
8. G. T. Herman, S. W. Rowland, and M.-M. Yau, IEEE Trans. Nucl. Sci. 26, 2,879–2,894 (1979).
9. E. A. Brigham, The Fast Fourier Transform, Prentice-Hall, Englewood Cliffs, N.J., 1974.
10. J. L. C. Sanz, E. B. Hinkle, and A. K. Jain, Radon and Projection Transform-Based Computer Vision, Springer-Verlag, Berlin, 1988.
11. P. Edholm and G. T. Herman, IEEE Trans. Med. Imaging 6, 301–307 (1987).
12. G. T. Herman, D. Roberts, and L. Axel, Phys. Med. Biol. 37, 673–687 (1992).
13. L. R. Rabiner, R. W. Schafer, and C. M. Rader, Bell Syst. Tech. J. 48, 1,249–1,292 (1969).
14. P. Edholm, G. T. Herman, and D. A. Roberts, IEEE Trans. Med. Imaging 7, 239–246 (1988).
15. R. M. Lewitt, Phys. Med. Biol. 37, 705–716 (1992).
16. S. Matej et al., Phys. Med. Biol. 39, 355–367 (1994).
17. J. A. Browne, G. T. Herman, and D. Odhner, Technical Report MIPG198, Dept. of Radiology, University of Pennsylvania, Philadelphia, 1993.
18. G. T. Herman and L. B. Meyer, IEEE Trans. Med. Imaging 12, 600–609 (1993).
19. S. Matej and R. M. Lewitt, IEEE Trans. Nucl. Sci. 42, 1,361–1,370 (1995).
20. R. Gordon, R. Bender, and G. T. Herman, J. Theor. Biol. 29, 471–482 (1970).
21. A. Lent, in R. Shaw, ed., Image Analysis and Evaluation, Society of Photographic Scientists and Engineers, Washington, DC, 1977, pp. 249–257.
22. Y. Censor et al., in A. Wakulicz, ed., Numerical Analysis and Mathematical Modelling, Banach Center P, PWN-Polish Scientific, Warsaw, Poland, 1990, pp. 145–163.
23. L. A. Shepp and Y. Vardi, IEEE Trans. Med. Imaging 1, 113–122 (1982).
24. G. T. Herman, A. R. De Pierro, and N. Gai, J. Visual Commun. Image Representation 3, 316–324 (1992).
25. G. T. Herman, D. Odhner, K. D. Toennies, and S. A. Zenios, in T. F. Coleman and Y. Li, eds., Large-Scale Numerical Optimization, SIAM, Philadelphia, 1990, pp. 3–21.
26. F. Natterer, The Mathematics of Computerized Tomography, Wiley, Chichester, England, 1986.
27. R. Marabini, G. T. Herman, and J. M. Carazo, Ultramicroscopy 72, 53–65 (1998).
28. K. M. Hanson, J. Opt. Soc. Am. A 7, 1,294–1,304 (1990).
29. S. S. Furuie et al., Phys. Med. Biol. 39, 341–354 (1994).
30. P. E. Kinahan et al., IEEE Trans. Nucl. Sci. 42, 2,281–2,287 (1995).
U

ULTRASOUND IMAGING

NAVALGUND A. H. K. RAO
Rochester Institute of Technology
Rochester, NY

Ultrasound is defined as sound waves whose frequency is above 20 kHz, the upper limit of the human audible range. The past century has witnessed the growth of an imaging technology that uses ultrasound as the information carrier. A physical medium is required to sustain longitudinal and transverse ultrasound waves. The imaging can be passive, when the object to be detected is the source of sound, or active, when an external deliberately controlled source of sound is used and the echoes from objects in the medium are used to generate an image. State-of-the-art active systems can launch waves into a medium and nondestructively collect echo signals and display information pertaining to the inside of an optically opaque three-dimensional object. In medical applications, the propagation medium is soft tissue. Present day technology permits real-time (20 or more image frames per second) visualization of the gross anatomy and pathology. Medical ultrasound has found applications throughout the breadth of diagnostic investigations that include obstetrics, gynecology, abdomen, ophthalmology, cardiovascular system, breast, and superficial structures. The frequency used can range from 1–10 MHz for imaging various large body parts and can go as high as 20–50 MHz for specialized imaging tasks where the required depth penetration is small, such as in endoscopy, dermatology, or ophthalmology. Besides medical applications, ultrasound is also used in nondestructive evaluation (NDE) and sound navigation and ranging (SONAR). NDE is commonly used in the aerospace and nuclear power industries to inspect safety-critical parts for flaws during in-service use. The medium here is generally solid material. SONAR pertains to imaging under the sea. At the turn of the millennium, the medical application of ultrasound has emerged as the technology leader even though its use to detect internal defects in metal structures in 1940 preceded its first use in medicine. The imaging techniques and technologies used in these different application areas are quite similar, at least for medical and NDE. Therefore, the main body of the discussion will center on medical imaging.

An ultrasound imaging system usually conveys information about echo generating objects. The image can be thought of as a display of the multidimensional spatial distribution of some object-dependent entity. Visualization of this information is one of the objectives, and perhaps the major objective in medical ultrasound. The image value is generally related to the mechanical or elastic property of the object. A secondary objective therefore is to understand and quantify the relationship between the image value and the object property, also known as quantitative imaging or tissue characterization in medical ultrasound. This area still remains a subject of active research. This article deals only with the first objective.

Figure 1. Physical processes involved in pulse echo imaging. Image data was generated using software WAVE 2000.

Pulse echo imaging has become preeminent in medical ultrasound for reasons that will become clear later. The concept is illustrated by Fig. 1. Note that the figure depicts wave propagation only in two dimensions, whereas reality is three-dimensional. A finite size transducer located on the left side generates sound waves in response to electrical excitation and couples them to the anatomy under examination. The generated waves are of finite duration, and hence the disturbance in the tissue medium has a finite spatial extent in the propagative direction. This can be seen in a snapshot of the disturbance in Fig. 1a, taken soon after the transduction process is completed. Now, this disturbance propagates away from the transducer as time progresses. If the medium responds linearly, the propagative effects are governed by solving the space–time wave equation in free space (lossy or lossless) and at boundaries. The macroscopic effects are embodied in the familiar concepts of reflection, scattering, and refraction at the boundaries and diffraction in the homogeneous
free space. Figure 1b shows a snapshot of the wave field after free space propagation to some depth z just in front of a scatterer boundary. Note that the spatiotemporal description of the wave field at any depth z is governed by diffraction, a phenomenon that plays an important role in determining the image quality in pulse echo imaging. Figure 1c depicts a later situation where the spatiotemporal echo field is seen propagating back toward the transducer. The transmitted primary field can also be seen propagating down along the z axis. This can generate additional echoes that arrive at the transducer later. Time encodes the range or depth of the scatterer. The echo pressure wave field incident on the same transducer generates an electrically time-varying voltage signal. The wave field incident on the face of the transducer is complex; it has a magnitude and phase at each point. If the voltage signal at any given instant is considered proportional to the spatial integration of the complex wave field over the face of the transducer, we refer to it as phase-sensitive detection. The majority of transducers in use today are of this type. The detected voltage–time signal is referred to as the A-line signal. The three steps involved in the imaging process, namely, transduction and reception by the transducer, two-way propagation in the medium to some depth, and the scattering/reflection phenomenon, will be considered in detail. But first we digress to consider different modes of scanning and information display employed in medical diagnosis and NDE.

SCANNING MODES AND IMAGE DISPLAY FORMATS

The A scan is simply the display of the transducer voltage–time signal generated by echoes. It is readily observed by connecting the received output to an oscilloscope triggered at the instant of initial excitation. If the ultrasound propagative speed c in the object is constant, time t = 2z/c represents the range distance z. A large amount of information is contained in the radio-frequency (Rf) signal displayed in the A scan.

Figure 2. A-scan process.

The disadvantage of the A-mode is that it provides information about the object only along the line of sight and within the beam width of the transducer (Fig. 2). It is tedious
and time-consuming to move the transducer around. Furthermore, a large amount of spatial information has to be interpreted quickly and kept in memory by the observer.

The B scan refers to a brightness scan. It is essentially a display of adjacent A-scan signals that are generated when the transducer is moved laterally along a line, say the x axis, as part of a linear scanning protocol (Fig. 3). However, only the envelope or amplitude of the oscillating voltage signal is used in the display. The signal values are mapped into gray-scale values that can be displayed as a range of brightness on a display device. The B-scan image represents a two-dimensional planar slice of the three-dimensional medium. The plane is determined by the transducer beam axis and the direction in which the scanning takes place. For example, the image in Fig. 3 is an x–z slice at some fixed y location. The scanning movement of the transducer was performed mechanically in the early 1970s. This is now largely replaced by electronic scanning using one-dimensional phased array transducers. The B scan is particularly suited for developing fast scanning techniques that can produce real-time images. For our present purpose, we may define a real-time imaging technique as one that can acquire and display 20 or more image frames per second. For example, such high speeds are essential for adequate cardiovascular, abdominal, and fetal imaging. Qualitatively, the brightness at any given point in the image represents the scattering strength of the scatterers at that point in the chosen plane. Due to the finite size of the ultrasound beam, the plane has a finite thickness. We may define a point-spread function (PSF) here as the image of an infinitesimally small point scatterer located at some point in the plane that is being imaged. The PSF generally represents an image quality metric of any imaging system or modality. The spatial size of the PSF determines the resolution of that imaging system. The PSF for this modality varies spatially due to the divergence of the transducer beam. This will be considered in more detail later. B-scan imaging forms the basis for most of the medical ultrasound imaging techniques and their advancement (1).

Figure 3. B-scan process.

The C scan refers to constant depth scanning. This technique also uses pulse echo signals, as illustrated in Fig. 4. Here, an electronically time-gated portion of the A-scan signal is used for a brightness-modulated display. Only the signal amplitude within a time gate centered at t₀ = 2z₀/c is used. The transducer has to be scanned laterally in the x, y plane for this modality. The mapping of the brightness extracted from the echo signal constitutes a two-dimensional image that provides a picture of the scatterers located in a thin slice in the x, y plane of the object at depth z₀. The thickness of the slice depends on the gate length. The PSF is spatially invariant. The advantage is that the transducer beam can be focused to a particular depth and optimal uniform resolution can be maintained everywhere in the image. A major disadvantage is that two-dimensional scanning requires considerable time for acquiring data. Large area coverage for real-time imaging is difficult now, even for one-dimensional phased array transducers. For this reason, it is rarely used for medical imaging but has found some applications in NDE.

Figure 4. C-scan process.
The future design of two-dimensional phased array scanners could possibly render real-time imaging feasible by this modality.

The M scan is used to monitor slow changes in a given A scan as a function of time. The transducer is held stationary at a fixed location, and the A-scan signals from the same line of sight are recorded at regular time intervals T (Fig. 5). The time-modulated brightness is displayed as a function of depth z (vertical axis) and T (horizontal axis). It is used to detect a pattern of slow movement in time of a reflector or scatterers along a fixed line of sight.

Figure 5. M-scan process.

The reflectors A and C in Fig. 5 are
stationary in time, and B is moving slowly. The motion of B can be studied by an M scan. A typical example of an application of M-mode scanning is examination of the pattern of movement of heart-valve leaflets (2).
ULTRASONIC PROPERTIES OF A BIOLOGICAL MEDIUM AND CONTRAST AGENTS

We will consider here those medium properties that are of major importance for medical ultrasound. The medium here therefore refers mainly to different types of soft tissue. This is essential for understanding the technology development, its limitations, and image interpretation. Contrast agents are now used clinically. Because these agents are injected intravenously into the body, a brief study of their properties is warranted here. The reader is referred to (3,4) for medium properties that are important in NDE.

Propagative Speed c

Soft tissue behaves as a fluid and therefore cannot sustain shear or transverse waves. We need consider only longitudinal or compressional waves. Conventional pulse echo image reconstruction is based on assuming a constant compressional wave propagative speed c in the human body. Fortunately, for soft tissue, 1,540 m/s is considered a good average value for c, and the range of variations for different tissues is generally less than 5% (see Table 1). Bone and air are exceptions, and consequently they do become a problem and a source of artifacts in medical ultrasound. Dispersion, that is, variation of c with frequency, is negligible in soft tissue (less than 1% for frequencies in the 1–15 MHz range). Typical wavelengths can range from 1.5 mm at 1 MHz to 0.1 mm at 15 MHz. The finite speed also implies that all echoes from a body 15 cm thick can be received in less than 200 microseconds.

Characteristic Acoustic Impedance Z

This medium property is formally represented by Z = ρ₀c = (ρ₀K)^{1/2} for the plane wave condition, where ρ₀ stands for equilibrium medium density and K is the compressibility. Table 1 provides a listing of values that are of interest in medical ultrasound.
Table 1. Ultrasonic Properties of Biological Tissues

Material        Speed, c (m s⁻¹)    Acoustic Impedance, Z (kg m⁻² s⁻¹ × 10⁻⁶)    Attenuation (dB cm⁻¹ MHz⁻¹)
Water           1,497               1.49                                          0.001
Air             330                 0.0004                                        10 dB cm⁻¹ at 1 MHz
Bone            2,900               5.37                                          4.06
Liver, pig      1,570               1.67                                          0.11
Lung            –                   –                                             40 dB cm⁻¹ at 1 MHz
Muscle          1,585               –                                             1.65–1.75 dB cm⁻¹ at 1 MHz
Fat, pig        1,479               1.39                                          0.30
Blood           1,580               1.67                                          0.11
Aluminum        6,419               17.33                                         0.006
Glass, Pyrex    5,640               12.63                                         0.001
Reflection/Scattering

This phenomenon is responsible for generating the echo signal from any point in the medium. We will start with the plane wave, where it is assumed that the wave field is uniform in amplitude and phase across a plane that is perpendicular to the direction of propagation and this plane is of infinite extent (Fig. 6a). An interface is defined here as a planar boundary across which there is a change in the acoustic impedance from Z₁ to Z₂. When a wave meets an interface and its direction of propagation makes an angle θᵢ with respect to the normal to the interface, in general, it will be reflected at an angle θᵣ and refracted at an angle θₜ. As in optics, reflection and transmission coefficients can be defined as

$$R = \frac{\text{reflected amplitude}}{\text{incident amplitude}} = \frac{Z_2 \cos\theta_i - Z_1 \cos\theta_t}{Z_2 \cos\theta_i + Z_1 \cos\theta_t}, \quad (1)$$

$$T = \frac{\text{transmitted amplitude}}{\text{incident amplitude}} = \frac{2 Z_2 \cos\theta_i}{Z_2 \cos\theta_i + Z_1 \cos\theta_t}, \quad (2)$$

along with θᵢ = θᵣ and sin θᵢ / sin θₜ = c₁/c₂. Even though the ideal situation described before is rarely manifested in soft tissue imaging, these equations are a starting point for deeper understanding.
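For normal incidence (θᵢ = θₜ = 0), Eqs. (1) and (2) reduce to R = (Z₂ − Z₁)/(Z₂ + Z₁) and T = 2Z₂/(Z₂ + Z₁); the short sketch below evaluates them for a few interfaces using the impedances of Table 1 (a simple illustration, not code from the article).

```python
# Amplitude reflection and transmission coefficients at normal incidence, Eqs. (1) and (2).
def reflection_transmission(Z1, Z2):
    R = (Z2 - Z1) / (Z2 + Z1)
    T = 2.0 * Z2 / (Z2 + Z1)
    return R, T

# Characteristic impedances in kg m^-2 s^-1 x 10^-6, taken from Table 1.
Z = {"water": 1.49, "fat": 1.39, "liver": 1.67, "bone": 5.37, "air": 0.0004}

for a, b in [("fat", "liver"), ("liver", "bone"), ("liver", "air")]:
    R, T = reflection_transmission(Z[a], Z[b])
    print(f"{a}->{b}: R = {R:+.3f}, T = {T:.3f}")
# A soft-tissue interface reflects only a small fraction of the incident amplitude,
# whereas a tissue-air interface is almost totally reflecting.
```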
Figure 6. Specular reflection and diffuse scattering.
There are two different ways of looking at how and where the echoes are generated in tissue. An echo generating boundary can be defined as a surface across which a difference in Z exists. This thought process leads to a discrete model for the tissue where such boundaries, it is assumed, are distributed in three-dimensional space. The term scattering may be used when the incident wave energy is redirected in many directions instead of being specularly reflected or refracted. This can happen if the boundary is small compared to the wavelength or if its surface is rough. To understand scattering from tissue fully, it is essential to know what the scattering properties are for all possible boundaries of varying shapes and sizes. Alternatively, one can assume that the acoustic impedance Z varies continuously in the medium. Its spatial derivative is the source of echoes. This can lead to a continuum model where the inhomogeneous wave equation needs to be solved to determine the scattered signal (5,6).

The difficult aspect of tissue as an echo generating medium is that our knowledge of the spatial variation of impedance in soft tissue is still in its infancy. It was surmised early that for frequencies in the range of 1–10 MHz, the collagenous connective tissues constitute the dominant boundaries (7). But much weaker echoes could be generated from blood and tissue cells as well. Theoretically, the scattering problem has been solved for boundaries that have a deterministic shape and size, such as a point that is infinitesimally small compared to the wavelength, a sphere, or a cylindrical rod (8). But for the arbitrary shapes that are encountered in soft tissue imaging, the task is difficult. Quite frequently, a simplified stochastic model is assumed for theoretical understanding and for designing phantoms that mimic scattering from soft tissue. For example, the tissue could be modeled as a random distribution of point scatterers. The spacing, shape, and/or scattering coefficients, it may be assumed, are governed by some predetermined probability distribution functions (9–12). The tissue mimicking phantoms designed for imaging system quality control studies use small glass/plastic beads that are distributed randomly in the phantom, so that a B-scan image texture has an appearance similar to that seen in a clinical image. In what follows, we will proceed to interpret tissue scattering based on our current general understanding of scattering on different length scales.

Different kinds of scattering phenomena occur on different length scales d of the scattering structure relative to the wavelength λ (13). Structures within tissue that may scatter ultrasound can range from cells that are 10 microns in diameter to the diaphragm, tissue–bone interfaces, or organ boundaries whose sizes are of the order of many centimeters. For the sake of discussion, we may classify them in three categories.

When d ≫ λ and the surface is very smooth, that is, the surface "roughness" is very small compared to λ, the beam is specularly reflected, as shown in Fig. 6a, and follows the law of reflection and refraction reasonably well. There are no such absolutely perfect reflectors in the body, though some organs such as the bone surface or diaphragm may come close. It is more appropriate to interpret the reflection from such large-scale structures as diffuse, as shown in Fig. 6b. This can happen when the large-scale boundary is not smooth and surface irregularities are comparable in magnitude to λ. Diffuse scattering redirects the sound wave in many directions, including the backward direction, which we refer to here as backscatter. This fact is primarily responsible for visualization of such large-scale structures even when they are not positioned perpendicularly to the ultrasound beam. Reflection or diffuse scattering is strong, and its frequency dependence is generally weak, ≈ f⁰. Except for the bone or air interface, the change in speed across the interface is usually small, so that refraction effects (beam bending) are negligible and the existence of a straight line of sight can be assumed for B-scan imaging.

The second category is defined by scatterers that possess a characteristic dimension d ≪ λ. Blood cells
and cells in tissue are some examples. Microbubbles that are used in contrast agents injected into the body for imaging also fit into this category. The scattering strength is generally very weak; microbubbles may be an exception because of a large acoustic impedance difference between tissue and air. Scattering in this regime is termed Rayleigh scattering, and it generally has an f⁴ frequency dependence (6). The angular dependence is fairly uniform, but backscatter may be predominant in some cases.

The third category corresponds to situations where the scatterer spacing is comparable to the wavelength, d ≈ λ. This probably represents many tissue microstructures. It is also the most difficult to analyze because not much is known regarding what constitutes a scattering boundary in tissue parenchyma. It is a generally held hypothesis that the interfaces of collagenous connective tissue are dominant scatterers below 10 MHz (7). The microstructure formed by connective tissue appears in a variety of scales and shapes in the body. For example, evidence shows that the connective tissue matrix surrounding the lobular boundaries in liver tissue histology is the dominant scattering entity (11,12). The size or spacing is of the order of millimeters. Similarly, it has been argued that the collagenous septa surrounding cylindrically shaped muscle fibers and fascicles, known as the endomysium and perimysium, respectively, are the main contributors to scattering in muscle tissue (14). The diameters here can vary from 50 to 5,000 microns, but the length can be in centimeters. In general, the scattering strength in this category is moderate but significantly weaker (10–30 dB lower) compared to the first category of nearly specular reflectors. In the 1–10 MHz range, the frequency dependence is ≈ f^m, where m ≈ 1–1.3; in other words, it is mostly linear. The scattering can be highly anisotropic, as in muscle fibers. Because the system resolution is on the order of λ ≈ a few millimeters, the echoes from adjacent scattering boundaries are not resolved, and an interference pattern of the echoes ensues. Due to the statistical variability inherent in this phenomenon, a noise-like texture can be seen in B-scan images. This is generally referred to as speckle (15).

Scattering characteristics are most conveniently expressed in terms of the scattering cross section. The reader is referred to the references cited for a more detailed model-based treatment of this topic (1,5,6). A model-based spectrum analysis of the radio-frequency (Rf) echo signal can provide some information about the tissue scattering microstructure. Clinical applications still remain a potentially useful research endeavor (16). Because of the nondeterministic nature of the tissue scattering microstructure, several researchers have tried to model it as a discrete random process and attempted to extract tissue-specific information from the statistical analysis of the echo signal (9–12,17–19). Though a great potential for characterizing tissue structure lies here, the understanding still remains in its infancy.

Attenuation

As a wave propagates in a medium, its amplitude or intensity can decrease with distance due to two types of losses: absorption of energy in the medium and loss of energy due
to scattering. The absorption mechanisms in soft tissue are quite complex, and they show a frequency dependence (1). The loss due to both mechanisms collectively is termed attenuation. The wave intensity I₀ decreases exponentially to a value I_z as it propagates a distance z: I_z = I₀ e^{−μz}. An attenuation coefficient μ for soft tissue can be defined from measurement as

$$\mu\,(\mathrm{dB\ cm^{-1}}) = -\frac{1}{z}\, 10 \log_{10}\!\left(\frac{I_z}{I_0}\right). \quad (3)$$

For many soft tissues, μ ≈ αf, that is, the frequency dependence is nearly linear. Table 1 lists the values of α for several different tissues. A good average value of α for soft tissue turns out to be around 0.5 dB/cm/MHz. Note that the value for lung and skull bone is much higher and for water it is much lower. Attenuation poses a challenge for ultrasound imaging system designers. The total cumulative attenuation suffered by a wave that has to travel to a depth of 15 cm and back in tissue can be as high as 75 dB for a 5-MHz signal. The demand on the dynamic range to display all of this information is too high. The signal processing to meet this demand will be examined later.

Attenuation and its frequency dependence are tissue parameters. Methods have been explored to extract this information from the echo signal for tissue characterization (16,18). The methods require the raw A-line radio frequency (Rf) data before it has undergone any envelope detection processing. The technical difficulties for in vivo data stem from (1) the need to correct for transducer diffraction effects and (2) the need for ensemble averaging across a finite volume of tissue to reduce the speckle interference artifact.

Nonlinearity Parameter B/A

Derivation of the basic differential equations that embody harmonic wave distortion for plane wave propagation in one dimension can be found in many texts (20). The following wave equation describes the nonlinear response of a medium to the wave amplitude or, equivalently, the particle displacement ξ(x, t) in space and time:

$$\frac{\partial^2 \xi}{\partial t^2} = \frac{c_0^2}{\left(1 + \dfrac{\partial \xi}{\partial x}\right)^{(B/A)+2}}\, \frac{\partial^2 \xi}{\partial x^2} = c_0^2\, K(\xi, B/A)\, \frac{\partial^2 \xi}{\partial x^2}. \quad (4)$$
Note that co is the wave velocity when the nonlinear response is absent or can be ignored. Clearly, when the amplitude ξ is not infinitesimally small, the propagative speed begins to depend on the wave amplitude through its gradient ∂ξ/∂x. The ratio B/A is the nonlinearity parameter of the medium. For nonlinear propagation, K(ξ, B/A) deviates from its limiting value of one. The implication is that for a single frequency wave of finite amplitude, different parts of the wave propagate at a speed slightly different from co . The higher amplitude portion of the wave front will travel at a higher propagative speed than those portions of the wave front of lower amplitude. Consequently the single-frequency waveform is distorted. Through Fourier analysis, it can be shown that a distorted albeit periodic wave possesses harmonic components.
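The appearance of harmonics can be illustrated numerically. The short Python sketch below is not taken from the cited references; it uses the standard simple-wave result that each point of the waveform travels at approximately c0 + (1 + B/2A)u, warps a single-frequency waveform accordingly (all parameter values are assumed for illustration and are kept below the shock distance), and then applies a Fourier transform to expose the second and third harmonics.

import numpy as np

# Illustrative sketch (assumed parameters): growth of harmonics during nonlinear propagation.
c0   = 1540.0            # small-signal sound speed, m/s
BA   = 8.0               # assumed B/A for soft tissue
beta = 1.0 + BA / 2.0    # coefficient of nonlinearity
f0   = 3.0e6             # source frequency, Hz
u0   = 0.1               # assumed particle velocity amplitude, m/s
z    = 0.05              # propagation distance, m (below the shock distance l)

t = np.linspace(0.0, 4.0 / f0, 4096, endpoint=False)
u_src = u0 * np.sin(2.0 * np.pi * f0 * t)

# A point of the waveform emitted at time t arrives at z earlier by about z*beta*u/c0**2,
# so the received waveform is the source waveform sampled on a warped time axis.
t_arrival = t - z * beta * u_src / c0**2
u_received = np.interp(t, t_arrival, u_src)

spectrum = 2.0 * np.abs(np.fft.rfft(u_received)) / t.size
freqs = np.fft.rfftfreq(t.size, t[1] - t[0])
for n in (1, 2, 3):
    k = np.argmin(np.abs(freqs - n * f0))
    print(f"harmonic {n}: amplitude relative to u0 = {spectrum[k] / u0:.3f}")

In this toy example the second-harmonic level grows with distance and with the source amplitude, which is the behavior that harmonic imaging, discussed later, exploits.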
The Fubini solution (21) for a harmonic source signal that has a first-order approximation provides a harmonic description of the distorted wave as

u/u0 = 2 Σ_{n=1}^{∞} [Jn(nz/l) / (nz/l)] sin[n(ωt − kz)],   (5)

where l = c0² / [(1 + B/2A) ω u0],
u0 is the initial wave amplitude, B/A is the nonlinearity parameter of the medium, c0 is the small amplitude velocity, k = ω/c0 is the wave number, and l is the propagative distance at which a shock wave is formed, that is, when the distorted wave assumes a sawtooth form. The solution is valid for z < l. The index n stands for the harmonics. The amplitudes of the harmonics depend on several system and medium-dependent parameters. The B/A ratio for water is 5, for blood it is 6, for fat it is 11, and for most soft tissue it is around 8 (6). The recent development of a technique widely known as harmonic imaging has its basis in this phenomenon. Although the B/A ratio may be a tissue characterization parameter, this has not been investigated extensively. As far as B-scan imaging is concerned, the properties of the biological media considered before pose several major implications and challenges for system designers. First, the echo signal strength can vary by 20–40 dB due to the different classes of scatterers in the tissue. Second, a speckle-like texture in the image is an accepted fact and is often considered desirable because it carries qualitatively useful information. Third, the predominantly diffuse nature of scattering in tissue ensures that, in general, there will be some backscattered signal from almost all types of boundaries. This has made the simple B-scan pulse-echo mode of imaging with a single transmit-receive device a useful technique in medical ultrasound. Fourth, refraction at many soft tissue boundaries is negligible. Therefore, no major correction is needed in most cases for proper registration of echoes in the B-scan image. In some situations, however, refractive artifacts can be observed (2). Fifth, due to the generally weak backscattering scenario, multiple reflection/scattering can be considered negligibly small. Again, in some special cases artifacts can be observed when this is not the case (2). Sixth, it can safely be assumed that a constant speed of propagation converts arrival times into depth. Again, only in some special cases can artifacts due to misregistration occur. Seventh, the attenuation in tissue increases linearly with frequency. A typical value of 0.5 dB cm⁻¹ MHz⁻¹ has many implications. For body cross-sectional imaging, assuming a required depth range of 20 cm and a transducer frequency of 4 MHz, the echoes from the deepest reflector can be 80 dB below that from the nearest one. Display of information over such a large dynamic range is a challenge. At still higher frequencies, the signal-to-noise ratio (SNR) can become very poor at large depths. Therefore, the attenuation in tissue sets the
upper limit on the usable frequency and the maximum depth that can be imaged with reasonably good SNR. Typically, imaging at frequencies higher than 5–7 MHz is possible only for superficial tasks that do not require large depth penetration. Eighth, harmonic signal generation due to nonlinear propagation in tissue is predominant at the pressure amplitudes that are commonly employed in medical ultrasound. Its clinical usefulness has been identified only recently. The details are considered in a later section under the heading of harmonic imaging. Doppler Effect The Doppler effect is a phenomenon in which an apparent change in the frequency (and wavelength) of sound is observed if there is relative motion between the source of sound and the receiver or scatterer. If the source is moving toward the receiver, the receiver is moving toward the source, or the scatterer is moving toward the source and the receiver (as in pulse-echo imaging), the received wave has a higher frequency than would be experienced without the motion. For the pulse echo, the shift in the frequency fD, commonly referred to as the Doppler shift, is given by

fD = 2vf cos φ / c,   (6)

where v is the velocity of the scatterer or the interface, f the frequency of the transducer, c the speed of sound in the medium, and φ is the angle between the direction of v and the sound propagative direction (see Fig. 24). If the motion is away (receding), the received wave has a lower frequency, and fD is negative. For medical applications, it was realized early that blood cells represent a source of moving scatterers and therefore can produce a measurable Doppler shift (1,2,13). For physiological blood velocities ranging from 20 to 200 cm/s and f from 2 to 7.5 MHz, the Doppler shift typically falls in the audible range (200 Hz to 10 kHz). Doppler ultrasound imaging technology is based on this phenomenon and continues to play an important role in medical ultrasound. The technology development to this date is summarized in a later section under the heading of Doppler ultrasound. Contrast Agents Ultrasound contrast agents (UCA) are now a clinical reality. They generally consist of gas-filled, micron-size bubbles that can be injected into the blood stream. The interaction between ultrasound and the contrast agent is still a subject of vigorous research. Presented here are some general statements about the current understanding of bubble response when the bubble diameter is very small compared to the ultrasound wavelength. The response of a gas bubble to an ultrasound wave depends on the acoustic pressure amplitude and can be divided into three regimes. At small amplitudes, the relative compression and expansion of a bubble is linearly related to the instantaneous pressure. This is the linear regime that has moderate scattering at the insonifying frequency. At higher amplitudes, we enter a regime of nonlinear response. Bubble compression generally retards
relative to expansion. Consequently, the bubble size is not linearly related to the applied acoustic pressure, and the bubble vibration contains second and higher multiples of the insonifying frequency (22). Thus, the backscatter signal contains the fundamental (transmitted) frequency and also harmonic frequencies. If the amplitude is increased further, the scattering levels of most contrast agents increase abruptly for a short time. It is now believed that this is associated with bubble rupture and release of free gas bubbles (23). This irreversible effect is often termed transient cavitation, and the scattering in this regime becomes highly nonlinear. Typical values for the range of pressure p for the three regimes are p less than 50 kPa for linear scattering, p between 50 kPa and 200 kPa for the nonlinear harmonic regime, and p between 0.2 MPa and 2 MPa for transient scattering. Recent research efforts have explored all three regimes for possible imaging applications. A summary of these efforts is given in a later section under the heading of contrast imaging. INSTRUMENTATION AND PROCESSING The basic engineering principles of B-scan imaging are described here, starting with signal generation/reception, followed by the major elements of the signal processing in an ultrasound scanner. This is followed by a description of a number of technological improvements that have taken place in recent years. Generation and Detection Transducers The central function of the transducer is to convert an electrical signal to pressure waves that couple to the medium and vice versa. However, from the standpoint of image quality, the transducer is the single most important component. Its design parameters influence the resolution and performance of an imaging system. Figure 7 illustrates the main features of a single-element transducer. The active element, which is a piezoelectric material, is a thin plate that is responsible for the conversion. Its operation is based on the well-known piezoelectric effect. If a voltage of some polarity is applied across the two faces of the plate via gold-plated electrodes, the plate thickness increases in response, due to reorientation of the electric dipoles in the element.
Figure 7. Components of a single-element transducer: PZT crystal with electrodes, backing, wear plate, case, connector, and electrical matching network.
If the polarity is reversed, the plate thickness can be made to decrease from its equilibrium thickness. Thus, the plate can be made to vibrate at some frequency by applying a sinusoidal voltage. Conversely, the application of a stress due to the incoming echo wave produces a sinusoidal voltage across the plate surface that can be picked up by sensitive electronics. The most commonly used material is lead zirconate titanate (PZT), which is commercially available. For details about other usable materials and their properties, such as quartz, polyvinylidene difluoride (PVDF), and other newly developed PZT-polymer composites, the reader is referred to references cited in (1). The thickness of the PZT crystal determines its resonant frequency f0, the first among many others. By considering the two surfaces of the piezoelectric crystal as two independent vibrators, it is possible to see that the resonant frequencies are those where the two waves add in phase. The thickness selected is one-half of the sound wavelength in PZT, considering that another 180° phase change takes place at the front surface. For example, for PZT, the thickness (in mm) = 2/f0 (in MHz). The frequency response T(f) of the transducer drops off above and below its resonant frequency, as shown in Fig. 8. Given the T(f) of a transducer, its inverse Fourier transform, F⁻¹[T(f)] = p(t), defines a time domain pulse p(t). Considering the transducer as a linear system, it is possible to interpret p(t) as the system impulse response. In other words, p(t) is the short pulse emitted by the transducer when the input electrical drive signal is a very short rectangular voltage pulse (mathematically called an impulse or a delta function) (24). Because a pulse of short duration is desirable for pulse-echo imaging, the transducers are normally excited by such impulse signals. The transducer quality factor, defined as Q = f0/∆f, is an important parameter, where ∆f stands for the −6-dB bandwidth of T(f). As we will see later, the shorter the duration of a pulse, the better the resolution along the transducer axis. The pulse duration and the bandwidth ∆f are inversely related (24). For superior resolution, the bandwidth has to be increased. This is achieved by a combination of two approaches: attaching a highly attenuating backing material to the back of the piezoelectric element to damp the vibrations and by electrically tuning the network, as shown in Fig. 7. This, of course, comes at a cost, a
Figure 8. Typical frequency response T(f) of a transducer, showing the resonant frequency f0 and the −6-dB bandwidth ∆f.
decrease in energy transfer efficiency. Typical Q values for medical transducers are around 2–4. PZT has an acoustic impedance that is 14 times that of water or soft tissue. Therefore, a matching layer is added to the front surface that also serves as a protective wear plate. As we shall see later, the shape and size of the element (circular, rectangular, etc.) combined with the dominant wavelength determine the lateral resolution that is limited by the diffractive effect. Focusing can be achieved by introducing curvature into the element (1,2,13). Processing the Echo Signal Figure 9 illustrates the essential components of a B-scan pulse-echo imaging system. The transducer is driven by a very short duration, high voltage (100–400 volts) signal. To ensure synchronization with the lateral scanning motion, the timing is controlled by a trigger from a clock. The pulsed excitation has to be repeated for every A-line data acquisition. This pulse repetition frequency (PRF) must be chosen so that sufficient time is available to receive echoes from the farthest reflector of interest. For example, if we wish to image up to a depth of 15 cm, then the PRF⁻¹ must be larger than 195 µs. Typical PRFs are of the order of 1 kHz. The A-line radio-frequency (Rf) echo signals are generally weak, so they require low-noise, low-voltage Rf preamplification. A transmit-receive circuit is used to protect the preamplifier from high voltage transmit signals (25). As illustrated in Fig. 9, a typical signal at this point has a dynamic range (ratio of the strongest to the weakest echoes) of around 70–80 dB. Attenuation accounts for a 30–50 dB loss. These losses can be seen on an oscilloscope as an exponentially decreasing echo signal amplitude with time. All scanners have some sort of time-gain correction (TGC) circuit that attempts to correct approximately for this attenuative loss by applying time (depth)-dependent amplification (25). Even after this correction, the remaining dynamic range is 40–50 dB. This is now mostly due to the differences in backscatter within the tissue microstructure. This is too large for a display monitor to handle and for the human visual system to comprehend. Therefore, the signal further undergoes compression amplification. This is usually done by a nonlinear amplifier that amplifies weak echo signals more than strong ones. The dynamic range can be brought to within 20–30 dB. At this point, the entire Rf echo signal time series is considered a slowly varying amplitude modulated carrier wave at frequency f0 (the transducer center frequency). Only the slowly varying amplitude information is displayed in a B-scan image. Therefore, the signal is demodulated by using any of a number of techniques borrowed from communication engineering theory (26). For example, full or half-wave rectification followed by low-pass smoothing (filtering) is often performed by dedicated analog circuits. This envelope signal constitutes the A-line video signal that is digitized and stored for display. Once scanning is completed by, say, the lateral movement of the transducer in the x direction (see Fig. 3) and all of the A-line signals have been acquired, a static B-scan image I(x, z) is produced by simply displaying the adjacent A-line values side by side in gray scale. Time encodes the depth, z = c0 t/2. Analog and digital scan
Figure 9. Block diagram depicting the essential echo signal processing in an ultrasound scanner (clock pulse generator, transmitter, Rf preamplifier, time-gain control, compression amplifier, demodulation, preprocessing, and display). The modified echo signal is schematically shown to the right at various points in processing. In the schematic, time increases to the right. (From Bamber et al. (13)).
converters have been used in the past as a mapping and registration interface to the display unit (25). Further nonlinear gray value transformation, generally referred to as preprocessing, may be used under operator control before the display stage. The reader should bear in mind that many variations in the signal processing chain are possible and are commonly adopted in scanner design.
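As a rough numerical sketch of the processing chain just described, the following Python fragment generates a toy Rf A-line, attenuates it with the 0.5 dB cm⁻¹ MHz⁻¹ soft tissue value quoted earlier, and then applies time-gain correction, envelope detection, and logarithmic compression. All waveform details and gain laws here are assumptions chosen for illustration, not the design of any particular scanner.

import numpy as np
from scipy.signal import hilbert

# Toy A-line processing sketch (assumed parameters, idealized TGC).
c0, f0, alpha = 1540.0, 5.0e6, 0.5        # sound speed m/s, center frequency Hz, dB/cm/MHz
fs = 50e6                                  # sampling rate, Hz
t = np.arange(0.0, 2 * 0.15 / c0, 1 / fs)  # arrival times out to a 15-cm depth
path_cm = 100.0 * c0 * t                   # round-trip path length in cm (depth z = c0*t/2)
atten_db = alpha * (f0 / 1e6) * path_cm    # cumulative two-way attenuation in dB

rng = np.random.default_rng(0)
scatterers = rng.normal(size=t.size)                        # random backscatter strength
rf = scatterers * np.sin(2 * np.pi * f0 * t)                # Rf carrier at f0
rf *= 10.0 ** (-atten_db / 20.0)                            # frequency-dependent attenuation

rf_tgc = rf * 10.0 ** (atten_db / 20.0)                     # idealized time-gain correction ramp
envelope = np.abs(hilbert(rf_tgc))                          # demodulation (envelope detection)
video = 20.0 * np.log10(envelope / envelope.max() + 1e-6)   # compression for display

print(f"two-way attenuation at 15 cm and 5 MHz: {atten_db[-1]:.0f} dB")   # ~75 dB, as quoted earlier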
IMAGE QUALITY METRICS Speckle A typical ultrasound image from a tissue-mimicking phantom is shown in Fig. 10. The mottled or granular gray-scale structure seen in the image is called speckle. Speckle results from the random scattering microstructure of the tissue and the coherent nature of the transducer detection process (15). For a basic understanding of the
process, consider the simple example in Fig. 11. Echoes from four closely spaced scatterers that are positioned randomly in the ultrasound beam can superimpose and produce an A-line envelope signal that shows seemingly random variations in amplitude with echo arrival time. Upon lateral scanning, these scatterers are imaged as randomly fluctuating gray values in the B-scan image. Note that because of the underlying interference phenomenon, there is no one-to-one correspondence between a high brightness value seen in the B scan and any particular scatterer location. In other words, the B-scan image does not turn out to be simply four distinct bright spots, one corresponding to each scatterer. This leads to an implication that for a speckle pattern to exist, the interscatterer spacing must be much smaller than some resolution dimensions of the B-scan imaging system. These dimensions are defined by the imaging system’s point-spread function (PSF)
which we will consider shortly. It is generally believed, therefore, that speckle results from the random scattering microstructure when the interscatterer spacing is much smaller than the resolution dimensions of the imaging system. From our previous discussion on scattering, we understand that it is difficult to know this microstructure deterministically. Even more difficult, if not impossible, is the inverse problem: to determine the microstructure uniquely from the B-scan image. Nevertheless, given that speckle does carry some information about the tissue microstructure, several researchers have attempted to extract statistical information from the speckle for tissue characterization (9–12,17–19). These methods are not deterministic, and therefore we have to assume some statistical model for the scattering microstructure. To a radiologist who is used to looking at images from other medical imaging modalities, speckle appears as fine, randomly fluctuating gray values superimposed on an otherwise uniform background. Bear in mind that speckle is often a very low-level signal that is amplified nonlinearly in the processing stage. Were it not for the speckle, it would be difficult to visualize low-contrast lesions in tissue. In Fig. 10, some circularly shaped low-contrast ‘‘lesions’’ can be visualized in the presence of speckle. Here, we will consider speckle from the standpoint of its role in low-contrast detection. It is easier to recognize the quality (subjective evaluation) of an image than to define it. But subjective evaluation is not always convenient and is task-dependent. On the other hand, it is easy to define some objective image quality metrics that depend mainly on the design parameters of the imaging system (27,28). The relationship between objective and subjective evaluation is not well understood at present. We will consider only the objective metrics here. Some have been defined and have been in use for many years. Some others, such as contrast resolution and uniformity, have come into existence more recently as the use of computer technology continues to bring subtle improvements in image quality.

Figure 10. B-scan image of a tissue-mimicking phantom, showing the speckle texture and low-contrast circular ‘‘lesions’’ (scanning direction x, depth z).

Figure 11. Coherent summation as a basis for speckle formation.
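The coherent summation sketched in Fig. 11 is easy to mimic in a few lines of Python. The toy model below (assumed pulse shape, scatterer density, and sampling values; it is only meant to illustrate the mechanism) sums the Rf echoes of many randomly placed, unresolved scatterers; the envelope of the sum fluctuates in a noise-like way and, for fully developed speckle, is approximately Rayleigh distributed.

import numpy as np
from scipy.signal import hilbert

# Toy speckle model: many unresolved scatterers summed coherently (assumed parameters).
f0, fs = 3.5e6, 50e6
t = np.arange(0.0, 40e-6, 1 / fs)              # one A-line of echo arrival times
sigma = 0.4e-6                                  # Gaussian pulse width (sets axial resolution)

rng = np.random.default_rng(1)
n_scat = 2000
t_scat = rng.uniform(5e-6, 35e-6, n_scat)       # random scatterer arrival times
amps = rng.normal(0.0, 1.0, n_scat)             # random scattering strengths

rf = np.zeros_like(t)
for ts, a in zip(t_scat, amps):                 # coherent sum of overlapping echoes
    rf += a * np.exp(-0.5 * ((t - ts) / sigma) ** 2) * np.cos(2 * np.pi * f0 * (t - ts))

envelope = np.abs(hilbert(rf))
mid = envelope[(t > 8e-6) & (t < 32e-6)]        # avoid the unpopulated ends of the line
print(f"envelope mean/std = {mid.mean() / mid.std():.2f}  (Rayleigh statistics give about 1.91)")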
Point-Spread Function: Lateral and Axial Resolution The concept of a two-dimensional point-spread function, PSF(x, y), in the theory of imaging systems analysis is well established (24,29). Extending it to three-dimensional ultrasonic imaging system analysis has remained a subtle distinction (4,29). Consider a two-dimensional imaging system as a black box with access to its input (the object) and the output (the final B-scanned image of the object). Figure 12 illustrates the input–output relationship. If the input is a point object [mathematically, a two-dimensional delta function δ(x, y)], then by definition, the output image is called the system point-spread function, PSF(x, y). Similarly, if a line of delta functions (a thin line object) aligned along the x axis is the input, the resulting image is called a one-dimensional line-spread function, LSF(y). Mathematically, LSF(y) = ∫ PSF(x, y) dx. In general, it is an angular projection of the PSF(x, y) in a direction that is given by the line object (24).

Figure 12. Two-dimensional point-spread function and its relationship to the line-spread function.

This concept can be extended to a B-scan imaging system in a lossless medium. Figure 13 illustrates a rectangular-element transducer that scans a point target located at some distance z0. A hypothetical beam profile h(x, y), chosen as a two-dimensional rectangular function for the sake of illustration, is also depicted. The exact form of h(x, y) is determined by solving the diffraction problem (more on this later). Scanning can be performed by moving the transducer in the x, y plane. The resulting image, as shown, can be defined as the true three-dimensional PSF(x, y, z). We assume for simplicity that the PSF is separable into an x, y component that depends on the beam profile and a z component that depends on the envelope of the received temporal pulse |p(t)| or, equivalently, |p(z)|:

PSF(x, y, z) = [h²(x, y)] |p(z)|.   (7)

The beam profile term appears as a square because transmission and reception are by the same transducer and the reciprocity principle holds for a beam profile in reception. A slice of the PSF along the z axis is shown for both the Rf and the envelope-detected signal.

Figure 13. Components of a three-dimensional point-spread function.

The three-dimensional PSF is not measured on clinical scanners. Most quality control phantoms consist of wire targets located at different depths and held perpendicularly to the beam axis. Scanning is generally performed in a direction that is mutually perpendicular to both the beam axis and the wire (2). By generalizing the
two-dimensional projection concept to three dimensions, it is possible to show that the scanned image of a wire target is at best an estimate of one particular projection of the three-dimensional PSF(x, y, z). For the geometry and the example described, if the wire is along the y axis and scanning is along the x axis, the two-dimensional image I(x, z) will be approximately represented as
I(x, z) = |p(z)| ∫ h²(x, y) dy = |p(z)| L(x),   (8)
where L(x) is the projection of h²(x, y) along the x axis (24). Similarly, if the wire targets are along the x axis and scanning is along the y axis, we will get a projection of h²(x, y) along the y axis. Either way, it is not the cross section but the projection of the beam pattern that is measured. Note that for a linear array scanner that uses an array of rectangular elements lined up, say, along the x axis, y is referred to as the elevation direction, and the beam width in that direction represents the thickness of the slice of the selected plane. I(x, z) or I(x, t), the image of a wire target located at z = z0 that shows up as a smudged blob in the brightness plot (B-scan image in gray scale), is generally accepted as the two-dimensional PSF(x, z) defined for the depth z0. Figure 14 shows a B-scan image of a phantom that contains several wire targets at different depths. The scanning was perpendicular to the wire target along what we may consider as the x axis. Disregard the faint speckle pattern. Note the finite shape and size of the individual bright blobs. They represent the B-scan images of the wire targets in the phantom. Each may be considered to represent the system PSF = I(x, z) for the appropriate depth z0. Two spatial resolution parameters can be extracted from I(x, z). The −6-dB width (one-half amplitude) of L(x) represents the lateral resolution. Similarly, the −6-dB width of the time pulse envelope |p(t)| or |p(z)| represents the axial resolution. Two wire targets
Figure 14. B-scan image of a wire target phantom (scanning direction x, depth z). The two-dimensional PSF I(x, z) varies with depth.
separated by a distance smaller than the lateral resolution distance in the scanning direction will not appear as two separately distinguishable blobs. Similarly, two wire targets separated by a distance smaller than the axial resolution in the beam propagative direction will not be imaged as two separate distinguishable blobs. Typically, the axial resolution distance is much smaller (0.3 to 1 mm) than the lateral resolution (1 to 5 mm). In noncircularly symmetrical geometry, a third parameter that corresponds to the thickness of the slice may also be defined and measured. In summary, we observe that the PSF does not have a symmetrical shape because its shape and size in the two orthogonal directions, the axial and lateral, depend on different factors. The axial resolution depends on the time width of the short pulse. Recalling from our earlier discussion, this time width is inversely proportional to the transducer bandwidth. Due to the frequency-dependent attenuation as the disturbance propagates in the tissue, the bandwidth of the pulse may change from that excited at the transducer. Therefore, axial resolution may change with depth. The lateral resolution depends on the transducer beam width at any given depth. Consequently, it depends on the transducer element shape, dimensions, and the diffractive phenomenon that unfolds as the beam propagates in the medium. Thus we observe that, in general, the PSF is spatially variant. In other words, it is different for different locations of the wire target. Typical values for the resolution can range from a few millimeters at low megahertz frequencies to a fraction of a millimeter at 10 MHz and above. Contrast Resolution This is a metric that attempts to evaluate the spatial extent of the side lobes in the PSF, which may be 20 to 60 dB below the central peak value. Therefore, it requires examination of the PSF or the lateral and axial profiles on a logarithmic scale plot (27,30). We will examine the structure and magnitude of side lobes in a later section (see Fig. 17, for example). The reason for such emphasis on side-lobe structure becomes clear if we recall two major points. First, we know that speckle comes from low-level echoes from all scatterers located within the three-dimensional region spanned by the PSF, including its side lobes. These echoes are preferentially amplified in a nonlinear amplifier during B-scan signal processing. Second, the B-scan imaging scheme maps all of the echoes that it receives along its line of sight. Therefore, for example, if the side lobes in the transducer beam profile are large, then a strong scatterer that is sufficiently off-axis is registered and imaged at an incorrect lateral location. For example, take the barely discernible circular lesion in the image in Fig. 10. The internal speckle echoes inside the disk image are hypoechoic, that is, they are slightly weaker than the surrounding background. If several very strong scatterers were located near this ‘‘lesion’’ and the PSF had 20-dB side lobes, then it is entirely possible that the signal from the side lobes could reduce the contrast within the disk, because we know from the discussion in the scattering section that there is usually a dynamic range of 20–30 dB between the
weakest and strongest echoes. Therefore, poor side-lobe structure can decrease the low-contrast detectability of the lesion. On the other hand, if the side lobes are kept below 50–60 dB, a marked improvement in lesion detectability is possible (28). This problem was acutely realized when it first became feasible to use an array of transducer elements to synthesize the beam. The side lobes became a major problem in the initial design stage. Although not a universally accepted definition, but rather one adopted by some manufacturers, the lateral distance at which the side lobes first fall 50 dB below the central peak may be used to define contrast resolution. Consideration of this image quality metric has become more and more widespread in recent years. The recent implementation of harmonic imaging technology in scanners (to be considered later) derives its benefits partly from superior performance in contrast resolution. Spatial Uniformity The B-scan image in Fig. 14 is from a scanner that has a linear phased array transducer. Note that the PSF shape and size vary with depth. They become larger as depth increases; the lateral resolution especially is worse with depth. As we shall see later, fixed focusing of the transducer can improve the lateral resolution in the focal zone, but at further distances it becomes worse due to beam divergence. Therefore, nonuniformity implies that optimal resolution at all depths in such systems is not possible and that there is a spatially variant PSF. This is referred to as spatial nonuniformity. It is a direct consequence of the beam shape. When linear array technology is combined with computerized beam forming, it is now possible to focus the beam dynamically at various depths. The PSF can be made more uniform spatially, thus improving on this image quality metric. Temporal Resolution Temporal resolution in two-dimensional B-scan imaging is defined and limited by the image frame rate. If the imaging objective is to visualize a rapidly moving part of the body, such as a beating heart, temporal resolution becomes important. In such cases, a loss of temporal resolution could contribute to a loss of spatial resolution due to the blurring effects of motion. As we discuss later, a frame rate of 15–30/s is possible in medical ultrasound by using real-time scanners. To summarize, the objective image quality metrics that depend on engineering system design have been defined here. The transmitted pulse length or, equivalently, the pulse bandwidth determines the axial resolution. Lateral resolution, slice thickness, contrast resolution, and spatial uniformity are functions mainly of transducer size, shape, center frequency, and the role of the associated diffractive phenomenon as the beam propagates. The basic theory that governs the last aspect will be considered in the next section. The reader should bear in mind that, ultimately, a task-dependent evaluation, for example, the detectability of a lesion of some size and contrast, may depend on many other factors. These factors may include tissue properties, the adjustable settings on the scanner used by the
sonographer, and last but not the least, the human visual system (28). Nevertheless if these factors are held fixed, the objective metrics do provide a meaningful input to the subjective evaluation process. Most importantly, these metrics serve a purpose in system intercomparison and in evaluating the efficacy of technological advancements. DIFFRACTION AND ITS ROLE IN ULTRASOUND IMAGING The transducer field pattern plays an important role in determining lateral resolution and the PSF. The variation of the beam pattern with depth is governed by diffraction. Therefore, B-scan imaging is considered a diffraction-limited imaging modality, similar to a lensbased optical imaging system (24). For several reasons, we consider in detail here the diffraction from a circular disk transducer in a lossless medium. For an exact description, the diffractive integral [to be defined in Eq. (9)] must be solved, either numerically or analytically in closed form. Analytical closed form solutions, with or without approximations, do exist for circular geometry for both monochromatic and pulsed excitation. Although circular transducer geometry may not often be applicable to many imaging systems, its consideration here helps us to understand the general features of radiative field patterns (1,13,31). For example, one of the main features is the somewhat arbitrary division of the radiative field pattern into two regions, the near field and the far field, when the excitation is monochromatic. An analytical solution exists for on-axis field variations and for the field distribution in the far field (a large distance from the transducer), but the near field remains amenable only to numerical solution. The Lommel function-based frequency domain approach presented here is an analytical closed form solution under a much stronger approximation, namely, the Fresnel approximation (32). It contains in itself all of the general features of the field pattern, but in addition also provides an estimate of the field pattern in the near-field region. The main reason for including it here is to demonstrate its use in describing the general features of the field pattern. Its potential usefulness does not seem to have been explored by the ultrasound community. The study of diffraction from transducers that have noncircular geometries, such as rectangular elements and linear phased arrays, is also briefly outlined at the end of the section. Circular Piston Transducers Diffraction is a natural phenomenon associated with any wave propagation, including ultrasound. It must be given due consideration if we want to know the distribution of acoustic field parameters (pressure, intensity, etc.) in front of the transducer as a function of both space (three dimensions) and time. The field of a circular disk transducer is circularly symmetrical in any plane perpendicular to the beam axis z (see Fig. 15). Therefore, the field can
be specified as a function of z and the off-axis distance ρ = √(x² + y²). Two cases are of practical interest: (1) determination of the impulse response, h1(ρ, z, t), when the time excitation of the piston or the PZT
element is a delta function δ(t); (2) determination of the transfer function, H1(ρ, z, ω), when the excitation is monochromatic, e^(jωt). The exact solution in both cases is obtained by solving the appropriate diffractive integral:

h1(ρ, z, t) = (1/2π) ∫∫_A u(x, y) [δ(t − r/c)/r] dx dy,
H1(ρ, z, ω) = (1/2π) ∫∫_A u(x, y) [e^(−jkr)/r] dx dy,   (9)

where k = 2π/λ = ω/c, r = |r|, and λ is the wavelength. u(x, y) defines the spatial excitation profile on the transducer face. A rigid baffle boundary condition has been assumed (20,31). In physical terms, the integral is a statement of Huygens's principle, where the transducer is considered a distribution of elemental point sources. The field at any point in front of the transducer is the sum of elementary contributions from all elemental point sources of appropriate strength and phase. h1 and H1 are generally referred to as the velocity potential impulse response and velocity potential transfer function, respectively. In both cases, the integration must be performed over the transducer face A of diameter 2a. If spatially uniform excitation is assumed, u(x, y) = u0. An exact solution for the impulse response h1(ρ, z, t) in closed form has been derived and has been used by many researchers (33). Note that this impulse response is different from the transducer impulse response p(t) defined earlier for the electrical voltage drive signal input to the PZT element. From the standpoint of linear system theory, these two can be considered two separate linear subsystems in cascade (24). Their corresponding impulse responses can be convolved to predict the total system output when the PZT element excitation is an electrical delta function. For example, the shape of the short pulse disturbance received at some point (z, ρ) can be predicted by convolving p(t) with h1(ρ, z, t) (30).

Figure 15. Geometry for diffraction from a circular transducer and the definition as a linear filter.

The basic characteristics of diffraction are generally examined and studied for the monochromatic case, primarily at the center frequency of the transducer. Although the impulse response solution is exact and can be used to determine the field pattern for monochromatic excitation, computationally it is not convenient for this purpose. To the best of our knowledge, an exact analytical solution of the diffractive integral for monochromatic excitation at radial frequency ω = 2πf does not exist. Several authors have attempted an approximate solution in terms of Bessel functions (32,34). We have recently investigated the nature of this approximate solution, referred to here as H1a ≈ H1. For fixed ρ and z, the effect of diffraction on the input excitation signal can be considered a linear low-pass filter (see Fig. 15). As such, its impulse response and transfer function defined in Eq. (9) must be exact Fourier transform pairs, where t and ω are conjugate variables (24,29). Therefore, H1a(ρ, z, ω) and h1(ρ, z, t) should constitute an approximate Fourier transform pair. This equivalence has been theoretically justified and computationally demonstrated (35). Only a short summary of results is given here; the reader is advised to consult the relevant publications for more details (35,36). In (35), it was shown that if one makes a Fresnel approximation (z² ≫ a²) of the phase term e^(−jkr) in the diffractive integral of Eq. (9), the analytical solution can be expressed in closed form as

H1a(ρ, z, ω) = (1/k) exp[−j(kz + v²/2u + u/2)] [U1(u, v) + jU2(u, v)],   (10)
where u = ka²/z, v = kaρ/z, and a stands for the transducer radius. The Un, known as Lommel functions, can be calculated as

Un(u, v) = Σ_{s=0}^{∞} (−1)^s (u/v)^(n+2s) J_{n+2s}(v),   (11)
where Jn stands for an order n Bessel function of the first kind. It is a convergent series, but for cautionary computational issues, the reader is referred to the published research monograph (36). On-Axis Behavior: Near-Field and Far-Field The diffractive integral defined by Eq. (9) can be solved exactly for on-axis conditions (ρ = 0). Equation (10) can also be used, but the solution will correspond to the Fresnel approximation (z2 a2 ) (4,31). The expression for the on-axis beam intensity is
|H1a(ρ = 0, z, ω = ω0)|² = (const) sin²[(k/2)(√(z² + a²) − z)] ≈ (const) sin²(ka²/4z).   (12)

The first term is exact; the second term, which has the approximation sign in front of it, is the Fresnel approximation. This equation is often used to define the near- and far-field regions of a flat disk transducer. An example is given in Fig. 16, where the thin solid line represents the on-axis normalized intensity variation calculated using the Fresnel approximation. The diameter 2a = 19 mm, the frequency f0 = ω0/2π = 2.46 MHz, and the speed c = 1,540 m/s. The intensity value oscillates through a number of axial maxima and minima as we move away from the transducer. The last maximum occurs at an axial distance a²/λ0 = 14.5 cm in this case. The region in front of the transducer up to a distance equal to a²/λ0 is called the near field (z < a²/λ0). The region beyond this point is known as the far field (z > a²/λ0) (1,13). The Fresnel approximation is considered valid for any z larger than a few wavelengths. Therefore, its region of validity is most of the space in front of the transducer except the region very close to the transducer face. In Fig. 16, the number of maxima and minima become infinite as we approach the transducer face. They remain finite for the exact solution (not shown). This difference occurs because the Fresnel approximation breaks down at small z. The Fraunhofer approximation is a much weaker approximation and is valid only for z ≫ a²/λ0, that is, only in the far field (4,31).

Figure 16. On-axis monochromatic signal amplitude variation (normalized magnitude versus distance) due to diffraction from a circular unfocused (thin line) and focused (thick line) transducer.

Beam Patterns and Beam Profiles
Beam patterns are the two-dimensional intensity or amplitude distribution in any given x, y plane defined by a fixed z0 . Due to the circular symmetry, beam profiles, defined as the beam amplitude or intensity variations as a function of the off-axis radial distance ρ, may also be used. No exact analytical solution exists for the entire space in front of the transducer. The analytical closed form approximate solution H1a (ρ, z, ω0 ) can be used but it is based on the caveat that its validity may be questioned for planes very close to the transducer face. The top row in Fig. 17 shows the beam patterns for the same flat disk (unfocused) transducer mentioned earlier. Patterns for four different z0 values are shown. The z0 locations have been indicated on each pattern. Note that only the pattern at z0 = 18.3 cm is in the far field of the given transducer. Here, along with the central main lobe of the beam, we can also see the surrounding weaker side lobes. In the Fraunhofer approximation, Eq. (10) reduces to a well-known formula for the magnitude of a radial beam
profile:

H1a(ρ, z = z0, ω0) = const [J1(kaρ/z0) / (kaρ/z0)].   (13)

This simple expression is often used to estimate the beam profile in the far field (z ≫ a²/λ0). The four plots shown with dashed lines in Fig. 18 are the radial beam profiles of the corresponding two-dimensional beam patterns shown in Fig. 17, top row (unfocused case).

Figure 17. Beam patterns at different z0 (z0 = 1.17, 4.1, 11.0, and 18.3 cm). Top row is unfocused; bottom row is focused.

Figure 18. Beam profiles (magnitude versus radial distance) at z0 = 1.2, 4.0, 11.0, and 18.0 cm. Dashed is unfocused, and solid is focused.

Using these theoretical plots generated within the Fresnel approximation, we can summarize some general features of the beam pattern. In the near field, the beam is essentially confined within the original diameter of the transducer. The intensity varies significantly within the beam and with z. In the far field, the beam assumes a standard form. It has a maximum at the center, and the intensity falls off as the radial distance increases. The width of the central lobe becomes larger as z increases, and the on-axis intensity decreases as the beam begins to diverge. The lateral resolution is generally considered to be determined by the beam confinement distance in the near field and by the −6 dB full width of the central lobe, defined by the [J1(θ)/θ] formula, in the far field. An important aspect of lateral resolution, mainly due to diffraction, is that it is spatially nonuniform. Note also the side-lobe structure that exists on either side of the central main lobe.
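As a quick numerical illustration of Eq. (13) (using the 19-mm, 2.46-MHz disk from the example above, and recalling that −6 dB corresponds to one-half amplitude), the Python sketch below evaluates the far-field profile at z0 = 18.3 cm and reads off its −6-dB full width, the usual lateral-resolution estimate. The several-millimeter answer is what motivates the focusing discussed next.

import numpy as np
from scipy.special import j1

# Far-field radial beam profile of a circular piston, per Eq. (13) (illustrative values).
c0, f0 = 1540.0, 2.46e6
a, z0 = 9.5e-3, 0.183                      # transducer radius and depth, m (far field)
k = 2.0 * np.pi * f0 / c0

rho = np.linspace(1e-6, 10e-3, 4000)        # off-axis radial distance, m
x = k * a * rho / z0
profile = np.abs(2.0 * j1(x) / x)           # normalized so that the on-axis value is 1

rho_half = rho[profile >= 0.5].max()        # edge of the -6-dB (half-amplitude) region
print(f"-6 dB full width of the central lobe at z0 = {100*z0:.1f} cm: {2e3*rho_half:.1f} mm")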
Focusing The primary function of focusing is to improve the lateral resolution by reducing the central main lobe width in the focal zone. One way to achieve focusing is to curve the transducer spherically by using a radius of curvature F, as shown in Fig. 19. In Fig. 16, the thick line shows the on-axis intensity for the same transducer, but with a radius of curvature F = 11 cm. An expression similar to Eq. (12) has been used where z is replaced by ε, where ε⁻¹ = z⁻¹ + F⁻¹. Though z = 11 cm defines the geometrical focus, a true focal point may be defined as the z location where the on-axis intensity has a maximum. Note that the true focal point is always at z < a²/λ0. In other words, we cannot obtain a focus in the far field of a transducer. A difference often exists between the geometrical focus and the true focus (4). Kossoff defined three different degrees of focusing: strong or short focusing implies F ≈ 0.2a²/λ0, medium focusing implies F ≈ 0.5a²/λ0, and weak or long focusing implies F ≈ 0.8a²/λ0 (37). No exact closed form solution exists for unfocused off-axis behavior. A solution in the form of an integral expression can be found in (4). However, the beam pattern only in the plane at the geometrical focal point z = F
is given analytically by Eq. (13), where z is replaced by F. Within the Fresnel approximation, a closed form expression based on the Lommel diffraction formulation for a focused piston transducer of radius a is given by (36)

H1a(ρ, z, ω) = (ε/kz) e^{j[kz + kρ²/(2z) + ka²/(2ε)]} [U1(ka²/ε, kaρ/z) + jU2(ka²/ε, kaρ/z)].   (14)

The circularly symmetrical beam patterns, calculated by using Lommel functions for the focused case at the same four z locations, are shown in the bottom row of Fig. 17, and the top row is the corresponding unfocused case. The solid lines in Fig. 18 represent the beam profiles for the focused case, and the dashed lines represent the unfocused case. Note a marked decrease of the −6 dB width of the central lobe and the reduction in the side-lobe level at the geometrical focal point. Figure 19 illustrates the changes in the beam profile due to focusing. The beam width is reduced significantly near the focal point and also within a distance range called the focal zone. However, the beam diverges more rapidly beyond the focal zone. The on-axis intensity increases many fold at the focal point. This type of focusing can improve lateral resolution and reduce side lobes but only within the focal zone. This zone is at a fixed distance from the transducer. As we will see later, introducing phased array transducers has made it possible to achieve variable focusing at many different depths.

Figure 19. Effect of focusing on the beam profile (beam shapes with and without focusing; geometrical focus at z = F, near-field length a²/λ).

Noncircular Transducer Geometries For an exact solution to diffraction from any arbitrary geometry, the integral defined in Eq. (9) must be solved numerically. However, an approximate understanding of the beam pattern is possible. Two methods are outlined here. The reader is urged to consult the references for more detail (38). The first method uses the angular spectrum technique when the excitation is monochromatic (39). If w(x, y, z = z1) is the complex (magnitude and phase) of the two-dimensional monochromatic wave field at frequency ω0 defined in one plane at z = z1, then it is possible to propagate the field to a different plane at z = z2. The first step of the process is to take a two-dimensional Fourier transform of the wave field that exists at z1. Let us call this Fourier transform W(fx, fy, z = z1). Here fx and fy are the spatial frequency variables conjugate to x and y, respectively. Then, we multiply the Fourier transform by a phase filter exp[j2π(z2 − z1)√(λ0⁻² − fx² − fy²)]. The final step is to take an inverse two-dimensional Fourier transform of the product to obtain the field w(x, y, z = z2) for the plane at z = z2. The transducer geometry can be specified as the initial wave field at z = 0. The solution is valid in the near and the far fields of the transducer. The method can be extended to finite bandwidth (short duration) excitation signals used in medical ultrasound (40). The second method can provide only an estimate of the beam pattern in the far field of the transducer, that is, in a plane at any z ≫ D²/λ0, where D stands for the maximum length in the full specification of the transducer geometry. It has been shown that when the Fraunhofer approximation is applied to the exact diffractive integral for any specified transducer geometry f(x, y), the resulting field pattern in the far field is simply a two-dimensional Fourier transform of the transducer geometry, followed by a scaling of the spatial frequency variables fx and fy by the term λ0 z (29,39). Another alternative is to perform a three-dimensional finite-element modeling of the wave propagation. These methods can be useful in designing and understanding the transducer array technology that has been driving the market in recent years.
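The angular spectrum procedure described above maps directly onto a pair of two-dimensional FFTs. The Python sketch below is a minimal monochromatic implementation under stated assumptions (a uniformly excited 5 mm × 5 mm square aperture, evanescent spatial frequencies simply discarded, no absorption); it propagates the aperture field from z = 0 to a parallel plane and can be used to explore near-field patterns of arbitrary apertures.

import numpy as np

def angular_spectrum_propagate(w1, dx, lam, dz):
    """Propagate a monochromatic field w1(x, y) over a distance dz (illustrative sketch)."""
    ny, nx = w1.shape
    fx = np.fft.fftfreq(nx, d=dx)                      # spatial frequencies conjugate to x and y
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / lam**2 - FX**2 - FY**2                 # 1/lambda^2 - fx^2 - fy^2
    phase_filter = np.where(arg > 0.0,
                            np.exp(1j * 2.0 * np.pi * dz * np.sqrt(np.maximum(arg, 0.0))),
                            0.0)                       # evanescent components dropped
    return np.fft.ifft2(np.fft.fft2(w1) * phase_filter)

# Example: uniformly excited square aperture at 3 MHz in soft tissue (assumed values).
lam = 1540.0 / 3.0e6                                   # wavelength, m
dx = lam / 4.0                                         # grid spacing, m
n = 512
x = (np.arange(n) - n // 2) * dx
X, Y = np.meshgrid(x, x)
aperture = ((np.abs(X) <= 2.5e-3) & (np.abs(Y) <= 2.5e-3)).astype(complex)

field = angular_spectrum_propagate(aperture, dx, lam, dz=0.03)
print("relative on-axis |w| at z = 3 cm:", round(abs(field[n // 2, n // 2]), 3))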
TECHNOLOGICAL ADVANCEMENTS Real-Time B-Scan Imaging The main objective here has been to improve temporal resolution. The mechanical movement of the transducer to vary the line of sight is a slow process. Motion artifacts can be avoided in the image if the object does not move. But in medical applications, the breathing motion time period is of the order of 1 second. If a single frame of a B-scan image is considered to be composed of N number of A-lines, each requiring (PRF)−1 seconds to acquire, then the frame rate is determined by PRF/N. Given the maximum depth zmax to be imaged, the PRF is set at c/2zmax . A compromise is needed between zmax and N. Typically for N = 128 and a frame rate of 30 frames per second, a maximum imaging depth of 19.5 cm is possible. Many different schemes for fast scanning have been developed. The details can be found elsewhere (41). Figure 20 illustrates a sector scanner where a single element transducer wobbles inside a protective sheath, thereby sweeping the beam (line of sight) in an arc. The resulting B-scan image has the familiar shape of a sector of a circle.
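The compromise among line count, frame rate, and depth quoted above follows from PRF = c/(2 z_max) and frame rate = PRF/N. A small Python sketch (assuming the usual soft tissue speed of 1,540 m/s, which gives a figure close to the 19.5 cm quoted above):

c0 = 1540.0                          # assumed speed of sound in soft tissue, m/s

def max_depth(n_lines, frame_rate, c=c0):
    """Deepest range for which every echo returns before the next pulse is fired."""
    prf = n_lines * frame_rate       # required pulse repetition frequency, Hz
    return c / (2.0 * prf)           # z_max = c / (2 * PRF)

for n_lines, fr in [(128, 30), (128, 15), (256, 30)]:
    print(f"{n_lines} lines at {fr} frames/s -> maximum depth {100 * max_depth(n_lines, fr):.1f} cm")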
Figure 20. A sector scanning method for real-time imaging: the beam is swept through an arc.
Focusing a single-element transducer has been used for many years to improve lateral resolution and increase the sonicated intensity in the focal zone (1,41). Equation (14) provides the beam profile for a circular disk transducer. But there is a price to pay in fixed focus operation. A loss of lateral resolution always occurs at distances beyond the focal region due to the more rapid divergence of the beam (see Fig. 19). Therefore, there is a trade-off with spatial uniformity. Transducer Arrays These are novel assemblies that have more than one transducer element. A linear array generally has rectangular elements all arranged in a line, as illustrated in Fig. 21, or sometimes along a curved line (a convex array). Annular arrays (concentric rings of elements) have also been developed (1,28,29). The number of elements has steadily increased since the late 1970s when they were first introduced, and it is common now to have 128 or more elements. The technical challenges that had to be surmounted include the development of dicing and design methodology, the development of multichannel signal processing that has minimum cross talk, and the reduction of grating side lobes that originate in the beam due to the
inherent periodicity introduced by the array (29). It is necessary to understand the diffractive pattern produced by a given array setup to estimate and improve the image quality metrics defined earlier. Although this is a computationally difficult task, as outlined in the last section, some simple guidelines can be used to define the pattern in the far field. It is convenient to think of each element as a point source. In the operation of a phased array scanner, the effect of a large continuousface transducer is synthesized from these periodic point sources. The aperture size (D) involved in forming the beam and the wavelength (λ0 ) at the dominant frequency determine the central beam width and hence the lateral resolution (Fig. 21). Due to the periodicity of the elements, replicas of the central lobe are created in the beam. These replicas of the central lobe are called grating side lobes. Their amplitudes are not negligible and often are comparable to central beam amplitudes. As stated before, such large side lobes are unacceptable for medical imaging because they compromise contrast resolution. The element spacing d and λ0 determine the locations of the grating side lobes. The width of the elements, w, determines the amplitudes of the grating side lobe relative to the main lobe. Many different approaches to reducing side lobes and improving the main lobe lateral resolution are possible, but none is perfect. Despite the side lobe problems, the advantages are many, and they have been slowly realized in practice during the last decade. First, mechanical scanning is replaced by electronically controlled sequential excitation of a group of elements in a linear array. The beam that is generated is illustrated in Fig. 21; it has excitation of element numbers 1, 2, and 3 followed by 2, 3, and 4, etc. Alternatively, the beam can be steered to any angle θ by applying an appropriate linear time delayed excitation to a group of elements (see, for example, elements 5, 6, 7, and 8 in Fig. 21). Second, a variable degree of focusing can be achieved electronically by imposing a curvature on the excitation time delay that is applied to a group of selected elements (see, for example, elements 11, 12, 13, 14, and 15 in Fig. 21). Third, there is the flexibility of reducing the side-lobe structure by what is known as beam apodization. This refers to nonuniform shaping of the amplitude of the excitation signal across the face of the transducer array or a subset of elements that form the beam. A predetermined weighting function controls the amplitude of the excitation signal in each element. The side lobes can be reduced significantly at the expense of a slight reduction in lateral resolution determined by the main lobe width (27). The fourth advantage stems from the potential power of digital signal processing that can be applied to the multichannel echo signals. One example is considered in the next section. Perhaps only a fraction of that potential has been realized so far. Digital Processing of Multichannel Signals
Figure 21. Linear phased array: example of electronically timed excitation for scanning (left), beam steering (middle), and variable degree of focusing (right).
This involves integrating the full power of computer technology with the multichannel signal generation/reception capability afforded by the transducer arrays (27). It has been responsible for steady improvement in image quality
Figure 22. Signal processing of different channels for dynamic focusing of received echo signal.
offered by clinical scanners in the last decade. Each manufacturer follows its own proprietary technology. Hybrid analog/digital computers, specially designed to accommodate the data processing requirements of real-time image formation, are used in a dedicated and adaptive mode. Although the details remain proprietary, we can get a general idea of what is being done here. The PSF for B-scan imaging depends on the depth and the beam steering angle (28). The challenge has been to exploit the flexibility and speed of computer hardware and software to improve spatial resolution, the spatial uniformity of the PSF, and the contrast resolution (reduced side lobes), simultaneously. The lateral resolution can be improved and side-lobe reduction can be achieved at each depth by focusing the multichannel echo signal in the software upon reception. Consider an on-axis point scatterer at depth z that sends a short pulse back toward a linear transducer array (Fig. 22). The signal received by each element will have a travel time delay that can be easily calculated by assuming a constant speed of propagation. By applying a time shift in the software before summing up the signal in all of the channels, dynamic focusing for depth z can be achieved. Because dynamic focusing is performed in the software and on the echo signal, there is the flexibility to tailor the time delays for each depth. This processing is usually done as fast as the echoes are received, so that realtime imaging capability can be maintained. Furthermore, the receiver aperture size can also be changed for different depths (arrival times) by adding more channels to the signal summation algorithm. The apodization in transmit and/or in receive can be further adapted to different depths and imaging objectives. Significant improvement in spatial and contrast resolution can now be maintained across a large portion of the image. In other words, a high degree of spatial uniformity is possible now simultaneously along with improved spatial and contrast resolution (27). This is demonstrated in Fig. 23 by the image of wire targets at different depths in a tissue-mimicking phantom taken by a
Figure 23. B-scan image of a wire target phantom where dynamic focusing is turned on.
Compare the PSF with those shown in Fig. 14, where dynamic focusing is turned off. The improvement in lateral resolution at each depth is evident in the two images; at the same time, the PSF is now spatially uniform.
Doppler Ultrasound
Three types of instruments based on the Doppler principle are currently used for blood flow measurements in the heart, arteries, and veins: continuous-wave (CW), pulsed-wave (PW), and color-flow (CF) (42). CW is the simplest and least expensive. Two tilted transducers that have nonparallel beams produce an overlap region at some fixed range; one acts as a transmitter, and the other as a receiver. The echo signal from flowing blood (moving blood cells) is received continuously, and the Doppler shift is estimated by a two-stage process. First, the echo signal is mixed with (multiplied by) the transmitted frequency signal. Sum and difference frequencies are produced at the output of the mixer. A subsequent low-pass filter removes the sum frequency but preserves the difference frequency. According to Eq. (6), the Doppler shift is proportional to the flow velocity and can be calibrated if the Doppler angle is known. An audio amplifier and speaker are often used to listen to the blood flow. If the moving blood cells have a distribution of velocities, a spectrum of Doppler frequencies is displayed as a function of time. CW suffers from a lack of depth (range) selectivity and region-of-interest (ROI) size selectivity. PW combines the velocity determination of a CW system with the range discrimination of pulse-echo imaging systems. Figure 24 illustrates the processing of a signal for a pulsed Doppler system. As in B-scan imaging, only one transducer is used, in the pulse-echo mode. The spatial pulse length for Doppler probing is usually longer than that used for imaging; a pulse of 5–25 cycles is not uncommon for achieving the narrowband characteristics necessary for Doppler studies. The pulse repetition frequency (PRF) is under operator
control (see transmit signal A in Fig. 24). The region of interest at some depth z0 is selected by applying a time gate to the echo signal B centered at t0 = 2z0/c. Unlike CW, the output of the combined mixer and low-pass filter in PW becomes a sampled version of the Doppler-shifted frequency waveform; the sampling interval is 1/PRF. After some form of interpolation, this signal is sent to the audio amplifiers. The sampled nature of PW results in a Nyquist frequency of PRF/2. Consequently, any Doppler-shift frequency above the Nyquist limit is aliased. If the PRF is increased to reduce aliasing, the unambiguous range depth of the ROI is compromised. On the other hand, the lowest Doppler frequency requires a data acquisition time equal to at least one complete cycle of the Doppler signal. This, in turn, increases the number of pulses that need to be fired before a single image frame that has velocity information can be displayed. In other words, the frame rate or temporal resolution for a fixed PRF has to be compromised to accommodate smaller Doppler shifts. The system design and optimization problem is full of challenges, notwithstanding the fact that the echo signal from blood is usually very weak and consequently the SNR is generally poor. Quadrature demodulation is often employed at the mixer stage to separate the positive and negative Doppler frequencies. A fast Fourier transform is often used to determine the amplitudes of the Doppler-shifted frequencies in a spectrum. The resolution is determined by the Doppler sample volume (the region from which Doppler-shifted echoes return and are presented audibly or visually); this volume is set by the transducer beam width in the lateral direction and by the receiver gate length plus half the pulse length in the axial direction. Multiple gates can also be selected in some instruments. Firing of short pulses (for imaging) and long pulses (for Doppler) can be interleaved so that B-scan image generation, image-guided gate selection, and Doppler spectrum information can be presented simultaneously. For obvious reasons, such instruments are called duplex scanners.
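The range-gating and aliasing trade-offs above can be illustrated with a short calculation. The transmit frequency, PRF, and depth below are arbitrary illustrative choices, and the Doppler relation is assumed to take the standard form f_d = 2 f_0 v cos θ / c, which is presumably the content of Eq. (6) cited above:

```python
import math

c = 1540.0        # assumed speed of sound (m/s)
f0 = 5.0e6        # assumed transmit center frequency (Hz)
prf = 5.0e3       # assumed pulse repetition frequency (Hz)
z0 = 0.05         # region of interest at 5-cm depth (m)
theta = 0.0       # Doppler angle (rad)

t0 = 2 * z0 / c                                   # center of the receive time gate
z_max = c / (2 * prf)                             # deepest unambiguous range for this PRF
f_nyq = prf / 2                                   # highest Doppler shift that is not aliased
v_max = f_nyq * c / (2 * f0 * math.cos(theta))    # corresponding blood velocity

print(f"gate center t0       = {t0 * 1e6:.1f} us")      # ~64.9 us
print(f"unambiguous depth    = {z_max * 100:.1f} cm")   # ~15.4 cm
print(f"Nyquist Doppler      = {f_nyq / 1e3:.2f} kHz")  # 2.50 kHz
print(f"max unaliased speed  = {v_max * 100:.1f} cm/s") # ~38.5 cm/s
```

Raising the PRF raises the maximum unaliased velocity but shrinks the unambiguous depth, which is exactly the compromise described above.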
Figure 24. Processing of a pulsed wave Doppler signal. (From Bamber et al. (13)).
Color Doppler imaging is a more recent development and has found widespread clinical utility in the last few years. It presents two-dimensional, cross-sectional, real-time blood velocity or tissue motion information in color, along with the two-dimensional, cross-sectional, gray-scale anatomic B-scan image. The near real-time (high temporal resolution) requirement makes the design problem even more challenging. The signal processing chain does not use the full Doppler-shift spectrum, because of a lack of time and/or a lack of the parallel channels necessary for real-time imaging. Instead, the A-line echo signal is split into 100–400 time gates. Consider that the scattering medium seen by a given A line and a given time gate is interrogated by two pulses separated by a time interval τ = 1/PRF. If the scatterers have remained stationary during that time τ, the normalized correlation factor between the first demodulated echo wave train and the second one shifted by a time delay τ will be near or equal to one. Any motion will change the magnitude and phase of this correlation factor. In 1985, Kasai et al. established the relationship between the magnitude and phase of this complex correlation factor and the mean and variance of the Doppler-shifted spectrum (43). As many as eight pulses are used to determine this correlation factor accurately at each time gate center. The summation duration for the correlation determination then becomes 8τ, and the beam must be held stationary for this duration; therefore, the image frame rate has to be sacrificed. A mean Doppler shift can be estimated at each time gate location. If it falls below a threshold for motion detection, the information in that particular time gate is considered to be coming from stationary tissue, and the echo signal is processed like any normal B scan and is displayed in gray scale.
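A minimal sketch of the lag-one autocorrelation (Kasai-type) estimator just described, operating on the complex, quadrature-demodulated samples from one time gate over a packet of pulses. The function and variable names are hypothetical, and the spectral-width term is left in normalized form rather than following any particular scanner's scaling:

```python
import numpy as np

def kasai_estimate(iq, prf):
    """Mean Doppler shift and a normalized spectral-width indicator from a
    packet of complex samples (one sample per transmitted pulse, same gate)."""
    r0 = np.mean(np.abs(iq) ** 2)               # lag-0 autocorrelation (echo power)
    r1 = np.mean(iq[1:] * np.conj(iq[:-1]))     # lag-1 (pulse-to-pulse) autocorrelation
    f_mean = prf * np.angle(r1) / (2 * np.pi)   # mean Doppler frequency from the phase of r1
    width = 1.0 - np.abs(r1) / r0               # 0 for a single velocity; grows with spectral spread
    return f_mean, width

# Synthetic check: an 8-pulse packet (the article cites up to eight pulses per
# estimate) containing a pure 500-Hz Doppler shift sampled at PRF = 4 kHz.
prf = 4000.0
n = np.arange(8)
iq = np.exp(2j * np.pi * 500.0 * n / prf)
print(kasai_estimate(iq, prf))   # -> (approximately 500.0, approximately 0.0)
```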
If any motion is inferred at a gate location, then the motion data are color mapped by using a color scale, and that pixel location is displayed in color. Blue is generally used to depict flow away from the transducer, and red is used to depict flow toward it. There are several problems in color flow mapping systems. Noise and signal from slowly moving body parts can overwhelm the small signal from moving blood. An analog ‘‘wall filter’’ is often employed that passes Doppler frequencies above a set level and eliminates strong but low-frequency Doppler shifts from moving body parts such as heart or vessel walls. The spatial resolution of color Doppler is much worse than that of the gray-scale B-scan image, and the full Doppler spectral information is not depicted. Yet the ability to ‘‘see’’ the flow information in color in real time, simultaneously with the anatomy depicted by the B-scan image, has found many applications in medical diagnosis. A more recent development known as amplitude or power Doppler has a signal-to-noise ratio four to five times better and therefore has higher sensitivity for visualizing blood flow in small vessels. Here, instead of using the mean of the Doppler shift and its sign, a number that is proportional to the total area under the Doppler power spectrum (positive and negative shifts included) is used and depicted by a color scale. This quantity is considered proportional to the density of blood cells in the sampled volume and hence to the blood flow volume. Blood flow patterns in small vascular structures can now be visualized by power Doppler. There are numerous opportunities for further development of color-flow imaging, particularly in conjunction with the use of contrast agents. Compromises among frame rate, Doppler information content, spatial resolution, and aliasing artifacts remain major issues and must be considered carefully while operating these instruments. Time-domain cross-correlation processing, a broadband technique for extracting velocity information, has also been shown to be of practical value. The reader may refer to a review article by Wells for further details on Doppler and color flow imaging (44).
Harmonic Imaging
The concepts of harmonic imaging are based on an understanding of the finite-amplitude distortion of a waveform as the wave propagates in a medium that has a significant nonlinear response. The theory has a rich history of application in underwater and airborne sound; only recently has the biomedical community paid clinically significant attention to this phenomenon (45). Simply put, because the propagation velocity of points on a sound wave increases with pressure, the pressure maximum of a finite-amplitude wave travels slightly faster than the pressure minimum. This leads to a continuous distortion of the waveform, and harmonics of the source frequency are generated in the medium during wave propagation. The generation of harmonics becomes more pronounced as the peak pressure amplitude of the interrogating pulse used for imaging, the frequency, the distance from the transducer, and the magnitude of the nonlinearity parameter B/A of the medium increase [see Eq. (5) and Ref. 40]. Because of low attenuation, significant second- and third-harmonic
amplitude generation is possible in water, even for interrogating pulse pressures as small as 0.5 MPa (46). However, there is ample evidence that diagnostic scanners operating at peak pressure fields near or in excess of 1 MPa can generate harmonics in human body tissue imaging (47). The harmonics can be 10–30 dB lower in amplitude than the fundamental. The latest scanners incorporate a band-pass filter in the processing chain that removes the first harmonic (or fundamental) but keeps the second harmonic in the RF echo signal for B-scan image formation. It is available as an option and is generally referred to as harmonic imaging, as opposed to conventional imaging at the fundamental when the filter is turned off (45). The transducers operating in pulse-echo mode need sufficient bandwidth so that a moderate first-harmonic signal can be transmitted and a weak second harmonic can still be received within the available bandwidth. The frequency overlap between the two bands has to be minimized, and this limits the bandwidth of the transmitted signal. Therefore, there is an inherent compromise between axial resolution and the efficacy of harmonic imaging. Significant improvement in low-contrast lesion detection results when one switches from the fundamental to the harmonic imaging mode. The improvement in image quality has been particularly noted in obese patients. This improvement in contrast resolution is due to three factors. First, the harmonic beams generated during pulse propagation possess a narrower central lobe and lower side-lobe levels than the fundamental beam, simply because of their higher frequency content; therefore the point-spread function, especially the lateral resolution, for harmonic imaging turns out to be superior to that of fundamental imaging. Second, the harmonic component generated has an approximately quadratic relationship to the wave amplitude at the fundamental. Thus most of the harmonic energy originates from the strongest part of the beam, in the center; the weaker parts of the beam (i.e., the side lobes) participate to a lesser extent in the harmonic generation process. Third, the harmonics are absent in the near field of the transducer because the propagation has to take place over a finite distance before a sizable harmonic component can be generated. Consequently, the degradation of the point-spread function normally seen in fundamental imaging, due to phase aberrations introduced by propagation through thick body fat layers under the skin, is less severe in harmonic imaging. Some of these inferences have been examined theoretically (40,48) and confirmed experimentally (46).
Contrast Imaging
The development of contrast agents has played a significant role in enhancing the potential of ultrasonic imaging in medicine in the last decade. The original concept was to inject some form of gas bubbles or solid particles into the bloodstream. Because of the large acoustic impedance difference between air and blood/tissue, one would expect enhanced scattering that would result in increased contrast. The challenge was to produce microbubbles small enough to pass through the lung capillary circulation and stable enough to reach
the left side of the heart after an intravenous injection. A summary of the use of ultrasound contrast agents (UCAs) in medical imaging is given elsewhere (49). To date, dozens of UCAs have been developed, and many have been approved for clinical use by the Food and Drug Administration (FDA) (22). Most UCAs can now reach the left ventricular cavity; they are either air-filled or contain gases that dissolve poorly in blood, and they have mean diameters smaller than 7 µm. The mechanisms that produce contrast enhancement and its potential uses are still under investigation. We present here a summary of the current state of its imaging applications. To date, all three regimes of bubble response discussed earlier have been exploited for contrast imaging. Initially it was thought that B-mode imaging at the fundamental, accompanied by the injection of a UCA, would provide sufficient contrast enhancement for many situations. This was the case in imaging the blood-filled heart cavity, but the improvement was only a few decibels in tissue perfusion. The reason is that the blood vessels in body tissue are small, and therefore the bubble density is low; echoes from tissue can easily mask any increase from a UCA. This is often quantified as the agent-to-tissue ratio. New imaging techniques that exploit the UCA response to ultrasound are continuously being developed to improve this ratio. Transient enhanced scattering has also been explored at higher peak pressure amplitudes. Power Doppler imaging has proven more sensitive for detecting flow in small blood vessels. The introduction of a UCA further enhances the signals received from blood, and the detectability of flow from small vessels is increased further. However, color Doppler methods are susceptible to tissue motion that generates clutter or noise in the Doppler information display. A method in which harmonic imaging is combined with power Doppler can overcome this limitation and is currently considered one of the most sensitive techniques for tissue enhancement. At very high ultrasound peak pressure amplitudes, the transient destruction of the bubble and its signature have also been exploited for myocardial tissue enhancement (23). In another technique, called pulse-inversion imaging, a sequence of two short pulses is transmitted into the body (50). The second is an inverted replica of the first. In a linear medium, the two responses cancel each other when added, but in a nonlinear system such as a bubble, the sum is not zero and can be used as a signature of the UCA. The excitation in pulse-inversion imaging need not be narrowband; therefore, it can overcome the small-bandwidth limitation that is inherent in harmonic imaging, and it promises improved axial resolution. In addition to the exploitation of the harmonic response of the UCA in techniques such as harmonic B-mode, harmonic Doppler, and pulse inversion, new opportunities arise when we consider the subharmonic component of the bubble response (51). A potential advantage here is that the contribution from tissue is minimal. However, the application potential is still being explored.
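A toy numerical illustration of the pulse-inversion cancellation described above. The quadratic "scatterer" below is a stand-in for a nonlinear bubble response, chosen only to show that the summed echoes retain the even-order component; it is not a model of any particular contrast agent:

```python
import numpy as np

fs, f0 = 50e6, 2.5e6
t = np.arange(0, 2e-6, 1 / fs)
pulse = np.sin(2 * np.pi * f0 * t) * np.hanning(t.size)   # short windowed transmit pulse

def toy_scatterer(x, a=1.0, b=0.3):
    # Linear term plus a weak quadratic (even-order) nonlinearity.
    return a * x + b * x**2

echo_pos = toy_scatterer(pulse)      # response to the pulse
echo_neg = toy_scatterer(-pulse)     # response to its inverted replica
pi_sum = echo_pos + echo_neg         # linear parts cancel; 2*b*x**2 remains

residual = np.max(np.abs(pi_sum - 2 * 0.3 * pulse**2))
print(f"difference from the pure quadratic term: {residual:.2e}")   # ~0
```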
3-D Imaging
Theoretically, it is possible to generate 3-D ultrasound images by a simple extension of 2-D imaging. Two possible avenues have been explored. (1) One can acquire multiple 2-D images of known spatial orientation and then display the 3-D volumetric data set (28). For example, different planes can be scanned by moving the entire one-dimensional transducer array along the y axis in the geometry depicted in Fig. 3. Alternatively, a sector scanner may be rotated through 180° in small angular steps to generate 2-D images along different radial planes. The reader is referred to a recent review article for more information on the computational aspects (52). Many ultrasound manufacturers have already introduced such modifications in their scanners. (2) To avoid mechanical scanning motion, two-dimensional transducer arrays have been explored (53). The practical problems center on the complexity of fabrication, connection, and integrated-circuit design required for real-time imaging using a matrix of elements as large as 64 × 64, not to mention the high cost involved. Current ultrasound systems allow near real-time image data acquisition, registration, 3-D reconstruction, display, and manipulation. This powerful new imaging capability has clinical utility in many areas, such as accurate tumor volume estimation for radiation treatment planning, fetal face rendering, mapping of major vessels in 3-D, and ultrasound-guided surgery and biopsy. Medical simulators are interactive virtual-reality environments used to improve human performance in procedures that require visual correlation of three- or higher dimensional information. When fully developed, such tools will serve as a possible alternative means of training and performance evaluation for physicians in minimally invasive procedures. The concept is similar to flight-simulator training in aviation. In the coming years, 3-D ultrasound is expected to play a significant role in medical imaging.
Elasticity Imaging
This is a new imaging modality that has potential use in tumor detection and differential diagnosis. Palpation, in which a physician feels the hardness of a tumor, has been a common diagnostic practice, but it is not quantitative. The basic physics of elasticity can be used here to develop a more quantitative imaging modality that provides a measure of the elastic modulus of the tissue (54). When an external stress is applied to an elastic material, the material undergoes deformation, and strain is a quantitative measure of that deformation. For a simple one-dimensional case, strain is given by the fractional change in length, ΔL/L. The stress-to-strain ratio is called the Young's modulus of the material. The elasticity imaging method for biological tissue is based on external tissue compression and subsequent computation of a strain profile along the transducer axis. The deformation ΔL needed for the strain calculation is generally extracted from cross-correlation analysis of pre- and postcompression A-line signal pairs. At the most basic level, the processing involves dividing the precompressed reference A line into many overlapping segments, each several millimeters long. The transducer then applies a small axial static compression dz to the tissue. Another A line is acquired at this compression and divided into the same number of segments. A separate cross-correlation is performed for each segment, resulting in a set of time shifts, and these time shifts can be converted into an estimate of ΔL.
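A minimal sketch of the segment-by-segment cross-correlation just described. The window length, overlap, sampling rate, and the conversion of echo time shift to displacement through c/2 are illustrative assumptions rather than the parameters of any published elastography system:

```python
import numpy as np

def strain_profile(pre, post, fs=40e6, c=1540.0, win=400, hop=200):
    """Estimate axial displacement and strain from pre-/post-compression A lines."""
    displacement, depth = [], []
    for start in range(0, len(pre) - win, hop):
        seg_pre = pre[start:start + win] - np.mean(pre[start:start + win])
        seg_post = post[start:start + win] - np.mean(post[start:start + win])
        # Cross-correlate the matching segments; the lag of the peak is the time shift.
        xc = np.correlate(seg_post, seg_pre, mode="full")
        lag = np.argmax(xc) - (win - 1)                  # shift in samples
        displacement.append(c * (lag / fs) / 2)          # echo time shift -> axial displacement (m)
        depth.append(c * (start + win / 2) / (2 * fs))   # depth of the segment center (m)
    displacement = np.array(displacement)
    depth = np.array(depth)
    strain = np.gradient(displacement, depth)            # local Delta L / L along the beam axis
    return depth, displacement, strain
```

Mapping the strain in each segment to gray scale, and repeating for laterally translated beam positions, yields the one- and two-dimensional strain images described next.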
A one-dimensional strain image can be generated by mapping the strain value in each segment into gray scale and displaying it as a function of depth in the axial direction. By moving the transducer laterally and repeating the compression and processing steps, it is possible to generate a two-dimensional strain image, similar to a B-scan image. Using appropriate correction and calibration, it is also possible to map a profile of Young's modulus in the tissue. The axial resolution depends on the size of the segment chosen for cross-correlation processing. The time-shift estimates become less noisy and more robust when the segment window is large, so there is a trade-off between resolution and SNR. Generally, the resolution is much worse than that of a B-scan, but the utility of the method lies in the achievable contrast. For example, it may not be possible to visualize a tumor on a B-scan if the backscatter from the interior of the tumor and from the surrounding tissue happen to be equal, but the tumor may become visible on the elasticity image if there is a measurable difference in Young's modulus between the tumor and the surrounding tissue. The reader is referred to publications by several researchers who have worked in this area in the past decade (55–57).
Safety Issues
As in any diagnostic test, there may be some risk (a probability of damage or injury) in using medical diagnostic ultrasound. The biological effects and safety of diagnostic ultrasound have received considerable attention during the past few years (58). After considerable review, the American Institute of Ultrasound in Medicine (AIUM) and the World Health Organization (WHO) both issued statements attesting to the safety of diagnostic ultrasound. The AIUM issued a modified statement in 1997: ‘‘No confirmed biological effects on patients or instrument operators caused by exposure at intensities typical of present diagnostic ultrasound instruments have ever been reported.’’ They do advise a prudent approach that minimizes risk by minimizing exposure time and output levels; this is known as the ALARA (as-low-as-reasonably-achievable) principle. The mechanisms of action by which ultrasound could produce biological effects have been studied extensively. They can be divided into two groups: heating and mechanical (2). Ultrasound can produce a temperature rise as it propagates in tissue, primarily due to absorption; heating increases with ultrasound intensity and frequency. A quantitative measure of the probable thermal activity, the thermal index (TI), has been developed to provide some assistance in risk/benefit decision making. It is equal to the transducer acoustic output power divided by the estimated power required to raise the tissue temperature by 1 °C. Temperature rises are considered significant if they exceed 2 °C. For further details on calculating the thermal index, the reader is referred to a monograph published by the National Council on Radiation Protection and
Measurements (NCRP) (59). The mechanical risk is related to cavitation, the production and dynamics of bubbles in a liquid medium. Two types of cavitation are recognized. Stable cavitation describes bubbles that oscillate in diameter in response to the incoming pressure wave. Transient cavitation occurs when the bubble oscillations are so large that the bubble collapses during a compression cycle; extremely high localized temperatures and shock waves are created in the immediate vicinity, so transient cavitation has the potential for significant destructive effects. A theory developed from experiments in water yields a predictable dependence of the cavitation threshold on pressure and frequency (58). A mechanical index (MI) has been formulated to assist users in evaluating the likelihood of cavitation-related adverse biological effects for diagnostically relevant exposures. MI is equal to the peak rarefactional pressure (in MPa) divided by the square root of the center frequency (in MHz) of the pulse bandwidth. Thus far, biologically significant, adverse, nonthermal effects and the corresponding threshold MI have been identified with certainty only for medically relevant exposures in tissue that has well-defined populations of stabilized gas bodies. Many manufacturers now provide a display of the TI and MI on the scanner; they are calculated on the basis of the power settings chosen at any given time.
In summary, this article has provided an overview of medical diagnostic ultrasound imaging systems and their status at the turn of the century. These systems are primarily of the pulse-echo type. The ultrasound properties of biological media were considered first so that their implications for system design and image characteristics could be understood. Image quality metrics were defined for the B-scan modality (60,61). Diffraction, or beam divergence, is a physical phenomenon that occurs as a result of propagation; a basic theory of diffraction from circular disk transducers was presented, the main objective being to describe the general features of the beam pattern and their implications for B-scan imaging. Technological developments have been fast paced in the last decade. Those that have become clinically successful or hold great potential, for example, harmonic imaging, computer processing of multichannel signals, contrast imaging, 3-D imaging, and elasticity imaging, have been considered here. The Doppler ultrasound technique for blood flow estimation has witnessed major advancements in recent years; therefore, it was included in this section. Most of the advances that have taken place are within the context of the B-scan imaging format (62). Finally, the current understanding of the risks of ultrasound was summarized in terms of two recently formulated indexes, the TI and MI.
BIBLIOGRAPHY
1. K. K. Shung, M. B. Smith, and B. Tsui, Principles of Medical Imaging, Academic Press, San Diego, 1992, pp. 78–159.
2. F. W. Kremkau, Diagnostic Ultrasound Principles and Instruments, 5th ed., W.B. Saunders, Philadelphia, 1998.
3. H. Krautkramer, Ultrasonic Testing of Materials, 4th ed., Springer-Verlag, Berlin, 1990.
4. L. S. Schmerr, Fundamentals of Ultrasonic Nondestructive Evaluation: A Modeling Approach, Plenum, NY, 1998.
5. R. C. Waag, IEEE Trans. Biomedical Eng. 31, 884–892 (1984).
6. C. R. Hill, ed., Physical Principles of Medical Ultrasonics, John Wiley & Sons, Inc., New York, NY, 1986.
7. S. Fields and F. Dunn, J. Acoust. Soc. Am. 54, 809–812 (1973).
8. P. M. Morse and K. U. Ingard, in Theoretical Acoustics, McGraw-Hill, NY, 1968.
9. J. F. Chen, A. Zagzebski, and E. L. Madsen, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 41, 435–440 (1994).
10. V. M. Narayanan and P. M. Shankar, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 41, 845–852 (1994).
11. L. F. Joynt, Ph.D. Thesis, Stanford University, 1979.
12. M. Helguera, Ph.D. Thesis, Rochester Institute of Technology, 1999.
13. J. C. Bamber and M. Tristam, in S. Webb, ed., The Physics of Medical Imaging, Adam Hilger, Bristol, 1988.
14. B. Hete and K. K. Shung, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 40, 354–365 (1993).
15. P. N. T. Wells and M. Halliwell, Ultrasonics 19, 225–229 (1981).
16. E. J. Feleppa et al., IEEE Ultrasonic Symposium Proceedings 2, 1413–1417 (1999).
17. R. F. Wagner, M. F. Insana, and D. G. Brown, SPIE Appl. Opt. Instrum. Med. XII 535, 57 (1985).
18. J. M. Thijssen et al., Ultrasound Med. Biol. 19, 13–20 (1993).
19. G. E. Sleefe and P. P. Lele, Ultrasound Med. Biol. 14, 181–189 (1988).
20. Z. Cho, J. P. Jones, and M. Singh, in Foundations of Medical Imaging, John Wiley & Sons, Inc., New York, NY, 1993, Chap. 14.
21. R. T. Bayer and S. V. Lecher, Physical Ultrasonics, Academic Press, NY, 1969.
22. P. J. A. Frinking et al., Ultrasound Med. Biol. 26, 965–975 (2000).
23. K. Wie et al., Circulation 97, 472–483 (1998).
24. J. D. Gaskill, Linear Systems, Fourier Transform and Optics, John Wiley & Sons, Inc., New York, NY, 1978.
25. P. N. T. Wells, Biomedical Ultrasonics, Academic Press, London, 1977.
26. R. M. Gagliardi, Introduction to Communication Engineering, 2nd ed., John Wiley & Sons, Inc., NY, 1988.
27. S. H. Maslak, in R. C. Sander and M. C. Hill, eds., Ultrasound Annual 1985, Raven Press, NY, 1985, pp. 1–16.
28. M. F. Insana and T. J. Hall, in P. N. T. Wells, ed., Advances in Ultrasound Techniques and Instrumentation, Churchill Livingstone, NY, 1993, pp. 161–181.
29. A. Macovski, in Medical Imaging Systems, Prentice-Hall, NJ, 1983.
30. N. A. H. K. Rao, S. Mehra, J. Bridges, and S. Venkataraman, Ultrasonic Imaging 17, 114–141 (1995).
31. G. S. Kino, Acoustic Waves: Devices, Imaging, and Signal Processing, Prentice-Hall, Englewood Cliffs, NJ, 1987.
32. H. Seki, A. Granato, and R. J. Truell, J. Acoust. Soc. Am. 28, 230–238 (1956).
33. P. Stepanishen, J. Acoust. Soc. Am. 49, 1,627–1,638 (1971).
34. G. Harris, J. Acoust. Soc. Am. 70, 10–19 (1981).
35. C. J. Daly and N. A. H. K. Rao, J. Acoust. Soc. Am. 105, 3,067–3,077 (1999).
36. C. J. Daly and N. A. H. K. Rao, Scalar Diffraction from a Piston Aperture, Kluwer Academic, Boston, 2000.
37. G. Kossoff, Ultrasound Med. Biol. 5, 359–365 (1979).
38. D. Liu and R. C. Waag, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 44, 1–13 (1997).
39. J. W. Goodman, Introduction to Fourier Optics, McGraw-Hill, NY, 1968, Chap. 3.
40. K. W. Ayer, Ph.D. Thesis, Rochester Institute of Technology, 2000.
41. A. J. Angelsen, in Ultrasound Imaging — Waves, Signals, and Signal Processing, 1st ed., Emantec, Trondheim, Norway, 2000. http://www.ultrasoundbook.com.
42. F. W. Kremkau, in Diagnostic Ultrasound: Principles and Instruments, 5th ed., W.B. Saunders, Philadelphia, 1998, Chaps. 5 and 7.
43. C. Kasai, K. Namekawa, A. Koyano, and R. Omoto, IEEE Trans. Sonics Ultrasonics 32, 458–463 (1985).
44. P. N. T. Wells, Phys. Med. Biol. 39, 2,113–2,145 (1994).
45. T. S. Desser and R. S. Jeffrey, Diagnostic Imaging 21, No. 9, 63–69 (1999).
46. M. A. Averkiou, D. N. Roundhill, and J. E. Powers, Proc. 1997 IEEE Ultrasonic Symp., 1997, pp. 1,561–1,566.
47. G. Wojcik, J. Mould, S. Ayter, and L. Carcione, Proc. 1998 IEEE Ultrasonic Symp., 1998, pp. 1,583–1,588.
48. T. Christopher, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 45, 158–162 (1998).
49. J. Ophir and K. J. Parker, Ultrasound Med. Biol. 15, 319–333 (1989).
50. S. D. Hope, C. T. Chin, and P. N. Burns, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 46, 372–382 (1999).
51. P. M. Shankar, K. P. Dala, and V. L. Newhouse, Ultrasound Med. Biol. 24, 395–399 (1998).
52. A. Fenster and D. B. Downey, in Handbook of Medical Imaging: Physics and Psychophysics, vol. 1, 1st ed., The International Society of Optical Engineering (SPIE), 2000, Chap. 7.
53. O. T. von Ramm, S. W. Smith, and H. G. Pavy, IEEE Trans. Ultrasonics Ferroelectrics Frequency Control 38, 100–108 (1991).
54. J. Ophir et al., Ultrasonic Imaging 13, 111–134 (1991).
55. L. S. Taylor, B. C. Porter, D. J. Rubens, and K. J. Parker, Phys. Med. Biol. 45, 1,477–1,494 (2000).
56. S. Y. Emelianov et al., Proc. IEEE Ultrasonics Symp., 1997, pp. 1,123–1,126.
57. J. Ophir et al., Int. J. Imaging Syst. Technol. 8, 89–103 (1997).
58. American Institute of Ultrasound in Medicine (AIUM), Bioeffects and Safety of Diagnostic Ultrasound in Medicine, Laurel, MD, 1993.
59. National Council on Radiation Protection and Measurements (NCRP) Report No. 113, Exposure Criteria for Medical Diagnostic Ultrasound: I. Criteria Based on Thermal Mechanisms, Bethesda, MD, 1992.
60. C. R. Hill et al., Ultrasound Med. Biol. 17, 559–575 (1991).
61. R. A. Harris et al., Ultrasound Med. Biol. 17, 547–558 (1991).
62. J. Lowers, ed., Diagnostic Imaging Special Supplement on Advanced Ultrasound, pp. 1–32 (Nov. 2000).
V VIDEO RECORDING
PETER D. LUBELL
Rainbow Network Communications
Woodbury, NY

Long before television became a household word, man endeavored to capture and record visual images. By the time the first electronic television was introduced at the 1939 World's Fair, both still and motion photography were well established in black-and-white and in color. Motion pictures, such as The Wizard of Oz, had full color images and complete audio sound tracks. The following section will provide an overview of video recording progress (both visual and aural signals) over almost 50 years since that 1939 World's Fair demonstration (see Fig. 1).
CHRONOLOGY
After the development hiatus caused by the diversion of scientific talent and engineering skills to the war effort during World War II, 1947 saw the introduction of commercial broadcast television, and by the mid-1950s color broadcast TV was on the scene as well. Virtually from its beginning, the television industry required a means of recording and storing the live video images that it was producing. In his famous ‘‘three wishes’’ speech in the fall of 1951, Brigadier General David Sarnoff, then chairman of RCA and founder of the National Broadcasting Company, called for the development of a device capable of recording television signals on a magnetic tape medium (1). In 1951 the fledgling television networks relied on a photographic film process known as kinescope recording. This involved transferring live television programming to motion picture film by photographing a high-intensity cathode ray tube and then reconstructing the video image by scanning the filmed image with a high-resolution camera tube called a flying-spot scanner. The resulting kines were often of poor visual quality due to the joint problems of loss of image definition during the filming and reconstruction processes and of synchronizing discrepancies caused by the difference between television's frame rate of 30 per second and the 24 frames per second rate of the normal movie camera. Notwithstanding these problems, the use of kines was immense. It was estimated that, during the 1950s, the television industry was a larger user of movie film than the movie industry itself (2). Soon after Sarnoff's speech, in December 1951, a small group of engineers at the Ampex Corporation began a project to develop the prototype of the videotape recorder as we know it today. By 1956, Ampex had introduced a reel-to-reel commercial model, using magnetic tape media supplied by the 3M company. For the next 15 years Ampex continued to provide the broadcast industry with ever more potent videotape recorders (VTRs) (3). In 1970 the Sony Corporation, under license from Ampex, introduced the three-quarter-inch U-Matic color videocassette player for the commercial market. In 1975 Sony established the basis for the home video recorder market by introducing the one-half-inch Betamax (SL-6300) consumer videocassette recorder (VCR). Both of these machines utilized tapes wholly contained within a cartridge, the cassette, which precluded the need to thread tape, as was the case then with the reel-to-reel machines (4). In 1976 the Japan Victor Corporation (JVC) and others introduced a competitive and incompatible cassette recording format (VHS) for consumer use. Within a few years the VHS format became the dominant format, virtually eliminating Beta in the consumer marketplace. Despite this, as we shall see, Beta remains alive and well in the commercial realm to this day (5). In 1977, the RCA Corporation and Philips Electronics each introduced a videodisk player to compete with the VCR in the consumer arena. Both of these machines employed analog recording schemes, not to be confused with the digital laserdisk introduced in 1978. None of these three formats permitted recording by consumers at home. They were offered only as prerecorded material, and they therefore had a difficult time competing with the VCR. Within a few years the first two fell by the wayside, leaving only the laserdisk, produced by Philips and others, with its inherently superior picture and audio quality, to retain even a small foothold in the VCR-dominated consumer marketplace (6). For almost a decade, until the 1995 announcement of the digital versatile disk, consumer video recording technology remained static. Not so the commercial arena, which saw the transition from analog tape recording to digital component and then to digital composite formats. This has been followed by the application of high-density magnetic recording technology, which has brought forth hard disk drives for video recording, leading in turn to computer-based, nonlinear,
Figure 1. Video recording milestones — timeline.
1939: Introduction of electronic television
1947: End of wartime hiatus (WW II)
1948: B&W recording via kinescope
1956: First commercial videotape recorder (VTR)—Ampex
1970: First videocassette recorder (VCR)—Sony U-Matic
1975: First consumer videocassette recorder—Sony Betamax
1976: Introduction of VHS format consumer VCR—JVC et al.
1977: Videodisk players—RCA and Philips
1978: Laserdisk players—Philips et al.
1980: Commercial HDTV tape recorder (VTR)—Sony Hi-Vision
1985: Commercial digital component VCR—Sony, D-1 format
1989: Commercial digital composite VCR—Sony, D-2 format
1995: Digital versatile disk (DVD) player—Sony et al.
199?: HDTV digital disk (DVD) player
virtual video tape recorders for commercial recording, editing, playback, and origination applications. (A brief description of this nonlinear video recording technology can be found at the end of this article under the heading Nonlinear Video; a more detailed discussion can be found elsewhere in this publication under the same title.) It took almost 2 years from the initial announcement of the high-density digital versatile disk (DVD) for it to appear, in 1997, in the consumer marketplace. It is now available only in a prerecorded format, although the technology currently exists to produce a recordable disk once the DVD format has become established. Even now, development is continuing on an enhanced DVD capable of handling a digital, high-definition television signal (7).
DESCRIPTION OF THE TECHNOLOGIES
Throughout this article, the term video represents (except as specifically noted) a television signal consisting of program video, program audio (1, 2, or 4 channels), and a time-code channel. For historical purposes, we include a brief description of the kinescope process to provide perspective on the more sophisticated magnetic and optical technologies that followed it and that dominate the video recording field.
Kinescope Recording
In its simplest form, kinescope recording involves filming a television image from a high-intensity picture tube and then playing it back by scanning the film with a flying-spot scanner to convert the filmed image to a standard video format (Fig. 2). This process allowed for time-delayed rebroadcast of television programming throughout the nation. It did so, however, at a high cost and with significantly reduced picture quality.
Picture quality suffered for two reasons. First, the required image-to-film and film-to-image transfers were done at limited light levels, producing grainy images of low contrast. A second cause was the frame-rate difference between television (30 frames per second) and cinematography (24 frames per second). This latter problem was circumvented by using a ‘‘pull-down’’ mechanism in the motion picture camera that allowed exposing one film frame to multiple television fields (two or three), so that the 30 frame-per-second television rate could be accommodated on 24 frame-per-second film. The resultant product was low in contrast and image stability. Although the broadcast use of kinescope recording disappeared with the introduction of the Ampex videotape recorder in 1956, there are still special situations where kinescope recording is used. Most of the timing problems have been overcome through electronic time-base correction circuits and frame-storage techniques. As we shall see, however, the paramount technology in television storage from 1956 to the present has been magnetic tape recording (8).
Magnetic Tape Recording: Analog, Linear Transport
The first commercially practical video recorder was developed in the United States by the Ampex Corporation during the period 1951 through 1956. It recorded monochromatic (black-and-white) images from a National Television System Committee (NTSC), 525-line, 60 Hz signal with monaural sound. This machine, designated the VR-1000, achieved its breakthrough performance by using four moving recording heads rotating in a plane transverse to the direction of motion of the (two-inch wide) tape. A second contributor to the success of the VR-1000 was its application of frequency modulation (FM) in the recording process (9). Earlier efforts to produce a videotape recorder (VTR) were based on extensions of the magnetic tape technology developed for audio recording. This approach involved moving the magnetic tape longitudinally past one or more fixed recording heads and transferring the signal level directly through amplitude modulation of the magnetic field applied to the tape (Fig. 3). These early systems failed to pass muster practically because they required excessive tape speed, as high as 20 feet per second, which resulted in limited recording times, and because their use of direct AM level transfer produced unstable record and playback performance due to the nonuniform magnetic characteristics of the tapes then available. Lack of stability
Figure 2. Basic kinescope (a) recording, in which a motion picture camera (with pull-down) films a high-intensity cathode ray tube, and (b) playback, in which a modified projector and a high-resolution flying-spot scanner camera tube convert the film back to a video signal.
Figure 3. Linear transport; direct tape recording.
Figure 4. Quad-head tape recording (side and end views): four recording heads on a rotating head wheel sweep across the tape transverse to the direction of tape travel.
Figure 5. Helical-scan tape recording: the tape wraps around a rotating head drum, and each head pass traces a slanted track along the tape.
in their tape transport mechanisms also contributed to recording artifacts such as jitter, flutter, and wow (10). The Ampex machine resolved these issues in the following manner (Fig. 4). Four recording heads were mounted on a cylindrical drum that rotated at high speed (14,400 rpm) in a plane whose axis was at right angles to the motion of the tape. The tape speed was only 15 inches per second, a substantial reduction from the earlier British Broadcasting Corporation (BBC) and RCA designs, which employed tape speeds of 200 and 3,650 inches per second, respectively. In the Ampex machine, each of the four video recording heads laid down 16 horizontal television lines per pass, so one drum revolution recorded 64 lines, and approximately eight drum revolutions covered the entire 525-line NTSC video frame. Another significant feature of the VR-1000 was its use of FM technology for the signal-to-tape information transfer. This approach avoided the pitfalls of the direct methods used in the earlier VTRs because the information was now recorded in the frequency domain as incremental changes about a fixed carrier, and the tape stock available then was much more uniform in its frequency-dependent characteristics than in its amplitude response to changing magnetic bias (11). For almost two decades this Ampex Quad VTR, with minor refinements, was the commercial standard for television recording.
Helical Scan and the Cassette Format
During the next 15 years, companies in the United States and Japan focused their attention on better media (tape stock), improved tape utilization (higher density recording), and a more convenient transport mechanism. The Minnesota Mining and Manufacturing (later 3M) company was a principal provider of media for television recording in the United States during this period. The early product was a paper-backed tape with a magnetic coating on one side; by late 1947, 3M had converted to acetate film for the base stock. By the time the VR-1000 was first used commercially by CBS, on November 30, 1956, the tape was an improved acetate-based 3M product called Scotch 179 Videotape (12). The next step in the evolutionary process focused on improvements in tape scanning. To improve the performance of its Quad machine, Ampex and its licensees
(RCA and the Victor Corporation of Japan [JVC]) worked to develop a helical scanning format, which effectively lengthened the scan distance per head pass by tracing a long, slanted track along the tape rather than a short transverse one (Fig. 5). This type of scanning has become the accepted standard for VTRs in most applications (13). In 1962, machines using helical scanning with one-inch wide tape stock were introduced by Ampex, RCA, and JVC (the VR-1000 Quad machines used transverse scanning with two-inch wide tape). By the end of the 1960s, a helical-scan VTR produced by the International Video Corporation (the IVC-9000), using two-inch tape, for the first time exceeded the performance of the Ampex Quad machine, then the broadcast industry standard. The final nail in the coffin of the Quad process came when digital time-base correction circuits were devised in Japan in 1968, which allowed the long-scan helical machines to exceed the stability performance of the shorter track Quad system (14,15). The resulting improved helical-scan VTRs became widely used in broadcast applications in 1970. These systems employed three-quarter-inch wide tapes as opposed to the one-inch tapes of their predecessors. Until this time, even the improved helical-scan machines utilized open reel-to-reel tape formats. In 1970, Sony made a major breakthrough in VTR design by introducing the prototype U-Matic color video recorder. This was an analog, helical-scan machine that utilized three-quarter-inch wide tape packaged in a self-contained cassette. No longer did the tape installation process require careful and time-consuming threading of loose tape by a highly experienced technician; the cassette package contained both source and pickup reels with the tape installed at the factory, and tapes could now be changed in seconds by someone with virtually no prior experience (16). An improved production version of the U-Matic recorder (the VP-1000) was offered for sale by Sony in 1972. Thus was born the videocassette recorder (VCR), and the groundwork was laid for a consumer product (17).
Video Cassette Recording (VCR)
The Ampex Quad machines had always been too complex and expensive to serve as a basis for translation to the consumer market. The Sony VP-1000 was initially conceived as just such a product because its cassette could be readily installed and removed, and its use of advanced, lower cost solid-state technology was a further step in this direction. (Ampex had introduced the first fully transistorized VTR, the VR-1100, in 1962,
but this machine was still large and cumbersome and had setup requirements too complex for it to be considered for consumer use.) The prototype U-Matic machine was a cooperative effort of three Japanese companies: Sony, JVC, and Matsushita (known by its US trade name, Panasonic). For this reason, this three-quarter-inch format was termed 3C in Japan. For its efforts in developing the VCR, Sony was awarded an EMMY in 1976 by the Motion Picture Academy of Arts and Science for technical achievement. In actuality, all three companies made significant contributions to the development of the VCR (18). The early U-Matic helical-scan machines were not an unqualified success for commercial applications. Their performance, as measured by resolution, signal-to-noise ratio (SNR), and stability, was not equal to that of the Quad machines then used by all broadcasters. The U-Matics, however, were compact, simple to operate, and inexpensive enough for field use, such as news gathering and other remotes. On the other hand, they were neither compact enough nor inexpensive enough to support their introduction as a consumer product. The three-quarter-inch cassette was still rather large and expensive. Another problem was their need for sophisticated time-base correction circuits to overcome the timing errors introduced by helical scanning of relatively narrow tape. Broadcasters eventually overcame these problems, first by introducing electronic time-base correction (TBC) in 1972 and later by using frame synchronizers in 1974 (19). At this stage, in 1974, the helical-format machines were potential replacements for Quad machines in commercial applications, but not in the consumer market, because they still cost too much. Introduction of the Consumer VTR. Having achieved a somewhat competitive footing in the commercial marketplace, Sony and the other Japanese manufacturers again focused their attention on developing a consumer VCR. In 1970, Ampex and Toshiba jointly developed a one-half-inch cassette, monochromatic machine for the consumer market. It did not catch on! Neither did comparable machines introduced in 1972 by Panasonic and Philips (20). During this period, JVC approached Ampex with a proposal to jointly develop a one-half-inch color VCR specifically for consumer use. To support their proposal, JVC had constructed a one-quarter-scale (one-half-inch format) working model based on the Ampex VR-1500 two-inch helical-scan machine. The Ampex management was reportedly impressed with the prototype, but it gracefully declined to enter the world of consumer electronics at that time. Ampex, however, licensed JVC to use its technology in such an application. This was to prove a successful financial decision of momentous proportions for Ampex (21). In 1975, Sony introduced a one-half-inch color consumer VCR using the Sony proprietary Beta (helical-scan) format. This machine, the Betamax SL-6300, was relatively compact and simple to operate. The Beta nomenclature was based on considering it a derivative of the U-Matic, or Alpha, format (22).
Within a year, in 1976, JVC refined its prototype one-half-inch VCR using its own, unique helical-scan format and proceeded to introduce it as the Video Home System (VHS). Thus did the format wars begin for the consumer market (23). Although many technically astute persons believed that the Beta format was capable of performance superior to that of the VHS format, within a few years VHS had gained supremacy in the consumer marketplace. Many believe that this success was primarily due to JVC's decision to license its format broadly so as to encourage the availability of a wide spectrum of prerecorded material to the consumer in a short time, during which Beta titles were somewhat limited. Another drawback of the Beta format was its limited recording time compared to VHS. Even so, more than 20 years after their commercial introduction, U-Matic machines (Beta's origin) are still being produced and are widely used in the broadcast industry.
Early Disk (Nonmagnetic) Recording
In 1977, within two years after the introduction of the Sony Betamax, both RCA and Philips introduced videodisk players for the consumer market. Both were play-only systems. The RCA version (called SelectVision) employed a capacitance pickup that monitored variations in a grooved, 12-inch diameter disk rotating at 450 rpm. Each side provided up to 60 minutes of television programming (24). The Philips version of the videodisk was quite different. It used a small laser to retrieve television information from a previously etched, grooveless, spiral track on a 12-inch disk. Philips called it the Video Long Play (VLP) system, after the then-current LP audio records. VLP played back previously programmed disks with a maximum program duration of two hours, with the disk rotating at 150 rpm. A commercial version of the VLP system, offered by JVC and others as the Video High Density (VHD) system, is still used occasionally for special applications, such as institutional training programs and national sales promotions in the automobile business (25). Because both SelectVision and VLP were play-only, with no home recording capability, they did not receive significant consumer acceptance and were soon withdrawn from the marketplace. The following year (1978) saw the introduction of the optical laserdisk as we know it today. These units, by Philips and others, were playback-only machines using factory-prerecorded material. However, they were digitally encoded, and their reproduction performance was much better than that of the earlier VLP (analog) system and well above the performance of even today's VHS one-half-inch VCRs. A simplified description of the optical laserdisk recording and playback process is shown in Fig. 6. Many believe that this process led to the development of the audio Compact Disk introduced by Sony and Philips in 1982 (26). For almost 20 years, until the 1995 introduction of the DVD, the laserdisk reigned as the gold standard for consumer video playback performance. Despite this reputation, the high cost of both the hardware and the software (disks) has limited its penetration into the
Figure 6. Laserdisk recording and playback (simplified): an infrared laser (λ = 780 nm) and beam splitter read the pitted, reflective underside of the disk through a lens.
Figure 7. Composite television waveform for one horizontal line, showing the horizontal sync pulse, blanking level, color burst, and video (luminance) information.
U.S. consumer marketplace. This is not true in Japan, where virtually all new programming is made available simultaneously in both VHS cassette and laserdisk formats. In Japanese electronics outlets, both products remain equally competitive, and laserdisk products are widely available. In the United States, after cumulative sales of more than 140 million VHS VCRs, the laserdisk has been virtually drowned in a sea of VHS tape, as has the Beta tape format.
COMMERCIAL VTR EVOLUTION: COMPONENT AND COMPOSITE DIGITAL TAPE RECORDING
Component Digital Tape Recording
Even as the consumer video industry was attempting to improve the playback performance of one-half-inch VCRs with optical and digital technologies, the commercial videotape recording field was experiencing a revolution of its own. Until the mid-1970s, commercial videotape recording focused on handling the composite video signal, which contains all elements, luminance (brightness), chrominance (color information), and synchronization and timing pulses, within a single envelope (Fig. 7). About the same time in 1978 as the consumer market saw the introduction of the digital laserdisk, the same signal-processing technologies were being applied to commercial recording, first by processing the luminance and chrominance signals separately in the analog domain and subsequently by applying digital techniques to this process. Recording these two analog signal elements on separate video channels was termed component signal recording. It offered the advantage of reducing color artifacts during playback. This process eliminates the use of the 3.58 MHz color subcarrier and therefore results in reduced
susceptibility to cross modulation between component signals. During the early 1980s, two one-half-inch recording formats competed for market share in professional recording: the Sony Betacam SP (superior performance) and the Matsushita M-II formats. They were incompatible with each other because of significantly different signal-processing techniques. The situation was reminiscent of the consumer format conflict (Beta versus VHS) of a decade earlier. The Sony Betacam SP employed time-division multiplexing (TDM) to combine the color component signals and record them alternately on a single color channel, along with two stereo audio subcarriers at 310 kHz and 540 kHz, respectively. The luminance signal is recorded separately on a second channel (27). The M-II format first sums the quadrature I and Q color elements into a single signal, which is then recorded as the color track together with a pair of stereo audio subcarriers at 400 kHz and 700 kHz, respectively. A second channel contains the luminance information (28).
Composite Digital Tape Formats
As the analog component technology evolved, progress was being made in applying digital techniques, first employed in the consumer laserdisk, to commercial videotape recording. In 1980, an international standards body, the International Radio Consultative Committee (CCIR), drafted Recommendation 601, which sought to define the characteristics of component digital recording. CCIR 601 was intended to be the recording standard for studios throughout the world. For many years thereafter, CCIR 601 was altered and modified. Finally, by 1985 the recommendations were firm enough for final approval. At this time Sony announced a digital component tape recording system based on the 601 recommendations. This D-1 system employed a 19-mm wide (approximately three-quarter-inch) tape in a cassette (similar to that employed in the U-Matic), scanned in a helical manner with six tracks per field: two containing video data and four supporting two pairs of stereo audio (29). In 1987, Sony and Ampex, working together, brought forth the D-2 composite digital recording system. Like the D-1 format, the D-2 utilized 19-mm tape and the same cassette as D-1 and provided four channels (two stereo pairs) of digital audio in addition to the video data. The scanning is also helical, with six tracks per field (30).
The principal difference between the D-2 and D-1 formats is that D-2 is designed for a composite input signal which contains all of the color signal elements within a single envelope. Therefore the D-2 format is primarily for field recording whereas D-1 is for both studio and postproduction editing.
CONSUMER VIDEO-DIGITAL VERSATILE DISC (DVD)

For almost 20 years following its introduction in 1976, the VHS analog tape format remained preeminent in the consumer marketplace for video recording and playback. This was true notwithstanding the ability of the laserdisk to provide playback performance much superior to that of even the enhanced VHS format, Super-VHS (S-VHS). In 1995, barely 10 years after the highly successful introduction of the compact disk (CD), which provided digital, laser-scanned playback, an analogous format, the digital versatile disk (DVD), became available for video playback. Like the laserdisk before it, the DVD, as currently available, is a playback-only format. The DVD is built on experience gained from the audio CD. However, it uses both a higher frequency (and shorter wavelength, red laser at 635 nm) and much denser physical packing parameters (track spacing, groove width, and length) to support an almost 6 : 1 increase in information storage capacity. These improvements were not enough in themselves to provide the 10 : 1 storage improvement needed to go from 74 minutes of audio data (640-680 MB of information) to the 4,000 MB needed to provide over two hours (typically, 135 minutes) of standard resolution (NTSC) television programming. Data compression is used as well. As is the case for the audio CD, the laser scans the reflective surface of the DVD disk from below, and like the audio CD, the DVD (as available today) uses factory prerecorded programming on a CD-like medium (12 cm, about 4.7 in., in diameter) (Fig. 8) (31).
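A rough feel for these numbers can be had with a little arithmetic. The sketch below (Python) uses only the capacity and playing-time figures quoted above; the uncompressed studio-video rate used for comparison is an assumption for illustration, not a figure from this article.

import numpy as np

cd_capacity_mb = 680.0            # audio CD, about 74 minutes of music
dvd_capacity_mb = 4000.0          # DVD capacity figure cited above
print(f"physical capacity gain: about {dvd_capacity_mb / cd_capacity_mb:.1f} : 1")   # ~5.9 : 1

playing_time_s = 135 * 60         # typical feature length quoted above
avg_rate_mbit_s = dvd_capacity_mb * 8 / playing_time_s
print(f"average available bit rate: about {avg_rate_mbit_s:.1f} Mbit/s")             # ~4 Mbit/s

# Uncompressed studio-quality digital component video runs on the order of
# a couple of hundred Mbit/s (assumed value for comparison only), so
# compression on the order of tens-to-one is what makes DVD video possible.
assumed_uncompressed_mbit_s = 216.0
print(f"required compression: roughly {assumed_uncompressed_mbit_s / avg_rate_mbit_s:.0f} : 1")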
Figure 8. Digital versatile disk scanning (simplified).
FUTURE DVD ENHANCEMENTS

Even as DVD is being initially deployed in the marketplace, companies are investigating enhancement technologies to provide increased (double, quadruple, or higher) information storage capacity. One possible advance, derived directly from present technology, seeks to replace the present binary digital encoding scheme with a multilevel scheme (nine states) that interprets the depth of the recorded pits to allow for coding beyond simply one or zero. The ability to read nine levels (from zero to eight) allows for a CD-sized disk capable of storing over two hours of HDTV-formatted television programming including three distinct stereo audio channels (32).

A second potential avenue of increased storage capability focuses on substituting a blue light laser at a wavelength of 390 nm for the red light laser at 635 nm currently employed in DVD recording. Because data storage capacity follows an inverse-square relationship with wavelength, this blue light technology could provide more than a 2.5 : 1 increase. Combining this with the use of smaller pit dimensions and closer track spacing, the total improvement could reach 4 : 1. This would lead to a storage capacity of over 15 GB on a 12 cm disk, enough capacity to provide five to seven hours of standard resolution (480 lines, NTSC) television or as much as two hours of HDTV (at 1,080 lines) (33).

DVD HOME RECORDING

Even as the aforementioned investigations of increased capacity are being pursued, attention is also being given to technologies that address the ability of the consumer to record at home, thereby creating a real replacement for the VCR. At least two main approaches are being followed at this time: optical phase change (OPC) and magneto-optical (MO). OPC technology exploits the heating effect of laser energy to alter the reflectivity of the DVD disk so that this change can be read optically as a phase difference. The current DVD technology reads the signal amplitude digitally as one of a pair of binary states, either one or zero.

Magneto-optical (MO) technology is not really new. It has been used for digital data storage for some time, and it is the basis of the recordable MiniDisc audio system introduced by Sony in 1992. As used in the MiniDisc (MD) system, virtually simultaneous erasure and recording is possible by having both a magnetic recording head and a heating eraser laser coaxially located, respectively, above and below the recording disk (Fig. 9). To record, the laser first heats the disk to its Curie point (about 400 °F). This causes the magnetic orientation of that portion of the disk to become totally random. Next, the magnetic recording head above the disk reorganizes the magnetic particles on the disk to represent one of the two binary states, one or zero. Then this orientation can be read back during playback, analogously to the digital optical process used in the current DVDs (33). Once the disk has cooled down, its magnetic orientation is firmly set, even in the presence of significant magnetic fields. It can be altered only by applying the equivalent of the laser heating cycle.

Figure 9. Magneto-optical recording.

The OPC approach is quite attractive from a manufacturing perspective because it employs many of the components and circuits of the current DVD system. However, it is slightly limited in the number of write/rewrite cycles (to between 100,000 and one million), whereas the MO system has virtually no limit on its rewrite capability (34).

NONLINEAR RECORDING-COMMERCIAL VIDEO SERVERS

A video server is a computer-based system that provides storage and playback of a variety of video formats. It consists of two major components (Fig. 10): the computer, which serves as an interface between the video environment and the storage device, and the hard disk storage device (35). During recording, the computer converts the input video information (both video and audio) into a digital data stream formatted for storage on a hard disk storage array. The input can be either analog (NTSC or component) or digital (uncompressed or compressed) in format. The resulting digital data stream, delivered to the hard drive, can be either uncompressed or compressed. Compressing the data prior to storage provides for efficient use of storage capacity at the cost of a minor reduction in video quality during playback.
During playback, the data file is retrieved from the storage device and converted back into an appropriate television format. This playback format may be either the same as, or different from, the input, as the operator may desire. The process is termed nonlinear because video information can be accessed from any segment of the complete file rather than linearly, starting from the beginning of the file, as is the case for tape. Video server applications include broadcast editing, commercial insertion, and preparation of interstitial insertion material (clip reels). An aggressive adopter of this technology is the television news sector, which employs the nonlinear instant-access capability of video server systems to enhance its creation of timely news programming. Video servers can also be used to provide pay-per-view and near-video-on-demand (NVOD) services to subscribers on a delivery system that has feedback capability from the subscriber's location to the server site (36). Video server technology is now in a period of rapid growth driven by constantly improving technology. Philips, Sony, and Hewlett-Packard are some of the companies currently active in this field (37).

The storage array concept allows multiple discrete disks to act as a single unit. The use of such multiple disks and drives provides redundancy in the event a single device (either disk or drive) fails. Even when the storage data stream is compressed, multiple parallel drives are required to support the data transfer rate. A typical disk drive supports a data rate of three to five MB/s. A full digital component data stream can produce a rate of 35 MB/s. Multiple disk drives combined with parallel data paths can support up to 40 MB/s. Use of compression techniques can reduce the 35 MB/s stream to an effective rate of less than 20 MB/s, which can then be handled by a single-path storage array (38). Because current storage technology is evolving at such a rapid pace, it is virtually impossible to predict the eventual structure of these nonlinear systems, even in the near future.

Figure 10. A basic video server.

BIBLIOGRAPHY

1. S. Wolpin, Invention Technol. 10(2), 52 (1994).
2. A. Abramson, SMPTE J. 64, 72 (1955).
3. C. Ginsburg, The Birth of Video Recording, SMPTE 82nd Convention, San Francisco, CA, October 5, 1957.
4. R. Mueller, AV Video 7, 72 (1985).
5. J. Lardner, Fast Forward, W. W. Norton, New York, 1987.
6. D. Mennie, IEEE Spectrum 12, 34, August 1975.
7. P. Lubell, IEEE Spectrum 32, 32, August 1995.
8. A. Abramson, SMPTE J. 90, 508 (1981).
9. Ref. 3, p. 74.
10. Ref. 5, pp. 51-52, and N. Heller, Getting a grip on tape, Video Systems 12, 26 (1986).
11. Ref. 3, pp. 74-75.
12. D. Rushin, HiFi to high definition: Five decades of magnetic tape, dB Magazine, Jan./Feb. 1991.
13. R. Warner Jr., Earl Masterson, IEEE Spectrum 33, 51, February 1996.
14. A. Harris, SMPTE J. 70, 489 (1961).
15. C. Bentz, Video Systems 11, 49 (1985).
16. Ref. 4, p. 73.
17. S. Stalos, AV Video 7, 75 (1985).
18. Ref. 17, p. 76.
19. C. Bentz, Broadcast Engineering 27, 80 (1985).
20. Ref. 2, p. 74.
21. Ref. 13, p. 57.
22. N. Kihara et al., IEEE Trans. Consum. Electron. 26, 56 (1976).
23. Ref. 5, pp. 145-149.
24. W. Workman, The videodisk player, 25/4, Session 25, ELECTRO/81 Prof. Program Session Record, New York, April 1981.
25. K. Compaan and P. Kramer, SMPTE J. 83, 564 (1974).
26. P. Rice and R. Dubbe, SMPTE J. 91, 237 (1982).
27. K. Sadashige, SMPTE J. 98, 25-31 (1989).
28. J. Roizen, TV Broadcast 8, 68 (1985).
29. R. Hartman, SMPTE J. 95, 12-15 (1986).
30. D. Brush, SMPTE J. 95, 12-15 (1986).
31. Ref. 7, p. 33.
32. T. Wong and M. O'Neill, 15 GB per side and no blue laser, Data Storage, April 1997, pp. 80-83.
33. MD Rainbow Book, Sony Standard, San Francisco, CA, 1991.
34. Phase-Change Rewritables, Sony Product Release, Palo Alto, CA, September 1997.
35. C. Bernstein, SMPTE J. 106, 511-518 (1997).
36. P. Hejtmanek, Broadcast Eng. 38, 27 (1996).
37. P. Hejtmanek, Broadcast Eng. 39, 36 (1997).
38. Digital video servers for on-air broadcast use, SMPTE Seminar, NAB Convention, Las Vegas, NV, April 1997.
W

WAVELET TRANSFORMS
RAGHUVEER RAO
Rochester Institute of Technology, Rochester, NY

INTRODUCTION

In a 1984 paper, Grossman and Morlet (1) investigated the problem of representing square-integrable functions by using shifts and dilations of a single function called a wavelet. The motivation lay in the fact that there is evidence for the usefulness of such representations in analyzing seismic traces (2). Although earlier articles dealing with similar problems can be found, for example (3), the Grossman-Morlet paper provides an integral transform along with conditions under which one can invert the transform. Thus was born the field of wavelet transforms. Subsequent developments in the field, such as the formulation of discrete basis functions, the relationship to subband filtering, and the application to signal/image processing, have contributed to widespread interest in the subject (4). The purpose of this article is to provide a short overview of wavelet transforms and their application to certain image processing problems.
WHAT ARE WAVELETS?

Wavelets are continuous time functions that integrate to zero and are square-integrable. Thus, a one-dimensional wavelet ψ(t) satisfies

∫_{−∞}^{∞} ψ(t) dt = 0    (1)

and

∫_{−∞}^{∞} |ψ(t)|² dt < ∞.    (2)

A wavelet can be real or complex-valued. The function argument t is usually regarded as time. An example of a wavelet is shown in Fig. 1. The reason for using the term wavelet is attributed to the resemblance of the function to a decaying wave. Wavelets are band-pass functions in the sense that they have a Fourier transform magnitude that vanishes at the origin and that peaks at a finite frequency. Consider

ψ_{a,b}(t) ≜ (1/√|a|) ψ((t − b)/a),    (3)

where a > 0 and b are real-valued and the symbol ≜ stands for "defined as." The function ψ_{a,b}(t) is a mother wavelet ψ(t) dilated by a factor a and shifted in time by an amount b. The family of wavelets ψ_{a,b}(t) parameterized by the dilation or scale a and the shift b are sometimes referred to as the daughter wavelets of ψ(t). The energy or squared norm of each member of this family is the same because of the normalization factor 1/√|a| in Eq. (3). Figure 2 shows two daughter wavelets of the mother wavelet of Fig. 1.

Figure 1. A wavelet.

Figure 2. Wavelet dilations (daughter wavelets for a = 1.5, b = 0 and a = 0.5, b = 0).

CONTINUOUS WAVELET TRANSFORM

The continuous wavelet transform (CWT) of a finite-norm, one-dimensional signal s(t) with respect to a mother wavelet ψ(t) is given by

S(a, b) = ∫_{−∞}^{∞} s(t) ψ*_{a,b}(t) dt,    (4)

where the asterisk denotes complex conjugation. Defining the inner product of two functions x(t) and y(t) as ⟨x(t), y(t)⟩ ≜ ∫_{−∞}^{∞} x(t) y*(t) dt, the CWT becomes

S(a, b) = ⟨s(t), ψ_{a,b}(t)⟩.    (5)

Thus, the wavelet transform of a signal with respect to a mother wavelet is simply the collection of projections of the signal onto the set of daughter wavelets as a function of a and b. Due to the effectively finite duration and bandwidth of wavelets, these transforms localize the signal information in both time and frequency. A scalogram, which is the squared magnitude of the wavelet transform, is usually plotted to view the time versus scale information. As an example, consider the chirp waveform illustrated in Fig. 3. Its scalogram with respect to the Mexican hat wavelet (shown in Fig. 4),

ψ(t) = (1 − 2t²) exp[−t²],    (6)

is displayed in Fig. 5.

Figure 3. Chirp function sin(0.4πt²).

Figure 4. Mexican hat wavelet.

Figure 5. Scalogram of chirp. See color insert.

The scale variable bears an inverse relationship to frequency. As dilation increases, the zero crossing rate of the daughter wavelet decreases. Therefore, the wavelet transform at large values of a provides information about the lower frequency spectrum of the analyzed signal. Conversely, information contained at the higher end of the frequency spectrum is obtained for very small values of a. Thus, as seen in Fig. 5, the scalogram peaks occupy continually lower positions along the scale axis as time progresses. This corresponds to the increasing frequency of the chirp with time, as seen in Fig. 3.

An elegant expression exists for the inverse wavelet transform if a wavelet satisfies the admissibility condition. Suppose that

Ψ(ω) = ∫_{−∞}^{∞} ψ(t) exp(−iωt) dt    (7)

is the Fourier transform of a wavelet ψ(t). Then, the wavelet is said to satisfy the admissibility condition if

C ≜ ∫_{−∞}^{∞} (|Ψ(ω)|² / |ω|) dω < ∞.    (8)

When the admissibility condition is satisfied, the inverse continuous wavelet transform is given by

s(t) = (1/C) ∫_{−∞}^{∞} ∫_{−∞}^{∞} (1/|a|²) S(a, b) ψ_{a,b}(t) da db.    (9)
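The CWT of Eq. (4) is straightforward to evaluate numerically. The sketch below (Python/NumPy, a minimal illustration written for this article rather than taken from it) discretizes the integral on a grid of scales a and shifts b for the chirp of Fig. 3 and the Mexican hat wavelet of Eq. (6); the grid sizes are arbitrary choices.

import numpy as np

def mexican_hat(t):
    # Mother wavelet of Eq. (6)
    return (1.0 - 2.0 * t**2) * np.exp(-t**2)

def cwt_scalogram(s, t, scales):
    """Discretize Eq. (4): S(a, b) = integral of s(t) * psi_{a,b}(t) dt (real wavelet, so no conjugate)."""
    dt = t[1] - t[0]
    S = np.zeros((len(scales), len(t)))
    for i, a in enumerate(scales):
        for j, b in enumerate(t):
            psi_ab = mexican_hat((t - b) / a) / np.sqrt(abs(a))   # daughter wavelet, Eq. (3)
            S[i, j] = np.sum(s * psi_ab) * dt
    return S**2                                                   # scalogram = squared magnitude

t = np.linspace(0.0, 5.0, 512)
s = np.sin(0.4 * np.pi * t**2)            # chirp of Fig. 3
scales = np.linspace(0.05, 3.0, 40)
scalogram = cwt_scalogram(s, t, scales)

# The scale at which the scalogram peaks drifts toward smaller a at later times,
# mirroring the increasing frequency of the chirp (compare Fig. 5).
for tc in (2.0, 4.5):
    j = np.argmin(np.abs(t - tc))
    print(f"t = {tc}: peak scale = {scales[np.argmax(scalogram[:, j])]:.2f}")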
DISCRETE WAVELET TRANSFORM

It is possible to find mother wavelets so that one can synthesize functions from a set of their daughter wavelets whose dilation and shift parameters are indexed by the set of integers. In particular, dilations and translations from a dyadic set,

{ψ_{a,b}(t) : a = 2^k, b = 2^k i, i and k integers},    (10)

provide the daughter wavelets. Consider finite-norm functions φ(t) and φ̃(t) that have nonvanishing integrals satisfying

φ(t) = 2 Σ_k h(k) φ(2t − k),    (11)

φ̃(t) = 2 Σ_k h̃(k) φ̃(2t − k),    (12)

and

⟨φ(t), φ̃(t − k)⟩ = δ(k),    (13)

where δ(k) ≜ 1 for k = 0 and 0 otherwise. The functions φ(t) and φ̃(t) are called scaling functions. Equations (11) and (12), called the dilation equations or two-scale relations, indicate that the scaling functions are generated as linear combinations of their own dyadic shifts and dilations. Equation (13) imposes orthogonality between one scaling function and integral translations of the other. Two conditions that follow from Eqs. (11)-(13) are

Σ_k h(k) = Σ_k h̃(k) = 1    (14)

and

Σ_k h(k) h̃(k + 2n) = δ(n)/2.    (15)

Defining

H(ω) ≜ Σ_n h(n) exp(−iωn)  and  H̃(ω) ≜ Σ_n h̃(n) exp(−iωn),    (16)

in the frequency domain, Eqs. (14) and (15) become

H(0) = H̃(0) = 1,    (17)

H(ω)H̃*(ω) + H(ω + π)H̃*(ω + π) = 1.    (18)

If the sequences h(·) and h̃(·) are viewed as impulse responses of discrete-time filters, then H(ω) and H̃(ω) are their frequency responses. It follows from Eqs. (17) and (18) that the product H(π)H̃*(π) = 0. Although this condition is satisfied by letting just one of the frequency responses be zero at a frequency of π radians, the interesting and practical cases require that both frequency responses be zero at π radians. Consequently, the filters assume low-pass characteristics. Wavelets are constructed from scaling functions by forming sequences g(n) and g̃(n) as

g(n) = (−1)^{1−n} h̃(1 − n),  g̃(n) = (−1)^{1−n} h(1 − n),    (19)

and then using the two-scale relations given by Eqs. (11) and (12):

ψ(t) = 2 Σ_k g(k) φ(2t − k),  ψ̃(t) = 2 Σ_k g̃(k) φ̃(2t − k).    (20)

The resulting wavelet ψ(t) is orthogonal to φ̃(t) and its integer translates. Similarly, the wavelet ψ̃(t) is orthogonal to φ(t) and its integer translates. The wavelet pair ψ(t) and ψ̃(t) are said to be duals of each other. Along with their dyadic dilations and translations, they form a biorthogonal basis for the space of square-integrable functions. Given a square-integrable function f(t),

f(t) = Σ_k Σ_i a_{k,i} ψ_{2^k, 2^k i}(t),    (21)

where

a_{k,i} = ⟨f(t), ψ̃_{2^k, 2^k i}(t)⟩.    (22)

The special case of an orthogonal wavelet basis arises when the filter coefficients and scaling functions are their own biorthogonal duals, that is, h(·) = h̃(·) and φ̃(t) = φ(t). This results in a single wavelet ψ(t) that is orthogonal to its own integral translates, as well as all of its dyadic dilations. The earliest known orthogonal wavelet is the Haar wavelet, which appeared in the literature several decades before wavelet transforms (5) and is given by

ψ(t) = 1 for 0 ≤ t < 1/2,  −1 for 1/2 ≤ t < 1,  0 otherwise.    (23)

The Haar wavelet is a special case of an important class of orthogonal wavelets due to Daubechies (6). The coefficient sequence h(k) in the dilation equation for this class of wavelets consists of a finite and even number of coefficients. The resulting scaling function and wavelet are compactly supported. A particularly famous example¹ is one involving four coefficients, h(0) = (1 + √3)/8, h(1) = (3 + √3)/8, h(2) = (3 − √3)/8, and h(3) = (1 − √3)/8. The corresponding scaling function and wavelet are shown in Fig. 6.

Figure 6. Daubechies four-tap filter. Top: scaling function. Bottom: wavelet.

¹ The Haar wavelet has two coefficients, h(0) = 1/2 and h(1) = 1/2.
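The conditions above are easy to check numerically. The short sketch below (Python/NumPy, written for this article's normalization in which the coefficients sum to one, and using the orthogonal case h̃ = h) verifies Eqs. (14) and (15) for the four-tap Daubechies sequence just given and forms the wavelet sequence of Eq. (19).

import numpy as np

r3 = np.sqrt(3.0)
h = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / 8.0    # h(0)..h(3)

# Eq. (14): the coefficients sum to one.
print(np.isclose(h.sum(), 1.0))                          # True

# Eq. (15): sum_k h(k) h(k + 2n) = delta(n)/2 for the orthogonal case.
for n in range(-1, 2):
    acc = sum(h[k] * h[k + 2 * n]
              for k in range(len(h)) if 0 <= k + 2 * n < len(h))
    print(n, np.isclose(acc, 0.5 if n == 0 else 0.0))    # True for each n

# Eq. (19): g(n) = (-1)**(1 - n) * h(1 - n); a wavelet sequence sums to zero.
g = np.array([(-1) ** (1 - n) * h[1 - n] for n in range(-2, 2)])
print(np.isclose(g.sum(), 0.0))                          # True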
MULTIRESOLUTION ANALYSIS AND DIGITAL FILTERING IMPLEMENTATION

Let V_k denote the vector space spanned by the set {φ(2^{−k}t − ℓ) : ℓ an integer} for an integer k. By virtue of Eq. (11), the vector spaces display the nesting

· · · ⊂ V_1 ⊂ V_0 ⊂ V_{−1} ⊂ · · ·.    (24)

There are other interesting properties:

x(t) ∈ V_k ⇔ x(2t) ∈ V_{k−1},    (25)

lim_{k→−∞} V_k = L²(R),    (26)

⋂_{k=−∞}^{∞} V_k = 0(t),    (27)

where 0(t) is the function whose value is identically zero everywhere. The properties given by Eqs. (24)-(27) make the set of vector spaces {V_k : k = 0, ±1, ±2, . . .} a multiresolution analysis (MRA). Obviously, an MRA is also generated by {Ṽ_k : k = 0, ±1, ±2, . . .}, where Ṽ_k is the vector space spanned by {φ̃(2^{−k}t − ℓ) : ℓ an integer}.

Suppose that x_k(t) is some function in the vector space V_k. Expanded in terms of the basis functions of V_k,

x_k(t) = Σ_n x(k, n) φ(2^{−k}t − n),    (28)

where, by Eq. (13),

x(k, n) = ⟨x_k(t), 2^{−k} φ̃(2^{−k}t − n)⟩.

Notice how the coefficients of the expansion are generated by projecting x_k(t) not onto V_k but rather onto Ṽ_k. Now, suppose that we have a square-integrable signal f(t) that is not necessarily contained in any of the vector spaces V_k for a finite k. Approximations to this signal in each vector space of the MRA can be obtained as projections of f(t) on these vector spaces as follows. First, inner products f_{k,n} are formed as

f_{k,n} = ⟨f(t), 2^{−k} φ̃(2^{−k}t − n)⟩.    (29)

Then, the approximation f_k(t) of f(t) in V_k, referred to as the approximation at level k, is constructed as

f_k(t) = Σ_n f_{k,n} φ(2^{−k}t − n).    (30)

For any k, f_k(t) is a coarser approximation to the signal f(t) than f_{k−1}(t). Thus, there is a hierarchy of approximations to the function f(t). As k decreases, we get increasingly finer approximations. Hence the term multiresolution analysis. The detail function g_k(t) at level k is defined as the difference between the approximation at that level and the next finer level,

g_k(t) = f_{k−1}(t) − f_k(t).    (31)

The detail represents the information lost in going from one level of approximation to the next coarser level and can be represented as a wavelet expansion at dilation 2^k as

g_k(t) = Σ_i a_{k,i} ψ_{2^k, 2^k i}(t),    (32)

where the coefficients a_{k,i} are exactly as in Eq. (22). Because

f_k(t) = Σ_{j=k+1}^{∞} g_j(t),    (33)

as can be seen by repeated application of Eq. (31), and

f(t) = lim_{k→−∞} f_k(t),    (34)

Eqs. (32)-(34) lead to the result in Eq. (21).

There is a simple digital filtering scheme available to determine the approximation and detail coefficients, once the approximation coefficients are found at a given resolution. Suppose that we have determined coefficients f_{0,n} at level 0. Then, the next coarser or lower² level approximation coefficients f_{1,n} are found by passing the sequence f_{0,n} through a digital filter whose impulse response is given by the sequence 2h̃(−n) and then retaining every other sample of the output.³ Here h̃(n) is the coefficient sequence in the dilation equation (12). Similarly, the detail coefficients a_{1,n} are found by passing the sequence f_{0,n} through a digital filter whose impulse response is given by the sequence 2g̃(−n) and once again retaining every other sample of the filter output; we define g̃(n) as the coefficient sequence in Eq. (19). The block diagram in Fig. 7 illustrates these operations. The circle that has the down arrow represents the downsampling-by-two operation. Given an input sequence x(n), this block generates the output sequence x(2n); that is, it retains only the even-indexed samples of the input. The combination of filtering and downsampling is called decimation.

² Lower because of the lower resolution in the approximation. Thus, approximation levels go lower as k gets higher.
³ The reader unfamiliar with digital filtering terminology may refer to any of a number of textbooks, for example, Mitra (7).

Figure 7. Block diagram of decimation.

It is also possible to obtain the approximation coefficients at a finer level from the approximation and detail coefficients at the next lower level of resolution by the digital filtering operation of interpolation, for which the block diagram is shown in Fig. 8. The circle that has the up arrow represents upsampling by a factor of 2. For an input sequence x(n), this operation results in an output sequence that has a value zero for odd n and the value x(n/2) for even n. The process of upsampling followed by filtering is called interpolation.

In most signal processing applications where the data samples are in discrete time, wavelet decomposition has come to mean the filtering of the input signal at multiple stages of the arrangement in Fig. 7. An N-level decomposition uses N such stages, yielding one set of approximation coefficients and N sets of detail coefficients. Wavelet reconstruction means the processing of approximation and detail coefficients of a decomposition through multiple stages of the arrangement in Fig. 8. A block diagram of a two-stage decomposition-reconstruction scheme is shown in Fig. 9.

Figure 8. Block diagram of interpolation.

Figure 9. Two-level wavelet decomposition and reconstruction.
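As a concrete illustration of Figs. 7 and 8, the following sketch (Python/NumPy; a minimal one-level example using the Haar coefficients of footnote 1 in the orthogonal case h̃ = h, g̃ = g, with an even-length signal so no boundary handling is needed) performs one stage of decimation and one stage of interpolation and recovers the input exactly.

import numpy as np

# Haar sequences in this article's normalization: h(0) = h(1) = 1/2 (footnote 1),
# and g(n) = (-1)**(1 - n) h(1 - n) from Eq. (19), so g(0) = -1/2, g(1) = 1/2.
h = np.array([0.5, 0.5])
g = np.array([-0.5, 0.5])

def decimate(f0):
    """Fig. 7: filter with 2h(-n) and 2g(-n), then keep every other output sample."""
    n_out = len(f0) // 2
    approx = np.array([2 * np.dot(h, f0[2 * m:2 * m + 2]) for m in range(n_out)])
    detail = np.array([2 * np.dot(g, f0[2 * m:2 * m + 2]) for m in range(n_out)])
    return approx, detail

def interpolate(approx, detail):
    """Fig. 8: upsample by 2, filter with h(n) and g(n), and sum the two branches."""
    f0 = np.zeros(2 * len(approx))
    for m in range(len(approx)):
        f0[2 * m:2 * m + 2] += h * approx[m] + g * detail[m]
    return f0

f0 = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a1, d1 = decimate(f0)
print(a1)                                      # pairwise sums:        [10. 22. 14. 10.]
print(d1)                                      # pairwise differences: [ 2.  2. -2.  0.]
print(np.allclose(interpolate(a1, d1), f0))    # True: perfect reconstruction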
Application to Image Processing

Wavelet applications of image processing are based on exploiting the localization properties of the wavelet transform in space and spatial frequency. Noise removal, or what has come to be known as image denoising, is a popular application of the wavelet transform, as is image compression. Other types of two-dimensional (2-D) wavelet constructs are possible, but most applications involve separable wavelet basis functions that are relatively straightforward extensions of 1-D basis functions along the two image axes. For an orthogonal system, these basis functions are

ψ_a(x, y) = ψ(x)φ(y),  ψ_b(x, y) = ψ(y)φ(x),  ψ_c(x, y) = ψ(x)ψ(y).    (35)

The scaling function

φ_d(x, y) = φ(x)φ(y)    (36)

also comes in handy when the wavelet transform or decomposition is carried out across a small number of scales. Because images are of finite extent, there are a finite number of coefficients associated with the 2-D wavelet expansion on any dyadic scale. The number of coefficients on a given scale is one-quarter the number of coefficients on the next finer scale. This permits arranging the wavelet coefficients in pyramidal form as shown in Fig. 10. The top left corner of the transform in an N-level decomposition is a projection of the image on φ_d(x, y) at a dilation of 2^N and is called the low-resolution component of the wavelet decomposition. The other coefficients are projections on the wavelets ψ_a(x, y), ψ_b(x, y), and ψ_c(x, y) on various scales and are called the detail coefficients. At each level, the coefficients with respect to these three wavelets are seen in the bottom-left, top-right, and bottom-right sections, respectively. As can be seen from the figure, the detail coefficients retain edge-related information in the input image.

Figure 10. Top: image "Barbara." Bottom: its two-level wavelet transform.

Wavelet expansions converge faster around edges than Fourier or discrete cosine expansions, a fact exploited in compression and denoising applications. Most types of image noise contribute to wavelet expansions principally in the high-frequency detail coefficients. Thus, wavelet transformation followed by a suitable threshold zeroes out many of these coefficients. A subsequent image reconstruction results in image denoising and minimal edge distortion. An example is shown in Fig. 11.

Figure 11. Illustration of image reconstruction using a fraction of its wavelet transform coefficients.
1450
WEATHER RADAR
into the reason. The fast convergence rate of wavelet expansions is the key to the success of wavelet transforms in image compression. Among known linear transforms, wavelet transforms provide the fastest convergence in the neighborhood of point singularities. Although they do not necessarily provide the fastest convergence, along edges they still converge faster than the discrete Fourier or discrete cosine transforms (DCT) (8). Consequently, good image reconstruction can be obtained by retaining a small number of wavelet coefficients (9). This is demonstrated in Fig. 11 where a reconstruction is performed using only 10% of wavelet coefficients. These coefficients were chosen by sorting them in descending order of magnitudes and retaining the first 10%. This method of reducing the number of coefficients and, consequently, reducing the number of bits used to represent the data is called zonal sampling (10). Zonal sampling is only one component in achieving high compression rates. Quantization and entropy coding are additional components. An examination of the wavelet transform in Fig. 10 reveals vast areas that are close to zero in value especially in the detail coefficients. This is typical of wavelet transforms of most natural scenes and is what wavelet transform-based compression algorithms exploit the most. An example of a compression technique that demonstrates this is the Set Partitioning in Hierarchical Trees algorithm due to Said and Pearlman (10,11). Yet another approach is to be found in the FBI fingerprint image compression standard (12) which uses run-length encoding. Most values in the detail regions can be forced to zero by applying a small threshold. Contiguous sections of zeros can then be coded simply as the number of zeros in that section. This increases the compression ratio because we do not have to reserve a certain number of bits for each coefficient that has a value of zero. For example, suppose that we estimate the maximum number of contiguous zeros ever to appear in the wavelet transform of an image at about 32,000. Then, we can represent the number of zeros in a section of contiguous zeros using a 16-bit binary number. Contrast this with the situation where every coefficient, including those that have a value of zero after thresholding, is individually coded by using binary digits. Then, we would require at least one bit per zero-valued coefficient. Thus, a section of 1,000 contiguous zeros would require 1,000 bits to represent it as opposed to just 16 bits. This approach of coding contiguous sections of zeros is called run-length encoding. Run-length encoding is used as part of the FBI’s wavelet transform-based fingerprint image compression scheme. After wavelet transformation of the fingerprint image, the coefficients are quantized and coefficients close to zero are forced to zero. Run-length encoding followed by entropy coding, using a Huffman code, is performed. Details of the scheme can be found in various articles, for example, (12). The advantages of wavelet transform in compression have made it the basis for the new JPEG-2000 image compression standard. Information regarding this standard may be found elsewhere in the encyclopedia.
BIBLIOGRAPHY 1. A. Grossman and J. Morlet, SIAM J. Math. Anal. 723–736 (1984). 2. J. Morlet, Proc. 51st Annu. Meet. Soc. Exploration Geophys., Los Angeles, 1981. 3. C. W. Helstrom, IEEE Trans. Inf. Theory 12, 81–82 (1966). 4. I. Daubechies, Proc. IEEE 84(4), 510–513 (1996). 5. A. Haar, Math. Annal. 69, 331–371 (1910). 6. I. Daubechies, Ten Lectures on Wavelets, SIAM, Philadelphia, 1992. 7. S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, McGraw-Hill Irwin, Boston, 2001. 8. D. L. Donoho and M. R. Duncan, Proc. SPIE, 4,056, 12–30 (2000). 9. R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory and Applications, Addison-Wesley Longman, Reading, MA, 1998. 10. N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video, PrenticeHall, Englewood Cliffs, NJ, 1984. 11. A. Said and W. A. Pearlman, IEEE Trans. Circuits Syst. Video Technol. 6(3), 243–250 (1996). 12. C. M. Brislawn, J. N. Bradley, R. J. Onyshczak, and T. Hopper, Proc. SPIE — Int. Soc. Opt. Eng. (USA) 2,847, 344–355 (1996).
WEATHER RADAR

ROBERT M. RAUBER
University of Illinois at Urbana-Champaign, Urbana, IL
This article contains a brief overview of the history of meteorological radars, presents the operating principles of these radars, and explains important applications of radar in the atmospheric sciences. Meteorological radars transmit short pulses of electromagnetic radiation at microwave or radio frequencies and detect energy backscattered toward the radar's antenna by scattering elements in the atmosphere. Radiation emitted by radar is scattered by water and ice particles, insects, other objects in the path of the beam, and refractive index heterogeneities in air density and humidity. The returned signal is the combination of radiation backscattered toward the radar by each scattering element within the volume illuminated by a radar pulse. Meteorologists use the amplitude, phase, and polarization state of the backscattered energy to deduce the location and intensity of precipitation, the wind speed in the direction of the radar beam, and precipitation characteristics, such as rain versus hail.

HISTORICAL OVERVIEW

Radar, an acronym for radio detection and ranging, was initially developed to detect aircraft and ships remotely.
During the late 1930s, military radar applications for aircraft detection were adopted by Britain, Germany, and the United States, but these radars were limited to very low frequencies (0.2–0.4 GHz) and low power output. In 1940, the highly secret British invention of the cavity magnetron permitted radars to operate at higher frequencies (3–10 GHz) and high power output, allowing the Allies of World War II to detect aircraft and ships at long ranges. Studies of atmospheric phenomenon by radar began almost as soon as the first radars were used. These studies were initiated because weather and atmospheric echoes represented undesirable clutter that hampered detection of military targets. Particularly problematic atmospheric echoes were caused by storms and the anomalous propagation of radar beams. Because of extreme security, studies of these phenomena went unpublished until after the war. A key advance during World War II was the development of the theory relating the magnitude of echo intensity and attenuation to the type and size of drops and ice particles illuminated by the radar beam. Theoretical analyses, based on Mie scattering principles, predicted a host of phenomena that were subsequently observed,
such as the bright band at the melting level and the high reflectivity of very large hailstones. The first weather radar equations were published in 1947. Slightly modified, these equations remain today as the foundation of radar meteorology. The first large weather-related field project for a nonmilitary application, the ‘‘Thunderstorm Project,’’ was organized after the war to study coastal and inland thunderstorms. Data from this and other projects stimulated interest in a national network of weather radars. Enthusiasm for a national network was spurred by efforts to estimate precipitation from radar measurements. The discovery of the ‘‘hook echo’’ and its association with a tornado (Fig. 1) led to widespread optimism that tornadoes may be identified by radar. Following the war, several surplus military radars were adapted for weather observation. Beginning in 1957, these were replaced by the Weather Surveillance Radar (WSR-57), which became the backbone of the U.S. National Weather Service radar network until the WSR-88D Doppler radars were installed three decades later. Radar meteorological research in the decades between 1950 and 1970 focused primarily on studies of the physics of precipitation formation, precipitation measurement,
Figure 1. First published photograph of a thunderstorm hook echo observed on a radarscope. The tornado, which was located at the southern end of the hook, occurred just north of Champaign, IL, on 9 April, 1953. (From G. E. Stout and F. A. Huff, Radar records Illinois tornadogenesis. Bulletin of the American Meteorological Society 34, 281-284 (1953). Courtesy of Glenn E. Stout and the American Meteorological Society.)
storm structure, and severe storm monitoring. The first coordinated research flights through large cyclonic storms were conducted in conjunction with weather radar observations. These studies led to investigations of the physics of the ‘‘bright band,’’ a high reflectivity region in stratiform clouds associated with melting particles, ‘‘generating cells,’’ regions near cloud tops from which streams of ice particles appeared, and other cloud physical processes. Films of radarscopes were first used to document the evolution of storms. Doppler radars, which measure the velocity of scattering elements in the direction of the beam, were first developed. The advent of digital technology, the rapid growth in the number of scientists in the field of radar meteorology, and the availability of research radars to the general meteorological community led to dramatic advances in radar meteorological research beginning around 1970. The fundamental change that led the revolution was the proliferation of microprocessors and computers and associated advances in digital technology. A basic problem that hampers radar scientists is the large volume of information generated by radars. For example, a typical pulsed Doppler radar system samples data at rates as high as three million samples per second. This volume of data is sufficiently large that storage for later analysis, even today, is impractical — the data must be processed in real time to reduce its volume and convert it to useful forms. Beginning in the early 1970s, advances in hardware, data storage technology, digital displays, and software algorithms made it possible to collect, process, store, and view data at a rate equal to the rate of data ingest. A key advance was the development of efficient software to process the data stream from Doppler radars. This development occurred at about the same time that the hardware became available to implement it and led to rapid advances in Doppler measurements. Doppler radars were soon developed whose antennas rotate in azimuth and elevation so that the full hemisphere around the radar could be observed. A network of Doppler radars was installed throughout the United States in the early 1990s to monitor severe weather. Other countries are also using Doppler radars now for storm monitoring. Mobile, airborne, and spaceborne meteorological research radars were developed in the 1980s and 1990s for specialized applications. Scientists currently use these radars and other types of modern meteorological radar systems to study a wide range of meteorological phenomena.
BASIC OPERATING PRINCIPLES OF RADAR

Radars transmit brief pulses of microwave energy. Each pulse lasts about 1 microsecond, and pulses are separated by a few milliseconds. The distance r to a target, determined from the time interval t between transmission of the microwaves and reception of the echo, is given by

r = ct/2,    (1)

where c is the speed of light.
The pulse repetition frequency determines the maximum unambiguous range across which a radar can detect targets. After a pulse has been transmitted, the radar must wait until echoes from the most distant detectable target of interest return before transmitting the next pulse. Otherwise, echoes of the nth pulse will arrive from distant targets after the (n + 1)-th pulse has been transmitted. The late-arriving information from the nth pulse will then be interpreted as echoes of the (n + 1)-th pulse. Echoes from the distant targets will then be folded back into the observable range and will appear as weak elongated echoes close to the radar. Echoes from distant targets that arrive after the transmission of a subsequent pulse are called second-trip or ghost echoes. The maximum unambiguous range r_max for a radar is given by

r_max = c/(2F),    (2)

where F is the pulse repetition frequency. Depending upon the application, the F chosen is generally from 400-1,500 s⁻¹, leading to r_max between 375 and 100 km. Although a low F is desirable for viewing targets far from the radar, there are other issues that favor a high value for F. Accurate measurement of echo intensity, for example, requires averaging information from a number of pulses from the same volume. The accuracy of the measurement is directly related to the number of samples in the average, so a high F is desired. Using Doppler measurements, F determines the range of velocities observable by the radar. A greater range of velocities can be observed by using a higher F.

The four primary parts of a typical pulsed Doppler radar, a transmitter, antenna, receiver, and display, are contained in the simplified block diagram of a Doppler radar shown in Fig. 2. The transmitter section contains a microwave tube that produces power pulses. Two kinds of transmitter tubes, magnetrons and klystrons, are in general use. The magnetron is an oscillator tube in which the frequency is determined mainly from its internal structure. The klystron, illustrated in Fig. 2, is a power amplifier. Its microwave frequency is established by mixing lower frequency signals from a stable low-power oscillator, termed the STALO, and a coherent local oscillator, termed the COHO. Microwaves are carried from the klystron to the antenna through waveguide plumbing that is designed to minimize energy loss. The transmitter and receiver normally share a single antenna, a task accomplished by using a fast switch called a duplexer that connects the antenna alternately to the transmitter and the receiver. The size and shape of the antenna determine the shape of the microwave beam. Most meteorological radars use circular-parabolic antennas that form the beam into a narrow cone that has a typical half-power beam width between about 0.8 and 1.5°. Radar antennas typically can be rotated in both azimuth and elevation, so that the entire hemisphere around the radar can be observed. Wind profiler radars employ "phased-array" antennas, whose beam is scanned by electronic means rather than by moving the antenna. Side lobes, peaks of energy transmission that occur outside the main beam, as shown in Fig. 2, complicate the radiation pattern.
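Equations (1) and (2) are easy to evaluate. The small Python sketch below uses an illustrative echo delay and the range of pulse repetition frequencies quoted above; it reproduces the 375-100 km spread of maximum unambiguous ranges.

C = 3.0e8                       # speed of light, m/s

# Eq. (1): range from echo delay (a 200-microsecond round trip, illustrative value).
delay_s = 200e-6
print(f"target range: {C * delay_s / 2 / 1e3:.0f} km")         # 30 km

# Eq. (2): maximum unambiguous range for typical pulse repetition frequencies.
for prf_hz in (400.0, 1000.0, 1500.0):
    r_max_km = C / (2.0 * prf_hz) / 1e3
    print(f"F = {prf_hz:6.0f} 1/s  ->  r_max = {r_max_km:5.0f} km")
# 375 km, 150 km, and 100 km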
Figure 2. Simplified block diagram of a radar showing the key hardware components, the antenna radiation pattern, and a microwave frequency pulse.
Targets that lie off the beam axis may be illuminated by power transmitted into the side lobes. Echoes from these targets cannot be distinguished from echoes from targets in the main lobe. Echoes from side lobes introduce confusion and error into radar observations and are undesirable. Other undesirable echoes occur when the side lobes, or the main lobe, strike ground targets such as trees. The echoes from these objects, called ground clutter, sometimes make it difficult to interpret meteorological echoes in the same vicinity. When the microwaves strike any object in their path, a small part of the energy is reflected or scattered back toward the antenna. The antenna receives backscattered waves from all scatterers in a volume illuminated by a pulse. These waves superimpose to create the received waveform, which passes along the waveguide through the duplexer into the receiver. The power collected by the antenna is small. Whereas a typical peak transmitted power might be a megawatt (10⁶ watts), the typical received power might be only a nanowatt (10⁻⁹ watts). The receiver first amplifies the signal and then processes it to determine its amplitude, which is used to calculate the radar reflectivity factor. The radar reflectivity factor is proportional to the sum of the sixth power of the diameter of all of the raindrops in the radar volume and is related to the precipitation intensity. Doppler radar receivers also extract information about the phase of the returned wave, which is used to determine the radial velocity, the velocity of the scatterers in the direction of the radar beam. The radial velocity is related to the component of the wind in the direction of the beam and, when the beam is pointed above the horizontal, to the terminal fall velocity of the particles. Polarization
diversity radars determine polarization information by comparing either the intensity or phase difference between pulses transmitted at different polarization states. This information is used to estimate the shapes and types of particles in clouds. Finally, pulse-to-pulse variations in radial velocity are used to estimate the velocity spectral width, which provides a rough measure of the intensity of turbulence.

Weather radars typically operate at microwave frequencies that range from 2.8 to 35 GHz (wavelengths of 10.7-0.86 cm). A few research radars operate at frequencies that exceed 35 GHz, but their use is limited because of extreme attenuation of the beam in clouds. Radar wind profilers operate in ultrahigh frequency (UHF; 0.3-3.0 GHz) and very high frequency (VHF; 0.03-0.3 GHz) bands. Meteorologists typically use radar wavelength rather than frequency because wavelengths can be compared more directly to precipitation particle sizes. This comparison is important because radar-derived meteorological quantities such as the radar reflectivity factor are based on Rayleigh scattering theory, which assumes that particles are small relative to the wavelength. The factors that govern the choice of the wavelength include sensitivity, spatial resolution, the nature of the targets (e.g., thunderstorms, cirrus clouds), the effects of attenuation, and equipment size, weight, and cost. For shorter wavelengths, higher sensitivity can be achieved with smaller and cheaper radar systems; however, shorter wavelengths suffer severe attenuation in heavy precipitation, which limits their usefulness.

PROPAGATION OF RADAR WAVES THROUGH THE ATMOSPHERE

As electromagnetic pulses propagate outward from a radar antenna and return from a target, they pass through air that contains water vapor and may also contain water drops and ice particles. Refraction and absorption of the electromagnetic energy by vapor, water, ice, and air affect the determination of both the location and properties of meteorological targets that comprise the returned signal. The height of a radar beam transmitted at an angle φ above the earth's surface depends on both the earth's curvature and the refractive properties of the earth's atmosphere. Due to the earth's curvature, a beam propagating away from a radar will progress to ever higher altitudes above the earth's surface. Refraction acts to oppose the increase in beam altitude. Electromagnetic waves propagate through a vacuum at the speed of light, c = 3 × 10⁸ m s⁻¹. When these same waves propagate through air or water drops, they no longer propagate at c, but at a slower velocity v that is related to properties of the medium. The refractive index of air, defined as n = c/v, is related to atmospheric dry air density, water vapor density, and temperature. The value of n varies from approximately 1.003 at sea level to 1.000 at the top of the atmosphere, a consequence of the fact that dry air density and water vapor density decrease rapidly as height increases. In average atmospheric conditions, the
vertical gradient of n is about −4 × 10⁻⁸ m⁻¹. According to Snell's law of refraction, radar beams that pass through an atmosphere where n decreases with height will bend earthward. To calculate the height of a radar beam, radar meteorologists must consider both the earth's curvature and the vertical profile of n. Normally, the vertical profile of n is estimated from tables that are based on climatologically average conditions for the radar site. Figure 3 shows examples of beam paths for various values of φ. It is obvious from Fig. 3 that the earth's curvature dominates refraction under average conditions because the radar beam's altitude increases as distance from the radar increases. An important consequence of the earth's curvature is that radars cannot detect storms at long distances because the beam will pass over the distant storm tops. For example, a beam pointed at the horizon (φ = 0°) in Fig. 3 attains a height of 9.5 km at a 400-km range.

Figure 3. The height of a radar beam above the earth's surface as a function of range and elevation angle, considering earth curvature and standard atmospheric refraction.

Any deviation of a radar beam from the standard paths shown in Fig. 3 is termed anomalous propagation. Severe anomalous propagation can occur in the atmosphere when n decreases very rapidly with height. Ideal conditions for severe refraction of microwaves exist when a cool moist layer of air is found underneath a warm dry layer, and temperature increases with altitude through the boundary between the layers. Under these conditions, which often occur along some coastlines (e.g., the U.S. West Coast in summer), beams transmitted at small φ can bend downward and strike the earth's surface. In these cases, echoes from the surface appear on the radar display and cause uncertainty in the interpretation of meteorological echoes.

Radar waves experience power loss from both energy absorption and scatter. Collectively, radar meteorologists refer to this power loss as attenuation. Although atmospheric gases, cloud droplets, fog droplets, and snow contribute to attenuation, the most serious attenuation is caused by raindrops and hail. Attenuation depends strongly on wavelength; shorter wavelength radars suffer the most serious attenuation. For example, a 3-cm-wavelength radar can suffer echo power losses 100 times that of a 10-cm-wavelength radar in heavy precipitation. This is a particular problem when a second storm lies
along the beam path beyond a closer storm. Two-way attenuation by precipitation in the first storm may make the distant storm appear weak or even invisible on the radar display. Were it not for attenuation, short wavelength radars would be in general use because of their greater sensitivity, superior angular resolution, small size, and low cost. Instead, radars with wavelengths shorter than 5 cm are rarely used. One exception is a class of radars called "cloud" radars, which are typically pointed vertically to study the structure of clouds that pass overhead. Aircraft meteorological radars also employ short wavelengths, despite attenuation. On aircraft, weight and space constraints limit antenna size, whereas beam width constraints required for meteorological measurements using the smaller antenna normally limit useable wavelengths to 3 cm or less. Fortunately, radars whose wavelengths are near 10 cm suffer little from attenuation in nearly all meteorological conditions. For this reason, the wavelength chosen for U.S. National Weather Service WSR-88D radars was 10 cm. In the past, a few research radar systems have been equipped with two radars of different wavelength; two antennas point in the same direction and are mounted on the same pedestal. These dual-wavelength systems have been used for hail detection, improved rainfall measurements, and to understand cloud processes. Dual-wavelength techniques depend on the fact that energy of different wavelengths is attenuated differently as it passes through a field of precipitation particles, and is scattered in different ways by both particles and refractive index heterogeneities. Dual-wavelength radars have received less attention since the advent of polarization diversity radars, which provide better techniques for discriminating particle types and estimating rainfall.

THE WEATHER RADAR EQUATION AND THE RADAR REFLECTIVITY FACTOR

Radar is useful in meteorology because the echo power scattered back to the radar by meteorological targets such as raindrops and snowflakes is related, with some
caveats, to meteorologically significant quantities such as precipitation intensity. The basis for relating the physical characteristics of the targets to the received echo power is the weather radar equation. The radar range equation for meteorological targets, such as raindrops, is obtained by first determining the radiated power per unit area (the power flux density) incident on the target, next determining the power flux density scattered back toward the radar by the target; and then determining the amount of back-scattered power collected by the antenna. For meteorological targets, the radar range equation is given by Pt G2 λ2 Pr = Vη, (3) 64π 3 r4 where Pr is the average received power, Pt the transmitted power, G the antenna gain, λ the wavelength of transmitted radiation, r the range, V the scattering volume, and η the reflectivity. The volume V illuminated by a pulse is determined from the pulse duration τ , the speed of light, c, the range r, and the angular beam width θ , which is normally taken as the angular distance in radians between the half-power points from the beam center (see Fig. 2). For antennas that have a circular beam pattern, the volume is given by V=
π cτ θ 2 r2 . 8
(4)
Typically the beam width is about 1° , and the pulse duration 1 microsecond, so at a range of 50 km, the scattering volume equals about 108 m3 . In moderate rain, this volume may contain more than 1011 raindrops. The contributions of each scattering element in the volume add in phase to create the returned signal. The returned signal fluctuates from pulse to pulse as the scattering elements move. For this reason, the returned signals from many pulses must be averaged to determine the average received power. The radar cross section σ of a spherical water or ice particle, whose diameter D is small compared to the wavelength λ, is given by the Rayleigh scattering law π5 σ = 4 |K|2 D6 , λ
where |K|2 is a dimensionless factor that depends on the dielectric properties of the particle and is approximately equal to 0.93 for water and 0.18 for ice at radar wavelengths. The radar reflectivity of clouds and precipitation is obtained by summing the cross sections of all of the particles in the scattering volume and is written as π5 (6) η = 4 |K|2 Z. λ The radar reflectivity factor Z is defined as
Z=
D6 , V
to meteorologists because it relates the diameter of the targets (e.g., raindrops), and therefore the raindrop size distribution, to the power received at the radar. The radar equation for meteorological targets is obtained by combining Eqs. (3),(4), and (6) and solving for Z to obtain Z=
512(2 ln 2) π 3c
λ2 Pt τ G2 θ 2
(7)
where the summation is across all of the particles in the scattering volume. The quantity Z is of prime interest
r2 Pr . K2
(8)
The term (2 ln 2) in the numerator was added to account for the fact that most antenna systems are designed for tapered, rather than uniform illumination, to reduce the effects of side lobes. Equation (8) is valid provided that the raindrops and ice particles illuminated by the radar beam satisfy the Rayleigh criterion. For long-wavelength radars (e.g., 10 cm), the Rayleigh criterion holds for all particles except for large hail. However, for shorter wavelength radars, the Rayleigh criterion is sometimes violated. Equation (8) assumes a single value of K. The true value of Z will not be measured for radar volumes that contain water and ice particles or in volumes where water is assumed to exist, but ice is actually present. Equation (8) was derived under the assumption that attenuation can be neglected. As pointed out earlier, this assumption is reasonable only for longer wavelength radars. When one or more of the assumptions used to derive Eq. (8) are invalid, the measured quantity is often termed the equivalent radar reflectivity factor (Ze ). Most problems that have invalid assumptions are minimized by selecting radars with long (e.g., 10 cm) wavelengths. It is customary to use m3 as the unit for volume and to measure particle diameters in millimeters, so that Z has conventional units of mm6 /m3 . Typical values of the radar reflectivity factor range from 10−5 mm6 /m3 to 10 mm6 /m3 in nonprecipitating clouds, 10 to 106 mm6 /m3 in rain, and as high as 107 mm6 /m3 in large hail. Because of the sixth-power weighting on diameter in Eq. (5), raindrops dominate the returned signal in a mixture of rain and cloud droplets. Because Z varies over orders of magnitude, a logarithmic scale, defined as
$$\mathrm{dBZ} = 10 \log_{10}\!\left(\frac{Z}{1\ \mathrm{mm}^6\,\mathrm{m}^{-3}}\right), \qquad (9)$$
is used to display the radar reflectivity factor. Radar images of weather systems commonly seen in the media (e.g., Fig. 4) show the radar reflectivity factor in logarithmic units. Images of the radar reflectivity factor overlain with regional maps permit meteorologists to determine the location and intensity of precipitation. Meteorologists often use the terms ‘‘radar reflectivity factor’’ and ‘‘radar reflectivity’’ interchangeably, although radar experts reserve the term ‘‘reflectivity’’ for η. Radar data are typically collected on a cone formed as the beam is swept through 360° of azimuth at a constant angle of elevation. A series of cones taken at several angles of elevation constitutes a radar volume. Images of the radar reflectivity factor from individual radars are typically projected from the conical surface onto a map-like format called the ‘‘plan-position indicator,’’ or PPI display,
Figure 4. Plan-position indicator scan of the radar reflectivity factor at 0.5° elevation from the Lincoln, IL, radar on 19 April 1996 showing tornadic thunderstorms moving across the state. The red areas denote the heavy rainfall and hail, and green and blue colors denote lighter precipitation. See color insert.
where the radar is at the center, north at the top, and east at the right. Because of the earth’s curvature, atmospheric refraction, and beam tilt, distant radar echoes on a PPI display are at higher altitudes than those near the radar. Sometimes, a radar beam is swept between the horizon and the zenith at a constant azimuth. In this case, data are plotted in a ‘‘range-height indicator’’ or RHI display, which allows the meteorologist to view a vertical cross section through a storm. Meteorologists broadly characterize precipitation as convective when it originates from storms such as thunderstorms that consist of towering cumulus clouds that have large vertical motions. Convective storms produce locally heavy rain and are characterized by high values of the reflectivity factor. Convective storms typically appear on PPI displays as small cores of very high reflectivity. Weaker reflectivity values typically extend downwind of the convective cores as precipitation particles are swept downstream by the wind. Figure 4 shows an example of the radar reflectivity factor measured by the Lincoln, IL, Doppler radar during an outbreak of tornadic convective storms on 19 April 1996. The red areas in the image, which denote heavy rain, correspond to the convective area of the storm, and the green areas northeast of the convective regions denote the lighter rain downstream of the convective regions. Meteorologists characterize precipitation as stratiform when it originates from clouds that have a layered structure. These clouds have weak vertical motions, lighter
precipitation, and generally lower values of the radar reflectivity factor. Echoes from stratiform precipitation in a PPI display appear in Fig. 5, a radar image of a snowstorm over Michigan on 8 January, 1998. Stratiform echoes are generally widespread and weak, although often they do exhibit organization and typically have narrow bands of heavier precipitation embedded in the weaker echo. On RHI scans, convective storms appear as cores of high reflectivity that extend from the surface upward high into the storm. Divergent winds at the top of the storm carry precipitation outward away from the convective region. In strong squall lines, this precipitation can extend 50–100 km behind the convective region, creating a widespread stratiform cloud. These features are all evident in Fig. 6, an RHI scan through a squall line that occurred in Kansas and Oklahoma on 11 June 1985. The radar bright band is a common characteristic often observed in RHI displays of stratiform precipitation. The bright band is a local region of high reflectivity at the melting level (see BB in Fig. 6). As ice particles fall from aloft and approach the melting level, they often aggregate into snowflakes that can reach sizes of a centimeter or more. When these particles first begin to melt, they develop a water coating on the ice surfaces. The reflectivity increases dramatically due to both the larger particle sizes and the change in the dielectric properties of the melting snow (K in Eq. 6). Because of the dependence of Z on the sixth power of the particle diameters, these
Figure 5. Plan-position indicator scan of the radar reflectivity factor at 0.5° elevation from the Grand Rapids, MI, radar on 8 January, 1998 showing a snowstorm. Bands of heavier snowfall can be seen embedded in the generally weaker echoes. See color insert.
Figure 6. Range-height indicator scan of the radar reflectivity factor taken through a mature squall line on 11 June, 1985 in Texas. The bright band is denoted by the symbol BB in the trailing stratiform region of the storm on the left side of the figure. (Courtesy of Michael Biggerstaff, Texas A & M University, with changes.) See color insert.
large water-coated snowflakes have very high reflectivity. As they melt further, the snowflakes collapse into raindrops and shrink; in addition, their fall speed increases by a factor of 6, so they quickly fall away from the melting layer, reducing particle concentrations. As a result, the reflectivity decreases below the melting level. Therefore, the band of highest reflectivity occurs locally at the melting level (see Fig. 6). The bright band can also appear in PPI displays as a ring of high reflectivity at the range where the beam intersects the melting level (see Fig. 7). Another common feature of stratiform clouds is precipitation streamers, regions of higher reflectivity that begin at cloud top and descend through the cloud. Figure 8 shows an example of a precipitation streamer observed in a snowstorm in Michigan by the NCAR ELDORA airborne radar. Streamers occur when ice particles form in local regions near a cloud top, descend, grow as they fall through the cloud, and are blown downstream by midlevel winds. The PPI and RHI displays are the most common displays used in meteorology. In research applications, radar data are often interpolated to a constant altitude and displayed in a horizontal cross section to visualize a storm’s structure better at a specific height. Similarly, interpolated data can be used to construct vertical cross sections, which appear much like the RHI display. Composite radar images are also constructed by combining reflectivity data from several radars. These composites are
typically projections of data in PPI format onto a single larger map. For this reason, there is ambiguity concerning the altitude of the echoes on composite images. The radar reflectivity factor Z is a general indicator of precipitation intensity. Unfortunately, an exact relationship between Z and the precipitation rate R does not exist. Research has shown that Z and R are approximately related by a power law of the form
$$Z = aR^b, \qquad (10)$$
where the coefficient a and the exponent b take different values that depend on the precipitation type. For example, in widespread stratiform rain, a is about 200 and b is 1.6 if R is measured in mm/h and Z is in mm⁶/m³. In general, radar estimates of the short-term precipitation rate at a point can deviate by more than a factor of 2 from surface rain gauge measurements. These differences are due to uncertainties in the values of a and b, radar calibration uncertainties, and other sources of error. Some of these errors are random, so radar estimates of total accumulated rainfall over larger areas and longer times tend to be more accurate. Modern radars use algorithms to display total accumulated rainfall by integrating the rainfall rate, determined from Eq. (10), across selected time periods. For example, Fig. 9 shows the accumulated rainfall during the passage of a weather system over eastern Iowa and Illinois. Radar-estimated rainfall exceeded one inch locally near
Davenport, IA, and Champaign, IL, and other locations received smaller amounts.

Figure 7. Plan-position indicator scan of the radar reflectivity factor through a stratiform cloud. The bright band appears as a ring of strong (red) echoes between the 40 and 50 km range. Melting snowflakes, which have a high radar reflectivity, cause the bright band. (Courtesy of R. Rilling, National Center for Atmospheric Research). See color insert.

Figure 8. Range-height indicator scan of the radar reflectivity factor through a snowstorm on 21 January, 1998. The data, taken with the ELDORA radar on the National Center for Atmospheric Research Electra aircraft, show a precipitation streamer, a region of heavier snow that develops near the cloud top and is carried downstream by stronger midlevel winds. See color insert.

DOPPLER RADARS

Doppler Measurements

A Doppler frequency shift occurs in echoes from targets that move along the radar beam. The magnitude and direction of the frequency shift provides information about the targets’ motion along the beam, toward or away from the radar. In meteorological applications, this measurement, termed the radial velocity, is used primarily to estimate winds. The targets’ total motion consists of four components, two horizontal components of air motion, vertical air motion, and the target mean fall velocity in still air. Because only one component is observed, additional processing must be done or assumptions made to extract information about the total wind field. To obtain the Doppler information, the phase information in the echo must be retained. Phase, rather than frequency, is used in Doppler signal processing because of the timescales of the measurements. The period of the Doppler frequency is typically between about 0.1 and 1.0 millisecond. This is much longer than the pulse duration, which is typically about 1 microsecond, so only a fraction of a cycle occurs within the pulse period. Consequently, for meteorological targets, one cannot measure the Doppler frequency by just one transmitted pulse. The Doppler frequency is estimated instead by measuring the phase φ of the echo, at a specific range for each pulse in a train of pulses. Each adjacent pair of sampled phase values of the returned wave, for example, φ1 and φ2, φ2 and φ3, etc., can be used to obtain an estimate of the Doppler frequency fd from
$$f_d = \frac{(\phi_{n+1} - \phi_n)\,F}{2\pi}. \qquad (11)$$
Conceptually, the average value of fd from a number of pulse pairs determines the final value of the Doppler frequency. In actual signal processing, each phase measurement must be calculated from the returned signal by performing an arctangent calculation. This type of calculation is computationally demanding for radars that collect millions of echo samples each second. In practice, a more computationally efficient technique called the Pulse Pair Processor, which depends on the signal autocorrelation function, is normally used to extract the Doppler frequency. Unfortunately, inversion of the sampled phase values to determine the Doppler frequency and the target radial velocity is not unique. As a result, velocity ambiguity problems exist for Doppler radar systems. In the inversion process, Doppler radars normally use the lowest frequency that fits the observed phase samples to determine the target radial velocity. Using this approach, the maximum unambiguous observable radial velocity vr,max is given by
$$|v_{r,\max}| = \frac{\lambda F}{4}. \qquad (12)$$
Doppler frequencies that correspond to velocities higher than |vr,max| are aliased, or folded, back into the observable range. For example, if |vr,max| = 20 m s−1, then the range of observable velocities will be −20 m s−1 ≤ vr ≤ 20 m s−1,
Figure 9. Total precipitation through 4:30 P.M. local time on 14 June, 2000, as measured by the National Weather Service radar at Lincoln, IL, during the passage of a storm system. See color insert.
and a true velocity of 21 m s−1 would be recorded by the radar system as −19 m s−1. Folding of Doppler velocities is related to a sampling theorem known as the Nyquist criterion, which requires that at least two samples of a sinusoidal signal per cycle be available to determine the frequency of the signal. In a pulse Doppler radar, fd is the signal frequency, and F is the sampling rate. The velocity vr,max, corresponding to the maximum unambiguous Doppler frequency, is commonly called the Nyquist velocity. The Nyquist velocity depends on wavelength (Eq. 12), so that long-wavelength radars (e.g., 10 cm) have a larger range of observable velocities for the same pulse repetition frequency. For example, at F = 1,000 s−1, the Nyquist velocity will be 25 m s−1 at λ = 10 cm, but only 7.5 m s−1 at λ = 3 cm. Velocities >25 m s−1 commonly occur above the earth’s surface, so velocities recorded by the shorter wavelength radar could potentially be folded multiple times. This is another reason that shorter wavelength radars are rarely used for Doppler measurements. From Eq. (12), it is obvious that long wavelength and high F are preferable for limiting Doppler velocity ambiguity. The choice of a high value of F to mitigate Doppler velocity ambiguity is directly counter to the need for a low value of F to mitigate range ambiguity [Eq. (2)]. Solving for F in Eq. (12) and substituting the result for F in Eq. (2), we find that
$$r_{\max}\, v_{\max} = \frac{c\lambda}{8}. \qquad (13)$$
This equation, which shows that the maximum unambiguous range and radial velocity are inversely related, is called the Doppler dilemma because a choice good for one parameter will be a poor choice for the other. Figure 10 shows the limits imposed by the Doppler dilemma for several commonly used wavelengths. There are two ways in common use to avoid the restrictions imposed on range and radial velocity measurements by the Doppler dilemma. The first, which is implemented in the U.S. National Weather Service radars, involves transmitting a series of pulses at a small F, followed by another series at large F. The first set is used to measure the radar reflectivity factor out to longer range, whereas the second set is used to measure radial velocities across a wider Nyquist interval, but at a shorter range. An advantage of this approach is that erroneous velocities introduced from range-folded echoes (i.e., from targets beyond the maximum unambiguous range) during the large F pulse sequence can be identified and removed. This is accomplished by comparing the echoes in the small and large F sequences and deleting all data that appears
Figure 10. Relationship between the maximum unambiguous range and velocity for Doppler radars of different wavelength (λ).
only in the high F sequence. This approach has the disadvantage of increasing the dwell time (i.e., slowing the antenna rotational rate) because a sufficiently large number of samples is required to determine both the radar reflectivity factor and the radial velocity accurately. A second approach involves transmitting a series of pulses, each at a slightly different value of F, in an alternating sequence. From these measurements, it is possible to calculate the number of times that the velocity measurements have been folded. Algorithms can then unfold and correctly record and display the true radial velocities.
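The folding behavior described above, and the range–velocity trade-off of Eq. (13), can be demonstrated with a few lines of Python. The wavelength, pulse repetition frequency, and target velocity below are assumed values chosen to reproduce the 21 m s−1 → −19 m s−1 example given earlier; the averaged phase difference is only a simplified stand-in for a real pulse-pair processor.

```python
import numpy as np

# Assumed, illustrative radar and target values
c = 3.0e8
lam = 0.10                  # wavelength (m)
F = 800.0                   # pulse repetition frequency (1/s)
v_true = 21.0               # true radial velocity (m/s)

v_nyq = lam * F / 4.0       # Eq. (12): Nyquist velocity, 20 m/s for these numbers

# Simulate the echo phase at one range gate for a train of pulses.
# The two-way path change shifts the phase by 4*pi*v_true/(lam*F) per pulse.
n_pulses = 32
phi = (4.0 * np.pi * v_true / (lam * F)) * np.arange(n_pulses)

# Eq. (11) via the pulse-pair idea: average the wrapped pulse-to-pulse phase change.
dphi = np.angle(np.exp(1j * np.diff(phi)))     # wrapped to (-pi, pi]
fd = np.mean(dphi) * F / (2.0 * np.pi)
v_measured = fd * lam / 2.0                    # sign convention is illustrative only
print(f"Nyquist velocity: {v_nyq:.1f} m/s")
print(f"True 21 m/s is recorded as {v_measured:.1f} m/s (aliased)")

# Eq. (13): the Doppler dilemma for this wavelength
r_max = c / (2.0 * F)
print(f"r_max = {r_max/1e3:.0f} km, r_max*v_nyq = {r_max*v_nyq:.2e} = c*lam/8 = {c*lam/8:.2e}")
```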
The Doppler Spectrum

The discussion of Doppler radar to this point has not considered the fact that meteorological targets contain many individual scatterers that have a range of radial speeds. The variation of radial speeds is due to a number of factors, including wind shear within the pulse volume (typically associated with winds increasing with altitude), turbulence, and differences in terminal velocities of the large and small raindrops and ice particles. The echoes from meteorological targets contain a spectrum of Doppler frequencies that superimpose to create the received waveform. The way that these waves superimpose changes from pulse to pulse, because, in the intervening time, the particles in the pulse volume change position relative to one another. This can best be understood by considering a volume containing only two raindrops of the same size that lie on the beam axis. Consider that these raindrops are illuminated by a microwave beam whose electrical field oscillates in the form of a sine wave. If the raindrops are separated by an odd number of half-wavelengths, then the round-trip distance of backscattered waves from the two drops will be an even number of wavelengths. The returned waves, when superimposed, will be in phase and increase the amplitude of the signal. On the other hand, consider a case where the two drops are separated by an odd number of quarter-wavelengths. In this case, the superimposed returned waves will be 180° out of phase and will destructively interfere, eliminating the signal. In the real atmosphere, where billions of drops and ice particles are contained in a single pulse volume, the amplitude of the signal and the Doppler frequency (and therefore the radial velocity) vary from pulse to pulse due to the superimposition of the backscattered waves from each of the particles as they change position relative to each other. The returned signal from each pulse will have a different amplitude and frequency (and radial velocity) which depend on the relative size and position of each of the particles that backscatters the waves. The distribution of velocities obtained from a large number of pulses constitutes the Doppler power spectrum. Figure 11, a Doppler power spectrum, shows the power S returned in each velocity interval Δv, across the Nyquist interval ±vr,max. The spectrum essentially represents the reflectivity-weighted distribution of particle radial speeds; more echo power appears at those velocities (or frequencies) whose particle reflectivities are greater. Some power is present at all velocities due to noise generated from the electronic components of the radar, the sun, the cosmic background, and other sources. The meteorological signal consists of the peaked part of the spectrum. Full Doppler spectra have been recorded for limited purposes, such as determining the spectrum of fall velocities of precipitation by using a vertically pointing radar or estimating the maximum wind speed in a tornado. However, recording the full Doppler spectrum routinely requires such enormous data storage capacity that, even today, it is impractical. In nearly all applications, the full Doppler spectrum is not recorded. More important to meteorologists are the moments of the spectrum, given by
$$P_r = \sum_{v_r=-v_{r,\max}}^{v_{r,\max}} S(v_r)\,\Delta v, \qquad (14)$$
$$\bar{v}_r = \frac{\displaystyle\sum_{v_r=-v_{r,\max}}^{v_{r,\max}} v_r\,S(v_r)\,\Delta v}{P_r}, \qquad (15)$$
$$\sigma_v = \left[\frac{\displaystyle\sum_{v_r=-v_{r,\max}}^{v_{r,\max}} (v_r - \bar{v}_r)^2\,S(v_r)\,\Delta v}{P_r}\right]^{1/2}. \qquad (16)$$

Figure 11. An idealized Doppler velocity spectrum. The average returned power is determined from the area under the curve, the mean radial velocity as the reflectivity-weighted average of the velocity in each spectral interval (typically the peak in the curve), and the spectral width as the standard deviation normalized by the mean power [see Eqs. (14)–(16)].
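A minimal sketch of how the moments in Eqs. (14)–(16) are computed from a discretized spectrum follows. The Gaussian signal peak, noise floor, and Nyquist velocity are all invented, and no noise removal is attempted.

```python
import numpy as np

# Idealized spectrum: a Gaussian signal peak plus a flat noise floor (assumed values)
v_nyq = 25.0
v = np.linspace(-v_nyq, v_nyq, 256)              # velocity bins (m/s)
dv = v[1] - v[0]
S = np.exp(-0.5 * ((v - 8.0) / 2.5)**2) + 0.01   # spectral power density (arbitrary units)

P_r = np.sum(S * dv)                                        # Eq. (14): average power
v_bar = np.sum(v * S * dv) / P_r                            # Eq. (15): mean radial velocity
sigma_v = np.sqrt(np.sum((v - v_bar)**2 * S * dv) / P_r)    # Eq. (16): spectral width
print(f"Pr = {P_r:.2f}, mean velocity = {v_bar:.2f} m/s, spectral width = {sigma_v:.2f} m/s")
```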
The quantity Pr, the area under the curve in Fig. 11, is the average returned power, from which the radar reflectivity factor can be determined by using Eq. (8). The mean radial velocity vr, typically the velocity near the peak in the curve, represents the average motion of the precipitating particles along the radar beam. The spread in velocities is represented by the spectral width σv. The spectral width gives a rough estimate of the turbulence within the pulse volume. Processing techniques used in Doppler radars extract these parameters, which are subsequently displayed and recorded for future use.

Doppler Radial Velocity Patterns in PPI Displays

Doppler radial velocity patterns appearing in radar displays are complicated by the fact that a radar pulse moves higher above the earth’s surface as it recedes from the radar. Because of this geometry, radar returns originating from targets near the radar represent the low-level wind field, and returns from distant targets represent winds at higher levels. In a PPI radar display, the distance away from the radar at the center of the display represents both a change in horizontal distance and a change in vertical distance. To determine the wind field at a particular elevation above the radar, radar meteorologists must examine the radial velocities on a ring at a fixed distance from the radar. The exact elevation represented by a particular ring depends on the angle of elevation of the radar beam. Figure 12 shows two examples illustrating the relationship between radial velocity patterns observed on radar images and corresponding atmospheric wind profiles. The examples in Fig. 12 are simulated, rather than real, images. Doppler velocity patterns (right) correspond to vertical wind profiles (left), where the wind barbs indicate wind speed and direction from the ground up to 24,000 feet (7,315 m). Each tail on a wind barb represents 10 knots (5 m s−1). The direction in which the barb is pointing represents the wind direction. For example, a wind from the south at 30 knots would be represented by an upward pointing barb that has three tails, and a 20-knot wind from the east would be represented by a left pointing barb that has two tails. Negative Doppler velocities (blue-green) are toward the radar and positive (yellow–red) are away. The radar location is at the center of the display. In the top example in Fig. 12, the wind speed increases from 20 to 40 knots (10 to 20 m s−1) between zero and 12,000 feet (3,657 m) and then decreases again to 20 knots at 24,000 feet (7,315 m). The wind direction is constant. The radar beam intersects the 12,000-foot level along a ring halfway across the radar display, where the maximum inbound and outbound velocities occur. The bottom panels of Fig. 12 show a case where the wind speed is constant at 40 knots, but the wind direction varies from southerly to westerly between the ground and 24,000 feet. The
innermost rings in the radar display show blue to the south and orange to the north, representing a southerly wind. The outermost rings show blue to the west and orange to the east, representing westerly winds. Intermediate rings show a progressive change from southerly to westerly as one moves outward from the center of the display. In real applications, wind speed and direction vary with height and typically vary across the radar viewing area at any given height. The radial velocity patterns are typically more complicated than the simple patterns illustrated in Fig. 12. Of particular importance to radar meteorologists are radial velocity signatures of tornadoes. When thunderstorms move across the radar viewing area and tornadoes are possible, the average motion of a storm is determined from animations of the radar reflectivity factor or by other means and is subtracted from the measured radial velocities to obtain the storm-relative radial velocity. Images of the storm-relative radial velocity are particularly useful in identifying rotation and strong winds that may indicate severe conditions. Tornadoes are typically less than 1 kilometer wide. When a tornado is present, it is usually small enough that it fits within one or two beam widths. Depending upon the geometry of the beam, the distance of the tornado from the radar, and the location of the beam relative to the tornado, the strong winds of the tornado will typically occupy one or two pixels in a display. Adjacent pixels will have sharply different storm-relative velocities, typically one strong inbound and one strong outbound. Figure 13b shows a small portion of a radar screen located north of a radar (see location in Fig. 13a). Winds in this region are rotating (see Fig. 13c), and the strongest rotation is located close to the center of rotation, as would occur in a tornado. The radial velocity pattern in Fig. 13b is characteristic of a tornado vortex signature. Often, the winds will be so strong in a tornado that the velocities observed by the radar will be folded in the pixel that contains the tornado. Tornado vortex signatures take on slightly different characteristics depending on the position of individual radar beams relative to the tornado and whether or not the velocities are folded.

Single Doppler Recovery of Wind Profiles

A single Doppler radar provides measurements of the component of target motion along the beam path. At low elevations, the radial velocity is essentially the same as the radial component of the horizontal wind. At high elevation angles, the radial velocity also contains information about the targets’ fall velocity and vertical air motion. Meteorologists want information about the total wind, not just the radial velocity. The simplest method for recovering a vertical profile of the horizontal wind above a radar is a technique called the velocity-azimuth display (VAD), initially named because the wind was estimated from displays of the radial velocity versus azimuth angle at a specific distance from the radar, as the radar scanned through 360° of azimuth. In the VAD technique, the radar antenna is rotated through 360° at a fixed angle of elevation. At a fixed value of range (and elevation above the radar), the sampled volumes lie on a circle centered on the radar.
Figure 12. (Top) Doppler radial velocity pattern (right) corresponding to a vertical wind profile (left) where wind direction is constant and wind speed is 20 knots at the ground and at 24,000 feet altitude and 40 knots at 12,000 feet. Negative Doppler velocities (blues) are toward the radar, which is located at the center of the display. (Bottom) Same as top, except the wind speed is constant and the wind direction varies from southerly at the ground to westerly at 24,000 feet. (Courtesy of R. A. Brown and V. T. Woods, National Oceanic and Atmospheric Administration, with changes). See color insert.
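The ring patterns illustrated in Fig. 12 can be mimicked with a simple calculation: for a horizontally uniform wind that changes only with height, the radial velocity at a given range and azimuth follows a cosine in azimuth, scaled by the wind speed at the beam height. The sketch below uses the standard 4/3-earth-radius beam-height approximation and an invented wind profile loosely resembling the top example of Fig. 12; all values are assumptions for illustration.

```python
import numpy as np

def radial_velocity(az_deg, rng_km, elev_deg=0.5):
    """Radial velocity (kt) for an assumed height-varying, horizontally uniform wind."""
    elev = np.deg2rad(elev_deg)
    # Beam height (kft) using the standard 4/3-earth-radius approximation
    height_kft = (rng_km * np.sin(elev) + rng_km**2 / (2.0 * (4.0 / 3.0) * 6371.0)) * 3.28084
    # Assumed profile: 20 kt at the surface, 40 kt at 12 kft, 20 kt at 24 kft, all southerly
    speed_kt = np.interp(height_kft, [0.0, 12.0, 24.0], [20.0, 40.0, 20.0])
    wind_from = np.deg2rad(180.0)          # wind blowing *from* the south (assumed)
    az = np.deg2rad(az_deg)
    # Component along the beam; negative values are toward the radar
    return -speed_kt * np.cos(az - wind_from) * np.cos(elev)

for az in (0, 90, 180, 270):               # north, east, south, west
    print(az, round(float(radial_velocity(az, 60.0)), 1), "kt")
```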
The circles become progressively larger and higher in altitude as range increases, as shown in Fig. 14. The standard convention used for Doppler radars is that approaching radial velocities are negative and receding
velocities are positive. To understand how VAD works, assume initially that the wind is uniform at 10 m s−1 from the northwest across a circle scanned by the radar and that the particles illuminated by the radar along this
circle are falling at 5 m s−1, as shown in Fig. 14a. Under these conditions, the radial velocity reaches its maximum negative value when the antenna is pointed directly into the wind and reaches its maximum positive value when pointed directly downwind. The radial velocity is negative when the antenna is pointed normal to the wind direction because of the particle fall velocity. When plotted as a function of azimuthal angle, the radial velocity traces out a sine wave (Fig. 14b); the amplitude of the sine wave is a measure of the wind velocity, the phase shift in azimuth from 0° is a measure of the wind direction, and the displacement of the center axis of the sine wave from 0 m s−1 is a measure of the vertical motion of the particles (Fig. 14b–d). In reality, the flow within a circle may not be uniform. However, with some assumptions, the winds and properties of the flow related to the nonuniformity, such as air divergence and deformation, can be estimated by mathematically determining the value of the fundamental harmonics from the plot of radial velocity versus azimuthal angle. These properties are used in research applications to obtain additional information about atmospheric properties, such as vertical air motion. National Weather Service radars used for weather monitoring and forecasting in the United States use the VAD scanning technique routinely to obtain vertical wind profiles every 6 minutes. Figure 15 shows an example of VAD-determined wind profiles from the Lincoln, IL, radar during a tornado outbreak on 19 April 1996. The data, which is typical of an environment supporting tornadic thunderstorms, shows 30-knot southerly low-level winds just above the radar, veering to westerly at 90 knots at 30,000 feet. Profiles such as those in Fig. 15 provide high-resolution data that also allow meteorologists to identify the precise position of fronts, wind shear zones that may be associated with turbulence, jet streams, and other phenomena. The VAD technique works best when echoes completely surround the radar, when the sources of the echoes are clouds that are stratiform rather than convective, and when the echoes are deep. In clear air, winds can be recovered only at lower elevations where the radar may receive echoes from scatterers such as insects and refractive index heterogeneities.

Figure 13. (a) Location of the 27 × 27 nautical mile radial velocity display window in bottom figures. The window is located 65 nautical miles north of the radar. (b) Radial velocity pattern corresponding to a tornado vortex signature (peak velocity = 60 kt, core radius = 0.5 nautical miles). One of the beams is centered on the circulation center. (c) Wind circulation corresponding to radial velocity pattern. Arrow length is proportional to wind speed, and the curved lines represent the overall wind pattern. (Courtesy of R. A. Brown and V. T. Woods, National Oceanic and Atmospheric Administration, with changes.) See color insert.

Single Doppler Recovery of Wind Fields
The VAD technique permits meteorologists to obtain only a vertical profile of the winds above the radar. In research, scientists are often interested in obtaining estimates of the total three-dimensional wind fields within a storm. Normally, such wind fields can be obtained only by using two or more Doppler radars that simultaneously view a storm from different directions. However, under certain conditions, it is possible to retrieve estimates of horizontal wind fields in a storm from a single Doppler radar. In the last two decades, a number of methodologies have been developed to retrieve single Doppler wind estimates; two of them are briefly described here. The first method, called objective tracking of radar echo with correlations, or TREC, employs the technique of pattern recognition by cross-correlating arrays of radar reflectivities measured several minutes apart to determine the translational motion of the echoes. One advantage of the TREC method is that it does not rely on Doppler measurements and therefore can be used by radars that do not have Doppler capability. Typically, fields of radar reflectivity are subdivided into arrays of dimensions of 3–7 km, which, for modern radars, consist of 100–500 data points. The wind vector determined from each array is combined with those from other arrays to provide images of the horizontal winds across the radar scan, which can be superimposed on the radar reflectivity. Figure 16 shows an example of TREC-derived winds from the Charleston, SC, WSR-57 radar, a non-Doppler radar, during the landfall of Hurricane Hugo in 1989. The TREC method in this case captured the salient features of the hurricane circulation, including the cyclonic circulation about the eye, the strongest winds near the eye wall (seen in the superimposed reflectivity field), and a decrease in the magnitude of the winds with distance outside the eyewall. The strongest winds detected, 55–60 m s−1, were consistent with surface wind speeds observed by other instruments.
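A toy version of the TREC idea is sketched below: a synthetic reflectivity array is shifted between two "scans," and the displacement that maximizes the correlation is converted to a wind vector. The grid spacing, time separation, and fields are invented, and the brute-force lag search merely stands in for a production implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
field0 = rng.random((60, 60))                        # synthetic reflectivity array, scan 1
true_shift = (3, 5)                                  # displacement in grid cells (y, x)
field1 = np.roll(field0, true_shift, axis=(0, 1))    # scan 2: the same pattern, translated

def best_shift(a, b, max_lag=8):
    """Return the (dy, dx) shift of b relative to a that maximizes the correlation."""
    n = a.shape[0]
    inner_a = a[max_lag:n - max_lag, max_lag:n - max_lag]
    best, best_corr = (0, 0), -np.inf
    for dy in range(-max_lag, max_lag + 1):
        for dx in range(-max_lag, max_lag + 1):
            sub_b = b[max_lag + dy:n - max_lag + dy, max_lag + dx:n - max_lag + dx]
            corr = np.corrcoef(inner_a.ravel(), sub_b.ravel())[0, 1]
            if corr > best_corr:
                best, best_corr = (dy, dx), corr
    return best

dy, dx = best_shift(field0, field1)
grid_km, dt_s = 2.0, 300.0                 # assumed grid spacing and scan separation
u = dx * grid_km * 1000.0 / dt_s           # west-east wind component (m/s)
v = dy * grid_km * 1000.0 / dt_s           # south-north wind component (m/s)
print(f"Recovered shift: ({dy}, {dx}), wind ~ ({u:.1f}, {v:.1f}) m/s")
```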
Figure 14. (a) Geometry for a radar velocity azimuth display scan. The horizontal wind vector is denoted by H, the vertical fall speed of the precipitation by V, and the radial velocity by Vr. (b) Measured radial velocity as a function of azimuthal angle corresponding to H = 10 m s−1 and V = 5 m s−1. (c) Same as (b), but only for that part of the radial velocity contributed by the fall speed of the precipitation. (d) Same as (b), but only for that part of the radial velocity contributed by the horizontal wind.
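The harmonic analysis behind the VAD technique can be illustrated by synthesizing radial velocities around one scan circle for the situation shown in Fig. 14 (a 10 m s−1 wind from the northwest and particles falling at 5 m s−1) and then fitting the first harmonics by least squares. The elevation angle and azimuth spacing below are assumed values.

```python
import numpy as np

elev = np.deg2rad(0.5)                               # assumed elevation angle
az = np.deg2rad(np.arange(0.0, 360.0, 5.0))          # beam azimuths (0 = north, clockwise)
speed, wind_from = 10.0, np.deg2rad(315.0)           # 10 m/s from the northwest
w_fall = -5.0                                        # particles falling at 5 m/s

# Radial velocity seen by the radar (negative = toward the radar)
vr = -speed * np.cos(az - wind_from) * np.cos(elev) + w_fall * np.sin(elev)

# Least-squares fit of vr = a0 + a1*cos(az) + b1*sin(az)
A = np.column_stack([np.ones_like(az), np.cos(az), np.sin(az)])
a0, a1, b1 = np.linalg.lstsq(A, vr, rcond=None)[0]

fit_speed = np.hypot(a1, b1) / np.cos(elev)          # amplitude -> wind speed
dir_to = np.rad2deg(np.arctan2(b1, a1)) % 360.0      # azimuth of maximum outbound velocity
fit_from = (dir_to + 180.0) % 360.0                  # wind "from" direction
# a0 is the offset of the sine wave, i.e. the fall-speed/vertical-motion term times sin(elev)
print(f"speed ~ {fit_speed:.1f} m/s, wind from ~ {fit_from:.0f} deg, offset a0 ~ {a0:.2f} m/s")
```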
A second method, called the synthetic dual Doppler (SDD) technique, uses the Doppler radial velocity measurements from two times. This method can be used if the wind field remains nearly steady in the reference frame of the storm between the times and the storm motion results in a significant change in the radar viewing angle with time. Figure 17 shows a schematic of the SDD geometry for (a) radar-relative and (b) storm-relative coordinates. In radar-relative coordinates, the storm is first viewed at t − Δt/2, where Δt is the time separation
of the two radar volumes used for the SDD analysis and t is the time of the SDD wind retrieval. At some later time, t + Δt/2, the storm has moved a distance d to a new location, and the radar viewing angle β from the radar to the storm changes. Using radial velocity measurements from these two time periods, an SDD horizontal wind field can be retrieved for the intermediate time t, when the storm was located at an intermediate distance d/2. The geometry of the storm-relative coordinates in Fig. 17b is identical to a conventional dual-Doppler (i.e., two-radar)
Figure 15. Wind speed and direction as a function of height and time derived using the velocity-azimuth display (VAD) technique. The data were collected by the Lincoln, IL, National Weather Service radar on 19 April 1996 between 22:51 and 23:44 Greenwich Mean Time. Long tails on a wind barb represent 10 knots (5 m s−1), short tails 5 knots, and flags 50 knots. The direction in which the barb is pointing represents the wind direction. For example, a wind from the north at 20 knots would be represented by a downward pointing barb that has two tails, and a 60-knot wind from the west would be represented by a right pointing barb that has a flag and a long tail. See color insert.
Figure 16. TREC-determined wind vectors for Hurricane Hugo overlaid on radar reflectivity. A 50 m s−1 reference vector is shown on the lower right. (From J. Tuttle and R. Gall, A single-radar technique for estimating the winds in a tropical cyclone. Bulletin of the American Meteorological Society, 80, 653–688, 1998. Courtesy of John Tuttle and the American Meteorological Society.) See color insert.
system that is viewing a single storm during the same time period. However, when using the SDD technique for a single radar, it is necessary to ‘‘shift the position’’ of the radar a distance d/2 for both time periods by using the storm propagation velocity. Figure 17b shows that using data collected at two time periods and shifting the radar position can, in essence, allow a single radar to obtain measurements of a storm from two viewing geometries at an intermediate time and location. Figure 18 shows an example of SDD winds recovered for a vortex that developed over the southern portion of Lake Michigan during a cold air outbreak and moved onshore in Michigan. The forward speed of the vortex has been subtracted from the wind vectors to show the circulation better. The bands of high reflectivity are due to heavy snow. The SDD wind retrieval from the WSR-88D radar at Grand Rapids, MI, clearly shows the vortex circulation and convergence of the wind flows into the radial snowbands extending from the vortex center, which corresponds with the position of a weak reflectivity ‘‘eye.’’

Multiple Doppler Retrieval of 3-D Wind Fields

Three-dimensional wind fields in many types of storm systems have been determined during special field campaigns in which two or more Doppler radars have been deployed. In these projects, the scanning techniques
are optimized to cover the entire storm circulation from cloud top to the ground in as short a time as feasible.

Figure 17. Schematic diagrams of the SDD geometry for (a) radar-relative and (b) storm-relative coordinates. Δt is the time separation of the two radar volumes used for the SDD analysis, and t is the time of the SDD retrieval. The distance d is analogous to the radar baseline in a conventional dual-Doppler system. The solid circles represent (panel a) the observed storm locations and (panel b) the shifted radar locations. The open circles denote the location of the radar and the SDD retrieved storm position. (Courtesy of N. Laird, Illinois State Water Survey).

The techniques to recover the wind fields from radial velocity measurements from more than one Doppler radar are termed multiple Doppler analysis. Radar data are collected in spherical coordinates (radius, azimuth, elevation). Multiple Doppler analyses are normally done in Cartesian space, particularly because the calculation of derivative quantities, such as divergence of the wind field, and integral quantities, such as air vertical velocity, are required. For this reason, radial velocity and other data are interpolated from spherical to Cartesian coordinates. Data are also edited to remove nonmeteorological echoes such as ground clutter and second-trip echoes, and are unfolded to correct velocity ambiguities discussed previously. The data must also be adjusted spatially to account for storm motion during sampling. The equation relating the radial velocity measured by a radar to the four components of motion of particles in a Cartesian framework is
$$v_r = u \sin a \cos e + v \cos a \cos e + (w + w_t)\sin e, \qquad (17)$$
where u, v, and w are the west–east, north–south, and vertical components of air motion in the Cartesian system, a and e are the azimuthal and elevation angles of the radar, wt is the mean fall velocity of the particles, and vr is the measured radial velocity. For each radar viewing a specific location in a storm, vr , a, and e are known and u, v, w, and wt are unknown. In principle, four separate measurements are needed to solve for the desired four unknown quantities. In practice, four measurements of radial velocity from different viewing angles are rarely available. Most field campaigns employ two radars, although some have used more. The remaining unknown variables are estimated by applying constraints imposed by mass continuity in air flows, appropriate application of boundary conditions at storm boundaries, estimates of particle fall velocities based on radar reflectivity factors, and additional information available from other sensors.
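For two radars viewing the same point at low elevation angles, Eq. (17) reduces to two equations in the two horizontal wind components. The sketch below solves that 2 × 2 system for one grid point; the viewing azimuths and radial velocities are invented, and the (w + wt) sin e term is dropped because a low elevation angle is assumed.

```python
import numpy as np

def horizontal_wind(vr1, az1_deg, vr2, az2_deg, elev_deg=0.5):
    """Solve Eq. (17) for (u, v) from two radars, neglecting vertical/fall-speed terms."""
    e = np.deg2rad(elev_deg)
    a1, a2 = np.deg2rad(az1_deg), np.deg2rad(az2_deg)
    # Two equations: vr = u*sin(a)*cos(e) + v*cos(a)*cos(e)
    M = np.array([[np.sin(a1) * np.cos(e), np.cos(a1) * np.cos(e)],
                  [np.sin(a2) * np.cos(e), np.cos(a2) * np.cos(e)]])
    return np.linalg.solve(M, np.array([vr1, vr2]))

# Assumed geometry: radar 1 views the point from azimuth 45 deg, radar 2 from 135 deg
u, v = horizontal_wind(vr1=14.1, az1_deg=45.0, vr2=-0.2, az2_deg=135.0)
print(f"u ~ {u:.1f} m/s, v ~ {v:.1f} m/s")
```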
Figure 18. Radar reflectivity factor and winds (relative to the forward speed of the vortex) at a 2-km altitude within a vortex over Lake Michigan derived from two radar volumes collected by the Grand Rapids, MI, WSR-88D Doppler radar at 1023 and 1123 UTC on 5 December 1997. The winds were derived using the synthetic dual-Doppler technique. (Courtesy of N. Laird, Illinois State Water Survey). See color insert.
The details of multiple Doppler analysis can become rather involved, but the results are often spectacular and provide exceptional insight into the structure and dynamics of storm circulations. For example, Fig. 19a shows a vertical cross section of the radar reflectivity and winds across a thunderstorm derived from measurements from two Doppler radars located near the storm. The forward speed of the storm has been subtracted from the winds to illustrate the circulations within the storm better. The vertical scale on Fig. 19a is stretched to illustrate the storm structure better. A 15-km wide updraft appears in the center of the storm, and central updraft speeds approach 5 m s−1 . The updraft coincides with the heaviest rainfall, indicated by the high reflectivity region in the center of the storm. Figure 19b shows the horizontal wind speed in the plane of the cross section. The sharp increase in wind speed marks
the position of an advancing front, which is lifting air to the east of the front (right side of the figure), creating the updraft appearing in Fig. 19a. Three-dimensional Doppler wind analyses in severe storms have been used to determine the origin and trajectories of large hailstones, examine the origin of rotation in tornadoes, study the circulations in hurricanes, and investigate the structure of a wide variety of other atmospheric phenomena such as fronts, squall lines, and downbursts.

Retrieval of Thermodynamic Parameters from 3-D Doppler Wind Fields

Wind in the atmosphere is a response to variations in atmospheric pressure. Pressure variations occur on many scales and for a variety of reasons but are closely tied on larger scales to variations in air temperature. Newton’s second law of motion describes the acceleration of air
Figure 19. Vertical cross sections through a thunderstorm derived from measurements from two Doppler radars. The storm occurred in northeast Kansas on 14 February, 1992. The panels show (a) radar reflectivity (dBZ) and winds (vectors, m s−1 ) in the plane of the cross section. The forward speed of the storm has been subtracted from the wind vectors to illustrate the vertical circulations within the storm; (b) the horizontal wind speed (m s−1 ) in the plane of the cross section. Positive values of the wind speed denote flow from left to right; (c) the perturbation pressure field (millibars) within the storm. The perturbation pressure is the pressure field remaining after the average pressure at each elevation is subtracted from the field. See color insert.
and its relation to atmospheric pressure, thermal fields, the earth’s rotation, and other factors. For atmospheric processes, Newton’s law is expressed in the form of three equations describing the acceleration of the winds as a response to forces in the three cardinal directions. The momentum equations have been used to retrieve perturbation pressure and buoyancy information from derived three-dimensional multiple Doppler wind fields. More recently, retrieval techniques have also been developed that incorporate the thermodynamic equation, which relates temperature variations in air parcels to heating and cooling processes in the atmosphere. These retrieval techniques have extended multiple Doppler radar analyses from purely kinematic descriptions of wind fields in storms to analysis of the dynamic forces that create the wind fields. As an example, the retrieved pressure perturbations associated with the storm in Fig. 19b are shown in Fig. 19c. The primary features of the pressure field include a positive pressure perturbation in the upper part of the storm located just to the left of the primary updraft, a large area of negative pressure perturbation both in and above the sharp wind gradient in the middle and upper part of the storm, a strong positive perturbation at the base of the downdraft to the left of the main updraft, and a weak negative pressure perturbation in the low levels ahead of the advancing front. The pressure perturbations are associated with two physical processes: (1) horizontal accelerations at the leading edge of the front and within the outflow at the top of the updraft, and (2) positive buoyancy in the updraft and negative buoyancy in the downdraft. Analyses such as these help meteorologists understand how storms form and organize and the processes that lead to their structures.

POLARIZATION DIVERSITY RADARS

As electromagnetic waves propagate away from a radar antenna, the electrical field becomes confined to a plane
that is normal to the propagative direction. The orientation of the electrical field vector within this plane determines the wave’s polarization state. For radars, the electrical field vector either lies on a line or traces out an ellipse in this plane, which means that radar waves are polarized. If the electrical field lies on a line, the condition for most meteorological radars, the waves are linearly polarized. A radar wave that propagates toward the horizon is vertically polarized if the electrical field vector oscillates in a direction between the zenith and the earth’s surface, and is horizontally polarized if the vector oscillates in a direction parallel to the earth’s surface. Polarization diversity radars measure echo characteristics at two orthogonal polarizations, typically, horizontal and vertical. This is done either by changing the polarization of successive pulses and/or transmitting one and receiving both polarizations. At this time, polarization diversity radars are used only in meteorological research. However, the U.S. National Weather Service is planning to upgrade its Doppler radar network in the near future to include polarization capability. Polarization diversity radars take advantage of the fact that precipitation particles have different shapes, sizes, orientations, dielectric constants, and number densities. For example, raindrops smaller than about 1 mm in diameter are spherical, but larger raindrops progressively flatten due to air resistance and take on a ‘‘hamburger’’ shape as they become large. Hailstones are typically spherical or conical but may take on more diverse shapes depending on how their growth proceeds. Hailstones sometimes develop a water coating, while growing at subfreezing temperatures, due to heat deposited on the hailstone surface during freezing. The water coating changes the dielectric constant of the hail surface. Small ice crystals typically are oriented horizontally but become randomly oriented as they grow larger. Eventually, individual crystals form loosely packed, low-density snowflakes as they collect one another during fall. When snowflakes fall through the melting level, they develop
wet surfaces and a corresponding change in dielectric constant. Horizontally and vertically polarized waves are closely aligned with the natural primary axes of falling precipitation particles and therefore are ideal orientations to take advantage of the particle characteristics to identify them remotely. Linearly polarized waves induce strong electrical fields in precipitation particles in the direction of electrical field oscillation and weak fields in the orthogonal direction. For particles that have a large aspect ratio, such as large raindrops, a horizontally polarized wave induces a larger electrical field and, subsequently, a larger returned signal than a vertically polarized wave. In general, the two orthogonal fields provide a means of probing particle characteristics in the two orthogonal dimensions. Differences in particle characteristics in these dimensions due to shape or orientation will appear as detectable features in the returned signal. These are backscatter effects related to particles in the radar scattering volume. Propagation effects such as attenuation, which are associated with particles located between the radar and the scattering volume, also differ for the two orthogonal polarizations. Measurement of the differences in propagation at orthogonal polarization provides further information about the characteristics of particles along the beam path. There are six backscatter variables and four propagation variables that carry meaningful information provided by polarization diversity radars that employ linear polarization. Other variables are derived from these basic quantities. The most important backscatter variables are (1) the reflectivity factor Z for horizontal polarization; (2) the differential reflectivity ZDR, which is the ratio of reflected power at horizontal and vertical polarization; (3) the linear depolarization ratio (LDR), which is the ratio of the cross-polar power (transmitted horizontally, received vertically) to the copolar power (transmitted and received horizontally); (4) the complex correlation coefficient ρhv e^(jδ) between copolar horizontally and vertically polarized echo signals; and (5) the phase of the correlation coefficient δ, which is the difference in phase between the horizontally and vertically polarized field caused by backscattering. In the expression for the correlation coefficient, j = √−1. Propagative effects that influence polarization measurements include (1) attenuation of the horizontally polarized signal, (2) attenuation of the vertically polarized signal, (3) depolarization, and (4) the differential phase shift ΦDP. A differential phase shift, or lag, occurs in rain because horizontally polarized waves propagate more slowly than vertically polarized waves. This occurs because larger raindrops are oblate and present a larger cross section to horizontally polarized waves. The variable of most interest is the specific differential phase KDP, which is the range derivative of ΦDP. The specific differential phase has been shown to be an excellent indicator of liquid water content and rain rate and may be superior to rain rates derived from standard Z–R relationships based on Eq. (10). An example of the capabilities of polarimetric radars in identifying precipitation type appears in Fig. 20. This figure shows a cross section of (a) Z, the radar
reflectivity factor for horizontally polarized radiation; (b) ZDR, the differential reflectivity; and (c) a particle classification based on polarization variables. Note in panel (c) that the hail shaft, in yellow, has high Z and low ZDR, and the rain to the right of the hail shaft has low Z and high ZDR. These variables, combined with other polarimetric measurements, allow meteorologists to estimate the other types of particles that populate a storm. For example, the upper part of the storm in Fig. 20 contained hail and graupel (smaller, softer ice spheres), and the ‘‘anvil’’ top of the storm extending to the right of the diagram was composed of dry snow and irregular ice crystals. Identifying boundaries between different particle types uses techniques invoking ‘‘fuzzy logic’’ decisions, which take into account the fact that there is overlap between various polarization parameters for various precipitation types. The use of KDP for precipitation measurements is especially promising. The major advantages of KDP over Z are that KDP is independent of receiver and transmitter calibrations, unaffected by attenuation, less affected by beam blockage, unbiased by ground clutter cancelers, less sensitive to variations in the distributions of drops, biased little by the presence of hail, and can be used to detect anomalous propagation. For these reasons, research efforts using polarization diversity radars have focused particularly on verification measurements of rainfall using KDP. Quantitative predictions of snowfall may also be possible using polarization diversity radars, but this aspect of precipitation measurement has received less attention in meteorological research because of the greater importance of severe thunderstorms and flash flood forecasting.

WIND PROFILING RADARS

A wind profiling radar, or wind profiler, is a Doppler radar used to measure winds above the radar site, typically to a height of about 15 km above the earth’s surface. Wind profilers are low-power, high-sensitivity radars that operate best in clear air conditions. Profilers, which operate at UHF and VHF frequencies, detect echoes from fluctuations in the radio refractive index at scales of half the radar wavelength. These fluctuations arise from variations in air density and moisture content, produced primarily by turbulence. Radar meteorologists assume that the fluctuations in the radio refractive index are carried along with the mean wind, and therefore the Doppler frequency shift for the motion of the scattering elements can be used to estimate the wind. Wind profilers use fixed phased array antennas. In the 404.37-MHz (74-cm wavelength) profilers used by the U.S. National Weather Service, an antenna is made up of a 13 × 13 meter grid of coaxial cables, and the antenna itself consists of many individual radiating elements, each similar to a standard dipole antenna. If a transmitted pulse arrives at each of these elements at the same time (in phase), a beam propagates away from the antenna vertically. If the pulses arrive at rows of elements at slightly different times (out of phase), a beam propagates upward at an angle to the zenith. The phasing is controlled
Figure 20. Range-height indicator scans of (top) radar reflectivity factor for horizontal polarization Z; (middle) differential reflectivity ZDR ; (bottom) particle classification results based on analysis of all polarimetric parameters. (From J. Vivekanandan, D. S. Zrnic, S. M. Ellis, R. Oye, A. V. Ryzhkov, and J. Straka, Cloud microphysics retrieval using S band dual-polarization radar measurements. Bulletin of the American Meteorological Society 80, 381–388 (1999). Courtesy of J. Vivekanandan and the American Meteorological Society.) See color insert.
by changing the feed cable lengths. Profilers typically use a three-beam pattern that has a vertically pointing beam and beams pointing in orthogonal directions (e.g., north and east). In the absence of precipitation, the radial velocities measured by a profiler along each beam are
$$v_{re} = u \cos e + w \sin e, \qquad (18)$$
$$v_{rn} = v \cos e + w \sin e, \qquad (19)$$
$$v_{rv} = w, \qquad (20)$$
where vre , vrn , and vrv are the radial velocities measured by the beams in the east, north, and vertical pointing positions; u, v, and w are the wind components in the west–east, south–north, and upward directions; and e is the angle of elevation of the east and north beams above
the earth's surface. Because the vertical beam measures w directly and e and the radial velocities are measured, these equations can be solved for u and v, which are easily converted to wind speed and direction. Profilers are pulsed radars, so the round-trip travel time of the pulse determines the height at which the wind is measured. Precipitation particles also scatter energy and therefore contribute to the measured radial velocity. When precipitation occurs, w in Eqs. (18)–(20) must be replaced by w + wt, where wt is the terminal fall velocity of the precipitation particles. Problems arise if all three beams are not filled by precipitation particles of the same size; in this case, the winds may not be recoverable. The radars also require a recovery time after the pulse is transmitted before accurate data can be received, so information in the lowest 500 meters of the atmosphere is typically not recoverable.
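As a concrete illustration of Eqs. (18)–(20), the following minimal Python sketch inverts the three measured radial velocities to recover the wind components and then converts them to wind speed and direction; the function name and the example numbers are illustrative assumptions, not values from the text.

```python
import math

def profiler_winds(v_re, v_rn, v_rv, elev_deg):
    """Invert Eqs. (18)-(20): recover (u, v, w) from the radial
    velocities of an east, north, and vertical profiler beam.

    v_re, v_rn, v_rv -- radial velocities (m/s) of the east, north,
                        and vertical beams (positive away from radar)
    elev_deg         -- elevation angle e of the oblique beams (degrees)
    """
    e = math.radians(elev_deg)
    w = v_rv                                    # Eq. (20): vertical beam gives w directly
    u = (v_re - w * math.sin(e)) / math.cos(e)  # Eq. (18) solved for u (west-east)
    v = (v_rn - w * math.sin(e)) / math.cos(e)  # Eq. (19) solved for v (south-north)

    speed = math.hypot(u, v)
    # Meteorological convention: the direction the wind blows FROM,
    # measured clockwise from north.
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return u, v, w, speed, direction

# Hypothetical example: oblique beams at 75 degrees elevation.
u, v, w, spd, wdir = profiler_winds(v_re=3.1, v_rn=-1.4, v_rv=0.2, elev_deg=75.0)
print(f"u={u:.1f} m/s  v={v:.1f} m/s  w={w:.1f} m/s  "
      f"speed={spd:.1f} m/s  direction={wdir:.0f} deg")
```

When precipitation is present, the w recovered here represents w + wt, the sum of the air motion and the particle fall speed, as noted above.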
Wind profiles can be obtained at high time resolution, often at times as short as 6 minutes, and vertical resolution of the order of 250 meters. The 6-minute time interval, compared with the standard 12-hour time interval for standard weather balloon launch wind measurements, represents a dramatic increase in a meteorologist’s capability of sampling upper atmospheric winds. Wind profilers are used for a wide range of research applications in meteorology, often in combination with other instruments, such as microwave radiometers and acoustic sounding devices that remotely measure moisture and temperature profiles. The data from profilers are presented in a format essentially identical to Fig. 15. Currently, the U.S. National Oceanic and Atmospheric Administration operates a network of 30 profilers called the Wind Profiler Demonstration Network. Most of these profilers are located in the central United States and are used for weather monitoring by the National Weather Service and in forecasting applications such as the initialization and verification of numerical weather forecasting models. MOBILE RADAR SYSTEMS Mobile radar systems consist of three distinct classes of instruments: rapidly deployable ground based radars, airborne radars, and satellite-borne radars. Each of these radar systems is well-suited to address particular research problems that are either difficult or impossible to carry out from fixed ground-based systems. Rapidly deployable ground-based radars are used to study tornadoes, land-falling hurricanes, and other atmospheric phenomena that have small space and timescales and that must be sampled at close range. These phenomena are unlikely to occur close to a fixed radar network, particularly during relatively short research field campaigns. At this time, six mobile radars are available to the meteorological research community: two truckmounted 3-cm wavelength radars called the ‘‘Doppler on Wheels,’’ or DOWs, operated jointly by the University of Oklahoma and the National Center for Atmospheric Research; two truck-mounted 5-cm wavelength radars called the Shared Mobile Atmospheric Research and Teaching (SMART) radars operated by the National Severe Storms Laboratory, Texas A&M University, Texas Tech University, and the University of Oklahoma; a 3-mm wavelength truck-mounted Doppler system operated by the University of Massachusetts; and a 3-mm wavelength trailer-mounted Doppler radar operated by the University of Miami. The DOW and SMART radars are used in dual-Doppler arrangements to measure wind fields near and within tornadoes and hurricanes. The millimeter wavelength radars have been used to study tornadoes and cloud and precipitation processes. Figure 21 shows a DOW image of the radar reflectivity and radial velocity of a tornado near Scottsbluff, Nebraska. The tornado is located at the position of the tight inbound/outbound radial velocity couplet near the center of the image in the left panel. The reflectivity factor in the right panel of the figure shows a donut-shaped reflectivity region that has a minimum in reflectivity at
the tornado center. It is thought that this occurs because debris is centrifuged outward from the tornado center. In a tornado, the backscattered energy comes primarily from debris. Although great progress has been made using groundbased Doppler radars to study storm structure, remote storms such as oceanic cyclones and hurricanes cannot be observed by these systems. In addition, small-scale phenomena such as tornadoes rarely occur close enough to special fixed dual-Doppler networks so that detailed data, aside from new data obtained by the DOWs, are hard to obtain. Airborne meteorological radars provide a means to measure the structure and dynamics of these difficult to observe weather systems. Currently, three research aircraft flown by the meteorological community have scanning Doppler radars, two P-3 aircraft operated by the National Oceanic and Atmospheric Administration and a Lockheed Electra aircraft operated by the National Science Foundation through the National Center for Atmospheric Research (NCAR). The scanning technique used by these radars to map wind fields is shown in Fig. 22. In the P-3 aircraft, a single radar uses an antenna designed to point alternately at angles fore and then aft of the aircraft. In the Electra, two radars are used; the first points fore, and the second points aft. In both cases, the scan pattern consists of an array of beams that cross at an angle sufficient to sample both components of the horizontal wind. Therefore, the data from the fore and aft beams can be used as a dual-Doppler set that permits recovery of wind fields. Three other aircraft, an ER-2 high-altitude aircraft and a DC-8 operated by NASA, and a King Air operated by the University of Wyoming also have radars used for meteorological research. Airborne radars have significant limitations imposed by weight, antenna size, and electrical power requirements. Aircraft flight capabilities in adverse weather, stability in turbulence, and speed all impact the quality of the measurements. The aircraft’s precise location, instantaneous three-dimensional orientation, and beam pointing angle must be known accurately to position each sample in space. At altitudes other than the flight altitude, the measured radial velocity contains a component of the particle fall velocity, which must be accounted for in processing data. This component becomes progressively more significant as the beam rotates toward the ground and zenith. A time lag exists between the forward and aft observations during which the storm evolution will degrade the accuracy of the wind recovery. Short wavelengths (e.g., 3 cm) must be used because of small antenna and narrow beam-width requirements that make attenuation and radial velocity folding concerns. Despite these limitations, airborne Doppler radars have provided unique data sets and significant new insight into storm structure. For example, Fig. 23 shows an RHI scan of the radar reflectivity factor through a tornadic thunderstorm near Friona, TX, on 2 June 1995 measured by the ELDORA radar on the NCAR Electra aircraft. The data, which was collected during the Verification of the Origins of Rotations in Tornadoes experiment (VORTEX), shows a minimum in the reflectivity of the Friona tornado extending from the ground to the cloud top, the first
Figure 21. Image of radial velocity (m s−1 ; left) and reflectivity factor (dBZ; right) taken by a Doppler on Wheels radar located near a tornado in Scottsbluff, Nebraska, on 21 May 1998. Blue colors denote inbound velocities in the radial velocity image. (Courtesy of J. Wurman, University of Oklahoma.) See color insert.
Figure 22. (a) ELDORA/ASTRAIA airborne radar scan technique showing the dual-radar beams tilted fore and aft of the plane normal to the fuselage. The antennas and radome protective covering rotate as a unit about an axis parallel to the longitudinal axis of the aircraft. (b) Sampling of a storm by the radar. The flight track past a hypothetical storm is shown. Data are taken from the fore and aft beams to form an analysis of the velocity and radar reflectivity field on planes through the storm. The radial velocities at beam intersections are used to derive the two-dimensional wind field on the analysis planes. (From P. H. Hildebrand et al., The ELDORA-ASTRAIA airborne Doppler weather radar: High-resolution observations from TOGA-COARE. Bulletin of the American Meteorological Society 77, 213–232 (1996) Courtesy of American Meteorological Society.)
time such a feature has ever been documented in a tornadic storm. Space radars include altimeters, scatterometers, imaging radars, and most recently, a precipitation radar whose
measurement capabilities are similar to other radars described in this article. Altimeters, scatterometers and imaging radars are used primarily to determine properties of the earth’s surface, such as surface wave height,
Figure 23. RHI cross section of the radar reflectivity factor through a severe thunderstorm and tornado near Friona, TX, on 2 June 1995 measured by the ELDORA radar on board the National Center for Atmospheric Research Electra aircraft. The data was collected during VORTEX, the Verification of the Origins of Rotations in Tornadoes experiment. (From R. M. Wakimoto, W. -C. Lee, H. B. Bluestein, C. -H. Liu, and P. H. Hildebrand, ELDORA observations during VORTEX 95. Bulletin of the American Meteorological Society 77, 1,465–1,481 (1996). Courtesy of R. Wakimoto and the American Meteorological Society.) See color insert.
the location of ocean currents, eddies and other circulation features, soil moisture, snow cover, and sea ice distribution. These parameters are all important to meteorologists because the information can be used to initialize numerical forecast and climate models. Surface winds at sea can also be deduced because short gravity and capillary waves on the ocean surface respond rapidly to the local near-instantaneous wind and the character of these waves can be deduced from scatterometers. The first precipitation radar flown in space was launched aboard the Tropical Rainfall Measuring Mission (TRMM) satellite in November 1997. The TRMM radar, jointly funded by Japan and the United States, is designed to obtain data concerning the three-dimensional structure of rainfall over the tropics where ground-based and oceanbased radar measurements of precipitation are almost nonexistent. Figure 24, for example, shows the reflectivity factor measured in Hurricane Mitch over the Caribbean during October 1998. A unique feature of the precipitation radar is its ability to measure rain over land, where passive microwave instruments have more difficulty. The data are being used in conjunction with other instruments on the satellite to examine the atmospheric energy budget of the tropics. The radar uses a phased array antenna that operates at a frequency of 13.8 GHz. It has a horizontal resolution at the ground of about 4 km and a swath width of 220 km. The radar measures vertical profiles of rain and snow from the surface to a height of about 20 kilometers at a vertical resolution of 250 m and can detect rain rates as low as 0.7 millimeters per hour. The radar echo of the precipitation radar consists of three components: echoes due to rain; echoes from the surface; and mirror image echoes, rain echoes received through double reflection at the surface. At intense rain rates, where the attenuation effects can be strong, new methods of data processing using these echoes have been developed to correct for attenuation. The Precipitation Radar is the first spaceborne instrument that provides threedimensional views of storm structure. The measurements are currently being analyzed and are expected to yield
Figure 24. Radar reflectivity factor within Hurricane Mitch over the Caribbean Sea measured by the Precipitation Radar on board the Tropical Rainfall Measuring Mission (TRMM) satellite in October 1998. The viewing swath is 215 kilometers wide. (Courtesy of the National Aeronautics and Space Administration.) See color insert.
invaluable information on the intensity and distribution of rain, rain type, storm depth, and the height of the melting level across tropical latitudes. FUTURE DEVELOPMENTS A number of advancements are on the horizon in radar meteorology. Scientists now know that polarimetric radars have the potential to obtain superior estimates of rainfall compared to nonpolarimetric radars, and give far better information concerning the presence of hail. Because of these capabilities, polarization diversity should eventually become incorporated into the suite of radars used by the U.S. National Weather Service for weather monitoring and severe weather warning. Experimental bistatic radars, systems that have one transmitter but
many receivers distributed over a wide geographic area, were first developed for meteorological radars in the 1990s and are currently being tested as a less expensive way to retrieve wind fields in storms. Bistatic radar receivers provide a means to obtain wind fields by using a single Doppler radar. Each of the additional receivers measures the pulse-to-pulse phase change of the Doppler shift, from which it determines the wind component toward or away from that receiver. Because these receivers view a storm from different directions, all wind components are measured, making it possible to retrieve wind fields within a storm similar to what is currently done by using two or more Doppler radars. Future networks of bistatic radars may make it possible to create images of detailed wind fields within storms in near-real time, providing forecasters with a powerful tool to determine storm structure and severity. At present, the operational network of wind profilers in the United States is limited primarily to the central United States. Eventual expansion of this network will provide very high temporal monitoring of the winds, leading to more accurate initialization of numerical weather prediction models and, ultimately, better forecasts. Mobile radars are being developed that operate at different wavelengths and may eventually have polarization capability. One of the biggest limitations in storm research is the relatively slow speed at which storms must be scanned. A complete volume scan by a radar, for example, typically takes about 6 minutes. Techniques developed for military applications are currently being examined to reduce this time by using phased array antennas that can scan a number of beams simultaneously at different elevations. Using these new techniques, future radars will enhance scientists' capability to understand and predict a wide variety of weather phenomena.
ABBREVIATIONS AND ACRONYMS
COHO    coherent local oscillator
DOW     Doppler on Wheels
ELDORA  Electra Doppler radar
LDR     linear depolarization ratio
NASA    National Aeronautics and Space Administration
NCAR    National Center for Atmospheric Research
PPI     plan position indicator
RHI     range-height indicator
SDD     synthetic dual Doppler
SMART   Shared Mobile Atmospheric Research and Teaching
STALO   stable local oscillator
TREC    objective tracking of radar echo with correlations
TRMM    Tropical Rainfall Measuring Mission
UHF     ultra high frequency
VAD     velocity azimuth display
VHF     very high frequency
VORTEX  Verification of the Origins of Rotations in Tornadoes experiment
WSR-57  Weather Surveillance Radar 1957
WSR-88D Weather Surveillance Radar 1988 Doppler
BIBLIOGRAPHY
1. D. Atlas, ed., Radar in Meteorology, American Meteorological Society, Boston, 1990.
2. L. J. Battan, Radar Observation of the Atmosphere, University of Chicago Press, Chicago, 1973.
3. T. D. Crum, R. E. Saffle, and J. W. Wilson, Weather and Forecasting 13, 253–262 (1998).
4. R. J. Doviak and D. S. Zrnić, Doppler Radar and Weather Observations, 2nd ed., Academic Press, San Diego, CA, 1993.
5. T. Matejka and R. C. Srivastava, J. Atmos. Oceanic Technol. 8, 453–466 (1991).
6. R. Reinhart, Radar for Meteorologists, Reinhart, Grand Forks, ND, 1997.
7. F. Roux, Mon. Weather Rev. 113, 2,142–2,157 (1985).
8. M. A. Shapiro, T. Hample, and D. W. Van De Kamp, Mon. Weather Rev. 112, 1,263–1,266 (1984).
9. M. Skolnik, Introduction to Radar Systems, 2nd ed., McGraw-Hill, New York, 1980.
10. J. Tuttle and R. Gall, Bull. Am. Meteorol. Soc. 79, 653–668 (1998).
11. R. M. Wakimoto et al., Bull. Am. Meteorol. Soc. 77, 1,465–1,481 (1996).
12. B. L. Weber et al., J. Atmos. Oceanic Technol. 7, 909–918 (1990).
13. J. Vivekanandan et al., Bull. Am. Meteorol. Soc. 80, 381–388 (1999).
14. D. S. Zrnic and A. V. Ryzhkov, Bull. Am. Meteorol. Soc. 80, 389–406 (1999).
X X-RAY FLUORESCENCE IMAGING
SHIH-LIN CHANG
National Tsing Hua University, Hsinchu, Taiwan
Synchrotron Radiation Research Center, Hsinchu, Taiwan
INTRODUCTION
In Wilhelm Conrad Röntgen's cathode-ray-tube experiments, the lighting up of a screen made from barium platinocyanide crystals due to fluorescence led to his discovery of X rays in 1895. Subsequent investigations by Röntgen and by Straubel and Winkelmann showed that the radiation that emanated from a fluorspar, or fluorite, crystal CaF2 excited by X rays is more absorbing than the incident X rays. This so-called fluorspar radiation is X-ray fluorescence (1). Nowadays, it is known that X-ray fluorescence is one of the by-products of the interaction of electromagnetic (EM) waves with the atoms in matter in the X-ray regime (0.1–500 Å). During the interaction, the incident X rays have sufficient energy to eject an inner electron from an atom. The energy of the atom is raised by an amount equal to the work done in ejecting the electron. Therefore, the atom is excited to states of higher energies by an incident EM wave. The tendency to have lower energy brings the excited atom back to its stable initial states (of lower energies) by recapturing an electron. The excess energy, the energy difference between the excited and the stable state, is then released from the atom in the form of EM radiation, or photons; this is fluorescence (2–4). The elapsed time between excitation and emission is only about 10^−16 second. The energy of X-ray fluorescence is lower than that of the incident radiation. In general, the distribution of the X-ray fluorescence generated is directionally isotropic; the fluorescence is radiated in all directions at energies characteristic of the emitting atoms. Thus, X-ray fluorescence is element-specific. X-ray fluorescence is a two-step process: X-ray absorption (the ejection of an inner electron) and X-ray emission (the recapture of an electron) (4). Both involve the electron configuration of atoms in the irradiated material. To eject an electron, the atom needs to absorb sufficient energy from the incident X rays of energy E, and E ≥ Wγ, where Wγ is the energy required to remove an electron from the γ shell. The γ shell corresponds to the K, L, M electrons. An atom is said to be in the K quantum state if a K electron is ejected. It is similar for the L and M quantum states. Because a K electron is more tightly bound to the nucleus than the L and M electrons, the energy of the K quantum state is higher than that of the L quantum state. The energies corresponding to the work WK, WL, WM done by the EM wave to remove K, L, M electrons are called the K, L, ... absorption edges of the atom, or in terms of wavelength, λK, λL, ..., respectively (2–4). The absorption at/near E ≅ Wγ is different from the common phenomena of absorption, because the absorptive cross section and the linear absorption coefficient increase abruptly at absorption edges. This phenomenon is the so-called anomalous dispersion due to its resonant nature (2,3). In normal absorption, the intensity of a transmitted X-ray beam through a material is attenuated exponentially as the material increases in thickness according to a normal linear absorption coefficient µ. Usually, the normal linear absorption coefficient µ is related to the photoelectric absorption cross section of the electron σe by µ = σe n0 ρ, where n0 is the number of electrons per unit volume in the material and ρ is the mass density. The atomic absorption coefficient is µa = (A/NO)(µ/ρ), where A is the atomic weight of the element in question and NO is Avogadro's number. For λ ≠ λγ (i.e., E ≠ Wγ), µa is approximately proportional to λ^3 and Z^4, according to quantum mechanical considerations (5,6); Z is the atomic number. When E increases (i.e., λ decreases), µa decreases according to λ^3. When E = Wγ (i.e., λ = λγ), µa increases abruptly because X rays are absorbed in the process of ejecting γ electrons. For E > Wγ, the absorption resumes a decreasing trend as λ^3 (2,3). According to the electron configuration of atoms, the K, L, M electrons, etc. are specified by the principal quantum number n, orbital angular momentum quantum number ℓ, magnetic quantum number mℓ, and spin quantum number ms, or by n, ℓ, j, m, where j = ℓ ± ms and m = ±j (3–6). Following the common notations used, the s, p, d, f, ... subshells and the quantum numbers n, ℓ, j, m are also employed to indicate the absorption edges or to relate to the ejection of electrons in the subshell. For example, there are three absorption edges for L electrons, LI, LII, LIII, corresponding to the three energy levels (states) specified by n, ℓ, and j: two electrons in (2, 0, 1/2), two electrons in (2, 1, 1/2), and four electrons in (2, 1, 3/2). The LI state involves 2s electrons, and LII and LIII involve six 2p electrons. The difference between LII and LIII is that j < ℓ for the former and j > ℓ for the latter. X-ray emission, the other process, involves the allowed transition of an atom from a high-energy state to a lower one. Usually, a high-energy state has a vacancy in the tightly bound inner shell. During the transition, the vacancy in the inner shell is filled with an electron coming from the outer shell. The energy of the fluorescence emitted is the energy difference between the two states involved. According to quantum mechanics (5–7), the allowed transition, the transition whose probability is high, is governed by selection rules. In the electric dipole approximation (5–7), the changes in quantum numbers in going from one state to the other follow the conditions Δℓ = ±1 and Δj = 0 or ±1. For example, in the transition from a K state to an L state, the atom can only go from the K state to either the LII or LIII state. The X rays emitted are the familiar Kα2 and Kα1 radiation, respectively. The
transition from the K state to the LI state is forbidden by selection rules. Moreover, an initial double ionization of an atom can also result in the emission of X rays. For example, the Kα3,4 emission is the satellite spectrum produced by a transition from a KL state to an LL state, where the KL state means that the absorbed X-ray energy is sufficient to eject a K and an L electron (3,4).
The absorption and emission of X-ray photons by atoms in fluorescent processes can be rigorously described by using the quantum theory of radiation (7). The vector potential A(r, t) that represents the electromagnetic radiation (8) is usually expressed as a function of the position vector r and time t (5–7):

A(r, t) = c [2πNℏ/(ωV)]^{1/2} ε̂ e^{i(k·r−ωt)} + c [2π(N + 1)ℏ/(ωV)]^{1/2} ε̂ e^{−i(k·r−ωt)},   (1)

where the first term is related to the absorption of a photon quantum and the second term to the emission of a photon quantum by an electron. N is the photon occupation number in the initial state, ε̂ is the unit vector of polarization, and k and ω are the wave vector and angular frequency of the EM wave. c, ℏ, and V are the speed of light in vacuum, Planck's constant h/2π, and the volume irradiated, respectively. The absorption probability, the transition probability, and the differential cross section can be calculated quantum mechanically by considering the fluorescence as the scattering of photons by atomic electrons (4–7). Before scattering, the photon, whose angular frequency is ω, wave vector is k, and polarization vector is ε̂, is incident on the atom in its initial state A. After scattering, the atom is left in its final state B, and the scattered photon of polarization vector ε̂′ and angular frequency ω′ propagates along the wave vector k′. In between are the excited intermediate states. The Kramers–Heisenberg formula, modified to include the effects of radiation damping, gives the differential cross section, the derivative of the cross section σ with respect to the solid angle Ω, as follows (7):

dσ/dΩ = r_O^2 (ω′/ω) | ε̂ · ε̂′ δ_AB − (1/m) Σ_I [ (p · ε̂′)_BI (p · ε̂)_IA / (E_I − E_A − ℏω − iΓ_I/2) + (p · ε̂′)_IA (p · ε̂)_BI / (E_I − E_A + ℏω′) ] |^2,   (2)

where r_O is the classical radius of the electron, r_O = e^2/mc^2, and the subscript I stands for the excited intermediate state. p is the electric dipole moment, and ϕ is the wave function of a corresponding state. δ_AB is a Kronecker delta that involves the wave functions ϕ_A and ϕ_B of states A and B. The matrix elements (p · ε̂′)_BI and (p · ε̂)_IA involve the wave functions ϕ_A, ϕ_B, and ϕ_I (7). The probability of finding the intermediate state I is proportional to exp(−Γ_I t/ℏ), where Γ_I = ℏ/τ_I and τ_I is the lifetime of state I. The first term in Eq. (2) represents the nonresonant amplitude of scattering (ω ≠ ω_IA = (E_I − E_A)/ℏ). The second and third terms have appreciable amplitude in the resonant condition, that is, ω = ω_IA. This phenomenon is known as resonant fluorescence. The sum of the nonresonant amplitudes is usually of the order of r_O, which is much smaller than the resonant amplitude (of the order of c/ω). By ignoring the nonresonant amplitudes, the differential cross section of a single-level resonant fluorescent process in the vicinity of a nondegenerate resonance state R takes the form (7)

dσ/dΩ = (r_O^2/m^2) (ω′/ω) [ |(p · ε̂)_AR|^2 / ((E_R − E_A − ℏω)^2 + Γ_R^2/4) ] |(p · ε̂′)_RB|^2.   (3)

This expression is related to the product of the probability of finding state R formed by the absorption of a photon (k, ε̂) and the spontaneous emission probability per solid angle for the transition from state R to state B characterized by the emitted photon (k′, ε̂′). The term in the square bracket is proportional to the absorption probability, and the matrix element after the bracket is connected to the emission probability.
For a conventional X-ray and a synchrotron radiation (SR) source, the energy resolution ΔE of an incident X-ray beam is around 10^−2 to 10 eV (2,3,9,10), except that for some special experiments, such as those involving Mössbauer effects, 10^−9 eV is needed (11,12). Under normal circumstances, ΔE ≫ Γ_R (∼10^−9 eV) according to the uncertainty principle. This means that the temporal duration of the incident photon (X-ray) beam is shorter than the lifetime of the resonance. In other words, the formation of the metastable resonance state R via absorption and the subsequent X-ray emission can be treated as two independent quantum mechanical processes. In such a case, the emission can be thought of as a spontaneous emission; that is, the atom undergoes a radiative transition from state R to state B as if there were no incident electromagnetic wave. Then, the corresponding dσ/dΩ takes the following simple form under the electric dipole approximation (7):

dσ/dΩ ≅ P_AB [αω^3 V/(2πc^3)] |X_AB|^2 sin^2 θ,   (4)

where P_AB is the absorption probability equal to the term in the square bracket given in Eq. (3), and |X_AB|^2 is the transition probability equal to the term after the bracket of Eq. (3). α is the fine-structure constant (∼1/137), and θ is the angle between the electric dipole moment p and the direction of the X-ray fluorescence emitted from the atom. As is well known, this type of fluorescence has the following distinct properties: (1) In general, fluorescence is directionally isotropic, so that its distribution is uniform in space. (2) The intensity of fluorescence is maximum when the emission direction is perpendicular to p and is zero along p. (3) X-ray fluorescence emitted from different atoms in a sample is incoherent, and the emitted photons will not interfere with each other. (4) Whenever the energy of an incident photon is greater than an absorption edge, the energy of the emitted fluorescence is determined by the energy difference between the two states involved in the transition, which is independent of the incident energy.
X-ray fluorescence has diverse applications, including identifying elements in materials and quantitative and
qualitative trace-element analyses from X-ray fluorescent spectra. Together with other imaging techniques, X-ray fluorescent imaging can, in principle, provide both spectral and spatial distributions of elements in matter in a corresponding photon-energy range. Unlike other fluorescent imaging techniques, such as optical fluorescence and laser-induced fluorescent imaging, the development and application of X-ray fluorescent imaging is still in a state of infancy, though X-ray fluorescence and related spectroscopy have been known for quite sometime. Yet, due to the advent of synchrotron radiation, new methods and applications using this particular imaging technique have been developed very recently. An X-ray fluorescent image is usually obtained by following these steps: The incident X-ray beam, after proper monochromatization and collimation, impinges on a sample of interest. An absorption spectrum is acquired by scanning the monochromator to identify the locations of the absorption edges of the element investigated. The incident beam is tuned to photon energies higher than the absorption edges so that the constituent elements of the sample are excited. Then, X-ray fluorescence emanates. The spatial distribution of the fluorescent intensity, the fluorescent image, is recorded on a two-dimensional detector or on a point detector using a proper scan scheme for the sample. The following items need to be considered for good quality images of high spatial and spectral resolution: (1) appropriate X-ray optics for shaping the incident beam and for analyzing the fluoresced X rays and (2) a suitable scanning scheme for the sample to cross the incident beam. In addition, image reconstruction of intensity data versus position is also important for direct mapping of the trace elements in a sample. TECHNICAL ASPECTS An X-ray fluorescent imaging setup consists of an X-ray source, an X-ray optics arrangement, a sample holder, an imaging detector, and an image processing system. These components are briefly described here: X-ray Sources Conventional X-ray sources are sealed X-ray tubes and rotating-anode X-ray generators. Synchrotron radiation emitted from accelerated relativistic charged particles is another X-ray source. The spectra of conventional sources consist of characteristic lines according to specific atomic transitions and white radiation background due to the energy loss from the electrons bombarding the metal target used in X-ray generators (2,3). The energy resolution of the characteristic lines is about E/E ≈ 10−4 . In contrast, the spectrum of synchrotron radiation is continuous and has a cutoff on high energies, resulting from the emission of a collection of relativistic charged particles (9). High directionality (parallelism), high brilliance, well-defined polarization (linear polarization in the orbital plane and elliptical polarization off the orbital plane), pulsed time structure (usually the pulse width is in picoseconds and the period in nanoseconds), and a clean environment (in vacuum of 10−9 torr) are the advantages of synchrotron
X-ray sources (9). Other sources such as plasma-generated X rays can also be used. Beam Conditioning Proper X-ray monochromators need to be employed for monochromatic incident X-rays. Different incident beam sizes for different imaging purposes, can be obtained by using a suitable collimation scheme. Beam conditioning for X-ray fluorescent imaging is similar to that for Xray diffraction and scattering. Single- or double-crystal monochromators (10) are usually employed for beam monochromatization. For synchrotron radiation, grazing incidence mirrors before or after the monochromator for focusing or refocusing, respectively, are used (13,14). Pinhole or double-slit systems and an evacuated beam path have been found useful for beam collimation. To obtain a microbeam, two double-crystal monochromators — one vertical and the other horizontal, or multilayered mirrors such as the Kirkpatrick–Baez (K–B) configuration (15) are reasonable choices for the collimation system. Sample Holder Usually a sample holder is designed to facilitate the positioning of the sample with respect to the incident X-ray beam. In general, the sample holder should provide the degree of freedom needed to translate the sample left and right and also up and down. In addition, in some cases, the sample needs to be rotated during the imaging process. Therefore, the azimuthal rotation around the samplesurface normal and rocking about the axis perpendicular to both the surface normal and the incident beam are indispensable. Accuracy in translation is of prime concern in high spatial resolution fluorescent imaging. If soft Xray fluorescent imaging is pursued, a sample holder in a high vacuum (HV) or ultrahigh vacuum (UHV) condition is required to reduce air absorption. Detector A suitable detector is essential in studies/investigations for recording X-ray fluorescent images. Area detectors or two-dimensional array detectors are usually employed to obtain two-dimensional images. They include imaging plates (IP) (a resolution of 100 µm for a pixel), charge coupled device (CCD) cameras (a resolution of about 70–100 µm), microchannel plates (MCP) (25 µm per pixel), and solid-state detectors (SSD) (an energy resolution of about 100–200 eV). Traditionally, point detectors such as the gas proportional counter and NaI(Li) scintillation counter have been used frequently. However, to form images, proper scan schemes for the sample as well as the detector are required. For hard X-ray fluorescent imaging, an imaging plate, CCD camera, and point detectors serve the purpose. For soft X-ray fluorescence imaging, a vacuum CCD camera, microchannel plate, and semiconductor pindiode array may be necessary. Computer-Aided Image Processor The fluorescent signals collected by a detector need to be stored according to the sample position. Usually analog signals, are converted into digital signals, and
this digital information is then stored in a computer. Image processing, which includes background subtraction, image-frame addition, and cross-sectional display, can be carried out on a computer. In addition, black-and-white contrast as well as color images can be displayed on a computer monitor and on a hard copy. In some cases when the rotating-sample technique is employed to collect a fluorescent image, corrections to avoid spatial distortion need to be considered (16,17). Image reconstruction depends on the scan mode chosen. There are three different modes scan: the point scan, line scan, and area scan. Usually the data from a point scan give a one to one spatial correspondence. Therefore, there is no need to reconstruct the image using mathematical transformation techniques, provided that high-resolution data are collected. Because the areas covered by the incident beam during a line scan (involving a line beam and sample translation and rotation) and an area scan are larger than the sample area, these two types of scans require image reconstruction from the measured data. The estimated relative errors σrel in the reconstructed images as a function of the percentage of the sample area a % that provide fluorescent intensity are shown in Fig. 1 for the three different scan types without (Fig. 1a) and with (Fig. 1b) a constant background. Except for very small fluorescent areas, the line scan provides the lowest error (18,19). Images from both the line scan and area (a) srel
Area Point
1
Line 0.5
scan can be reconstructed by employing appropriate filter functions and convolution techniques (16,20). The following representative cases illustrate in detail the general basic experimental components necessary for X-ray fluorescent imaging. The required experimental conditions specific for the investigation are given, and the actual images obtained are shown. CASE STUDIES Nondestructive X-ray Fluorescent Imaging of Trace Elements by Point Scanning The two-dimensional spatial distribution of multiple elements in samples is usually observed by X-ray fluorescent imaging using synchrotron radiation, mainly because of the tunability of synchrotron photon energy (21). Proper photon energy can be chosen for a specific element investigated. In addition, the brightness and directionality of synchrotron radiation provide superb capability for microbeam analysis, which can improve the spatial resolution of imaging approximately 100-fold. The minimal radiation damage of synchrotron radiation to specimens is another advantage over charged particles (22) such as ions and electrons. A typical experimental setup (23) for this purpose is shown schematically in Fig. 2. Polychromatic white X-rays from synchrotron storage ring are monochromatized for a selected wavelength using either a Si (111) cut double-crystal monochromator (DCM) or a Si–W synthetic multilayer monochromator (SMM). Using the DCM, the incident photon energy can be varied continuously by rotating the crystal around the horizontal axis perpendicular to the incident beam. Using the SMM, the photon energy can be changed by tilting the multilayer assembly about the vertical axis perpendicular to the incident beam. However, the entire system of the SMM may need rearrangement. Cooling the SMM may be necessary, and this should be taken into consideration. A multilayer substrate is usually attached to a copper block, which can be watercooled. The energy resolution E/E is about 10−4 for the
a (%) 0
100
50
(b) srel
Side view Point
1
M2
Slit 2
Slit 1
Area
Sample
Line
0.5
M1 Top view
SR (DCM)
(SMM)
IC
IC Si(Li) detector
0
50
a (%) 100
Figure 1. Estimated error σrel in reconstructed image vs. signal area a% for point, line, and area scans: (a) with, (b) without constant background (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Peterson, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Synchrotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1988, Elsevier Science with permission).
Figure 2. Schematic representation of the experimental setup for synchrotron radiation (SR) fluorescent imaging: The incident SR is monochromatized by either a double-crystal monochromator (DCM) or a synthetic multilayer (SMM) and is then focused by the K–B mirror system (M1 and M2). An ionization chamber (IC) and a Si(Li) SSD are used to monitor the incident beam and the fluorescence (23) (courtesy of A. Iida et al.; reprinted from Nucl. Instrum. Methods B82 A. Lida and T. Norma, Synchrotron X-ray Microprobe and its Application to Human Hair Analysis, pp. 129–138, copyright 1993, Elsevier Science with permission).
DCM and 10−2 for the SMM. A double-slit system is usually used to trim the incident beam to a desired size, say from 750 × 750 µm to as small as 3 × 4 µm for point scans. The intensity of the incident beam is monitored by an ionization chamber (IC) as usual for synchrotron experiments. A Kirkpatrick–Baez mirror system that consists of a vertical mirror M1 and a horizontal mirror M2 is placed after the IC to modulate the beam size for spatial resolution and the photon flux for sensitivity. Usually, elliptical mirrors are used to reduce spherical aberration (14,24). A sample is located on a computer-controlled X,Y translation stage. The translation step varies from 5 to 750 µm. Smaller steps can be achieved by special design of the translation stage. The fluorescence emitted from the sample at a given position is detected by a Si (Li) solid-state detector. The fluorescent intensity as a function of the sample position (x, y) is recorded and displayed on a personal computer. For illustration, the following examples are X-ray fluorescent images obtained by using this specific setup.
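To make the point-scan acquisition just described concrete, the following Python sketch shows the bookkeeping for a raster scan in which a fluorescence count is recorded at each (x, y) stage position and normalized by the ionization-chamber monitor reading. The interfaces move_stage, count_fluorescence, and read_monitor are hypothetical placeholders standing in for instrument control calls, not functions of any library described in the text.

```python
import numpy as np

def point_scan_map(nx, ny, step_um, move_stage, count_fluorescence, read_monitor,
                   dwell_s=10.0):
    """Acquire a point-scan fluorescence image.

    nx, ny             -- number of scan points in x and y
    step_um            -- translation step in micrometers
    move_stage         -- callable(x_um, y_um): position the sample stage
    count_fluorescence -- callable(dwell_s): integrated fluorescence counts
    read_monitor       -- callable(): incident-beam monitor (ionization chamber) reading
    dwell_s            -- counting time per point (seconds)
    """
    image = np.zeros((ny, nx))
    for iy in range(ny):
        for ix in range(nx):
            move_stage(ix * step_um, iy * step_um)
            counts = count_fluorescence(dwell_s)
            monitor = read_monitor()
            # Normalize by the monitor reading so that slow drifts of the
            # incident intensity do not imprint on the image.
            image[iy, ix] = counts / monitor if monitor > 0 else 0.0
    return image
```

Because each pixel corresponds directly to one stage position, such a point-scan map needs no mathematical reconstruction, in contrast to the line-scan data discussed later.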
X-ray Fluorescent Images of Biological Tissues
Two-dimensional Distribution of Trace Elements in a Rat Hair Cross Section (25). Trace element analysis of hair samples has been widely employed for biological monitoring of health conditions and for environmental investigations of heavy metal exposure and contamination (26,27). Consider a heavy toxic metal such as Hg, which has already been taken up from the blood to the hair of animals and humans due to exposure to methylmercury (MeHg), a commonly encountered and widely used form of environmental mercury. It is known that MeHg damages the central nervous system (28) by penetrating the blood–brain barrier (29). The effects on and alteration of biological systems and the dynamics of the distribution of Hg in the organs of animals exposed to MeHg have long been investigated (21,30). Because hair specimens usually provide a historical record of trace elements and are easily accessed, hair is a good bioindicator of metal contamination in mammals. The following example involves Hg distribution in rat hair.
A hair sample of rats endogenously exposed to methylmercury (MeHg) (internal administration) was subjected to synchrotron X-ray fluorescent measurement at the incident photon energy of 14.28 keV, using an experimental setup similar to that shown in Fig. 2. The beam size was 5 × 6 µm^2. Figure 3 shows the fluorescent spectrum of the hair that was collected in 50 minutes. The presence (contamination) of Hg, Zn, and S is clearly seen from the presence of Hg Lα, Hg Lβ, Zn Kα, and S Kα lines in the spectrum. The elemental images of Hg Lα, Zn Kα, and S Kα from a hair cross section of the MMC (methylmercury chloride)-treated rat 13.9 days after the first administration are shown in Fig. 4. The sample was obtained at 1 mm
Energy (keV) Figure 3. X-ray fluorescence spectrum obtained from the hair cross section of a rat given MMC (25) (courtesy of N. Shimojo et al.; reprinted from Life Sci. 60, N. Shimojo, S. Homma-Takeda, K. Ohuchi, M. Shinyashiki, G.F. Sun, and Y. Kumagi, Mercury Dynamics in Hair of Rats Exposed to Methylmercury by Synchrotron Radiation X-ray Fluorescence Imaging, pp. 2,129–2,137, copyright 1997, Elsevier Science with permission).
Figure 4. (a) Optical micrograph and (b) X-ray fluorescent images of Hg Lα, (c) Zn Kα, and (d) S Kα from a hair of a MMC-treated rat (25) (courtesy of N. Shimojo et al.; reprinted from Life Sci. 60, N. Shimojo, S. Hommo-Takeda, K. Ohuchi, M. Shinyashiki, G. K. Sun, and Y. Kumagi, Mercury Dynamics in Hair of Rats Exposed to Methylmercury by Sychrotron Radiation X-ray Fluorescence Imaging, pp. 2,129–2,137, copyright 1997, Elsevier Science with permission).
from the root end of the hair. The sample thickness was about 25 µm. The scanning condition was 16 × 16 steps at 5 µm/step, and the counting time was 10 s per step. An optical micrograph of the hair cross section is also shown for comparison (Fig. 4a). The black-andwhite contrast, classified into 14 degrees from maximum to minimum, represents linearly the 14 levels of the element concentrations. Figure 4c clearly shows that the element Zn is localized in the hair cortex rather than the medulla after MMC administration. In addition, endogenous exposure to MMC results in preferential accumulation of Hg in the hair cortex than in the medulla or cuticle (Fig. 4b). However, for hair under exogenous exposure to MMC (external administration), detailed analysis of the fluorescent images, together with the fluorescent spectra across the hair cross section, indicates that Hg is found distributed on the surface of the hair cuticle rather than the cortex (25). This result is consistent with other element characterization using flameless atomic absorption spectrometry (FAAS). Similarly, the Hg concentration profile along the length of a single hair can also be determined by using this imaging technique. Human hair can also be analyzed by using the same Xray fluorescent imaging technique (23). The distribution of the elements S, Ca, Zn, Fe, and Cu in a hair cross section, similar to Fig. 3, can be determined similarly. The concentrations of these elements measured by the quantitative analysis of fluorescent intensity are comparable with the values obtained by other techniques (31–33). This imaging technique can be also applied to dynamic studies of metal contamination in hair and blood because the samples prepared can be repeatedly used due to negligibly small radiation damage. For example, fluorescent images of hair cross sections cut at every 1 mm from the root end of MMC-treated rats show that the Hg concentration, distributed in the center of cortex, increases first to about 1,000 µg/g for about 10 days after MMC administration. Then, the concentration decreases to about 600 µg/g for another 10 days (25). Thus, the history of Hg accumulation in animals can be clearly revealed. In summary, the X-ray fluorescent imaging technique is useful in revealing the distribution and concentrations of metal elements in hair and also in providing dynamic information about the pathway of metal exposure in animals and the human body.
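As a small illustration of how such elemental maps are typically prepared for display, the sketch below normalizes a raw fluorescence map by the incident-beam monitor and quantizes it into a fixed number of gray levels (14 in the example, mirroring the 14-level contrast used for Fig. 4). The array values and the helper name are made up for the example; this is a generic sketch, not the processing chain used by the authors.

```python
import numpy as np

def quantized_display_map(counts, monitor, levels=14):
    """Convert raw fluorescence counts into an integer gray-level map.

    counts  -- 2D array of fluorescence counts at each scan position
    monitor -- 2D array (or scalar) of incident-beam monitor readings
    levels  -- number of display levels (linear mapping from min to max)
    """
    normalized = counts / np.asarray(monitor, dtype=float)
    lo, hi = normalized.min(), normalized.max()
    if hi == lo:                     # flat image: everything maps to level 0
        return np.zeros_like(normalized, dtype=int)
    scaled = (normalized - lo) / (hi - lo)             # rescale to 0..1
    return np.minimum((scaled * levels).astype(int), levels - 1)

# Made-up 16 x 16 example matching the 16 x 16 step scan geometry in the text.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=200, size=(16, 16))
gray = quantized_display_map(counts, monitor=1.0, levels=14)
print(gray.min(), gray.max())   # levels run from 0 to 13
```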
Distribution of Cu, Se, and Zn in Human Kidney Tumors. Copper (Cu), selenium (Se), and zinc (Zn) are important metal cofactors in metalloenzymes and metalloproteins. Their presence in these enzymes and proteins directly influences many biochemical and physiological functions. As is well known, ceruloplasmin (34) and dopamine-β-hydroxylase (35) are involved in iron metabolism and neurotransmitter biosynthesis, respectively. Both contain Cu. Se is an essential component of glutathione peroxidase, which plays an important role in protecting organisms from oxidative damage via the reduction of lipoperoxides (R–O–OR) and hydrogen peroxide (36). Zn is also an important metal element in
DNA polymerase and appears in many enzymes such as carbonic anhydrase, alcohol dehydrogenase, and alkaline phosphatase (37,38). Recent studies of the physiological roles of these essential trace elements have been emphasized in connection with possible causes of cancer. For example, investigations of the blood serum levels of these trace elements in cancer patients show the possible involvement of these elements in many cancerous conditions (39–44). In particular, serum levels of Cu and Zn and Cu/Zn ratios of patients who have malignant neoplasms have been used as an indicator for assessing disease activities and prognoses. Increased serum Cu levels and decreased serum Zn levels have also been found in patients who have sarcomas (39), lung cancer (40), gynecologic tumors (41), and carcinoma of the digestive system (stomach) (42). However, only a limited number of works in the literature, are concerned with the distribution of Cu, Se, and Zn in malignant neoplasms. In the following, two-dimensional distributions of Cu, Se, and Zn in human kidney tumors were determined and visualized using nondestructive synchrotron radiation X-ray fluorescent imaging (45). The distribution of Cu, Se, and Zn in cancerous and normal renal tissues and the correlation among these distributions were studied. The experimental setup shown in Fig. 2 and an incident photon energy of 16 keV were used. The experimental conditions were 750 × 750 µm for the beam size, 750 µm/step for the sample translation, and 10 s counting time for each position. The fluorescence of Cu Kα, Zn Kα, and Se Kα were identified in X-ray fluorescent spectra. Compare the optical micrograph shown in Fig. 5a to the spatial distributions of fluorescent intensities of Cu Kα, Zn Kα, and Se Kα for normal (N) and cancerous (C) renal tissues shown in Fig. 5b, c, and d, respectively. The frame of (b)
Figure 5. Chemical imaging of trace elements in normal (N) and cancerous (C) renal tissue from an aged female: (a) Optical micrograph of a sliced sample and the distributions of (b) Zn, (c) Cu, and (d) Se (45) (courtesy of S. Homma-Takeda et al.; translated from J. Trace Elements in Exp. Med. 6, S. Homma, A. Sasaki, I. Nakai, M. Sagal, K. Koiso, and N. Shimojo, Distribution of Copper, Selenium, and Zinc in Human Kidney Tumours by Nondestructive Synchrotron X-ray Fluorescence Imaging, pp. 163–170, copyright 1993, Wiley-Liss Inc. with permission of John Wiley & Sons, Inc. All rights reserved).
the fluorescent images covers the entire tissue sample. The upper left portion of the frame corresponds to the cancerous tissue, and the low right portion corresponds to the normal tissue. Darker pixels indicate positions of higher metal concentration. The images reveal that Cu, Zn, and Se accumulate more densely in normal tissue than in cancerous tissue. The concentrations of these metals in the samples determined by ICP-AES (inductively coupled plasma atomic emission spectrometry) agree qualitatively with those estimated from the intensities of the fluorescent images. The average Zn concentration in the cancerous tissues is 12.30 ± 5.05 µg/g compared to 19.10 ± 10.19 µg/g for the normal tissues. The average concentration of Cu is 0.991 ± 0.503 µg/g in the cancerous tissues, compared to 17.200 ± 0.461 µg/g in the normal tissues. The decreases in Zn and Cu concentrations in the cancerous tissues are statistically significant across hundreds of measured data sets. Moreover, the correlation coefficients among the distributions of trace elements can be calculated from the X-ray intensity data at each analytical point. In general, it is found that the correlation coefficients among the metal elements Cu, Se, and Zn in cancerous tissues are qualitatively lower than those in normal tissues. However, the correlation between Cu and Zn in the cancerous tissue investigated is significantly decreased, more than 30%, compared with that in the normal tissues. It should be noted that the change in the trace element level in cancerous tissues is not the same for all types of tumors. Moreover, Zn levels in cancerous tissues vary in different organs. According to (46), tumors located in organs that normally exhibit a low Zn concentration have a Zn accumulation that is similar to or greater than that in the tissue around the tumor. Tumors located in organs that normally have a high Zn concentration also, exhibit a lower uptake of Zn than tissues not involved in the growth of the tumor. Kidney is an example of an organ that has a high Zn concentration. Therefore, the observation of decreased Zn concentration in the kidney tumor is expected. The age dependence of the distribution of these metal elements in human kidney can also be observed in X-ray fluorescent images (47). Figures 6a–d and 6e–h show the X-ray fluorescent images of Zn, Cu, and Se in adult human kidneys from a 22-year-old and a 61-year-old man, respectively. The same experimental setup and conditions as those used for imaging the kidney tumor were employed. The exposure time was 6 s/point for the sample from the 22-year-old man and 10 s/point from the 61-year-old man. Clearly, Cu, Zn, and Se are more concentrated in the renal cortex than in the medulla. This result agrees with that reported in (48–50) and is consistent with the functions of each tissue that is, the kidney cortex contains glomeruli, responsible for cleaning waste materials from blood, and the proximal tubule is responsible for reabsorbing metal ions. The elemental concentrations of Cu and Zn in the kidney tissue determined by both ICP-AES and X-ray fluorescent analysis are 1.67 ± 0.22 µg/g (Cu) and 13.3 ± 2.6 µg/g (Zn) for the 22-year-old man and 1.06 ± 0.34 µg/g (Cu) and 4.42 ± 1.52 µg/g (Zn) for the 61-year-old man. The correlation coefficients between the two trace elements calculated from the X-ray intensity data indicate that
Figure 6. X-ray fluorescent images of trace elements in adult human kidney: (a) photograph of a sliced sample and the distributions of (b) Zn, (c) Cu, (d) Se for a 22-year-old man, and (e) photograph and the distributions of (f) Zn, (g) Cu, and (h) Se for a 61-year-old man (field width 750 × 750 µm2 ; medulla: the upper central region; cortex: the periphery) (47) (courtesy of S. Homma-Takeda et al.; reprinted from Nucl. Instrum Methods B103, S. Homma, I. Nakai, S. Misawa, and N. Shimojo, Site-Specific Distribution of Copper, Selenium, and Zinc in Human Kidney by Synchrotron Radiation Induced X-ray fluorescence, pp. 229–232, copyright 1995, Elsevier Science with permission).
the correlation between Zn and Cu is higher for the 22-year-old man than for the 61-year-old man. Moreover, the correlation between Cu and Se is the lowest among the Zn–Cu, Zn–Se, and Cu–Se correlations (47). In summary, the X-ray fluorescent imaging technique is well suited for evaluating levels of trace elements in human organs as they may be influenced by smoking, cancer, or other diseases. Consequently, the physiological roles of trace elements could be better understood. Furthermore, using synchrotron radiation, this X-ray fluorescent
imaging technique causes negligible damage to samples. As a result, samples can be examined histologically after analysis. This fact makes X-ray fluorescent imaging a useful technique for histochemical analysis of biological specimens. The X-ray fluorescent imaging technique can be used in combination with other techniques, such as isoelectric focusing agarose gel electrophoresis (IEF-AGE), histochemical staining, and X-ray microprobes, to obtain functional and structural information about biological substances. For example, using IEF-AGE, direct detection of structural or functional alteration of protein attributed to the binding of mercury (51) and efficient monitoring of changes in mRNA levels, protein contents, and enzyme activities for brain Cu, Zn-, and Mn-SOD (superoxide dismutase) by MMC administration (52) are feasible. Furthermore, detailed distribution of metal elements and morphological changes in biological specimens that are histochemically stained (53), and high-resolution fluorescent images without tedious sample preparation and treatment using X-ray microprobes (54) can also be obtained. X-ray Fluorescent Imaging by Line Scanning This example is included to demonstrate the X-ray fluorescence imaging of a Ni-wire cross by synchrotron radiation using line scans. Images different from those obtained by point scans are expected. The schematic of the experimental setup for line scanning is shown in Fig. 7 (19). The incident beam is collimated by a beam aperture (1) and focused by a mirror (2). An additional guard aperture (3) is then placed between the linear slit (4) and the mirror to cut down the parasitic scattering from the first aperture and the mirror. A sample aperture (5) in front of the sample holder (6) prevents X-rays scattered by the linear slit from participating in the sample excitation. The sample holder can be rotated around the incident beam and translated across the line-shaped incident beam. A solid-state detector (7) is used to monitor the fluorescence that emanates from the sample. During excitation, the sample, a Ni-wire cross, is translated vertically and then rotated around the incident beam. Because a monochromator is not used, both Ni Kα and
Beam Figure 7. Experimental setup for X-ray fluorescent imaging using line scan: (1) aperture, (2) mirror, (3) antiscattering stop, (4) slit-aperture, (5) aperture-stop, (6) sample holder, (7) solid-state detector (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Sychronotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1998, Elsevier Science with permission).
[Figure 8 axes: scanning position (µm), horizontal; angle (deg), vertical.]
Figure 8. Measured Ni K fluorescent intensity distribution of a Ni-wire cross (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Sychronotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1998, Elsevier Science with permission).
Ni Kβ are excited. Figure 8 shows the two-dimensional Ni Kα and Ni Kβ fluorescent distribution measured at a translation of 9 µm per step and 3 ° per step in rotation. Clearly, the measured distribution is far from the real image of the cross, which can usually be obtained by point scans. The spatial resolution of Fig. 8 depends on the sizes of the apertures and slits used and the distances between the sample, the apertures, and the detector. The accuracy in rotating the sample is also an important factor. Using this kind of setup, 10-µm resolution can easily be achieved. By applying the filtered back-projection technique (20) for image reconstruction from the line scanned distribution, the final reconstructed image of the Ni cross shown in Fig. 9 is obtained. This fluorescent image from linear scanning is equally applicable to imaging trace elements for various types of specimens (16). Soft X-ray Fluorescent Imaging at Submicron Scale Resolution Spatial resolution of soft X-ray fluorescent imaging depends in some cases on the fluorescent material used in making the screen for image recording and display. Among the various materials available for making fluorescent screens, polycrystalline phosphor is most frequently used to detect X rays and electrons. The required features of a material such as phosphor for imaging are high conversion efficiency and high spatial resolution. However, polycrystalline phosphor (powder) usually suffers from particle size and the scattering of fluorescent light among powder particles, which limit the spatial resolution (≥1 µm) and the resultant conversion efficiency. To improve the optical quality of fluorescent
X-RAY FLUORESCENCE IMAGING
1483
540 480
ICCD
Vert.sample pos. (µm)
420 View port 360 300
Fluorescent light
100X, NA = 1.25 objective
240 180
Single-crystal phosphor
120
Aerial image Primary mirror
Vacuum chamber
60
Schwarzschild camera
0 0
60
120 180 240 300 360 420 480 540 Hor. sample pos. (µm)
Figure 9. Reconstructed image of a Ni-wire cross (19) (courtesy of M. Bavdaz et al.; reprinted from Nucl. Instrum. Methods A266, M. Bavdaz, A. Knochel, P. Ketelsen, W. Petersen, N. Gurker, M. H. Salehi, and T. Dietrich, Imaging Multi-Element Analysis with Sychronotron Radiation Excited X-ray Fluorescence Radiation, pp. 308–312, copyright 1998, Elsevier Science with permission).
screen materials, the use of a plastic scintillator, or doped polymer films, has been suggested to provide better spatial uniformity. However, the efficiency of this material is lower than that of phosphor powder (55). Recently, phosphor single crystals have been grown by liquid-phase epitaxy (LPE) (56,57). The resulting monocrystalline phosphor possesses excellent optical quality and high conversion efficiency. The characteristics of a short absorption depth of soft X-rays in the phosphor (200 A˚ for ˚ provide a well-defined and localized fluorescent λ = 139 A) image without any smearing. The high efficiency of conversion and the high sensitivity of the phosphor crystal that covering the emission spectrum in the soft X-ray regime afford low light level detection. Moreover, because the single-crystal phosphor is transparent to emitted light and possesses a superpolished crystal surface, deleterious scattering effects are eliminated, and a well-defined perfect X-ray image plane is achieved. Collectively, these superb optical properties of the single-crystal phosphor lead to soft X-ray fluorescent imaging at submicron resolution, which is described here (58). The imaging experiment was performed by using soft X rays from a synchrotron storage ring. The schematic representation of the experimental setup is given in Fig. 10. It comprises a 20X reduction Schwarzschild ˚ a fluorescent crystal, an optical camera working at 139 A, microscope, and an intensified charge-coupled device (ICCD) camera. The Schwarzschild camera contains two highly reflecting mirrors that have multilayer coatings. The main component of the optical microscope is an oil immersion 100X objective that has a numerical aperture of 1.25. Referring to Fig. 10, an X-ray image
Aperture
Secondary mirror Transmission mask Synchrotron radiation (139 Å)
Figure 10. Experimental setup for submicron soft X-ray fluorescent imaging (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
of a transmission mask is projected onto the image plane of the Schwarzschild soft X-ray camera, at which a commercially available phosphor crystal [a STI-F10G crystal, manufactured by Star Tech Instruments Inc. (58)] is positioned to convert the soft X-ray image into the visible. This visible image is well localized and easily magnified by the optical microscope. The magnified image, located at about 11 cm from the crystal outside the vacuum chamber, is then magnified with a 5X long-workingdistance microscope and viewed by the ICCD camera. The X-ray fluorescent spectrum of the fluorescent crystal, STI-F10 G, excited by soft X rays from 30 to 400 A˚ is shown in Fig. 11. The broad fluorescence in the emission spectrum is useful for uniform image mapping. Figure 12 illustrates the submicron-resolution fluorescent image (Fig. 12a) of a mask, which can be compared with the actual mask shown in Fig. 12b. As can be seen, all of the lines whose widths range from 0.5 to 0.1 µm after the reduction due to the Schwarzschild camera can be clearly identified in the fluorescent image. To improve the resolution of the fluorescent imaging system further, the diffraction limit (0.1 ∼ 0.2 µm, the Rayleigh criterion) of the microscope used needs to be considered. According to the Rayleigh criterion, the smallest size which can be resolved by a lens is proportional to the ratio between the wavelength used and the numerical aperture. The latter is related to the
X-RAY FLUORESCENCE IMAGING
Signal (a.u.)
1484
This high-resolution X-ray fluorescent imaging technique has potential in a variety of applications, especially in optimizing designs for deep ultraviolet and extreme ultraviolet lithographic exposure tools.
20 10
X-ray Fluorescent Holographic Images at Atomic Resolution 0 4000
4500
5000
5500
6000
6500
Emission wavelength (Å) Figure 11. X-ray fluorescent spectrum of the STI-F10 G crystal (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
(a)
8 µm 4 µm 2 µm 6 µm 10 µm (b)
5 µm Figure 12. Soft X-ray fluorescent mask (a) and image (b) (58) (courtesy of B. LaFontaine et al.; reprinted from Appl. Phys. Lett. 63, B. La Fontaine, A. A. MacDowell, Z. Tan, D. L. Taylor, O. R. Wood II, J. E. Bjorkholm, D. M. Tennant, and S. L. Hulbert, Submicron, Soft X-ray Fluorescence Imaging, pp. 282–284, copyright 1995, American Institute of Physics with permission).
The relative positions of atoms in a crystal unit cell — the so-called single-crystal structure — are valuable structural information for a crystalline material. Diffraction methods are usually applied to determine crystal structure. However, due to the fact that a diffraction pattern provides intensity but not phase information for diffracted beams — the so-called phase problem (59) — unique determination of crystal structure cannot usually be achieved. Several methods have been developed to solve this phase problem: the Patterson method (60), molecular replacement (60) and isomorphous replacement (61), direct methods (62,63), multiwavelength anomalous dispersion (MAD) (64), and multiple diffraction (65,66). It is, most desirable however, to image the crystallographic structure and to visualize the atomic arrangement in a crystal directly. The holographic approach is certainly a promising candidate that is well suited to obtain phase information for a diffracted beam. The idea of using X-ray fluorescen holographic imaging has been proposed to attain atomic resolution and element-specific requirements (67–79). Very recently, effective methods for mapping the structure of single crystals and crystal surfaces at atomic resolution have been developed, and they are described here. Normal (single-energy) X-ray Fluorescent Holographic Imaging (NXFH). The formation of a holograph involves an objective beam (wave) and a reference beam (wave) (80). The phase information (or contrast) can be retained (or revealed) by the interference between an objective (scattered) beam and a reference beam. For X-ray fluorescence, the internal X-ray source inside a crystal is the key to holographic formation. This internal Xray source is the emitted fluorescence from atoms inside the crystal excited by incident X radiation. Figure 13a shows the schematic of single-energy X-ray fluorescent holography, where externally excited fluorescence from a target atom A is treated as a holographic reference beam. Fluorescence scattered from neighboring atoms serves as the objective beam. The interference between the two modulates the scattered fluorescent intensity, which is monitored by a detector located at a large distance from the crystal (73). By moving the detector around the crystal, is recorded, where k is the wave an intensity pattern I(k) vector of the scattered beam. The Fourier transform of I(k) yields an atomic resolution hologram. The basic formula that describes the scattered intensity is given by (72). I(k) = Io /R2 [1 + | I(k)
refractive index of the lens. Therefore, using fluorescent crystals that have a high index of refraction and a shorter wavelength emitter may put the resolution below a tenth of a micron.
aj |2 + 2Re(
aj )],
(5)
where Io is the intensity of the source atom A, aj is the amplitude of the objective wave scattered by the jth atom inside the crystal, and R is the distance between the
X-RAY FLUORESCENCE IMAGING
(a)
I (K ) E F, K rj
Incident radiation
A
(b)
E g,K
A
I (K )
EF
Figure 13. Schematic of (a) normal X-ray fluorescent holographic (NXFH) imaging and (b) inverse X-ray fluorescent holographic (IXFH) imaging (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76 T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hank, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2 O3 ), pp. 3,132–3,135, copyright 1996, American Physical Society with permission).
translational symmetry imposed by the crystal, the value of the second term of Eq. (5) can be substantial. As pointed out in (68), the spatial frequency of this term is much larger than that of the holographic information provided by the nearest neighbor radiating atoms. Therefore, a lowpass filter (68) can be used to remove the contribution of the second term. (4) The effect of absorption by the crystal on the scattered intensity, depending on the crystal shape and the scattering geometry, needs to be corrected to see the expected 0.3% oscillatory signals in the hologram. (5) Correcting the dead time of the detector is necessary. (6) Reducing the noise by using a high-pass filter is required (78). In addition to these requirements, a large scanning angular range must be covered to obtain a threedimensional image that has isotropic atomic resolution. Some diffraction effects, such as the formation of Kossel lines (65,81), generated by the widely distributed fluorescence in the crystal, need to be suppressed. A typical experimental setup for normal X-ray fluorescent holographic (NXFH) imaging is shown in Fig. 14 (72). The incident X-rays from a conventional sealed X-ray tube or synchrotron radiation facility are monochromatized by a graphite crystal and impinge at 45° on the crystal sample mounted on a goniometer, which provides θ -angle tilting and azimuthal φ rotation normal to the crystal surface. A Ge solid-state detector cooled by liquid nitrogen is used to monitor the scattered fluorescence. The two-dimensional hologram is measured by turning the sample around the φ axis and moving the detector by rotating the θ axis in the plane of incidence defined by the incident beam and the φ axis. The upper panel of Fig. 15 shows the 2,402-pixel hologram of a SrTiO3 single crystal after absorption correction and removing the incident-beam contribution. The crystal is plate-like, 30 mm in diameter, and 0.5 mm thick. It is crystallized in the perovskite-type SrTiO3 ˚ The Sr atoms structure whose lattice constant a = 3.9 A. form a simple cubic lattice. The large surface is parallel to the (110) plane. Mo Kα radiation (E ∼ 17.425 keV) is used to excite Sr Kα, where the Sr K edge is 16.105 keV. The
f SrTiO3
Source Graphite
q
D LN 2
sample and the detector. Re means real part. The first term in the square bracket represents the intensity of the reference wave without interaction with neighboring atoms. The second term corresponds to the intensity of the scattered objective waves. The last term results from the interference between the reference and objective waves. Equation (5) also holds for multiple scattering. Because the scattering cross section of X rays is relatively small compared with that of electrons, the main contribution to interference is from the holographic process. Usually, for weak scattering of photons, the second term is of an order of 10−3 smaller than that of the interference term. For a crystal that has multiple atoms, the amplitudes of the objective and the interference waves are much more complicated to estimate, and separating the two is difficult. In addition, because many atoms radiate independently, many holograms thus formed complicate the image. The basic requirements for obtaining a single hologram are the following: (1) The crystal size needs to be small. (2) All of the irradiating atoms must have the same environment, and isotropy of the irradiation centers distributed in the sample is essential. For the former, the crystal size illuminated by the incident X rays has to be smaller than Rλ/r, where λ is the wavelength of X-ray fluorescence and r is the atomic resolution expected. (3) Due to the
1485
Figure 14. Experimental setup for NXFH imaging using a conventional X-ray source: the monochromator (graphite), the sample (SrTiO3 ), and the detector (liquid nitrogen LN2 cooled SSD) (72) (courtesy of M. Tegze et al.; reprinted with permission from Nature 380, M. Tegze and G. Faigel X-Ray Holography with Atomic Resolution, pp. 49–51, copyright 1996 Macmillan Magazines Limited).
1486
X-RAY FLUORESCENCE IMAGING
Inverse (multiple-energy) X-ray Fluorescent Holographic Imaging (IXFH)
(a)
(b) 4 2 0 −2 −4 5 0 −5
−5
0
5
Figure 15. (a) The normal X-ray fluorescent hologram of SrTiO3 , and (b) three-dimensional reconstructed image of SrTiO3 structure (only Sr atoms are revealed) (72) (courtesy of G. Faigel et al.; reprinted with permission from Nature 380, M. Tegze and G. Faigel X-Ray Holography with Atomic Resolution, pp. 49–51, copyright 1996 Macmillan Magazines Limited).
photon-counting statistic for each pixel is about 0.05%, and the overall anisotropy, that is, the spatial distribution of interference signals, in the measured intensities is 0.3%, in agreement with the theoretical expectations described in Eq. (5). The atomic arrangement is reconstructed by using the Helmholtz–Kirchoff formula (78,82) and the proper absorption and dead-time corrections mentioned. Three atomic planes parallel to the crystal surface are depicted in the lower panel of Fig. 15. Only the Sr atom that has a large atomic scattering factor can be observed. Twin images are known to occur in traditional holography. However, due to the centrosymmetry of the crystal lattice, the twin and real images of different atoms occur at the same position. Because these two waves (one of the twin and the other of the real image) may have different phases, the interference between the two can cause an intensity modulation close to the atomic positions. This effect can lead to an appreciable shift in the positions of the atoms in question or to cancellation of a given atomic image if the two waves are out of phase. The different sizes of the atoms in Fig. 15 are due mainly to the flat plate geometry of the crystal because of the different resolutions in the inplane (parallel to the crystal surface) and the out-of-plane (perpendicular direction).
Multiple-energy holographic imaging (83–85) is well established for photoelectrons, Auger electrons, backscattered Kikuchi lines, and diffuse low-energy electrons and positrons. The idea of using multiple energy for X-ray fluorescence has been adopted in inverse X-ray fluorescent holographic imaging (IXFH). A schematic representation of IXFH is shown in Fig. 13b (73). The radiation source and detector are interchanged compared to their positions in normal X-ray fluorescent holography (NXFH) (Fig. 13a). The detector used in NXFH is now replaced by the incident monochromatic radiation of energy EK , which produces an incident plane wave at the sample that propagates along As the plane wave moves toward atom the wave vector k. A, a holographic reference wave is formed. The scattered radiation from the neighboring atoms serves as the holographic objective wave. The overlap and interaction of the reference and the objective waves at atom A excite it, which results in the emission of fluorescence of energy EF . The intensity of the fluorescence is proportional to the strength of the interference. Thus, atom A, formerly the source of radiation in NXFH, now serves as a detector. By and changing its scanning the incident beam direction k that energy EK , the fluorescent intensity distribution I(k) is collected. The is emitted at atom A as a function of k is then Fourier transformed to intensity distribution I(k) produce an atomic holographic image. The two holographic schemes, NXFH and IXFH, are, in principle, equivalent according to the reciprocity theorem in optics (80). The main difference between these two schemes is that NXFH uses monochromatic fluorescence from internal excited atoms to produce a holographic scattering field, which is measured in the far field (8). The IXFH scheme, on the other hand, utilizes energy-tuned external radiation to generate a holographic scattering field, which is detected in the near field (8). Therefore, the fluorescence of the latter plays no role in establishing the scattering field for the holographic process. Because an energy-tunable source is required for IXFH, the use of synchrotron radiation is essential. Figure 16 shows the experimental configuration for IXFH measurements using synchrotron radiation. Monochromatic radiation, after the double-crystal monochromator, is focused onto the sample, which is mounted on a six-circle diffractometer. The sample can be tilted (θ -angle) in the plane of incidence and rotated via the azimuthal angle around an axis normal to the crystal surface. Fluorescence emitted from the atoms in the sample excited by the incident radiation is collected by a cylindrical graphite analyzer and detected by a proportional counter. The inset of Fig. 16 shows a typical scan at E = 9.65 keV and θ = 55° for a (001) cut hematite (Fe2 O3 ) natural crystal slab (73). The effect of the detector dead time has been corrected. The mosaic spread of the crystal is 0.01° . The angular acceptance of the curved analyzer is approximately 14° in the direction of curvature and 0.5° along the straight width. As can be seen, the signal modulation is about 0.5% of the averaged background. The local structural images, such as Kossel lines, are
X-RAY FLUORESCENCE IMAGING
1487
(a) Focusing analyzer
a Proportional counter
Q Sample
5.89
Q = 55° E = 9.65 keV
×107 5.84 −60°
F
Incident beam monitor
a1 a2 (b)
0° F
60° Monochromator
Figure 16. Experimental setup for multiple-energy IXFH imaging. The inset shows the intensity of fluorescence, corrected for detector dead time, versus the azimuthal angle (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76, T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hank, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2 O3 ), pp. 3,132–3,135, copyright 1996, American Physical Society with permission).
(c)
suppressed by the large angular acceptance of the curved analyzer. Hematite forms a hexagonal crystal that has lattice ˚ Figure 17a is constants a = 5.038A˚ and c = 13.772A. the projection of Fe atoms onto the basal plane (001) perpendicular to the c axis, where the Fe positions in two different kinds of stacking order are shown. For the first kind, the black circles represent iron atoms in an upper plane, and the gray circles are atoms in the plane 0.6A˚ below the black circles. For the second kind, white circles denote upper plane atoms, and black circles are also 0.6A˚ under the white ones. Figure 17b shows the calculated fluorescent image of the Fe sites. Using the experimental arrangement mentioned, the two different Fe atom stackings are not distinguishable in the IXFH images. A superposition of Fe atom images in a holographic reconstruction of these layers occurs, as described in the previous section. The reconstructed image (Fig. 17c) of the (001) Fe layer from the experimental data indeed shows the effect of superposition (see Fig. 17a for comparison). As can be seen, six nearest neighbored Fe atoms, three for each kind of stacking order, are located 2.9A˚ from the center atom. The diagonal distance of this figure ˚ corresponds to 8.7A. The experimental data consist of the fluorescent intensities for three incident energies, E = 9.00, 9.65, and 10.30 keV, measured in 150 hours within the ranges of 45° ≤ θ ≤ 85° and −60° ≤ ≤ 60° by the scan window is θ = 5° and = 5° . The measured intensity I(k) mapped onto the entire 2π hemisphere above the sample by considering the threefold symmetry of the c axis. The background Io (k) derived from a Gaussian lowpass convolution (86) is subtracted from the measured intensity. The theoretically reconstructed image (Fig. 17b) calculated for the clusters of Fe atoms shows good agreement with the experiment.
Å Figure 17. (a) The projection of Fe atom positions on the (001) plane of hematite. (b) Image calculated for the Fe sites. (c) Holographic reconstruction of the (001) Fe layer (73) (courtesy of T. Gog et al.; reprinted from Phys. Rev. Lett. 76, T. Gog, P. M. Len, G. Materlik, D. Bahr, C. S. Fadley, and C. Sanchez-Hank, Multiple-Energy X-Ray Holography: Atomic Images of Hematite (Fe2 O3 ), pp. 3,132–3,135, copyright 1996, American Physical Society with permission).
Isotropic resolution in every direction for threedimensional reconstruction can be improved by combining the NXFH and IXFH techniques (79). Figure 18 is the experimental setup suitable for this combined NXFH and IXFH experiment. The arrangement is similar to that shown in Figs. 14 and 16, except that the three-circle diffractometer comprises two coaxial vertical rotations θ and θ goniometers. The upper stacked one (θ ) is on top of the other θ circle. A sample can be mounted on the horizontal φ axis situated on the lower θ circle. The sample can be rotated as usual by varying the incident angle θ (tilting angle) and the azimuthal φ angle. A detector, for example, a high purity germanium solid-state detector
1488
X-RAY FLUORESCENCE IMAGING
(a) X-ray beam
Sample
1
f
0.9
q q′
0.8 0.7
Detector (b) Sample
0.01
Graphite
0 Detector
−0.01
Figure 18. Experimental setup for combined NXFH and IXFH imaging. The inset shows a curved graphite analyzer used in the synchrotron experiments in place of the detector in the main picture (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5A˚ Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society with permission).
(c) 0.01 0 -0.01
(SSD), is mounted on the θ circle. Again a doubly focused curved graphite monochromator is used for synchrotron sources (see the inset in Fig. 18), and an avalanche photodiode (APD) is employed to handle the high counting rate (87). The θ and θ circles facilitate both the NXFH and IXFH experiments without changing the diffractometer. Special care must be taken with respect to the detector and sample motions to avoid losing any information between pixels and to maintain uniform resolution throughout the holographic image. Figures 19, 20, and 21 represent the holograms and reconstructed atomic images of a circular platelet of CoO crystal (79). The large face of the crystal is parallel to the (111) plane, and the mosaic spread of the crystal is 0.3° . CoO forms a face-centered cubic (fcc) crystal whose ˚ The sample was subjected to lattice constant a = 4.26A. both NXFH and IXFH imaging experiments at several photon energies. The time for data collection ranged from three hours for synchrotron sources to some 20 days for conventional sources. The measured Co Kα fluorescent intensity distribution, normalized to the incident intensity shown in Fig. 19a, is the projection of the hologram mapped onto the surface of a sphere defined by the coordinates θ and φ, where θ ≤ 70° . The picture is obtained by a synchrotron measurement in the IXFH mode at the photon energy E = 13.861 keV sufficient to excite Co Kα. The dominant feature in this picture is due to the strong θ -dependent absorption of the emitted fluorescence by the sample, according to its shape (78). This absorption effect can be easily corrected by a theoretical calculation (78). Figure 19b is the hologram after this correction. The wide radial stripes and the narrow conic lines (the Kossel lines) are the two main features. The former originates from the change of the crystal orientation relative to the detector during the crystal scan, so as to keep the incident radiation source on the hemisphere above the crystal. This relative change in crystal orientation
×10−3
(d)
2 0 −2 (e)
Å 6 4 2 0 −2 −4 −6 −6 −4 −2 0
0.2
0.4
0
2 0.6
4 0.8
6 Å 1
Figure 19. Holograms obtained at various stages: (a) the normalized intensity distribution of Co Kα fluorescence measured; (b) intensity distribution after correction for sample absorption; (c) image after the correction for the detector position and crystal orientation; (d) image after removal of Kossel lines, using filter techniques; (e) reconstructed atomic images in the plane containing the source atom and parallel to the crystal surface. The dotted lines indicate the crystal lattice (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5A˚ Resolution, pp. 4,847–4,850, copyright 1999, American Physical Society with permission).
X-RAY FLUORESCENCE IMAGING
(a)
−0.02
0
0.02
(b)
−5
0
5 × 10−3
(c) 1 0.8 0.6 0.4 0.2 0 Figure 20. (a) Extended hologram before and (b) after employing a low-pass filter; (c) reconstructed atomic image of Co atoms. The first nearest neighbors are shown in dashed lines, and the next nearest neighbors in dotted lines (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5A˚ Resolution, pp. 4,847–4,850, copyright 1996, American Physical Society with permission).
introduces modulation of the fluorescent intensity, which can be precisely measured and subtracted. Figure 19c is the corrected hologram. The Kossel, or, sometimes, called XSW (X-ray standing wave) lines, can be filtered out by a low-pass spatial filter, as already mentioned. This filtering is carried out by calculating the convolution of the hologram that has a Gaussian on the surface of the sphere. The width of the Gaussian is σ ≈ λ/2π rmax , where the maximum radius rmax ≈ 5A˚ of the region for imaging and λ is the X-ray wavelength. The hologram after filtering is shown in Fig. 19d. Threefold symmetry due to the (111) surface is clearly seen in the figure. The atomic image in the plane that contains the source atom, parallel to the crystal surface, was reconstructed by using the Helmholtz–Kirchhoff formula (78,82) and is shown in Fig. 19e. The six nearest neighbored Co atoms appear at
1489
approximately the proper positions relative to the central fluorescent atom (not shown). Clearly, Fig. 19e is merely a two-dimensional reconstructed atomic image. For threedimensional imaging, the sampling angular range should be increased up to 4π in solid angle, and the structural information provided by the Kossel lines (or XSW lines) needs to be included. Figure 20a shows the extended hologram after accounting for these two factors. The images of conic Kossel lines become much clearer. The lowpass filtered image is shown in Fig. 20b for comparison. The corresponding three-dimensional image reconstructed from Fig. 20b and the outline of the crystal lattice are given in Fig. 20c. Both the nearest and next nearest neighbor Co atoms are clearly seen. However, the nearest neighbor sites are slightly shifted toward the central atom due to the twin-image interference and the angular dependence of the atomic scattering factor (78). This distortion can be corrected by summing holograms taken at different energies (82), that is, the properly phased summing of the reconstructed wave amplitudes suppresses the twin images and reduces the unwanted intensity oscillation. Therefore, the combination of NXFH and IXFH (multipleenergy) could provide better image quality. Figures 21a–d show the holograms of CoO measured at E = 6.925, 13.861, 17.444, and 18.915 keV, respectively. The first and the third pictures were taken in the normal mode (NXFH) using a conventional X-ray source, and the second and the fourth were taken in the inverse mode (IXFH) using a synchrotron source. The combination of these four measurements leads to the three-dimensional atomic structure of the Co atoms shown in Fig. 21e. Slight improvement of the image quality in Fig. 21e, compared with the single-energy image in Fig. 20c can be detected. The atomic positions of Co atoms are closer to their real positions in the known CoO crystal, and the background noise is substantially reduced. The spatial resolution of the atomic positions is estimated from the full width at half-maxima (FWHM) of the fluorescent intensity ˚ The deviation of the intensity at approximately 0.5A. maxima from the expected real positions of the nearest ˚ Co neighbors is less than 0.1A. In summary, X-ray fluorescent holographic imaging can be used to map three-dimensional local atomic structures at isotropic resolution and without any prior knowledge of the structures. The combination of the experimental techniques of NXFH and IXFH, singleenergy and multiple-energy, together with mathematical evaluation methods, reinforces the three-dimensional capabilities of X-ray fluorescent holographic imaging. Very recently, improved experimental conditions and a datahandling scheme have made possible the imaging of light atoms (88), as well as atoms in quasi-crystals (89). It is hoped that this X-ray fluorescent imaging method can be applied in the future to more complex crystal systems, such as macromolecules. X-ray Fluorescence Images from Solids in Electric Fields X-ray fluorescence emitted from atoms usually exhibits a homogeneous spatial intensity distribution, except for the direction along a polarization axis. The familiar doughnutshaped distribution that has zero intensity along the
1490
X-RAY FLUORESCENCE IMAGING
(a)
(b) (e)
1
× 10−3
0.8
5 0.6 (c)
0
(d)
0.4
−5
0.2 0
Figure 21. Holograms of CoO measured at E = (a) 6.925, (b) 13.861, (c) 17.444, and (d) 18.915 keV. (e) Three-dimensional images obtained from the combination of the four measurements (79) (courtesy of G. Faigel et al.; reprinted from Phys. Rev. Lett. 82, M. Tegze, G. Faigel, S. Marchesini, M. Belakhovsky, and A. I. Chumakov, Three Dimensional Imaging of Atoms with Isotropic 0.5A˚ Resolution, pp. 4,847–4,850, copyright 1996, American Physical Society with permission).
polarization axis, as expected from Eq. (4), is commonly observed in many experiments (8, 90–96). In contrast to this common feature, different X-ray fluorescent images could be formed if a solid sample is placed in an external electric field during photon excitation. In recent experiments, discrete ring-like fluorescent images have been observed from solids excited by synchrotron radiation (97). The semi angles of the rings found are less than 20° and are closely related to the atomic number Z and to the principal and the orbital angular momentum quantum numbers n and . The details of this unusual observation are described here. Synchrotron radiation of energies ranging from 15 eV to 9 keV is used as the X-ray source to cover the excitation energies of the elements of 4 < Z < 27. The formation of X-ray fluorescent images in amorphous, single- and polycrystalline materials that contain these elements was investigated. The samples were placed on a holder at the center of a UHV chamber (10−9 Torr) (see Fig. 22). Incident radiation whose beam size was 1.5 × 1.5 mm and photon energy E0 hit the sample and fluorescent radiation generated was recorded on a five-plate MCP (microchannel plate) detector placed 6 to 10 cm from the sample (Fig. 22). The MCP detector is operated at −4,050 volts during fluorescent detection. The diameter of the MCP was 30 mm, and the detecting area contained 1,022 × 1,022 pixels. The spatial resolution was 25 µm, the width of a pixel. A copper grid with a +150 volts bias was mounted on the front face of the detector to stop the positive ions. The −4,050 volt bias of the MCP is sufficient to prevent negatively charged particles from entering the detector. The fluorescent signals amplified by the electron gain of 5 × 107 were digitized and displayed on a color monitor. The sample current Is can be measured and a variable bias Vs can be applied to the sample. ϕ and θ are the angles of the sample surface and the detection direction with respect to the incident beam, respectively. The detector is placed at θ = 45° or θ = 90° . Total fluorescence yield (TFY) and total
Synchrotron radiation
Sample j q = 45° MCP
Position analyzer
Is
Bias (Vs)
+150 V −4050 V Computer
Figure 22. Schematic of the experimental setup for X-ray fluorescence in an electric field. The monochromatic SR excites the sample and generates fluorescence measured by the MCP area detector. The electric field results from the working bias of the MCP (97).
electron yield (TEY) versus E0 , measured from the MCP and the sample current Is , respectively, gave absorption spectra that indicate the energy positions of absorption edges. Figures 23a and b show the fluorescent images obtained at θ = 45° for ϕ = 45° from a sapphire sample whose photon energy was slightly above (E0 = 545 eV) and below (E0 = 500 eV) the oxygen K edge (EK = 543 eV). A bright ring and a dark hole are observed for E0 > EK (Fig. 23a) and E0 < EK (Fig. 23b), respectively. The ring images remain the same for various ϕ angles, as long as E0 ≥ EK (the energies of absorption edges), irrespective of the crystalline form of the samples. For example, the fluorescent rings of oxygen 1s from a single-crystal Al2 O3 ,
X-RAY FLUORESCENCE IMAGING
Cursor: 510 X 510 Y Z 1.000 R 12.00 25.00 23.33 21.67 20.00 18.33 16.67 15.00 13.33 11.67 10.00 8.333 6.667 5.000 3.333 1.667
(b)
(c)
Cursor: X 510 510 Y Z 3.000 R 3.000 35.00 32.67 30.33 28.00 25.67 23.33 21.00 18.67 16.33 14.00 11.67 9.333 7.000 4.667 2.333
(d)
Cursor: 510 X 510 Y Z 2.000 R 2.000 13.00 12.13 11.27 10.40 9.533 8.667 7.800 6.933 6.067 5.200 4.333 3.467 2.600 1.733 0.8667
(e)
Cursor: X 510 510 Y Z 18.00 R 18.00 25.00 23.33 21.67 20.00 18.33 16.67 15.00 13.33 11.67 10.00 8.333 6.667 5.000 3.333 1.667
(f)
Cursor: X 1020 400 Y 0. Z R 5.000 23.00 21.47 19.93 18.40 16.87 15.33 13.80 12.27 10.73 9.200 7.667 6.133 4.600 3.067 1.533
(a)
1491
Cursor: 510 X 510 Y 0. Z R 1.000 10.00 9.333 8.667 8.000 7.333 6.667 6.000 5.333 4.667 4.000 3.333 2.667 2.000 1.333 0.6667
Figure 23. X-ray fluorescent ring images at θ = 45 ° and ϕ = 45 ° of (a) O1s of sapphire excited at E0 = 545 eV, (b) sapphire at E0 = 500 eV, (c) B1s of boron nitride (BN) powder excited at E0 = 194 eV > EK (B1s ), (d) C1s of graphite at E0 = 285 eV > EK (C1s ), (e) Ti1s excited at E0 = 5,000 eV > EK (Ti1s ), and (f) the photoelectron image of Mn1s excited at E0 = 6,565 eV > EK (Mn1s ), where EK (O1s ) = 543 eV, EK (B1s ) = 192 eV, EK (C1s ) = 284 eV, EK (Ti1s ) = 4,966 eV, and EK (Mn1s ) = 6,539 eV. (A scratch is always observed on the MCP screen (97).
a piece of glass, and amorphous LaNiO3 and SiO2 samples are nearly the same. Fluorescent rings are repeatedly observed for various elements of atomic numbers 4 < Z < 27 at θ = 45° and 90° . For illustration, Figs. 23c–e display the images of the excitations of B1s , C1s , and Ti1s in color. Compared with fluorescence, photoelectron ring images (98) can also be
clearly observed by applying a −2,500 volt sample bias. Figure 23f shows such a ring image of Mn1s photoelectrons [EK (Mn) = 6,539 eV]. The experimental conditions for recording Fig. 23 are the following: The energy resolutions are E = 100 meV for Figs. 23a and b, 50 meV for Fig. 23c, and 800 meV for Fig. 23e and f. The sample-detector distances are 9.4 cm for Figs. 23a–d and 7.9 cm for
X-RAY FLUORESCENCE IMAGING
Figs. 23e and f. The angle ϕ is set at 45° and the counting time is 300 s for each image. Fluorescence involving 2p and 3d electrons that show ring images is also observed (not shown here). In addition, the excitation of the samples by photons of higher harmonics can also give ring images (not shown). The emitted photon spectra, measured by using a seven-element solid-state detector (E = 100 eV), proved to be the fluorescent radiation of energies E < EK . The semiangles δ corresponding to the ring radii follow the E−1/2 relationship (Fig. 24a), where E is the emission energy between the energy levels of atoms. The δ value remains the same for the same element involved, irrespective of its crystalline form. The origin of the formation of fluorescent rings is the following: The linearly polarized monochromatic radiation polarizes the atoms of the samples by the excitation of a given state (n, ). The excited atoms then emit fluorescence in all directions when returning to their stable states. Because fluorescent rings are observed at θ = 45° and 90° , irrespective of the rotation of the sample, this implies that wherever the detector is placed, the fluorescent ring is observed along that direction. The high working bias
(b) 0.04
ER
0.12
b b
0.08
C1s
0.03
0.12
0.02
0.08
0.06 1s
0.04
0.01
0.04
Cu2p
Ca1s
Cr1s
Fe1s
0
1
Au3d
0.02 0
1
2 3 4 5 Emission energy (kev)
6
2 3 4 5 Emission energy (kev)
6
7
(c) 0.12
2p
0.7
Cu b (radian)
d (radian)
b (radian)
B1s 0.10
0.16
0.10
0.6
Mn
0.5 0.08
Cr Sr
0.4 0.3
0.06 0.4
0.6
0.8 1.0 1.2 1.4 Emission energy (kev)
1.6
1.8
Figure 24. (a) The semiangles δ of the fluorescent cones versus the emission energy E of the various elements investigated; B1s , C1s , N1s , O1s , Cr2p , Mn2p , Cu2p , Al1s , Sr2p , P1s , Au3d , S1s , Cl1s , K1s , Ca1s , Ti1s , V1s , Cr1s , Mn 1s , Fe1s in the order of emission energy. The fitting curve is the function 0.058(±0.0016) × 1/E + 0.012(±0.0017). The standard deviation in δ is about 5%. (b) and (c) The corresponding opening angles β (the cross) and σ F /σ a (the triangle) for the elements investigated involving 1s and 2p fluorescence, respectively (see the text). The inset: Schematic of the redistributed dipoles (97).
7
s F/ s a
(a)
of the MCP detector may also affect the polarized atoms of the sample during photon excitation. The influence of the electric field between the detector and the sample on the fluorescent image from changing the sample bias Vs is shown in Figs. 25a, b, and c for Cr1s at Vs = 0, 2,000 and −2,500 volts, respectively. As can be seen, the intensity of the fluorescence (Fig. 25b) increases when the electric potential difference between the detector and the sample increases. The image becomes blurred when the electric potential difference decreases (Fig. 25c). The latter is also due partly to the presence of photoelectrons. This R result indicates that the presence of the electric field E between the detector and the sample clearly causes the , to align to polarized atoms, namely, the emitting dipoles p R . Thus, the spherically distributed some extent with the E dipoles originally in random fashion are now redistributed as shown in the inset of Fig. 24a. The preferred orientation R , and there are no dipoles in the of the dipoles is along E R field area defined by the opening angle 2β. Because the E is very weak, angle β is very small. The emitted fluorescent intensity distribution IF (θ ) R for the dipole versus the angle θ with respect to E
s F/ s a
1492
X-RAY FLUORESCENCE IMAGING
(a)
Image Trak 2−D
0.
Channel
Image Trak 2−D
(b)
0.
Channel Image Trak 2−D
(c)
0.
Channel
Cursor: 510 X 510 Y Z 14.00 R 14.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020.
Cursor: X 510 Y 510 Z 14.00 R 14.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020. Cursor: X 510 Y 510 Z 13.00 R 13.00 30.00 28.00 26.00 24.00 22.00 20.00 18.00 16.00 14.00 12.00 10.00 8.000 6.000 4.000 2.000 1020.
Figure 25. The fluorescent images of Cr1s where the sample bias Vs is equal to (a) 0, (b) 2,000, and (c) −2,500 volts in a 200 s-exposure. The photoelectron ring appears near the center of the screen in (a) and (c), and the fluorescent ring is slightly below the center (97).
distribution shown in the inset of Fig. 24a can be calculated by considering the absorption of the incident and the fluorescent beams by the sample and the usual doughnut-shaped distribution sin2 (θ ) of the dipoles (see,
1493
also Eq. 4):
∞
IF (θ ) =
a
dz 0
dγ γ =−a
µa z µF z Bn − sin2 (θ + γ )e cos ϕ e− cos θ , (6) cos ϕ
αω3 ៝ 2 π |XAB | V and a = − β. µa and µF 2π c3 2 are the linear absorption coefficients of the incident and the fluoresced beams, respectively, where µa (ω) = n0 σ a and µF (ω) = n0 σ F ; σ ’s are the total absorption/fluorescence cross sections, and n0 is the total number of the dipoles per unit volume. γ covers all of the angular range, except for the β range. z is along the inward surface normal. I0 , A, V, and PAB are the intensity and the cross section of the incident beam, the irradiated volume, and the probability of absorption of the incident photons by the atoms in the transition from state A to state B, respectively. XAB is the corresponding transition matrix element (see Eq. 4). Integration of Eq. (6) leads to
where B = I0 APAB
IF (θ ) = B ×
sec ϕ(π/2 − β − 1/2 sin 2β cos 2θ ) , (σ F sec θ − σ a sec ϕ)
(7)
which describes a ring-like distribution. From the values of σ ’s given in Refs. 99 and 100, the β angles for various emission energies are determined by fitting the measured fluorescent profile IF (θ ) where β is the only adjustable variable. Figures 24b and c are the β angles determined and the ratios σ F /σ a versus emission energy for the 1s and 2p fluorescence. Clearly, both the β and σ F /σ a behave very similarly. Because the mean free path of X rays in the sample equals 1/µ, therefore, the β angle is closely related to the mean-free-path ratio a /F . The ring centers of Figs. 23a, c–e and the dark hole in R field. The strength Fig. 23b indicate the direction of the E R affects only the fluorescence intensity not the of the E ring size. The formation of a bright ring or a dark hole on the detector depends only on whether the incident photon energy is greater or smaller than the absorption edges. This is analogous to the photoelectric effect. Based on the experimental data, the semiangle of the fluorescent ring for a given Z, n, and can be predicted by using the curve-fitting relationship: δ = 0.058(±0.0016) × 1/E + 0.012(±0.0017), where E is the emission energy between the energy levels of atoms. In summary, the observed ring-like discrete spatial distributions of X-ray fluorescence result from the collective alignment effect of the dipoles in solids exposed to an external electric field. The induced small opening of the dipole distribution due to this alignment effect and the self-absorption by the samples related to X-ray mean free paths are responsible for the formation of a fluorescent ring, which differs drastically from our common knowledge about emitted radiation. An empirical formula predicts the dimension of the fluorescent rings for given Z, n, and . This interesting feature, in turn, provides an alternative for characterizing materials according to their fluorescent images. X-ray fluorescence is a useful imaging technique for trace element analysis in biomedical and environmental applications and material characterization. It has the
1494
X-RAY FLUORESCENCE IMAGING
potential for crystallographic structural studies such as crystal-structure determination of surfaces, interfaces, and bulk materials, using X-ray fluorescent holographic imaging. The availability of synchrotron radiation, the excellent soft and hard X-rays obtained from this source, and the increasing number of synchrotron facilities in the world will certainly enlarge the applicability of this fluorescent imaging technique. It is anticipated that in the near future this technique will develop into a major imaging tool for investigating various kinds of materials at submicron or even smaller scales. Acknowledgments The author is indebted to G. Faigel, T. Gog, N. Gurker, S. Homma-Takeda, A. Iida, Y. Kumagai, N. Shimojo, the American Institute of Physics, the American Physical Society, and the publisher of Nature, Elsevier Science B. V., and John Wiley and Sons for permission to reproduce figures and photographs from their published materials.
ABBREVIATIONS AND ACRONYMS APD CCD DCM EM wave FAAS FWHM HV IC ICCD ICP-AES IEF-AGE IP IXFH K-B LN2 LPE MAD MCP MeHg MMC NXFH SMM SOD SR SSD TEY TFY UHV XSW
avalanche photodiode charge-coupled device double-crystal monochromator electromagnetic wave flameless atomic absorption spectrometry full width at half-maxima high vacuum ionization chamber intensified charge-coupled device inductively coupled plasma atomic emission spectrometry isoelectric focusing agarose gel electrophoresis imaging plate inverse (multiple-energy) X-ray fluorescent holography Kirkpatrick-Baez liquid nitrogen liquid-phase epitaxy multiwavelength anomalous dispersion microchannel plate methylmercury methylmercury chloride normal (single-energy) X-ray fluorescent holography synthetic multilayer monochromator superoxide dismutase synchrotron radiation solid-state detector total electron yield total fluorescence yield ultrahigh vacuum X-ray standing wave
BIBLIOGRAPHY 1. P. P. Ewald, FiftyYears of X-Ray Diffraction, N.V.A. Oosthoek’s Uitgeversmaatschappij, Utrecht, The Netherlands, 1962. 2. A. H. Compton and S. K. Allison, X-Rays in Theory and Experiment, 2nd ed., Van Nostrand, Princeton, 1935. 3. L. V. Azaroff, Elements of X-Ray Crystallography, McGrawHill, NY, 1968.
˚ 4. T. Aberg and J. Tulkki, in B. Crasemann, ed., Atomic InnerShell Physics, Plenum press, New York, 1985, pp. 419–463 and the references therein. 5. E. Merzbacher, Quantum Mechanics, Wiley, NY, 1961. 6. A. Messiah, Quantum Mechanics, North-Holland, Amsterdam, 1962. 7. J. J. Sakurai, Advanced Quantum Mechanics, AddisonWesley, NY, 1967. 8. J. D. Jackson, Classical Electrodynamics, Wiley, NY, 1967. 9. E. -E. Koch, D. E. Eastman, and Y. Farge, in E. -E. Koch, ed., Handbook on Synchrotron Radiation, 1a ed., NorthHolland, Amsterdam, 1983, pp. 1–63. 10. T. Matsushita and H. Hashizume, in E. -E. Koch, ed., Handbook on Synchrotron Radiation, 1a ed., North-Holland, Amsterdam, 1983, pp. 261–314. 11. R. L. M¨ossbauer, Z. Phys. 151, 124–143 (1958). 12. D. P. Siddons et al., Rev. Sci. Instrum. 60, 1,649–1,654 (1989). 13. S. Hayakawa et al., Nucl. Instrum. Methods. B49, 555–560 (1990). 14. Y. Suzuki and F. Uchida, Rev. Sci. Instrum. 63, 578–581 (1992). 15. P. Kirkpatrick and A. V. Baez, J. Opt. Soc. Am. 38, 766–774 (1948). 16. A. Iida, M. Takahashi, K. Sakurai, and Y. Gohshi, Rev. Sci. Instrum. 60, 2,458–2,461 (1989). 17. N. Gurker, X-Ray Spectrom. 14, 74–83 (1985). 18. N. Gurker, Adv. X-Ray Analysis 30, 53–60 (1987). 19. M. Bavdaz et al., Nucl. Instrum. Methods A266, 308–312 (1988). 20. G. T. Herman, Image Reconstruction from Projections, Academic Press, NY, 1980. 21. N. Shimojo, S. Homma, I. Nakgi, and A. Iida, Anal. Lett. 24, 1,767–1,777 (1991). 22. W. J. M. Lenglet et al., Histochemistry 81, 305–309 (1984). 23. A. Iida and T. Norma, Nucl. Instrum. Methods B82, 129–138 (1993). 24. Y. Suzuki, F. Uchida, and Y. Hirai, Jpn. J. Appl. Phys. 28, L1,660–(1989). 25. N. Shimojo et al., Life Sci. 60, 2,129–2,137 (1997). 26. S. A. Katz and R. B. Katz, J. Appl. Toxicol. 12, 79–84 (1992). 27. W. M. A. Burgess, L. Diberardinis, and F. E. Speizer, Am Ind. Hyg. Assoc. J. 38, 184–191 (1977). 28. W. E. Atchison and M. F. Hare, FASEB J. 8, 622–629 (1994). 29. M. Aschner and J. L. Aschner, Neurosci. Biobehaviour Rev. 14, 169–176 (1990). 30. Y. Kumagai, S. Homma-Takeda, M. Shinyashiki, and N. Shimojo, Appl. Organomet. Chem. 11, 635–643 (1997). 31. A. J. J. Bos et al., Nucl. Instrum. Methods B3, 654–659 (1984). 32. I. Orlic, J. Makjanic, and V. Valkovic, Nucl. Instrum. Methods B3, 250–252 (1984). 33. K. Okmoto et al., Clin. Chem. 31, 1,592–1,597 (1985). 34. S. Osaki, D. A. Johnson, and E. Freiden, J. Biol. Chem. 241, 2,746–2,751 (1966). 35. T. L. Sourkes, Pharmacol. Rev. 24, 349–359 (1972). 36. S. H. Oh, H.E. Ganther, and W. G. Hoekstra, Biochemistry 13, 1,825–1,829 (1974). 37. D. Keilin and T. Mann, Biochem. J. 34, 1,163–1,176 (1940).
X-RAY TELESCOPE 38. C. G. Elinder, in L. Friberg, G. F. Nordberg, and V. B. Vouk, eds., Handbook on Toxicology of Metals, vol. 2, Oxford, Amsterdam, 1986, p. 664. 39. G. L. Fisher, V. S. Byers, M. Shifrine, and A. S. Levin, Cancer 37, 356–363 (1976). 40. B. F. Issel et al., Cancer 47, 1,845–1,848 (1981). 41. N. Cetinkaya, D. Cetinkaya, and M. Tuce, Biol. Trace Element Res. 18, 29–38 (1988). 42. S. Inutsuka and S. Araki, Cancer 42, 626–631 (1978). 44. E. Huhti, A. Poukkula, and E. Uksila, Respiration 40, 112–116 (1980). 45. S. Homma et al., J. Trace Elements Exp. Med. 6, 163–170 (1993). 46. B. Rosoff and H. Spence, Nature 207, 652–654 (1965). 47. S. Homma, I. Nakai, S. Misawa, and N. Shimojo, Nucl. Instrum. Methods B103, 229–232 (1995). 48. K. Julshamn et al., Sci. Total Environ. 84, 25–33 (1989). 50. N. Koizumi et al., Environ. Res. 49, 104–114 (1989). Phamacol.
53. S. Homma-Takeda, Y. Kumagai, M. Shinyashiki, N. Shimojo, J. Synchrotron Radiat. 5, 57–59 (1998).
76. B. Adams et al., Phys. Rev. B57, 7,526–7,534 (1998). 77. S. Y. Tong, C. W. Mok, H. Wu, and L. Z. Xin, Phys. Rev. B58, 10 815–10 822 (1998). 78. G. Faigel and M. Tegze, Rep. Prog. Phys. 62, 355–393 (1999). 80. R. J. Collier, C. B. Burckhardt, and L. H. Lin, Optical Holography, Academic Press, NY, 1971. 81. W. Kossel, Ann. Phys. (Leipzig) 26, 533–553 (1936). 82. J. J. Barton, Phys. Rev. Lett. 61, 1,356–1,359 (1988). 83. J. J. Barton, Phys. Rev. Lett. 67, 3,106–3,109 (1991). 84. S. Y. Tong, H. Huang, and C. M. Wei, Phys. Rev. B46, 2,452–2,459 (1992). 85. S. Thevuthasan et al., Phys. Rev. Lett. 70, 595–598 (1993).
87. A. Q. R. Baron, Nucl. Instrum. Methods A352, 665–667 (1995).
51. S. Homma-Takeda et al., Anal. Lett. 29, 601–611 (1996). Toxicol.
75. P. M. Len et al., Phys. Rev. B56, 1,529–1,539 (1997).
86. G. P. Harp et al., J. Electron Spectrosc. Relat. Phenomena 70, 331–337 (1991).
49. R. Scott et al., Urol. Res. 11, 285–290 (1983).
Environ.
73. T. Gog et al., Phys. Rev. Lett. 76, 3,132–3,135 (1996). 74. P. M. Len, T. Gog, C. S. Fadley, and G. Materlik, Phys. Rev. B55, 3,323–3,327 (1997).
79. M. Tegze et al., Phys. Rev. Lett. 82, 4,847–4,850 (1999).
43. M. Hrgovcic et al., Cancer 31, 1,337–1,345 (1973).
52. M. Shinyashiki et al., 359–366 (1996).
1495
2, and
54. N. Shimojo et al., J. Occup. Health 39, 64–65 (1997). 55. J. A. R. Samson, Techniques of Vacuum Ultraviolet Spectroscopy, Wiley & Sons, NY, 1967. 56. G. W. Berkstresser, J. Shmulovich, D. T. C. Huo, and G. Matulis, J. Electrochem. Soc. 134, 2,624–2,628 (1987). 57. G. W. Berkstresser et al., J. Electrochem. Soc. 135, 1,302– 1,305 (1988). 58. B. La Fontaine et al., Appl. Phys. Lett. 63, 282–284 (1995). 59. H. A. Hauptman, Phys. Today 42, 24–30 (1989). 60. G. H. Stout and L. H. Jensen, X-ray Structure Determination, 2nd ed., Wiley, NY, 1989. 61. R. G. Rossmann, ed., The Molecular Replacement Method, Gordon and Breach, NY, 1972. 62. H. Schenk, ed., Direct Methods for Solving Crystal Structures, Plenum Press, NY, 1991. 63. M. M. Woolfson and H. -F. Han, Physical and Non-Physical Methods of Solving Crystal Structures, Cambridge University Press, Cambridge, 1995.
88. M. Tegze et al., Nature 407, 38–40 (2000). 89. S. Marchesini et al., Phys. Rev. Lett. 85, 4,723–4,726 (2000). 90. J. Muller et al., Phys. Lett. 44A, 263–264 (1973). 91. U. Fano and J. H.Macek, Rev. Mod. Phys. 45, 553–573 (1973). 92. C. H.Greene and R. N.Zare, Ann. Rev. Phys. Chem. 33, 119–150 (1982). 93. D. W. Lindle et al., Phys. Rev. Lett. 60, 1,010–1,013 (1988). 94. S. H. Southworth et al., Phys. Rev. Lett. 67, 1,098–1,101 (1991). 95. Y. Ma et al., Phys. Rev. Lett. 74, 478–481 (1995). 96. J. A. Carlisle et al., Phys. Rev. Lett. 74, 1,234–1,237 (1995) 97. C. K. Chen et al., Paper 14P9, Third Crystallogr. Assoc., Kuala Lumpur, 1998.
Conf.
Asian
98. H. Helm et al., Phys. Rev. Lett. 70, 3,221–3,224 (1993). 99. J. J. Yeh, Atomic Calculation of Photoionization Crosssections and Asymmetry Parameters, Gordon & Breach, NY, 1993. 100. E. B. Saloman, J. H. Hubbel, and J. H. Scofield, Atom. Data Nucl. Data Tables 38, 1–51 (1988).
64. W. A. Hendrickson, Science 254, 51–58 (1991). 65. S. -L. Chang, Multiple Diffraction of X-Rays in Crystals, Springer-Verlag, Berlin, 1984.
X-RAY TELESCOPE
66. S. -L. Chang, Acta Crystallogr. A54, 886–894 (1998); also in H. Schenk, ed., Crystallography Across the Sciences, International Union of Crystallography, Chester, 1998, pp. 886–894. 67. A. Szoke, in D. T. Attwood and J. Boker, eds., Short Wavelength Coherent Radiation: Generation and Applications, AIP Conf. Proc. No. 147, American Institute of Physics, NY, 1986, pp. 361–467. 68. M. Tegze and G. Faigel, Europhys. Lett. 16, 41–46 (1991). 69. G. J. Maalouf et al., Acta Crystallogr. A49, 866–871 (1993). 70. A. Szoke, Acta Crystallogr. A49, 853–866 (1993). 71. P. M. Len, S. Thevuthasan, and C. S. Fadley, Phys. Rev. B50, 11 275–11 278 (1994). 72. M. Tegze and G. Faigel, Nature 380, 49–51 (1996).
WEBSTER CASH University of Colorado Boulder, CO
An X-ray telescope is an optic that is used to focus and image x-rays like a conventional telescope for visible light astronomy. X-ray telescopes are exclusively the domain of X-ray astronomy, the discipline that studies highenergy emissions from objects in space. A telescope, by its very nature, is for studying objects at large distances, concentrating their radiation, and magnifying their angular extent. Because X rays cannot travel large distances through the earth’s atmosphere, they can be
1496
X-RAY TELESCOPE
used only in spacecraft, observing targets through the vacuum of space. X rays penetrate matter and are used to image the interior of the human body. Unfortunately, this means that X rays also tend to penetrate telescope mirrors, making the mirror worthless for astronomy. But, by using of a specialized technique called grazing incidence, telescopes can be made to reflect, and astronomy can be performed. In this article, we describe the basic techniques used to build X-ray telescopes for X-ray astronomy. Because X-ray astronomy must be performed above the atmosphere, it is a child of the space program. Telescopes are carried above the atmosphere by rockets and used to study the sun, the planets, and objects in the depths of space. These telescopes have provided a different view of the hot and energetic constituents of the universe and have produced the key observations to establish the existence of black holes. X-RAY ASTRONOMY Spectra of Hot Objects X rays are created by the interactions of energetic charged particles. A sufficiently fast moving electron that impacts an atom or ion or is accelerated in a magnetic or electric field can create high-frequency radiation in the X-ray band. Thus X rays tend to be associated with objects that involve high-energy phenomena and high temperatures. In general, if radiation is generated by thermal processes, the characteristic frequency emitted will be given by hν ≈ kT (1)
The X-ray band stretches from 10¹⁶ to 10¹⁸ Hz, which indicates that the characteristic temperatures of emitting objects range from 10⁶ K up to 10⁸ K. At these extreme temperatures, matter does not exist in the forms we see in everyday life. The particles are moving so fast that atoms become ionized to form a plasma. The plasma can have very low density, as in the blast wave of a supernova, or very high density, under the surface gravity of a neutron star. Many stars, including the sun, have a hot, X-ray emitting gas around them, called a corona. For example, the X-ray spectrum of the corona of the star HR 1099 is shown in Fig. 1.
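As a quick numerical check of Eq. (1), the short sketch below (not part of the original article) converts the frequency limits of the X-ray band into characteristic temperatures. The rounded physical constants are assumptions of the example, and the results are order-of-magnitude estimates consistent with the 10⁶–10⁸ K range quoted above.

```python
# A minimal sketch (not from the article): evaluating the hv ~ kT estimate of Eq. (1)
# for the frequency band quoted in the text. Constants are rounded for illustration.
H = 6.626e-34   # Planck constant, J s
K = 1.381e-23   # Boltzmann constant, J/K

def characteristic_temperature(frequency_hz: float) -> float:
    """Temperature whose thermal emission peaks near the given frequency (hv ~ kT)."""
    return H * frequency_hz / K

for nu in (1e16, 1e18):   # approximate edges of the X-ray band quoted in the text
    print(f"nu = {nu:.0e} Hz  ->  T ~ {characteristic_temperature(nu):.1e} K")
```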
Figure 1. X-ray spectrum of the star HR 1099 (intensity versus wavelength, 5–35 Å).
Figure 2. Image of the X rays emitted from the 300-year-old supernova explosion remnant named Cas-A. The picture shows an expanding shock front of interstellar gas and the collapsed remnant star in the middle. This image was acquired using the 1.2-m diameter X-ray telescope on the Chandra Satellite. See color insert.
Dramatic events in the universe, such as supernovae, also generate X rays. The shock waves from the exploding star heat the interstellar gas as they pass through it, creating a supernova remnant, an expanding shell of hot plasma, as shown in Fig. 2. Another way by which X rays are created is through the acceleration of matter near the surfaces of collapsed stars, including white dwarfs, neutron stars, and black holes. As matter spirals into these extreme objects, it heats to high temperatures and emits X rays. X-ray telescopes, through imaging, timing, and spectroscopy, have played a central role in proving the existence of black holes and in studying the physics of the most extreme objects in the universe.
Interstellar Absorption
The space between stars is a vacuum, better than the best vacuum ever created in a laboratory on earth. But in the immense stretches of interstellar space, there is so much volume that the quantities of gas become large enough to absorb X rays. In our part of the galaxy, there is an average of about one atom of gas for every 10 cc of space. The composition of this gas is similar to that of the sun, mostly hydrogen with some helium mixed in. There is also a small but significant quantity of other heavier elements, such as oxygen and carbon. Across the distances between the stars, the absorption caused by this gas can become significant in the soft X-ray band. In Fig. 3, we show a graph of the transmission of interstellar gas as a function of X-ray energy. It shows that low-energy X rays have higher absorption, so that one cannot see as far through the galaxy at 0.1 keV as at 1 keV. It is for this reason that X-ray telescopes are usually designed to function above 0.5 keV and up to 10 keV, if possible.
Figure 3. The thin gas between the stars can absorb X rays, particularly at low energies. This graph shows the transmission of the interstellar medium as a function of X-ray wavelength for the amount of interstellar gas expected at 2,000 parsecs.
Atmospheric Absorption
Because of absorption, X rays cannot penetrate the earth's atmosphere. The primary interaction is by the photoelectric effect in the oxygen and nitrogen that are the main constituents of our atmosphere. Thus, to use an X-ray telescope, we must be above most of these gases. In Fig. 4, we show the transmission of the atmosphere at an altitude of 110 km, where it becomes partially transparent. A soft X-ray telescope requires a rocket to gain sufficient altitude. Whether by suborbital rocket to view above the atmosphere quickly for five minutes, by a larger missile carrying a satellite to orbit, or by a larger launcher yet carrying the telescope to interplanetary space, the rocket is essential.
Figure 4. The fraction of X rays transmitted from overhead down to an altitude of 110 km above sea level, as a function of X-ray wavelength.
For hard X rays, however, there is a more modest means of achieving altitude: the balloon. Large balloons can carry telescopes to altitudes above 100,000 feet, which is sufficient to observe the harder X rays. Balloons can stay up for hours, days, and even weeks, compensating for the lower flux of observable signal.
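To make the interstellar absorption discussion concrete, the sketch below (not from the article) estimates the transmission of 2,000 parsecs of interstellar gas, as in Fig. 3, using a deliberately crude model: an assumed effective photoelectric cross section of roughly 2 × 10⁻²² cm² per hydrogen atom at 1 keV that scales as E⁻³, with absorption edges ignored. The numbers are illustrative only, but they reproduce the qualitative behavior that soft X rays are strongly absorbed while harder X rays pass through.

```python
# Illustrative sketch only (not the data behind Fig. 3): transmission of the interstellar
# medium using a crude photoelectric cross section that scales as E^-3 and is normalized
# to an assumed ~2e-22 cm^2 per hydrogen atom at 1 keV; absorption edges are ignored.
import math

PC_CM = 3.086e18          # one parsec in centimeters
N_DENSITY = 0.1           # assumed gas density, atoms per cm^3 (one atom per 10 cc)
SIGMA_1KEV = 2.0e-22      # assumed effective cross section at 1 keV, cm^2 per H atom

def ism_transmission(energy_kev: float, distance_pc: float) -> float:
    column = N_DENSITY * distance_pc * PC_CM          # hydrogen atoms per cm^2
    sigma = SIGMA_1KEV * energy_kev ** -3             # rough E^-3 scaling
    return math.exp(-column * sigma)

for e in (0.1, 0.25, 0.5, 1.0, 2.0):
    print(f"{e:4.2f} keV: transmission = {ism_transmission(e, 2000):.3f}")
```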
Signal-to-Noise Issues
Telescopes were not used in the early days of X-ray astronomy. The first observations of the sky were performed using proportional counters. These are modified Geiger counters that have large open areas and no sensitivity to the direction from which the photon came. Grid and slat collimators were used to restrict the solid angle of sky to which the detector was sensitive, so that individual sources could be observed. Collimators were built that achieved spatial resolution as fine as 20 arcminutes, which was adequate for the first surveys of the sky and for studying the few hundred brightest X-ray sources in the sky. However, the detectors had to be large to collect much signal, and that led to high detector background. The large angular extent caused signal from the sky to add to the background as well. Additionally, many faint sources could be unresolved in the field of view at the same time, leading to confusion. If X-ray astronomy were to advance, it needed telescopes to concentrate the signal and resolve weak sources near each other in the sky.
GRAZING INCIDENCE
X rays are often referred to as "penetrating radiation" because they pass easily through matter. The medical applications of this property revolutionized medicine. However, when the task is to build a telescope, it becomes necessary to reflect the radiation rather than transmit or absorb it. The fraction of 1-keV X rays reflected from a mirror at normal incidence can be as low as 10⁻¹⁰, effectively killing its utility as an optic. Another problem of conventional mirrors is their roughness. If a mirror is to reflect radiation specularly, it needs to have a surface roughness substantially lower than the wavelength of the radiation. In the X ray, this can be difficult, considering that the wavelength of a 10-keV X ray is comparable to the diameter of a single atom. These problems would have made X-ray telescopes impossible, except for the phenomenon of grazing incidence, in which an X ray reflects off a mirror surface at a very low angle, like a stone skipping off a pond (Fig. 5).
Figure 5. An X ray approaches a mirror at a very low graze angle, reflecting by the property of total external reflection.
Grazing Incidence
The graze angle is the angle between the direction of the incident photon and the plane of the mirror surface. In most optical notation, the incidence angle is the angle between the direction of the ray and the normal to the plane of the mirror. Thus, the graze angle is the complement of the incidence angle. At any given X-ray wavelength, there is a critical angle below which the X rays reflect. As the graze angle drops below the critical angle, the efficiency of the reflection rises. As the energy of the X ray rises, the critical angle drops, so hard X-ray optics feature very low graze angles. In general, the critical angle θc is given by
sin θc = λ √[e²N/(πmc²)], (2)
where λ is the wavelength of the X ray and N is the number of electrons per unit volume (1). This behavior comes about because the index of refraction inside a metal is less than one to an X ray. The X ray interacts with the electrons in the reflecting metal as if it had encountered a plasma of free electrons. The wave is dispersed and absorbed as it passes through the metal, allowing the index of refraction to fall below one. This process can be described by assigning a complex index of refraction to the material (2). The index of refraction n of a metal is given by
n = 1 − δ − iβ, (3)
where the complex term β is related to the absorption coefficient of the metal. If β is zero, then δ cannot be positive. A well-known optical effect in the visible is total internal reflection, which leads to the mirror-like reflection of the glass in a fish tank. If the radiation approaches the glass at an angle for which there is no solution to Snell's law, it is reflected instead of transmitted. This happens when the radiation attempts to pass from a medium of higher index of refraction to one of lower index. In the X ray, where the index of refraction in metals is less than one, total external reflection is experienced when the ray tries to pass into a metal from air or vacuum, where the index of refraction is closer to unity.
Fresnel Equations
The equations for reflection of X rays are the same as those for longer wavelength radiation and are known as the Fresnel equations (2). In the X ray, evaluation of the equations differs from the usual because the index of refraction has an imaginary component, so the algebra becomes complex. If radiation approaches at an angle ϕi with respect to the normal (ϕ, the incidence angle, is the complement of θ, the graze angle), then some of the radiation will reflect at an angle ϕr, which is equal to ϕi. Some of the power will be transmitted into the material at an angle ϕt with respect to the normal, where ϕt is given by Snell's law:
sin ϕt = sin ϕi / n, (4)
but now ϕt is complex, and evaluation involves complex arithmetic. The reflectance of the transverse electric wave (one of the two plane polarizations) is given by
RE = [(cos ϕi − n cos ϕt)/(cos ϕi + n cos ϕt)] [(cos ϕi − n cos ϕt)/(cos ϕi + n cos ϕt)]*, (5)
and the transverse magnetic wave (the other polarization) is given by
RM = [(cos ϕi − cos ϕt/n)/(cos ϕi + cos ϕt/n)] [(cos ϕi − cos ϕt/n)/(cos ϕi + cos ϕt/n)]*, (6)
where now the star on the second term represents the complex conjugate. Extensive work has been done over the years to tabulate the index of refraction of X rays in a wide variety of optical materials and elements. Tabulations can be found in the literature (3). Certainly, the most common material used as a coating in the X ray is gold, which is stable and easy to deposit. A few other materials can be better, including platinum and osmium at high energies and nickel and uranium below 1 keV. In Fig. 6, we show the reflectivity of gold as a function of X-ray energy. From perusal of the chart, one can see that in the quarter kilovolt band (0.1–0.25 keV), graze angles as high as 5° are possible. At 1 keV, angles close to 1° are required, and at 10 keV, angles below half a degree are necessary.
Figure 6. Plot of the fraction of X rays reflected off a polished gold surface as a function of the wavelength of the X ray. Curves for graze angles of 1°, 2°, and 3° are shown; one can see the drop in efficiency as the angle rises.
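For orientation, the free-electron estimate of Eq. (2) can be evaluated numerically. The sketch below (not part of the original article) does this for a gold surface; it treats all 79 electrons per atom as free and ignores absorption, so it overestimates the useful graze angles compared with the measured curves of Fig. 6, but it reproduces the basic trend that the critical angle falls roughly as the inverse of the photon energy.

```python
# A rough sketch of the free-electron critical-angle estimate of Eq. (2), evaluated for a
# gold surface. The electron density treats all 79 electrons per atom as free, and
# absorption is ignored, so this is an upper bound on the useful graze angle.
import math

R_E = 2.818e-13            # classical electron radius, cm (e^2 / m c^2 in Gaussian units)
AVOGADRO = 6.022e23
RHO_AU, A_AU, Z_AU = 19.3, 197.0, 79    # gold density (g/cm^3), atomic weight, Z

N_ELECTRON = RHO_AU / A_AU * AVOGADRO * Z_AU      # electrons per cm^3

def critical_graze_angle_deg(energy_kev: float) -> float:
    wavelength_cm = 12.398 / energy_kev * 1e-8     # E (keV) -> wavelength (angstroms -> cm)
    sin_theta_c = wavelength_cm * math.sqrt(R_E * N_ELECTRON / math.pi)
    return math.degrees(math.asin(min(sin_theta_c, 1.0)))

for e in (0.25, 1.0, 10.0):
    print(f"{e:5.2f} keV: theta_c ~ {critical_graze_angle_deg(e):.2f} deg")
```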
Mirror Quality
The principles of grazing incidence provide a technique for the efficient reflection of X rays, but the mirrors must be of adequate quality. Analogous to all telescope mirrors, they must have good figure, good polish, and adequate size to suppress diffraction (4). In Fig. 7, we show the reflection of the ray in three dimensions. The direction of the ray can be deflected to the side by a slope error in the "off-plane" direction.
Figure 7. Symmetry is broken by a grazing incidence reflection. Scatter is much worse in the plane of the incident and reflected ray. There is scatter off-plane, but it is much lower.
However, as one can see from the diagram, to change the direction of the ray that reflects at angle θ by an angle close to θ requires a slope error of the order of 30°. The effective angle through which the ray is thrown is reduced by a factor of sin θ in the off-plane direction. This means that errors in mirror quality lead to greater error in the in-plane direction. The resulting image is blurred anisotropically and creates a long, narrow image in the vertical direction (4). The height of the image blur is roughly 1/sin θ times the width.
Microscopic roughness can also degrade the image by scattering the X rays. Scatter is a problem at all wavelengths, but it is particularly severe in the X-ray region, where the radiation has a short wavelength. Scatter is caused by small deviations in the wave fronts of the reflected light. A deviation in height on the surface of the mirror of size δ will create a deviation in phase of size 2δ sin θ, where θ is the graze angle. This means that the amount of scattering drops with the sine of the graze angle, which allows relaxation of polish requirements. If one assumes that the surface roughness can be described as a probability function that has a Gaussian distribution, then the total amount of scatter will be given by
S = exp[−(4πσ sin θ/λ)²], (7)
where σ is the standard deviation of the surface roughness (5). This means, for example, that for a surface polish of 10 Å, a mirror at a graze angle of 1° can suppress scatter to less than 5%. The angle through which the X ray is scattered is given by
ϕ = λ/ρ, (8)
where ρ is the "correlation length," the characteristic distance between the surface errors. This equation is similar to the grating equation that governs diffraction and also leads to scattering preferentially in the plane of reflection.
Because X rays are electromagnetic radiation, they are subject to diffraction just like longer wavelength photons. However, because of their very short wavelengths, the effects are not usually noticeable. Visible light telescopes have a diffraction limit that is given by
ϕ = 0.7 λ/D, (9)
where λ is the wavelength of the light, D is the diameter of the aperture through which the wave passes, and ϕ is the width of the diffracted beam in radians. The same equation holds true for the X ray, but now D represents the projected aperture of the optic, as viewed by the incoming beam. A grazing incidence optic of length L would create a projected aperture of size L sin θ, where θ is the graze angle. A large grazing incidence optic has a length of 50 cm, which indicates that D will be typically 1 cm in projection. Because an X ray has λ = 1 nm, ϕ will have a value of 10⁻⁷ radians, which is 0.02 arcseconds. This resolution is higher than that of the Hubble Space Telescope, so diffraction has not yet become a limiting factor in X-ray telescopes.
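As a worked check of the roughness and diffraction relations above, the following sketch (not from the article; the sample numbers are the ones quoted in the text) evaluates the scattered fraction 1 − S from Eq. (7) and the diffraction width of Eq. (9) using the projected aperture L sin θ of a grazing mirror.

```python
# A small sketch (assumptions noted inline) that evaluates the scattered fraction from
# Eq. (7) and the diffraction width of Eq. (9) with D taken as the projected aperture.
import math

def scattered_fraction(sigma_ang: float, graze_deg: float, wavelength_ang: float) -> float:
    phase = 4.0 * math.pi * sigma_ang * math.sin(math.radians(graze_deg)) / wavelength_ang
    return 1.0 - math.exp(-phase ** 2)

def diffraction_limit_arcsec(wavelength_m: float, length_m: float, graze_deg: float) -> float:
    projected_aperture = length_m * math.sin(math.radians(graze_deg))
    phi_rad = 0.7 * wavelength_m / projected_aperture          # Eq. (9)
    return math.degrees(phi_rad) * 3600.0

# 10-angstrom polish at a 1-degree graze, evaluated at a 10-angstrom wavelength
print(f"scattered fraction: {scattered_fraction(10.0, 1.0, 10.0):.3f}")
# 50-cm-long mirror at a graze of about 1.15 degrees, observed at 1 nm
print(f"diffraction limit: {diffraction_limit_arcsec(1e-9, 0.5, 1.15):.3f} arcsec")
```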
WOLTER TELESCOPES
The previous section describes how one can make a mirror to reflect X rays efficiently, but it does not provide a plan for building a telescope. The geometry of grazing incidence is so different from the geometry of conventional telescopes that the designs become radically different in form.
At the root of the design of a telescope is the parabola of rotation. As shown schematically in Fig. 8, parallel light from infinity that reflects off the surface of a parabola will come to a perfect focus. However, this is a mathematical fact for the entire parabola, not just the normal incidence part near the vertex. Figure 9 shows a full parabola. The rays that strike the part of the parabola where the slope is large compared to one reflect at grazing incidence but also pass through the same focus.
Figure 8. A conventional telescope features a parabolic surface that focuses parallel rays from infinity onto a single point.
Figure 9. Extending the parabola to a grazing geometry does not interfere with its focusing properties.
The mirror can be a figure of rotation about the axis of symmetry of the parabola, just as at normal incidence. Such a figure of revolution is called a paraboloid. Thus, a paraboloidal mirror can reflect at grazing incidence near the perimeter and at normal incidence near the center. But, because the reflectivity is low near the center, in practice the mirrors are truncated, as shown in Fig. 9. The resulting shape resembles a circular wastepaper basket that has a polished interior. The aperture of the telescope is an annulus. Because the telescope is a simple parabola, the surface can be described by
Z(r) = r²/(2ρ) − ρ/2, (10)
where Z(r) is the height of the telescope above the focal position at a radius r. The parameter ρ is the radius of curvature at the vertex, which is never part of the fabricated paraboloid. As such, it is merely a formal parameter of the system. In a normal incidence telescope, the radius of curvature is approximately twice the focal length, but in a grazing parabola, the focal length is the distance from the point of reflection to the focal point. This causes a major problem for the performance of the telescope. Radiation that enters the telescope on the optic axis is theoretically concentrated into a perfect focus, but that approaching off-axis is not. The effective focal length of the ray is the distance between the point of reflection and the focus, which is approximately equal to Z. Because the paraboloid usually is long in the Z direction, the focal length is a highly variable function of the surface position. This is called comatic aberration. It is so severe that it limits the value of the paraboloid as an imaging telescope. Such paraboloids have been used, but mostly for photometric and spectroscopic work on bright, known targets, where field of view is unimportant.
The severity of this problem was recognized early in the history of X-ray optics and was solved by adding a second reflection. In 1952, Wolter (6) showed that two optical surfaces in sequence could remove most of the coma, allowing for the design of a true imaging telescope. It is a mathematical property of the hyperbola that light converging to one focus will be refocused onto the other. Thus, by placing the focal point of the paraboloid at one of the foci of a hyperboloid, the light will be refocused, as shown schematically in Fig. 10. In the process, the distances traveled by the rays after reflection become close to equalized, and the comatic aberration is reduced.
Wolter described three types of these paraboloid-hyperboloid telescopes, as shown in Fig. 11. Each has a paraboloidal surface followed by a hyperboloidal surface. The first reflection focuses parallel light to a point. The second reflection redirects the light to a secondary focus. In X-ray astronomy, we use mostly Type I because the angles at which the rays reflect are additive and create a shorter distance to the focal plane. When the graze angles are low, this can be very important. Sometimes, in the extreme ultraviolet, where graze angles can be in the range of 5–15°, the Wolter Type II has been used (7). The equations for the paraboloid-hyperboloid telescopes are
z₁ = r₁²/(2ρ) − ρ/2 − 2√(a² + b²) (11)
and
z₂ = (a/b)√(b² + r₂²) − √(a² + b²). (12)
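The following sketch (not part of the article) simply tabulates the two surfaces using Eqs. (11) and (12) as written above; the parameter values for ρ, a, and b are invented for illustration and do not correspond to any real telescope.

```python
# Illustrative only: tabulate the paraboloid and hyperboloid profiles of Eqs. (10)-(12)
# for made-up parameters (rho, a, b are assumed values, not real telescope dimensions).
import math

RHO = 7.7e-4          # paraboloid vertex radius of curvature (assumed, meters)
A, B = 24.0, 0.3      # hyperboloid semi-axes (assumed, meters)
C = math.sqrt(A * A + B * B)     # focal distance of the hyperboloid

def z_paraboloid(r: float) -> float:
    """Eq. (11): paraboloid surface height above the telescope focus."""
    return r * r / (2.0 * RHO) - RHO / 2.0 - 2.0 * C

def z_hyperboloid(r: float) -> float:
    """Eq. (12): hyperboloid surface height above the telescope focus."""
    return (A / B) * math.sqrt(B * B + r * r) - C

for r in (0.300, 0.305, 0.310):
    print(f"r = {r:.3f} m:  z1 = {z_paraboloid(r):7.3f} m   z2 = {z_hyperboloid(r):7.3f} m")
```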
Figure 10. A Wolter Type I telescope, also known as a paraboloid-hyperboloid telescope, features two reflections. The first focuses the light to a distant point; the second reflection, off the hyperboloid, refocuses it to a nearer point and provides a wider field of view.
Figure 11. The three types of Wolter telescopes: Type I, Type II, and Type III. The Wolter Type I is dominant because it shortens the focal length and allows nesting. Type II is used for spectroscopic experiments at the longest X-ray wavelengths, where graze angles are higher.
Thus, three parameters define the surfaces of the optics, and the designer must additionally define the range of z₁ and z₂ over which the optic will be built. During the design process, care must be taken to ensure that the proper range of graze angles is represented in the optic, so that the spectral response of the telescope is as expected. This is usually done by computer ray tracing for field of view and throughput.
In 1953, Wolter extended the theoretical basis for grazing incidence telescopes by applying the Abbe sine condition in the manner of Schwarzschild to create the Wolter–Schwarzschild optic (8). This optic is a double reflection that has three types analogous to the paraboloid-hyperboloids but, theoretically, has no coma on-axis. The Wolter–Schwarzschild is described by the parametric equations
r₁ = F sin α, (13)
z₁ = −F/C + (FC sin²α)/(4R) + F[1 − C sin²(α/2)]^((2−C)/(1−C)) cos^(−2C/(1−C))(α/2), (14)
z₂ = d cos α, (15)
r₂ = d sin α, (16)
where
1/d = (C/F) sin²(α/2) + (R/F)[1 − C sin²(α/2)]^(−C/(1−C)) cos^(2/(1−C))(α/2). (17)
For this system, a ray that approaches on-axis strikes the primary at (r₁, z₁), reflects, strikes the secondary at (r₂, z₂), and then approaches the focus (located at the origin) at an angle of α off-axis. The parameter F is the effective focal length of the system and has units of length. C and R are dimensionless shape parameters. In practice, as the graze angle becomes small, the performance advantage of the Wolter–Schwarzschild over the paraboloid-hyperboloid becomes small. So the added complexity is often avoided, and simple paraboloid-hyperboloids are built.
One advantage of the Type I Wolter telescopes is that their geometry allows nesting. Because the effective focal length of each optic is approximately the distance from the plane at the intersection of the paraboloids and hyperboloids to the focus, a series of Wolters of different diameter can be nested, one inside the other, to increase the effective area of the telescope. This implies that the outer nested pairs must reflect X rays at higher graze angles.
The actual design process for X-ray telescopes is usually performed by computer ray tracing. The resolution as a function of angle off-axis is difficult to estimate in closed form. Furthermore, the reflectivity of the X rays is a function of angle, and the angle can change substantially across the aperture, meaning that not all parts of the aperture have equal weight and that the response is a function of the energy of the incident radiation. By ray tracing, the resolution and throughput can be evaluated as a function of X-ray energy across the field of view of the telescope.
THIN MIRROR TELESCOPES
For the astronomer, the ability to study faint objects is crucial, meaning that a large collecting area is of primary importance. Wolter telescopes can be nested to enhance the total collecting area, as described in the previous section. However, the thickness of the mirrors employed occults much of the open aperture of a typical Wolter. The ideal Wolter telescope would have mirrors of zero thickness, so that all of the aperture could be used. Thus, mirrors made from thin, densely packed shells can significantly enhance capability. This is also important because thin mirrors allow the maximum collecting area for a given weight. Because these mirrors are to be launched into space aboard rockets, maximizing the ratio of collecting area to weight can be of central concern. This is especially true at high energies, where graze angles are low and the mass of mirror needed to collect X rays rises.
Unfortunately, as mirrors are made thinner, they become less rigid and less able to hold a precise optical form. Thus a telescope made of thin, closely packed mirrors is likely to have lower quality. Additionally, paraboloid and hyperboloid mirrors are expensive to fabricate in large numbers on thin backings. One solution has been to make the mirrors out of thin, polished foils. When a thin (often well below a millimeter) foil is cut to the right shape and rolled up, it can form a conical shape that approximates a paraboloid or hyperboloid, as shown in Figure 12. Two of these, properly
configured, then approximate a Wolter. Because there is no sag to the mirrors, the rays do not come to a theoretically perfect point focus. Instead, they form a spot of diameter equal to the projected width of the conical mirror annulus. If the goal of the builder is to create a focus of modest quality that has a large collecting area, especially at high energies, then this is an acceptable trade. A number of such telescopes have been built (Fig. 13), usually for spectroscopy. In X-ray spectroscopy, the targets that are bright enough to be studied tend to be far apart in the sky, so low resolution is not an issue. Also, because the background signal from the sky is very low, there is no loss of signal-to-noise by letting the focal spot size grow.
Figure 12. The paraboloid of Fig. 9 can be approximated by a cone. This allows fabricating low-cost, low-quality optics.
Figure 13. Photograph of an X-ray telescope that has densely nested thin shell mirrors to build a large collecting area.
An alternative to simple cones has been to use electroforming techniques to make thin shell mirrors. In the electroforming process, one builds a "mandrel," which is like a mold, the inverse of the desired optic. The mandrel is machined from a piece of metal and then polished. When placed in an electroforming bath, a shell of metal (usually nickel) is built up around the mandrel by electrochemical processes. When the shell is sufficiently thick, it is removed from the bath and separated from the mandrel. The disadvantage of this process is that many mandrels must be made. The advantage is that many copies are possible at low cost. Because the mandrel is machined, a sag can easily be added to approximate the ideal Wolter shape more closely.
Metals (such as nickel) that are used in electroforming tend to have high density and lead to heavy telescopes. Sometimes, replication is used to reduce weight. A shell of some lightweight material like carbon-fiber epoxy is built that approximates the desired shape. The mandrel is covered by a thin layer of epoxy, and the shell placed over it. The epoxy dries to the shape and polish of the mandrel. When the replica is pulled away, it provides a mirror that has low weight and good imaging properties.
KIRKPATRICK–BAEZ TELESCOPES
The first imaging optic built using grazing incidence was an X-ray microscope built by Kirkpatrick and Baez in 1948 (9). (At the time, the space program had not really started, and only the sun was known to be a source of X rays, so there was no immediate application for X-ray telescopes.) The optic consisted of two standard, spherically shaped mirrors in sequence, which, together, formed an imaging optic, as shown in Fig. 14. Parallel light incident onto a mirror that has a concave, spherically shaped surface of curvature R and graze angle θ will come to a line focus a distance (R/2) sin θ from the mirror, parallel to the mirror surface. There is very little focusing in the other dimension. Because we usually want a two-dimensional image, a second reflection is added by placing the second mirror, oriented orthogonally, beyond the first. If the two mirrors are properly placed, then both dimensions focus at the same point.
Figure 14. The Kirkpatrick–Baez optic is the first focusing X-ray optic made. It features two flat (or nearly flat) mirrors that reflect X rays in sequence. Each mirror provides focus in a different dimension.
Kirkpatrick and Baez used spheres because they are readily available, but spheres have severe on-axis coma and thus can have poor focus. The usual solution for
making a telescope is to replace the sphere by a one-dimensional paraboloid. The paraboloid, which can be made by bending a sheet of glass or metal, has the geometric property of focusing the on-axis light to a perfect line focus. The paraboloids can then be nested to add extra signal into the line focus. To achieve a two-dimensional image, another set of co-aligned paraboloids must be placed after the first set, rotated 90° around the optic axis, as shown in Fig. 15. This creates a two-dimensional image. Using this "Kirkpatrick–Baez" geometry, one can build telescopes that have very large collecting areas, suitable for studying faint sources. The disadvantages of these telescopes are that they have relatively poor resolution, typically no better than 20 arcseconds, and a fairly small field of view due to comatic aberration.
Figure 15. To achieve a large collecting area, the Kirkpatrick–Baez telescopes require many co-aligned mirrors.
MULTILAYER TELESCOPES
A multilayer coating can be considered a synthetic version of a Bragg crystal. It consists of alternating layers of two materials deposited on a smooth substrate. Typically, one material has a high density and the other a low density to maximize the change in the index of refraction at the material interface. If the radiation is incident on a multilayer of thickness d for each layer pair at an angle θ, then constructive interference is experienced if the Bragg condition,
mλ = 2d cos θ, (18)
is met. This creates a narrow-wavelength band where the reflectance of the surface is much higher than using a metal coating alone. In Fig. 16, we show the reflectance of a multilayer mirror that consists of alternating layers of tungsten and silicon as a function of incident energy. This narrow energy response can be tuned to the strong emission lines of an object like a star but leads to the absorption of most of the flux from a continuum source.
Figure 16. Multilayer-coated mirrors can provide high reflectivity where otherwise there would be none. In this graph, we show the response of a multilayer-coated mirror at a 3° graze. To the left, at low energy, the X rays reflect as usual. The multilayer provides narrow bands of high reflectivity at higher energies. Three Bragg orders are visible in this plot.
By placing the multilayer on the surface of a conventional, normal incidence mirror, it becomes possible
to create a narrow band of good reflectivity at normal incidence, where before there was none. This is usually applied in the extreme ultraviolet, but multilayers now work effectively up to 0.25 keV and even 0.5 keV. This approach has been used to excellent advantage in the study of the sun (10).
The spectral band in which most X-ray telescopes have functioned is between 0.1 and 3.0 keV. Above 3 keV, the required graze angle becomes so small that the aperture annulus becomes small as well. The problem is compounded by the relatively low flux of the sources. Thus, to build a telescope, either we live with low efficiency or find a better way to improve the collecting area. Multilayers can be used to enhance reflectivity at grazing incidence. Higher energy radiation can be reflected at any given graze angle, but the narrow spectral response is difficult to match to the changing graze angles of the surface of a Wolter telescope. It has now been shown (11) that by varying the thickness of the multilayers as a function of the depth, a broadband response can be created, making the multilayers more useful for spectroscopy and the study of continuum sources. This effect works particularly well for hard X-ray telescopes, where the penetrating power of the X rays is higher, leading to interaction with a larger number of layers. The next generation of hard X-ray telescopes will probably use this effect.
X-RAY INTERFEROMETRIC TELESCOPES
Telescope resolution is limited by diffraction. Diffraction in radio telescopes is so severe that most new major observatories are interferometric. By linking together the signals from the telescopes without losing the phase information, synthetic images may be created to match the resolution of a single giant telescope whose diameter is equal to the baseline of the interferometer. The need for
interferometry comes from engineering realities. At some point, a telescope simply becomes so large that either it cannot be built or the expenses cannot be met. The largest practical mirrors of quality sufficient for X rays are about 1 m long. At a graze angle of 2°, the entrance aperture of a 1-m mirror is about 3 cm; this means that a 1-keV signal will be diffraction-limited at about 10 milliarcseconds (0.01″). Thus, conventional X-ray telescopes can achieve very high resolution, ten times that of the Hubble Space Telescope, before the diffraction limit becomes a serious problem. But there is much exciting science beyond the X-ray diffraction limit. For example, at 10⁻³ arcseconds, it is possible to image the corona of Alpha Centauri, and at 10⁻⁶ arcseconds, it will be possible to resolve the event horizons of black holes in nearby active galactic nuclei.
To build an X-ray interferometer that can accomplish these goals may seem impossible at first, but the properties of grazing incidence, coupled with the properties of interferometers, provide a pathway. First, one must achieve diffraction-limited performance in two or more grazing incidence mirrors and then combine the signals in a practical way to achieve a synthetic aperture. Such a system has now been built and demonstrated in the laboratory (12), but, although there are plans to fly an X-ray interferometer in the 2010 time frame, none has yet been launched.
In theory, paraboloid-hyperboloid telescopes can reach the diffraction limit but in practice are too expensive. So, flat mirrors are used to ease the mirror figure and polish requirements. Figure 17 shows a schematic of a practical X-ray interferometer that has produced fringes in the laboratory and is being scaled up for flight. The idea is to provide two flat, grazing incidence mirrors set at an arbitrary separation that direct the X rays into a beam combiner. The separation of these two mirrors sets the resolution of the interferometer. The two beams are then mixed by a beam combiner. This has traditionally been accomplished for X rays by using Laue crystals (13), but the thickness of the crystals, coupled with their low efficiency, makes them impractical for astronomy.
A solution that provides broad spectral response and high efficiency is simply to use two more flat mirrors. The beams cross and then strike the second set of flat mirrors at a graze angle just slightly higher than the initial reflection. This brings the two beams (still plane parallel) back together at a low angle. As Fig. 17 shows, if the two beams are coherent when they cross, fringes will appear. The spacing s of the fringes is given by
s = Lλ/d, (19)
where d is the separation of the second set of mirrors and L is the distance from the mirrors to the focal plane where the beams cross. If L/d is sufficiently large, then the fringes can be resolved by a conventional detector. For example, if L/d is 100,000, then the fringe spacing from 10-Å X rays will be 100 µ, easily resolved by most detectors. This approach to beam combination is highly practical because it uses flat mirrors and has high efficiency. It also works in a panchromatic way. Each wavelength of radiation creates fringes at its own spacing, so, if the detector can resolve the energy of each photon, the individual sine waves will be resolved, and the interferometer will function across a wide spectral band.
A single pair of mirror channels is inadequate in an interferometer for X-ray astronomy. In the early days of radio interferometry, a single pair was used, and the UV plane was sampled as the source drifted across the sky. However, many of the most interesting X-ray sources are highly variable, and it may not be possible to wait for a change of orientation. Thus, a substantial portion of the UV plane needs sampling simultaneously, effectively requiring that more than two channels mix at the focal plane. One attractive geometry that is being pursued is to use a ring of flats, as shown in Fig. 18. This can be considered a dilute aperture telescope. Because each of the many mirror paths interferes against all of the others, there is a large range of sampling in frequency space, and the beam pattern starts to resemble that of a telescope.
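The fringe-spacing example in the text can be verified with a few lines of code (not from the article); the geometry L/d = 100,000 and the 10-Å wavelength are the values assumed in the passage above.

```python
# A short check (not from the article) of the fringe-spacing relation s = L*lambda/d,
# using the example quoted in the text: L/d = 100,000 and 10-angstrom X rays.
WAVELENGTH_M = 10e-10       # 10 angstroms
L_OVER_D = 1.0e5            # assumed beam-combiner geometry, as in the text

fringe_spacing_m = L_OVER_D * WAVELENGTH_M      # Eq. (19) with L/d factored out
print(f"fringe spacing: {fringe_spacing_m * 1e6:.0f} micrometers")   # about 100 micrometers
```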
GRATINGS
The analysis of spectra is a central tool of the X-ray astronomer, just as it is in other bands of the spectrum.
Figure 17. A simple X-ray interferometer, capable of synthetic aperture imaging, may be built in this configuration. A pair of flat mirrors at grazing incidence causes the wave fronts to cross. Another pair of flat, grazing incidence mirrors redirects the light to an almost parallel geometry, where the beams cross at a large distance. Because of the low angles at which the wave fronts cross, fringes much larger than the wavelength of the X rays can be created.
Figure 18. The interferometer of Fig. 17 can be made more powerful by placing multiple sets of flat mirrors in a ring that feeds a common focus.
X-ray astronomy differs from other fields in that much of the spectroscopy is performed by using energy-sensitive, photon-counting detectors. The first X-ray observatories had no optics but could still perform low-resolution spectroscopy by using proportional counters. These devices measured the number of secondary ionization events caused by an X ray and thus led to an energy estimate of each photon. However, the spectral resolution (R = λ/δλ) was only about 5 at 1 keV. Solid-state systems, as exemplified by CCDs, now reach R = 20 at 1 keV, and quantum calorimeters have a resolution of several hundred and are still improving. To achieve very high spectral resolution, the X-ray astronomer, just like the visible light astronomer, needs a diffraction grating. Similarly, diffraction gratings come in two categories. Transmission gratings allow the photons to pass through and diffract them in the process; reflection gratings reflect the photons at grazing incidence and diffract them in the process.
Transmission
Transmission gratings are, in essence, a series of thin, parallel wires, as shown in Fig. 19. The space between the wires is ideally empty to minimize absorption of X rays. The wires should be optically thick to absorb the flux that strikes them. This creates a wave-front amplitude on the far side of the grating that is shaped like a repeating square wave. At a large distance from the grating, constructive interference can be experienced where the wave satisfies the "grating equation" (1),
nλ = d sin α, (20)
where d is the groove spacing, λ the wavelength of the radiation, and α is defined as in Fig. 19. The value of α is the dispersion angle, which is limited to about ±1° in the X ray. Because the grating consists of simple wires that cast shadows, the grating cannot be blazed in the same way as a reflection grating. If the wires are fully opaque and they cover about half of the area, then the plus and minus first orders of diffraction each carry about 10% of the beam, for a maximum signal of 20%. This is a low efficiency but is still much higher than crystals, because the entire spectrum is dispersed with that efficiency.
To disperse the short wavelength X rays through a substantial angle, the grooves must be very close together, typically 100–200 nm. This in turn means that the gratings must be very thin, substantially under 1 µ. Making a single grating of this groove density and thickness is currently impractical, so the transmission grating used in a telescope consists of many small facets, each about a square centimeter in extent. These are arranged behind the Wolter telescope. The aberrations introduced off-axis by the grating array are smaller than those from the Wolter telescope, so a better spectrum is gained by placement behind the optic in the converging beam.
Reflection
A reflection grating is an alternative to transmission gratings. The same reflection gratings that are used in the ultraviolet and longer wavelength bands can be used in the X ray as well, as long as they are used at grazing incidence. When X rays approach a grating at grazing incidence, the symmetry of the interaction is broken. The rays can approach the grating in the plane that lies perpendicular to the grooves, and the diffracted light will also emerge in that plane, as shown in Fig. 20. This is known as the "in-plane" mount. However, it is also possible for the radiation to approach the grating quasi-parallel to the grooves, in which case it diffracts into a cone about the direction of the grooves, as shown in Fig. 21. This is known as conical diffraction in the "extreme off-plane" mount because the diffracted light no longer lies in the same plane as the incident and zero-order reflected light. The grating equation can be written in general form as
nλ = d sin γ (sin α + sin β), (21)
Figure 19. An X-ray transmission grating is made of many finely spaced, parallel wires. Diffraction of X rays results at wire densities of up to 10,000 per millimeter.
where n is the order number, λ the wavelength, and d the groove spacing. α and β are the azimuthal angles of the incident and diffracted rays about the direction of the grooves, and γ is the angle between the direction of the grooves and the direction of the incident radiation. For an in-plane mount, γ is 90° , and the grating equation simplifies to nλ = d(sin α + sin β), (22) which is the usual form of the equation. In the off-plane mount, γ is small and makes the change of azimuthal angle larger at a given groove density.
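As an illustration of how Eq. (21) is used (this sketch is not from the article, and all numerical values are invented), the following code solves the general grating equation for the diffracted azimuthal angle β in an off-plane mount.

```python
# A small sketch (not from the article) that solves the general grating equation (21)
# for the diffracted azimuthal angle beta, given assumed values of n, lambda, d, gamma,
# and the incident azimuthal angle alpha.
import math

def diffracted_beta_deg(order: int, wavelength_nm: float, spacing_nm: float,
                        gamma_deg: float, alpha_deg: float) -> float:
    """Return beta from n*lambda = d*sin(gamma)*(sin(alpha) + sin(beta))."""
    sin_beta = (order * wavelength_nm / (spacing_nm * math.sin(math.radians(gamma_deg)))
                - math.sin(math.radians(alpha_deg)))
    if abs(sin_beta) > 1.0:
        raise ValueError("no propagating order for these parameters")
    return math.degrees(math.asin(sin_beta))

# Off-plane example with made-up numbers: 2-nm X rays, 5,000 grooves/mm (d = 200 nm),
# gamma = 2 degrees, zero-order azimuth alpha = 10 degrees, first order.
print(f"beta = {diffracted_beta_deg(1, 2.0, 200.0, 2.0, 10.0):.1f} degrees")
```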
Figure 20. In the conventional grating geometry, a ray approaches in the plane that lies perpendicular to the groove direction. The reflected and diffracted rays emerge in the same plane. α measures the direction of the reflected, zero-order light, and β measures the diffracted ray. At grazing incidence, α and β must both be close to 90° .
Figure 21. In the off-plane geometry, the ray approaches the grating nearly parallel to the grooves, separated by the angle γ. α and β are defined as in Fig. 20 but now are azimuthal angles. The diffracted light lies in the cone of half angle γ centered on the direction of the rulings in space.
Both approaches have been used in X-ray astronomy. The advantage of the in-plane mount is that the groove density can be low, easing the difficulty of fabrication. One advantage of the off-plane mount is that its dispersion can be greater at a given graze angle because all diffraction is at the same angle. Additionally, the efficiency of diffraction tends to be higher in the off-plane mount. In both mounts, as with transmission gratings, it is preferable to place the gratings in the converging beam
behind the Wolter optic. If the gratings are placed ahead of the optic (14), then the aberrations of the telescope degrade the spectral resolution. However, a major advantage is that conventional, plane gratings can be used to disperse the light. If the gratings are used in the in-plane mount, then the grooves may be parallel. To compensate for the changing distance between the point of reflection and the focus, the dispersion must increase as one goes farther down the grating (Fig. 22). This leads to a grating whose grooves are parallel but change density (15). Such grating arrays were used on XMM-Newton (16).
Figure 22. A varied line space grating features parallel grooves that become closer together along the length of the grating. This removes the coma from the spectrum.
An alternative that can lead to higher efficiency and greater resolution is to use the off-plane mount in the converging beam (17). Again, the gratings stretch from right behind the mirror to closer to the focal plane. Near the bottom of the gratings, the dispersion needs to be higher, so the grooves must be closer together. Such a geometry requires the grooves to be radial, fanning outward from a hub which is near the focus (Fig. 23). Such gratings can be more difficult to fabricate, given their requirement for higher groove density, but can give much better performance (18). Such a grating was used to obtain an extreme ultraviolet spectrum of a hot white dwarf in a suborbital rocket experiment (19).
Figure 23. A radial groove grating features grooves that radiate from a hub off the edge of the grating. This removes the coma in the spectrum of the converging beam used in the off-plane mount.
When the gratings are placed in the converging beam, behind the optics, severe aberrations result if plane gratings are used. So, the grooves must not be parallel. Additionally, because there is a range of angles eight times the graze angle represented in the converging beam, eight or more gratings are required in an array, as shown in Fig. 24.
Figure 24. The radial groove and varied line space gratings must be placed in arrays behind the telescope to cover the entire beam.
MAJOR MISSIONS
The history of X-ray telescopes can be tracked in the major missions that have flown and performed new astronomy using optics. There are too many missions to discuss in detail, but a few stand out as landmarks and show the evolution of X-ray telescopes.
Apollo Telescope Mount
The first high-quality X-ray images captured by telescopes were taken in an experiment called the Apollo Telescope Mount, which was flown on Skylab, the orbiting space station of the 1970s. This featured a small Wolter telescope that had a few arcseconds of resolution for imaging the sun. This telescope was built before there was general access to digital imaging detectors, and so it used film at the focal plane. Astronauts on the station operated the telescope and returned the film to earth for analysis. Using this instrument, scientists could track the changing complexity of the solar corona in detail for the first time. In Fig. 25, we show an X-ray image of the sun captured by the Yohkoh satellite in 1991 (20).
Figure 25. An image of the sun was taken at 1 keV using a paraboloid-hyperboloid telescope on the Yohkoh mission. See color insert.
Einstein
The first major observatory for X-ray astronomy that featured a telescope was flown in 1978 (21). Dubbed the Einstein Observatory, the mission was based on a set of nested Wolter Type I telescopes. Using a 4-m focal length and a maximum diameter of 60 cm, this observatory captured the first true images of the X-ray sky at a resolution as fine as 6 arcseconds. It had a low-resolution imaging proportional counter and a higher resolution microchannel plate at the focal plane. A transmission grating array was available for the study of bright sources. This observatory operated for just 3 years, but through the use of grazing incidence telescopes, moved X-ray astronomy onto the central stage of astrophysics.
Rosat
The next major advance came from ROSAT (22), which featured nested Wolter telescopes of 3-arcsecond quality. Unlike the Einstein Observatory, which was used in a conventional observatory mode of pointing at selected targets, ROSAT was first used to survey the entire sky. By sweeping continuously around the sky, it was able to build up an image of the entire sky in the X-ray region at unprecedented resolution. Then, it spent the ensuing years studying individual targets in a pointed mode.
Trace
Remarkable new images of the X-ray emission from the sun became available from the TRACE mission (10).
Figure 26. An image of the hot plasma on the limb of the sun was taken using a multilayer-coated telescope on the TRACE satellite. See color insert.
This satellite used a multilayer coating on a normal incidence optic to image the detailed structure of the sun’s corona (Fig. 26). Because of its use of normal incidence optics, the telescope achieved exceptionally high resolution, but across a limited band of the spectrum. The structure of the X rays shows previously unsuspected detail in the sun’s magnetic fields.
Figure 27. The Chandra Satellite before launch. The 10-meter focal length of the X-ray mirror created a long, skinny geometry.
Chandra
NASA launched the Chandra Observatory in July 1999. This was the natural successor to the Einstein Observatory (23). Chandra features high-resolution Wolter Type I optics nested six deep. The telescope is so finely figured and polished that it achieves resolution better than one-half arcsecond, not far from the performance of the Hubble Space Telescope. Chandra, which is shown in Fig. 27, uses a CCD detector or a microchannel plate at the focal plane. It performs spectroscopy through the energy sensitivity of the CCD and by using a transmission grating array.
The improved resolution is providing dramatic new results. For example, the very first image taken was of the supernova remnant Cas-A. The telescope is so fine that it could immediately identify a previously unremarkable feature near the center as a stellar remnant of the supernova explosion. In Fig. 28, we compare the images of the Crab Nebula from ROSAT and Chandra, illustrating the importance of improved resolution from telescopes in X-ray astronomy.
Figure 28. X-ray telescopes are improving. To the left is an image of the Crab Nebula acquired with ROSAT in 1990. It shows the synchrotron emission in green and the bright spot, which is the pulsar, in the center. To the right is an image captured with Chandra in 1999. The resolution has improved from 3 arcseconds to one-half arcsecond, and the level of detail is much higher. See color insert.
XMM-Newton
In December 1999, the European Space Agency launched a major X-ray telescope into orbit. Called XMM-Newton, it is an X-ray Multi-Mirror Mission named in honor of Sir Isaac Newton (16). It features three high collecting area mirrors of the thin mirror variety. At the cost of some resolution (15 arcseconds), it can achieve a very high collecting area for studying faint objects. Spectroscopic studies using the CCDs and an array of reflection gratings are now starting to generate unique new information about the physics of X-ray sources.
Future Missions
Considerable effort has gone into the definition of future missions for X-ray astronomy. They appear to be splitting into two varieties, similarly to visible light astronomy. The first class of observatory will feature modest resolution
(1–30 arcseconds) but very high collecting area using thin mirror telescopes. NASA is planning a mission called Constellation-X (24), which will generate an order of magnitude more collecting area for spectroscopy than we currently enjoy with Chandra. The European Space Agency is studying a mission called XEUS (X-ray Evolving Universe Satellite), which will feature a huge X-ray telescope of modest resolution using thin mirrors in orbit near the Space Station Freedom (25). The other area of development is toward high resolution. NASA is now studying a mission called MAXIM (Micro-Arcsecond X-ray Imaging Mission), which has the goal of using X-ray interferometry to improve our resolution of X-ray sources by more than a factor of 1,000,000 (26). At resolution below a microarcsecond, it should actually be possible to image the event horizons of black holes in the centers of our galaxy and others.
BIBLIOGRAPHY
1. J. A. R. Samson, Techniques of Vacuum Ultraviolet Spectroscopy, Cruithne, Glasgow, 2000.
2. M. Born and E. Wolf, Principles of Optics, 7th ed., Cambridge University Press, Cambridge, 1999, pp. 292–263.
3. B. L. Henke, E. M. Gullikson, and J. C. Davis, At. Data Nucl. Data Tables 54, 181 (1993).
4. W. Cash, Appl. Opt. 26, 2,915–2,920 (1987).
5. D. K. G. de Boer, Phys. Rev. B 53, 6,048–6,064 (1996).
6. H. Wolter, Ann. Phys. 10, 94–114 (1952).
7. M. Lampton, W. Cash, R. F. Malina, and S. Bowyer, Proc. Soc. Photo-Opt. Instrum. Eng. 106, 93–97 (1977).
8. H. Wolter, Ann. Phys. 10, 286–295 (1952).
9. P. Kirkpatrick and A. V. Baez, J. Opt. Soc. Am. 38, 766–774 (1948).
10. http://vestige.lmsal.com/TRACE/
11. D. L. Windt, Appl. Phys. Lett. 74, 2,890–2,892 (1999).
12. W. Cash, A. Shipley, S. Osterman, and M. Joy, Nature 407, 160–162 (2000).
13. U. Bonse and M. Hart, Appl. Phys. Lett. 6, 155–156 (1965).
14. R. Catura, R. Stern, W. Cash, D. Windt, J. L. Culhane, J. Lappington, and K. Barnsdale, Proc. Soc. Photo-Opt. Instrum. Eng. 830, 204–216 (1988).
15. M. C. Hettrick, Appl. Opt. 23, 3,221–3,235 (1984).
16. http://sci.esa.int/home/xmm-newton/index.cfm
17. W. Cash, Appl. Opt. 22, 3,971 (1983).
18. W. Cash, Appl. Opt. 30, 1,749–1,759 (1991).
19. E. Wilkinson, J. C. Green, and W. Cash, Astrophys. J. Suppl. 89, 211–220 (1993).
20. http://www.lmsal.com/SXT/
21. http://heasarc.gsfc.nasa.gov/docs/einstein/heao2_about.html
22. http://heasarc.gsfc.nasa.gov/docs/rosat/rosgof.html
23. http://chandra.harvard.edu
24. http://constellation.gsfc.nasa.gov
25. http://sci.esa.int/home/xeus/index.cfm
26. http://maxim.gsfc.nasa.gov
INDEX A A or B wind, 1040 A roll, 1034 A scan, in ultrasonography, 1413 A&B cutting, 1040 Abbe (V) number, 234–235, 1081–1082, 1100–1102, 1109, 1122–1123 Abbe sine condition of light optics, in charged particle optics and, 88 Abbe, Ernst, 261, 1100, 1106, 1109 Aberrations, 91–92, 234, 559, 1095, 1098, 1114–1124 accommodation and, 547 in charged particle optics, 92–94, 98 chromatic aberration, 1081–1082 human eye, 547–548 human vision and, 554–555 microscopy and, 1106 monochromatic, 1083–1085 optical transfer function (OTF) and, 1095–1098 PSF and, 1088–1089 wave theory of, 1083–1084 Absolute fluorescence (ABF), 863 Absorptance, 527–529 Absorption, 253–256, 527–529, 803 astronomy science and, 684–686 atmospheric, 1497 interstellar, 1496–1497 microscopy and, 1126 near resonance, 228–233 photodetectors and, 1184–1187 silver halide, 1273 Absorption band, 229 Absorption edge, 255 Absorption lines, 235 Academy aperture, 1040, 1359 Academy leader, 1040 Accommodation, 547, 1328 Acetate film, 1039, 1040 Acetone, PLIF and, 409 Achromatic doublet, 235 Acoustic antireflective coatings (AARC) in scanning acoustic microscopy (SAM), 1231 Acoustic impedance, in ultrasonography, 1415 Acoustic microscopy, 1128–1148 Acoustic reciprocity theorem, 1 Acoustic sources or receiver arrays, 1–9 Active centers, 1137 Active glasses, in three-dimensional imaging, 1331 Active matrix liquid crystal displays, 374, 857–858, 956
Active MATrix coating (AMAT), in electrophotography, 301 Active pixel sensors (APS), 1199–1200 Active snakes, thresholding and segmentation in, 644–645 Acuity, of human vision, 558–560 Acutance, 1357 Addition, image processing and, 590 Addition, Minkowski, 433, 612 Additive color films, instant photography and, 847–849 Additive color matching, 102 Additive color mixing, 126–127 Additive printing, 1040 Additivity, Grassmann’s, 531 Addressability, cathode ray tube (CRT), vs. resolution, 32 Addressing displays, liquid crystal, 955–959 Admissibility condition, wavelet transforms in, 1446 Advance, 1040 Advanced Photo System (APS) in, 141 Advanced Technology Materials Inc. (ATMI), 377 Advanced Very High Resolution Radiometer (AVHRR), 759, 760–761 Advanced Visible Infrared Imaging Spectrometer (AVIRIS) geologic imaging and, 648, 649 in overhead surveillance systems, 787 Aerial imaging, 350, 463–476, 1040 Aerial perspective, 1328 Aerosol scattering, lidar and, 880–882 Afocal systems, 1074, 1079 Agfa, 1024 Agfa Geaert, 839 Agfachrome Speed, 839 Agfacolor Neue process in, 128 Aging and human vision, 541, 549, 560 AgX (See Silver halide) Airborne and Topographic SAR (AIRSAR/TOPSAR), 650 Airborne radar, 1471–1473 Aircraft in overhead surveillance, 773–802 Airy disk, 249 flow imaging and, 399 microscopy and, 1110, 1124 telescopes, 688 Airy function, 94, 399 Airy pattern, 1082 1511
Albedo, 610, 611 AlGaN/GaN, scanning capacitance microscope (SCM) analysis, 22 Algebraic opening/closing, 437 Algebraic reconstruction technique (ART), in tomography, 1407–1408 Algebraic theory, in morphological image processing, 430 Aliasing, 50, 59, 60, 69, 75, 84 flow imaging and, 401 in magnetic resonance imaging (MRI), 986 Along Track Scanning Radiometer (ATSR), 772 Alternating current scanning tunneling microscopy (ACSTM), 28 Alternating pairs, in three-dimensional imaging, 1338 Alternating sequential filters, 437 Alumina dopants, secondary ion mass spectroscopy (SIMS) in analysis, 487–489 Aluminum lithium alloys, secondary ion mass spectroscopy (SIMS) in analysis, 484 Ambient light, vs. cathode ray tube (CRT), 182–183 American National Standards Institute (ANSI), 1041 American Standards Association (ASA), 1023 Amines, 1179 Amino acids (See Biochemistry) Ampere’s law, 211, 218 Amplifiers, SQUID sensors, dc array, 12 Amplifying medium, radar, 223 Amplitude, 212, 226, 1098 beam tilting, 5 nonsymmetrical, 4 Amplitude modulation (AM), 383, 1362 Amplitude reflection coefficient, 235 Amplitude resolution, 151 Amplitude time slice imaging, ground penetrating radar and, 472–475, 472 Amplitude transmission coefficient, 235 Amplitude weighting, 3 Anaglyph method, in three-dimensional imaging, 1331
1512
INDEX
Analog technology, 1040 in endoscopy, 334 SQUID sensors using, 10–12 Analog to digital conversion (ADC), 49, 61 in field emission displays (FED), 382 in forensic and criminology research, 721 in overhead surveillance systems, 786 Analytical density, 1040 Anamorphic image, 1040 Anamorphic lens, 1031, 1040 Anamorphic release print, 1040 AND, 590 Anger cameras, neutron/in neutron imaging, 1062–1063 Angle, 1040 Angle alpha, 541 Angle of incidence, 234, 468 Angle of reflection, 234 Angle of refraction, 234 Angular dependency, in liquid crystal displays (LCDs), 184 Angular field size, 1080 Angular frequency, 213 Angular magnification, 1076–1077 Angular resolution, High Energy Neutral Atom Imager (HENA), 1010 Angular spectrum, 1099 Animation, 1022, 1040–1042 cel, 1042 meteor/in meteorological research, 763–764, 769 in motion pictures, 1035 Anisotropic coma, 94 Anisotropic distortion, 94 Anisotropic astigmatism, 94 Anode, in field emission displays (FED), 380, 386, 387 Anomalies, gravitational, 444–445 Anomalous dispersion, 228 Anomalous propagation, radar, 1454 Anomalous scattering, in biochemical research, 698 Ansco Color process in, 128 Answer print, 1041 Antennas, 220, 242 ground penetrating radar and, 464–465, 468–469 in magnetic resonance imaging (MRI), 979 radar and over the horizon (OTH) radar, 1142, 1151, 1450 terahertz electric field imaging and, 1394 Antifoggants, in photographic color display technology, 1216 Antihalation backing, 1041
Antireflective coatings (ARCs), 381 Aperture, 54, 57–59, 243, 247–249, 1041, 1101, 1352 academy, 1040, 1359 Fraunhofer diffraction in, 247–249 microscopy and, 1108, 1122, 1124, 1128 in motion pictures, 1028 numerical, 1081, 1115 in overhead surveillance systems, 783 radar and over the horizon (OTH) radar, 1147 relative, 1081 Aperture plate, 1041 Aperture stop, 1080 Aperture transmission function, 248 Aplanatic condensers, 1123 Apodization, 1086, 1088 Apollo Telescope Mount, 1507 Appliances, orthopedic, force imaging and, 422 Applications Technology Satellite (ATS), 757 Aqueous humor, 512 Arc lamps, 1041 Archaeology, ground penetrating radar and, 464 Archiving systems, 661–682 art conservation and analysis using, 661–682 in motion pictures, 1038–1039 Argon plasma coagulation (APC), in endoscopy, 340 Armat, Thomas, 1022 Array amplifiers, SQUID sensors, 12 Array of Low Energy X Ray Imaging Sensors (ALEXIS), lightning locators, 905, 929 Array theorem, 249 Arrival Time Difference (ATD), lightning locators, 890–904 Art conservation, 661–682 Artifacts quality metrics and, 598–616 tomography/in tomography, 1410 Artificial intelligence, 371 feature recognition and object classification in, 351 in search and retrieval systems, 632 Artificial vision systems, 352 Arylazo, 135 ASA/ISO rating, film, 1023, 1041 ASOS Lightning Sensor (ALS), 907, 922 Aspect ratio, 1031, 1041 in motion pictures, 1022 television, 147, 1359
Associated Legendre functions, 253 ASTER, 660 Astigmatism, 92, 94, 1084–1085, 1089 in charged particle optics, 93 electron gun, 40–41 microscopy and, 1119 Astronomy (See also Telescopes), 682–693 Apollo Telescope Mount, 1507 Chandra Observatory, 1508 Constellation X mission, 1509 Einstein Observatory Telescope, 1507 magnetospheric imaging, 1002–1021 ROSAT telescopes, 1507 TRACE telescopes, 1507–1508 X-ray Evolving Universe Satellite, 1509 X-ray telescope, 1495–1509 XMM Newton telescope, 1508 Asynchronous transfer mode (ATM), 1382 Atacama Large Millimeter Array (ALMA), 693 Atmospheric pressure chemical vapor deposition (APCVD), 384 Atmospherics, 890 Atomic force microscope (AFM), 16 Atomic transitions, 215 ATSC Digital Television Standard, 1359 ATSC Digital Television Standard, 1382–1389, 1382 Attenuation, 229 gamma ray, 260 ground penetrating radar and, 466 human vision and, 559, 562 radar and over the horizon (OTH) radar, 1147 in ultrasonography, 1416–1417 Audio (See Sound) Auroras, far ultraviolet imaging of, 1016–1020 Authentication digital watermarking and, 161 in forensic and criminology research, 739–740 Autoassemble, 1041 Autochrome Plate process, color photography, 127 Autocorrelation, 1105 Autoexposure, 1355–1356 Autofluorescence, 1136 Autofocus, 1356 Automated Surface Observing System (ASOS) lightning locators, 907, 922
Autonomous System Lab, art conservation and analysis using, 664 Autostereoscopic displays, 1328, 1336–1341 Avalanche photoconductors, 1173–1174 Aviation meteorology, 767 Axis, camera, 1041 Azimuth range Doppler (ARD), radar and over the horizon (OTH) radar, 1148 Azohydroquinones, instant photography and, 834 Azomethines, instant photography and, 835 Azos, 1179 Azosulfones, instant photography and, 839
B B frame, video, 1387 B roll, 1034 B scan, in ultrasonography, 1413, 1417, 1428–1429 Babinet’s principle, 249 Background, 568–569 Background, in infrared imaging, 807, 810, 813 Background haze, 604 Background limited infrared photodetector (BLIP), 1189 Background uniformity, 605 Backing, 1041 Backlight systems, in three-dimensional imaging, 1339 Backpropagation algorithm, in neural networks, 372 Backscatter, 873, 1450, 1469 Backscattered electron (BE) imaging, in scanning electron microscopes (SEM), 276 Baffled circular piston, 8 Baffled rectangular piston, 8–9 Balance stripe, 1041 Balance, color, 114–116, 118 Ballistics analysis, in forensic and criminology research, 716 Balloons in overhead surveillance, 773 Band theory of matter, photoconductors and, 1170–1171 Bandpass filters, 593, 611 Bandwidth, 50 cathode ray tube (CRT), 179–180 digital watermarking and, 149–150 expansion of, 149–150 lightning locators, 905
in magnetic resonance imaging (MRI), 985 radar and over the horizon (OTH) radar, 1147 television, 1362, 1367 Bandwidth compression, in overhead surveillance systems, 786 Barn door, 1041 Barney, 1041 Barrel distortion, 93 Baryta paper, 1209–1210 Base, for film, 1022–23, 1041, 1045 BASIC, 186 Batchelor scale, flow imaging and, 404 Beam candlepower seconds (BCPS) rating, 492 Beam conditioning, X-ray fluorescence imaging and, 1477 Beam expanders, holography, 509 Beam patterns and profiles, 1, 1426–1427 Beam splitters, in holography, 509 Beam tilting, amplitude, 5 Bechtold, M., 456 Beer-Lambert absorption law, 409 Beer’s law, 344 Bell, Thomas, 455, 456 Benchmark comparison, 607 Bent contours, 282 Benzisoxazolone, in instant photography, 838 Benzoylacetanilides, 134 Bertrand polarization lens, 1132 Bessel function, 94, 1082 Best optical axis, in human vision, 540–541 Beta decay, 220 Bethe–Bloch equation, 1156 Biacetyl, PLIF and, 409 Bias voltage, scanning capacitance microscope (SCM) vs., 20 Biased transfer roll (BTR), in electrophotography, 301 Bidirectional reflectance distribution function (BRDF), 51, 528 Binary imaging, 584–589 Binder or dead layer, in field emission displays (FED), 381 Binocular disparity, 1328 Binoculars, 1074 Biochemistry and biological research, 693–709 force imaging and, 422 in scanning acoustic microscopy (SAM), 1228 in scanning electrochemical microscopy (SECM), 1255–1256
secondary ion mass spectroscopy (SIMS) in analysis, 484 X-ray fluorescence imaging and, 1479–1482 Biochips, terahertz electric field imaging and, 1403 Bioorthogonal wavelet basis, 1446 Bipack filming, 1041 Bipedal locomotion, force imaging and analysis of, 419–430 Bipolar junction transistors, 1173 Birefraction, 1131–1132 Birefringence, 233, 1134 Bisazos, 1179–1180 Bistatic radar, 772 Bit rates, compressed vs. uncompressed, 152 Black-and-white film, 1023, 1041, 1356 Black-and-white images, 584–589, 830–833, 1347 Black-and-white TV, 1359 Black light, 1041 Black-white vision, 567 Black, James Wallace, 773 BLACKBEARD lightning locators, 905, 929 Blackbody radiation, 103, 211, 222, 525, 690, 803, 804, 813 in overhead surveillance systems, 782, 789 photodetectors and, 1184–1187 Blanking, television, 1360 Bleaches, 138–139, 1217 Blimp, 1041 Blind spot, human eye, 514, 516 Blobs, 646 Blobworld, 624, 630 Block based search and retrieval systems, 623 Blocking lens, in three-dimensional imaging, 1331 Blooping, 1041 Blow up, 1041 Blue light, 101 Blue screen, 1041 Blueshift (See also Doppler shift), 686, 772 Blur function, 596 Blurring, 50, 56, 58, 59, 72, 81, 82, 83, 84, 544, 577, 604 digital watermarking and, 167 flow imaging and, 399–400 high-speed photography and, 491–492 image processing and, 579, 581, 595–596 in medical imaging, 756 in overhead surveillance systems, 784 quality metrics and, 598–616
Bohr magnetron, 288 Bolometers, 1193–1194, 1204–1206 Boltzmann’s constant, 222 Bonvillan, L.P., 773 Boolean operations, 590 Boom, 1041 Borehole gravity meter, 450 Bormann effect, 282 Born Oppenheimer approximation, 216 Bouguer anomaly/correction, 447, 453 Bounce light, 1041 Boundaries, thresholding and segmentation in, 644–645 Boundary conditions, 233–234 Boundary hugging, thresholding and segmentation in, 642 Bragg reflection, 244 Bragg’s law, 267 Brain and human vision, 513–514, 558, 569 Braking radiation (bremsstrahlung), 223 breakdown, 1041 Breast cancer detection, in infrared imaging, 812 Bremsstrahlung, 219, 223–224 Brewster angle, 237 Brewster window, 237 Bright field image, 268 Bright fringes, 243, 269–270 Brightness, 102, 618, 1037 cathode ray tube (CRT), 34–35, 180 in charged particle optics, electron gun, 88 feature measurement and, 344–345 in field emission displays (FED), 382 in forensic and criminology research, 723–724 image processing and, 580–584 Broad light, 1041 Broadatz textures, 622 Broadcast transmission standards, television, 1359–1393 Browsing, in search and retrieval systems, 617 Bucky diagrams, 572 Buffer rods, in scanning acoustic microscopy (SAM), 1233, 1234 Building block structures, in transmission electron microscopes (TEM), 271 Buried heterostructure laser, scanning capacitance microscope (SCM) analysis, 22–23 Butt splice, 1041
C C scan ultrasonography, 1413 Cadmium sulfide photodetectors and, 1190, 1200 Calibration color image, 116–117 DPACK, 27 FASTC2D, 27 force imaging and, 424 Hall generators, 974 scanners, 603 scanning capacitance microscope (SCM), 27–28 TSUPREM4, 28 Calotype, 1259–1309, 1345 Cambridge Display Technology (CDT), 818 CAMECA, 478, 479 Camera axis, 1041 Camera film, in motion pictures, 1026–1027 Camera log, 1041 Camera obscura, 1344–1345 Camera operator/animator, 1042 Cameras Anger, 1062–1063 animation, 1041 aperture in, 1028 autoexposure, 1355–1356 autofocus, 1356 Captiva, 847 digital imaging, 854–855 dollies, 1030 electronic flash in, 1348–1349 energetic neutral atom (ENA) imaging, 1006–1010 flow imaging and, 393–394 in forensic and criminology research, 710–714 frame and film gate in, 1028 frame rates in, 495 handheld cameras, 1031 high-speed, 494–498, 1047 I Zone, 847 image dissection cameras, 498 image formation in, 571 instant photography and, 827–859 intermittent action high-speed cameras for, 495 large format, 1352–1354 lens in, 1029, 1348 medium format, 1351 microimaging, 1351 mirrors in, 1074, 1346, 1349–1350 for motion pictures, 1022, 1027–1029, 1042 motor drive in, 1348 in overhead surveillance systems, 773–802
photoconductors and, 1174 pinhole, 1072–1073 Pocket Camera, 847 Polachrome, 848 Polacolor, 843–844 Polaroid, 844–847 Polavision, 848 pull down claw in, 1027 rangefinder in, 1350–51 reflex type, 1345, 1349–1350 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 rotating prism cameras, 495–496 scintillation, 1313–1314 shutter in, 1027–28, 1351–1352 single lens reflex (SLR), 1349–1350 speed of, 1028–1029 Steadicam, 1031 still photography, 1344–1358 streak cameras, 499–500 strip cameras, 500 tripods, 1030 trucks, 1030 video assist, 1031 video, 1029–1031, 1174 viewfinders in, 1029 Campisi, George J., 375 Canadian Lightning Detection Network (CLDN), 890–904, 935 Cancer detection, using infrared imaging, 812 Candela, 1042 Candescent, 377–378 Canon Image Runner, 300 Capacitance electronic disk (CED), 16 Capacitance sensors, 17–18, 423–424 Capacitance–voltage (C–V) curve, scanning capacitance microscope (SCM), 21, 25 Capacitive probe microscopy (See also Scanning capacitance microscope), 16–31 Capacity, information, 1082–1083 Captiva, 847 Captured orders, microscopy and, 1109 Carbon arc lamps, in projectors, 1037 Carlson, Chester F., 299, 1174 Cascade development, in electrophotography, 312 Cassegrain mirrors, 783 Cassini Saturn Orbiter, 1020 Cataract, human eye, 548 Category scaling, quality metrics and, 608–609
Cathode cathode ray tube (CRT), 44–45 electron gun, 39 in field emission displays (FED), 378–380, 384–387 Cathode ray direction finder (CRDF) lightning locators, 890, 912, 935 Cathode ray tube (CRT), 31–43, 44–48 in three-dimensional imaging, 1330 electron microscope use of, 262 in field emission displays (FED) vs., 374 graphics cards and, 174–175 meteor/in meteorological research, 757 oscilloscope using, 47 radar tubes using, 47 in three-dimensional imaging, 1333 Cathodluminescence (CL), 374 CAVE three-dimensional display, 1335 Cavity resonators, 1223–27 CD ROM compression, 151, 156–157 CD-2/3/4 developers, 131 Cel animation, 1042 Cellulose triacetate, 1042 Cement splice, 1042 Ceramics, secondary ion mass spectroscopy (SIMS) analysis, 484 Cerenkov radiation/counters, 1158, 1162 CERN Accelerator, 970 CGR3 flash counters, 907 Chain codes, in search and retrieval systems, 625 Chambers of human eye, 746 Chandra Observatory, 1508 Changeover, 1042 Channel buffer, 1388 Channel capacity, 65–66 Channel constancy, 173–174, 181, 184 Channel independence, 176, 181 Channeling, electron, 285–286 Channels, 1362, 1388 Character animation, 1042 Characterization of image systems, 48–86 Charge carrier, photoconductors and, 1170 Charge coupled devices (CCD), 1090 art conservation and analysis using, 663, 664, 671 astronomy science and, 688 color photography and, 141 digital photomicrography, 1138–1140
in endoscopy, 334 flow imaging and, 393–394 in forensic and criminology research, 722–723 high-speed photography and, 498–499 human vision and vs., 552–553 image formation in, 571 instant photography and, 827 multipinned phase (MPP), 1198 noise in, 396–397 in overhead surveillance systems, 784–787 particle detector imaging and, 1167 photoconductors, 1173–1174 photodetectors and, 1184, 1187, 1190, 1193, 1194–1198 radiography/in radiographic imaging, 1067–1069 terahertz electric field imaging and, 1398 virtual phase CCD (VPCCD), 1198 X-ray fluorescence imaging and, 1477 Charge cycle, in electrophotography, 300–304 Charge exchange process, 1004 Charge injection devices (CID), 1192–1193 Charge mode detectors, 1191–1193 Charge-to-mass ratio, in electrophotography, 313–315 Charge transfer devices (CTD), 1192 Charge transport layer (CTL), 1176 Charged area development (CAD), in electrophotography, 317 Charged particle optics, 86–100 Charging cycle, 1176 Charters of Freedom project, art conservation, 663 Chemical composition analysis, 684–686 Chemical sensitization, 1288–1293 Chemical shift imaging, MRI, 995–1000 Chemical shift saturation, MRI, 994–995 Chemical vapor deposition (CVD), 383–384 Chemiluminescence, 255 Chen operator, image processing and, 582 Chirped probe pulse, terahertz electric field imaging and, 1396–1397 Cholesteric liquid crystal displays, 965–966 Chomratic aberration, 92 Choppers, lidar, 873
Chroma in, 103 ChromaDepth three-dimensional imaging, 1332 Chromakey, 1042 Chromatic aberration, 94, 234, 1081–1082 in charged particle optics, 93, 98 in human vision, 544–545, 554–555 in microscopy, 1117–1118, 1123 Chromatic adaptation, human color vision and, 520 Chromatic difference of refraction, 544 Chromatic filters, human vision and, 560 Chromaticity, 119, 148, 380, 512, 534–535 Chromaticity diagrams, 107–108, 534–535 Chrome, in gravure printing, 457–458 Chrominance, 103, 109, 1042 Chromogenic chemistry, 129–139 Chromophores, instant photography and, 836 Cibachrome process, 127 CIECAM97, 538 CIELAB, 521, 533, 535, 536, 537, 619 CIELUV, 535, 537, 619 CIGRE flash counter lightning locators, 907 Cinch marks, 1042 Cinemascope, 1031, 1042 Cinematographe, 1022 Cinemiracle, 1042 Cineon digital film system, 1042 Cineorama, 1031 Cinerama, 1031, 1042 Cinex strip, 1042 Circ function, 542 Circaram, 1042 Circles of confusion, 1347 Circular pistons, 8, 1424–1426 Circular polarization, 231–232 Circularity, 347–348, 627 Clapsticks, 1042 Clarity, 71, 75 Classical electron radius, 250 Clausius–Mossotti relationship, 229 Claw, 1042 Clay animation, 1042 Climatic change, geologic imaging and, 655–656 Close up, 1042 Closed captioning, 1389 Closed interval, 434, 435 Closing, morphological, 430, 436–438, 585 Cloud classification, 761–764
Cloud to Ground Lightning Surveillance System (CGLSS), 890–904, 940–941 Clustering, in feature recognition and object classification, 366–370 Clutter, radar vs., 470–472 CMOS image sensors, 1139–1140, 1199–1200 Coated lens, 1042 Coaterless film, 832 Coatings, 528 Coded OFDM (COFDM), 1389 Codewords, 1388 Coding, retinex, 75–83 Codoping, secondary ion mass spectroscopy (SIMS) analysis of, 489 Coefficient of thermal expansion (CTE), 382, 384, 387 Coherent image formation, 242, 258, 504, 507, 874, 1086, 1098–1100 Coherent PSF, 1087–1088 Coherent transfer functions, 1098 Coils, deflection yoke, 41–42 Cold field emitter (CFE), in charged particle optics, 89 Cold pressure transfer, in electrophotography, 323 Collimation, 247, 255, 509, 1042 Collinear sources, in-phase signals, 2–3 Colonoscopes, 332 Color analyzer, 1042 Color appearance models, 537–538 Color balance, 114–116, 118, 1042 Color blindness, 522–523 Color burst reference, television, 1367 Color characterization, in field emission displays (FED), 380 Color circle in, 109–110 Color codes, 705–706, 1116 Color compensation in, 116–117 Color constancy, in human vision, 520–521 Color coordinate systems, 105–114, 531–536, 619, 537, 630, 641 Color correction, 114–116, 1042 Color difference signals, 106–107, 536–537, 1367 Color duplicate negative, 1035, 1043 Color electrophotography, 325–329 Color film, 1023, 1356–1357 Color image calibration in, 116–117 Color image fidelity assessor (CIFA) model, 612 Color image processing, 100–122, 325–329 Color imaging, 51, 56, 75, 219, 524, 641 in biochemical research, 705–706
cathode ray tube (CRT), 35, 46, 47 in endoscopy, 335–336 in forensic and criminology research, 714 geologic imaging and, 654 in infrared imaging, 809 instant photography and, 833–842 liquid crystal displays, 967–968 quality metrics and, 598–616 in search and retrieval systems, 618–622 in ultrasonography, 1431–1432 Color internegative, 1043 Color lookup tables (CLUTs), 575 Color master, 1034 Color matching, 102, 104–105, 531–534 Color negative, 1043 Color photography, 122–145, 1208–22 Color positive, 1043 Color print film, 1043 Color purity, 388 Color reproduction, in dye transfer printing, 194 Color reversal film, 1043 Color reversal intermediate film, 1043 Color saturation, 1043 Color separation negative, 1043 Color shifted dye developers, 835 Color space, 512, 521, 535–536, 619, 641 Color television, 1365–67 Color temperature, 103, 525 Color ultrasound, 1431–1432 Color vision, 512–514, 518–521, 551–552, 560–561, 564, 567, 747, 1328 deficiencies of, 522–523 Coloray, 377 Colored masking couplers for, 136–137 Colorimetry, 103–105, 512, 523–524 Coma, 92–94, 1084, 1089, 1119 Comb filters, television, 1366 Combined feature, in search and retrieval systems, 630–631 Combined images, image processing and, 589–591 Combined negative, 1043 Commission Internationale de l’Eclariage (CIE), 102, 618 Common intermediate format (CIF), 156–157 Communication theory, 49, 75–83, 161, 168–171 Compensating eyepieces, microscopy and, 1115, 1120 Compensation, color, 116–117
Complementary color, 1043 Complex cells, 567–568 Component analog video (CAV), 1373–74 Component analysis for dimensionality reduction, 363–364 Component digital television 1374–75 Component TV systems, 149–150 Component video standards, 1380–1382 Composite digital television, 148–149, 1377–1380 Composite materials, in scanning acoustic microscopy (SAM) analysis for, 1228 Composite print, 1043 Composite video, 1043 Compound, 1043 Compressed video, 1385–1386 Compression, 151, 156–157, 519 advantages of, 151 audio, 1388 bit rate, vs. uncompressed, 152 CD ROM, 151, 156–157 common intermediate format (CIF) in, 156–157 comparison of techniques in, 155 complexity/cost of, 156 digital compression standards (DCS) in, 156 digital watermarking and, 150–157 discrete cosine transform (DCT) in, 154 displaced frame difference (DFD) in, 155 entropy coding in, 154–155 error handling in, 156 in forensic and criminology research, 740, 741 high definition TV (HDTV), 151, 153, 157 human vision and color vision, 521 interactivity and, 152 MPEG and, 156–157 multiple encoding and, 152–153 noise in, 152 in overhead surveillance systems, 786 packet communications and, 151, 152 packetization delay and, 152 perceptual factors and, 155 perceptual redundancy and, 151 predictive coding (DPCM) in, 153, 155
quality of, 151 quantization in, 154 redundancy in, 155 requirements of, 151–153 resolution vs., 151 robustness of, 152 sampling in, 153 scalability of, 153 in search and retrieval systems, 631 standard definition TV (SDTV), 157 standards for, 156–157 statistical redundancy and, 150 subband/wavelet coding in, 154 symmetry of, 152 temporal processing in, 155 transform coding in, 153–154 transformation in, 153 vector quantization in, 154 video, 1385–86 Videophone standard for, 156–157 wavelet transforms in, 154, 1449–1450 Compton scattering, 211, 250, 256–258, 260, 1323–1324 Compton, Arthur, 211 Compur shutters, 1351 Computed tomography (CT), 219, 1057–1071, 1404 image formation in, 571–572 in magnetic resonance imaging (MRI) vs., 983 in medical imaging, 743 single photon emission computed tomography (SPECT), 1310–1327 Computer modeling, gravity imaging and, 452–453 Computer simulations of human vision, 50 Computers in biochemical research, 695–696 holography in, 510–511 in magnetic resonance imaging (MRI), 999–1000 meteor/in meteorological research, 769–771 in search and retrieval systems, 616–637 X-ray fluorescence imaging and, 1477–1478 Concentration density, flow imaging and, 408–411 Concentration measurements, planar laser induced fluorescence (PLIF) in, 863–864 Condensers, microscope, 1112–1124 Conduction electrons, 230
Conductivity, 220–231 developer, in electrophotography, 315–317 photoconductors and, 1171–1172 Cone coordinates, 176–178 Cone opsins, 517, 561 Cones, of human eye, 122, 513–517, 551–554, 558, 560–563, 746–747 Conform, 1043 Conjugate images, holography in, 505–506 Conjunctive granulometry, 441 Constant deltaC or closed loop mode, SCM, 21 Constant deltaV or open loop mode, SCM, 20 Constant luminance, television, 1367 Constant separation imaging, SECM, 1254–1255 Constellation X Observatory, 693, 1509 Constructive interference, 241, 243, 245, 507 Contact heat transfer, in electrophotography, 324–325 Contact print, 1043 Content based indexing, 617–618 Continuity, 1032–1033, 1043 Continuous contact printer, 1043 Continuous line sources, 6–7 Continuous planar sources, 7–8 Continuous wave (CW), 288, 293, 391 Continuous wavelet transform (CWT), 1444–1446 Contours, 615, 644–645 Contrast, 81, 117, 270, 605, 606, 611, 612, 623, 1094 in cathode ray tube (CRT), 34–35, 183 in field emission displays (FED), 387–388 in forensic and criminology research, 723–724 human vision and, 521, 558–560 liquid crystal displays, 967 in magnetic resonance imaging (MRI), 983, 988–991 in medical imaging, 752–755 microscopy and, 1110, 1113–1124, 1127, 1132–1134 in motion pictures, 1043 Rose model and medical imaging, 753–754 in scanning acoustic microscopy (SAM), 1229, 1235–1244 silver halide, 1261–1262 in transmission electron microscopes (TEM), 269
in ultrasonography, 1418, 1423–1424, 1432–33 Contrast agents, in ultrasonography, 1418 Contrast mechanism, scanning capacitance microscope (SCM), 17–18 Contrast resolution, in ultrasonography, 1423–1424 Contrast sensitivity, 167, 521–522, 558–560, 605 Contrast transfer function (CTF), 401 Control strip, 1043 Conventional fixed beam transmission electron microscopy (CTEM), 277 Convergence, 34, 1328, 1330 Convergent beam diffraction, 268 Convex, 438 Convexity, 438 Convolution, 594–597, 1085–1092, 1105 Cooling, of photodetectors, 1188–1190 Coordinate registration, radar, 1144, 1147–1148 Copal shutters, 1351 Copiers, 574 electrophotographic/xerographic process in, 299 photoconductors, 1174–1183 quality metrics and, 598–616 Copper, in gravure printing, 457–458, 463 Copper phthalocyanines, 301 Copy control, digital watermarking and, 159 Copycolor CCN, 839, 855 Core, film, 1043 Corey Pauling Koltun (CPK) models, 694 Cornea, 512, 539, 541, 547 Corona, in electrophotography, 302–303 CORONA project, 777, 780 Corotron charging, in electrophotography, 302–303 Corpuscles, in light theory, 211 Correction filter, 1043 Corrections, optical, microscopy and, 1115 Correlated color temperature (CCT), 525 Correlation, image processing and, 594–597 Correspondence principle, 215 Corrosion science, scanning electrochemical microscopy (SECM) in, 1255 Coulombic aging, 380
Couplers, coupling, 1043 in color photography, 129, 131–137 in photographic color display technology, 1216 scanning capacitance microscope (SCM) vs., 20 in scanning acoustic microscopy (SAM), 1230 Coupling displacement, in instant photography dyes, 839 Cover glass, microscope, 1115–1116 Coverslip, microscope, 1115–1116, 1118 Crane, 1043 Credits, 1043 Criminology (See Forensics and criminology) Critical angle, 238 Cronar films, 1023 Cropping, 1043 Cross talk, in three-dimensional imaging, 1330, 1333 Cross viewing, in three-dimensional imaging, 1336 Cryptography, digital watermarking and, 161–164 Crystal growth and structure of silver halide, 125–126, 1262–1268 Crystal spectrometers, 244 Crystal structure, Bragg reflection and, 244 Crystallography, 282–285, 696–699 Cube root systems in, 107–108 Cubic splines, tomography/in tomography, 1405 Current density, in charged particle optics, 87, 93 Curvature of field, 92, 93, 1085, 1118–1119 Cut, cutting, 1043 Cyan, 1043 Cyan couplers, color photography, 133–135 Cycles of wave, 213 Cyclic color copying/printing, 328 Cycolor, 827 Cylinders, in gravure printing, 457–463 Cylindrical color spaces in, 109–110 Cylindrical coordinates in, 110–111 Czochralksi vertical puller, 1197
D D log E, 1043 D log H curve, 1043 Daguerre, Louis, 1345 Daguerreotype, 1259–1309, 1345 Dailies, 1033–1034, 1044
Damping force, 226, 231 Dancer roll, in gravure printing, 459–460 Dark adaptation, 519 Dark current, 396, 1188–1190 Dark discharge, 311–312 Dark field image, 269 in scanning transmission electron microscopes (STEM), 277 in transmission electron microscopes (TEM), 264 Dark fringes, 243, 269–270 Dark noise, 785–786 Dark signal, 397 Darkfield microscopy, 1127–1128 Darkness, overall, 604 Darkroom techniques, in forensic and criminology research, 724–725 Darwin–Howie–Whelan equations, 282, 284 Data acquisition in infrared imaging, 807 in scanning acoustic microscopy (SAM), 1230 Data rate, 50, 63–64, 67, 81, 84 Data reduction processes, in feature recognition, 353–358 Data transmission, over TV signals, 1389 Databases in biochemical research, 699 in search and retrieval systems, 617–637 Dating, (EPR) imaging for, 296–297 Daylight, 524, 525, 1044 DC restoration, cathode ray tube (CRT), 180 De Broglie, Louis, 261 De-exicitation of atoms, 253 De Morgan’s law, 431, 435 Dead layer, 381 Decay, 254, 260, 980–981 Decibels, 1044 Decision theory, 161, 359–363 Decomposition, in search and retrieval systems, 627 Deconvolution, 595–596 Defect analysis and detection electron paramagnetic resonance (EPR) imaging for, 295–296 feature recognition and object classification in, 355 microscopy and, 1116–1117 scanning capacitance microscope (SCM) for, 22 in scanning acoustic microscopy (SAM) analysis for, 1228 silver halide, 1268–1269 Defense Meteorological Satellite Program (DMSP), 890–904, 929
Definition, 1044 Deflection angle, cathode ray tube (CRT), 35–36 Deflection yoke, CRT, 31, 41–43, 46, 48 Deflectometry, flow imaging and, 412 Defocus, 272, 555, 1088 Deinterlacing, in forensic and criminology research, 725–726 Delphax Systems, 301, 322 Delrama, 1044 Delta function, 1103–1104 Delta rays, particle detector imaging and, 1156 Denisyuk, N., 504 Denoising, wavelet transforms in, 1448 Densitometer, 1044 Density film, 1357 gravity imaging and, 449–450 human vision and, 513, 598–616, 598 in motion pictures, 1044 Density gradients, flow imaging and, 405 Depth cues, in three-dimensional imaging, 1327–1328 Depth of field (DOF), 403, 1044, 1124, 1347 Depth of focus, 403, 1044, 1124 Derived features, 357–358 Desertification, geologic imaging and, 655–656 Designer, 1044 Desktop publishing, 300 Destructive interference, 241 Detection theory, digital watermarking and, 161 Detectivity, photodetector, 1188 Detector arrays, photoconductive, 1203–1205 Detectors background limited infrared photodetector (BLIP), 1189 charge mode detectors, 1191–1193 diamond, 1168 fabrication and performance of, 1194–1198 in infrared imaging, 805, 806 lidar, 873 magneto/ in magnetospheric imaging, 1007 neutron/in neutron imaging, 1058–1062 in overhead surveillance systems, 785, 790–793 particle detector imaging, 1154–1169 photoconductor, 1169–1183
photodetectors, 1183–1208 photoelectric, 1169 pixel, 1167–1168 radiography/in radiographic imaging, 1067 scintillator, 1062–1064 semiconductor, 1064–1065, 1163–1165, 1168 silicon drift, 1167 silver halide, 1259–1309 strip type, 1165–1167 in ultrasonography, 1418–1419 X-ray fluorescence imaging and, 1477 Deuteranopia, 522–523 Developers and development processes, 1356–1357 color photography, 129–131 in electrophotography, 300, 312–322 in forensic and criminology research, 724–725 holography in, 509 instant photography and, 827, 830, 834–842 in motion pictures, 1034, 1044 silver halide, 1299–1302 Development inhibitor releasing (DIR) couplers, 129, 137–139 Diagnostic imaging, art conservation and analysis using, 665–680 Dialogue, 1044 Diamond detectors, 1168 Diaphragm, 1044, 1080 Dichroic coatings, 1044 Dichroism, 233 Dicorotron charging, 303 Diderot, Denis, 455 Dielectric constants, 225 Dielectrics, speed of light in, 224–225 Difference of Gaussian (DOG) function, 80 Difference threshold, 612 Differential absorption lidar (DIAL), 869, 874, 882 Differential interference contrast (DIC), 1106, 1116, 1134–1135 Differential pulse code modulation (DPCF), 67, 786 Differential scattering, 250, 410 Diffraction, 3, 58, 246–253, 278–286, 1073, 1082, 1083, 1087 in charged particle optics, 94 convergent beam, 268 flow imaging and, 403 image formation in, 571 microscopy and, 1107–1108, 1110, 1126, 1128 in motion pictures, 1044 in overhead surveillance systems, 792–794
in transmission electron microscopes (TEM), 266–274 in ultrasonography, 1424–1428 Diffraction contrast imaging, TEM, 268–270 Diffraction limited devices, 249 Diffraction limited PSF, 1087 Diffraction theory, 3 Diffractometers, 244 Diffuse reflection, 234 Diffusers, 509, 528 Diffusion, 404, 525, 528, 1044 Diffusion coefficients, electron paramagnetic resonance (EPR), 294 Diffusion etch, gravure printing, 460 Diffusion transfer process, 456 Diffusion transfer reversal (DTR), 827, 833 Diffusivity, 404 Digital cameras, 854–855 Digital cinema, 1039–1040 Digital compression standards (DCS), 156 Digital effects, 1044 Digital imaging, 49–50, 516 art conservation and analysis using, 662–665, 680 CMOS sensors and, 1139–40 in forensic and criminology research, 722–723 instant photography and, 827, 854–855 in medical imaging, 754 microscopy and, 1106 in overhead surveillance systems, 786 in photographic color display technology, 1216–1217 photography, 1358 in photomicrography, 1137–1140 quality metrics and, 602–603 in search and retrieval systems, 616–637 television, 1374–1380 video, 1374–80 wavelet transforms in, 1447–1450 Digital intermediate, in motion pictures, 1035 Digital Library Initiative (DLI), 616 Digital light processor (DLP), 183, 185 Digital photography, 141–142, 827 Digital photomicrography, 1138–1140 Digital processing Cineon digital film system, 1042 in forensic and criminology research, 725
in motion pictures, 1039–1040, 1044 in ultrasonography, 1429–1430, 1429 Digital rights management (DRM), 159 Digital sound, 1031, 1033, 1037, 1044 Digital television, 1374–1380 Digital to analog conversion (DAC), 721 Digital versatile disk (DVD), 159 Digital video (See also Compression), 146–157 Digital Video Broadcast (DVB), 1392 Digital watermarking, 158–172, 158 Digitizers, in forensic and criminology research, 709, 722–723 Dilation, 430, 433, 584–589, 1446 Dimension Technologies Inc. (DTI), 1338–1339 Dimensionality reduction, in feature recognition, 363–364 Dimmer, 1044 Dipole moment matrix element, 254 Dirac functions, 57 Direct image holography, 505 Direct mapping, in RF magnetic field mapping, 1125–1126 Direct thermal process, 196, 853 Direct transfer imaging, gravure printing, 461 Direction finding (DF) lightning locators, 890, 906–907 Directional response characteristics, 1 Directivity patterns, 1 Director, 1044 Dirichlet conditions, 1103 Disaster assessment, geologic imaging and, 651–655 Discharge cycle, 310–312, 1176 Discharge lamps, 222 Discharged area development (DAD), 317 Discontinuities, SAM, 1243–44 Discounting of illuminant, 520–521, 558, 561–562 DISCOVERER mission, 777 Discrete cosine transform (DCT) compression, 154 digital watermarking and, 171 in search and retrieval systems, 631–632 video, 1387–88 Discrete element counters, 1060 Discrete Fourier transform (DFT), 57, 61, 67–68, 80, 263, 1405, 1448 Discrete wavelet transform (DWT), 1446–1447
Discrimination, 607 Disjunctive granulometry, 441 Dislocation contrast, TEM, 270 Dispersion, 225–229, 234–235, 466, 1081 Dispersive power, 234 Displaced frame difference (DFD), 155 Displacement current, 212 Displays, 172–199 autostereoscopic, 1336–1341 characterization of, 172–199 field emission display (FED) panels, 374–389 flat panel display (FPD), 374 holography in, 510 liquid crystal (LCD), 955–969 in medical imaging, 754 photographic color technology, 1208–1222 secondary ion mass spectroscopy (SIMS) in, 482–484 in three-dimensional imaging, 1331–1335 in ultrasonography, 1413–1415 Dissolves, 1034, 1044 Distortion, 82, 92–94, 398, 1085, 1106 Distributors, 1044 DNA (See Genetic research) Doctor blade, in gravure printing, 454–455, 458 Documentation, in forensic and criminology research, 716–717 Dolby Digital Sound, 1033, 1044 Dollies, 1030, 1044 Dolph–Chebyshev shading, 3 Dopants and doping in field emission displays (FED), 384 germanium photoconductors, 1204–1205 in photographic color display technology, 1214 photoconductors and, 1171 secondary ion mass spectroscopy (SIMS) in analysis, 487–489 silver halide, 1293–1294 Doppler lidar, 870 Doppler radar, 223, 758, 764, 766, 767, 768, 772, 1142, 1458–1468 Doppler shift, 764, 772 astronomy science and, 685–686 flow imaging and, 415–416 lidar and, 879–881 magneto/ in magnetospheric imaging, 1017 planar laser induced fluorescence (PLIF) in, 863 radar and over the horizon (OTH) radar, 1142, 1145
in ultrasonography, 1418, 1430–1432 Doppler ultrasound, 1430–32 Dot matrix printers, 300 Dots per inch (DPI), 602 Double exposures, silver halide, 1288 Double frame, 1044 Double refraction, 1131–1132 Double system recording, 1033, 1044 Doublet, achromatic, 235 Downward continuation, gravity imaging and, 452 DPACK calibration, 27 DPCM, 83 Drag force, 226 DRAM, scanning capacitance microscope (SCM), 22 Drift correction, gravity imaging and, 446 Drift tube scintillation tracking, 1161 Drive mechanisms, display, 382–383 printer, 193 Driven equilibrium, in magnetic resonance imaging (MRI), 992 Driving force, 226 Dry developers, in electrophotography, 300 Dry process printers, 195 Dryers, gravure printing, 459 Dual color polymer light emitting diodes (PLED), 820–822 Dual mapping, 431 Dubbing, 1044 Dupe, dupe negative, 1044 Duplexers, radar, 1452 DuraLife paper, 1211 Dwell coherent integration time, radar and over the horizon (OTH) radar, 1147 Dwells, radar, 1146 Dyadic set, 1446 Dye bleach, 1217 Dye desensitization, 1298 Dye diffusion thermal transfer (D2T2), 853 Dye lasers, 885 Dye sublimation printers, 189–190, 189, 194–195, 194 Dye transfer printing, 188–197, 827 Dyes, 1356 in color photography, 123–125, 131–133, 140 in dye transfer printing, 188–197 in instant photography, 827, 834–842 in motion pictures, 1045 silver halide, 1295–1296 Dynamic astigmatism, electron gun, 40–41
Dynamic spatial range (DSR), 404–405, 415
E E field Change Sensor Array (EDOT), 890–904 Earth Observing System (EOS), 659, 772 Earth Probe, 660 Earth Resources Technology Satellite (ERTS), 778–779 Earthquake imaging, 453 Eastman Color film, 1024 Eastman Kodak, 498 Eastman, George, 1022 Echo, in magnetic resonance imaging (MRI), 981–983, 981 Echo planar imaging (EPI), 989, 992 Echo signal processing, in ultrasonography, 1419–1420 Edge detection, 517 Edge enhancement, 580–583 Edge finding, 582 Edge following, 642 Edge histograms, 626 Edge numbering, edge codes, film, 1026–1027, 1045 Edge sharpness, 1357 Edge spread function (ESF), 751–752, 1091–1092 Edgerton, Harold, 774 Edison, Thomas, 1022 Edit decision list (EDL), 1035, 1045 Editing, motion pictures, 1034–1035, 1045 Effective/equivalent focal length (EFL), 1078 EI number, film, 1023 Eigenfunctions, eigenstates, 285–286 8 mm film, 1025 Einstein Observatory Telescope, 1507 Einstein, Albert, 211 Einstein’s coefficient of spontaneous emission, 253 Einstein’s Theory of Special Relativity, 228 Ektapro process, 498 Elastic scattering, 249, 338 Elastic theory, liquid crystal displays, 959 Elasticity imaging, in ultrasonography, 1433–1434 ELDORA radar, 1457, 1471 Electric dipole radiation, 218, 220 Electric discharge sources, 222–223 Electric field imaging, X-ray fluorescence imaging and, 1489–1494 Electric Field Measurement System (EFMS), 890–904, 908 Electric permittivity, 212
Electric polarization, 227 Electrical conductivity, 230 Electrical discharge, 910–911 Electrical fields, magnetospheric imaging, 1002–1021 Electrical Storm Identification Device (ESID), 907, 922–924 Electro–optic effect, 233 Electro optics, terahertz electric field imaging and, 1394–1396 Electrocardiogram (ECG), 198 Electrochemical microscopy (SECM), 1248–1259 Electroencephalogram (EEG), 198–210, 744 Electroluminescent display, 817–827 Electromagnetic pulse (EMP), 909, 941 Electromagnetic radiation, 210–261, 682, 803, 1072, 1393 Electromagnetic spectrum, 218–220, 1072 Electromechanical engraving, 461–462 Electromyography, force imaging and, 424 Electron beam, cathode ray tube (CRT), 32–34 Electron beam gravure (EBG), 462 Electron beam induced current (EBIC), SEM, 276 Electron channeling, 285–286 Electron gun, 31, 39–45, 48, 87–88, 173 Electron magnetic resonance (EMR), 287 Electron microscopes, 87, 261–287, 573, 590, 594 Electron paramagnetic resonance (EPR) imaging, 287–299, 1223–1227 Electron positron annihilation, 220 Electron radius, classical, 250 Electron sources, in charged particle optics, 87–91 Electronic endoscopes, 334 Electronic flash, 1348–1349 Electrophotography, 299–331, 574, 598–616, 1174–1183 Electroplating, gravure printing, 457–458 Electrosensitive transfer printers, 195 Electrostatic energy analyzers (ESA), 482 Electrostatic image tube cameras, 498 Electrostatic transfer, 323 Electrostatics and lightning locators, 911–912 Elementary bit streams, 1382
Elliptical polarization, 232 Emission, 223, 253–256, 259, 379, 383–384, 387, 804 Emission computed tomography (ECT), 743 Emission electron microscope (EMM), 479 Emission ion microscope (EIM), 478 Emissivity, 804, 1072 Emittance, 804 in charged particle optics, electron gun, 88 in infrared imaging, 807–810, 813–814 Emitters, in charged particle optics, 89–91 Emmert’s law, 1330 Empty magnification, microscopy and, 1121 Emulsion speed, 1045 Emulsions, 1023, 1039, 1045, 1208–1222, 1268 Encircled energy, 1090 Encoding, 50, 61–62, 152, 152–153, 1045 Encryption (See also Steganography), 160 Endogenous image contrast, MRI, 988 Endoscopy, 331–342 Energetic neutral atom (ENA) imaging, 1003–1004, 1006–1016 Energy and momentum transport by, 213–214 Energy density, in transmission electron microscopes (TEM), 213 Energy exploration, geologic imaging and, 650 Energy flux, astronomy science and, 688–690 Energy levels and transitions in, 215 Engraving, 454–462 Enhanced definition TV (EDTV), 1391–1392 Enhancement of image color/in color image processing, 117–119 feature recognition and object classification in, 351–353 in forensic and criminology research, 722 in medical imaging, 756 in overhead surveillance systems, 787 SPECT imaging, 1316–1321 Enteroscopes, 331–332 Entrance pupil, 1080, 1354 Entropy, tomography/in tomography, 1408 Entropy coding, 83, 154–155
Environmental issues, color photography, 141 Eötvös correction, 448 Equalization, 725, 755, 1360 Equatorial anomaly, 1144 Ergonomics, force imaging and, 424 Erosion, 430, 432–434, 584–589 Error correction, television, 1390 Error handling, 156, 174–175 Estar films, 1023, 1045 ETANN neural network, 371–373 Etching in field emission displays (FED), 384 gravure printing, 456, 460–461 Ethylenediaminodisuccinic acid, 141 Euclidean distance functions, 624, 646 Euclidean granulometries, 439–442 Euclidean mapping, 586–587 Euclidean properties, 439 Euclidean set theory, 430 Euler’s formula, 226 European Broadcasting Union (EBU), 1374 European Radar Satellite, 649 European Spallation Source (ESS), 1057 Evanescent waves, 238 Evaporation, 383 Event frequency, high-speed photography and, 493 Evoked brain activity, evoked potential (EEG), 199, 201–202 Ewald diffraction, 267 Ewald’s sphere, 267, 279, 280 Exchange, 1045 Excimer lasers, 391 Excitation, in magnetic resonance imaging (MRI), 979–980 Excitation error, 268 Excitation of atoms, 253 Existing light, 1045 Exit pupil, 1080 Expanders, holography in, 509 Expectation maximum (EM), tomography/in tomography, 1408–1409 Exposure, 1045, 1352 autoexposure, 1355–1356 in electrophotography, 304–310 electrophotography, 1176–77 in forensic and criminology research, 723–724 gravure printing, 460–461 high-speed photography and, 492 in motion pictures, 1043–1045 silver halide, 1288 Exposure latitude, 1045 External reflection, 236
Extinction contours, 282 Extinction distance, 280 Extraction, in search and retrieval systems, 622, 625 Extraction of features, 353–358 Extraneous marks, 604, 605 Extreme ultraviolet imaging (EUV), 1005–1006 Eye (See Human vision) Eye tracker systems, 522 Eyepieces, microscope, 1108–1109, 1114–1124
F f-number, 1045 f-stops, 1352, 1354 Fabry Perot devices, 246 Fade, 1034, 1045 Fakespace CAVE three-dimensional imaging, 1335 Fakespace PUSH three-dimensional imaging, 1334 Far distance sharp, 1347 Far field, 220, 911–912, 1425–1426 Far field approximation, 251 Far field diffraction, 246 Far ultraviolet imaging of proton/electron auroras, 1016–1020 Far Ultraviolet Spectographic Imager (FUV-SI), 1017–1020 Faraday effect, 975 Faraday, Michael, 1022 Faraday’s laws, 211, 212 Fast, 1045 Fast Fourier transform (FFT), 1095, 1151, 1405 Fast On Orbit Recording of Transient Events (FORTE), 890–904, 929 Fast spin echo, MRI, 992 FASTC2D calibration, 27 Fawcett, Samuel, 456 Feature extraction, in feature recognition, 353–358 Feature Index Based Similar Shape Retrieval (FIBSSR), 626 Feature measurement (See also Measurement), 343–350 Feature recognition and object classification, 350–374 Fermat’s principle of least time, 234 Fermi energy levels, 1170–1171 Ferric ethylendiaminetetraacetic acid (ferric EDTA), 138, 141 Ferric propylenediaminetetraacetic acid (ferric PDTA), 141 Ferricyanide, 138 Ferroelectric liquid crystal displays (FLC), 964–965 Feynman diagrams, 259
Fiber optics, 333–334, 509, 1063–1064 Fidelity, 50, 71–74, 81, 84, 598, 611, 612, 615 Field angle, 1080 Field curvature, 1085 Field effect transistors (FET), 1173, 1199 Field emission, in charged particle optics, 89 Field emission display (FED) panels, 374–389 Field emission guns (FEG), 277 Field emitter arrays (FEA), 89, 375, 376 Field number (FN), 1121 Field of view (FOV), 60 lidar and, 870 in magnetic resonance imaging (MRI), 986, 996 microscopy and, 1121 in motion pictures, 1045 scanning capacitance microscope (SCM), 19 in three-dimensional imaging, 1330, 1333 Field points, 251 in three-dimensional imaging, 1330–1331, 1333 Field stop, 1080 Fields, television, 1359 Figures of merit (See also Quality metrics), 50, 62–64, 1409 Filaments, in charged particle optics, 87–88 Fill, 604 Fill light, 1045 Filling-in phenomenon, 516 Film acetate, 1039, 1040 additive color films, 847–849 Agfa, 1024 antihalation backing, 1041 art conservation and analysis using, 661–662 ASA/ISO rating for, 1023, 1041 aspect ratio, 1022 backing, 1041 balance stripe, 1041 base, 1041 black-and-white, 1023, 1041, 1356 camera type, 1026–1027 cellulose triacetate, 1042 coaterless, in instant photography, 832 color photography, 124, 139–142 color reproduction in, 139 color reversal intermediate, 1043 color reversal, 1043 color, 1023, 1043, 1356–1357 containers for, 1039
core for, 1043 density of, 1357 dichroic coatings, 1044 dye stability in, 140 Eastman Color, 1024 edge numbering, edge codes in, 1026–1027 EI number for, 1023 8 mm, 1025 emulsion in, 1023, 1039 Fuji instant films, 849–851 Fuji, 1024 Fujix Pictrography 1000, 851–852 granularity in, 140 gravure printing, 461 high-speed photography and, 498 holography in, 509 image formation in, 571 image structure and, 140–141 imbibition (IB) system for, 1024 instant photography and, 827–829 integral, for instant photography, 828 intermediate, 1026, 1047 Kodachrome, 1024 Kodak instant films, 849 Kodak, 1024 laboratory, 1048 length of, in motion pictures, 1025 magazines of, 1026 in medical imaging, 754 modulation transfer function (MTF) in, 140–141 in motion pictures, 1022–1023, 1045 negative and reversal, 1023 negative, 1049 negative, intermediate, and reversal, 1023–1024 neutron/in neutron imaging, 1065 nitrate, 1023, 1039, 1049 orthochromatic, 1050 in overhead surveillance systems, 790 Panchromatic, 1050 peel apart, for instant photography, 828 perforations in, in motion pictures, 1025–1026, 1050 photographic color display technology, 1208–1222 photomicrography, 1137–1138 Pictrography 3000/4000, 852–853 Pictrostat 300, 852–853 Pictrostat Digital 400, 852–853 pitch in, 1026 Pocket Camera instant films, 847 Polachrome, 848 Polacolor, 843–844 Polavision, 848
polyester, 1023, 1039, 1050 print type, 1024 projection type, 1023 quality metrics and, 598–616 rem jet backing, 1051 reversal type, 139, 1052 root mean square (rms) granularity in, 140 safety acetate, 1023, 1024, 1052 sensitivity or speed of, 124, 139 seventy/70 mm, 1025 sharpness and, 140 silver halide and, 140 sixteen/16 mm, 1024, 1025, 1052 sixty/ 65 mm film, 1024 Spectra instant film, 847 speed of, 1023 still photography, 1344–1358 Super xxx, 1025 SX70 instant film, 844–847 Technicolor, 1024 thirty/35 mm, 1022, 1024, 1054 Time Zero, 846–847 Type 500/600 instant films, 847 vesicular, 662 width of, in motion pictures, 1024–1025 Film base, 1045 Film can, 1045 Film cement, 1045 Film gate, 1028, 1036, 1045 Film gauge, 1045 Film identification code, 1045 Film perforation, 1045 Film to tape transfer, 1045 Filtered backprojection (FBP), 1405 Filtered Rayleigh scattering (FRS), 411, 412, 415 Filters and filtering, 55, 56, 59, 65, 68, 69, 70, 71, 73, 76, 80, 100, 262, 437, 1092–1100 alternating sequential, 437 color/in color image processing, 118–119 comb, 1366 digital watermarking and, 150, 167 downward continuation, 452 extreme ultraviolet imaging (EUV), 1006 flow imaging and, 394 in forensic and criminology research, 717 Gabor, 623 gravity imaging and, 450–452 haze, 1047 holography in, 509 human vision and, 548, 558–560, 566–567 image processing and, 578, 579, 589, 593–597
incoherent spatial, 1096 lidar and, 872 light, 1048 linear, 756 liquid crystal displays, 968 logical structural, 442 in medical imaging, 755–756 microscopy and, 1113–1114 in motion pictures, 1043, 1045 multispectral image processing, 101 neutral density, 1049 open-close, 437 polarizing, 1050 quadrature mirror filter (QMF), 622 quality metrics and, 611 Ram Lak, 1405 in search and retrieval systems, 623 spatial, 1100 SPECT imaging, 1322 strike, 452 upward continuation in, 451 vertical derivatives, 452 wavelet transforms in, 1447–1450 Final cut, 1046 Fine grain, 1046 Fine structure, 218, 254 Fingerprinting, digital watermarking and, 159 Finite ray tracing, 1083 First hop sky waves, 912 First order radiative processes, 253 First print, 1046 Fisher’s discriminant, 364–366 Fixers, 1345 Fixing bath, 1046 Fixing process, 138–139, 324–325, 324 Flaking, 1046 Flame imaging, 409 Flange, 1046 Flare, in cathode ray tube (CRT), 182–183 Flash photography, 104, 492, 1348–1349 Flashing, 1046 Flat, 458–459, 1046 Flat-bed editing tables, 1034 Flat panel display (FPD), 374 Flat, motion pictures, 1031 Floating wire method, in magnetic field imaging, 975 Flooding disasters, geologic imaging and, 651–655 Flow imaging, 390–419, 501, 989–991 Fluid dynamics flow imaging and, 390
gravity imaging and, 453 in magnetic resonance imaging (MRI), 991 Fluorescence and fluorescence imaging, 210, 223, 255, 259, 529 absolute fluorescence (ABF), 863 art conservation and analysis using, 661, 676–677 flow imaging and, 397 laser induced fluorescence (LIF), 408, 861–869 phosphor thermography, 864–867 planar laser induced (PLIF), 391, 408–409, 411–416, 861–864 pressure sensitive paint, 867–868 thermally assisted fluorescence (THAF), 863 X-ray fluorescence imaging, 1475–1495 Fluorescence microscopy, 1106, 1135–1137 Fluorescent lifetime, 254 Fluorescent sources, 524, 525, 529 Fluorescent yield, 255 Fluorochrome stains, microscopy and, 1137 Flux density, 214 Flux measurement, 972–973 Flux, lens-collected, 1081 Flux, photon, 215 Flux, reflected, 527–529 Fluxgate magnetometer, 974–975 Fluxmeters, 971–973 Focal length, 1073, 1078, 1354–1355, 1347 Focal plane, 1046, 1352 Focal plane array (FPA), 804, 805 Focal plane shutters, 1352 Focal point, 54 Focus, 1073, 1347 autofocus, 1356 flow imaging and, 403 ground penetrating radar and, 471 human vision and, 513, 555 in infrared imaging, 809–810 microscopy and, 1122, 1124 in ultrasonography, 1427–1428 Focus variation, holographic, 272 Focused ion beam (FIB) imaging, 90–91, 93, 479 Fog, 1046 Fold degeneration, 217 Foley, 1046, 1053 Follow focus, 1046 Foot, human, force imaging and analysis of, 421 Footage, 1046 Footlambert, 1046 Force imaging, 419–430
Force process, 1046 Forecasting and lightning locators, 909 Foreground, 1046 Forensics and criminology, 709–742, 1393 Foreshortening, 1328 Forgery detection, art conservation and analysis using, 661 Format, 1046 Format conversion, video, 720–722 Formation of images (See Image formation) Forward error correction (FEC), 1390 Foundations of morphological image processing, 430–443 Four field sequence, television, 1366–1367 Fourier analysis, 1102–1106 in forensic and criminology research, 731–732 gravity imaging and, 448 Fourier descriptors, in search and retrieval systems, 625–626 Fourier series, 698, 1102–1103 Fourier transform infrared (FTIR) microscope, 667 Fourier transforms, 50–58, 77, 280, 285, 1073, 1088, 1092–95, 1098–1099, 1102–1104, 1448 human vision and, 542 image processing and, 591–594, 596 in magnetic resonance imaging (MRI), 985, 987 in medical imaging, 751, 756 periodic functions and, 1104–1105 tomography/in tomography, 1405, 1406 in transmission electron microscopes (TEM), 263 two dimensional, 1104–1105 Fovea, 513, 515, 522, 561, 566, 746–747 Fowler–Nordheim plot, 379 Fox-Talbot, William Henry, 455, 492, 1345 Foxfet bias, 1166 Fractals, feature measurement and, 349–350 Fractional k space, in magnetic resonance imaging (MRI), 992 Frame, 1046 Frame and film gate, 1028 Frame by frame, 1046 Frame grabbers, 709, 722–723 Frame line, 1046 Frame rates, high-speed photography and cameras, 495 Frame transfer CCD, 393
Frames, television, 1359 Frames per second (FPS), 1046 Fraud detection, digital watermarking and, 159 Fraunhofer diffraction, 246, 247–249, 264, 280 Fraunhofer lines, 235 Free air correction, gravity imaging and, 447 Free electron gas, 230 Free electron lasers, 223 Free induction decay, in magnetic resonance imaging (MRI), 980–981 Free viewing, in three-dimensional imaging, 1336 Freeze frame, 1046 Frei operator, image processing and, 582 Frenkel equilibrium, silver halide, 1270 Frequency, 213, 227, 293, 559 Frequency band power mapping, 200–201 Frequency domain, image processing and, 591–595 Frequency interference, ground penetrating radar and, 470–471 Frequency modulation (FM), television, 1362 Frequency multiplexing, television, 1366–1367 Frequency response, 50, 1046 Frequency spectrum, 1103 Fresnel approximation, 285 Fresnel diffraction, 246 Fresnel equations, 235–236, 1498 Fresnel rhomb, 239 Fresnel sine integral, 1091 Fringes, 243, 1102 Frit, in field emission displays (FED), 387 Frustrated total internal reflection (FTIR), 238 Fuji, 1024 Fuji Colorcopy, 840 Fuji instant films, 849–851 Fujix Pictrography 1000, 851–852 Full frame CCD, 393 Fuming, 1345 Functional MRI (fMRI), 744 Functional parallelism, 562, 563, 565 Fundamental tristimulus values, 534 Fur brush development, in electrophotography, 312 Fusing process, in electrophotography, 324–325
Futaba field emission displays (FED), 377
G G strings, in search and retrieval systems, 628 Gabor filters, 623 Gabor, Dennis, 504 Gabriel graph (GG), 370 Gain, screen, 1046 Gallium arsenic phosphorus (GaAsP) photodiodes, 1200 Gallium arsenide (GaAs) photodiodes, 1172, 1200 Gamma, 1046 in cathode ray tube (CRT), 176, 177, 179 in liquid crystal displays (LCDs), 184 in motion pictures, 1043 silver halide and, 1261–62 in television, 1362 Gamma radiation, 218–220, 257, 803 art conservation and analysis using, 677–680 astronomy science and, 682, 683, 688 attenuation in, 260 neutron imaging and, 1057–58 photoconductors, 1169 SPECT imaging, 1311–1313, 1323 Ganglion cells, human vision and, 517–518, 562, 563 Gastroscopes, 331 Gate, frame and film, in camera, 1028, 1046 Gauge, 1046 Gauss’ law, 25, 211, 212 Gaussian distribution, 1157 Gaussian operator/function, 542, 581, 593, 637 Gaussian optics, 1078–1079 Gaussian rays, 1083–1084, 1085 Gelatin filter, 1046 Generalized functions, 1103 Generation, 220–224 Genetic research (See also Biochemistry and biological research), 694, 745–746 Geneva movement, 1046–1047 Geodesy, gravity imaging and, 444 Geodetic Reference System, 445 Geographic Information System (GIS), 453 Geoid, 445–446 Geological Survey, 453
Geology, 647–661 ground penetrating radar and, 464 in magnetic field imaging, 970 instant photography and, 855 magnetospheric imaging, 1002–1021 radar and over the horizon (OTH) radar, 1149 Geostationary Meteorological Satellite (GMS), 760 Geostationary Operational Environmental Satellite (GOES), 760, 778 Germanium photoconductors, 1190, 1197, 1204–1205 Gettering materials, in field emission displays (FED), 380 Ghosting in radar, 1452 in three-dimensional imaging, 1330, 1333 Giant Segmented Mirror Telescope, 693 Gibbs phenomenon, 71 Gladstone–Dale relationship, flow imaging and, 406 Global change dynamics, 655–656 Global circuit and lightning locators, 910 Global field power (GFP), EEG, 203 Global Position and Tracking System (GPATS), 890–904 Global Positioning System (GPS), 445 Global processing, in human vision, 567–568 Glossy surfaces, 528 Gobo, 1047 Godchaux, Auguste, 456 Gold sensitization, in silver halide, 1292–1293 Gradient, in dye transfer printing, 193–194 Gradient echo, MRI, 981, 992–993, 998–999 Grain boundary segmentation, SIMS analysis, 487–489 Graininess, 598–616, 1047, 1303–1304, 1357 Granularity, 75, 140, 439, 604–605 Granulometric size density (GSD), 442 Granulometries, 439–442 Graphics cards, cathode ray tube (CRT), 174–175 Grasp, microscopy and, 1108 GRASP software, 706 Grassmann’s law, 531 Graticules, microscope, 1113, 1121 Gratings, 244–246, 560, 1092, 1094, 1099, 1102
holography in, 508 human vision and, 565, 567 lobes of, 5 microscopy and, 1108–1109 X-ray telescopes, 1504–1506 Gravimeter, 446 Gravitation imaging, 444–454 Gravity, 444, 445 Gravity anomalies, gravity imaging and, 444–445 Gravure multicopy printing, 454–463 Gravure Research Institute, 459 Gray card, 1047 Gray levels, 68, 618, 622 color/in color image processing, 114–115 in medical imaging, 752 monochrome image processing and, 100 multispectral image processing and, 101 thresholding and segmentation in, 638–641 Gray scale, 589–591, 961, 1347 Gray surface, 804 Gray value, 103, 646, 1421 Gray, Henry F., 375 Graybody radiation, 222 Grazing incidence, X-ray telescopes, 1497–1499 Great Plains 1 (GP-1) lightning locators, 924–928 Green print, 1047 Greene, Richard, 376 Gridding, gravity imaging and, 450 Ground clutter, radar, 1453 Ground coupling, ground penetrating radar and, 468 Ground instantaneous field of view (GIFOV) overhead surveillance systems, 783, 791 Ground penetrating radar, 463–476 Ground reaction force (GRF), force imaging and, 419–420 Ground resolvable distance (GRD), 790 Ground sampled distance (GSD), 790, 791–792 Groundwater detection, 454, 464 Group of pictures (GOP), 1386 Group range, 1144 Group velocity, 228 Guide rails, 1047 Guide roller, 1047 Guillotine splice, 1047 Gutenberg, Johannes, 455 Gyromagnetic ratio, 217, 978
H Haar wavelet, 1446 Hadronic cascades, 1158
Halation, 1047 Half-wave antennas, 220 Half-wave plates, 233 Halftone screening, 455 Halide, 1047 Hall effect/Hall generators, in magnetic field imaging, 973–974 Halogen, 135, 222 Halos, 1135 Hamiltonians, 285 Hamming windows, 611 Handheld cameras, 1031 Hanover bars, 1371 Hard, 1047 Hard light, 1047 Harmonic imaging, 1432 Harmonics, 212, 225–226, 250, 972, 1092, 1098, 1104 Hartmann–Shack sensors, 547 Hazard assessment, geologic imaging, 651–655 Haze, 604 Haze filters, 1047 Head-end, 1047 Head recording, 1047 Head-up displays (HUD), 509 Heat transfer, in infrared imaging, 809 Heater, electron gun, 39 Heidelberg Digimaster, 300 Heisenberg uncertainty principle, 215 Helio Klischograph, 456–457, 461 Helium neon lasers, 508 Hell machine, 456, 461 Helmholtz invariant, 1080 Helmholtz–Kirchoff formula, 1486 Helmholtz–Lagrange relationship, 481 Hermitian transforms, 987 Herschel, John, 1022, 1345 Hertz, 213, 1047 Hertz, Heinrich, 211 Heterojunction photoconductors, 1173 Hewlett-Packard Laserjet, 302 Heyl, Henry, 1022 Hi Vision television, 1391 High-definition TV (HDTV), 41, 42, 47, 147, 151, 153, 157, 1039, 1047, 1382, 1390 High-Energy Neutral Atom Imager (HENA), 1007–1010 High-energy radiation (See X-ray; Gamma radiation) High-frequency voltage, scanning capacitance microscope (SCM) vs., 20
High-frequency waves, radar and over-the-horizon (OTH) radar, 1142 High-pass filters, image processing, 593–597 High-resolution electron microscopy (HREM), 273 High-resolution images, TEM, 270 High-resolution secondary ion mass spectroscopy, 477–491 High-resolution visible (HRV) imaging systems, 649, 655 High-speed cameras, 1047 High-speed photographic imaging, 491–504 High-voltage regulation, cathode ray tube (CRT), 180 Highlights, 1047 Hindered amine stabilizers (HAS), 296 HinesLab three-dimensional imaging, 1341 HIRES geologic imaging, 660 Histograms art conservation and analysis using, 666 in forensic and criminology research, 723 image processing and, 584 in search and retrieval systems, 619–620, 626, 629 in medical imaging, 755 thresholding and segmentation in, 637–638, 640 Hit or miss transform, 434–436 HLS coordinate system, 619 HMI lights, 1047 Hoffman Modulation Contrast, 1106, 1132–1134 Hoffman, Robert, 1106, 1133 Hold, 1047 Holographic optical elements (HOE), 509, 510 Holographic PIV (HPIV), 417 Holography, 223, 262, 504–512, 1328 flow imaging and, 417 image formation in, 571 inverse X-ray fluorescent holographic (IXFH) imaging, 1486–1489 normal X-ray fluorescent holographic (NXFH) imaging, 1484–1486 stereograms using, 1336–1337 in three-dimensional imaging, 1336–1337 in transmission electron microscopes (TEM), 272 X-ray fluorescence imaging and, 1484–1489
Homogeneity, in cathode ray tube (CRT), 181–182 Homologous points, in three-dimensional imaging, 1329 Horizontal gradients, gravity imaging, 452 Horn, Hermann, 456 Hot, 1047 HSB color coordinate system, 641 HSI color coordinate system, 641 HSI coordinate system, 112–114 HSV color coordinate system, 619, 641 Hubble telescope, 595 Hue, 103, 111, 117–119, 578, 580, 618, 1047 Human vision, 49–51, 84, 122–142, 512–570, 746–748 color vision in, 122–142 in color image processing, 101 display characterization in and, 186 feature recognition and object classification in, 353–358 image processing and, 583 optical geometry of, 54 persistence of vision, 1021–1022, 1050 in three-dimensional imaging, depth cues and, 1327–1328 Humidity, 1047 Huygen’s principle, 242–243, 246 Huygenian eyepieces, 1120 Huygens, Christian, 243 Hybrid image formation, 571, 574 Hybrid ink jet printing technology (HIJP), 818–819 Hybrid scavengeless development (HSD), electrophotography, 312, 320–321 Hydrazone, 1179 Hydrology, ground penetrating radar, 464 Hydroquinone, instant photography, 834–839 Hydroxylamine, instant photography, 839 Hyperfine splitting, electron paramagnetic resonance (EPR), 288 Hyperfocal distance, 1347 Hyperspectral imaging, in overhead surveillance systems, 787 Hypo, 1046, 1047 Hyposulfite, 1345
I
I frame, video, 1387
iWERKS, 1031
I-Zone camera, 847 Idempotent operators, 436–437 Identification systems, instant photography, 855–856 Idle roller, 1047 IHS coordinate system, 111–112 Infrared light, 356 IKONOS satellite, 780 Ilfochrome, in photographic color display technology, 1217 Illuminance, 1345 Illuminants, 103–104, 524–527, 558, 561, 610–611 discounting, in human vision, 520–521 Illumination, 243, 528, 558, 1072, 1101 in art conservation and analysis, 665 in endoscopy, 335–336 in holography, 504 Kohler, 1126 matched, 1099 in microscopy and, 1110–1014, 1125–1128 in monochrome image processing, 100 Rheinberg, 1128 standard illuminants and, CIE, 103–104 Illuminators, microscope, 1107, 1125–1127 Image, 1047 Image aspect ratio (See Aspect ratio) Image authentication, digital watermarking, 161 Image chain, in overhead surveillance systems, 782–800 Image combination, image processing, 589–591 Image correction, flow imaging, 397 Image dissection cameras, 498 Image enhancement (See Enhancement of image) IMAGE EUV, 1006 Image fidelity, 598, 611, 612, 615 Image formation, 571–575 in human eye, 541–544 in microscope, 1107–1114 Image gathering, 49, 54–67, 84 Image integrity, in forensic and criminology research, 740–741 Image manipulation, instant photography, 856 Image-on-image or REaD color printing, 328–329, 328 Image plates, in neutron imaging, 1065–1066
Image processing, 575–598 color (See Color image processing) in dye transfer printing, 193–194 in endoscopy, 336–338 feature recognition and object classification in, 351–353 in forensic and criminology research, 719–732 in infrared imaging, 807 instant photography and, 828, 829–830, 833–842 in magnetic resonance imaging (MRI), 987–988 in medical imaging, 754–756 monochrome, 100–101 morphological, 430–443 multispectral, 101 in overhead surveillance systems, 787 in scanning acoustic microscopy (SAM), 1231–1233 silver halide, 1299–1303 wavelet transforms in, 1448 X-ray fluorescence imaging and, 1477–1478 Image processing and pattern recognition (IPPR), 351 Image quality metrics (See Quality metrics) Image restoration, 49, 51, 60, 67–75, 84, 118–119, 167, 722–756 IMAGE satellite, 1018 Image search and retrieval (See Search and retrieval systems) IMAGE SEEK, 618, 621 Image sensors, 1199–1200 Image space, 1075 Image vectors, in tomography, 1407 Imagebase, 630 Imagery Resolution Assessment and Reporting Standards (IRARS), 795 ImageScape, 633 Imaging arrays, photodetector, 1194–1198 Imaging satellite elevation angle (ISEA), 791–792 IMAX, 1031, 1335 Imbibition (IB) system, 1024, 1047 Immersion medium, microscopy, 1116 Impedance, boundaries, non-planar radiators, 9 Impedance, acoustic, in ultrasonography, 1415 Impression rolls, gravure printing, 458–459 Improved Accuracy from Combined Technology (IMPACT) sensors, 935, 937, 941
Impulse response, in transmission electron microscopes (TEM), 265 In-phase signals, 1–3 In-plane switching (IPS), liquid crystal displays, 963–964 In the can, 1048 In vivo imaging electron paramagnetic resonance (EPR) imaging for, 297–298 endoscopy, 331–342 fluorescence microscopy, 1136–1137 terahertz electric field imaging and, 1398–1399 INCA, 1020 Incandescent sources, 222, 242, 524, 1037 Incoherent light, 232, 242, 1085 Incoherent spatial filtering, 1096 Incoherent transfer functions, 1098 Index of refraction, 225, 1075, 1079 in charged particle optics, 88 complex, 227 flow imaging and, 405, 412 human vision and, 551 microscopy and, 1109 radar, 1453 Indexed color, cathode ray tube (CRT), 176 Indexing, in search and retrieval systems, 617–618 Indium antimonide (InSb), 806–807, 1201 Indium gallium arsenide (InGaAs), 806, 1200 Indium tin oxide (ITO), 381, 819, 957 Indoaniline, 138 Indophenol, instant photography, 835 Induced dipole moment, 225 Induction coils, in magnetic field imaging, 971–972 Inelastic scattering, 249 Infinity corrected microscopes, 1106–1107 Infinity space, microscopy, 1107 Information capacity, 1082–1083 Information efficiency, 50, 64, 84 Information hiding, digital watermarking vs., 160 Information processing, holography, 510–511 Information rate, 50, 62–67, 72–74, 81, 84 Information theory, 99, 161, 168–171 Infrared imaging, 218–219, 230, 1393 art conservation and analysis using, 668–672 astronomy science and, 690, 691–693
geologic imaging and, 648, 660 in overhead surveillance systems, 789 National Imagery Interpretability Rating Scale (NIIRS), 795–800 phosphor thermography vs., 866–867 pigments and paints in, 668–672 quantum well infrared photodetector (QWIP), 1190, 1205 satellite imaging systems and, 758 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Television and Infrared Observational Satellite (TIROS), 757, 777 thermography, 802–817 Initial phase, 213 Ink-jet printers hybrid systems, 818–819 instant photography and, 853 organic electroluminescent display and, 817–827 quality metrics and, 598–616 shadow masks in, 825–826 Inks in electrophotography, 321–322 in gravure printing, 454, 455, 458, 459 InP lasers, 22 InP/InGaAsP buried hetero structure laser, 22–23 Instant color photography, Polaroid Corp., 127 Instant photography, 827–859 Instantaneous field of vision (IFOV), 54, 805 Intaglio, gravure printing, 454–463 Integrity, image (See Image integrity) Intensified CCD (ICCD), 393–394, 862 Intensity, 102, 110–111, 214, 241, 610, 611, 1047 in cathode ray tube (CRT), 177, 178–179 in charged particle optics, 89 feature measurement and, 344–345 flow imaging and, 397–398 image processing and, 578, 580 in television, 147–148 Interface contrast, in transmission electron microscopes (TEM), 269
Interference, 239–246 in cathode ray tube (CRT), 38–39 ground penetrating radar and, 470–471 Interferometry, 242, 246 astronomy science and, 691 flow imaging and, 405, 412 geologic imaging and, 648 holography in, 510 lightning locators, 907, 946–947 X-ray interferometric telescopes, 1503–1504 Interlace, 146–147, 725–726, 1048, 1359 Interline transfer CCD, 393 Interlock, 1047 Intermediate atomic state, 258 Intermediate films, 1026, 1047 Intermediate sprocket, 1047 Intermittent, 1048 Intermittent action high-speed cameras, 495 Intermittent contact SCM, 21–22 Intermittent movement, 1048 Internal conversion, 255 Internal reflection, 236, 237–239 International Color Consortium (ICC) standards, 172 International Commission on Illumination (CIE), 523 International Electrochemical Commission (IEC), 602 International Standards Organization (ISO), 139, 602, 1048 International Telecommunications Union (ITU), 102, 1362 Internegative, 1043, 1048 Interocular distance, 1329 Interplanetary magnetic fields (IMF), magnetospheric imaging, 1002–1021 Interposition, 1328 Interpositive films, 1034, 1048 Intersystem crossing, 255 Intertropical Convergence Zone (ITCZ), 655–656 Invariant class, in morphological image processing, 437 Inverse continuous wavelet transform, 1446 Inverse Fourier transform (IFT), 1095 Inverse square law, 214 Inverse X-ray fluorescent holographic (IXFH) imaging, 1486–1489 Inversion recovery, in magnetic resonance imaging (MRI), 993–994 Iodate, 1213 Iodide, 1212–1214
Ion beam-induced chemistry, in charged particle optics, 87 Ion fraction, secondary ion mass spectroscopy (SIMS), 478 Ion-induced secondary electrons (ISE), 482–483 Ion-induced secondary ions (ISI), 482–483 Ion selective electrodes (ISE), 1248–1259 IONCAD, 482 Ionic conductivity, silver halide, 1271 Ionization, 1154–1157, 1159–1162, 1173 Ionographic process, 301, 322 Ionospheric analysis, 219, 1141, 1149 Ionospheric plasma outflow, ENA imaging, 1012–1016 Ionospheric propagation, radar and over-the-horizon (OTH) radar, 1143–1145 IRE units, television, 1361 Iridium, in photographic color display technology, 1214 Iris, 112–113, 512 Irradiance, 50–55, 75, 80–83, 214, 524–525, 530, 782, 1079–1081, 1085, 1092, 1094, 1102, 1287–1288 Isostatic correction, gravity imaging, 448 Isotope decay, art conservation and analysis using, 679–680 Issacs, John D., 1022
J
Jacobian transformation, 97, 98
Jagged edges, 75
Johnson, C.L., 775
Joint motion, force imaging, 420
Josephson tunnel junctions, SQUID sensors, 9–15
JPEG, 154, 171, 519, 521, 631, 741
Jumping development, in electrophotography, 301, 312, 318–320
Just noticeable difference (JND), 747
K
K, 1048
K nearest neighbor classification, 366–370
K space, 574, 987–988, 992
Kell factor, television, 1362
Kernel operations, image processing, 577–578, 580
Kerr cells, high-speed photography, 492–493
Kerr effect, 233
Ketocarboxamide, 134
Keykode number, 1048
Keystoning, 1048, 1331
Kinematical theory, 278–281, 286
Kinematics, 421, 1154–1169
Kinematoscope, 1022
Kinescope, 1048
Kinetics, force imaging, 424
Kinetograph, 1022
Kinetoscope, 1022
Kirchoff’s laws, 759
Kirkpatrick–Baez telescopes, 1502–1503
Klein–Nishina formula, 257
Klietsch, Karl, 456
Klystrons, radar, 1452
Knife edge technique, 402–403, 501
Knock-on electrons, 1156
Kodachrome process, 127–128, 1024
Kodacolor process, 127–128
Kodak, 165–168, 829–830, 849, 1024
Kohler illumination, 1110–1114, 1126
Kohler, August, 1106, 1110–1111
Kolmogorov scale, flow imaging, 404
KONTRON, 483
Kramers–Heisenberg formula, 1476
Kronecker delta, 78
Kuwahara filter, 582
L
L*a*b* coordinate system, 108–109, 630
L*u*v* coordinate system, 109, 537
Laboratory film, 1048
Laboratory image analysis, in forensic and criminology research, 732–740
Lagrange invariant, 1079–1081
Lambertian diffusers, 525
Lambertian operators, 525, 528, 782
Lambertian surfaces, 51
Laminar flow, resolution, 402–403
Lamps, in projectors, 1037
Land cameras, 827
Land, Edwin, 1331
Landau distributions, 1155–1156, 1159
Landmark-based decomposition, 627
Landsat, 353, 444, 453, 648, 778, 787
Laplacian of Gaussian (∇²G) operator, 71
Laplacian operator, 580–581, 593
Large area density variation (LADV), 604
Large format cameras, 1352–1354
Larmor frequency, 979
Larmor’s formula, 220, 221
Laser engraving systems, 462–463
Laser-induced fluorescence (LIF), 338, 408, 861–869
Laser printers, 195–196, 302, 306–310 Laser pumps, 223 Laserjet printer, 302 Lasers, 223, 255, 1177 continuous wave (CW) lasers, 391 dye lasers, 885 in electrophotography, 306–310 engraving systems using, 462–463 excimer lasers, 391 flow imaging and, 391–394, 416 free electron, 223 gravure printing, 462–463 helium neon lasers, 508 in holography, 507–508, 507 laser-induced fluorescence imaging, 861–869 lidar, 869–889 liquid crystal displays, 958 NdYAG, 391, 416 in phosphor thermography, 866–867 Q switching in, 391–392 terahertz electric field imaging and, 1402–1403 yttrium aluminum garnet (YAG), 391 Latent image, 1047, 1048, 1276–1284 Latent impressions, in forensic and criminology research, 712–714 Lateral geniculate nucleus (LGN), 563, 564 Lateral hard dot imaging, gravure printing, 461 Lateral inhibition, 1093 Latitude, 1048 Latitude correction, gravity imaging, 447 Lateral geniculate nucleus (LGN), 516, 518 Lattice, sampling, 50, 54, 57, 58, 59, 65, 71 Laue patterns, 244, 279 Launch Pad Lightning Warning System (LPLWS), 890–904, 908, 921–922 Law of reflection, 234 Law of refraction, 234 Law, H.B., 31 Layout, 1048 Lead chalcogenide photoconductors, 1200–1201 Leader, 1048 Leaf shutters, 1351 Least mean square (LMS) algorithm, 372 Least time, Fermat’s principle, 234 Legendre functions, 253 Leith, E.N., 504
Lens, 54, 58, 59, 65, 91–100, 541, 1073, 1345, 1347, 1348, 1354–1355 anamorphic, 1031, 1040 Bertrand polarization lens, 1132 charged particle optics, 86–100 coatings on, 1042 electron gun, 40 eyepiece, 1119–1122 flow imaging and, 392–394, 399 flux in, 1081 Gaussian optics in, 1078–1079 in holography in, 505–506, 508–509 in human eye, 54, 512–514, 540, 547, 560, 746 in microscopy, 1106, 1107, 1117 for motion pictures, 1029, 1048 in overhead surveillance systems, 783 in projectors, 1036–1037 in scanning acoustic microscopy (SAM), 1233–1234 in secondary ion mass spectroscopy (SIMS) in, 480–481 in terahertz electric field imaging, 1397 for wide-screen motion pictures, 1031 zoom, 1029 Lenticular method, color photography, 127 Lenticular sheets, in three-dimensional imaging, 1337–1338 Letter press, 455 Leyden jars, 492 Lichtenberg, 299 Lidar, 223, 804, 869–889 Lif converter, 1064–1065 Lifetime, radiative, 254 Light speed of, 224–225 wave vs. particle behavior of light in, 210–211 Light adaptation, human vision, 519–520 Light amplification by stimulated emission of radiation (See Lasers) Light axis, 1048 Light detection and ranging (See Lidar) Light emitting diodes (LED), 1177 in electrophotography, 300 in electrophotography, 305–306, 305 multicolor organic, 822–825, 822 in three-dimensional imaging displays, 1340
Light filter, 1048 Light in flight measurement, holography, 510 Light intensity, 1048 Light levels, human vision, 558 Light meter, 1048 Light microscope, 261 Light output, 1048 Light sensitive microcapsule printers, 195 Light sources, 524–527 Light valve, 1048 Lighting monochrome image processing and, 100–101 in motion pictures, 1033 Lighting ratio, 1048 Lightness, 102–103, 618 Lightning Detection and Ranging (LDAR), 890–904, 914, 941–945 lightning direction finders, 906–907 Lightning Imaging Sensor (LIS), 890–904, 929, 932–935 Lightning locators, 572, 890–955 Lightning Mapping System (LMS), 890–904, 929 Lightning Position and Tracking System (LPATS), 890–904, 935 Lightning warning systems, 907 LightSAR, 660 Ligplot software, 706 Line metrics, 603–604 Line of sight, 541 Line scan image systems, 60, 1482 Line sources, continuous, 6–7 Line spread function (LSF), 402, 751–752, 1090–1091, 1097 Line spread profiles, cathode ray tube (CRT), 33–34 Linear attenuation coefficient, 260 Linear filtering, in medical imaging, 756 Linear operations, image processing, 577–578 Linear perspective, 1328 Linear polarization, 231, 233 Linearity, 1085 Lines per picture height, television, 1362 Linogram method, in tomography, 1406 Lip sync, 1048 Liquid crystal display (LCD), 176, 184–186, 955–969 in field emission displays (FED) vs., 374 in three-dimensional imaging, 1331, 1338–1339 response time in, 387 Liquid gate, 1048
Liquid immersion development (LID), 312, 321–322 Liquid metal ion sources (LMIS), 90–91, 93, 478, 479 Liquid mirror telescopes (LMT), 872 Lithography, 456 electron beam, 87 quality metrics for, 598–616 secondary ion mass spectroscopy (SIMS) in, 479 Lithosphere, gravity imaging, 444 Live action, 1048 Lobes, grating, 5 Local (residual) gravity anomaly, 448 Local motion signal, human vision, 566 Local processing, human vision, 567–568 Localization, 1074–1077 in human vision and, 569 in magnetic resonance imaging (MRI), 984–987 Locomotion, force imaging and analysis, 419–430 Log, camera, 1041 Logical granulometry, 442 Logical structural filters, 442 Long Range Lightning Detection Network (LRLDN), 890–904 Long shot, 1048 Longitudinal magnification, 1076 Lookup table (LUT), 575, 612 Loop, 1048 Lorentz electromagnetic force, 1157 Lorentzian, 229 Lossless coding, 50 Lossless compression, 741 Lossy compression, 741 Low Energy Neutral Atom Imager (LENA), 1013–1016 Low key, 1048 Low-pass filters, 593, 594–597, 611 Low resolution electromagnetic tomography (LORETA), 204–208 Lowe, Thaddeus, 773 LPE, scanning capacitance microscope (SCM) analysis, 22 Lubrication, 1048 Luma, 103 Lumen, 1048 Lumiere, Louis and Auguste, 1022 Luminance, 102, 109, 530, 611, 618, 1081 in cathode ray tube (CRT), 173, 183 in field emission displays (FED), 380, 387 human vision and, 521, 564 in liquid crystal displays, 957, 968 in motion pictures, 1048
in television, 148, 1365, 1367, 1375 Luminescence, 255 in field emission displays (FED), 375 in forensic and criminology research, 717–719 Luminiferous ether, 211 Luminosity, flow imaging, 397–398 Luminous efficiency function, 529–531 Luminous flux, 1345 Lux, 1048 Lyman alpha signals, 1017 Lythoge, Harry, 456
M M scan ultrasonography, 1414–1415 M way search, in search and retrieval systems, 626 Mach bands, 81 Macro lenses, 1354–1355 Macula, 513, 514, 549, 746–747 Macular degeneration, 549 Magazines, film, 1026, 1049 Magenta, 1049 Magenta couplers, color photography, 135–136 Magic lasso, thresholding and segmentation, 644–645 Magic wand, thresholding and segmentation, 643–645 Magnetic brush developer, in electrophotography, 312–318 Magnetic dipole moment, 217 Magnetic direction finders (MDF), 906 Magnetic field imaging, 211–212, 803, 970–977 cathode ray tube (CRT), 38 electron paramagnetic resonance (EPR) imaging for, gradients in, 289–292 energetic neutral atom (ENA) imaging, 1006–1010 magnetospheric imaging, 1002–1021 particle detector imaging and, 1157 RF magnetic field mapping, 1223–1227 Magnetic permeability, 467 Magnetic permittivity, 212 Magnetic quantum number, 217 Magnetic resonance imaging (MRI), 210, 977–1002 electron paramagnetic resonance (EPR) imaging for vs., 293–294 image formation in, 571, 573, 574 in magnetic field imaging, 970
in medical imaging, 744, 745 in RF magnetic field mapping, 1224 Magnetic resonance measurement, 970–971 Magnetic sector mass spectrometer, 481–482 Magnetic sound, 1037, 1049 Magnetic splitting, atomic, 217 Magnetic storms, 1145 Magnetic striping, 1049 Magnetic susceptibility effects, 297 Magnetic tape, 1049 Magnetic toner touchdown development, 312 Magnetic track, 1049 Magnetization transfer imaging, 991 Magnetoencephalograms, 12–15, 744 Magnetoinductive technology, 976 Magnetometers, 1015, 974–975 Magnetopause to Aurora Global Exploration (See IMAGE) Magnetoresistivity effect, 975 Magnetospheric imaging, 1002–1021 Magnetron, 288, 1452 Magnets, for MRI, 980–981, 997–998 Magnification, 1076, 1079, 1080 angular, 1076–1077 longitudinal, 1076 in microscopy, 1107, 1121, 1124 in overhead surveillance systems, 783 transverse, 1076 visual, 1077 Magnification radiography, 673–674 Magnitude, 226, 241 Magoptical, 1049 MagSIMS, 479–484 Mahanlanobis distance measure, 624 Makeup table, 1049 Manhattan Project, 744 Mapping in biochemical research, 699 geologic imaging and, 647–661 gravity imaging and, 444–454 in infrared imaging, 812–815 lightning locators, 909 in magnetic field imaging, 970 RF magnetic field mapping, 1223–1227 secondary ion mass spectroscopy (SIMS) in, 477 Marcus electron transfer theory, 1297 Markov random field, in search and retrieval systems, 623 Mars Express mission, 1020–1021 MARS systems, 622, 632 Masking, 129, 136–137, 139, 1049 Mass, gravity imaging, 444 Mass attenuation coefficient, 260
Mass spectrometry, secondary ion (SIMS), 477–491 Master, 1034, 1049 Master positive, 1049 Matched illumination, 1099 MATLAB, 186, 542 Matte, 1049 Matte surfaces, 528 Mauguin condition, 960 Maxima, 245 Maximum realizable fidelity, 73, 81, 84 Maxwell pairs, in magnetic resonance imaging (MRI), 984 Maxwell, James Clerk, 123, 211 Maxwell’s equations, 212, 225, 233, 234 Maxwell’s theory, 211–214 Mean size distribution (MSD), 442 Mean square restoration error (MSRE), 69–70 Measurement, 343–350 in astronomy science, 686–688 in color image processing, 120 in colorimetry, 528 in display characterization, 185–186 in endoscopy, lesion size, 336–338 in gravity imaging, 453–454 in holography, 510 in infrared imaging, 814–815 in magnetic field imaging, 970–976 in meteorological research, 758–759 in planar laser-induced fluorescence (PLIF), 863 quality metrics and, 598–616 in radar imagery, 764–765 in satellite imaging systems, 758–759 in terahertz electric field imaging, 1396–1397 Media relative colorimetry, 533 Medial axis transform (MAS), 625 Medical imaging, 742–757 infrared imaging for, 812 instant photography and, 856 lens in, 1354 magnetic resonance imaging (MRI) in, 977–1002 scanning acoustic microscopy (SAM) in, 1228 single photon emission computed tomography (SPECT) in, 1310–1327 terahertz electric field imaging, 1393–1404 three-dimensional imaging in, 1327
ultrasound, 1412–1435 X-ray fluorescence, 1479–1482 Medium Energy Neutral Atom Imager (MENA), 1010 Medium format cameras, 1351 Medium shot, 1049 MegaSystems, 1031 MEH PPV polymer, 819 Memory color, 520–521, 612 Mercury cadmium telluride (HgCdTe or MCT), 807, 1190, 1201–1205 Mercury discharge tubes, 222 Mercury vapor lamps, 222 Mesopic vision, 515 Metal alloys, SAM analysis for, 1228 Metal halide lamps, 1037 Metal matrix composites, SIMS analysis, 484, 487 Metal oxide semiconductors (MOS), 18–19 Metal semiconductor diodes, 1173 Meteorology, 757–773 airborne radar, 1471–1473 ASOS Lightning Sensor (ALS), 907, 922 Automated Surface Observing System (ASOS), 907, 922 Defense Meteorological Satellite Program (DMSP), 890–904, 929 Doppler radar, 1458–1468 Electrical Storm Identification Device (ESID), 907, 922–924 lidar and, 869–870 lightning locators, 890–955 magnetospheric imaging, 1002–1021 mobile radar, 1471–1473 Thunderstorm Sensor Series, 907 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 weather radar, 1450–1474 wind profiling radar, 1469–1471 Meter candle, 1049 Meteorological Satellite (METEOSAT), 760 Methodology for Art Reproduction in Color (MARC), 664 Methyliminodiacetic acid (MIDA), 141 Metz filters, 1322 Mexican hat wavelet, 1444 Michelson interferometer, 242 Michelson, Albert, 211 Micro dry process printers, 195 Microchannel plates (MCP), in neutron imaging, 1066–1067 Microcrystal preparation, silver halide, 1262–1268
Microcrystals, SIMS analysis, 484–486 Microfilm, art conservation and analysis using, 661–662 Microimaging, 1351 Micromachining, 87 Micron Display, 377 Micropattern gas detector scintillation tracking, 1162–1163 Micropolarizers, in three-dimensional imaging, 1334–1335 Microreagant action in SECM, 1256–1257 Microscopy, 210, 261, 1072, 1106–1141 alternating current scanning tunneling microscopy (ACSTM), 28 art conservation and analysis using, 666–668 atomic force microscope (AFM), 16 biochemistry research and, 693–709 capacitive probe microscopy, 16–31 in charged particle optics and, 87 conventional fixed beam transmission electron microscopy (CTEM), 277 darkfield microscopy, 1127–1128 differential interference contrast (DIC) microscopy, 1106, 1134–1135 digital photomicrography, 1138–1140 electron microscopes, 261–287 emission electron microscope (EMM), 479 emission ion microscope (EIM), 478 fluorescence, 1106, 1135–1137 Fourier transform infrared (FTIR) microscope, 667 high-resolution electron microscopy (HREM), 273 Hoffman Modulation Contrast, 1106, 1132–1134 infinity corrected, 1106–1107 instant photography and, 855 Nomarksi interference microscopy, 1106 phase contrast microscopy, 265–266, 1106, 1128–1130 photomicrography, 1106, 1124, 1137–1138 reflected light microscopy, 1124–1127 scanned probe microscopy (SPM), 1248
Microscopy, (continued ) scanning acoustic microscopy (SAM), 1128–1148 scanning capacitance microscope (SCM), 16–31 scanning electrochemical microscopy (SECM), 1248–1259 scanning electron microscope (SEM), 23, 87–88, 262, 274–278, 477, 1243 scanning evanescent microwave microscope (SEMM), 28 scanning ion microscope (SIM), 477 scanning Kelvin probe microscope (SKPM), 16, 28 scanning microwave microscope (SMWM), 16, 28 scanning transmission electron microscope (STEM), 87, 93, 262, 276–278 scanning transmission ion microscope (STIM), 479 secondary ion mass spectroscopy (SIMS), 477–491 stereomicroscope, 1106 transmission electron microscope (TEM), 23, 87, 93, 262–274, 262 Microstrip gas chambers (MSGC), 1060–1062 Microstrip, silicon/Gd, 1065 Microtips, molybdenum, in field emission displays (FED), 375–376 Microwave radar ducting, 1141 Microwaves, 218, 220, 230, 242, 288, 803, 1223–1227, 1393, 1452 Mid-foot key number, 1049 Mie scattering, 253, 548, 875, 1451 Miller indices, 267 Mineral resource exploration, geologic imaging, 650 Mineralogy, electron paramagnetic resonance (EPR) imaging for, 296–297 Minima, 245 Minimum density, 1049 Minimum ionizing particles (MIPs), 1155 Minkowski addition/subtraction, 433, 612 Mirrors, 1073, 1074, 1075 astronomy science and, 691 camera, 1346, 1349–1350 extreme ultraviolet imaging (EUV), 1006 Giant Segmented Mirror Telescope, 693 in holography, 508–509
liquid mirror telescopes (LMT), 872 in overhead surveillance systems, 783–784 quadrature mirror filter (QMF), 622 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 thin mirror telescopes, 1501–1502 in three-dimensional imaging, 1341–1343 X-ray telescopes, 1498–1499 Mix, 1049 Mixed high-signals, television, 1365 Mixing studies, flow imaging, 409, 410 Mobile radar, 1471–1473 Modeling in biochemical research, 699 quality metrics and, 610–615 three-dimensional imaging and, 1327 MODIS, 659–660 MODTRAN, 783, 785 Modulation, 1094 amplitude modulation (AM), 383, 1362 differential pulse code modulation (DPCF), 67, 786 frequency modulation (FM), 1362 pulse code modulation (PCM), 150 pulse width modulation (PWM), 383 quadrature amplitude (QAM), 149 quadrature modulation, 1365–1366 quantization index modulation (QIM), 170–171 sequential frequency modulation, 1367–1368 Modulation transfer function (MTF), 57–58, 95–96, 99, 140–141, 749, 750, 1091, 1094–1098, 1358 cathode ray tube (CRT), 34, 182 flow imaging and, 400–402 human vision and, 521, 544, 555 Moir´e patterns, 38–39, 75 Molecular biology spectroscopy, 1403 Molecular medicine, 745–746 Molecular polarization, 227, 252 Molecular replacement, in biochemical research, 698–699 MOLMOL software, 706 MolScript software, 706 Molybdenum, in field emission displays (FED), 375–376
Moment-based search and retrieval systems, 626 Mondrian values, 52–53 Monitors (See Displays) Monochromatic aberrations, 545–548, 1083–1085 Monochromatic primaries, 104 Monochrome image processing, 100–101 Morley, Edward, 211 Morphological image processing, 430–443, 584–589 MOSFETs, 22–23 Motion analysis, 424, 494 Motion detection, human vision, 566–567 Motion parallax, 1328 Motion pictures (See also Television; Video), 1021–1056 color in, 142 electrostatic image tube cameras, 498 high-speed photography and, 494–498 image dissection cameras, 498 intermittent action high-speed cameras for, 495 photographic color display technology, 1208–1222 Polachrome, 848 Polavision, 848 rotating drum and mirror cameras, 496–497 rotating mirror framing cameras, 497–498 rotating prism cameras, 495–496 Motion studies in meteorological research, 769–771 in overhead surveillance systems, 784 Motorboating, 1049 Motorola, 377 Mottle, 604 Moving slit parallax barrier, 1338 Moviola, 1034, 1049 MOVPE, 22 MPEG, 156–157, 171, 631 MPS PPV, 821–822 MSMS software, 706 MSP software, 707 Mug shots, 715–716 Multi-up processing, 384 Multiangle Imaging Spectroradiometer (MISR), 772 Multiangle viewing instruments, 772 Multichannel scaler (MCS), 873 Multichannel sound, 1362, 1365 Multicolor organic light emitting diodes (OLED), 822–825
Multihop operations, radar and over-the-horizon (OTH) radar, 1141 Multilayer perceptron for pattern recognition, 371–373 Multilayer telescopes, 1503 Multipinned phase (MPP) CCD, 1198 Multiplanar imaging, 1328, 1341 Multiple beam interference, 244–246 Multiple carrier systems, television, 1365 Multiple coulombic scattering (MCS), 1157 Multiple isomorphous replacement (MIR), 698 Multiple wavelength anomalous dispersion (MAD), 698 Multiplex sound, television, 1362, 1365 Multiplexed analog component (MAC), television, 1374 Multiplexing, 1049 in liquid crystal displays, 956, 957 in television, 1366–1367, 1382–1385, 1389 Multislice method for crystals, 284–285 Multispectral imaging, 56, 101 geologic imaging and, 660 National Imagery Interpretability Rating Scale (NIIRS), 801 in overhead surveillance systems, 787 Multispectral scanner (MSS), 352, 360, 779 Multispectral sensors, 356 Multivariate granulometry, 441 Multiwave proportional chamber (MWPC), 1160 Multiwavelength imaging, 682–686 Multiwire proportional chambers (MWPC), 1059–1060 Munsell system, 110 MUSE TV transmission systems, 1391 Muybridge, Edweard, 1022 Mylar, in electrophotography, 301
N
Naphthol, 133, 134
Narration, 1049
National Imagery and Mapping Agency, 445
National Imagery Interpretability Rating Scale (NIIRS), 795–800
National Integrated Ballistics Information Network (NIBIN), 716
National Lightning Detection Network (NLDN), 890–904, 924, 925, 935–941
National Television Standards Committee (NTSC) standards, 102, 146–149, 382, 519, 521, 619, 1049–1050, 1359–1393 NdYAG lasers, 391, 416, 867 Near distance sharp, 1347 Near field lightning locators, 911–912 Near field diffraction, 246 Near field imaging, 1400–1403, 1425–1426 Near instantaneous companded audio multiplex (NICAM), 1365 Near point, 1077 Nearest neighbor classification, 366–370 Negative and reversal film, 1023 Negative film, 1049 Negative image, 1043, 1049 Negative positive process, 1049, 1347–1348 Negative, intermediate, and reversal film, 1023–1024 Neighborhood operations in image processing, 576–578, 583–584 in medical imaging, 755 in search and retrieval systems, 623 in thresholding and segmentation, 640 Network locator lightning locators, 905, 908 Network mapper lightning locators, 905, 909 Neural activity, electroencephalogram (EEG), 198–199 Neural networks, 371–373, 510–511, 632 Neurons and human vision, 513, 560, 563, 565 Neutral density filters, 1049 Neutral test card, 1049 Neutron activation autoradiography (NAAR), 678–680 Neutron imaging, 1057–71 Neutron radiography, art conservation and analysis using, 678–680 New York Times, 456 Newton’s law of gravitation, 444 Newton’s rings, 1049 Next Generation Space Telescope (NGST), 693 Niepce, Joseph, 455, 1345 Nier–Johnson geometry, 481 NIMBUS satellites, 778 Niobium, in SQUID magnetometers, 10 Nitrate film, 1023, 1039, 1049
Nitride oxide silicon (NOS), 23 Nitrogen, 135 Nitroxyl radicals, 289, 296 NMOS, scanning capacitance microscope (SCM), 22 Noise, 50, 60, 61, 62, 63, 64, 68, 69, 72, 74, 75, 80, 1049, 1357 compression, 152 in forensic and criminology research, 727–731 in human vision and color vision, 517 image processing and, 593–594 in infrared imaging, 809 in magnetic resonance imaging (MRI), 996 in medical imaging, 748–749 in overhead surveillance systems, 785 photodetectors and, 1187–1188 quality metrics and, 598–616 radar and over-the-horizon (OTH) radar, 1142 Rose model and medical imaging of, 753–754 SPECT imaging, 1322 thresholding and segmentation in, 639 in ultrasonography, 1416–1417 wavelet transforms in, 1448 Noise equivalent power (NEP), 1188 Noise equivalent temperature difference (NETD), 814 Noise factor, flow imaging, 395, 396 Nomarksi interference microscopy, 1106 Nomarski, Georges, 1106, 1134 Nondestructive evaluation (NDE) infrared imaging for, 811–812 ultrasonography for, 1412 X-ray fluorescence imaging for, 1478–1479 Nonlinear editing, 1035, 1230–1231 Nonlinearity, in ultrasonography, 1417–1418 Nonparametric decision theoretic classifiers, 359–363 Nonradiometric IR imaging systems, 804 Nonsymmetrical amplitude, 4 Normal dispersion, 227 Normal X-ray fluorescent holographic (NXFH) imaging, 1484–1486 Normalization, 532, 604–605, 1156 Normalized colorimetry, 533 Normally black (NB) liquid crystal displays, 963 Normally white (NW) liquid crystal displays, 960
North American Lightning Detection Network (NALDN), 890–904, 935 Notching, 1049 Nuclear magnetic resonance (NMR) in biochemical research, 694, 699, 705, 706 electron paramagnetic resonance (EPR) imaging for vs., 287, 293–294 in magnetic field imaging, 971 RF magnetic field mapping, 1223–1227 Nuclear systems, 215 Nucleation and growth method, for silver halide, 1276–1280 Nucleic Acid Database (NDB), 699 Numerical aperture, 1081, 1115 Nyquist frequency, 790 Nyquist limit, 401, 561 Nyquist’s criterion, MRI, 986 Nyquist’s sampling theorem, 552
O O software, 707 Object classification, human vision, 569 Object recognition, 520–521 Object shaped search and retrieval systems, 625–628 Object space, 1075 Objective quality metrics, 602–606 Objectives, microscope, 1114–1124 Observed gravity value, 446 Ocean wave height, radar and over-the-horizon (OTH) radar, 1149 Oceanography, 760 Oculars (See Eyepieces) Odd and even field scanning, 146 Off axis aberration, 547 Off axis holography, 272 Off line editing, 1050 Offset gravure, 460 Oil exploration, 454, 650 Oil immersion microscopy, 1109–1110 Omnimax, 1031 On axis aberration, 545–546 Online editing, 1050 OPAL jet chamber, 1155 Opaque, 1050 Open cascade development, 312 Open-close filters, 437 Openings, morphological, 430, 436–438, 585 Optic nerve, 513, 552, 562 Optical activity, 233 Optical axis, 54, 540–543 Optical character recognition (OCR), 680
Optical coherence tomography (OCT), 339 Optical density, 234 Optical design, 1074 Optical disks storage 483 Optical effects, 1050 Optical fibers, 509 Optical geometry, 54 Optical image formation, 571, 1072–1106 Optical microscopy, 1106–1141 Optical power, 1074 Optical printer, 1050 Optical resonators, 223 Optical sectioning, microscopy, 1133–1134 Optical sound, 1031, 1037, 1050 Optical thickness, 1135 Optical transfer function (OTF), 57, 1092–1100 aberrations and, 1095–1098 flow imaging and, 399–402 in overhead surveillance systems, 784 Optical Transient Detector (OTD), 890–904, 929–932 Optics (See also Lens) astronomy science and, 691 flow imaging and, 392–393 holography in, 508–509 in human vision, 558–560 microscopy and, 1106–1107 paraxial, 1073–1074 secondary ion mass spectroscopy (SIMS) in, 480–481 terahertz electric field imaging and, 1394 Optimization criterion, in tomography, 1407 Optimum classification, 364–366 OR, 590 Orbital angular momentum, 216 Orders, microscopy, 1108 Organic electroluminescent display, 817–827 Organic ligands, in photographic color display technology, 1214 Organic light emitting diodes (OLED), 817, 822–825 Original, 1050 Orientation, human vision, 565 Original negative, 1050 Orthochromatic film, 1050 Orthogonal frequency division multiplexing (OFDM), 1389 Orthogonal wavelet basis, 1446 Orthopedic medicine, force imaging, 422 Oscillating planar mirror displays, 1341–1342 Oscillations of waves, 226, 803
Oscillator strength, 229 Oscillators, 242 Oscilloscope, cathode ray tube (CRT), 47 Ostwald system, 110 Out of phase signals, 2 Out take, 1050 Outgassing, in field emission displays (FED), 370–380 Over-the-horizon (OTH) radar, 1141–1153 Overall darkness, 604 Overexposure, in electrophotography, 318 Overhead surveillance (See also Aerial imaging), 773–802 Overlap splice, 1050 Overlay, 1050 Overshoot tolerance, 912–914 Oxichromic developers, 835 Oxidation of dyes, 836–839 Oximetry, EPR imaging for, 297–298
P P-frame, video, 1387 p–n photodiodes, 1172 Packet communications, 151, 152, 1382–1385 Packetization delay, compression, 152 Packetized elementary streams (PES), 1382–1385 Paints infrared light and, 668–672 Pressure sensitive paint, 867–868 Pair production, 256, 260 PAL standard, 146–149, 382, 1050, 1371–1373 PALplus television systems, 1392 Pan, 1050 Panavision 35, 1050 Panchromatic film, 1050 Panchromatic imaging, 55, 61, 85 Papers, photographic, 141, 605–606, 1209–1211 Parallacticscope, 1338 Parallax, 1050, 1330 Parallax barrier displays, 1337 Parallax, motion, 1328 Parallel viewing, 1336 Parallelism, functional, 562, 563 Parallelism, spatial, 562, 564 Paraxial marginal ray (PMR), 1080, 1083, 1085 Paraxial optics, 1073–74 Paraxial pupil ray (PPR), 1080, 1083, 1085 Paraxial ray tracing, 1074–1075, 1078, 1083, 1085 Paraxial rays, 97
Parietal lobe, 569 Parseval’s theorem, 1105 Particle accelerators, 970 Particle beam measurement, 975 Particle detector imaging, 1154–1169 Particle form factor, 252 Particle image velocimetry (PIV), 391, 413–416 Particle polarization, 252 Particle sizing, in holography, 510 Passband sampling, 50, 56–61, 65 Passive matrix liquid crystal displays, 956 Pathogen dispersal, geologic imaging, 656–659 Pattern classification, 358–371 Pattern recognition, 350–353, 371–373 human vision and, 522, 559, 565 in search and retrieval systems, 625, 633 Pattern scanning, 573–574 Pattern spectrum, 442 Pattern spectrum density (PSD), 442 Pauli exclusion, in silver halide, 1272 Pediatric medicine, force imaging, 422 PEDOT polymer, 819–820, 823 Peel apart films, 828 Pendellousung effect, 282, 286 Penetration depth, in scanning acoustic microscopy (SAM), 1129–1130 Perceptron, pattern recognition using, 371–373 Perceptual redundancy and compression, 151 Perfect reflecting diffuser, 528 Perforations, film, 1025–1026, 1045, 1050 Period of wave, 212 Periodic functions, 1102–1105 Permeability, ground penetrating radar, 467 Permittivity, 212, 225, 466, 467 Persistence of vision, 1021–1022, 1050 Perspecta sound, 1050 Perspective, 1328, 1330, 1345–1347 Persulfate bleaches, 139 Perturbing spheres, in RF magnetic field mapping, 1223–1224 Perylenes, 1179, 1181 Phase, 241 Phase contrast imaging, in scanning acoustic microscopy (SAM), 1230 Phase contrast microscope, 265–266, 1106, 1116, 1128–1130 Phase object approximation, 278 Phase of wave, 213
Phase retarders, 233 Phase shift, 238–239, 1094, 1130 Phase transfer function (PTF), 1094 Phase velocity, 213, 228 Phasor diagrams, 241 Phasors, 246 Phenols, 133 Phenylenediamine (PPD) developer, color photography, 130 Phosphor primaries, 104 Phosphor screen, CRT, 31, 44–46, 173, 380 PhosPhor thermography, 864–867 Phosphoresence, 255 Photo-induced discharge curve (PIDC), 301–302, 318 Photobook, 623, 627 Photoconductivity, 1169–83 Photoconductors, 300–302, 310–312, 1169–83, 1190, 1200–05 Photodetectors, 49–61, 68, 70, 75, 80, 1087, 1090, 1172–1174, 1190–1191, 1198–1201 quantum well infrared photodetectors (QWIP), 807 Photodetectors, 1183–1208 Photodiodes, 1172–1174, 1190–1191, 1198–1201 Photodynamic therapy (PDT), in endoscopy, 339 Photoelastic materials, force imaging, 420 Photoelectric detectors, 1169 Photoelectric effect, 211, 255–256 Photoemission, 1169–1170 Photoemulsion microcrystals, SIMS analysis, 484–486 Photofinish photography, 500–501 Photogrammetry, 737–739, 1327 Photographic color display technology, 1208–1222 Photography, 456, 1344–58 art conservation and analysis using, 661–682 color (See Color photography) digital, 141–142 electro- (See Electrophotography) in forensic and criminology research, 709–714 high-speed imaging (See High-speed photographic imaging) instant (See Instant photography) in medical imaging, 754 motion picture (See Motion pictures) mug shots, 715–716 in overhead surveillance systems, 773–802
photofinish photography, 500–501 process screen photography, 1051 quality metrics and, 598–616 in search and retrieval systems, 616–637 silver halide detectors and, 1259–1309 still, 1344–1358 stroboscopic, 493–494 surveillance imaging using, 714–715 synchroballistic photography, 500–501 Photointerpretation, 774 Photolithography, 383, 384 Photoluminescence, 255 Photometer, 185–186, 1050 Photomicrography, 855, 1106, 1124, 1137–1138 Photomultiplier, 1062, 1183–1208 Photomultiplier tubes (PMT), 873 Photon absorbers, in infrared imaging, 806 Photon counting, lidar, 873–874 Photon flux, 215 Photons, 211, 214–215, 223 Photopic luminous efficiency function, 555 Photopic vision, 515 Photopigments, 101, 563 Photopolymers, holography, 509 Photoreceptors, 49, 50, 58, 65, 71, 72, 513–517, 549–552, 558–562, 746–747, 1178–1183 in copiers, 1175–1183 Photoreconnaissance, 774 Photostat, 299 Photothermographic imaging, 851–853 Phototransistors, 1172–1174 Photovoltaics, 1190, 1204 Phthalocyanines, 1179, 1181 Physical vapor deposition (PVD), 383 PicHunter, 632 Pickup coils, in RF magnetic field mapping, 1224 Pictorial interface, in search and retrieval systems, 618 Pictrography 3000/4000, 852–853 Pictrostat 300, 852–853 Pictrostat Digital 400, 852–853 Piezoelectric effect, 420 Piezoelectric sensors, 423–424 Piezoelectric transducers, 1233, 1234 Pigment epithelium, 513 Pigments, infrared light, 668–672 Pin photodiodes, 1173 Pin registration, 1050 Pin scorotron, in electrophotography, 303
Pincushion distortion, 42–43, 93 Pinhole camera, 1072–1073 Piston transducers, 8–9, 1424–26 Pitch, 1050 Pitch, film, 1026 Pivaloylacetanilides, 134–135 Pixel detectors, particle detector imaging, 1167–1168 Pixel independence, cathode ray tube (CRT), 176, 179–181 Pixels, 1050 in cathode ray tube (CRT), 173 feature extraction using, 357 feature measurement and, 343–344 image processing and, 575–578 PixTech, 376 Planar Doppler velocimetry (PDV), flow imaging, 415–416 Planar imaging, flow imaging, 390–391, 397 Planar laser-induced fluorescence (PLIF), 391, 408–409, 411–416, 861–864 Planar neutron radiography, 1067 Planar sources, continuous, 7–8 Planck distribution, 804 Planck, Max, 211 Planck’s constant, 287, 288, 394, 1184, 1273, 1476 Planck’s equation, 525, 782 Planck’s function, 759 Planck’s law, 683 Planckian radiation (See also Blackbody radiation), 525 Plane of incidence, 233, 234 Plane of vibration, 231 Plasma displays, display characterization, 185 Plasma enhanced chemical vapor deposition (PECVD), 384 Plasma frequency, 227, 231 Plasma sheet and cusp, ENA imaging, 1010–1012 Plasma wave generators, terahertz electric field imaging, 1403 Plasmasphere, extreme ultraviolet imaging (EUV), 1005–1006 Plasmatrope, 1022 Plateau, Joseph A., 1022 Platinum silicide Schottky barrier arrays, 1201 PMOS, scanning capacitance microscope (SCM), 22 p–n junctions, scanning capacitance microscope (SCM) analysis, 22 Pockels cells, high-speed photography, 492–493 Pockels effect, 233, 1394, 1397 Pocket Camera instant films, 847 Podobarograph, force imaging, 421
Point of denotation (POD), in search and retrieval systems, 621–622, 630 Point of subjective equality (PSE), 609 Point spread function (PSF), 55, 1082, 1085–1092, 1097, 1098, 1099 flow imaging and, 399, 402 human vision and, 542, 543 image processing and, 596 in magnetic resonance imaging (MRI), 987 in medical imaging, 750 in overhead surveillance systems, 784 telescopes, 688 in ultrasonography, 1421–1423 Poisson probability, 53 Poisson’s equation, 26–27 Polachrome, 848 Polacolor, 834, 840, 843–844 Polarization, 212, 223, 226, 227, 231–233, 235, 241, 250, 251, 252, 255, 257 art conservation and analysis using, 666–668 in forensic and criminology research, 719 in ground penetrating radar, 468 in high-speed photography, 492–493 in holography, 509 in human vision, 550–551 in lidar, 873 in magnetic resonance imaging (MRI), 978, 996 in meteorological research, 772 in microscopy and, 1116, 1131–1132, 1134 modifying materials vs., 233 in radar, 1453, 1468 reflectance and, 237 in terahertz electric field imaging, 1397 in three-dimensional imaging, 1330, 1331 Polarization and Directionality of Earth’s Reflectance (POLDER), 772 Polarization angle, 237 Polarization diversity radar, 772, 1468–69 Polarization rotators, holography, 509 Polarizing filter, 1050 Polaroid, 127, 829, 831–833, 844–849, 1331 Polaroid sheet, 233 Polavision, 848 Polyaromatic hydrocarbons, 1179
Polyester film, 1023, 1039, 1050 Polyethylene terephthalate films, 1023 Polymer degradation, EPR imaging, 296 Polymer films, gravure printing, 461 Polymer light emitting diodes (PLED), 817, 820–822 Polymer light emitting logo, 819–820 Polymeric research, scanning electrochemical microscopy (SECM) for, 1255–1256 Polyvinylcarbazole (PVK), 301, 823–825, 1178 Population inversion, 223 Position of objects, feature measurement, 345–347 Positive film, 1051 Positron emission tomography (PET), 210, 220, 743, 1407, 1324–1326 Postdevelopment processes, silver halide, 1303 Postproduction phase, in motion pictures, 1034, 1051 Potential well (POW) Scorotron, in electrophotography, 303 POV Ray software, 707 Powder cloud development, in electrophotography, 312 Power amplification, radar and over-the-horizon (OTH) radar, 1151 Power consumption, in field emission displays (FED), 388 Power K, 1078 Power law, human vision, 747 Power spectral density (PSD), 52, 53, 61–75, 80 Power, optical, 1074 Poynting vector, 213 PPP NEt3 polymer, 820–822 Prandtl number, flow imaging, 404 Precipitation, in silver halide, 1266 Predictive coding (DPCM), compression, 153, 155 Preproduction, in motion pictures, 1031–1032 Presbyopia, 540 Pressure effects flow imaging and, 411–416 force imaging and, 420, 421, 424–426 planar laser-induced fluorescence (PLIF) in, 863 pressure sensitive paint, 867–868 Pressure garments, force imaging, 420 Pressure sensitive paint, 867–868 Primary colors, 102, 104, 126–127, 358, 531–534, 1051 Principal maxima, 245
Principal planes, 1078 Principle of superposition, 239–240 Print film, 1024, 1051 Printed edge numbers, 1051 Printer lights, 1051 Printers, 300 cyclic color copier/printers, 328 direct thermal printers, 196 display characterization in, 183 dots per inch (DPI) in, 602 drive mechanisms in, 193 dye sublimation, 189–190, 194–195 in dye transfer printing, 188–197 electrosensitive transfer printers, 195 image-on-image or REaD color printing, 328–329 laser printers, 195–196, 302, 306–310 light sensitive microcapsule printers, 195 micro dry process printers, 195 in motion pictures, 1022 quality metrics and, 598–616 REaD color printing, 328–329 tandem color printers, 328 thermal transfer printers, 189 wax melt, 190–191, 195 wet gate, 1055 Printing, 454–463, 1051 Printing press, 455 Prisms, 495–496, 1073, 1106 Prisoner’s problem, 160 Process screen photography, 1051 Processing, 1051 Processing chemicals (See also Developers), in holography, 509 Processing of images (See Image processing) Product theorem, 4 Production phase, in motion pictures, 1032 Production supervisor, 1051 Progressive scanning, 146–147 Projection, image formation, 571–574 Projection film, 1023 Projection matrix, in tomography, 1407 Projection speed, 1051 Projection theorem, in tomography, 1406 Projectors, 184–185, 1022, 1035–1038, 1330 Prontor shutters, 1351 Propagation of waves, 220–231 in ground penetrating radar, 466, 467–469
in radar and over-the-horizon (OTH) radar, 1147, 1453–1454 Propagative speed, in ultrasonography, 1415 Proportionality, Grassmann’s, 531 Proportioned bandwidth, television, 1367 Protanopia, 522–523 Protein Data Bank (PDB), 699 Proteins (See Biochemistry) Proton/electron auroras, far ultraviolet imaging, 1016–1020 Proximity sensors, for lightning locators, 907 Pseudocolor display, 120–121 Pseudostereo imaging, 1330 Psychological quality metrics, 614–615 Psychometrics in quality metrics, 609–610 Psychophysical quality metrics, 607, 614–615 Ptychography, 262 Pulfrich technique, in three-dimensional imaging, 1333–1334 Pull down claw, 1027, 1051 Pulse code modulation (PCM), 67, 150 Pulse echo imaging, in ultrasonography, 1412 Pulse wave mode, in scanning acoustic microscopy (SAM), 1232–1234 Pulse width modulation (PWM), 383 Pulsed EPR imaging, 293 Pulsed repetition frequency (PRF), 872, 1452 Pump, laser, 223, 391 Punch-through bias, particle detector imaging, 1166 Pupil, 112–113, 514, 519, 540–543, 547, 553–554, 558 Pupil functions, 1086–1087 Pupil rays, 1081 Purple Crow lidar, 871 Push-broom scanners, 806 PUSH three-dimensional imaging, 1334 Push processing, 1051 Pyrazolin, 135 Pyrazolinone, 135 Pyrazolotriazole, 133, 135–136 Pythagoras’ theory, 1094
Q
Q switching, 391–392, 508
QBIC, 618–630
Quad tree, in search and retrieval systems, 629
Quad tree method, thresholding and segmentation, 645–646
Quadrature amplitude modulation (QAM), 149
Quadrature mirror filter (QMF), 622
Quadrature modulation, 1365–1366
Quality metrics and figures of merit, 598–616, 1081–1085, 1357–1358
  human vision and, 543–544
  in medical imaging, 748–754
  in overhead surveillance systems, 789
  PSF and, 1089–1090
  in tomography, 1409–1410
  in ultrasonography, 1420–1424
Quantitative imaging, flow imaging, 391, 397
Quantitative SPECT, 1322–1324
Quantitative structure determination, in transmission electron microscopes (TEM), 273–274
Quantization, 61, 64, 211
  compression, 154
  digital watermarking and, 150
  electron paramagnetic resonance (EPR) imaging for, 287
  flow imaging and, 402
  liquid crystal displays (LCDs), 184
  television, 1375
  video, 1388
Quantization index modulation (QIM), digital watermarking, 170–171
Quantization noise, 62, 63, 64, 68, 69, 80
Quantum detection efficiency/quantum efficiency, 691, 748, 1190
Quantum electrodynamics (QED), 256, 259
Quantum mechanical cross section, 255–256
Quantum nature, 214–218
Quantum sensitivity, in silver halide, 1282, 1284–1285
Quantum step equivalence (QSE), 786
Quantum terahertz biocavity spectroscopy, 1403
Quantum theory, 211, 214–218, 253, 1170
Quantum well arrays, 1205
Quantum well infrared photodetector (QWIP), 807, 1190, 1205
Quantum yield, 255, 862
Quarter wave plates, 233
Quasi 3D, scanning capacitance microscope (SCM), 25–26
Query by color, in search and retrieval systems, 621–622 Query by example, in search and retrieval systems, 620–621, 624, 628, 630 Query by sample, in search and retrieval systems, 624 Query by sketch, in search and retrieval systems, 618, 627, 630 Quinone, instant photography, 837–839 Quinonedimine (QDI) developer, color photography, 130
R R 190 spool, 1051 R 90 spool, 1051 Radar, 1450–1474 airborne, 1471–1473 bistatic, 772 cathode ray tube (CRT) in, 47 coordinate registration, 1144 Doppler (See Doppler radar) Doppler shift, 1145 dwells, 1146 equatorial anomaly, 1144 geologic imaging and, 648 ground penetrating, 463–476 group range, 1144 lidar vs., 869–870 magnetic storms vs., 1145 measurements in, 764–765 meteorology, 757–773 microwave radar ducting, 1141 mobile systems, 1471–73 National Imagery Interpretability Rating Scale (NIIRS), 795–800 over-the-horizon (OTH) radar, 1141–1153 in overhead surveillance systems, 775–802 polarization diversity radar, 772, 1468–1469 range folding, 1145 scattering, 1145–1146 shortwave fadeout, 1145 skip zone, 1141 sporadic E, 1144 spotlights, 1146 storm tracking, 765–769 surface wave radar, 1141 synthetic aperture radar (SAR), 356, 789 terminators, 1145 traveling ionsopheric disturbance (TID), 1144 weather radar, 1450–1474 wind profiling radar, 1469–1471 Radar cross section (RCS), 1145
Radar reflectivity factor, 765, 1453, 1454–1458 RADARSAT, 649 Radial velocity, 764 Radiance, 61, 524–525, 530, 785, 968, 1081, 1092 Radiance field, 51–54 Radiant heat transfer, in electrophotography, 325 Radiated fields, lightning locators, 909–914 Radiation, lightning locators, 911–912 Radiation belts, energetic neutral atom (ENA) imaging, 1006–1010 Radiation damping, 226 Radiation oncology, in medical imaging, 744 Radiation pressure, in transmission electron microscopes (TEM), 214 Radiation zone, 220 Radiative lifetime, 254 Radiators, non-planar, impedance boundaries, 9 Radio astronomy, 210 Radio frequency (RF), 220, 909 Radio interferometry, lightning locators, 946–947 Radio waves, 218, 230, 242, 803 astronomy science and, 682, 693 Atacama Large Millimeter Array (ALMA), 693 lightning locators, 890 RF magnetic field mapping, 1223–27 Radiography, 1057–71, 1067 Radiometric IR imaging systems, 804 Radiometry, 524, 810 Advanced Very High-Resolution Radiometer (AVHRR), 759, 760–761 Along Track Scanning Radiometer (ATSR), 772 Multiangle Imaging Spectroradiometer (MISR), 772 in overhead surveillance systems, 783 Radiosity, 804 Radiotracers, SPECT imaging, 1310–1314 Radon transform, in tomography, 1404–1406 Raggedness, 604 Rainbow schlieren, flow imaging, 412 Ram Lak filters, in tomography, 1405 Raman scattering, 259–260, 391, 411, 874, 882–885 Ramsden disk, 1111, 1120
Ramsden eyepiece, 1120 Random dot autostereogram, in three-dimensional imaging, 1336 Range folding, 1145 Range normalized signal strength (RNSS), lightning locators, 937 Range of use magnification, 1121 Rangefinders, 1350–1351 Rank ordering, quality metrics, 607–608 Ranking operations, image processing, 578–580, 583 RasMol software, 707 Raster, 31, 35, 42–43, 1051 Raster effect, 72 Raster3D, 707 Rate conversion, television, 1380 Rating techniques, quality metrics, 608–609 Raw stock, 1051 Ray tracing, 1076 finite, 1083 paraxial, 1074–1075, 1078, 1083, 1085 Rayleigh criterion, 245–246, 249 Rayleigh function, 94 Rayleigh–Gans–Debye scattering, 251–252 Rayleigh range, flow imaging and, 392 Rayleigh resolution limit, 1082, 1087 Rayleigh scattering, 252–253, 257–259, 391 flow imaging and, 410, 411, 412, 416 human vision and, 548 lidar and, 874, 876–880 flow imaging and, 415 Reach-through bias, particle detector imaging, 1166 REaD color printing, 328–329 Read noise, flow imaging, 395, 396 Reagents, instant photography, 828–829 Real time, 1051 Real time imaging, ultrasonography, 1428–1429 Receivers lidar and, 872–873 radar and over-the-horizon (OTH) radar, 1147, 1151 Reception, in magnetic resonance imaging (MRI), 979–980 Receptive field, human vision, 563 Reciprocity failure, silver halide, 1286–1288 Reciprocity law, 1051 Recombination, in silver halide, 1275, 1281 Recommended practice, 1051
Reconstruction of image algebraic reconstruction technique (ART), 1407–1408 in magnetic resonance imaging (MRI), 987–988 SPECT imaging, 1316–1321 in tomography, 1407–1408 wavelet transforms in, 1448 Reconstructive granulometries, 441–442 Recording systems, 504–505, 873 Rectangular piston, baffled, 8–9 Red green blue (RGB) system, 105–106, 531–534 in cathode ray tube (CRT), 173, 174 in color image processing, 101–102 in color photography, 123–142 in feature recognition and object classification in, 358 HSI conversion in, 112–114 image processing and, 578, 580 in motion pictures, 1052 in search and retrieval systems, 619 in television, 147–148 in thresholding and segmentation, 641 Redshift (See also Doppler shift), 686, 772 Reduction dye release, in instant photography, 839–840 Reduction printing, 1051 Reduction sensitization, silver halide, 1293 Redundancy, compression, 155 Reed Solomon coding, 1390 Reel 3D Enterprises, 1333 Reel band, 1051 Reference carriers, television, 1365 Reference white, 102 Reflectance, 50–53, 61, 68, 236–237, 527–530, 558, 610, 611, 803, 1072 in forensic and criminology research, 717 in scanning acoustic microscopy (SAM), 1235–1244 Reflected light microscopy, 1124–1127 Reflecting prisms, 1073 Reflection, 233–239, 267, 527–529, 1072, 1075 Bragg, 244 conducting surface, 239 ground penetrating radar and, 464–466, 468 holography in, 508 human vision and, 553–554, 561
in scanning acoustic microscopy (SAM), 1235–1244 in ultrasonography, 1415–1416 Reflection coefficient, in scanning acoustic microscopy (SAM), 1235–1244 Reflection grating, X-ray telescopes, 1505–06 Reflection holograms, 508 Reflective liquid crystal displays, 965–966 Reflectivity factor, radar, 1453, 1454–1458 Reflex cameras, 1345 Reflex radiography, art conservation and analysis using, 675–676 Refraction, 233–239, 1075, 1131–1132 dispersion vs., 234–235 ground penetrating radar and, 468 index of (See Index of refraction) in motion pictures, 1051 Refractive error, 548 Refractive index (See also Index of refraction), 512, 1075, 1079, 1109, 1453 Region growing, thresholding and segmentation, 643–645 Regional gravity anomaly, gravity imaging, 448 Regions of influence, feature recognition and object classification, 370–371 Register tolerances, cathode ray tube (CRT), 37 Registration of image, in three-dimensional imaging, 1331 Regularization parameters, in tomography, 1407 Regularized least squares, in tomography, 1408 Rehabilitation, force imaging, 422, 424 Rehalogenating bleaches, 138 Reich, Theodore, 456 Relative aperture, 1081 Relative colorimetry, 533 Relative dielectric permittivity (RDP), 467 Relative edge response (RER), 800 Relative neighborhood graph (RNG), 370–371 Relative quantum efficiency (RQE), silver halide, 1296–1298 Relaxation parameters, in tomography, 1408 Release print, 1023, 1051 Rem jet backing, 1051 Rembrandt Intaglio Printing Co., 456
Remote sensing, 210, 356 geologic imaging and, 648–650 lidar and, 869, 870 in overhead surveillance systems, 780, 782 in three-dimensional imaging, 1327 Remote Sensing Act, 1992, 780 Rendering, 1051 Repetitive flash (stroboscopic) photography, 493–494 Reset noise, flow imaging, 396 Residual gravity anomaly, 448 Residual index, in biochemical research, 699 Resin coated (RC) paper, 1210–1211 Resolution, 60, 71, 75, 84, 1073, 1082, 1087, 1100–1103, 1357, 1358 Abbe numbers in, 1100–1102 amplitude, 151 astronomy science and, 686–688 cathode ray tube (CRT), 32, 33–34 charged particle optics, 86–100 in charged particle optics, 94–100 compression and, 151 detector, 790–793 electron paramagnetic resonance (EPR) imaging for, 292 in endoscopy, 334–336 flow imaging and, 398–405 in forensic and criminology research, 712–713 ground penetrating radar and, 469–471 High-Energy Neutral Atom Imager (HENA), 1010 human vision and, 561 liquid crystal displays (LCDs), 184, 858–859 in magnetic resonance imaging (MRI), 987 in magnetospheric imaging, 1010 in medical imaging, 750–752 microscopy and, 1106 in motion pictures, 1051 in overhead surveillance systems, 789–790, 792–794 photoconductors, 1181–82 Rose model and medical imaging, 753–754 scanning capacitance microscope (SCM), 19 in scanning acoustic microscopy (SAM), 1229, 1244–1245 spatial, 151 television, 1362 temporal, 151 in tomography, 1410
Resolution, (continued ) in transmission electron microscopes (TEM), 263, 266 in ultrasonography, 1421–1424 wavelet transforms in, 1448 X-ray fluorescence imaging and, 1482–1484 Resolution limit, 1073, 1087 Resolving power, 1051, 1358 Resonance, 226, 228–233, 287 Resonance curve, 226 Resonance lidar, 883–885 Resonant coils, RF magnetic field mapping, 1223–1227 Resonant fluorescence, 259 Resonant frequency, 228 Resonators, 223 Response time, in field emission displays (FED), 387 Responsitivity, 54–56 human vision and color vision, 529–531 photodetectors and, 1187 Restoration of images (See Image restoration) Restoring force, 225 Reticulation, 1051 Retina, 65, 513–519, 522, 543, 547, 549, 552, 558–564, 746–747 Retinal image size, 1328 Retinex coding, 61, 65, 75–83 Retrace lines, television, 1362 Retrofocus, 1354 Return beam vidicon (RBV), in overhead surveillance systems, 779 Reversal film, 139, 1052 Reversal process, 1052 Reverse modeling, scanning capacitance microscope (SCM), 27 Reverse perspective, 1330 Reynolds number, flow imaging, 404, 410, 412 RF coils, in magnetic resonance imaging (MRI), 999 RF magnetic field mapping, 1223–27 RF spoiling, in magnetic resonance imaging (MRI), 992–993 Rheinberg illumination, 1128 Rhodopsin, 517, 561, 747 Ring current, energetic neutral atom (ENA) imaging, 1006–1010 Ring imaging, particle detector imaging, 1162 Ring imaging Cerenkov counters (RICH), 1162 Ring opening single electron transfer (ROSET) dye release, 840, 851 Ringing, 75 Rise time, in lightning locators, 913
Ritchey-Chrétien mirrors, 783 Ritter von Stampfer, Simon, 1022 Robustness, 74 Rocking curve, 280, 282 Rods, 122, 513–519, 530–531, 551–554, 558, 560–561, 562, 746–747 Roget, Peter Mark, 1022 Roller charging, in electrophotography, 303 Ronalds, 299 Röntgen, Wilhelm C., 1475 Root mean square (rms) aberration, 1090 Root mean square (rms) granularity, 140 ROSAT telescopes, 1507 Rose model, in medical imaging, 753–754 Rotating drum and mirror cameras, 496–497 Rotating mirror framing cameras, 497–498 Rotating mirrors, in three-dimensional imaging, 1343 Rotating prism cameras, 495–496 Rotation, 167, 1052 Rotation, molecular, 216 Rotational frequency, in magnetic resonance imaging (MRI), 979 Rotational mapping, in RF magnetic field mapping, 1225 Rotogravure, 454–463 Rough cut, 1052 RS strings, in search and retrieval systems, 629 Run-length encoding, 516 Ruska, Ernst, 261 Rutherford scattering, particle detector imaging, 1157
S Saddle coils, deflection yoke, 41–42 Safety acetate film, 1023, 1024, 1052 SAFIR lightning locators, 914, 945–950 Sampling, 56, 61 compression, 153 digital watermarking and, 150 human vision and, 552, 554, 560 lightning locators, 918–921 in magnetic resonance imaging (MRI), 986–987 passband, 50, 56–61, 65 thresholding and segmentation in, 640 Sampling lattice, 50, 54, 57, 58, 59, 65, 71 Sanyo three-dimensional display, 1340–41
Satellite imaging systems, 350, 356 Advanced Very High-Resolution Radiometer (AVHRR), 759, 760–761 Advanced Visible Infrared Imaging Spectrometer (AVIRIS), 650, 787 Along Track Scanning Radiometer (ATSR), 772 Applications Technology Satellite (ATS), 757 Array of Low Energy X Ray Imaging Sensors (ALEXIS), 905, 929 cloud classification, 761–764 Defense Meteorological Satellite Program (DMSP), 890–904, 929 Earth Observing System (EOS), 772 Earth Resources Technology Satellite (ERTS), 778–779 Fast On Orbit Recording of Transient Events (FORTE), 890–904, 929 feature recognition and object classification in, 352–353 geologic imaging and, 647–661 Geostationary Meteorological Satellite (GMS), 760 Geostationary Operational Environmental Satellite (GOES), 760, 778 gravity imaging and, 444–454 IKONOS satellite, 780 image processing and, 589 IMAGE, 1018 imaging satellite elevation angle (ISEA) in, 791 in overhead surveillance systems, 773–802 infrared in, 758 Landsat, 778, 787 Lightning Imaging Sensor (LIS), 890–904, 929, 932–935 lightning locators, 890, 928–935 magnetospheric imaging, 1002–1021 measurement in, 758–759 meteorology, 757–773 Meteorological Satellite (METEOSAT), 760 Multiangle Imaging Spectroradiometer (MISR), 772 multiangle viewing instruments, 772 multispectral image processing and, 101 NIMBUS, 778 oceanography, 760
Optical Transient Detector (OTD), 890–904, 929–932 Polarization and Directionality of Earth’s Reflectance (POLDER), 772 Systeme Probatoire d’Observation de la Terre (SPOT), 779–780 Television and Infrared Observational Satellite (TIROS), 757, 777 Thematic Mapper, 648, 653, 654, 657, 779 time delay integration (TDI), 1018 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 X-ray Evolving Universe Satellite, 1509 Saturation, 103, 111, 117, 119, 578, 580, 618, 994–995, 1043, 1052 Sawtooth irradiance distribution, 1092 Scaling, digital watermarking, 167 Scaling factors, television, 1367 Scaling methods, quality metrics, 607–608 Scalograms, 1444 Scan a Graver, 456, 461 Scanned probe microscopy (SPM), 1248 Scanners, 574 art conservation and analysis using, 663 calibration of, 603 in forensic and criminology research, 709 in infrared imaging, 804, 805–806 in meteorological research, 769 multispectral, 360 in overhead surveillance systems, 779 push broom type, 806 quality metrics and, 602–603 whisk broom type, 806 Scanning, 1072 high-definition TV (HDTV), 147 image formation in, 571, 573–574 interlaced, 146–147 k space, 574 odd and even field, 146 pattern, 573–574 progressive, 146–147 television, 146–148, 1359, 1362, 1366–1367 ultrasonography, 1413–1415 X-ray fluorescence imaging and, 1478–1479, 1482 Scanning acoustic microscopy (SAM), 1128–1148
Scanning capacitance microscope (SCM), 16–31 Scanning capacitance spectroscopy, 21 Scanning electrochemical microscopy (SECM), 1248–1259 Scanning electron microscope (SEM), 23, 87–88, 262, 274–278, 477, 1243 Scanning evanescent microwave microscope (SEMM), 28 Scanning ion microscopy (SIM), 477 Scanning Kelvin probe microscope (SKPM), 16, 28 Scanning lines, television, 1359 Scanning microwave microscope (SMWM), 16, 28 Scanning transmission electron microscope (STEM), 87, 93, 262, 276–278 Scanning transmission ion microscope (STIM), 479 Scattering, 51, 242, 244, 249–253, 256–260, 282, 283, 285, 286, 391, 1072 in biochemical research, 698 flow imaging and, 397–398, 410, 411, 415–416 ground penetrating radar and, 471 holography in, 504 human vision and, 548, 549 image formation in, 571 lidar and, 874–875 multiple coulombic scattering (MCS), 1157 in overhead surveillance systems, 785 particle detector imaging and, 1157 radar, 1145–1146, 1451 Rutherford scattering, 1157 in scanning acoustic microscopy (SAM), 1243 SPECT imaging, 1323–1324 in transmission electron microscopes (TEM), 269 in ultrasonography, 1415–1416 Scattering angle, 250 Scattering plane, 251 Scene, 1052 Scherzer defocus, 272 Schlieren images, 405–408, 412, 501–504 Schottky diode, 19, 1172–1174, 1190, 1201 Schottky thermal field emitter, in charged particle optics, 90 Schulze, Johann Heinrich, 1345
Science (See also Astronomy; Biochemistry; Medicine and medical research), 742 Scientific Working Group on Imaging Technologies (SWGIT), 719, 741 Scintillation, 688, 1158, 1313–1314 Scintillation cameras, SPECT imaging, 1313–1314 Scintillation tracking devices, particle detector imaging, 1158–1168 Scintillator detectors, in neutron imaging, 1062–1064 Scintillators, in radiographic imaging, 1067, 1068 Scope, 1052 Scorotron charging, in electrophotography, 303 Scorotrons, 1176 Scotopic (rod) vision, human, 122, 515, 747 Scotopic luminous efficiency function, 555 Scrambling, 60 Screened images, 455 Screw threads, microscopy, 1116 Scrim, 1052 Script, 1052 Sealing glass (frit), in field emission displays (FED), 387 Seaphone three-dimensional display, 1339–1340 Search and retrieval systems, 616–637 Search engines, in search and retrieval systems, 633 SECAM standards, 146–148, 1052, 1367–1371 Second order radiative process, 256 Secondary electron (SE) imaging, in scanning electron microscopes (SEM), 275–276 Secondary ion mass spectroscopy (SIMS), 477–491 Secondary maxima, 245 Security, digital watermarking, 159–161 Segmentation, 615 in color image processing, 119–120 human vision and, 568–569 image processing and, 587 in search and retrieval systems, 622, 625 Seidel polynomials, 542 Selection rules, atomic, 253 Selectivity, human vision, 565–567 Selenium, 300, 301, 1170 Sellers, Coleman, 1022
Semiconductor detectors, 1064–65, 1163–1165, 1168 Semiconductors, 22–23, 1183–1208 Semigloss surfaces, 528 Sensitivity, 1052 astronomy science and, 688–690 in magnetic resonance imaging (MRI), 983 in radar and over-the-horizon (OTH) radar, 1147 silver halide, 1261, 1282, 1284–1285 in SQUID sensors, 14 Sensitivity or speed of film, 124, 139 Sensitization, in photographic color display technology, 1215–1216, 1288–1293 Sensitizers, color photography, 124–125 Sensitometer, 1052 Sensitometry, silver halide, 1262 Sensors, 101, 356 active pixel sensors (APS), 1199–1200 capacitance, 17–18 CMOS image sensors, 1199–1200 force imaging and, 420, 422–424 monochrome image processing and, 100 in overhead surveillance systems, 787–789 scanning capacitance microscope (SCM), 17–18 Separation light, 1052 Separation masters, 1052 Sequence, 1052 Sequential color TV, 1365 Sequential frequency modulation, 1367–1368 Series expansion methods, in tomography, 1406–1409 Serrations, television, 1360 Set, 1052 Set theory, 430 Setup level, television, 1362 70 mm film, 1025 Sferics, 890 Shading, 3, 58, 65, 82, 1328 Shadow mask, 31, 36–38, 44, 47, 173, 825–826 Shadowgraphs, 405–408, 501, 743 Shadowing, 1328 Shadows, 51, 53, 58, 75, 82 Shallow electron trapping (SET) dopants, 1215 Shannon’s law, 49, 63 Shannon’s theory of information, 99 Shape analysis-based search and retrieval systems, 625–628 Shape factor, 357–358
Shape of objects, in feature measurement, 347–350 Sharpness, 71, 75, 81, 140, 1052, 1347, 1357 in color photography, 137 in high-speed photography, 492 in image processing, 590–591 quality metrics and, 598–616 silver halide and, 1304 Sheet-fed gravure, 460 Shielding, ground penetrating radar, 468 Shore A scale, gravure printing, 459 Short, 1052 Shortwave broadcast, radar and over-the-horizon (OTH) radar, 1142 Shortwave fadeout, 1145 Shot, 1052 Shot noise, flow imaging, 395 Show Scan, 1031 Shutter, 492–493, 1027–1028, 1036, 1052, 1351–1352 Shuttle Imaging Radar, 649 Shuttle Radar Topographic Mapping Mission (SRTM), 660 Sibilance, 1052 SiC, scanning capacitance microscope (SCM) analysis, 22 Sidebands, television, 1362, 1366, 1389 Signal coding (See also Encoding), 49–51, 61–62, 84 Signal detection, particle detector imaging, 1168 Signal-induced noise (SIN), lidar, 873 Signal levels, television, 1361–1362 Signal processing, in digital watermarking, 161, 171 in human vision and color vision, 516–518 in radar and over-the-horizon (OTH) radar, 1147, 1151–1152 Signal propagation model, lightning locators, 937 Signal to noise ratio (SNR), 50, 60, 64–66, 74, 81, 84 in charged particle optics, 99 electron paramagnetic resonance (EPR) imaging for, 289 in flow imaging, 393, 394–397 in magnetic resonance imaging (MRI), 987, 996 in medical imaging, 749 in overhead surveillance systems, 794–795 in radar and over-the-horizon (OTH) radar, 1147, 1150 silver halide, 1303
in sound systems, 1388 in tomography, 1410 in X-ray telescopes, 1497 Signal transduction, in scanning electrochemical microscopy (SECM), 1249–1253 Silicon dioxide, scanning capacitance microscope (SCM) analysis, 23 Silicon drift detectors, particle detector imaging, 1167 Silicon nitride, scanning capacitance microscope (SCM) analysis, 23 Silicon photoconductors, 1204–1205 Silicon technology, 384–385 Silicon transistors, scanning capacitance microscope (SCM) analysis, 23 Silicon Video Corp. (SVC), 377 Silver assisted cleavage dye release, instant photography, 840, 851 Silver clusters and development, silver halide, 1280–1281 Silver Dye Bleach, 127 Silver halide, 140, 1259–1309, 1345, 1356–1357 art conservation and analysis using, 661–662 in color photography, 123, 125–126, 129–130 detectors using, 1259–1309 in holography, 509 in instant photography, 827, 830–833 in motion pictures, 1052 in photographic color display technology, 1208–1222 secondary ion mass spectroscopy (SIMS) analysis of, 484–486 Silver nitrate, 1345 Silver oxide, 381 SIMION, 482 Simulated images, in transmission electron microscopes (TEM), 271–273 Simulations, 769–771, 1282, 1327 Simultaneous autoregressive (SAR) model, 623–624 Single electron transfer dye release, 840, 851 Single frame exposure, 1052 Single hop mode, radar and over-the-horizon (OTH) radar, 1141 Single lens reflex cameras, 1349–1350 Single perforation film, 1052 Single photon emission computed tomography (SPECT), 743, 1310–1327 Single pixel image processing, 575–576
Single positive imaging, gravure printing, 461 Single station lightning locators, 907–908 Single system sound, 1052 16 mm film, 1024–1025, 1052 Size distributions, 442 Size of image, 1074–77 Size of objects, 343–344, 686–688 Sketch interface in search and retrieval systems, 618 Skin depth, 229, 230 Skip frame, 1052 Skip zone, 1141 Skunk Works, 775, 776 Sky waves, 912 Slew rate, cathode ray tube (CRT), 179–180 Slides, microscope, 1124 Slitting, 1052 Slow motion, 1052 Smith, Willoughby, 1170 Smoothing, 577, 580–583, 593, 598–616, 755–756 Snell’s law, 234, 238, 468, 1076 Snellen chart, human vision, 747 Sobel operator, image processing, 582 Society for Information Display (SID), 818 Society of Motion Picture and Television Engineers (SMPTE), 102, 1052, 1374 Sodium arc lamps, 222 Sodium double, 222 Sodium lidar, 884–885 Soft, 1053 Soft light, 1053 Soil moisture mapping, geologic imaging, 656–659 Solar wind, magnetospheric imaging, 1002–1021 Solid state detectors (SSD), 1007, 1477 SOLLO lightning locators, 905, 908, 909 Sound drum, 1053 Sound editing, in motion pictures, 1035 Sound effects, 1053 Sound gate, 1053 Sound head, 1037–38, 1053 Sound navigation and ranging (SONAR), 1412 Sound pressure level (SPL), 3 Sound recorder, 1053 Sound speed, in motion pictures, 1028–1029 Sound sprocket, 1053 Sound systems in motion pictures, 1031, 1033, 1037
in television, 1362, 1365, 1388–1389 Sound track, 1053 Sounding, radar and over-the-horizon (OTH) radar, 1142 Source points, 251 Space-based imaging technology, astronomy science, 691 Space exploration, magnetospheric imaging, 1002–1021 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Spacers, in field emission displays (FED), 381–382, 386 Spallation Neutron Source (SNS), 1057 Sparrow resolution limit, 1087 Spatial domain, image processing, 577, 594 Spatial filters, 509, 1100 Spatial frequency, 248, 559–562, 565–566, 1098, 1103 Spatial frequency response (SFR), 50, 56–61, 62, 63, 66, 68, 70–74, 79, 80 Spatial homogeneity, in cathode ray tube (CRT), 181–182 Spatial parallelism, 562, 564 Spatial relationship, in search and retrieval systems, 628–630 Spatial resolution, 151 in medical imaging, 750–752 in microscopy, 1136 in overhead surveillance systems, 789–790 in scanning capacitance microscope (SCM), 19 in X-ray fluorescence imaging, 1482–84 Spatial response (SR), 50, 54–56, 68, 78, 79, 80 Spatial uniformity, in ultrasonography, 1424 Spatial visual processing, human vision, 558–570 Special effect, 1034, 1053 Specimens, microscopy, 1108–1109 Speckle, in ultrasonography, 1420–1421 Spectra, microscopy, 1108 Spectra instant film, 847 Spectral filters, 55, 872 Spectral imaging, feature recognition and object classification, 356–357 Spectral lines, velocity analysis using, 685–686 Spectral luminosity function (SLF), 554–555 Spectral power density/distribution (SPD), 100, 524–527, 618
Spectral purity, radar and over-the-horizon (OTH) radar, 1147 Spectral radiant exitance, 222 Spectral radiant flux density, 222 Spectral response, in feature recognition and object classification, 358 Spectral sensitization, silver halide, 1294–1299 Spectrometer, 239, 244, 481–482, 650, 787, 970, 1403 Spectroradiometer, 185–186, 524 Spectroscopy, 571 Constellation X mission, 1509 electron paramagnetic resonance (EPR) imaging for, 289 in endoscopy, 338 high-resolution secondary ion mass, 477–491 quantum terahertz biocavity spectroscopy, 1403 terahertz electric field imaging and, 1393–1404 Spectrum, 57, 1053 Specular surfaces, 234 Speed, 1053 Speed of film, 124, 139, 1023 Speed of light, 224–225 Spherical aberration, 92, 98, 1088–1089, 1117, 1123 Spherical waves, 214 Spherics, 890 Spider stop, microscopy, 1128 Spin angular momentum, 217 Spin density, in magnetic resonance imaging (MRI), 983–984 Spin echo in magnetic resonance imaging (MRI), 981, 992 in RF magnetic field mapping, 1125–1126 Spin states, atomic, 217 Spindt emitters, 385 Spindt technique, 385 Splice, 1053 Splicer, 1053 Splicing tape, 1053 Spline fits, thresholding and segmentation, 644–645 Split and merge techniques, thresholding and segmentation, 645–646 Splitters, holography, 509–509 Spontaneous emission, 253–254 Spontaneous Raman scattering, 411 Spool, 1053 Sporadic E, 1144 Sports medicine, force imaging, 422 SPOT high-resolution visible (HRV) imaging systems, 649, 655
Spotlight, 1053 Spotlights, radar, 1146 Sprockets, in projectors, 1036, 1053 Sputtering, 383, 478 Square root integral (SQRI), quality metrics, 606 Squarilium, 1179 SQUID sensors, analog and digital, 9–15 Stabilization of dyes, 831, 841–842 Staircase patterns, 75 Standard definition TV (SDTV), compression, 157 Standard illuminants, CIE, 103–104 Standard Observer, 618 Standards converters, television, 1373 Stanford Research Institute, 375 Stanford, Leland, 1022 Static electricity, 1053 Stationarity, 1085–1086 Statistical redundancy and compression, 150–156 Steadicam, 1031 Steering, 4–5 Stefan–Boltzmann law/constant, 222, 804 Steganography, digital watermarking and vs., 160 Step response function (SRF), 402 Stereo display technologies, 1327–1344 Stereo pairs, in three-dimensional imaging, 1329–1330 Stereo window, 1329 StereoGraphics three-dimensional imaging, 1333 StereoJet three-dimensional imaging, 1332 Stereolithography three-dimensional imaging, 1327 Stereomicroscope, 1106 Stereophonic, 1053 Stereoscopic vision, 566 Stiffness matrix, 627 Stiles–Crawford effect, 58, 513, 542, 552–554 Stiles–Holladay approximation, 548 Still photography, 491–494, 1344–58 Stimulated echo, in magnetic resonance imaging (MRI), 983 Stimulated emission, 223, 253, 254–255 Stock, 1053 Stop, 1053 Stop motion, 1053 Stops, 1354 Storage and retrieval systems art conservation and analysis using, 661–682
in forensic and criminology research, 716–717 Methodology for Art Reproduction in Color (MARC), 664 in motion pictures, 1038–39 Visual Arts System for Archiving and Retrieval of Images (VASARI), 663–664 Storage systems, secondary ion mass spectroscopy (SIMS), 482–484 Storm tracking, 655–656, 765–769 Storyboard, 1053 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Streak cameras, 499–500 Strehl ratio, 543, 544, 1090 Strike filtering, gravity imaging, 452 Strip cameras, 500 Strip scintillation detectors, particle detector imaging, 1165–67 Stripe, magnetic, 1053 Stroboscopic photography, 492, 493–494 Structural dilation, 433 Structural erosion, 432–433 Subband/wavelet coding, compression, 154 Subcarriers, television, 1362, 1365 Subclustering, in feature recognition and object classification, 367–370 Subjective quality metrics, 602, 606–610 Sublimation printers, 189–190, 194–195 Subtraction, image processing, 590 Subtraction, Minkowski, 432, 612 Subtractive color, 833–841, 1053 Subtractive color matching, 102 Subtractive color mixing, 127–128, 139 Sulfite developer, color photography, 130 Sulfonamidonaphthol, 837, 838 Sulfonamidophenol, 838 Sulfoselenide, 301 Sulfur plus gold sensitization, silver halide, 1292–1293 Sulfur sensitization, silver halide, 1289–1292 Super xxx films, 1025 Super Panavision, 1053 Superadditivity, silver halide, 1303 Superconducting quantum interference devices (SQUIDs), 9–15, 976 Superconductors, 9–15, 484, 486–487, 1106 Superposition, 239–240 Superscope, 1053
Supersensitization, silver halide, 1298–1299 Supersonic flow, flow imaging, 409 Supertwisted nematic (STN) liquid crystal displays, 961–962 Surface acoustic waves (SAW), in scanning acoustic microscopy (SAM), 1236–1243 Surface stabilized ferroelectric LCD (SSFLC), 965 Surface wave radar, 1141 Surround function, 77, 79 Surround speakers, 1054 Surveillance et Alerte Foudre par Interferometrie Radioelectriquie (See SAFIR) Surveillance imaging in forensic and criminology research, 709, 714–715 overhead, 773–802 radar and over-the-horizon (OTH) radar, 1141–1153 SVGA video, in field emission displays (FED), 382 Swan, J.W., 455 Sweetening, 1054 Swell, 1054 Swiss PdbViewer, 708 SX70 instant film, 844–847 Symmetry, in compression, 152 Sync pulse, 1054, 1360 Sync sound, in motion pictures, 1033 Synchroballistic photography, 500–501 Synchronization high-speed photography and, 493 in motion pictures, 1054 in television, 1360, 1375 Synchronizer, 1054 Synchrotron radiation (SR), 221, 1476 Synthetic aperture radar (SAR), 356, 648, 789 Systeme Probatoire d’Observation de la Terre (SPOT), 779–780
T T-grain emulsion, 1054 T1/T2 relaxation, MRI, 983–984, 988–991 Tail ends, 1054 Take, 1054 Tamper detection, digital watermarking, 159 Tandem color printing, 328 Tape splice, 1054 Tapetum, 513 Taylor procedure, 3 Taylor series, 227 Technicolor, 1024 Technirama, 1031, 1054
Techniscope, 1054 Telecine, 1054 Telemacro lens, 1354–1355 Telephotography, 59 Telephoto lens, 1347, 1354 Telescopes (See also Astronomy), 210, 1072 Apollo Telescope Mount, 1507 astronomy science and, 682–693 Atacama Large Millimeter Array (ALMA), 693 Chandra Observatory, 1508 Constellation X Observatory, 693, 1509 Einstein Observatory Telescope, 1507 Giant Segmented Mirror Telescope, 693 Kirkpatrick Baez telescopes, 1502–1503 lidar and, 871–872 limitations on, 688, 690–691 liquid mirror telescopes (LMT), 872 mirrors for, 691 multilayer telescopes, 1503 Next Generation Space Telescope (NGST), 693 in overhead surveillance systems, 783 ROSAT telescopes, 1507 Space Infrared Telescope Facility (SIRTF), 690, 691–692 Stratospheric Observatory for Infrared Astronomy (SOFIA), 692–693 Terrestrial Planet Finder, 693 thin mirror telescopes, 1501–1502 TRACE telescopes, 1507–1508 Very Large Array Radio Telescope, 693 Wolter, 1499–1501 X-ray Evolving Universe Satellite, 1509 X-ray interferometric telescopes, 1503–1504 X-ray telescope, 1495–1509 XMM Newton telescope, 1508 Television (See also Motion pictures; Video), 59, 1021 ATSC Digital Television Standard for, 1359, 1382–1389 black-and-white, 1359 broadcast transmission standards, 1359–1393 cathode ray tube (CRT) using, 47 chromaticity in, 148 component systems in, 148–150 compression in, 150–157
digital watermarking and, 146–148 digitized video and, 149–150 high-definition (HDTV), 41, 42, 47, 147, 151, 153, 157, 1039, 1047, 1382, 1390 image aspect ratio in, 147 image intensity in, 147–148 interlaced scanning in, 146–147 luminance in, 148 National Television System Committee (NTSC) standards for, 146–149 National Television Systems Committee (NTSC), 1359–1393 PAL standard for, 146–149, 1359–1393 progressive scanning in, 146–147 red green blue (RGB) system in, 147–148 scanning in, 146–148 SECAM standard for, 146–148, 1359–1393 standard definition TV (SDTV), 157 trichromatic color systems in, 147–148 Television and Infrared Observational Satellite (TIROS), 757, 777 Temperature photodetectors and, 1188–1190 in scanning acoustic microscopy (SAM), 1230 Temperature calculation, in infrared imaging, 814–815 Temperature effects, flow imaging, 411–416 Temperature mapping, in infrared imaging, 812–815 Temperature measurement, planar laser-induced fluorescence (PLIF), 863 Temperature, color, 103, 525 Tempone, 289 Temporal dependencies, 180, 184 Temporal homogeneity in cathode ray tube (CRT), 182 Temporal lobe, 569 Temporal resolution, 151, 1424 Terahertz electric field imaging, 1393–1404 Terminator, radar, 1145 TERRA, 659 Terrain correction, gravity imaging, 448 Terrestrial Planet Finder, 693 Test patterns, quality metrics, 603 Tetramethyl ammonium hydroxide (TMAH), 384
Texas Instruments, 376–377 Text tag information, in search and retrieval systems, 617 Textile presses, 456 Textural gradient, 1328 Texture, in search and retrieval systems, 622–625 Texture processing, image processing, 583–584 Theatres, 1038 Thematic Mapper, 648, 653, 654, 657, 779 Thermal emission, 1176, 1184–1187 Thermal field emitter (TFE), 90 Thermal head, in dye transfer printing, 191–193 Thermal imaging, 810–811 Thermal Infrared Multispectral Scanner (TIMS), 650 Thermal radiation, 356 Thermal signatures, 803 Thermal sources, 222 Thermal transfer process, 189, 853 Thermally assisted fluorescence (THAF), 863 Thermionic emission, 223 Thermofax, 299 Thermograms, 802–817 Thermographic imaging, 851–854 Thermography, 802–817, 864–867 Thermoplastics, 509 Thiazolidine, 840, 841 Thickness extinction contours, 282 Thin-film technology, 383–384 in field emission displays (FED), 377, 379, 383–384 in liquid crystal displays, 957 in scanning acoustic microscopy (SAM) analysis for, 1228 Thin lens conjugate equation, 1078 Thin mirror telescopes, 1501–1502 Thin objects, in transmission electron microscopes (TEM), 270–271 Think Laboratories, 461 Thinker ImageBase, 617 Thiols, 135 Thiopyrilium, 1179 Thiosulfate bleaches, 139 35 mm film, 1022, 1024, 1054 Thomson scattering, 249–250, 256 Thread, 1054 Three-dimensional imaging, 1054, 1072, 1327–1344 in biochemical research, 694–708 Doppler radar and, 1465–1468 flow imaging and, 416–417 force imaging and, 424 ground penetrating radar and, 472–475, 476 human vision and, 566
Three-dimensional imaging, (continued ) in meteorological research, 772 in ultrasonography, 1433 Thresholding, 590, 584–589, 637–638 Thresholds, quality metrics, 609–610 Throw, 1054 Thunderstorm Sensor Series (TSS), 907, 922 Thunderstorms (See Lightning locators) Thyristor flash systems, 1348–1349 Tidal correction, gravity imaging, 447 Tight wind, 1054 Tilting, 4–5 Time delay and integration (TDI), 785, 1018 Time domain waveform analysis, 912–914 Time–energy uncertainty principle, 259 Time lapse, 1054 Time of arrival (TOA) lightning locators, 906–907, 935, 941–945 Time of flight imaging, 989–991, 1015–1016 Time parallel techniques, in three-dimensional imaging, 1331 Time projection chamber (TPC), particle detector imaging, 1160 Time sequence maps, electroencephalogram (EEG), 201–204 Time slice imaging, ground penetrating radar, 472–475 Time Zero film, 846–847 Timing, 184, 1054 Timing layer, in instant films, 832 Titanyl phthalocyanine (TiOPc), 1180–1181 Todd AO, 1031, 1054 Toe, 1054 Tomography, 1404–1411 flow imaging and, 416 ground penetrating radar and, 475–476 image formation in, 571 low resolution electromagnetic tomography (LORETA), 204–208 in medical imaging, 743 in radiographic imaging, 1068 single photon emission computed tomography (SPECT), 1310–1327 terahertz electric field imaging and, 1399–1400 Tone, 598–616, 1054
Tone burst wave mode, in scanning acoustic microscopy (SAM), 1231, 1233 Toner, in electrophotography, 301, 312, 313–315, 325–329 Top hat transforms, 430 Topographic imaging technology, 199–201 TOPS software, 708 Toroidal coils, deflection yoke, 41–42 Total internal reflectance (TIR), 238–239 Total scattering cross section, 250 Tournachon, Gaspard Felix, 773 TRACE telescopes, 1507–1508 TRACKERR, 1158 Tracking, radar and over-the-horizon (OTH) radar, 1148, 1152 Trailer, 1055 Trajectories, particle detector imaging, 1157 Trajectory effects, gravity imaging, 444 Transceivers in magnetic resonance imaging (MRI), 999 in terahertz electric field imaging and, 1399–1400 Transducers, 1, 1418–1419, 1424–1429 Transfer function, 264, 575 Transfer process, in electrophotography, 322–324 Transform coding, compression, 153–154 Transformation, compression, 153 Transfusion, in electrophotography, 322 Transistors, scanning capacitance microscope (SCM) analysis, 23 Transition, 1055 Transitions, atomic, 215 Translation invariant operators, 431–436 Transmission, 527–529, 548–550, 561, 1404 Transmission electron microscopes (TEM), 23, 87, 93, 262–274 Transmission grating, X-ray telescopes, 1505 Transmission holograms, 507–508 Transmission line model (TLM), 937 Transmittance, 51, 54, 58, 236–237, 527–529, 783, 803, 1072, 1095 Transmitters, lidar, 871–872 Transparency views, in three-dimensional imaging, 1333
Transverse chromatic aberration, 545 Transverse electric or magnetic waves, 235 Transverse electromagnetic modes (TEM), 392 Transverse magnification, 1076 Transverse viewing, in three-dimensional imaging, 1336 Transverse waves, 212 Trapping, scanning capacitance microscope (SCM) analysis, 23 Traps, silver halide, 1273–75 Traveling ionospheric disturbance (TID), 1144 Traveling matte, 1055 Trellis coding, 1390 TREMBLE lightning locators, 909 Triangle, 1055 Triangulation, 571, 572–573, 908 Triarylmethyl radicals, 289 Triboelectrification, 944 Trichromatic color theory, television, 147–148 Trichromatic color vision, 567 Trichromatic receptors, human vision and color vision, 519 Tricolor image processing systems, 101–102 Triiodide, 1213 Trims, 1055 Trinitron electron gun, 40 Triphenylamine, 1179 Tripods, 1030 Tristimulus values, 102, 148, 531–534, 537 Tritanopia, 522–523 Tropical Rainfall Measuring Mission (TRMM), 660, 771–772, 890–904, 929, 932–935, 1473 Truck, 1055 Trucks, 1030 True color mode, cathode ray tube (CRT), 174 TSUPREM4 calibration, 28 Tube length, microscopy, 1115 Tungsten filaments, 222 Tungsten light, 1055 Tunics of human eye, 746 Turbulent flow, flow imaging, 405 Twinning, silver halide, 1266–1267 Twisted nematic (TN) liquid crystal displays, 959–961 Two-beam dynamic theory for crystals, 281–284 Two-dimensional Fourier transforms, 1104–1105 Two-dimensional imaging backlight systems for, 1339 ground penetrating radar and, 471–472 in infrared imaging, 809
in magnetic resonance imaging (MRI), 983, 987–988 in search and retrieval systems, 623, 628–630 terahertz electric field imaging and, 1398 Two-point resolution limit, 1087 Two-positive imaging, gravure printing, 461 Two-scale relations, 1446 Two-slit interference, 242–243 Type 500/600 instant films, 847 Type C videotape, 1055 Type K/T/U/Y or Z core, 1055
U U space representation, 5–6 U2 aerial surveillance planes, 775–776 UC SIM, 478–484 Ultra high-frequency (UHF) television, 1362 Ultra Panavision, 1031 Ultramicroelectrodes (UME), 1248–1259 Ultrasonic cleaner, 1055 Ultrasonography, ultrasound, 1412–1435 in endoscopy, 338–340 image formation in, 571, 573 in magnetic resonance imaging (MRI) vs., 983 in medical imaging, 745 in scanning acoustic microscopy (SAM), 1228 Ultraviolet radiation, 218, 219, 239, 356, 1055 art conservation and analysis using, 661, 668–672 electron paramagnetic resonance (EPR) imaging for, 296 extreme ultraviolet imaging (EUV), 1005–06 far ultraviolet imaging of proton/electron auroras, 1016–1020 fluorescence microscopy, 1135–37 gravure printing, 461 photodetectors and, 1196 radar and over-the-horizon (OTH) radar, 1143 Ultraviolet catastrophe, 211 Uncertainty principle, 215, 259 Uncrossed viewing, in three-dimensional imaging, 1336 Undulator magnet, 221 Uniform Chromaticity Scale (UCS), 535 Universal leader, 1055
University of Chicago, secondary ion mass spectroscopy (SIMS) in (UC SIM), 478–479 Unmanned aerial vehicles (UAVs), 780 Unsharp masks, in forensic and criminology research, 725 Unsqueezed print, 1055 Upatnieks, J., 504 Upward continuation, gravity imaging, 451 Useful yield, secondary ion mass spectroscopy (SIMS), 477 UVW and U*V*W* coordinate systems, 108
V V number (See also Abbe number), 234 Vacuum and wave equation, 212 Value, 103 Valve rollers, 1055 Van Allen belts, energetic neutral atom (ENA) imaging, 1006–1010 Van Dyke Company, 456 Variable area sound track, 1055 Variable density sound track, 1055 Variable length coding, 1388 Varifocal mirrors, in three-dimensional imaging, 1342–1343 Vectograph three-dimensional imaging, 1331 Vector quantization, compression, 154, 633 Vegetation, geologic imaging, 653 Velocimetry, 413–416 Velocity effects, 228, 764 flow imaging and, 413–416 planar laser-induced fluorescence (PLIF) in, 863 spectral line analysis of, 685–686 Verifax, 299 Versatec, 299 Vertical derivatives, gravity imaging, 452 Vertical disparity, in three-dimensional imaging, 1329 Vertical interval time code (VITC), 1055 Very high-frequency (VHF) television, 1362 Very Large Array Radio Telescope, 693 Vesicular films, art conservation and analysis using, 662 Vestigial sideband (VSB) television, 1362, 1389
VGA video, in field emission displays (FED), 382 Vibration, molecular, 216 Vibrational relaxation, 255 Video (See also Motion pictures; Television), 1385–1388 authentication techniques, 740 cameras for, 1029–31 component video standards, 1380–82 compressed video, 1385–86 digital (See Digital video) Digital Video Broadcast (DVB), 1392 in forensic and criminology research, 709–714 format conversion in, 720–722 group of pictures (GOP) in, 1386 high-speed photography and, 498–499 I, P, and B frames in, 1387 photoconductor cameras, 1174 Polachrome, 848 Polavision, 848 surveillance imaging using, 714–715 Video assist, 1031 VideoDisc, 16 Videophone, 156–157 Videotape editing, 1035 Viewer, 1055 Viewfinders, 1029, 1346 Viewing angle, 387, 967 Viewing distance, 1347 ViewMaster, 1330, 1331 Vignetting, 1055 Virtual image, 1040, 1328, 1330 Virtual phase CCD (VPCCD), 1198 Virtual states, 259 Virtual transitions, 258–259 Visible Human Project, search and retrieval systems, 616–617 Visible light, 218, 219, 356, 665–666, 782, 803, 1072, 1393 Vision tests, 747 VisionDome, 1335–1336 VistaVision, 1031 Visual angle, human vision, 747 Visual areas, 516, 518, 563, 565, 569 Visual Arts System for Archiving and Retrieval of Images (VASARI), 663–664 Visual cortex, 65, 72, 563–570 Visual field, human vision, 566 Visual field mapping, in magnetic field imaging, 975 Visual information rate, 50, 73–74 Visual magnification, 1077 Visual quality, 74–75 Visualization technology, 773, 1327
VisualSEEk, 618, 621–622, 624, 627, 629, 630 Vitascope, 1022 Voice over, in motion pictures, 1033, 1055 Voids, 604 Volcanic activity, geologic imaging, 651 Volume grating, holography, 508 Volume imaging, holography, 507–508 Volume Imaging Lidar, in meteorological research, 769 Volumetric displays, in three-dimensional imaging, 1341–1343 von Ardenne, Manfred, 262 von Laue interference function, 279 von Uchatius, Franz, 1022 VORTEX radar, 1471 VREX micropolarizers, in three-dimensional imaging, 1334–1335
W Wall eyed, in three-dimensional imaging, 1330 Warm up, liquid crystal displays (LCDs), 184 Waste management, geologic imaging, 656–659 Water Cerenkov counters, particle detector imaging, 1162 Watermarking, digital (See Digital watermarking) Watershed transforms, 430, 587, 646 Watts, 524 Wave aberration, 542–544 Wave equation, in transmission electron microscopes (TEM), 212 Wave fronts, 212, 243, 1083, 1084, 1086, 1090 Wave number, 213 Wave propagation, 220–231 in ground penetrating radar, 466, 467–469 Wave vector transfer (Q), 251 Wave vs. particle behavior of light, 210–211 Waveform monitors, television, 1361 Waveform repetition frequency (WRF), radar and over-the-horizon (OTH) radar, 1147, 1150 Waveforms, 212, 285, 1147, 1150–1151 Waveguides, radar, 1452 Wavelength, 212, 1072, 1109 Wavelength analysis, 448 Wavelet coding, compression, 154
Wavelet transforms, 1444–1450 Wavelets, 243, 622 Wax melt printers, 190–191, 195 Weak beam images, in transmission electron microscopes (TEM), 270 Weather radar, 1450–74 Weave, 1055 Web-fed presses, gravure printing, 459–460, 459 Weber–Fechner law, 747 Weber fractions, quality metrics, 611 Weber’s law, 611 Wedgewood, Josiah, 1345 Weighting, amplitude, 3 Wet-gate printer, 1055 WHAT IF software, 708 Whisk broom scanners, 806 White, 219 White balance, 520 White field response, flow imaging, 397 White light, 219 White point normalization, 533 White uniformity, cathode ray tube (CRT), 35 White, reference, 102 Whole field imaging, 1072 Wide-angle lens, 1347, 1354 Wide-screen, 1031, 1055 Wien displacement law, 222, 803 Wiener filters, 49, 69, 71, 73, 76, 80, 167, 1322 Wiener matrix, 60 Wiener restoration, 68, 70–71, 74–75, 82 Wiggler magnet, 221 Wild, 1055 Wind profiling radar, 1149, 1469–1471 Winding, 1055 Window thermal testing, using infrared imaging, 815 Wipe, 1055 Wire chamber scintillation tracking, 1159–1162, 1168 WKB approximation, in charged particle optics, 89 Wold components, in search and retrieval systems, 623 Wollaston prisms, 1106 Wolter telescopes, 1499–1501 Work print, 1056 Working distance, microscopy, 1116 WorkWall three-dimensional imaging, 1335 World Geodetic System, 445 Wright, Wilbur, 773 Write black, in electrophotography, 317
Write gates, SQUID sensors, 13–14 Write white, in electrophotography, 317 Wynne, Klass, 1400–01
X X-ray analysis (EDX), 262, 478 X-ray astronomy, 219 X-ray crystallography, 696–699 X-ray diffractometers, 244 X-ray Evolving Universe Satellite, 1509 X-ray fluorescence (XRF), 676–677, 1475–1495 X-ray interferometric telescopes, 1503–04 X-ray telescope, 239, 1495–1509 X-ray telescopes, 239 X-ray transform, 1404 X-rays, 210–211, 218, 219, 221, 224, 239, 242, 249, 256–260, 272, 350, 590, 803, 1067, 1393 Array of Low Energy X Ray Imaging Sensors (ALEXIS), 905, 929 art conservation and analysis using, 672–680 astronomy science and, 683, 688 in biochemical research, 694, 696–699, 705 Bragg reflection in, 244 image formation in, 572 in medical imaging, 743, 745, 748, 752–753, 756 non-silver output in, 676 phosphor thermography, 865 photodetectors and, 1197 radar and over-the-horizon (OTH) radar, 1143 sources of, 1477 in tomography, 1404 X-ray Evolving Universe Satellite, 1509 X-ray fluorescence imaging, 1475–1495 X-ray interferometric telescopes, 1503–1504 X-ray telescope, 1495–1509 Xenon arc, 1056 Xenon lamps, 1037 Xerography (See also Electrophotography), 299, 1174 Xeroradiography, 312 Xerox copiers, 574 Xerox Corporation, 299 Xerox Docu Tech, 300, 302, 303 XMM Newton telescope, 1508 XYZ coordinate system, 107, 532, 537, 619
Y
Yellow, 1056 Yellow couplers, color photography, 134 YIQ coordinate system, 106–107, 149 Young, Thomas, 242–243 Yttrium aluminum garnet (YAG) laser, 391 YUV coordinate system, 106–107, 149
Z
Z contrast imaging, 277 Z dependence, 478 Zeeman effect, 217, 218 Zeeman states, 217 Zeiss, Carl, 1106 Zernike moments, 627 Zernike polynomials, 542–543 Zernike, Frits, 1106, 1128 Zero frame reference mark, 1056 Zero padding, 68 Zero power (afocal) systems, 1074, 1079 Zinc oxide, 381 Zoetropic effect, 1022 Zoom in/out, 1056 Zoom lens, 1029 Zoylacetanilides, 134 Zweiton, 1365 Zwitterions, 124–125